AI Integration Skill

Comprehensive patterns for integrating the Anthropic Claude API into production systems — from basic API calls to full multi-agent orchestration with state management, memory, and evaluation.

When to Use This Skill

Activate when:

- Building a Claude API integration or wrapper
- Implementing structured outputs, tool calling, or streaming
- Setting up multi-provider LLM routing (LiteLLM, fallbacks)
- Designing multi-agent orchestration or agentic loops
- Implementing RAG or persistent agent memory
- Evaluating LLM output quality or building evals
- Deploying agents to Next.js, Python FastAPI, or Docker

Don't use this skill for:

- Kubernetes/Terraform config unrelated to AI infra
- General React/Next.js features not involving LLM calls

Core Principles

1. Single Agent vs Multi-Agent

Pattern	When to Use	Cost
Single agent	Linear tasks, simple I/O, <5 steps	Low
Subagent delegation

Infrastructure math (2026): Multi-agent compute costs jump ~3x when moving from single to orchestrated swarms. Budget before you build.

2. Agent Communication Patterns

Hub-and-spoke (most common): Orchestrator delegates to specialist agents.
CODEBLOCK0

Pipeline: Output of one agent is input to next (linear, predictable).

Swarm: Agents with shared memory, no single orchestrator. Use for exploration tasks.
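
A minimal sketch of the pipeline and swarm shapes, using plain Python callables as stand-ins for agents. The names `run_pipeline`, `run_swarm`, `outline`, `draft`, `explorer_a`, and `explorer_b` are hypothetical and make no API calls; in practice each agent would wrap a model call.

```python
from typing import Callable

Agent = Callable[[str], str]
SwarmAgent = Callable[[list[str]], str]

def run_pipeline(stages: list[Agent], task: str) -> str:
    """Pipeline: each agent's output becomes the next agent's input."""
    result = task
    for stage in stages:
        result = stage(result)
    return result

def run_swarm(agents: list[SwarmAgent], seed: str, rounds: int = 2) -> list[str]:
    """Swarm: every agent reads and appends to one shared memory; no orchestrator."""
    memory = [seed]
    for _ in range(rounds):
        for agent in agents:
            memory.append(agent(memory))
    return memory

# Hypothetical stand-in agents (no model calls).
def outline(text: str) -> str:
    return f"outline({text})"

def draft(text: str) -> str:
    return f"draft({text})"

def explorer_a(memory: list[str]) -> str:
    return f"a saw {len(memory)} notes"

def explorer_b(memory: list[str]) -> str:
    return f"b saw {len(memory)} notes"

pipeline_result = run_pipeline([outline, draft], "task")
swarm_memory = run_swarm([explorer_a, explorer_b], "seed", rounds=1)
```

The pipeline is easy to reason about because data flows one way; the swarm trades that predictability for agents that can build on each other's notes.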

3. Context Window Management

CODEBLOCK1

Structured Outputs

Pydantic binding with `instructor` (recommended)

CODEBLOCK2

Schema enforcement without instructor (TypeScript)

CODEBLOCK3

Tool Calling (Function Calling)

Parallel tool calls + agentic loop (TypeScript)

CODEBLOCK4

Streaming Responses

Python streaming

CODEBLOCK5

TypeScript streaming

CODEBLOCK6

Next.js SSE streaming route

// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";
import { NextRequest } from "next/server";

const client = new Anthropic();

export async function POST(req: NextRequest) {
  const { messages } = await req.json();
  const stream = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 2048,
    stream: true,
    messages,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
          controller.enqueue(encoder.encode(chunk.delta.text));
        }
      }
      controller.close();
    },
  });

  // This streams raw text chunks, not SSE-framed events (no "data:" framing, no text/event-stream).
  return new Response(readable, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

Multi-Provider Routing (LiteLLM)

CODEBLOCK8
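
A minimal sketch of the fallback idea behind multi-provider routing, written with plain callables so it runs without credentials. `route_with_fallbacks`, `flaky_primary`, and `backup` are hypothetical names, not LiteLLM APIs; with LiteLLM, each provider callable would typically wrap a `litellm.completion(model=..., messages=...)` call.

```python
from typing import Callable

Provider = Callable[[str], str]

def route_with_fallbacks(providers: list[tuple[str, Provider]], prompt: str) -> tuple[str, str]:
    """Try providers in priority order; return (name, reply) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would catch provider-specific errors only
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Hypothetical providers: the primary always times out, the backup answers.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")

def backup(prompt: str) -> str:
    return f"backup answered: {prompt}"

used, reply = route_with_fallbacks([("primary", flaky_primary), ("backup", backup)], "hello")
```

Returning the provider name alongside the reply makes it possible to log which route actually served each request.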

Prompt Versioning

CODEBLOCK9
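
One way to version prompts is sketched below, assuming prompts are plain `str.format` templates kept in code. `PromptRegistry` and its methods are hypothetical names for illustration; the idea is that each (name, version) pair is immutable and each call can be logged with a short content hash.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class PromptRegistry:
    """Stores prompt templates under (name, version) so changes are explicit and traceable."""
    prompts: dict[tuple[str, str], str] = field(default_factory=dict)

    def register(self, name: str, version: str, template: str) -> str:
        key = (name, version)
        # Refuse to silently change the text behind an existing version.
        if key in self.prompts and self.prompts[key] != template:
            raise ValueError(f"{name}@{version} already registered with different text")
        self.prompts[key] = template
        return self.fingerprint(name, version)

    def render(self, name: str, version: str, **variables: str) -> str:
        return self.prompts[(name, version)].format(**variables)

    def fingerprint(self, name: str, version: str) -> str:
        """Short content hash to log next to each model call."""
        text = self.prompts[(name, version)]
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

registry = PromptRegistry()
registry.register("summarize", "v1", "Summarize: {text}")
registry.register("summarize", "v2", "Summarize in one sentence: {text}")
rendered = registry.render("summarize", "v2", text="hello")
```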

Multi-Agent Orchestration

Orchestrator pattern (Python)

import asyncio
import json

import anthropic

client = anthropic.Anthropic()

# System prompt for each specialist role.
AGENTS = {
    "planner":     "Break this task into subtasks. Output JSON: {\"research_tasks\": [], \"code_tasks\": []}",
    "researcher":  "Research the provided topics. Be concise and factual.",
    "coder":       "Write clean, tested Python code for the provided specs.",
    "synthesizer": "Combine these results into a final cohesive answer.",
}

def call_agent(role: str, content: str, model: str = "claude-sonnet-4-6") -> str:
    """Send one blocking request using the role's system prompt and return the reply text."""
    resp = client.messages.create(
        model=model,
        max_tokens=2048,
        system=AGENTS[role],
        messages=[{"role": "user", "content": content}],
    )
    return resp.content[0].text

async def orchestrate(task: str) -> str:
    """Hub-and-spoke orchestrator: plan → parallel execute → synthesize."""
    # call_agent blocks, so every call runs in a worker thread to keep the event loop free.
    # json.loads raises if the planner returns anything other than bare JSON.
    plan = json.loads(await asyncio.to_thread(call_agent, "planner", task))

    # The research and coding subtasks are independent, so run them concurrently.
    research, code = await asyncio.gather(
        asyncio.to_thread(call_agent, "researcher", json.dumps(plan.get("research_tasks", []))),
        asyncio.to_thread(call_agent, "coder",      json.dumps(plan.get("code_tasks", []))),
    )
    return await asyncio.to_thread(
        call_agent, "synthesizer", f"Research:\n{research}\n\nCode:\n{code}"
    )

Agent Memory Patterns

Medium-term: SQLite (cross-session)

CODEBLOCK11
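
A minimal sketch of cross-session memory on top of the standard-library `sqlite3` module. The `SqliteMemory` class, its table layout, and the `remember`/`recall` method names are assumptions for illustration; pass a file path instead of `":memory:"` to persist between sessions.

```python
import sqlite3
import time
from typing import Optional

class SqliteMemory:
    """Key-value notes per agent, stored in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            "agent TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL, "
            "updated_at REAL NOT NULL, PRIMARY KEY (agent, key))"
        )

    def remember(self, agent: str, key: str, value: str) -> None:
        # INSERT OR REPLACE overwrites an existing (agent, key) row.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO memory VALUES (?, ?, ?, ?)",
                (agent, key, value, time.time()),
            )

    def recall(self, agent: str, key: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT value FROM memory WHERE agent = ? AND key = ?", (agent, key)
        ).fetchone()
        return row[0] if row else None

memory = SqliteMemory()
memory.remember("planner", "user_name", "Ada")
memory.remember("planner", "user_name", "Grace")  # overwrites the earlier value
```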

Long-term: Vector DB (semantic search / RAG)

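from qdrant_client import QdrantClient
import anthropic

# In-process Qdrant instance: data is lost on exit. Point this at a server for real long-term memory.
qdrant = QdrantClient(":memory:")
claude = anthropic.Anthropic()

def get_embedding(text: str) -> list[float]:
    """Placeholder (not defined in the original): plug in your embedding model here."""
    raise NotImplementedError

def rag_query(query: str, context_collection: str = "memory") -> str:
    # Assumes the collection already exists and its points carry a "text" payload.
    hits = qdrant.search(collection_name=context_collection,
                         query_vector=get_embedding(query), limit=5)
    context = "\n".join(h.payload["text"] for h in hits)
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=f"Answer using this context:\n{context}",
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text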
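
To make the retrieval step concrete without a vector database, here is a toy sketch that ranks documents by cosine similarity over bag-of-words counts. `toy_embedding`, `cosine`, and `top_k` are hypothetical helpers; a real system would use a learned embedding model and a vector store such as Qdrant.

```python
import math
from collections import Counter

def toy_embedding(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = toy_embedding(query)
    ranked = sorted(documents, key=lambda d: cosine(q, toy_embedding(d)), reverse=True)
    return ranked[:k]

docs = [
    "the deploy script lives in tools/deploy.sh",
    "user prefers short answers",
    "deploy runs every friday",
]
hits = top_k("when does deploy run", docs, k=1)
```

The retrieved strings would then be joined into the system prompt, as `rag_query` does with the Qdrant payloads.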

LLM Evaluation

Basic eval harness

CODEBLOCK13
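
A minimal sketch of an eval harness, assuming each case is a prompt plus an expected substring and the model is any callable from prompt to text. `EvalCase`, `run_evals`, and `fake_model` are hypothetical names; substring matching is only one possible scoring rule.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str

def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Score each case by case-insensitive substring match and report the pass rate."""
    failures = []
    for case in cases:
        output = model(case.prompt)
        if case.expected.lower() not in output.lower():
            failures.append({"prompt": case.prompt, "expected": case.expected, "got": output})
    passed = len(cases) - len(failures)
    return {
        "passed": passed,
        "total": len(cases),
        "pass_rate": passed / len(cases) if cases else 0.0,
        "failures": failures,
    }

def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return {"2+2?": "The answer is 4.", "Capital of France?": "Lyon"}.get(prompt, "")

report = run_evals(fake_model, [EvalCase("2+2?", "4"), EvalCase("Capital of France?", "Paris")])
```

Keeping the failing outputs in the report makes regressions easy to inspect after a prompt or model change.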

LLM-as-judge

import json

import anthropic

client = anthropic.Anthropic()

def llm_judge(question: str, answer: str, rubric: str) -> dict:
    response = client.messages.create(
        model="claude-haiku-4-5",
        messages=[{"role": "user", "content": f"""Rate this answer 1-5.

Question: {question}
Answer: {answer}
Rubric: {rubric}

Output JSON: {{"score": int, "reasoning": str}}"""}],
        max_tokens=256,
    )
    return json.loads(response.content[0].text)
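
Judge replies sometimes wrap the JSON in extra prose, in which case a bare `json.loads` fails. The sketch below shows one defensive way to parse and validate the verdict; `parse_judge_output` is a hypothetical helper, and the 1-5 range check mirrors the prompt above.

```python
import json
import re

def parse_judge_output(raw: str) -> dict:
    """Extract the span from the first '{' to the last '}' and validate its fields."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in judge output")
    data = json.loads(match.group(0))
    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return {"score": score, "reasoning": str(data.get("reasoning", ""))}

verdict = parse_judge_output(
    'Here is my rating: {"score": 4, "reasoning": "Mostly correct."} Hope that helps.'
)
```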

Production Deployment

Error handling and retries

CODEBLOCK15
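
A minimal sketch of retrying transient failures with exponential backoff and jitter. `with_retries` and `flaky` are hypothetical names, and the default `retry_on` tuple uses built-in exceptions so the example runs offline; with the Anthropic SDK you might pass exceptions such as `anthropic.RateLimitError` and `anthropic.APIConnectionError` instead, or rely on the client's own `max_retries` option.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(
    call: Callable[[], T],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on transient errors, doubling the delay after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise AssertionError("unreachable")

# Hypothetical call that fails twice, then succeeds.
attempts = {"n": 0}

def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

delays: list[float] = []
result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
```

Injecting `sleep` keeps the function testable without real waiting.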

Cost tracking

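import anthropic

# Illustrative (input, output) USD prices per 1k tokens; check current published pricing before relying on them.
PRICES = {
    "claude-opus-4-6":    (0.015, 0.075),
    "claude-sonnet-4-6":  (0.003, 0.015),
    "claude-haiku-4-5":   (0.00025, 0.00125),
}

def track_cost(response: anthropic.types.Message) -> float:
    """Return the estimated USD cost of one response, or 0.0 for an unknown model."""
    # response.model may carry a dated suffix, so match on prefix rather than equality.
    rates = next((p for name, p in PRICES.items() if response.model.startswith(name)), None)
    if rates is None:
        return 0.0
    in_cost  = response.usage.input_tokens  / 1000 * rates[0]
    out_cost = response.usage.output_tokens / 1000 * rates[1]
    return in_cost + out_cost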
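
As a worked example of the arithmetic, the sketch below prices a single call at the Sonnet rates listed in the table and scales it up. The token counts and the 5,000 calls per day are made-up inputs, and `estimate_cost` is a hypothetical helper; the 3x factor is the rough multi-agent multiplier quoted under Core Principles.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_per_1k: float, out_per_1k: float) -> float:
    """Cost in USD given token counts and per-1k-token prices."""
    return input_tokens / 1000 * in_per_1k + output_tokens / 1000 * out_per_1k

# 12,000 input tokens and 800 output tokens: 12 * 0.003 + 0.8 * 0.015
sonnet_call = estimate_cost(12_000, 800, 0.003, 0.015)

daily = sonnet_call * 5_000   # hypothetical 5,000 calls per day
swarm_daily = daily * 3       # rough ~3x multiplier for orchestrated multi-agent setups
```

Under these assumptions a single call costs about $0.048, roughly $240 per day single-agent and $720 per day orchestrated.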

Prompt Engineering

- Chain-of-thought: Prefix with `Think step by step:` or enumerate reasoning steps before the answer.
- Output format pinning: Specify the format in the system prompt AND show a concrete example. Never rely on defaults for structured data.
- Temperature: 0 = near-deterministic (evals, extraction) | 0.3-0.7 = balanced | 1.0 = creative/diverse.
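
The format-pinning and temperature advice can be sketched as a small request builder. `build_extraction_request` is a hypothetical helper and the example entity is invented; the returned keys are meant to be passed to a Messages API call along with a model and `max_tokens`.

```python
import json

def build_extraction_request(text: str) -> dict:
    """Pin the output format in the system prompt and include one concrete example."""
    example = {"entities": [{"name": "Ada Lovelace", "type": "person"}], "summary": "One sentence."}
    system = (
        "Extract entities from the user's text.\n"
        "Respond with JSON only, matching this shape exactly:\n"
        '{"entities": [{"name": str, "type": str}], "summary": str}\n'
        f"Example output:\n{json.dumps(example)}"
    )
    return {
        "system": system,
        "temperature": 0,  # extraction: favor repeatable output
        "messages": [{"role": "user", "content": text}],
    }

request = build_extraction_request("Grace Hopper worked at Yale.")
```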

Related Skills

- temporal-testing: test async agent workflows
- INLINECODE6: give agents web browsing capability
- INLINECODE7: build AI-powered Next.js UIs
- INLINECODE8: agent-driven data analysis pipelines
- INLINECODE9: trace and monitor LLM calls in production

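AI Integration Skill

Comprehensive patterns for integrating the Anthropic Claude API into production systems, from basic API calls to full multi-agent orchestration with state management, memory, and evaluation.

When to Use This Skill

Activate when:

- Building a Claude API integration or wrapper
- Implementing structured outputs, tool calling, or streaming
- Setting up multi-provider LLM routing (LiteLLM, fallbacks)
- Designing multi-agent orchestration or agentic loops
- Implementing RAG or persistent agent memory
- Evaluating LLM output quality or building evals
- Deploying agents to Next.js, Python FastAPI, or Docker

Don't use this skill for:

- Kubernetes/Terraform config unrelated to AI infra
- General React/Next.js features not involving LLM calls

Core Principles

1. Single Agent vs Multi-Agent

Pattern	When to Use	Cost
Single agent	Linear tasks, simple I/O, <5 steps	Low
Subagent delegation

Infrastructure math (2026): Multi-agent compute costs jump ~3x when moving from single to orchestrated swarms. Budget before you build.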

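2. Agent Communication Patterns

Hub-and-spoke (most common): Orchestrator delegates to specialist agents.

Orchestrator
├── Research Agent (web search, docs)
├── Coding Agent (code generation, tests)
└── Review Agent (quality, security checks)

Pipeline: Output of one agent is input to next (linear, predictable).

Swarm: Agents with shared memory, no single orchestrator. Use for exploration tasks.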

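3. Context Window Management

import anthropic

client = anthropic.Anthropic()

def sliding_window(messages: list[dict], max_tokens: int = 150_000) -> list[dict]:
    """Drop the oldest messages to stay within the token budget."""
    # Rough estimate: 1 token ≈ 4 characters (assumes string content)
    while len(messages) > 2:
        total = sum(len(m["content"]) // 4 for m in messages)
        if total <= max_tokens:
            break
        messages = messages[1:]  # drop the oldest message (the system prompt is passed separately)
    # The Messages API expects the first message to come from the user.
    if messages and messages[0]["role"] == "assistant":
        messages = messages[1:]
    return messages

def summarize_history(messages: list[dict]) -> list[dict]:
    """Compress old turns into a summary to reclaim context budget."""
    if len(messages) <= 4:
        return messages
    history = "\n".join(f"{m['role']}: {m['content']}" for m in messages[:-2])
    summary = client.messages.create(
        model="claude-haiku-4-5", max_tokens=512,
        messages=[{"role": "user", "content": f"Summarize concisely:\n{history}"}],
    ).content[0].text
    # Keep the last two turns verbatim and prepend the summary as prior context.
    return [{"role": "user", "content": f"[Prior context]\n{summary}"}] + messages[-2:]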

Structured Outputs

Pydantic binding with `instructor` (recommended)

import anthropic
import instructor
from pydantic import BaseModel

class Entity(BaseModel):
    name: str
    type: str  # person | org | location | concept
    description: str

class ExtractionResult(BaseModel):
    entities: list[Entity]
    summary: str

# instructor patches the client and returns validated Pydantic models
client = instructor.from_anthropic(anthropic.Anthropic())

# `text` is the input document, defined elsewhere.
result: ExtractionResult = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Extract all entities from the following text:\n{text}"}],
    response_model=ExtractionResult,
)
print(result.entities[0].name)  # fully typed, validated

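Schema enforcement without instructor (TypeScript)

import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";

const client = new Anthropic();

const EntitySchema = z.object({
  entities: z.array(z.object({ name: z.string(), type: z.string() })),
  summary: z.string(),
});

// `inputText` is the document to analyze, defined elsewhere.
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{
    role: "user",
    content: `Extract entities. Respond ONLY with valid JSON matching this schema:
{"entities": [{"name": string, "type": string}], "summary": string}

Text: ${inputText}`,
  }],
});

// Narrow the content block to text before reading it; parse() throws if the JSON does not match.
const block = response.content[0];
if (block.type !== "text") throw new Error("Expected a text block");
const parsed = EntitySchema.parse(JSON.parse(block.text));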

Tool Calling (Function Calling)

Parallel tool calls + agentic loop (TypeScript)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools: Anthropic.Tool[] = [
  { name: "search_web", description: "Search the web",
    input_schema: { type: "object" as const, properties: { query: { type: "string" } }, required: ["query"] } },
  { name: "read_file", description: "Read a local file",
    input_schema: { type: "object" as const, properties: { path: { type: "string" } }, required: ["path"] } },
];

// `executeTool(name, input)` is assumed to be defined elsewhere and to return a string result.

async function runAgentLoop(userMessage: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: userMessage }];

  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-6", max_tokens: 4096,
      tools, tool_choice: { type: "auto" }, // or { type: "tool", name: "search_web" }
      messages,
    });

    // Return on any stop reason other than tool_use (end_turn, max_tokens, ...) so the loop cannot spin.
    if (response.stop_reason !== "tool_use") {
      return response.content
        .filter((b): b is Anthropic.TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("");
    }

    // Claude may call several tools in parallel; handle them all at once
    const toolUses = response.content.filter(
      (b): b is Anthropic.ToolUseBlock => b.type === "tool_use"
    );
    const toolResults = await Promise.all(
      toolUses.map(async (tu) => ({
        type: "tool_result" as const,
        tool_use_id: tu.id,
        content: await executeTool(tu.name, tu.input),
      }))
    );

    messages.push({ role: "assistant", content: response.content });
    messages.push({ role: "user", content: toolResults });
  }
}
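
The loop above relies on an `executeTool` helper that the original does not show. One possible shape for it is sketched here in Python as a registry-based dispatcher; `TOOL_REGISTRY`, `tool`, and `execute_tool` are hypothetical names, and the two handlers are stubs that touch neither the network nor the filesystem.

```python
import json
from typing import Any, Callable

TOOL_REGISTRY: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a function as the handler for a tool name."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator

@tool("read_file")
def read_file(path: str) -> str:
    return f"<contents of {path}>"  # stub; a real handler would read from disk

@tool("search_web")
def search_web(query: str) -> str:
    return f"<results for {query}>"  # stub; a real handler would call a search API

def execute_tool(name: str, tool_input: dict[str, Any]) -> str:
    """Dispatch one tool call; report failures as text so the model can react."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return handler(**tool_input)
    except Exception as exc:  # bad arguments or handler failure
        return json.dumps({"error": str(exc)})

ok = execute_tool("search_web", {"query": "qdrant"})
missing = execute_tool("delete_db", {})
bad_args = execute_tool("read_file", {"wrong": "x"})
```

Returning errors as tool results, rather than raising, lets the agent loop continue and gives the model a chance to correct its call.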

Streaming Responses

Python streaming

import anthropic

client = anthropic.Anthropic()

# Stream text (`prompt` is the user input, defined elsewhere)
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # The accumulated final message carries the token usage.
    final = stream.get_final_message()
    print(f"\n[{final.usage.input_tokens} in / {final.usage.output_tokens} out tokens]")

TypeScript streaming

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// `prompt` is the user input, defined elsewhere.
const stream = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 4096,
  stream: true,
  messages: [{ role: "user", content: prompt }],
});

// Print each text delta as it arrives.
for await (const chunk of stream) {
  if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
    process.stdout.write(chunk.delta.text);
  }
}

