AI Integration Skill
Comprehensive patterns for integrating the Anthropic Claude API into production systems — from basic API calls to full multi-agent orchestration with state management, memory, and evaluation.
When to Use This Skill
Activate when:
- Building a Claude API integration or wrapper
- Implementing structured outputs, tool calling, or streaming
- Setting up multi-provider LLM routing (LiteLLM, fallbacks)
- Designing multi-agent orchestration or agentic loops
- Implementing RAG or persistent agent memory
- Evaluating LLM output quality or building evals
- Deploying agents to Next.js, Python FastAPI, or Docker
Don't use this skill for:
- Kubernetes/Terraform config unrelated to AI infra
- General React/Next.js features not involving LLM calls
Core Principles
1. Single Agent vs Multi-Agent
| Pattern | When to Use | Cost |
|---|---|---|
| Single agent | Linear tasks, simple I/O, <5 steps | Low |
| Subagent delegation | Parallel tasks, specialized expertise needed | Medium |
| Multi-agent swarm | Complex autonomous workflows, >10 steps | High (budget like a team) |
Infrastructure math (2026): Multi-agent compute costs jump ~3x when moving from single to orchestrated swarms. Budget before you build.
2. Agent Communication Patterns
Hub-and-spoke (most common): Orchestrator delegates to specialist agents.
CODEBLOCK0
Pipeline: Output of one agent is input to next (linear, predictable).
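The pipeline pattern can be sketched without any API calls by treating each agent as a function from text to text. The stage functions below (`outline`, `draft`, `review`) and the helper `run_pipeline` are hypothetical stand-ins; in a real system each stage would wrap a model call.

```python
from functools import reduce
from typing import Callable

Agent = Callable[[str], str]


def run_pipeline(stages: list[Agent], task: str) -> str:
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda text, stage: stage(text), stages, task)


# Hypothetical stub agents; real ones would call the model.
def outline(text: str) -> str:
    return f"outline({text})"


def draft(text: str) -> str:
    return f"draft({text})"


def review(text: str) -> str:
    return f"review({text})"


result = run_pipeline([outline, draft, review], "spec")
print(result)  # review(draft(outline(spec)))
```

Because every stage has the same signature, stages can be reordered or swapped for stubs in tests without touching the runner.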
Swarm: Agents with shared memory, no single orchestrator. Use for exploration tasks.
3. Context Window Management
CODEBLOCK1
Structured Outputs
Pydantic binding with instructor (recommended)
CODEBLOCK2
Schema enforcement without instructor (TypeScript)
CODEBLOCK3
Tool Calling (Function Calling)
Parallel tool calls + agentic loop (TypeScript)
CODEBLOCK4
Streaming Responses
Python streaming
CODEBLOCK5
TypeScript streaming
CODEBLOCK6
Next.js SSE streaming route
// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";
import { NextRequest } from "next/server";

const client = new Anthropic();

export async function POST(req: NextRequest) {
  const { messages } = await req.json();
  const stream = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 2048,
    stream: true,
    messages,
  });
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      // Forward only text deltas; other event types are skipped.
      for await (const chunk of stream) {
        if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
          controller.enqueue(encoder.encode(chunk.delta.text));
        }
      }
      controller.close();
    },
  });
  // Note: this sends raw text chunks as text/plain. True Server-Sent Events
  // need "text/event-stream" and "data: ...\n\n" framing for each chunk.
  return new Response(readable, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
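The route above streams raw text. If the browser side uses `EventSource`, each chunk instead needs Server-Sent Events framing and a `text/event-stream` content type. A minimal sketch of that framing follows, written in Python for consistency with the other sketches in this document; the helper name `format_sse` is an assumption for illustration, and the logic ports directly to the TypeScript route.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame one message in the SSE wire format.

    Each line of `data` gets its own "data:" field, and a blank line
    terminates the event.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


print(repr(format_sse("Hello")))                # 'data: Hello\n\n'
print(repr(format_sse("a\nb", event="delta")))  # 'event: delta\ndata: a\ndata: b\n\n'
```

Splitting on newlines matters because a text delta from the model can itself contain line breaks, which would otherwise end the event early.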
Multi-Provider Routing (LiteLLM)
CODEBLOCK8
Prompt Versioning
CODEBLOCK9
Multi-Agent Orchestration
Orchestrator pattern (Python)
import asyncio
import json

import anthropic

client = anthropic.Anthropic()

# System prompt per agent role.
AGENTS = {
    "planner": "Break this task into subtasks. Output JSON: {\"research_tasks\": [], \"code_tasks\": []}",
    "researcher": "Research the provided topics. Be concise and factual.",
    "coder": "Write clean, tested Python code for the provided specs.",
    "synthesizer": "Combine these results into a final cohesive answer.",
}

def call_agent(role: str, content: str, model: str = "claude-sonnet-4-6") -> str:
    """Run one blocking Claude call using the system prompt for `role`."""
    resp = client.messages.create(
        model=model,
        max_tokens=2048,
        system=AGENTS[role],
        messages=[{"role": "user", "content": content}],
    )
    return resp.content[0].text

async def orchestrate(task: str) -> str:
    """Hub-and-spoke orchestrator: plan → parallel execute → synthesize."""
    # json.loads raises if the planner wraps its JSON in prose or code fences.
    plan = json.loads(await asyncio.to_thread(call_agent, "planner", task))
    # The SDK call is blocking, so run both workers in threads to overlap them.
    research, code = await asyncio.gather(
        asyncio.to_thread(call_agent, "researcher", str(plan.get("research_tasks", []))),
        asyncio.to_thread(call_agent, "coder", str(plan.get("code_tasks", []))),
    )
    return await asyncio.to_thread(
        call_agent, "synthesizer", f"Research:\n{research}\n\nCode:\n{code}"
    )
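The orchestrator passes the planner's reply straight to `json.loads`, which fails whenever the model surrounds the JSON with explanatory prose. One way to make that step more tolerant is sketched below; `extract_json` is a hypothetical helper, and it assumes the first `{` in the reply starts the JSON object.

```python
import json


def extract_json(raw: str) -> dict:
    """Parse the first JSON object found in a model reply."""
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _end = json.JSONDecoder().raw_decode(raw[start:])
    return obj


reply = 'Here is the plan: {"research_tasks": ["a"], "code_tasks": []} Hope that helps.'
plan = extract_json(reply)
print(plan["research_tasks"])  # ['a']
```

This is only a fallback. If the prose before the object contains a stray `{`, parsing still fails, so a schema-validating approach remains the more robust choice.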
Agent Memory Patterns
Medium-term: SQLite (cross-session)
CODEBLOCK11
Long-term: Vector DB (semantic search / RAG)
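from qdrant_client import QdrantClient
import anthropic

# In-memory instance for local experiments; the collection must be created
# and populated before rag_query is called.
qdrant = QdrantClient(":memory:")
claude = anthropic.Anthropic()

# get_embedding(text) -> list[float] is not defined in this skill; supply it
# from whichever embedding provider you use.
def rag_query(query: str, context_collection: str = "memory") -> str:
    """Retrieve the 5 nearest stored chunks and answer with them as context."""
    hits = qdrant.search(collection_name=context_collection,
                         query_vector=get_embedding(query), limit=5)
    context = "\n".join(h.payload["text"] for h in hits)
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=f"Answer using this context:\n{context}",
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text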
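The retrieval step above delegates ranking to the vector database. To show what that ranking does, here is a dependency-free sketch of top-k retrieval, assuming cosine similarity (the actual metric depends on how the collection is configured). The two-dimensional vectors and document names are made up for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
    return ranked[:k]


# Toy embeddings, invented for illustration only.
docs = {
    "billing": [1.0, 0.0],
    "refunds": [0.8, 0.6],
    "weather": [0.0, 1.0],
}
print(top_k([1.0, 0.1], docs))  # ['billing', 'refunds']
```

A real vector database replaces the linear scan with an approximate nearest-neighbour index, but the scoring idea is the same.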
LLM Evaluation
Basic eval harness
CODEBLOCK13
LLM-as-judge
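import json

import anthropic

client = anthropic.Anthropic()

def llm_judge(question: str, answer: str, rubric: str) -> dict:
    """Ask a smaller model to grade an answer from 1 to 5 against a rubric."""
    response = client.messages.create(
        model="claude-haiku-4-5",
        messages=[{"role": "user", "content": f"""Rate this answer 1-5.
Question: {question}
Answer: {answer}
Rubric: {rubric}
Output JSON: {{"score": int, "reasoning": str}}"""}],
        max_tokens=256,
    )
    # Raises json.JSONDecodeError if the model adds text around the JSON object.
    return json.loads(response.content[0].text)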
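The judge's reply is parsed but not checked, so an out-of-range or non-numeric score would flow silently into aggregate metrics. A small validation step, sketched below, can reject such replies; `parse_judgement` is a hypothetical helper name, and the 1 to 5 range comes from the prompt above.

```python
import json


def parse_judgement(raw: str) -> dict:
    """Validate judge output: an integer score in 1..5 plus a reasoning string."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("judge output must be a JSON object")
    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"invalid score: {score!r}")
    return {"score": score, "reasoning": str(data.get("reasoning", ""))}


print(parse_judgement('{"score": 4, "reasoning": "Accurate but terse."}'))
# {'score': 4, 'reasoning': 'Accurate but terse.'}
```

Rejected replies can be retried or logged and excluded, which keeps a single malformed judgement from skewing an average.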
Production Deployment
Error handling and retries
CODEBLOCK15
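As a generic illustration of the retry idea, a loop with exponential backoff and jitter can be written with the standard library alone. The names `with_retries` and `TransientError` are hypothetical; in a real integration you would catch the SDK's rate-limit and connection exceptions instead.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (rate limit, timeout)."""


def with_retries(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on TransientError wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


calls = {"n": 0}


def flaky():
    """Fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("try again")
    return "ok"


# The sleep function is injected so the demo runs without real waiting.
print(with_retries(flaky, sleep=lambda seconds: None))  # ok
```

The Anthropic Python SDK also accepts a `max_retries` argument on the client, which may be enough for simple cases before adding a custom loop like this one.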
Cost tracking
import anthropic

def track_cost(response: anthropic.types.Message) -> float:
    """Estimate the USD cost of one response from its token usage."""
    # Hard-coded prices go stale; verify them against current published pricing.
    PRICES = {
        "claude-opus-4-6": (0.015, 0.075),  # (input, output) per 1k tokens
        "claude-sonnet-4-6": (0.003, 0.015),
        "claude-haiku-4-5": (0.00025, 0.00125),
    }
    model = response.model
    if model not in PRICES:
        return 0.0  # unknown model: cost is reported as zero, not raised
    in_cost = response.usage.input_tokens / 1000 * PRICES[model][0]
    out_cost = response.usage.output_tokens / 1000 * PRICES[model][1]
    return in_cost + out_cost
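As a worked example of the arithmetic: 1,200 input tokens and 300 output tokens at the Sonnet rates listed above cost 1.2 × 0.003 + 0.3 × 0.015 = 0.0036 + 0.0045 = 0.0081 USD. The sketch below reproduces that calculation and matches on a model-name prefix, on the assumption that the API may report a dated model id rather than the short alias. The function name `estimate_cost` and the dated id in the second call are hypothetical.

```python
# (input, output) USD per 1k tokens, copied from the price table above.
PRICES = {
    "claude-sonnet-4-6": (0.003, 0.015),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price a call, matching on model-name prefix so dated ids also resolve."""
    for name, (in_price, out_price) in PRICES.items():
        if model.startswith(name):
            return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price
    raise KeyError(f"no price configured for {model!r}")


cost = estimate_cost("claude-sonnet-4-6", 1200, 300)
print(round(cost, 6))  # 0.0081

# Hypothetical dated id, used only to show the prefix match.
print(round(estimate_cost("claude-sonnet-4-6-20260101", 1000, 1000), 6))  # 0.018
```

Raising on an unknown model, rather than returning zero, makes a missing price entry visible instead of quietly under-reporting spend.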
Prompt Engineering
- Chain-of-thought: Prefix with "Think step by step:" or enumerate reasoning steps before the answer.
- Output format pinning: Specify format in system prompt AND show a concrete example. Never rely on defaults for structured data.
- Temperature: 0 = most deterministic (evals, extraction) | 0.3-0.7 = balanced | 1.0 = creative/diverse. Even at 0, outputs are not guaranteed to be identical across runs.
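Output format pinning can be sketched as a small prompt builder that states the format and embeds one concrete example. The function name `build_system_prompt` and the sample entity are assumptions made for this illustration.

```python
import json


def build_system_prompt(task: str, example: dict) -> str:
    """Pin the output format: state it, then show one concrete example."""
    return (
        f"{task}\n"
        "Respond with a single JSON object and nothing else.\n"
        f"Example output:\n{json.dumps(example, indent=2)}"
    )


example = {
    "entities": [{"name": "Ada Lovelace", "type": "person"}],
    "summary": "One person mentioned.",
}
prompt = build_system_prompt("Extract the entities from the user's text.", example)
print(prompt)
```

Generating the example with `json.dumps` keeps it valid JSON by construction, so the model never sees a malformed template.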
Related Skills
- temporal-testing: test async agent workflows
- INLINECODE6: give agents web browsing capability
- INLINECODE7: build AI-powered Next.js UIs
- INLINECODE8: agent-driven data analysis pipelines
- INLINECODE9: trace and monitor LLM calls in production
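AI Integration Skill
Comprehensive patterns for integrating the Anthropic Claude API into production systems, from basic API calls to full multi-agent orchestration with state management, memory, and evaluation.
When to Use This Skill
Activate when:
- Building a Claude API integration or wrapper
- Implementing structured outputs, tool calling, or streaming
- Setting up multi-provider LLM routing (LiteLLM, fallbacks)
- Designing multi-agent orchestration or agentic loops
- Implementing RAG or persistent agent memory
- Evaluating LLM output quality or building evals
- Deploying agents to Next.js, Python FastAPI, or Docker
Don't use this skill for:
- Kubernetes/Terraform config unrelated to AI infra
- General React/Next.js features not involving LLM calls
Core Principles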
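1. Single Agent vs Multi-Agent
| Pattern | When to Use | Cost |
|---|---|---|
| Single agent | Linear tasks, simple I/O, <5 steps | Low |
| Subagent delegation | Parallel tasks, specialized expertise needed | Medium |
| Multi-agent swarm | Complex autonomous workflows, >10 steps | High (budget like a team) |
Infrastructure math (2026): Multi-agent compute costs jump ~3x when moving from single to orchestrated swarms. Budget before you build.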
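2. Agent Communication Patterns
Hub-and-spoke (most common): Orchestrator delegates to specialist agents.
Orchestrator
├── Research agent (web search, docs)
├── Coding agent (code generation, tests)
└── Review agent (quality, security checks)
Pipeline: Output of one agent is input to next (linear, predictable).
Swarm: Agents with shared memory, no single orchestrator. Use for exploration tasks.
3. Context Window Management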
python
import anthropic

client = anthropic.Anthropic()

def sliding_window(messages: list[dict], max_tokens: int = 150_000) -> list[dict]:
    """Drop the oldest messages to stay within the token budget."""
    # Rough estimate: 1 token ≈ 4 characters (assumes string content).
    while len(messages) > 2:
        total = sum(len(m["content"]) // 4 for m in messages)
        if total <= max_tokens:
            break
        messages = messages[1:]  # drop the oldest non-system message
    # Note: the Messages API expects the first message to have role "user",
    # so drop in user/assistant pairs if the history alternates strictly.
    return messages

def summarize_history(messages: list[dict]) -> list[dict]:
    """Compress older turns into a summary to reclaim context budget."""
    if len(messages) <= 4:
        return messages
    history = "\n".join(f"{m['role']}: {m['content']}" for m in messages[:-2])
    summary = client.messages.create(
        model="claude-haiku-4-5", max_tokens=512,
        messages=[{"role": "user", "content": f"Summarize concisely:\n{history}"}],
    ).content[0].text
    return [{"role": "user", "content": f"[Previous context]\n{summary}"}] + messages[-2:]
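The character-count estimate above assumes every message's content is a plain string, but tool-using conversations also carry content as a list of blocks. The sketch below handles both shapes and drops messages in pairs so the history still starts with a user turn. It assumes a strictly alternating user/assistant history; `estimate_tokens` and `trim_to_budget` are hypothetical names.

```python
def estimate_tokens(message: dict) -> int:
    """Rough estimate (about 4 characters per token) for str or block-list content."""
    content = message["content"]
    if isinstance(content, str):
        return len(content) // 4
    # Block lists: count only the text-bearing fields this sketch knows about.
    total = 0
    for block in content:
        text = block.get("text") or block.get("content") or ""
        total += len(str(text)) // 4
    return total


def trim_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest user/assistant pair until the estimate fits the budget."""
    trimmed = list(messages)
    while len(trimmed) > 2 and sum(map(estimate_tokens, trimmed)) > max_tokens:
        trimmed = trimmed[2:]
    return trimmed


history = [
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": [{"type": "text", "text": "c" * 40}]},
    {"role": "assistant", "content": "d" * 40},
]
print(len(trim_to_budget(history, max_tokens=50)))  # 2
```

The four-characters-per-token ratio is only a heuristic; where accuracy matters, a real token count should replace it.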
Structured Outputs
Pydantic binding with instructor (recommended)
python
import anthropic
import instructor
from pydantic import BaseModel

class Entity(BaseModel):
    name: str
    type: str  # person | org | location | concept
    description: str

class ExtractionResult(BaseModel):
    entities: list[Entity]
    summary: str

# instructor patches the client so it returns validated Pydantic models
client = instructor.from_anthropic(anthropic.Anthropic())
# `text` is the input document, supplied by the caller.
result: ExtractionResult = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Extract all entities from the following text:\n{text}"}],
    response_model=ExtractionResult,
)
print(result.entities[0].name)  # fully typed and validated
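Schema enforcement without instructor (TypeScript)
typescript
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";

const client = new Anthropic();
const EntitySchema = z.object({
  entities: z.array(z.object({ name: z.string(), type: z.string() })),
  summary: z.string(),
});
// `inputText` is the text to analyze, supplied by the caller.
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{
    role: "user",
    content: `Extract entities. Respond ONLY with valid JSON matching this schema:
{"entities": [{"name": string, "type": string}], "summary": string}
Text: ${inputText}`,
  }],
});
// Narrow the content block before reading .text; only text blocks have it.
const first = response.content[0];
if (first.type !== "text") throw new Error("Expected a text block");
const parsed = EntitySchema.parse(JSON.parse(first.text));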
Tool Calling (Function Calling)
Parallel tool calls + agentic loop (TypeScript)
typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const tools: Anthropic.Tool[] = [
  { name: "search_web", description: "Search the web",
    input_schema: { type: "object" as const, properties: { query: { type: "string" } }, required: ["query"] } },
  { name: "read_file", description: "Read a local file",
    input_schema: { type: "object" as const, properties: { path: { type: "string" } }, required: ["path"] } },
];

async function runAgentLoop(userMessage: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: userMessage }];
  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-6", max_tokens: 4096,
      tools, tool_choice: { type: "auto" }, // or { type: "tool", name: "search_web" }
      messages,
    });
    // Stop on anything other than a tool request (end_turn, max_tokens, ...);
    // otherwise an empty tool_result message would be sent on the next turn.
    if (response.stop_reason !== "tool_use") {
      return response.content
        .filter((b): b is Anthropic.TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("");
    }
    // Claude may call several tools in parallel, so handle them all at once.
    const toolUses = response.content.filter(
      (b): b is Anthropic.ToolUseBlock => b.type === "tool_use",
    );
    const toolResults: Anthropic.ToolResultBlockParam[] = await Promise.all(
      toolUses.map(async (tu) => ({
        type: "tool_result" as const,
        tool_use_id: tu.id,
        // executeTool is assumed to be defined elsewhere and to return a string.
        content: await executeTool(tu.name, tu.input),
      })),
    );
    messages.push({ role: "assistant", content: response.content });
    messages.push({ role: "user", content: toolResults });
  }
}
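The loop above relies on an `executeTool` function that is not shown. One possible shape for it is a registry that maps tool names to handlers and returns failures as text, so the model can see the error and adjust. The sketch is written in Python for consistency with the other sketches in this document; the decorator `tool`, the stub `read_file`, and `execute_tool` are all hypothetical.

```python
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a handler under its own function name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def read_file(path: str) -> str:
    # Stub handler; a real one would read from disk with path checks.
    return f"(stub) contents of {path}"


def execute_tool(name: str, tool_input: dict) -> str:
    """Dispatch one tool call, reporting errors as text instead of raising."""
    handler = TOOLS.get(name)
    if handler is None:
        return f"Error: unknown tool {name!r}"
    try:
        return handler(**tool_input)
    except Exception as exc:  # bad arguments or handler failure
        return f"Error: {exc}"


print(execute_tool("read_file", {"path": "notes.txt"}))  # (stub) contents of notes.txt
print(execute_tool("delete_db", {}))                     # Error: unknown tool 'delete_db'
```

Returning the error string as the tool result keeps the loop alive; a production loop would usually also cap the number of iterations.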
Streaming Responses
Python streaming
python
import anthropic

client = anthropic.Anthropic()
# Stream text; `prompt` is the user's input, supplied by the caller.
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()
print(f"\n[{final.usage.input_tokens} input / {final.usage.output_tokens} output tokens]")
TypeScript streaming
typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
// `prompt` is the user's input, supplied by the caller.
const stream = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 4096,
  stream: true,
  messages: [{ role: "user", content: prompt }],
});
for await (const chunk of stream) {
  if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
    process.stdout.write(chunk.delta.text);
  }
}
Next.js SSE streaming route
typescript
// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";
import { NextRequest } from "next/server";

const client = new Anthropic();

export async function POST(req: NextRequest) {
  const { messages } = await req.json();
  const stream = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 2048,
    stream: true,
    messages,
  });
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
if (chunk.type