LangGraph Architecture Decisions

When to Use LangGraph

Use LangGraph When You Need:

- Stateful conversations - Multi-turn interactions with memory
- Human-in-the-loop - Approval gates, corrections, interventions
- Complex control flow - Loops, branches, conditional routing
- Multi-agent coordination - Multiple LLMs working together
- Persistence - Resume from checkpoints, time travel debugging
- Streaming - Real-time token streaming, progress updates
- Reliability - Retries, error recovery, durability guarantees

Consider Alternatives When:

Scenario	Alternative	Why
Single LLM call	Direct API call	Overhead not justified
Linear pipeline

State Schema Decisions

TypedDict vs Pydantic

TypedDict	Pydantic
Lightweight, faster	Runtime validation
Dict-like access

Recommendation: Use TypedDict for most cases. Use Pydantic when you need validation or complex nested structures.
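The "runtime validation" difference is easy to miss: TypedDict annotations are hints only, and a TypedDict instance is an ordinary dict at runtime. The sketch below shows a wrong type slipping through; `validate_count` is a hypothetical hand-written check standing in for the kind of validation a Pydantic model performs automatically (Pydantic itself is not used, to keep the example dependency-free).

```python
from typing import TypedDict


class AgentState(TypedDict):
    count: int
    context: str


# A TypedDict is a plain dict at runtime: the wrong type for "count" is accepted silently.
state = AgentState(count="three", context="notes")


def validate_count(s: dict) -> dict:
    """Hypothetical manual check, standing in for Pydantic-style runtime validation."""
    if not isinstance(s.get("count"), int):
        raise TypeError(f"count must be int, got {type(s.get('count')).__name__}")
    return s


try:
    validate_count(state)
    rejected = False
except TypeError:
    rejected = True
```

If states like this reach your graph from untrusted input, that is the case where the recommendation above points to Pydantic.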

Reducer Selection

Use Case	Reducer	Example
Chat messages	add_messages	Handles IDs, RemoveMessage
Simple append
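What a reducer annotation does can be modeled in a few lines of plain Python: keys annotated with a reducer combine the old and new values, while un-annotated keys are simply overwritten by the latest write. This is a sketch of the semantics, not LangGraph's implementation; `operator.add` is assumed here as an append reducer, and the model ignores the message-ID handling and RemoveMessage support that add_messages adds.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class State(TypedDict):
    items: Annotated[list, operator.add]  # reducer: concatenate lists (append)
    status: str                           # no reducer: last write wins


def apply_update(state: dict, update: dict, schema: type) -> dict:
    """Plain-Python model of merging one node's update into the graph state."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # combine old and new values
        else:
            merged[key] = value  # overwrite
    return merged


state = {"items": ["a"], "status": "start"}
state = apply_update(state, {"items": ["b"], "status": "running"}, State)
state = apply_update(state, {"items": ["c"]}, State)
# state == {"items": ["a", "b", "c"], "status": "running"}
```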

State Size Considerations

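```python
from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages
from langgraph.store.base import BaseStore


# Small state (< 1MB): put it directly in the state
class State(TypedDict):
    messages: Annotated[list, add_messages]
    context: str


# Large data: keep it in a store and put only a reference in the state
class LargeDataState(TypedDict):
    messages: Annotated[list, add_messages]
    document_ref: str  # Key of the document in the store


def node(state: LargeDataState, *, store: BaseStore):
    namespace = ("documents",)  # Illustrative namespace
    item = store.get(namespace, state["document_ref"])
    doc = item.value if item else None
    # Process doc without bloating the checkpoints
```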
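The payoff of the reference approach is that checkpoints only ever serialize the small reference. The dependency-free sketch below models that; the dict-backed `store` and the `("documents",)` namespace are hypothetical stand-ins for a LangGraph store, and `json.dumps` approximates what a checkpoint would have to write.

```python
import json

# Hypothetical stand-in for a store: values keyed by (namespace, key).
store: dict = {}
NAMESPACE = ("documents",)  # hypothetical namespace

big_document = "lorem ipsum " * 50_000  # about 600 KB of text

# Keep the large payload in the store and only a reference in graph state.
store[(NAMESPACE, "doc-1")] = {"text": big_document}
state = {"messages": [], "document_ref": "doc-1"}


def summarize_node(state: dict) -> dict:
    """Loads the document by reference; the returned update stays small."""
    doc = store[(NAMESPACE, state["document_ref"])]["text"]
    return {"messages": [f"document has {len(doc.split())} words"]}


update = summarize_node(state)
checkpoint_bytes = len(json.dumps({**state, **update}))  # what a checkpoint would hold
```

Here the checkpointed state stays well under a kilobyte even though the node works on a 600 KB document.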

Graph Structure Decisions

Single Graph vs Subgraphs

Single Graph when:

- All nodes share the same state schema
- Simple linear or branching flow
- < 10 nodes

Subgraphs when:

- Different state schemas needed
- Reusable components across graphs
- Team separation of concerns
- Complex hierarchical workflows
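When a subgraph uses a different state schema, one common approach is to call it from a parent node and translate state on the way in and out. A minimal sketch, assuming a plain function `research_subgraph` (hypothetical) in place of a compiled subgraph's `invoke()`; the schemas are illustrative.

```python
from typing import TypedDict


class ParentState(TypedDict):
    question: str
    answer: str


class ResearchState(TypedDict):  # the subgraph's private schema
    query: str
    findings: list


def research_subgraph(state: ResearchState) -> ResearchState:
    """Hypothetical stand-in for a compiled subgraph's invoke()."""
    return {"query": state["query"], "findings": [f"fact about {state['query']}"]}


def call_research(state: ParentState) -> dict:
    """Parent node: map parent state to subgraph input, then map the result back."""
    result = research_subgraph({"query": state["question"], "findings": []})
    return {"answer": "; ".join(result["findings"])}


update = call_research({"question": "LangGraph", "answer": ""})
```

The parent never sees `findings`, and the subgraph never sees `answer`, which is what keeps the two schemas independent.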

Conditional Edges vs Command

Conditional Edges	Command
Routing based on state	Routing + state update
Separate router function

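```python
from typing import Literal

from langgraph.types import Command


# Conditional edges: use when routing is the main concern
def router(state) -> Literal["a", "b"]:
    # "condition" is a placeholder for whatever state check drives the branch
    return "a" if state["condition"] else "b"


builder.add_conditional_edges("node", router)


# Command: use when a node both routes and updates state
def node(state) -> Command[Literal["next"]]:
    return Command(goto="next", update={"step": state["step"] + 1})
```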
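The difference between the two styles is where the routing decision lives. The toy executor below models both in plain Python: with a conditional edge the node only returns an update and a separate router picks the next node; with the Command style the node returns both. `Goto` is a hypothetical dataclass standing in for `langgraph.types.Command`, and `run` is not LangGraph's executor.

```python
from dataclasses import dataclass, field


@dataclass
class Goto:
    """Hypothetical stand-in for Command: carries a route AND a state update."""
    goto: str
    update: dict = field(default_factory=dict)


def run(nodes, routers, state, start, end="END", max_steps=20):
    """Execute nodes until `end`, routing via a router function or a returned Goto."""
    current = start
    for _ in range(max_steps):
        if current == end:
            return state
        result = nodes[current](state)
        if isinstance(result, Goto):  # Command style: node decides the route
            state = {**state, **result.update}
            current = result.goto
        else:                         # conditional-edge style: router decides
            state = {**state, **result}
            current = routers[current](state)
    raise RuntimeError("step limit reached")


def check(state):
    return {"step": state["step"] + 1}  # update only


def check_router(state):
    return "END" if state["step"] >= 2 else "check"


def counter(state):
    nxt = "END" if state["step"] + 1 >= 2 else "counter"
    return Goto(goto=nxt, update={"step": state["step"] + 1})


a = run({"check": check}, {"check": check_router}, {"step": 0}, "check")
b = run({"counter": counter}, {}, {"step": 0}, "counter")
```

Both runs end with `{"step": 2}`; the Command version simply keeps the routing rule next to the update that drives it.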

Static vs Dynamic Routing

Static Edges (add_edge):

- Fixed flow known at build time
- Clearer graph visualization
- Easier to reason about

Dynamic Routing (add_conditional_edges, Command, Send):

- Runtime decisions based on state
- Agent-driven navigation
- Fan-out patterns
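Fan-out is the one dynamic pattern that static edges cannot express, because the number of branches is only known at runtime. In LangGraph a router would return a list of `Send(node_name, payload)` objects; in this plain-Python sketch, `(node, payload)` tuples stand in for Send, a thread pool stands in for parallel execution, and the node names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(state: dict) -> list:
    """Send-style router: one (node, payload) pair per topic, decided at runtime."""
    return [("summarize", {"topic": t}) for t in state["topics"]]


def summarize(payload: dict) -> dict:
    return {"summaries": [f"summary of {payload['topic']}"]}


NODES = {"summarize": summarize}


def run_fan_out(state: dict) -> dict:
    sends = fan_out(state)
    with ThreadPoolExecutor() as pool:  # branches run in parallel
        results = list(pool.map(lambda s: NODES[s[0]](s[1]), sends))
    # Merge branch outputs with an append-style reducer.
    merged = list(state.get("summaries", []))
    for r in results:
        merged += r["summaries"]
    return {**state, "summaries": merged}


final = run_fan_out({"topics": ["a", "b", "c"], "summaries": []})
```

Because the branches write to the same key, the merge step needs an append-style reducer; an overwrite key would keep only one branch's result.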

Persistence Strategy

Checkpointer Selection

Checkpointer	Use Case	Characteristics
InMemorySaver	Testing only	Lost on restart
SqliteSaver
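"Lost on restart" is the key distinction between the two checkpointers. The sketch below models it with the standard library only: a dict plays the in-memory saver, and a SQLite file with a hypothetical minimal schema (not SqliteSaver's real tables) plays the durable one. Keeping every checkpoint per thread is also what makes resuming from an earlier point possible.

```python
import json
import os
import sqlite3
import tempfile
from typing import Optional

# In-memory "checkpointer": lives only as long as the process.
memory_checkpoints = {"thread-1": [{"step": 1}, {"step": 2}]}

# File-backed "checkpointer" with a hypothetical minimal schema.
DB_PATH = os.path.join(tempfile.mkdtemp(), "checkpoints.db")


def save(thread_id: str, state: dict) -> None:
    conn = sqlite3.connect(DB_PATH)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS ckpt (thread TEXT, seq INTEGER, state TEXT)")
        (seq,) = conn.execute("SELECT COUNT(*) FROM ckpt WHERE thread = ?", (thread_id,)).fetchone()
        conn.execute("INSERT INTO ckpt VALUES (?, ?, ?)", (thread_id, seq, json.dumps(state)))
        conn.commit()
    finally:
        conn.close()


def load(thread_id: str, seq: Optional[int] = None) -> Optional[dict]:
    """Latest checkpoint by default; pass seq to go back to an earlier one."""
    conn = sqlite3.connect(DB_PATH)
    try:
        if seq is None:
            row = conn.execute(
                "SELECT state FROM ckpt WHERE thread = ? ORDER BY seq DESC LIMIT 1", (thread_id,)
            ).fetchone()
        else:
            row = conn.execute(
                "SELECT state FROM ckpt WHERE thread = ? AND seq = ?", (thread_id, seq)
            ).fetchone()
    finally:
        conn.close()
    return json.loads(row[0]) if row else None


save("thread-1", {"step": 1})
save("thread-1", {"step": 2})

# Simulate a restart: in-process memory is gone, the database file is not.
memory_checkpoints = {}
latest = load("thread-1")
earlier = load("thread-1", seq=0)
```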

Checkpointing Scope

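```python
# Full persistence (default)
graph = builder.compile(checkpointer=checkpointer)

# Subgraph options (choose one)
subgraph = sub_builder.compile(checkpointer=None)   # Inherit from the parent graph
subgraph = sub_builder.compile(checkpointer=True)   # Independent checkpoints
subgraph = sub_builder.compile(checkpointer=False)  # No checkpointing (runs atomically)
```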

When to Disable Checkpointing

- Short-lived subgraphs that should be atomic
- Subgraphs with incompatible state schemas
- Performance-critical paths without need for resume

Multi-Agent Architecture

Supervisor Pattern

Best for:

- Clear hierarchy
- Centralized decision making
- Different agent specializations

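```
                ┌─────────────┐
                │ Supervisor  │
                └──────┬──────┘
     ┌───────────┬─────┴─────┬───────────┐
     ▼           ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Agent 1 │ │ Agent 2 │ │ Agent 3 │ │ Agent 4 │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
```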
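The supervisor's control flow is a loop: decide, delegate, collect the result, decide again. In LangGraph the supervisor would usually be an LLM-backed node that routes via Command or a conditional edge; in this plain-Python sketch a fixed rule (hypothetical) chooses among three hypothetical agents so the hub-and-spoke shape is easy to see.

```python
def supervisor(state: dict) -> str:
    """Central decision maker: next unfinished specialist, or FINISH.
    The fixed order is a placeholder for an LLM routing decision."""
    for agent in ("researcher", "writer", "reviewer"):
        if agent not in state["done"]:
            return agent
    return "FINISH"


def make_agent(name: str):
    def agent(state: dict) -> dict:
        return {"done": state["done"] + [name], "log": state["log"] + [f"{name} worked"]}
    return agent


AGENTS = {name: make_agent(name) for name in ("researcher", "writer", "reviewer")}


def run_supervisor(state: dict, max_turns: int = 10) -> dict:
    for _ in range(max_turns):
        choice = supervisor(state)
        if choice == "FINISH":
            return state
        state = AGENTS[choice](state)  # every agent reports back to the supervisor
    raise RuntimeError("supervisor did not finish")


result = run_supervisor({"done": [], "log": []})
```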

Peer-to-Peer Pattern

Best for:

- Collaborative agents
- No clear hierarchy
- Flexible communication

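```
┌─────────┐     ┌─────────┐
│ Agent 1 │◄───►│ Agent 2 │
└────┬────┘     └────┬────┘
     │               │
     ▼               ▼
┌─────────┐     ┌─────────┐
│ Agent 3 │◄───►│ Agent 4 │
└─────────┘     └─────────┘
```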
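Without a coordinator, each peer picks who goes next. A minimal plain-Python sketch with two hypothetical agents whose handoff rule is fixed (standing in for an LLM decision); the hop limit matters because nothing else stops two peers from handing control back and forth forever.

```python
def agent_a(state):
    """Drafts; hands off to agent_b until the conversation has 4+ messages."""
    messages = state["messages"] + ["A: draft"]
    return {"messages": messages}, ("agent_b" if len(messages) < 4 else None)


def agent_b(state):
    """Gives feedback and always hands back to agent_a."""
    return {"messages": state["messages"] + ["B: feedback"]}, "agent_a"


PEERS = {"agent_a": agent_a, "agent_b": agent_b}


def run_peers(state, start="agent_a", max_hops=10):
    current = start
    for _ in range(max_hops):
        update, next_peer = PEERS[current](state)  # each peer picks its successor
        state = {**state, **update}
        if next_peer is None:
            return state
        current = next_peer
    raise RuntimeError("hop limit reached")


final = run_peers({"messages": []})
```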

Handoff Pattern

Best for:

- Sequential specialization
- Clear stage transitions
- Different capabilities per stage

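```
┌───────────┐    ┌───────────┐    ┌───────────┐
│ Research  │───►│ Planning  │───►│ Execution │
└───────────┘    └───────────┘    └───────────┘
```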
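In a handoff chain each stage finishes its work and names its successor, so control only moves forward. In LangGraph a stage would typically return something like `Command(goto="plan", update={...})`; in this plain-Python sketch each hypothetical stage returns an `(update, next_stage)` pair instead.

```python
def research(state):
    return {"notes": ["found sources"]}, "plan"


def plan(state):
    return {"steps": [f"use {n}" for n in state["notes"]]}, "execute"


def execute(state):
    return {"result": f"ran {len(state['steps'])} step(s)"}, None  # end of the chain


HANDLERS = {"research": research, "plan": plan, "execute": execute}


def run_handoffs(state, start="research"):
    current, trail = start, []
    while current is not None:
        trail.append(current)
        update, current = HANDLERS[current](state)  # each stage names its successor
        state = {**state, **update}
    return state, trail


final, trail = run_handoffs({})
```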

Streaming Strategy

Stream Mode Selection

Mode	Use Case	Data
updates	UI updates	Node outputs only
values
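The two modes differ in what each chunk carries: `updates` emits only what a node returned, keyed by node name, while `values` emits the whole accumulated state. The generator below is a simplified plain-Python model of that difference, not LangGraph's exact event sequence; the node names are hypothetical.

```python
def stream(nodes, state, mode):
    """Simplified model: "updates" yields {node: output}, "values" yields full state."""
    if mode not in ("updates", "values"):
        raise ValueError(f"unsupported mode: {mode}")
    for name, fn in nodes:
        update = fn(state)
        state = {**state, **update}
        yield {name: update} if mode == "updates" else dict(state)


def plan(state):
    return {"plan": "outline"}


def write(state):
    return {"draft": state["plan"] + " -> draft"}


NODES = [("plan", plan), ("write", write)]
updates = list(stream(NODES, {"topic": "x"}, "updates"))
values = list(stream(NODES, {"topic": "x"}, "values"))
```

For a UI that renders per-step progress, `updates` keeps chunks small; `values` is simpler when the client just redraws the latest state.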

Subgraph Streaming

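```python
async def stream_with_subgraphs(graph, inputs):
    # With subgraphs=True each chunk is a (namespace, data) tuple
    async for namespace, data in graph.astream(
        inputs,
        stream_mode="updates",
        subgraphs=True,  # Include events from subgraphs
    ):
        # namespace shows nesting depth: () for the parent graph
        print(namespace, data)
```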

Human-in-the-Loop Design

Interrupt Placement

Strategy	Use Case
interrupt_before	Approval before action
interrupt_after	Review after completion
interrupt() in node	Dynamic, contextual pauses
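A dynamic `interrupt()` pauses a node and hands a payload to the caller; the value supplied on resume becomes the return value of `interrupt()`. The plain-Python generator below models that round trip (`yield` as the pause point, `send()` as the resume). One important difference: LangGraph re-executes the interrupted node from its start on resume, so code before `interrupt()` should be safe to run twice, whereas a generator simply continues. Node and helper names are hypothetical.

```python
def approval_node(state):
    """Generator model of a node that calls interrupt(): execution pauses at
    `yield`, and the value sent on resume becomes the interrupt's return value."""
    decision = yield {"question": f"Approve sending {state['amount']}?"}
    if decision == "approved":
        return {"status": "sent"}
    return {"status": "cancelled"}


def start(node, state):
    gen = node(state)
    payload = next(gen)  # run until the pause point
    return gen, payload


def resume(gen, value):
    try:
        gen.send(value)
    except StopIteration as stop:  # node finished; its return value is the update
        return stop.value
    raise RuntimeError("node paused again")


gen, payload = start(approval_node, {"amount": 100})
update = resume(gen, "approved")
```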

Resume Patterns

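```python
from langgraph.types import Command

config = {"configurable": {"thread_id": "thread-1"}}  # Illustrative thread ID

# Simple resume (same thread)
graph.invoke(None, config)

# Resume with a value (returned by interrupt() inside the node)
graph.invoke(Command(resume="approved"), config)

# Resume a specific interrupt (interrupt_id and value are placeholders)
graph.invoke(Command(resume={interrupt_id: value}), config)

# Modify state, then resume
graph.update_state(config, {"field": "new_value"})
graph.invoke(None, config)
```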

Error Handling Strategy

Retry Configuration

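```python
from langgraph.types import RetryPolicy

# APIError and RateLimitError are assumed to come from your model provider's SDK,
# e.g. from openai import APIError, RateLimitError

# Per-node retry
retry_policy = RetryPolicy(
    initial_interval=0.5,  # Seconds before the first retry
    backoff_factor=2.0,    # Multiplier applied after each attempt
    max_interval=60.0,     # Upper bound on the wait between retries
    max_attempts=3,
    retry_on=lambda e: isinstance(e, (APIError, TimeoutError)),
)

# Multiple policies (first match wins)
builder.add_node("node", fn, retry_policy=[
    RetryPolicy(retry_on=RateLimitError, max_attempts=5),
    RetryPolicy(retry_on=Exception, max_attempts=2),
])
```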
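The retry parameters combine into an exponential schedule, and with several policies the first one whose `retry_on` matches the exception decides. The plain-Python sketch below models both behaviors; `Policy` is a hypothetical mirror of the RetryPolicy fields above (the real class also supports jitter, omitted here), `RateLimitError` is a hypothetical exception, and `sleep` is injected so the delays can be inspected instead of waited out.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical mirror of the RetryPolicy fields used above."""
    retry_on: type = Exception
    max_attempts: int = 3
    initial_interval: float = 0.5
    backoff_factor: float = 2.0
    max_interval: float = 60.0

    def delay(self, attempt: int) -> float:
        # Exponential backoff capped at max_interval (attempt is 1-based).
        return min(self.max_interval, self.initial_interval * self.backoff_factor ** (attempt - 1))


def call_with_retry(fn, policies, sleep):
    attempt = 0
    while True:
        attempt += 1
        try:
            return fn()
        except Exception as exc:
            # First policy whose retry_on matches the exception wins.
            policy = next((p for p in policies if isinstance(exc, p.retry_on)), None)
            if policy is None or attempt >= policy.max_attempts:
                raise
            sleep(policy.delay(attempt))


class RateLimitError(Exception):  # hypothetical provider error
    pass


POLICIES = [Policy(RateLimitError, max_attempts=5), Policy(Exception, max_attempts=2)]

calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("slow down")
    return "ok"


def always_fails():
    raise ValueError("bad input")


delays = []
result = call_with_retry(flaky, POLICIES, delays.append)  # two retries, then "ok"

fail_delays = []
gave_up = False
try:
    call_with_retry(always_fails, POLICIES, fail_delays.append)
except ValueError:
    gave_up = True  # the catch-all policy allows only 2 attempts
```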

Fallback Patterns

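```python
# primary_operation, fallback_operation and PrimaryError are placeholders
def node_with_fallback(state):
    try:
        return primary_operation(state)
    except PrimaryError:
        # Degrade gracefully instead of failing the whole run
        return fallback_operation(state)

# Or use conditional edges for complex fallback routing
# def route_on_error(...)
```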
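With conditional-edge routing, the primary node records the failure in state instead of raising, and a router sends failed runs to a fallback node. In LangGraph the router would be registered with something like `builder.add_conditional_edges("primary", route_after_primary)`; here all names (`call_primary_model`, `primary`, `fallback`, `route_after_primary`) are hypothetical, and `run` wires the two paths by hand.

```python
def call_primary_model(question, fail):
    """Hypothetical external call; `fail` simulates an outage."""
    if fail:
        raise ConnectionError("primary model unavailable")
    return f"primary answer to {question}"


def primary(state):
    try:
        answer = call_primary_model(state["question"], state.get("simulate_failure", False))
        return {"answer": answer, "error": None}
    except ConnectionError as exc:
        return {"error": str(exc)}  # record the failure so a router can act on it


def fallback(state):
    return {"answer": f"fallback answer to {state['question']}", "error": None}


def route_after_primary(state) -> str:
    """Conditional-edge router: send failures to the fallback node."""
    return "fallback" if state.get("error") else "END"


def run(state):
    state = {**state, **primary(state)}
    if route_after_primary(state) == "fallback":
        state = {**state, **fallback(state)}
    return state
```

Keeping the error in state, rather than swallowing it inside one node, also leaves a record in the checkpoint of why the fallback path was taken.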

Scaling Considerations

Horizontal Scaling

- Use PostgresSaver for shared state
- Consider LangGraph Platform for managed infrastructure
- Use stores for large data outside checkpoints

Performance Optimization

1. Minimize state size - Use references for large data
2. Parallel nodes - Fan out when possible
3. Cache expensive operations - Use CachePolicy
4. Async everywhere - Use ainvoke, astream
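The caching item comes down to memoizing a node's output by its input, with an expiry. The exact CachePolicy parameters are not covered here, so this plain-Python decorator is a conceptual stand-in, not LangGraph's cache; it assumes a small state whose `repr` is stable enough to use as a key, and takes an injectable clock so expiry can be demonstrated without waiting.

```python
import time


def cached(ttl: float, clock=time.monotonic):
    """Memoize a node on its input state for `ttl` seconds (conceptual sketch)."""
    def decorator(fn):
        cache = {}

        def wrapper(state: dict) -> dict:
            key = repr(sorted(state.items()))  # assumes a small, repr-stable state
            now = clock()
            hit = cache.get(key)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]  # fresh cache entry: skip the expensive call
            result = fn(state)
            cache[key] = (now, result)
            return result
        return wrapper
    return decorator


calls = []
fake_now = [0.0]


@cached(ttl=60, clock=lambda: fake_now[0])
def expensive(state):
    calls.append(state["query"])
    return {"result": state["query"].upper()}


first = expensive({"query": "a"})
second = expensive({"query": "a"})  # served from the cache
fake_now[0] = 120.0
third = expensive({"query": "a"})   # entry expired, recomputed
```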

Resource Limits

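The main resource limit to set is a cap on how many steps a run may take, so a looping agent fails fast instead of running indefinitely. In LangGraph this is passed per invocation, e.g. `graph.invoke(inputs, config={"recursion_limit": 50})`, and exceeding it raises `GraphRecursionError`. The plain-Python sketch below models the guard; `StepLimitExceeded` and the node and router functions are hypothetical.

```python
class StepLimitExceeded(RuntimeError):
    """Hypothetical analogue of langgraph.errors.GraphRecursionError."""


def run_with_limit(node, route, state, limit):
    """Run node repeatedly until route() returns "END", at most `limit` steps."""
    for step in range(1, limit + 1):
        state = {**state, **node(state)}
        if route(state) == "END":
            return state, step
    raise StepLimitExceeded(f"no END after {limit} steps")


def increment(state):
    return {"n": state["n"] + 1}


def stop_at_three(state):
    return "END" if state["n"] >= 3 else "increment"


def never_stop(state):
    return "increment"  # a buggy router that never terminates


final_state, steps = run_with_limit(increment, stop_at_three, {"n": 0}, limit=25)

hit_limit = False
try:
    run_with_limit(increment, never_stop, {"n": 0}, limit=25)
except StepLimitExceeded:
    hit_limit = True
```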

Decision Checklist

Before implementing:

1. [ ] Is LangGraph the right tool? (vs simpler alternatives)
2. [ ] State schema defined with appropriate reducers?
3. [ ] Persistence strategy chosen? (dev vs prod checkpointer)
4. [ ] Streaming needs identified?
5. [ ] Human-in-the-loop points defined?
6. [ ] Error handling and retry strategy?
7. [ ] Multi-agent coordination pattern? (if applicable)
8. [ ] Resource limits configured?
