When to Use
The user wants to convert text or images to vectors, build semantic search, or integrate embeddings into an application.
Quick Reference
| Topic | File |
|---|---|
| Provider comparison & selection | `providers.md` |
| Chunking strategies & code | `chunking.md` |
| Vector database patterns | `storage.md` |
| Search & retrieval tuning | `search.md` |
Core Capabilities
- Generate embeddings: call provider APIs (OpenAI, Cohere, Voyage, local models)
- Chunk content: split documents with overlap, on semantic boundaries, and within token limits
- Store vectors: insert into Pinecone, Weaviate, Qdrant, pgvector, or Chroma
- Similarity search: query with top-k, metadata filters, and hybrid search
- Batch processing: handle large datasets with rate limiting and retries
- Model comparison: evaluate embedding quality for a specific use case
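A minimal sketch of the chunking step, assuming whitespace-separated words as a stand-in for tokens; a production pipeline would count tokens with the embedding model's own tokenizer. The function name `chunk_words` and its defaults are illustrative, not taken from any library. Each chunk starts `chunk_size - overlap` words after the previous one, so neighboring chunks share `overlap` words; the default of 30 on 200 is a 15% overlap.

```python
def chunk_words(text, chunk_size=200, overlap=30):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

For example, the digits 0 to 9 as ten words with `chunk_size=4, overlap=1` yield `["0 1 2 3", "3 4 5 6", "6 7 8 9"]`: every boundary word appears in both chunks it touches.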
Decision Checklist
Before recommending an approach, ask:
- [ ] What content type? (text, code, images, multimodal)
- [ ] What volume and update frequency?
- [ ] What latency requirements? (real-time vs. batch)
- [ ] What budget constraints? (API costs vs. self-hosted)
- [ ] What existing infrastructure? (cloud provider, database)
Critical Rules
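- Same model everywhere: query embeddings MUST use the identical model as document embeddings; vectors from different models live in different spaces and are not comparable
- Normalize before storage: with unit-length vectors, dot product equals cosine similarity, so inner-product indexes rank results correctly
- Chunk with overlap: 10-20% overlap reduces context loss at chunk boundaries
- Batch API calls: never embed one item at a time in production
- Cache embeddings: regenerating is expensive; store each vector with a hash of its source text
- Monitor dimensions: higher isn't always better; 768-1536 dimensions is usually a good trade-off between quality and storage/search cost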
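The normalization and caching rules can be combined in a small in-memory cache. This is a sketch under stated assumptions: `embed_fn(texts, model)` is a hypothetical batch embedding function returning one vector per input, and a real deployment would persist the store (for example alongside each chunk in the vector database) instead of keeping it in a dict. The cache key hashes the model name together with the text, so switching models never returns vectors produced by the old one.

```python
import hashlib
import math


def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cache_key(text, model):
    """SHA-256 of model + text; the NUL separator keeps the two fields unambiguous."""
    return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()


class EmbeddingCache:
    """In-memory cache; `embed_fn(texts, model)` is a hypothetical batch embedder."""

    def __init__(self, embed_fn, model):
        self.embed_fn = embed_fn
        self.model = model
        self.store = {}

    def get(self, texts):
        # Deduplicate while keeping order, then embed only uncached texts in ONE batch call.
        missing = [t for t in dict.fromkeys(texts) if cache_key(t, self.model) not in self.store]
        if missing:
            for text, vec in zip(missing, self.embed_fn(missing, self.model)):
                self.store[cache_key(text, self.model)] = normalize(vec)
        return [self.store[cache_key(t, self.model)] for t in texts]
```

Repeated calls only pay for texts the cache has not seen, which also enforces the "batch API calls" rule for the misses.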
Provider Quick Selection
| Need | Provider | Why |
|---|---|---|
| Best quality, any cost | OpenAI `text-embedding-3-large` | Top benchmarks |
| Cost-sensitive | OpenAI `text-embedding-3-small` | Several times cheaper than `-large`, slightly lower quality |
| Multilingual | Cohere `embed-multilingual-v3.0` | 100+ languages |
| Code/technical | Voyage `voyage-code-2` | Optimized for code |
| Privacy/offline | Local (e5, bge, nomic) | No data leaves the machine |
| Images | OpenAI CLIP, Cohere multimodal | Cross-modal search |
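Published benchmarks only go so far; a practical way to choose between rows of this table is to score candidates on a few labeled queries from your own data. A minimal sketch of recall@k, assuming that for each query you already have the ranked document ids returned by a candidate model and the ids a person marked as relevant (both hypothetical inputs here; the function names are illustrative):

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results for one query."""
    if not relevant_ids:
        raise ValueError("need at least one relevant id")
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(results, k=10):
    """results: list of (ranked_ids, relevant_ids) pairs, one per labeled query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in results) / len(results)
```

Embed documents and queries with each candidate model (the same model for both), run the same queries, and compare `mean_recall_at_k`; a higher value means more relevant documents land in the top k.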
Common Patterns
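```python
# Batch embedding with retry (OpenAI Python SDK v1+; itertools.batched needs Python 3.12+)
import time
from itertools import batched
from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_batch(texts, model="text-embedding-3-small", batch_size=100, max_attempts=5):
    results = []
    for chunk in batched(texts, batch_size):  # stay well under the per-request input limit
        for attempt in range(max_attempts):
            try:
                response = client.embeddings.create(input=list(chunk), model=model)
                break
            except RateLimitError:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
        else:
            raise RuntimeError(f"embedding failed after {max_attempts} attempts")
        results.extend(e.embedding for e in response.data)
    return results

# Similarity search with a metadata filter (Pinecone; `index` and `query_embedding` come from your setup)
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": {"$eq": "technical"}},
    include_metadata=True,
)
```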
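To make the filtered query above concrete, here is what a filtered top-k query computes, written as an exact brute-force scan in plain Python. It is a sketch for small datasets and testing only: most vector databases use approximate nearest-neighbor indexes (such as HNSW) to avoid scoring every vector, and `search` with its equality-only `where` dict is a hypothetical simplification of real filter syntax.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def search(query, records, top_k=10, where=None):
    """Exact top-k over (id, vector, metadata) tuples, keeping only records whose
    metadata equals every key/value in `where`. Returns (score, id, metadata) tuples."""
    where = where or {}
    candidates = (
        (cosine(query, vec), rid, meta)
        for rid, vec, meta in records
        if all(meta.get(k) == v for k, v in where.items())
    )
    # Rank by score only, so ties never try to compare metadata dicts.
    return heapq.nlargest(top_k, candidates, key=lambda c: c[0])
```

Filtering before scoring mirrors how the metadata filter narrows the candidate set, so a strict filter can return fewer than `top_k` results.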