Semantic Cache
Cache LLM responses by meaning using Redis vector search. Similar questions return cached answers instantly instead of making expensive API calls.
How It Works
1. The user asks a question or makes an LLM request.
2. The question is embedded into a vector using OpenAI text-embedding-3-small.
3. Redis vector search looks for semantically similar cached queries (cosine similarity > 0.80 by default).
4. Cache hit: return the cached response (~100ms in total, mostly spent embedding the query).
5. Cache miss: pass the request through to the LLM, then cache the response for future similar queries.
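The matching step can be sketched in plain JavaScript. This is only an illustration of the idea: `cosineSimilarity` and `findSimilar` are hypothetical helpers, the linear scan stands in for the Redis vector search, and the toy 3-dimensional vectors stand in for real embeddings.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Linear scan standing in for the Redis vector search (hypothetical helper).
// Returns the best entry whose similarity exceeds the threshold, or null.
function findSimilar(entries, queryVector, threshold = 0.8) {
  let best = null;
  for (const entry of entries) {
    const score = cosineSimilarity(entry.vector, queryVector);
    if (score > threshold && (best === null || score > best.score)) {
      best = { response: entry.response, score };
    }
  }
  return best;
}

// Toy 3-dimensional vectors; real embeddings are much larger.
const entries = [
  { vector: [1, 0, 0], response: "The capital of France is Paris." },
  { vector: [0, 1, 0], response: "Reset it from the account settings page." },
];
const hit = findSimilar(entries, [0.9, 0.1, 0]); // close to the first entry
const miss = findSimilar(entries, [0.5, 0.5, 0.7]); // not close to either
console.log(hit, miss);
```

A hit returns the stored response together with its score; a miss returns `null`, which is the signal to fall through to the LLM.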
Commands
Cache a query and response
node scripts/cache.js store "What is the capital of France?" "The capital of France is Paris."
Check cache for a similar query
node scripts/cache.js lookup "Which city is the capital of France?"
Cache stats
node scripts/cache.js stats
Clear all cached entries
node scripts/cache.js clear
Interactive mode — wraps any LLM call with caching
node scripts/cache.js query "Your question here"
This checks the cache first. On a miss, it calls OpenAI, caches the result, and returns it.
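The check-then-fall-through behavior is a read-through cache. A minimal sketch follows, assuming three injected functions (`lookup`, `store`, `callLLM`) that are hypothetical and not the actual internals of `scripts/cache.js`. The demo uses an exact-match `Map` and a fake LLM so it runs without Redis or an API key.

```javascript
// Read-through wrapper. `lookup`, `store`, and `callLLM` are hypothetical
// injected functions, not the actual API of scripts/cache.js.
async function cachedQuery(question, { lookup, store, callLLM }) {
  const cached = await lookup(question);
  if (cached !== null && cached !== undefined) {
    return { response: cached, source: "cache" };
  }
  const response = await callLLM(question);
  await store(question, response); // cache for future similar queries
  return { response, source: "llm" };
}

// Demo with in-memory stand-ins: an exact-match Map instead of vector
// search, and a fake LLM that counts how often it is called.
async function demo() {
  const memory = new Map();
  let llmCalls = 0;
  const deps = {
    lookup: async (q) => (memory.has(q) ? memory.get(q) : null),
    store: async (q, r) => {
      memory.set(q, r);
    },
    callLLM: async (q) => {
      llmCalls += 1;
      return `answer to: ${q}`;
    },
  };
  const first = await cachedQuery("How do I reset my password?", deps);
  const second = await cachedQuery("How do I reset my password?", deps);
  return { first, second, llmCalls };
}

demo().then((result) => console.log(result));
```

Injecting the three functions keeps the wrapper independent of any particular LLM provider or storage backend, which is one way to read "wraps any LLM call".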
When to Use This Skill
- Before making any LLM API call, check if a semantically similar query was already answered
- When building agents that answer repetitive questions (support bots, FAQ systems)
- When you want to reduce OpenAI/Anthropic API costs by 40-80% (the actual saving tracks your cache hit rate)
- When you need faster response times for common queries
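How much a cache saves depends mostly on the hit rate. The sketch below works through that arithmetic under a simple assumed cost model: every request pays for one embedding, and only misses pay for the LLM call. The function name and the prices are hypothetical and chosen only for illustration.

```javascript
// Fraction of per-request spend saved by the cache.
// Every request pays for one embedding; only misses pay for the LLM call.
function savingsFraction(hitRate, llmCostPerCall, embeddingCostPerCall) {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be in [0, 1]");
  }
  if (llmCostPerCall <= 0) {
    throw new RangeError("llmCostPerCall must be positive");
  }
  const withCache = embeddingCostPerCall + (1 - hitRate) * llmCostPerCall;
  return 1 - withCache / llmCostPerCall;
}

// Hypothetical prices in dollars per call, not real provider pricing.
const llmCost = 0.01;
const embeddingCost = 0.0001;
for (const hitRate of [0.4, 0.6, 0.8]) {
  const saved = savingsFraction(hitRate, llmCost, embeddingCost);
  console.log(`hit rate ${hitRate}: saves ${(saved * 100).toFixed(1)}%`);
}
```

Under this model the saving is the hit rate minus the ratio of embedding cost to LLM cost, so a workload with few repeated questions saves little or can even cost slightly more.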
Configuration
Set these environment variables:
- REDIS_URL: Redis connection string with vector search support (Redis Cloud or Redis Stack)
- OPENAI_API_KEY: For generating embeddings
- SEMANTIC_CACHE_THRESHOLD: Similarity threshold 0-1 (default: 0.80, higher = stricter matching)
- SEMANTIC_CACHE_TTL: Cache TTL in seconds (default: 86400 = 24 hours)
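One way to read these settings with their documented defaults is sketched below. `loadConfig` is a hypothetical helper, not part of the skill, and the names `OPENAI_API_KEY`, `SEMANTIC_CACHE_THRESHOLD`, and `SEMANTIC_CACHE_TTL` are assumed spellings of the variables described above. The Redis URL and API key in the demo are placeholders.

```javascript
// Reads cache settings from an environment-like object, applying the
// documented defaults (threshold 0.80, TTL 86400 seconds).
// Variable names other than REDIS_URL are assumed spellings.
function loadConfig(env = process.env) {
  const threshold = env.SEMANTIC_CACHE_THRESHOLD === undefined
    ? 0.8
    : Number(env.SEMANTIC_CACHE_THRESHOLD);
  const ttlSeconds = env.SEMANTIC_CACHE_TTL === undefined
    ? 86400
    : Number(env.SEMANTIC_CACHE_TTL);

  if (!env.REDIS_URL) throw new Error("REDIS_URL is required");
  if (!env.OPENAI_API_KEY) throw new Error("OPENAI_API_KEY is required");
  // Written as a negated range check so that NaN is rejected too.
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("SEMANTIC_CACHE_THRESHOLD must be between 0 and 1");
  }
  if (!Number.isInteger(ttlSeconds) || ttlSeconds <= 0) {
    throw new RangeError("SEMANTIC_CACHE_TTL must be a positive integer");
  }
  return {
    redisUrl: env.REDIS_URL,
    openaiApiKey: env.OPENAI_API_KEY,
    threshold,
    ttlSeconds,
  };
}

// Placeholder values for illustration only.
const config = loadConfig({
  REDIS_URL: "redis://localhost:6379",
  OPENAI_API_KEY: "sk-placeholder",
  SEMANTIC_CACHE_THRESHOLD: "0.9",
});
console.log(config.threshold, config.ttlSeconds);
```

Validating at startup surfaces a mistyped threshold immediately instead of letting it silently turn every lookup into a miss or a false hit.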
Example Workflow
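User: "How do I reset my password?"
-> Embed query -> Search Redis -> MISS
-> Call LLM -> Get response -> Cache response -> Return response
User: "I forgot my password, how do I change it?"
-> Embed query -> Search Redis -> HIT (similarity 92.7%)
-> Return cached response in 8ms (saves ~2 seconds + API cost)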
Performance
- Cache lookup: ~5-15ms (vs 1-5 seconds for LLM call)
- Embedding generation: ~50-100ms
- Storage per entry: ~6KB (1536-dim vector + metadata)
- Supports millions of cached entries
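The ~6KB figure is consistent with storing each of the 1536 vector components as a 4-byte float. That storage type is an assumption, since it is not stated here. The helpers below are hypothetical and count vector bytes only, ignoring metadata and index overhead.

```javascript
// Bytes needed for one embedding vector, assuming 4-byte (float32)
// components. The storage type is an assumption, not stated by the skill.
function vectorBytes(dimensions, bytesPerComponent = 4) {
  return dimensions * bytesPerComponent;
}

// Raw vector storage in GiB for a number of entries; metadata and
// index overhead are not included.
function totalVectorGiB(entryCount, dimensions) {
  return (entryCount * vectorBytes(dimensions)) / 1024 ** 3;
}

const perEntry = vectorBytes(1536); // 1536 * 4 = 6144 bytes
console.log(`${perEntry} bytes per vector`);
console.log(`${totalVectorGiB(1_000_000, 1536).toFixed(2)} GiB for one million vectors`);
```

An estimate like this helps size the Redis instance before the cache grows to millions of entries.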
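Semantic Cache
Cache LLM responses by meaning using Redis vector search. Similar questions return the cached answer directly, with no need for an expensive API call.
How It Works
1. The user asks a question or makes an LLM request.
2. The question is embedded into a vector using OpenAI text-embedding-3-small.
3. Redis vector search looks for semantically similar cached queries (cosine similarity > 0.80).
4. Cache hit: return the cached response immediately (~100ms).
5. Cache miss: pass the request through to the LLM, then cache the response for future similar queries.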
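Commands
Cache a query and response
node scripts/cache.js store "What is the capital of France?" "The capital of France is Paris."
Check cache for a similar query
node scripts/cache.js lookup "Which city is the capital of France?"
Cache stats
node scripts/cache.js stats
Clear all cached entries
node scripts/cache.js clear
Interactive mode: wraps any LLM call with caching
node scripts/cache.js query "Your question here"
This command checks the cache first. On a miss, it calls OpenAI, caches the result, and returns it.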
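When to Use This Skill
- Before making any LLM API call, check if a semantically similar query was already answered
- When building agents that answer repetitive questions (support bots, FAQ systems)
- When you want to reduce OpenAI/Anthropic API costs by 40-80%
- When you need faster response times for common queries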
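Configuration
Set these environment variables:
- REDIS_URL: Redis connection string with vector search support (Redis Cloud or Redis Stack)
- OPENAI_API_KEY: For generating embeddings
- SEMANTIC_CACHE_THRESHOLD: Similarity threshold 0-1 (default: 0.80, higher = stricter matching)
- SEMANTIC_CACHE_TTL: Cache TTL in seconds (default: 86400 = 24 hours)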
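Example Workflow
User: "How do I reset my password?"
-> Embed query -> Search Redis -> MISS
-> Call LLM -> Get response -> Cache response -> Return response
User: "I forgot my password, how do I change it?"
-> Embed query -> Search Redis -> HIT (similarity 92.7%)
-> Return cached response in 8ms (saves ~2 seconds + API cost)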
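Performance
- Cache lookup: ~5-15ms (vs 1-5 seconds for LLM call)
- Embedding generation: ~50-100ms
- Storage per entry: ~6KB (1536-dim vector + metadata)
- Supports millions of cached entries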