Semantic Cache

Cache LLM responses by meaning using Redis vector search. Similar questions return cached answers instantly instead of making expensive API calls.

How It Works

1. A user asks a question or makes an LLM request.
2. The question is embedded into a vector using OpenAI text-embedding-3-small.
3. Redis vector search finds semantically similar cached queries (cosine similarity > 0.80).
4. Cache hit: return the cached response instantly (~100ms).
5. Cache miss: pass through to the LLM, then cache the response for future similar queries.
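The matching step above can be sketched without Redis. This is a minimal illustration, not the skill's actual code: the array of entries is a hypothetical in-memory stand-in for the Redis index, the function names are made up for this example, and the toy vectors have 3 dimensions instead of 1536.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as no match
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory stand-in for the Redis vector index.
// entries: [{ query, response, embedding }]
function findCachedResponse(queryEmbedding, entries, threshold = 0.8) {
  let best = null;
  for (const entry of entries) {
    const similarity = cosineSimilarity(queryEmbedding, entry.embedding);
    if (similarity > threshold && (best === null || similarity > best.similarity)) {
      best = { ...entry, similarity };
    }
  }
  return best; // null means cache miss
}

// Toy embeddings for illustration only.
const entries = [
  {
    query: "How do I reset my password?",
    response: "Use the reset link.",
    embedding: [0.9, 0.1, 0.0],
  },
];
const hit = findCachedResponse([0.85, 0.15, 0.05], entries);
const miss = findCachedResponse([0.0, 0.1, 0.9], entries);
console.log(hit ? "HIT" : "MISS", miss ? "HIT" : "MISS");
```

A linear scan like this is only practical for small caches; a vector index is what keeps lookups fast as the number of entries grows.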

Commands

Cache a query and response

node scripts/cache.js store "What is the capital of France?" "The capital of France is Paris."
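What storing an entry might translate to on the Redis side is sketched below. The document does not describe the storage layout, so everything here is an assumption: entries as Redis hashes, the `semcache:` key prefix, the field names, and the helper names. Only the `HSET` and `EXPIRE` commands themselves are standard Redis.

```javascript
const { createHash } = require("node:crypto");

// Derive a stable key from the query text; the "semcache:" prefix is hypothetical.
function cacheKey(query) {
  const digest = createHash("sha256").update(query, "utf8").digest("hex");
  return `semcache:${digest.slice(0, 32)}`;
}

// Build the commands that would store one entry and give it a TTL.
// Assumes the embedding is stored as raw FLOAT32 bytes in a hash field.
function buildStoreCommands(query, response, embedding, ttlSeconds = 86400) {
  const key = cacheKey(query);
  const floats = Float32Array.from(embedding);
  const vectorBytes = Buffer.from(floats.buffer, floats.byteOffset, floats.byteLength);
  return [
    ["HSET", key, "query", query, "response", response, "embedding", vectorBytes],
    ["EXPIRE", key, String(ttlSeconds)],
  ];
}
```

Each inner array could be passed to a Redis client's raw command method. Hashing the query text means that storing the same question twice overwrites the earlier entry rather than duplicating it.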

Check cache for a similar query

node scripts/cache.js lookup "What city is the capital of France?"
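A lookup of this kind is typically a K-nearest-neighbor query against a Redis vector index. The sketch below builds the arguments for such a query. The index name, the `embedding` field, and the helper names are assumptions for illustration, and the skill's real query may differ. One detail worth noting: with the COSINE distance metric, Redis reports cosine distance, which is 1 minus cosine similarity, so the result has to be converted before comparing it to the similarity threshold.

```javascript
// Pack a numeric array into the raw FLOAT32 bytes a Redis vector field expects.
function toFloat32Buffer(vector) {
  const floats = Float32Array.from(vector);
  return Buffer.from(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Build the arguments of an FT.SEARCH KNN query.
// indexName and the field name "embedding" are hypothetical.
function buildKnnSearchArgs(indexName, vector, k = 1) {
  return [
    "FT.SEARCH", indexName,
    `*=>[KNN ${k} @embedding $vec AS distance]`,
    "PARAMS", "2", "vec", toFloat32Buffer(vector),
    "SORTBY", "distance",
    "DIALECT", "2",
  ];
}

// Convert the reported cosine distance back to a similarity and apply the threshold.
function isCacheHit(distance, threshold = 0.8) {
  return 1 - Number(distance) > threshold;
}
```

For example, a reported distance of 0.073 corresponds to a similarity of 0.927, which clears the default threshold of 0.80.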

Cache stats

node scripts/cache.js stats

Clear all cached entries

node scripts/cache.js clear

Interactive mode — wraps any LLM call with caching

node scripts/cache.js query "Your question here"

This checks the cache first. On a miss, it calls OpenAI, caches the result, and returns it.
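The check-first pattern can be sketched as a small wrapper. This is not the code in scripts/cache.js; the function name `cachedQuery` and the three injected dependencies are hypothetical, chosen so the example runs without a Redis or OpenAI client.

```javascript
// Cache-first wrapper. The three dependencies are injected so the sketch
// stays independent of any particular Redis or LLM client.
//   lookup(question)          -> Promise<string|null>  cached response or null
//   callLLM(question)         -> Promise<string>       fresh response
//   store(question, response) -> Promise<void>         save for later lookups
async function cachedQuery(question, { lookup, callLLM, store }) {
  const cached = await lookup(question);
  if (cached !== null && cached !== undefined) {
    return { response: cached, source: "cache" };
  }
  const response = await callLLM(question);
  await store(question, response);
  return { response, source: "llm" };
}

// Usage with trivial in-memory stand-ins (exact-match only, no embeddings).
async function demo() {
  const memory = new Map();
  const deps = {
    lookup: async (q) => memory.get(q) ?? null,
    callLLM: async (q) => `answer to: ${q}`,
    store: async (q, r) => { memory.set(q, r); },
  };
  const first = await cachedQuery("What is the capital of France?", deps);
  const second = await cachedQuery("What is the capital of France?", deps);
  return [first.source, second.source];
}

demo().then((sources) => console.log(sources.join(", ")));
```

Returning the source alongside the response makes it easy to count hits and misses, which is the kind of information a stats command would need.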

When to Use This Skill

- Before making any LLM API call, check if a semantically similar query was already answered
- When building agents that answer repetitive questions (support bots, FAQ systems)
- When you want to reduce OpenAI/Anthropic API costs by 40-80%
- When you need faster response times for common queries

Configuration

Set these environment variables:

- REDIS_URL: Redis connection string with vector search support (Redis Cloud or Redis Stack)
- OPENAI_API_KEY: For generating embeddings
- SEMANTIC_CACHE_THRESHOLD: Similarity threshold 0-1 (default: 0.80, higher = stricter matching)
- SEMANTIC_CACHE_TTL: Cache TTL in seconds (default: 86400 = 24 hours)
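Reading these settings with their documented defaults could look like the sketch below. It assumes the variable names REDIS_URL, OPENAI_API_KEY, SEMANTIC_CACHE_THRESHOLD, and SEMANTIC_CACHE_TTL; the `loadConfig` helper and its validation rules are illustrative and not taken from the skill's source.

```javascript
// Read the settings from an environment object, applying the documented defaults.
// Passing `env` explicitly makes the function easy to test.
function loadConfig(env = process.env) {
  const threshold = Number(env.SEMANTIC_CACHE_THRESHOLD ?? "0.80");
  const ttlSeconds = Number(env.SEMANTIC_CACHE_TTL ?? "86400");

  if (!env.REDIS_URL) throw new Error("REDIS_URL is required");
  if (!env.OPENAI_API_KEY) throw new Error("OPENAI_API_KEY is required");
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 1) {
    throw new Error("SEMANTIC_CACHE_THRESHOLD must be a number between 0 and 1");
  }
  if (!Number.isInteger(ttlSeconds) || ttlSeconds <= 0) {
    throw new Error("SEMANTIC_CACHE_TTL must be a positive whole number of seconds");
  }

  return {
    redisUrl: env.REDIS_URL,
    openaiApiKey: env.OPENAI_API_KEY,
    threshold,
    ttlSeconds,
  };
}
```

Failing fast on an out-of-range threshold matters here: a value such as 80 instead of 0.80 would otherwise silently turn every lookup into a miss.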

Example Workflow

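User: "How do I reset my password?"
-> Embed query -> Search Redis -> MISS
-> Call LLM -> Get response -> Cache response -> Return response

User: "I forgot my password, how do I change it?"
-> Embed query -> Search Redis -> HIT (similarity 92.7%)
-> Return cached response in 8ms (saves ~2s + API cost)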

Performance

- Cache lookup: ~5-15ms (vs 1-5 seconds for LLM call)
- Embedding generation: ~50-100ms
- Storage per entry: ~6KB (1536-dim vector + metadata)
- Supports millions of cached entries
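The ~6KB storage figure follows from the vector size, as the arithmetic below shows. It assumes each dimension is stored as a 4-byte FLOAT32 value, which the document does not state explicitly. The estimate covers raw vector data plus optional metadata only and ignores any overhead from the index itself.

```javascript
const DIMENSIONS = 1536;      // vector size stated in the list above
const BYTES_PER_FLOAT32 = 4;  // assumption: vectors are stored as FLOAT32

// Raw size of one embedding: 1536 * 4 = 6144 bytes, i.e. 6 KiB.
const vectorBytes = DIMENSIONS * BYTES_PER_FLOAT32;

// Rough total for a given number of entries, with optional per-entry metadata.
function estimateStorageBytes(entryCount, metadataBytes = 0) {
  return entryCount * (vectorBytes + metadataBytes);
}

const oneMillion = estimateStorageBytes(1_000_000);
console.log(
  `${vectorBytes} bytes per vector, ` +
  `${(oneMillion / 1024 ** 3).toFixed(2)} GiB per million entries`
);
```

At roughly 5.7 GiB of vector data per million entries, memory sizing becomes the main constraint when a cache grows into the millions.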
