Ultramemory
Structured agent memory: extracts atomic facts from text, detects relations to existing knowledge (updates, contradicts, extends), embeds for semantic search, and auto-builds entity profiles.
PyPI: ultramemory | GitHub: jared-goering/ultramemory
Setup
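```bash
# Install (virtual environment is created automatically on first run)
pip install ultramemory

# Or install from source
git clone https://github.com/jared-goering/ultramemory.git
cd ultramemory && pip install -e .
```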
Requirements:
- Python 3.10+
- An LLM API key for fact extraction (Anthropic recommended, OpenAI also works)
- Local embeddings load automatically (sentence-transformers/all-MiniLM-L6-v2, ~80MB first run)
Environment:
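```bash
export ANTHROPIC_API_KEY=sk-ant-...   # or OPENAI_API_KEY
export ULTRAMEMORY_DB=./memory.db     # default: memory.db in the current directory
```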
Quick Start
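```python
from ultramemory import MemoryEngine

engine = MemoryEngine(db_path="memory.db")

# Ingest text (extracts atomic facts, detects relations, builds profiles)
results = engine.ingest("Jared moved from Bel Aire to Wichita. Baby is due in July.")

# Search
matches = engine.search("Where does Jared live?", top_k=5)

# Recall (compact context block for agent prompts)
context = engine.recall("current projects and priorities", top_k=5)
```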
CLI Usage
The scripts/memory.sh wrapper handles venv activation and API key loading.
Store memories (ingest)
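```bash
bash scripts/memory.sh ingest \
  "Jared moved to Wichita. Baby is now due in August, not July." \
  --session main-2026-03-22 --agent kit
```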
Categories: person, preference, project, decision, event, insight
Relations auto-detected: updates (supersedes old fact), contradicts, extends, supports, derives
When a memory updates an existing one, the old memory is marked superseded and the new one gets an incremented version.
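The supersede-and-increment behavior can be pictured with a small in-memory model. This is a minimal sketch, not Ultramemory's actual storage code: the `Memory` dataclass, its fields, and `apply_update` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    # Hypothetical record shape; the real schema may differ.
    id: int
    text: str
    version: int = 1
    superseded_by: Optional[int] = None  # id of the newer memory, if any

    @property
    def is_current(self) -> bool:
        return self.superseded_by is None


def apply_update(store: dict[int, Memory], old_id: int, new_text: str) -> Memory:
    """Mark the old memory superseded and add a new one with version + 1."""
    old = store[old_id]
    new = Memory(id=max(store) + 1, text=new_text, version=old.version + 1)
    old.superseded_by = new.id
    store[new.id] = new
    return new


store = {1: Memory(id=1, text="Jared lives in Bel Aire, Kansas.")}
new = apply_update(store, 1, "Jared moved to Wichita.")
```

The old record is kept rather than deleted, which is what makes a version timeline possible.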
Recall (agent-optimized search)
Compact context block for injecting into agent prompts:
```bash
bash scripts/memory.sh recall "Where does Jared live?" --top-k 5
```
Output:
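```
[person] Jared moved to Wichita. (v2, current, 89% match)
  -> updates: Jared lives in Bel Aire, Kansas.
[project] Jared teaches human-centered design at WSU. (v1, current, 72% match)
```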
Search (full JSON)
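```bash
bash scripts/memory.sh search "baby due date" --top-k 10

# Include superseded memories
bash scripts/memory.sh search "baby due date" --all

# Time travel: what did we know on March 1?
bash scripts/memory.sh search "baby due date" --as-of 2026-03-01
```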
Entity operations
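```bash
bash scripts/memory.sh entities          # list all known entities
bash scripts/memory.sh history "Jared"   # version timeline
bash scripts/memory.sh profile "Jared"   # auto-built profile
bash scripts/memory.sh stats             # counts and categories
```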
Integration Patterns
Session startup
At the start of any session, hydrate context:
```bash
bash scripts/startup-recall.sh
```
Post-conversation ingest
After meaningful conversations, pass the text directly:
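```bash
# Single quotes stop the shell from expanding "$5" inside "$50K"
bash scripts/memory.sh ingest 'User decided to use React for the frontend. Budget is $50K.' \
  --session "$SESSION_KEY" --agent "$AGENT_ID"
```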
API Server
For multi-agent setups, run the API server (requires separate install from PyPI):
```bash
pip install ultramemory
python3 -m uvicorn ultramemory.server:app --port 8642 --host 127.0.0.1
```
Endpoints: POST /api/ingest, POST /api/search, POST /api/recall, GET /api/stats
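A client for these endpoints might look like the sketch below. The request schema is not documented here, so the JSON field names (`query`, `top_k`) are assumptions borrowed from the Python API's parameter names, and `BASE_URL` and `build_recall_request` are hypothetical. Check the server code before relying on any of it.

```python
import json
import urllib.request

# Assumed local address; adjust to wherever the server is listening.
BASE_URL = "http://127.0.0.1:8642"


def build_recall_request(query: str, top_k: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/recall."""
    # Assumption: the endpoint accepts a JSON body with these field names.
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/recall",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_recall_request("Where does Jared live?")

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```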
Advanced: Auto-ingest from session transcripts
For continuous ingestion from session files, see the GitHub repo which includes auto_ingest.py and live_ingest.sh scripts.
Why Not Just Use MEMORY.md?
You'll outgrow a flat file fast. But you also can't replace it entirely. We tried.
At 18,000+ memories, search results get noisy. The DB is great at answering "what happened Tuesday?" but terrible as a session primer. Meanwhile, MEMORY.md is perfect for "who am I, who's my human, what are we working on" but can't hold 18K facts in 2K tokens.
The architecture we landed on uses three layers:
Layer 1: MEMORY.md (always loaded, zero cost)
Curated essentials under 2K tokens. Loaded every session, no API calls, no latency. Contains identity, active projects, key preferences. Think of it as working memory.
Layer 2: Ultramemory plugin (opportunistic injection)
When a message arrives, the plugin searches the DB and injects relevant memories if they score above a similarity threshold (we use 0.55). The agent never explicitly asks for this. It just gets richer context when the DB has something relevant.
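The threshold check can be sketched as follows. This illustrates the idea rather than the plugin's code: `cosine` and `memories_to_inject` are hypothetical helpers, and the toy vectors stand in for real embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def memories_to_inject(
    query_vec: list[float],
    memories: list[tuple[str, list[float]]],
    threshold: float = 0.55,
    top_k: int = 5,
) -> list[str]:
    """Return up to top_k memory texts scoring above the threshold, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    kept = sorted((s for s in scored if s[0] > threshold), reverse=True)
    return [text for _, text in kept[:top_k]]


# Toy 3-dim vectors for illustration only.
memories = [
    ("Jared moved to Wichita.", [0.9, 0.1, 0.0]),
    ("Budget is $50K.", [0.0, 0.2, 0.9]),
]
injected = memories_to_inject([1.0, 0.0, 0.0], memories)
```

An empty list means nothing is injected for that message, so unrelated queries add no tokens to the prompt.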
Layer 3: Ultramemory direct (precision recall)
The agent explicitly searches when it needs specifics. "What was the benchmark result?" or "When did we decide to drop NYC?" This is the full 18K+ memory DB with semantic search, temporal filtering, and entity profiles.
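Temporal filtering, such as asking what was known on a given date, can be pictured as a filter over validity intervals. The sketch below is an assumption about how such a filter might work, not the tool's implementation. The `Fact` fields and `visible_as_of` are hypothetical, and the dates are made-up sample data.

```python
from datetime import date
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    text: str
    created: date
    superseded: Optional[date] = None  # None means still current


def visible_as_of(facts: list[Fact], as_of: date) -> list[Fact]:
    """Return facts that existed on `as_of` and had not yet been superseded."""
    return [
        f for f in facts
        if f.created <= as_of and (f.superseded is None or f.superseded > as_of)
    ]


facts = [
    Fact("Baby is due in July.", date(2026, 2, 10), superseded=date(2026, 3, 22)),
    Fact("Baby is due in August.", date(2026, 3, 22)),
]
snapshot = visible_as_of(facts, date(2026, 3, 1))
latest = visible_as_of(facts, date(2026, 4, 1))
```

Here `snapshot` holds only the July fact, because the August correction had not been recorded by March 1, while `latest` holds only the August fact.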
MEMORY.md is the backup and the bootstrap. Ultramemory is the brain. You need both.
Architecture
- Storage: Single SQLite DB with WAL mode. Agent-ID tagging for multi-agent isolation.
- Extraction: LLM extracts atomic facts, categorizes, detects entities, finds relations to existing memories.
- Embeddings: Local sentence-transformers (384-dim). No API calls for search.
- Relations: UPDATE, EXTEND, CONTRADICT, SUPPORT, DERIVE. Version chain with superseded tracking.
- Profiles: Auto-built entity profiles from accumulated facts.
- Events: Structured event extraction with canonical clustering and dedup.
- Temporal: Deterministic temporal expression parsing and date arithmetic (no LLM needed).
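The storage bullet can be illustrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `memories` table and its columns are hypothetical, since the real schema is not shown here.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database, so use a temporary file for the demo.
db_path = os.path.join(tempfile.mkdtemp(), "memory.db")
conn = sqlite3.connect(db_path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]  # "wal" on success

# Hypothetical table; the real schema may differ.
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, agent_id TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO memories (agent_id, text) VALUES (?, ?)",
    [("kit", "Jared moved to Wichita."), ("other", "Unrelated fact.")],
)
conn.commit()


def memories_for(agent_id: str) -> list[str]:
    """Return only the rows tagged with the given agent ID."""
    rows = conn.execute(
        "SELECT text FROM memories WHERE agent_id = ? ORDER BY id", (agent_id,)
    )
    return [text for (text,) in rows]


kit_memories = memories_for("kit")
```

WAL mode lets readers proceed while a writer is active, which suits several agents sharing one database file. Filtering every query by `agent_id` is one simple way to keep their memories apart.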
Cost
- Ingest: ~$0.01-0.02 per call (3 LLM calls: extract, relate, profile)
- Search/recall: Free (local embeddings + SQLite)
- Embedding model: ~80MB download on first run, then cached
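Since search is free, spend scales only with the number of ingest calls. A rough estimate, using the per-call range above and a hypothetical workload of 50 ingest calls a day (`ingest_cost_range` is an illustrative helper, not part of the library):

```python
def ingest_cost_range(
    calls: int, low: float = 0.01, high: float = 0.02
) -> tuple[float, float]:
    """Estimated (min, max) USD spend for a number of ingest calls."""
    return round(calls * low, 2), round(calls * high, 2)


# Hypothetical workload: 50 ingest calls a day over a 30-day month.
monthly = ingest_cost_range(50 * 30)
```

Under those assumptions the monthly total lands between $15 and $30.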
Benchmark
80% accuracy on LongMemEval_s (production-relevant questions). 32ms median search latency.