TurboQuant Memory
Compress embedding vectors 5-8x using TurboQuant (Google, ICLR 2026), with 98% recall@1 at the default 5-bit setting (6.4x compression).
Quick Start
1. Run tests
```bash
python3 scripts/turboquant.py
```
Runs 15 built-in tests covering FWHT correctness, MSE distortion, inner-product (IP) correlation, recall, compression ratio, and determinism.
2. Validate on your data
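```bash
python3 scripts/validate.py --db /path/to/memory.sqlite --auto-detect --bits 5
```
Auto-detects sqlite-vec `vec0` tables, analyzes the embedding distribution, and reports quantization quality and recall.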
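To make the recall numbers concrete: one common definition of R@k is the fraction of queries whose exact (float32) top-1 neighbor appears among the top-k results computed on reconstructed vectors. The sketch below assumes that definition and inner-product ranking; `recall_at_k` is a hypothetical helper, and `validate.py` may compute recall differently.

```python
import numpy as np

def recall_at_k(queries: np.ndarray, exact_db: np.ndarray, approx_db: np.ndarray, k: int) -> float:
    """Fraction of queries whose exact top-1 match is in the approximate top-k.

    Hypothetical helper: ranks by inner product; validate.py may use another definition.
    """
    exact_scores = queries @ exact_db.T    # (n_queries, n_docs) ground-truth scores
    approx_scores = queries @ approx_db.T  # scores against reconstructed vectors
    true_top1 = np.argmax(exact_scores, axis=1)
    # Indices of the k highest approximate scores per query
    approx_topk = np.argsort(-approx_scores, axis=1)[:, :k]
    hits = (approx_topk == true_top1[:, None]).any(axis=1)
    return float(hits.mean())

# Usage with synthetic data (noise stands in for quantization error)
rng = np.random.default_rng(0)
db = rng.standard_normal((200, 64)).astype(np.float32)
approx = db + 0.05 * rng.standard_normal(db.shape).astype(np.float32)
queries = rng.standard_normal((20, 64)).astype(np.float32)
print(recall_at_k(queries, db, approx, k=1), recall_at_k(queries, db, approx, k=5))
```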
3. Quantize a memory database
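```bash
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --benchmark
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --migrate
```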
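Before quantization, each stored embedding costs 4 bytes per dimension. Assuming vectors are stored as raw little-endian float32 blobs (the compact format sqlite-vec uses for float vectors), the sketch below decodes and re-encodes one with NumPy; `blob_to_vector` and `vector_to_blob` are hypothetical helpers, not part of `memory_quantize.py`.

```python
import numpy as np

def blob_to_vector(blob: bytes) -> np.ndarray:
    """Decode a raw little-endian float32 blob into a NumPy vector."""
    return np.frombuffer(blob, dtype="<f4")

def vector_to_blob(vec) -> bytes:
    """Encode a vector as little-endian float32 bytes."""
    return np.asarray(vec, dtype="<f4").tobytes()

vec = np.linspace(-1.0, 1.0, 3072, dtype=np.float32)
blob = vector_to_blob(vec)
print(len(blob))  # 12288 bytes = 3072 float32 values
assert np.array_equal(blob_to_vector(blob), vec)
```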
4. Integrate into code
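```python
import numpy as np
from turboquant import TurboQuantMSE  # scripts/turboquant.py; assumes scripts/ is on sys.path

# embedding_vector, query: float32 arrays of length 3072
# database: your quantized records, each with "norm", "scale" and "indices" fields

# Initialize (deterministic: same seed = same quantization)
tq = TurboQuantMSE(dim=3072, bits=5)

# Quantize for storage
stored = tq.quantize(embedding_vector)  # float32 → compressed

# Reconstruct
reconstructed = tq.dequantize(stored)  # compressed → float32

# Search: the query stays float32, the database is quantized
q_rot = tq.rotation.apply(query)
scores = []
for doc in database:
    scores.append(doc["norm"] * doc["scale"] * np.dot(q_rot, tq.codebook[doc["indices"]]))
```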
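Keeping the query in float32 and rotating it once is equivalent to un-rotating every stored vector: for an orthogonal rotation R, ⟨q, Rᵀy⟩ = ⟨Rq, y⟩. A quick numerical check of that identity, using a random orthogonal matrix from a QR decomposition as a stand-in for the project's Hadamard rotation:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 256
# Random orthogonal matrix (stand-in for the blockwise Hadamard rotation)
R, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

query = rng.standard_normal(dim)
doc = rng.standard_normal(dim)
y = R @ doc  # document stored in the rotated domain

direct = query @ (R.T @ y)   # un-rotate the document, then score
rotated = (R @ query) @ y    # rotate the query once, score in the rotated domain
assert np.isclose(direct, rotated)
assert np.isclose(direct, query @ doc)
```

Per-vector scalars such as a norm or scale factor pull straight out of the inner product, so they can be multiplied in after the dot product.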
Recommended Configuration
| Preset | Mode | Bits | R@1 | Compression | Use Case |
|---|---|---|---|---|---|
| Default | MSE | 5 | 98% | 6.4x | Most memory/RAG search |
| Conservative | MSE | 6 | 98%+ | 5.3x | High-fidelity retrieval |
| Aggressive | MSE | 4 | 92% | 8.0x | Large-scale, storage-constrained |
Parameters
| Parameter | Default | Description |
|---|---|---|
| `dim` | auto-detect | Embedding dimension (768, 1536, 3072, etc.) |
| `bits` | 5 | Bits per coordinate; see the table above. |
| `seed` | 42 | Rotation seed; the same seed gives reproducible quantization. |
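The `bits` parameter sets the width of each coordinate's codebook index (2^bits levels), so one vector needs ⌈dim × bits / 8⌉ bytes of packed indices; at `dim` 3072 and 5 bits that is 1,920 bytes. A minimal packing sketch with `numpy.packbits`, assuming a most-significant-bit-first layout; `pack_indices` and `unpack_indices` are hypothetical helpers and the project's actual bit layout may differ:

```python
import numpy as np

def pack_indices(indices, bits: int) -> bytes:
    """Pack integer indices in [0, 2**bits) into a dense bitstream (MSB first)."""
    idx = np.asarray(indices, dtype=np.int64).ravel()
    if idx.size and (idx.min() < 0 or idx.max() >= (1 << bits)):
        raise ValueError("index does not fit in the given bit width")
    # Expand each index into `bits` binary digits, most significant first
    shifts = np.arange(bits - 1, -1, -1)
    bit_matrix = ((idx[:, None] >> shifts) & 1).astype(np.uint8)
    return np.packbits(bit_matrix.ravel()).tobytes()  # zero-pads the last byte

def unpack_indices(data: bytes, bits: int, count: int) -> np.ndarray:
    """Inverse of pack_indices: recover `count` indices of width `bits`."""
    bit_stream = np.unpackbits(np.frombuffer(data, dtype=np.uint8))[: count * bits]
    bit_matrix = bit_stream.reshape(count, bits).astype(np.int64)
    weights = 1 << np.arange(bits - 1, -1, -1)  # place values, MSB first
    return bit_matrix @ weights

idx = np.random.default_rng(42).integers(0, 2**5, size=3072)
packed = pack_indices(idx, bits=5)
print(len(packed))  # 1920 bytes = 3072 * 5 / 8
assert np.array_equal(unpack_indices(packed, 5, 3072), idx)
```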
Algorithm
Blockwise Hadamard Rotation → Lloyd-Max Scalar Quantization
1. Split the vector into power-of-2 blocks (e.g., 3072 = 3 × 1024).
2. Per block: random sign flip + Fast Walsh-Hadamard Transform (fully invertible).
3. Per-vector scale normalization.
4. Lloyd-Max optimal scalar quantization per coordinate (precomputed codebook for N(0,1)).
5. Pack the indices into a compact bit representation.
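Steps 1-2 can be sketched in NumPy as follows. Assumptions, not taken from the project: the block size is the largest power of 2 dividing the dimension (3072 → 1024, 768 → 256), signs come from `numpy.random.default_rng(seed)`, and the FWHT is scaled by 1/√n so it is orthonormal. `BlockHadamardRotation` is a hypothetical stand-in for the project's rotation object:

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform along the last axis (length must be 2**k)."""
    x = np.array(x, dtype=np.float64)  # copy, so the caller's array is untouched
    n = x.shape[-1]
    h = 1
    while h < n:
        # Butterfly at stride h: pair element i with i + h inside each 2h-sized group
        x = x.reshape(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = x[..., 0, :], x[..., 1, :]
        x = np.stack((a + b, a - b), axis=-2).reshape(*x.shape[:-3], n)
        h *= 2
    return x / np.sqrt(n)  # 1/sqrt(n) scaling makes the transform its own inverse


class BlockHadamardRotation:
    """Hypothetical sketch: random sign flip + blockwise orthonormal FWHT."""

    def __init__(self, dim: int, seed: int = 42):
        self.dim = dim
        self.block = dim & -dim  # assumed rule: largest power of 2 dividing dim
        rng = np.random.default_rng(seed)
        self.signs = rng.choice(np.array([-1.0, 1.0]), size=dim)  # same seed -> same signs

    def _blockwise_fwht(self, y: np.ndarray) -> np.ndarray:
        blocks = y.reshape(*y.shape[:-1], self.dim // self.block, self.block)
        return fwht(blocks).reshape(y.shape)

    def apply(self, x) -> np.ndarray:
        # Flip signs, then transform each block independently
        return self._blockwise_fwht(np.asarray(x, dtype=np.float64) * self.signs)

    def inverse(self, y) -> np.ndarray:
        # The FWHT undoes itself; multiplying by the ±1 signs again undoes the flip
        return self._blockwise_fwht(np.asarray(y, dtype=np.float64)) * self.signs


rot = BlockHadamardRotation(3072, seed=42)
x = np.random.default_rng(0).standard_normal(3072)
y = rot.apply(x)
print(rot.block)                                          # 1024, so 3 blocks
print(np.allclose(rot.inverse(y), x))                     # True
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))   # True: norm preserved
```

Because every step is orthogonal, the inverse is the same blockwise transform followed by the same sign flips, which is why the rotation adds no error of its own.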
Key properties:
- Data-oblivious: no training or calibration needed.
- Fully invertible: the rotation itself loses no information; all error comes from scalar quantization.
- Near-optimal: distortion within about 2.7x of the Shannon information-theoretic lower bound.
- Deterministic: the same seed produces the same output.
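Data-oblivious here means the codebook depends only on the N(0,1) model of the rotated, scale-normalized coordinates, never on your embeddings. A minimal sketch of computing such a codebook with Lloyd's algorithm (midpoint thresholds, conditional-mean levels) using only NumPy and `math.erf`; the project ships a precomputed codebook, and `lloyd_max_gaussian` / `quantize_to_levels` are hypothetical helpers, not its API:

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # stdlib erf, so no SciPy is needed

def _pdf(x):
    return np.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def lloyd_max_gaussian(bits: int, iters: int = 300) -> np.ndarray:
    """Approximate Lloyd-Max reconstruction levels for a standard normal N(0, 1) source."""
    levels = np.linspace(-3.0, 3.0, 2 ** bits)  # symmetric starting guess
    for _ in range(iters):
        # Nearest-neighbor condition: thresholds are midpoints between adjacent levels
        edges = np.concatenate(([-np.inf], (levels[:-1] + levels[1:]) / 2.0, [np.inf]))
        lo, hi = edges[:-1], edges[1:]
        # Centroid condition: each level becomes E[X | lo < X < hi] under N(0, 1)
        levels = (_pdf(lo) - _pdf(hi)) / (_cdf(hi) - _cdf(lo))
    return levels

def quantize_to_levels(x, levels: np.ndarray) -> np.ndarray:
    """Index of the nearest level for each coordinate (levels sorted ascending)."""
    return np.searchsorted((levels[:-1] + levels[1:]) / 2.0, x)

levels = lloyd_max_gaussian(2)
print(np.round(levels, 3))  # ≈ ±0.453 and ±1.510 for 4 levels
```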
See references/algorithm.md for full details.
Benchmark (Gemini embedding-001, 3072-dim, 112 vectors)
| Bits | MSE | Cosine | R@1 | R@5 | R@10 | Bytes/vec | Compression |
|---|---|---|---|---|---|---|---|
| 3 | 1.1e-5 | 0.982 | 88% | 90% | 91% | 1,160 | 10.6x |
| 4 | 3.2e-6 | 0.995 | 92% | 93% | 93% | 1,544 | 8.0x |
| 5 | 8.2e-7 | 0.999 | 98% | 96% | 96% | 1,928 | 6.4x |
| 6 | 2.2e-7 | 1.000 | 96% | 98% | 98% | 2,312 | 5.3x |
| 7 | 8e-8 | 1.000 | 100% | 98% | 99% | 2,696 | 4.6x |
| 8 | 3e-8 | 1.000 | 98% | 98% | 99% | 3,080 | 4.0x |
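The Bytes/vec and Compression columns can be reproduced as packed index bytes plus a fixed 8 bytes per vector, compared with 3072 × 4 = 12,288 bytes of float32. The 8-byte overhead is inferred from the numbers (it would fit, for example, two float32 per-vector values) and is not documented by the project; `bytes_per_vector` and `compression_ratio` are hypothetical helpers:

```python
import math

def bytes_per_vector(dim: int, bits: int, overhead: int = 8) -> int:
    """Packed index bytes plus an assumed fixed per-vector overhead."""
    return math.ceil(dim * bits / 8) + overhead

def compression_ratio(dim: int, bits: int, overhead: int = 8) -> float:
    """Size of the float32 vector divided by the quantized size."""
    return (dim * 4) / bytes_per_vector(dim, bits, overhead)

# Prints rows matching the Bytes/vec and Compression columns above
for bits in range(3, 9):
    print(bits, bytes_per_vector(3072, bits), f"{compression_ratio(3072, bits):.1f}x")
```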
Compatibility
- Python 3.9+, NumPy only (no SciPy, no GPU)
- Any embedding dimension ≥ 128
- Any embedding model (Gemini, OpenAI, Cohere, sentence-transformers, etc.)
- SQLite / sqlite-vec `vec0` tables (auto-detected)
References