TurboQuant Memory

Compress embedding vectors 5-8x with TurboQuant (Google, ICLR 2026) while keeping top-1 recall (R@1) at about 98% for the default 5-bit setting.

Quick Start

1. Run tests

```bash
python3 scripts/turboquant.py
```

15 built-in tests: FWHT correctness, MSE distortion, IP correlation, recall, compression ratio, determinism.

2. Validate on your data

```bash
python3 scripts/validate.py --db /path/to/memory.sqlite --auto-detect --bits 5
```

Auto-detects sqlite-vec vec0 tables, analyzes distribution, reports quantization quality and recall.
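As a rough illustration of what auto-detection could involve, the sketch below scans `sqlite_master` for `CREATE VIRTUAL TABLE ... USING vec0(...)` statements and reads the declared `float[N]` dimension. The helper names `parse_vec0_dim` and `find_vec0_tables` are hypothetical; how `scripts/validate.py` actually detects tables is not shown in this README.

```python
import re
import sqlite3

# Matches e.g. "CREATE VIRTUAL TABLE t USING vec0(embedding float[3072])"
_VEC0_RE = re.compile(r"USING\s+vec0\s*\(", re.IGNORECASE)
_DIM_RE = re.compile(r"float\s*\[\s*(\d+)\s*\]", re.IGNORECASE)


def parse_vec0_dim(create_sql):
    """Return the declared float[N] dimension of a vec0 table, or None."""
    if not create_sql or not _VEC0_RE.search(create_sql):
        return None
    match = _DIM_RE.search(create_sql)
    return int(match.group(1)) if match else None


def find_vec0_tables(conn):
    """List (table_name, dim) for each vec0 virtual table with a float[N] column."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    found = []
    for name, sql in rows:
        dim = parse_vec0_dim(sql)
        if dim is not None:
            found.append((name, dim))
    return found
```

Reading the schema text this way does not require the sqlite-vec extension to be loaded, which is convenient for a quick inspection; reading the vectors themselves would.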

3. Quantize a memory database

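```bash
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --benchmark
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --migrate
```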

4. Integrate into code

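```python
import numpy as np
from turboquant import TurboQuantMSE

# embedding_vector, query, and database are supplied by the caller.

# Initialize (deterministic: the same seed gives the same quantization)
tq = TurboQuantMSE(dim=3072, bits=5)

# Quantize for storage
stored = tq.quantize(embedding_vector)  # float32 → compressed format

# Reconstruct
reconstructed = tq.dequantize(stored)  # compressed format → float32

# Search: the query stays float32, the database is already quantized
q_rot = tq.rotation.apply(query)
for doc in database:
    score = doc["norm"] * doc["scale"] * np.dot(q_rot, tq.codebook[doc["indices"]])
```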

Recommended Configuration

| Preset | Mode | Bits | R@1 | Compression | Use Case |
| --- | --- | --- | --- | --- | --- |
| Default | MSE | 5 | 98% | 6.4x | Most memory/RAG search |
| Conservative | MSE | 6 | 98%+ | 5.3x | High-fidelity retrieval |
| Aggressive | MSE | 4 | 92% | 8.0x | Large-scale, storage-constrained |

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `dim` | auto-detect | Embedding dimension (768, 1536, 3072, etc.) |
| `bits` | | |

Algorithm

Blockwise Hadamard Rotation → Lloyd-Max Scalar Quantization

1. Split vector into power-of-2 blocks (e.g., 3072 = 3 × 1024)
2. Per block: random sign flip + Fast Walsh-Hadamard Transform (fully invertible)
3. Per-vector scale normalization
4. Lloyd-Max optimal scalar quantizer per coordinate (precomputed codebook for N(0,1))
5. Pack indices into compact bit representation
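The first two steps can be sketched with numpy alone. This is a minimal illustration, not the project's code: the function names are hypothetical, and choosing the block size as the largest power of two that divides the dimension is an assumption that happens to reproduce the 3072 = 3 × 1024 example.

```python
import numpy as np


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform of a length-2^k vector."""
    x = np.asarray(x, dtype=np.float64).copy()
    n = x.shape[0]
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly step: combine neighbouring sub-blocks of width h.
        x = x.reshape(-1, 2, h)
        a = x[:, 0, :] + x[:, 1, :]
        b = x[:, 0, :] - x[:, 1, :]
        x = np.stack([a, b], axis=1).reshape(n)
        h *= 2
    return x / np.sqrt(n)


def block_size(dim):
    """Largest power of two dividing dim (assumed block choice)."""
    return dim & -dim


def make_signs(dim, seed=0):
    """Seeded ±1 vector, so the rotation is reproducible."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.array([-1.0, 1.0]), size=dim)


def rotate(v, signs):
    """Sign flip followed by a per-block Hadamard transform."""
    b = block_size(len(v))
    blocks = (np.asarray(v, dtype=np.float64) * signs).reshape(-1, b)
    return np.concatenate([fwht(blk) for blk in blocks])


def unrotate(r, signs):
    """Inverse rotation: the orthonormal transform is its own inverse."""
    b = block_size(len(r))
    blocks = np.asarray(r, dtype=np.float64).reshape(-1, b)
    return np.concatenate([fwht(blk) for blk in blocks]) * signs
```

Because the normalized Hadamard matrix is orthogonal and symmetric, `unrotate(rotate(v, signs), signs)` recovers `v` up to floating-point error and the vector norm is preserved, which is what "fully invertible" refers to.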

Key properties:

- Data-oblivious: no training or calibration needed
- Fully invertible: zero information loss from rotation
- Near-optimal: within 2.7x of the Shannon information-theoretic lower bound
- Deterministic: same seed = same output
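The data-oblivious property follows from the codebook being fixed in advance for a standard normal distribution. As a hedged sketch of how such a codebook could be derived offline, the code below runs Lloyd's iteration with the closed-form Gaussian centroid; the function names, the initial grid, and the iteration count are illustrative choices, not taken from the project.

```python
import math
import numpy as np


def _pdf(x):
    """Standard normal density; evaluates to 0.0 at ±inf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max_gaussian(bits, iters=200):
    """Reconstruction levels of a 2^bits-level Lloyd-Max quantizer for N(0,1)."""
    levels = np.linspace(-3.0, 3.0, 2 ** bits)
    for _ in range(iters):
        # Nearest-neighbour condition: thresholds sit midway between levels.
        mids = (levels[:-1] + levels[1:]) / 2.0
        edges = np.concatenate(([-np.inf], mids, [np.inf]))
        # Centroid condition: E[X | a < X <= b] = (pdf(a) - pdf(b)) / (cdf(b) - cdf(a)).
        levels = np.array([
            (_pdf(float(a)) - _pdf(float(b))) / (_cdf(float(b)) - _cdf(float(a)))
            for a, b in zip(edges[:-1], edges[1:])
        ])
    return levels


def quantize(x, levels):
    """Map each coordinate to the index of its nearest level."""
    thresholds = (levels[:-1] + levels[1:]) / 2.0
    return np.searchsorted(thresholds, x)
```

For one bit this converges to the two levels ±sqrt(2/π), the conditional means of the two half-lines. Reconstruction is then a table lookup, `levels[indices]`.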

See references/algorithm.md for full details.

Benchmark (Gemini embedding-001, 3072-dim, 112 vectors)

| Bits | MSE | Cosine | R@1 | R@5 | R@10 | Bytes/vec | Compression |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 1.1e-5 | 0.982 | 88% | 90% | 91% | 1,160 | 10.6x |
| 4 | 3.2e-6 | 0.995 | 92% | 93% | 93% | 1,544 | 8.0x |
| 5 | 8.2e-7 | 0.999 | 98% | 96% | 96% | 1,928 | 6.4x |
| 6 | 2.2e-7 | 1.000 | 96% | 98% | 98% | 2,312 | 5.3x |
| 7 | 8e-8 | 1.000 | 100% | 98% | 99% | 2,696 | 4.6x |
| 8 | 3e-8 | 1.000 | 98% | 98% | 99% | 3,080 | 4.0x |
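The Bytes/vec and Compression columns can be reproduced with simple arithmetic. The sketch below assumes each stored vector is `dim × bits / 8` bytes of packed indices plus 8 bytes of per-vector overhead. The 8-byte figure is inferred from the table (it is consistent with two float32 values such as a norm and a scale), not stated in the source.

```python
def bytes_per_vector(dim, bits, overhead_bytes=8):
    """Packed index bytes (rounded up) plus assumed per-vector overhead."""
    packed = (dim * bits + 7) // 8
    return packed + overhead_bytes


def compression_ratio(dim, bits, overhead_bytes=8):
    """Ratio of float32 storage (4 bytes per coordinate) to quantized storage."""
    return (dim * 4) / bytes_per_vector(dim, bits, overhead_bytes)


for bits in range(3, 9):
    size = bytes_per_vector(3072, bits)
    ratio = round(compression_ratio(3072, bits), 1)
    print(bits, size, ratio)
```

For a 3072-dimensional vector the float32 baseline is 12,288 bytes, so 5 bits gives 1,920 + 8 = 1,928 bytes and a ratio of about 6.4.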

Compatibility

- Python 3.9+, numpy only (no scipy, no GPU)
- Any embedding dimension ≥ 128
- Any embedding model (Gemini, OpenAI, Cohere, sentence-transformers, etc.)
- SQLite / sqlite-vec vec0 tables (auto-detected)

References

- TurboQuant paper: arXiv:2504.19874 (ICLR 2026)
- PolarQuant paper: arXiv:2502.02617 (AISTATS 2026)

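TurboQuant Memory Compression

Compress embedding vectors 5-8x with TurboQuant (Google, ICLR 2026) while keeping top-1 recall (R@1) at about 98% for the default 5-bit setting.

Quick Start

1. Run tests

```bash
python3 scripts/turboquant.py
```

15 built-in tests: FWHT correctness, MSE distortion, inner-product correlation, recall, compression ratio, determinism.

2. Validate on your data

```bash
python3 scripts/validate.py --db /path/to/memory.sqlite --auto-detect --bits 5
```

Auto-detects sqlite-vec vec0 tables, analyzes the distribution, and reports quantization quality and recall.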
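To make the recall figure concrete, here is one plausible way to compute recall@k: each vector serves as a float query against the dequantized copies of all other vectors, and the score is the average overlap between the exact and approximate top-k sets. This definition and the name `recall_at_k` are assumptions; the validation script may measure recall differently.

```python
import numpy as np


def recall_at_k(original, reconstructed, k):
    """Mean overlap between exact and approximate top-k neighbour sets."""
    def normalize(m):
        m = np.asarray(m, dtype=np.float64)
        return m / np.linalg.norm(m, axis=1, keepdims=True)

    a, b = normalize(original), normalize(reconstructed)
    exact = a @ a.T    # float queries against float documents
    approx = a @ b.T   # float queries against dequantized documents
    # Leave-one-out: a vector must not retrieve its own entry.
    np.fill_diagonal(exact, -np.inf)
    np.fill_diagonal(approx, -np.inf)
    top_exact = np.argsort(-exact, axis=1)[:, :k]
    top_approx = np.argsort(-approx, axis=1)[:, :k]
    hits = [len(set(e) & set(p)) for e, p in zip(top_exact, top_approx)]
    return float(np.mean(hits)) / k
```

With `reconstructed` set to the output of dequantization, `recall_at_k(original, reconstructed, 1)` would correspond to an R@1-style number under this definition.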

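3. Quantize a memory database

```bash
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --benchmark
python3 scripts/memory_quantize.py --db /path/to/memory.db --bits 5 --migrate
```

4. Integrate into code

```python
import numpy as np
from turboquant import TurboQuantMSE

# embedding_vector, query, and database are supplied by the caller.

# Initialize (deterministic: the same seed gives the same quantization)
tq = TurboQuantMSE(dim=3072, bits=5)

# Quantize for storage
stored = tq.quantize(embedding_vector)  # float32 → compressed format

# Reconstruct
reconstructed = tq.dequantize(stored)  # compressed format → float32

# Search: the query stays float32, the database is already quantized
q_rot = tq.rotation.apply(query)
for doc in database:
    score = doc["norm"] * doc["scale"] * np.dot(q_rot, tq.codebook[doc["indices"]])
```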

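Recommended Configuration

| Preset | Mode | Bits | R@1 | Compression | Use Case |
| --- | --- | --- | --- | --- | --- |
| Default | MSE | 5 | 98% | 6.4x | Most memory/RAG search |
| Conservative | MSE | 6 | 98%+ | 5.3x | High-fidelity retrieval |
| Aggressive | MSE | 4 | 92% | 8.0x | Large-scale, storage-constrained |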

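Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `dim` | auto-detect | Embedding dimension (768, 1536, 3072, etc.) |
| `bits` | | |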

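Algorithm

Blockwise Hadamard Rotation → Lloyd-Max Scalar Quantization

1. Split vector into power-of-2 blocks (e.g., 3072 = 3 × 1024)
2. Per block: random sign flip + Fast Walsh-Hadamard Transform (fully invertible)
3. Per-vector scale normalization
4. Lloyd-Max optimal scalar quantizer per coordinate (precomputed codebook for N(0,1))
5. Pack indices into compact bit representation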
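The final packing step can be illustrated with numpy's bit utilities. This is a sketch under the assumption that indices are stored most-significant-bit first with zero padding in the last byte; the project's on-disk layout is not documented here, and the function names are hypothetical.

```python
import numpy as np


def pack_indices(indices, bits):
    """Pack codebook indices (each < 2**bits, bits <= 8) into a byte array."""
    idx = np.asarray(indices, dtype=np.uint8)
    # Expand each index to 8 bits (MSB first), then keep only the low `bits` bits.
    bit_matrix = np.unpackbits(idx[:, None], axis=1)[:, 8 - bits:]
    return np.packbits(bit_matrix.reshape(-1))  # zero-pads the final byte


def unpack_indices(packed, bits, count):
    """Recover `count` indices from a byte array produced by pack_indices."""
    flat = np.unpackbits(packed)[: count * bits].reshape(count, bits)
    weights = 1 << np.arange(bits - 1, -1, -1)
    return (flat * weights).sum(axis=1).astype(np.uint8)
```

For a 3072-dimensional vector at 5 bits this yields 3072 × 5 / 8 = 1,920 bytes of packed indices.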

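Key properties:

- Data-oblivious: no training or calibration needed
- Fully invertible: zero information loss from rotation
- Near-optimal: within 2.7x of the Shannon information-theoretic lower bound
- Deterministic: same seed = same output

See references/algorithm.md for full details.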

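Benchmark (Gemini embedding-001, 3072-dim, 112 vectors)

| Bits | MSE | Cosine | R@1 | R@5 | R@10 | Bytes/vec | Compression |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 1.1e-5 | 0.982 | 88% | 90% | 91% | 1,160 | 10.6x |
| 4 | 3.2e-6 | 0.995 | 92% | 93% | 93% | 1,544 | 8.0x |
| 5 | 8.2e-7 | 0.999 | 98% | 96% | 96% | 1,928 | 6.4x |
| 6 | 2.2e-7 | 1.000 | 96% | 98% | 98% | 2,312 | 5.3x |
| 7 | 8e-8 | 1.000 | 100% | 98% | 99% | 2,696 | 4.6x |
| 8 | 3e-8 | 1.000 | 98% | 98% | 99% | 3,080 | 4.0x |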

Compatibility

- Python 3.9+, numpy only (no scipy, no GPU)
- Any embedding dimension ≥ 128
- Any embedding model (Gemini, OpenAI, Cohere, sentence-transformers, etc.)
- SQLite / sqlite-vec vec0 tables (auto-detected)

References

- TurboQuant paper: arXiv:2504.19874 (ICLR 2026)
- PolarQuant paper: arXiv:2502.02617 (AISTATS 2026)
