OpenSearch Vector Search Expert

GitHub: norrishuang/opensearch-vector-search-skill
Issues, PRs, and new reference contributions are welcome!

Safety Notes

- Pricing script (scripts/get_opensearch_pricing.py): Makes outbound HTTPS requests to the AWS Pricing API (pricing.us-east-1.amazonaws.com). Requires boto3 and valid AWS credentials. The script is read-only (it fetches public pricing data) and does not modify any AWS resources. Only run it when the user explicitly requests cost estimation.
- Reference examples: Code snippets in references/ contain example API calls to localhost:9200 (the standard OpenSearch endpoint). These are documentation examples only. Do NOT execute them automatically; present them to the user as configuration references.
- Cluster analyzer (scripts/analyze_cluster.py): Connects to a user-provided OpenSearch cluster and performs read-only analysis. It NEVER creates, modifies, or deletes any indices or data. Only run it when the user explicitly provides cluster credentials (URL + username/password).

Knowledge Base Structure

Read the corresponding reference file based on the question type:

Question Type	Reference File	Keywords
Vector search, k-NN, HNSW, disk mode	references/vector-search.md	vector, knn, hnsw, warmup, disk mode, on_disk
Quantization techniques

Core Workflows

1. Answering Vector Search Configuration Questions

1. Read references/vector-search.md
2. Recommend in-memory mode or disk mode based on the user's scenario (latency requirements, data scale, QPS)
3. Provide a specific mapping JSON configuration
4. Recommend FAISS engine + cosine similarity + 7/8 series instances
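
As an illustration of step 3, here is a minimal sketch of an in-memory k-NN index body built from the defaults this document recommends (FAISS, cosine, ef_construction=512, m=16). The field name `embedding`, the dimension 768, and the helper name `build_knn_index_body` are hypothetical. Whether the `faiss` engine accepts the `cosinesimil` space type depends on the OpenSearch version; on versions that do not, inner product over normalized vectors is the usual substitute. Treat this as a configuration reference to show the user, not something to execute against a cluster.

```python
import json


def build_knn_index_body(dimension, field_name="embedding",
                         ef_construction=512, m=16):
    """Return an index-creation body for an HNSW k-NN field (illustrative only)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field_name: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        # Assumption: the target OpenSearch version supports
                        # cosine similarity with the faiss engine.
                        "space_type": "cosinesimil",
                        "parameters": {
                            "ef_construction": ef_construction,
                            "m": m,
                        },
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON so it can be shown to the user as a reference.
    print(json.dumps(build_knn_index_body(768), indent=2))
```

Disk mode is not shown here; its mapping options are covered in references/vector-search.md.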

2. Capacity Planning & Instance Sizing (Most Common Scenario)

After user provides vector count and dimensions:

1. Read references/cost-optimization.md for memory calculation formulas and examples
2. Calculate using the standard HNSW memory formula (source: AWS official blog):

   Unquantized (float32):
     Memory = 1.1 × (4 × d + 8 × m) × num_vectors × (replicas + 1) bytes
   
   Quantized (FAISS engine, compressed vectors in memory):
     FP16 (2x):    Memory = 1.1 × (2 × d + 8 × m) × num_vectors × (replicas + 1)
     Byte (4x):    Memory = 1.1 × (1 × d + 8 × m) × num_vectors × (replicas + 1)
     Binary 4-bit: Memory = 1.1 × (d/2 + 8 × m) × num_vectors × (replicas + 1)
     Binary 2-bit: Memory = 1.1 × (d/4 + 8 × m) × num_vectors × (replicas + 1)
     Binary 1-bit: Memory = 1.1 × (d/8 + 8 × m) × num_vectors × (replicas + 1)
   
   Where: d=vector dimensions, m=HNSW connections (default 16), num_vectors=total vector count

3. Apply OpenSearch node memory allocation rules:

   JVM Heap = min(node_memory × 50%, 32GB)
   Remaining memory = node_memory - JVM Heap
   KNN available memory = remaining × knn.memory.circuit_breaker.limit  (e.g. with the limit at 70%, ~35% of node memory while the heap is 50% of the node)

4. Select instance type, ensuring total cluster KNN available memory > vector index memory requirement
5. Run pricing script for real-time pricing (see below)
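
The sizing steps above can be sketched in Python as follows. This is a rough calculator under stated assumptions, not the skill's own implementation: the function names are hypothetical, "GB" in the rules is treated as GiB, and the fraction of off-heap memory available to k-NN is passed in explicitly (the example uses the 70% circuit-breaker value mentioned above).

```python
import math

GIB = 1024 ** 3

# Bytes stored per vector dimension for each option in the formulas above.
BYTES_PER_DIM = {
    "float32": 4.0,
    "fp16": 2.0,
    "byte": 1.0,
    "binary_4bit": 0.5,
    "binary_2bit": 0.25,
    "binary_1bit": 0.125,
}


def hnsw_memory_bytes(num_vectors, d, m=16, replicas=1, quantization="float32"):
    """Memory = 1.1 x (bytes_per_dim x d + 8 x m) x num_vectors x (replicas + 1)."""
    per_vector = BYTES_PER_DIM[quantization] * d + 8 * m
    return 1.1 * per_vector * num_vectors * (replicas + 1)


def knn_memory_per_node_gib(node_memory_gib, knn_fraction):
    """Apply the node rules: heap = min(50% of node, 32), k-NN gets a share of the rest."""
    jvm_heap = min(node_memory_gib * 0.5, 32)
    remaining = node_memory_gib - jvm_heap
    return remaining * knn_fraction


def nodes_needed(required_bytes, node_memory_gib, knn_fraction):
    """Smallest node count whose combined k-NN memory covers the requirement."""
    per_node_bytes = knn_memory_per_node_gib(node_memory_gib, knn_fraction) * GIB
    return math.ceil(required_bytes / per_node_bytes)


if __name__ == "__main__":
    # Hypothetical workload: 10M vectors, 768 dimensions, 1 replica, 64 GiB nodes.
    for option in ("float32", "fp16", "byte", "binary_1bit"):
        required = hnsw_memory_bytes(10_000_000, 768, quantization=option)
        count = nodes_needed(required, node_memory_gib=64, knn_fraction=0.70)
        print(f"{option:12s} {required / GIB:8.1f} GiB -> {count} node(s)")
```

The node count is a lower bound from memory alone; shard layout, storage, and headroom for growth still need to be considered when choosing the final cluster size.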

3. Cost Estimation (with Real-Time Pricing)

When user needs cost estimation:

1. Complete capacity planning above
2. Run pricing script for real-time prices:

   python3 scripts/get_opensearch_pricing.py --region <region> --instance-type <type>

3. Calculate monthly cost:

   Instance cost = unit_price × node_count × (1 + replica_count)
   EBS cost = capacity(GB) × $0.08 + additional IOPS charges
   Total cost = Instance cost + EBS cost

4. Compare cost differences across quantization options
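
The monthly cost formulas above can be written as a small helper. This is a sketch only: `monthly_cost` is a hypothetical name, `unit_price` is assumed to be the monthly price per node as returned by the pricing script, and the prices in the example are made-up placeholders rather than real AWS prices.

```python
def monthly_cost(unit_price, node_count, replica_count, ebs_gb,
                 ebs_price_per_gb=0.08, iops_charges=0.0):
    """Apply the formulas above and return the cost breakdown in USD per month."""
    # Instance cost = unit_price x node_count x (1 + replica_count)
    instance_cost = unit_price * node_count * (1 + replica_count)
    # EBS cost = capacity(GB) x $0.08 + additional IOPS charges
    ebs_cost = ebs_gb * ebs_price_per_gb + iops_charges
    return {
        "instance": instance_cost,
        "ebs": ebs_cost,
        "total": instance_cost + ebs_cost,
    }


if __name__ == "__main__":
    # Placeholder inputs; substitute values from the pricing script output.
    breakdown = monthly_cost(unit_price=500.0, node_count=3,
                             replica_count=1, ebs_gb=1000)
    for name, value in breakdown.items():
        print(f"{name:9s} ${value:,.2f}")
```

Running the helper once per quantization option, with the node count each option requires, gives the side-by-side comparison asked for in step 4.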

4. Live Cluster Analysis (When User Provides Cluster Credentials)

When the user provides an OpenSearch cluster URL and credentials, use the cluster analyzer to
connect and review their vector search configuration. This is read-only — never modify the cluster.

Prerequisites: User must explicitly provide:

- Cluster URL (e.g., https://my-cluster.us-east-1.es.amazonaws.com)
- Username and password (basic auth), OR --no-auth for clusters without authentication

Workflow:

1. Ask for credentials if not provided: URL, username, password
2. Run cluster overview to get health, nodes, and k-NN index list:

   python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> --action cluster-overview -f pretty

3. Analyze specific index if user specifies one, or pick the most important k-NN index:

   python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> --action index-detail --index <index_name> -f pretty

4. Analyze shard distribution for the target index:

   python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> --action shard-analysis --index <index_name> -f pretty

5. Run all analyses at once (for a comprehensive report):

   python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> --action all --index <index_name> -f pretty

6. Interpret the JSON output and present findings to the user:

- Cluster health status and node resource utilization
- Vector field configurations (engine, dimensions, HNSW params, quantization)
- Memory estimates vs actual cluster capacity
- Auto-generated recommendations (from the script)

7. Provide actionable advice based on findings:

- Suggest better engine/quantization if needed (provide example mapping JSON)
- Suggest instance resizing if memory is over/under-provisioned
- Suggest shard rebalancing if distribution is uneven
- NEVER execute write operations; only provide example configurations for the user to apply
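
For steps 2 to 5 of the workflow, the analyzer invocation can be assembled programmatically. The sketch below only builds the argument list from the flags shown in the commands above; it does not run anything. `build_analyzer_command` is a hypothetical helper, and the check that index-detail and shard-analysis need an index is an assumption about how the script validates its arguments.

```python
import shlex

ACTIONS = {"cluster-overview", "index-detail", "shard-analysis", "all"}
# Assumption: these actions cannot run without a target index.
INDEX_REQUIRED = {"index-detail", "shard-analysis"}


def build_analyzer_command(url, action="cluster-overview", user=None,
                           password=None, index=None, no_auth=False):
    """Return the argv list for scripts/analyze_cluster.py (read-only analysis)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action in INDEX_REQUIRED and not index:
        raise ValueError(f"--index is required for {action}")

    cmd = ["python3", "scripts/analyze_cluster.py", "--url", url]
    if no_auth:
        cmd.append("--no-auth")
    else:
        if not user or not password:
            raise ValueError("username and password are required unless no_auth=True")
        cmd += ["-u", user, "-p", password]
    cmd += ["--action", action]
    if index:
        cmd += ["--index", index]
    cmd += ["-f", "pretty"]
    return cmd


if __name__ == "__main__":
    # Placeholder values; only use credentials the user has explicitly provided.
    argv = build_analyzer_command("https://my-cluster.us-east-1.es.amazonaws.com",
                                  action="index-detail", user="<user>",
                                  password="<pass>", index="<index_name>")
    print(shlex.join(argv))
```

Building a list rather than a single shell string keeps user-supplied values from being reinterpreted by the shell if the list is later passed to `subprocess.run`.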

Cluster Analyzer Script Reference:
CODEBLOCK8

Safety constraints for live cluster analysis:

- The script is strictly read-only (uses only GET/CAT APIs)
- NEVER create, update, or delete indices on the user's cluster
- NEVER change cluster settings or mappings
- Only provide example JSON configurations for the user to review and apply themselves
- If the user asks to apply changes, provide the exact API calls/JSON but let the user execute them

Pricing Script Usage

CODEBLOCK9

Output fields: instance_type, vcpu, memory_gib, price_per_hour_usd, price_per_month_usd, network

Recommended Defaults

Always recommend these defaults unless user has specific requirements:

- Engine: FAISS
- Similarity: cosine
- Instance family (Gen 7+ only, never recommend older generations):
  - Vector search (k-NN): r7g/r8g/r8gd (memory-optimized, lowest search latency; r8g Graviton4 ~30% faster than r7g)
  - Indexing-heavy + vector: OR2 (optimized, S3 durability, good memory-to-price ratio)
  - Indexing-heavy (no vector): OM2 (highest indexing throughput, 15% faster than OR1)
  - Large dataset with NVMe: OI2 (storage-optimized, no EBS needed)
  - Do NOT recommend: r6g, r5, m5, c5, i3, or any older instance families
- HNSW parameters: ef_construction=512, m=16
- Quantization preference: Byte (4x) for production, Binary (32x) for aggressive cost optimization
- Disk mode threshold: Consider when data > 50M vectors and 100-200ms latency is acceptable
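
The defaults above can be encoded as a small lookup, sketched below. It is not the decision tree from this document: the function names are hypothetical, the order in which the workload flags are checked is an assumption, and "100-200ms latency is acceptable" is interpreted as a latency budget of at least 100 ms.

```python
def recommend_instance_family(vector_search, indexing_heavy=False,
                              large_dataset_nvme=False):
    """Map a workload description to one of the instance families listed above."""
    if large_dataset_nvme:
        return "OI2"
    if vector_search and indexing_heavy:
        return "OR2"
    if vector_search:
        return "r7g/r8g/r8gd"
    if indexing_heavy:
        return "OM2"
    raise ValueError("no recommendation is listed for this workload")


def recommend_defaults(num_vectors, acceptable_latency_ms, aggressive_cost=False):
    """Return the recommended defaults for a vector workload as a dict."""
    return {
        "engine": "faiss",
        "similarity": "cosine",
        "hnsw": {"ef_construction": 512, "m": 16},
        "quantization": "binary (32x)" if aggressive_cost else "byte (4x)",
        # Disk mode is only a suggestion to evaluate, not an automatic choice.
        "consider_disk_mode": num_vectors > 50_000_000
        and acceptable_latency_ms >= 100,
    }


if __name__ == "__main__":
    print(recommend_instance_family(vector_search=True))
    print(recommend_defaults(num_vectors=80_000_000, acceptable_latency_ms=150))
```

Any specific requirement stated by the user overrides these values.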

Instance Selection Decision Tree

CODEBLOCK10

Response Template

Organize cost/sizing answers in this structure:

1. Requirements confirmation: Vector count, dimensions, QPS, latency requirements
2. Memory calculation: Raw size → quantized size → required KNN memory
3. Cluster configuration: Instance type × count, shards, replicas
4. Cost estimation: Instance cost + EBS cost = monthly total
5. Optimization suggestions: Quantization comparison, Reserved Instance discounts

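Cluster analyzer script reference:

Usage:
python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> [options]

Actions:
--action cluster-overview    Cluster health, nodes, k-NN stats, and a summary of all k-NN indices (default)
--action index-detail        In-depth analysis of a specific index's vector configuration plus a memory estimate
--action shard-analysis      Shard distribution and sizes for a specific index
--action all                 Run all analyses

Options:
--index            Target index (required for index-detail and shard-analysis)
--no-auth          Connect without authentication
--verify-ssl       Verify SSL certificates (default: skip)
--format pretty    Human-readable JSON output

Output: JSON with the following top-level keys:
- cluster_overview: health status, version, nodes (memory/CPU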
