Tree-Graph Hybrid RAG
This skill teaches Claude how to build the database layer of a Tree-Graph Hybrid RAG system. It focuses on the integration seam between PageIndex-style tree output and LightRAG-style graph extraction, both stored in PostgreSQL.
Core Philosophy
- Tree (Macro): Represents the document's native hierarchy. Gives the LLM the structural skeleton (Chapter -> Section).
- Graph (Micro): Represents Entities and Relationships. Gives the LLM cross-document, fine-grained factual connections.
- Fusion: Every node and edge in the Graph is anchored to a specific node_id in the Tree, enabling bidirectional traversal (from graph detail to tree context, or from tree context to graph detail).
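The anchoring idea can be illustrated with a small in-memory sketch. The class and function names below (TreeNode, Entity, tree_context, graph_details) are hypothetical and exist only for illustration; the real system keeps these records in PostgreSQL tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TreeNode:
    node_id: str
    parent_id: Optional[str]  # None marks the document root
    title: str


@dataclass(frozen=True)
class Entity:
    name: str
    node_id: str  # anchor: the tree node this entity was extracted from


def tree_context(entity: Entity, nodes: Dict[str, TreeNode]) -> List[str]:
    """Graph -> Tree: walk parent_id links from the anchor up to the root."""
    titles: List[str] = []
    current = nodes.get(entity.node_id)
    while current is not None:
        titles.append(current.title)
        current = nodes.get(current.parent_id) if current.parent_id else None
    return titles[::-1]  # root first


def graph_details(node_id: str, entities: List[Entity]) -> List[str]:
    """Tree -> Graph: list the entities anchored to one tree node."""
    return [e.name for e in entities if e.node_id == node_id]


# Illustrative data only.
nodes = {
    "n1": TreeNode("n1", None, "Annual Report"),
    "n2": TreeNode("n2", "n1", "Chapter 3: Financials"),
}
entities = [Entity("Revenue", "n2"), Entity("EBITDA", "n2"), Entity("Acme", "n1")]

print(tree_context(entities[0], nodes))  # ['Annual Report', 'Chapter 3: Financials']
print(graph_details("n2", entities))     # ['Revenue', 'EBITDA']
```

Because the anchor is a plain node_id, both directions reduce to a key lookup, which is what makes the same traversal cheap to express as SQL joins later.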
Bundled Resources
This skill includes the minimum resources needed to teach Claude the database design and data flow:
- schema.sql: The complete PostgreSQL table definitions required for this architecture.
- ingestioncore.py: Python script demonstrating how to flatten the Tree JSON into Postgres and how to extract graph entities anchored to the tree.
- retrievalcore.py: Python script demonstrating the hybrid retrieval logic (querying the Graph to find Tree node_ids, then extracting the macro context).
- smoketest.py: Minimal no-database smoke test that validates the ingestion and retrieval flow with a fake pool.
- integration-pattern.md: Explains what this skill covers, what it intentionally does not reimplement, and where it should sit in a real service.
- queries.md: Common SQL patterns for loading skeletons, anchoring graph hits, and assembling answer context.
Standard Workflows
1. Indexing Workflow
1. Tree Extraction: Extract headers/TOC. Save the skeleton to nodes and the text to node_contents.
2. Graph Extraction: Pass each node_contents row to an LLM to extract entities and relations.
3. Anchoring: Save entities and relations with their corresponding node_id as a foreign key.
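The three indexing steps above might be sketched as follows. This is a minimal illustration, not the bundled ingestioncore.py: the input field names (node_id, title, summary, text, nodes) are assumptions about a PageIndex-style tree, and toy_extract is a hypothetical stand-in for the LLM extraction call. The functions return plain row dictionaries that a caller would then insert into PostgreSQL.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def flatten_tree(tree: Dict[str, Any], workspace: str) -> Tuple[List[Row], List[Row]]:
    """Split a nested tree into skeleton rows (nodes) and text rows (node_contents)."""
    node_rows: List[Row] = []
    content_rows: List[Row] = []
    stack: List[Tuple[Dict[str, Any], Optional[str]]] = [(tree, None)]
    while stack:
        node, parent_id = stack.pop()
        node_rows.append({
            "workspace": workspace,
            "node_id": node["node_id"],
            "parent_id": parent_id,
            "title": node["title"],
            "summary": node.get("summary"),
        })
        if node.get("text"):  # full text lives only in node_contents
            content_rows.append({
                "workspace": workspace,
                "node_id": node["node_id"],
                "content": node["text"],
            })
        # Reverse so children are popped in document order (pre-order walk).
        for child in reversed(node.get("nodes", [])):
            stack.append((child, node["node_id"]))
    return node_rows, content_rows


def anchor_entities(content_rows: List[Row],
                    extract: Callable[[str], List[str]]) -> List[Row]:
    """Run the extractor per node and stamp each entity with that node's node_id."""
    return [
        {"workspace": row["workspace"], "entity_name": name, "node_id": row["node_id"]}
        for row in content_rows
        for name in extract(row["content"])
    ]


def toy_extract(text: str) -> List[str]:
    """Stand-in for the LLM call: treats capitalized words as entities."""
    return [w.strip(".,") for w in text.split() if w[:1].isupper()]


# Illustrative input only.
tree = {
    "node_id": "0001", "title": "Annual Report", "summary": "Yearly overview",
    "nodes": [{"node_id": "0002", "title": "Chapter 3: Financials",
               "text": "Acme acquired Globex in 2023.", "nodes": []}],
}
node_rows, content_rows = flatten_tree(tree, "ws1")
entity_rows = anchor_entities(content_rows, toy_extract)
print([e["entity_name"] for e in entity_rows])  # ['Acme', 'Globex']
```

Keeping the extractor as a parameter means an existing LightRAG-style extraction pipeline can be passed in unchanged; only the node_id stamping is specific to this design.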
2. Retrieval Workflow
1. Entity/Relation Search: Extract keywords from the user query. Search the entities and relationships tables to find matching factual details.
2. Anchor Resolution: Get the node_ids associated with the matched graph elements.
3. Contextualization (Tree Traversal): Query the nodes table using the node_ids. Traverse up (parent_id) to gather the section titles and summaries.
4. Content Fetch: Retrieve the full text from node_contents only for the required nodes.
5. Synthesis: Feed the LLM a prompt containing:
   - Found Entities & Relations
   - Tree Context (e.g., "This was mentioned in Chapter 3: Financials")
   - Raw Text Chunks
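The upward traversal in step 3 can be done in a single query with a recursive CTE. The sketch below is one possible shape, assuming a nodes table with workspace, node_id, parent_id, and title columns. It runs against SQLite's in-memory engine purely so the example works without a server; WITH RECURSIVE is also supported by PostgreSQL, but the ? placeholders would become %s with psycopg or $1, $2, $3 with asyncpg.

```python
import sqlite3

# Walk from one anchored node up to the root, counting hops along the way.
ANCESTOR_SQL = """
WITH RECURSIVE chain(node_id, parent_id, title, hops) AS (
    SELECT node_id, parent_id, title, 0
    FROM nodes
    WHERE workspace = ? AND node_id = ?
    UNION ALL
    SELECT n.node_id, n.parent_id, n.title, chain.hops + 1
    FROM nodes AS n
    JOIN chain ON n.node_id = chain.parent_id
    WHERE n.workspace = ?
)
SELECT title FROM chain ORDER BY hops DESC
"""


def breadcrumb(conn: sqlite3.Connection, workspace: str, node_id: str) -> str:
    """Return the root-to-node title path for one anchored node."""
    rows = conn.execute(ANCESTOR_SQL, (workspace, node_id, workspace)).fetchall()
    return " > ".join(title for (title,) in rows)


# Illustrative data: two tenants that happen to reuse the node_id "n1".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (workspace TEXT, node_id TEXT, parent_id TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [
        ("ws1", "n1", None, "Annual Report"),
        ("ws1", "n2", "n1", "Chapter 3: Financials"),
        ("ws1", "n3", "n2", "3.2 Revenue"),
        ("ws2", "n1", None, "Other tenant's document"),
    ],
)

print(breadcrumb(conn, "ws1", "n3"))
# Annual Report > Chapter 3: Financials > 3.2 Revenue
```

The workspace filter appears in both the anchor and the recursive member, so a node_id reused by another tenant never leaks into the path.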
Output Expectations
When this skill is triggered, prefer producing:
1. PostgreSQL DDL or migration SQL
2. Tree-flattening ingestion code
3. Graph anchoring logic tied to node_id
4. Retrieval SQL that starts from graph hits and resolves back to tree context
5. A clear explanation of why this database design is preferable to storing one giant nested JSON blob
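As a rough idea of what the DDL item could look like, here is a minimal four-table sketch. The bundled schema.sql remains the reference; every column name below is an assumption, and the types are kept to plain TEXT so the same statements can be smoke-checked in SQLite from Python. A real PostgreSQL schema would likely add indexes and richer column types.

```python
import sqlite3

# Hypothetical column names; the bundled schema.sql is the reference.
SCHEMA_DDL = """
CREATE TABLE nodes (
    workspace TEXT NOT NULL,
    node_id   TEXT NOT NULL,
    parent_id TEXT,
    title     TEXT NOT NULL,
    summary   TEXT,
    PRIMARY KEY (workspace, node_id),
    FOREIGN KEY (workspace, parent_id) REFERENCES nodes (workspace, node_id)
);
CREATE TABLE node_contents (
    workspace TEXT NOT NULL,
    node_id   TEXT NOT NULL,
    content   TEXT NOT NULL,
    PRIMARY KEY (workspace, node_id),
    FOREIGN KEY (workspace, node_id) REFERENCES nodes (workspace, node_id)
);
CREATE TABLE entities (
    workspace   TEXT NOT NULL,
    entity_name TEXT NOT NULL,
    entity_type TEXT,
    node_id     TEXT NOT NULL,
    PRIMARY KEY (workspace, entity_name, node_id),
    FOREIGN KEY (workspace, node_id) REFERENCES nodes (workspace, node_id)
);
CREATE TABLE relationships (
    workspace     TEXT NOT NULL,
    source_entity TEXT NOT NULL,
    target_entity TEXT NOT NULL,
    description   TEXT,
    node_id       TEXT NOT NULL,
    PRIMARY KEY (workspace, source_entity, target_entity, node_id),
    FOREIGN KEY (workspace, node_id) REFERENCES nodes (workspace, node_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four tables on the given connection."""
    # SQLite needs this pragma; PostgreSQL enforces foreign keys by default.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA_DDL)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Putting workspace into every primary and foreign key is one way to make the anchor itself tenant-scoped, so an entity cannot point at a node in another workspace.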
Developer Guidelines
- Always enforce bone-meat separation: Never store massive text chunks in the nodes or entities tables.
- Always maintain multi-tenancy: Ensure every query filters by workspace.
- When users ask to implement a retrieval function, write SQL queries that join relationships -> nodes -> node_contents, so each graph hit is returned together with its tree context and source text.
- Do not build a full product scaffold inside the skill. Keep the focus on database design, ingestion, anchoring, and retrieval patterns.
- Do not rewrite PageIndex or LightRAG in full inside the skill. Reuse their existing pipelines and apply this skill at the integration seam.
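The join guideline and the workspace rule can be combined in one query. The sketch below is a hedged example rather than the bundled retrievalcore.py: table and column names are assumptions, and SQLite stands in for PostgreSQL only so the example is runnable as-is. With a PostgreSQL driver the ? placeholders would be written as %s (psycopg) or $1, $2, $3 (asyncpg).

```python
import sqlite3
from typing import List, Tuple

# relationships -> nodes -> node_contents, scoped to a single workspace.
HYBRID_SQL = """
SELECT r.source_entity, r.target_entity, r.description, n.title, c.content
FROM relationships AS r
JOIN nodes AS n
  ON n.workspace = r.workspace AND n.node_id = r.node_id
JOIN node_contents AS c
  ON c.workspace = n.workspace AND c.node_id = n.node_id
WHERE r.workspace = ?
  AND (r.source_entity = ? OR r.target_entity = ?)
"""


def hybrid_lookup(conn: sqlite3.Connection, workspace: str, entity: str) -> List[Tuple]:
    """Return each matching relation with its section title and raw text."""
    return conn.execute(HYBRID_SQL, (workspace, entity, entity)).fetchall()


# Illustrative data: the entity "Acme" exists in two separate workspaces.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (workspace TEXT, node_id TEXT, title TEXT);
CREATE TABLE node_contents (workspace TEXT, node_id TEXT, content TEXT);
CREATE TABLE relationships (workspace TEXT, source_entity TEXT,
                            target_entity TEXT, description TEXT, node_id TEXT);
INSERT INTO nodes VALUES ('ws1', 'n2', 'Chapter 3: Financials');
INSERT INTO nodes VALUES ('ws2', 'n2', 'Chapter 1: Overview');
INSERT INTO node_contents VALUES ('ws1', 'n2', 'Acme acquired Globex in 2023.');
INSERT INTO node_contents VALUES ('ws2', 'n2', 'Acme is a supplier.');
INSERT INTO relationships VALUES ('ws1', 'Acme', 'Globex', 'acquired', 'n2');
INSERT INTO relationships VALUES ('ws2', 'Acme', 'Initech', 'supplies', 'n2');
""")

for row in hybrid_lookup(conn, "ws1", "Acme"):
    print(row)
# ('Acme', 'Globex', 'acquired', 'Chapter 3: Financials', 'Acme acquired Globex in 2023.')
```

The text column is read only from node_contents, which keeps the nodes and relationships rows small, and the workspace predicate is repeated in each join condition so rows from another tenant cannot match even when node_ids collide.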