Librarian - Semantic Research Skill
Version: 2.0.0 (Protocol-driven)
Status: 🚧 Development
Architecture: Sandwich (🎤 Skill → 👷 Wrapper → ⚙️ Python)
What This Skill Does
Search your book library using natural language. Ask questions like "What does Graeber say about debt?" and get precise citations with page numbers.
Protocol Flow
CODEBLOCK0
Status: ✅ All nodes ready (v0.15.0 complete)
Protocol Nodes:
1. Load Metadata: Reads .library-index.json + .topic-index.json files
2. Infer Scope: Confidence >75% → proceed | <75% → ask clarification
3. Build Command: python3 research.py "query" --topic topic_id
4. Format Output: Synthesized answer + emoji citations + sources
5. 🤚 Hard Stop: Honest failure > invented answer (VISION.md principle)
Sandwich Architecture:
Flow: 🎤 Skill → 👷 Sh → ⚙️ Py → 👷 Sh → 🎤 Skill
Why this pattern:
1. 🎤 Skill interprets user intent (conversational, flexible, handles ambiguity)
2. 👷 Sh builds correct command syntax (skill errs often, sh hardens protocol)
3. ⚙️ Py executes deterministic work (search, embeddings, JSON output)
4. 👷 Sh formats py output to structured syntax (protocol compliance)
5. 🎤 Skill presents to human (natural language, citations, formatting)
Symbols:
- 🎤 = Skill (you, AI conversational layer)
- 👷 = Wrapper (librarian.sh, protocol enforcement)
- ⚙️ = Python (research.py, heavy lifting)
- 🤚 = Hard stop (honest failure > invented answer)
🤚 Hard Stop Protocol (CRITICAL)
You are a messenger, not the system.
When wrapper returns error codes:
- ERROR_NO_METADATA → "No metadata. Run librarian index."
- ERROR_INVALID_SCOPE → "I didn't understand. Can you rephrase? (topic or book?)"
- ERROR_EXECUTION_FAILED → "System is broken."
- ERROR_NO_RESULTS → "I found nothing about [query]."
STOP THERE. Do NOT:
- ❌ Offer web search alternatives
- ❌ Suggest workarounds ("let's try X...")
- ❌ Hallucinate ("maybe the book says...")
- ❌ Apologize or frame as your failure
Hard stop = SUCCESS. You detected system state and reported honestly.
You didn't create the problem. You're just telling the truth:
- "The roof is leaking." ← Bad news, but not your fault.
- "There are no results." ← Reality, not failure.
Reporting hard stops IS your job done. ✅
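A minimal sketch of the hard stop rule as a lookup table: one wrapper error code in, one fixed message out, nothing appended. The messages are English renderings of the ones above. The dictionary, the function name, and the error code names other than ERROR_NO_METADATA are assumptions made for illustration, not the wrapper's actual interface.

```python
# Hypothetical lookup table: wrapper error code -> fixed user-facing message.
HARD_STOP_MESSAGES = {
    "ERROR_NO_METADATA": "No metadata. Run librarian index.",
    "ERROR_INVALID_SCOPE": "I didn't understand. Can you rephrase? (topic or book?)",
    "ERROR_EXECUTION_FAILED": "System is broken.",
    "ERROR_NO_RESULTS": "I found nothing about {query}.",
}


def hard_stop_message(error_code: str, query: str = "") -> str:
    """Return the fixed message for an error code, with nothing added to it."""
    template = HARD_STOP_MESSAGES.get(error_code)
    if template is None:
        # Assumption: an unknown code is reported as a broken system, not guessed at.
        template = HARD_STOP_MESSAGES["ERROR_EXECUTION_FAILED"]
    return template.format(query=query)
```

Keeping the messages in a table rather than composing them per request leaves no room for workarounds or apologies to slip into the reply.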
Metadata Structure (Subway Map)
How metadata is organized:
CODEBLOCK1
Navigation:
- Topic scope = 1 step (scan .library-index.json only)
- Book scope = 2 steps (.library-index.json → infer topics → scan .topic-index.json files)
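The two navigation paths can be sketched as follows. This is an illustration under assumptions: the top-level JSON keys ("topics", "books"), the helper names, and the demo library are hypothetical, and the sketch scans every topic folder instead of inferring likely topics first.

```python
import json
import tempfile
from pathlib import Path


def load_topics(root: Path) -> list:
    """Step 1 for both scopes: read the global index (topics only, no books)."""
    index = json.loads((root / ".library-index.json").read_text(encoding="utf-8"))
    return index["topics"]


def find_book(root: Path, title: str):
    """Book scope, step 2: read each topic's local index until the title is found."""
    for topic in load_topics(root):
        index_file = root / topic["path"] / ".topic-index.json"
        local = json.loads(index_file.read_text(encoding="utf-8"))
        for book in local["books"]:
            if book["title"].lower() == title.lower():
                return topic["id"], book
    return None


def build_demo_library(root: Path) -> None:
    """Write a tiny two-topic library so the sketch runs end to end."""
    topics = [
        {"id": "finance", "path": "finance"},
        {"id": "chaos-magick", "path": "chaos-magick"},
    ]
    (root / ".library-index.json").write_text(
        json.dumps({"topics": topics}), encoding="utf-8"
    )
    books = {
        "finance": [{"title": "Debt", "filename": "Debt.epub"}],
        "chaos-magick": [{"title": "I Ching", "filename": "I Ching.epub"}],
    }
    for topic in topics:
        folder = root / topic["path"]
        folder.mkdir()
        (folder / ".topic-index.json").write_text(
            json.dumps({"books": books[topic["id"]]}), encoding="utf-8"
        )


with tempfile.TemporaryDirectory() as tmp:
    library = Path(tmp)
    build_demo_library(library)
    topic_ids = [t["id"] for t in load_topics(library)]  # topic scope: one file read
    match = find_book(library, "i ching")                 # book scope: two levels
```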
🔴 CRITICAL: Extension Handling
User NEVER mentions file extensions.
Examples:
- ✅ User says: "I Ching hexagram"
- ✅ User says: "Condensed Chaos"
- ❌ User NEVER says: "I Ching.epub"
Why: Extension = metadata detail (epub vs pdf), irrelevant to user.
Your job:
1. Match query → book title (NO extension)
2. Pass filename to wrapper (WITH extension: "I Ching.epub")
3. Results show title only (NO extension in output)
Metadata fields:
- .library-index.json → topics list (big picture)
- .topic-index.json → books list per topic (narrow view)
- Book metadata: title (user-facing, no ext) + filename (internal, with ext)
Full taxonomy: See backstage/epic-notes/metadata-taxonomy.md
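A small sketch of the title/filename split, assuming book records that carry the title and filename fields described above. The record list and both helper names are hypothetical.

```python
from pathlib import PurePath

# Hypothetical book records with the two fields that matter here.
BOOKS = [
    {"title": "I Ching", "filename": "I Ching.epub"},
    {"title": "Condensed Chaos", "filename": "Condensed Chaos.epub"},
]


def filename_for_query(query: str, books: list):
    """Return the filename (WITH extension) of the first book whose title is in the query."""
    lowered = query.lower()
    for book in books:
        if book["title"].lower() in lowered:
            return book["filename"]
    return None


def display_title(filename: str) -> str:
    """Strip the extension before a result is shown to the user."""
    return PurePath(filename).stem
```

The user-facing side only ever sees titles; the extension appears solely in the value handed to the wrapper.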
How To Use This Skill
Trigger Detection
Activate when user query matches ANY of these patterns:
Book/Author references:
- "What does [AUTHOR] say about [TOPIC]?"
- "Search [BOOK] for [QUERY]"
- "Find references to [CONCEPT] in [BOOK]"
Topic keywords (with confidence >75%):
- "tarot", "I Ching", "divination" → chaos-magick
- "debt", "finance", "money", "banking" → finance
- "anarchism", "mutual aid", "commons" → anarchy
Explicit commands:
- "pesquisa [QUERY]" / "search [QUERY]"
- "procura [CONCEPT]" / "find [CONCEPT]"
- "librarian: [QUERY]"
If confidence <75% → CLARIFY (ask user)
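Explicit commands are the easiest trigger to detect mechanically. The sketch below assumes the prefixes listed above are matched case-insensitively at the start of the message; the function name and that matching rule are illustrative assumptions.

```python
# Prefixes taken from the explicit commands listed above.
COMMAND_PREFIXES = ("pesquisa ", "search ", "procura ", "find ", "librarian:")


def extract_explicit_query(message: str):
    """Return the text after an explicit command prefix, or None if there is none."""
    stripped = message.strip()
    lowered = stripped.lower()
    for prefix in COMMAND_PREFIXES:
        if lowered.startswith(prefix):
            return stripped[len(prefix):].strip()
    return None
```

A None result does not mean "do not activate"; it only means the message has to be checked against the book/author and topic keyword patterns instead.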
Node 2: 🎤 Infer Scope
Determine WHAT to search (topic or book) from user intent.
AI = router. Intelligence is in the index (embeddings). You just match query → scope.
Confidence Logic (Binary)
Read metadata (.library-index.json):
CODEBLOCK2
Fuzzy match query against metadata:
| Match book? | Match topic? | → Action |
|---|---|---|
| ✅ | ✅ | TOPIC (tiebreaker: future mixed searches) |
| ✅ | ❌ | BOOK |
| ❌ | ✅ | TOPIC |
| ❌ | ❌ | CLARIFY (hard stop) |
Match rules:
- Book: Query contains book title substring OR author name (case-insensitive)
- Topic: Query contains topic keyword (case-insensitive)
Examples
TOPIC wins (tiebreaker):
- "Graeber debt finance" → matches both "Debt.epub" + "finance" → TOPIC: finance
BOOK only:
- "Graeber hexagram 23" → matches "Debt.epub" only → BOOK: Debt.epub
- "I Ching moving lines" → matches "I Ching.epub" only → BOOK: I Ching.epub
TOPIC only:
- "chaos magick sigils" → matches "chaos-magick" only → TOPIC: chaos-magick
- "mutual aid commons" → matches "anarchy" only → TOPIC: anarchy
CLARIFY (no match):
- "philosophy" → no match → CLARIFY: "Search which topic or book?"
- "systems" → no match → CLARIFY: "Need more context - which area?"
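The decision table and match rules can be sketched as a single function. The sample metadata, the per-topic keyword lists, and the function name are assumptions for illustration; the real index may store these differently.

```python
def infer_scope(query, books, topic_keywords):
    """Apply the binary decision table and return (action, target)."""
    lowered = query.lower()

    # Book match: query contains a book title or an author name.
    book = next(
        (b for b in books
         if b["title"].lower() in lowered or b["author"].lower() in lowered),
        None,
    )
    # Topic match: query contains any keyword registered for a topic.
    topic = next(
        (topic_id for topic_id, words in topic_keywords.items()
         if any(word.lower() in lowered for word in words)),
        None,
    )

    if topic is not None:  # covers "topic only" and the both-match tiebreaker
        return ("TOPIC", topic)
    if book is not None:
        return ("BOOK", book["filename"])
    return ("CLARIFY", None)


# Hypothetical sample metadata, for illustration only.
BOOKS = [{"title": "Debt", "author": "Graeber", "filename": "Debt.epub"}]
TOPIC_KEYWORDS = {
    "finance": ["debt", "finance", "money", "banking"],
    "anarchy": ["anarchism", "mutual aid", "commons"],
    "chaos-magick": ["chaos magick", "sigils"],
}
```

With this sample data, "Graeber debt finance" resolves to the finance topic, "Graeber hexagram 23" to the book Debt.epub, and "philosophy" to CLARIFY, matching the examples above.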
Scope Types
1. Topic scope: INLINECODE19
   - Available topics: chaos-magick, finance, anarchy (check .topic-index.json)
2. Book scope: INLINECODE20
   - Requires exact filename (e.g., "Condensed Chaos.epub")
   - Use fuzzy matching: "Condensed" → "Condensed Chaos.epub"
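One way to turn a partial or misspelled title into the exact filename the wrapper needs is a substring check with a closest-match fallback. The sketch uses Python's standard difflib module; the function name and the 0.6 cutoff are assumptions, not the skill's actual matching logic.

```python
import difflib
from pathlib import PurePath


def fuzzy_filename(fragment, filenames, cutoff=0.6):
    """Resolve a partial or misspelled title to an exact filename, or None."""
    needle = fragment.strip().lower()
    if not needle:
        return None
    # Compare against titles without extensions, but remember the full filename.
    stems = {PurePath(name).stem.lower(): name for name in filenames}

    # Prefer a plain substring hit ("condensed" is inside "condensed chaos").
    for stem, name in stems.items():
        if needle in stem:
            return name

    # Otherwise fall back to closest-match scoring to tolerate typos.
    close = difflib.get_close_matches(needle, list(stems), n=1, cutoff=cutoff)
    return stems[close[0]] if close else None
```

A None result should lead to CLARIFY rather than a guess, in line with the hard stop rule.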
Node 3-5: 👷 Call Wrapper
Execute wrapper script with inferred scope:
CODEBLOCK3
Arguments:
- QUERY: User's search query (exact string)
- INLINECODE22: "topic" or "book"
- INLINECODE23: topic_id or book filename
- INLINECODE24: Number of results (default: 5)
Example calls:
CODEBLOCK4
Wrapper Exit Codes
The wrapper returns structured status via exit codes:
- 0: Success (JSON results on stdout)
- 1: ERROR_NO_METADATA (🤚 stop: tell user to run librarian index)
- 2: ERROR_BROKEN (🤚 stop: system issue, report to Nicholas)
- 3: ERROR_NO_RESULTS (🤚 stop: query returned 0 results)
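A hedged sketch of reading these exit codes from a caller. The enum and function names are assumptions, and a child Python process stands in for the real wrapper so the example runs anywhere.

```python
import subprocess
import sys
from enum import IntEnum


class WrapperExit(IntEnum):
    """Exit codes listed above; the enum itself is an illustrative assumption."""
    SUCCESS = 0
    ERROR_NO_METADATA = 1
    ERROR_BROKEN = 2
    ERROR_NO_RESULTS = 3


def classify(returncode: int) -> WrapperExit:
    """Map a raw exit status to a known code; anything unexpected counts as broken."""
    try:
        return WrapperExit(returncode)
    except ValueError:
        return WrapperExit.ERROR_BROKEN


def run_and_classify(command: list) -> WrapperExit:
    """Run a command and classify its exit status without parsing stdout."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return classify(completed.returncode)


# Stand-in for the real wrapper: a child Python process that exits with status 3.
fake_wrapper = [sys.executable, "-c", "import sys; sys.exit(3)"]
status = run_and_classify(fake_wrapper)
```

Treating unknown statuses as ERROR_BROKEN keeps the caller on the hard stop path instead of attempting to interpret output from a wrapper in an unknown state.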
Handle Each Error
Exit 1 (NO_METADATA):
CODEBLOCK5
Exit 2 (BROKEN):
CODEBLOCK6
Exit 3 (NO_RESULTS):
🤚 No results found for "[QUERY]"
Try:
- Broader terms (e.g., "debt" instead of "sovereign debt crisis")
- Different scope (search topic instead of single book?)
- Check spelling
Node 4: 🎤 Format Output
When wrapper returns success (exit 0), format the JSON results for the user.
JSON Structure
CODEBLOCK8
Formatting Rules
1. Synthesize answer (don't just list chunks)
- Read all results
- Extract key points
- Write coherent paragraph(s)
2. Cite sources with emojis
- 📕 = book citation
- Use location (page/paragraph) when available
- Format: INLINECODE26
3. Show similarity scores (optional, if useful)
- ⭐⭐⭐⭐⭐ (0.9-1.0) = Highly relevant
- ⭐⭐⭐⭐ (0.8-0.89) = Very relevant
- ⭐⭐⭐ (0.7-0.79) = Relevant
4. Keep original query context
- If user asked about debt, frame answer around debt
- Don't drift to tangential topics
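Rules 2 and 3 can be sketched as two small helpers. The citation layout is modeled on citations such as 📕 [Debt, p.40, ¶5]; the function names, and the choice to show no stars below 0.7, are assumptions.

```python
def stars(similarity: float) -> str:
    """Convert a similarity score into the star scale from rule 3."""
    if similarity >= 0.9:
        return "⭐" * 5
    if similarity >= 0.8:
        return "⭐" * 4
    if similarity >= 0.7:
        return "⭐" * 3
    return ""  # assumption: scores below 0.7 get no stars


def cite(title: str, page=None, paragraph=None) -> str:
    """Build a book citation, leaving out location parts that are unavailable."""
    parts = [title]
    if page is not None:
        parts.append(f"p.{page}")
    if paragraph is not None:
        parts.append(f"¶{paragraph}")
    return "📕 [" + ", ".join(parts) + "]"
```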
Example Output
User: "What does Graeber say about the origins of money?"
Librarian:
Graeber argues that money did NOT originate from barter (the myth Adam Smith popularized). Instead, credit and debt systems came first — people kept track of obligations long before coins existed. 📕 [Debt: The First 5000 Years, p.21, ¶2]
He traces debt back to ancient Mesopotamia (~3500 BCE), where temple administrators recorded loans in cuneiform tablets. Money as we know it (coins) only appeared around 600 BCE in Lydia. 📕 [Debt, p.40, ¶5]
Key insight: Debt is older than money. Markets emerged from moral obligations, not rational barter. 📕 [Debt, p.89, ¶1]
Sources:
- 📕 Debt: The First 5000 Years (David Graeber) - 3 passages
- Similarity: ⭐⭐⭐⭐⭐
Hard Stops (🤚 Honest Failures)
NEVER invent answers. If system fails, STOP and tell user exactly what's wrong.
When to Stop
1. Metadata missing → Tell user to run INLINECODE27
2. Low confidence (<75%) → Ask clarifying question
3. System broken → Report error, don't guess
4. No results → Say "no results", suggest alternatives
Why Hard Stops Matter
From VISION.md: "Honest incompetence > false competence"
A broken skill that TELLS you it's broken is more trustworthy than one that invents plausible-sounding nonsense.
Installation & Setup
Requirements
- Python 3.9+
- Dependencies: sentence-transformers, faiss-cpu, pypdf, INLINECODE31
Install
CODEBLOCK9
Index Your Library
CODEBLOCK10
Troubleshooting
"No metadata found"
- Run index_library.py first
- Check books/.topic-index.json exists
"No results" but book exists
- Check topic ID matches (e.g., "chaos-magick" not "chaos magick")
- Verify book is in correct topic folder
- Try broader query terms
"System broken"
- Check Python dependencies: INLINECODE34
- Verify research.py syntax: INLINECODE35
- Check FAISS index integrity
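As a complement to the checks above, the dependency step can also be done from Python without importing the heavy packages. This is a sketch, not the project's own check; the helper name is hypothetical, and the list covers only the three dependencies named in Requirements.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names for sentence-transformers, faiss-cpu, and pypdf respectively.
REQUIRED = ["sentence_transformers", "faiss", "pypdf"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

find_spec only locates a module; it does not load it, so the check stays fast even when the embedding libraries are installed.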
References
Architecture:
- Agentic Design Patterns (Andrew Ng, 2024) - Agentic workflows
- OpenClaw skill best practices - Protocol-driven skills
Sandwich pattern:
- 🎤 Skill = Conversational I/O (trigger, infer, format, respond)
- 👷 Wrapper = Protocol enforcement (validate, build, check)
- ⚙️ Python = Heavy lifting (embeddings, search, ranking)
Why this works:
- AI is good at: interpreting intent, formatting output, human communication
- AI is bad at: following syntax exactly, deterministic execution
- Wrapper hardens protocol: same query → same command → same behavior
Emoji Legend
- 🎤 = Skill (AI conversational layer)
- 👷 = Wrapper (shell script protocol)
- ⚙️ = Python (research engine)
- 🤚 = Hard stop (honest failure)
- 📕 = Book citation
- ⭐ = Relevance score
Last updated: 2026-02-20
Epic: v0.15.0 Skill as Protocol
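Librarian - Semantic Research Skill
Version: 2.0.0 (Protocol-driven)
Status: 🚧 Development
Architecture: Sandwich (🎤 Skill → 👷 Wrapper → ⚙️ Python)
What This Skill Does
Search your book library using natural language. Ask questions like "What does Graeber say about debt?" and get precise citations with page numbers.
Protocol Flow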
mermaid
flowchart TB
TRIGGER[🎤 Trigger + context]:::ready
TRIGGER --> METADATA[👷 Load metadata 1️⃣]:::ready
METADATA --> CHECK{👷 Metadata exists?}:::ready
CHECK -->|No| ERROR[🎤 🤚 No metadata found: run librarian index 5️⃣]:::ready
CHECK -->|Yes| INFER{🎤 Infer scope? 2️⃣}:::ready
INFER -->|confidence below 75%| CLARIFY[🎤 🤚 Say again? 5️⃣]:::ready
INFER -->|confidence above 75%| BUILD[👷 Build command 3️⃣]:::ready
BUILD --> CHECK_SYSTEM{⚙️ System working?}:::ready
CHECK_SYSTEM -->|No| BROKEN[🎤 🤚 System broken 5️⃣]:::ready
CHECK_SYSTEM -->|Yes| EXEC[⚙️ Run Python script with arguments]:::ready
EXEC --> JSON[⚙️ Return JSON]:::ready
JSON --> CHECK_RESULTS{👷 Results found?}:::ready
CHECK_RESULTS -->|No| EMPTY[🎤 🤚 No results found 5️⃣]:::ready
CHECK_RESULTS -->|Yes| FORMAT[🎤 Format output 4️⃣]:::ready
FORMAT --> RESPONSE[🎤 Librarian response]:::ready
classDef ready fill:#c8e6c9,stroke:#81c784,color:#2e7d32
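Status: ✅ All nodes ready (v0.15.0 complete)
Protocol Nodes:
1. Load Metadata: Reads .library-index.json + .topic-index.json files
2. Infer Scope: Confidence >75% → proceed | <75% → ask clarification
3. Build Command: python3 research.py "query" --topic topic_id
4. Format Output: Synthesized answer + emoji citations + sources
5. 🤚 Hard Stop: Honest failure > invented answer (VISION.md principle)
Sandwich Architecture:
Flow: 🎤 Skill → 👷 Sh → ⚙️ Py → 👷 Sh → 🎤 Skill
Why this pattern:
1. 🎤 Skill interprets user intent (conversational, flexible, handles ambiguity)
2. 👷 Sh builds correct command syntax (skill errs often, sh hardens protocol)
3. ⚙️ Py executes deterministic work (search, embeddings, JSON output)
4. 👷 Sh formats py output to structured syntax (protocol compliance)
5. 🎤 Skill presents to human (natural language, citations, formatting)
Symbols:
- 🎤 = Skill (you, AI conversational layer)
- 👷 = Wrapper (librarian.sh, protocol enforcement)
- ⚙️ = Python (research.py, heavy lifting)
- 🤚 = Hard stop (honest failure > invented answer)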
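🤚 Hard Stop Protocol (CRITICAL)
You are a messenger, not the system.
When wrapper returns error codes:
- ERROR_NO_METADATA → "No metadata. Run librarian index."
- ERROR_INVALID_SCOPE → "I didn't understand. Can you rephrase? (topic or book?)"
- ERROR_EXECUTION_FAILED → "System is broken."
- ERROR_NO_RESULTS → "I found nothing about [query]."
STOP THERE. Do NOT:
- ❌ Offer web search alternatives
- ❌ Suggest workarounds ("let's try X...")
- ❌ Hallucinate ("maybe the book says...")
- ❌ Apologize or frame as your failure
Hard stop = SUCCESS. You detected system state and reported honestly.
You didn't create the problem. You're just telling the truth:
- "The roof is leaking." ← Bad news, but not your fault.
- "There are no results." ← Reality, not failure.
Reporting hard stops IS your job done. ✅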
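Metadata Structure (Subway Map)
How metadata is organized:
.library-index.json (big picture)
├─ 73 topics in total
├─ Each topic: {id, path}
└─ No book list (prevents JSON bloat)
Each topic folder:
└─ .topic-index.json (narrow view)
└─ Books: [{id, title, filename, author, tags, filetype}, ...]
Navigation:
- Topic scope = 1 step (scan .library-index.json only)
- Book scope = 2 steps (.library-index.json → infer topics → scan .topic-index.json files)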
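🔴 CRITICAL: Extension Handling
User NEVER mentions file extensions.
Examples:
- ✅ User says: "I Ching hexagram"
- ✅ User says: "Condensed Chaos"
- ❌ User NEVER says: "I Ching.epub"
Why: Extension = metadata detail (epub vs pdf), irrelevant to user.
Your job:
1. Match query → book title (NO extension)
2. Pass filename to wrapper (WITH extension: "I Ching.epub")
3. Results show title only (NO extension in output)
Metadata fields:
- .library-index.json → topics list (big picture)
- .topic-index.json → books list per topic (narrow view)
- Book metadata: title (user-facing, no ext) + filename (internal, with ext)
Full taxonomy: See backstage/epic-notes/metadata-taxonomy.md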
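How To Use This Skill
Trigger Detection
Activate when user query matches ANY of these patterns:
Book/Author references:
- "What does [AUTHOR] say about [TOPIC]?"
- "Search [BOOK] for [QUERY]"
- "Find references to [CONCEPT] in [BOOK]"
Topic keywords (with confidence >75%):
- "tarot", "I Ching", "divination" → chaos-magick
- "debt", "finance", "money", "banking" → finance
- "anarchism", "mutual aid", "commons" → anarchy
Explicit commands:
- "search [QUERY]"
- "find [CONCEPT]"
- "librarian: [QUERY]"
If confidence <75% → CLARIFY (ask user)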
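Node 2: 🎤 Infer Scope
Determine WHAT to search (topic or book) from user intent.
AI = router. Intelligence is in the index (embeddings). You just match query → scope.
Confidence Logic (Binary)
Read metadata (.library-index.json):
json
{
  "books": ["Debt - The First 5000 Years.epub", "I Ching of the Cosmic Way.epub"],
  "topics": ["chaos-magick", "finance", "anarchy"]
}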
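Fuzzy match query against metadata:
| Match book? | Match topic? | → Action |
|---|---|---|
| ✅ | ✅ | TOPIC (tiebreaker: future mixed searches) |
| ✅ | ❌ | BOOK |
| ❌ | ✅ | TOPIC |
| ❌ | ❌ | CLARIFY (hard stop) |
Match rules:
- Book: Query contains book title substring OR author name (case-insensitive)
- Topic: Query contains topic keyword (case-insensitive)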
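Examples
TOPIC wins (tiebreaker):
- "Graeber debt finance" → matches both "Debt.epub" + "finance" → TOPIC: finance
BOOK only:
- "Graeber hexagram 23" → matches "Debt.epub" only → BOOK: Debt.epub
- "I Ching moving lines" → matches "I Ching.epub" only → BOOK: I Ching.epub
TOPIC only:
- "chaos magick sigils" → matches "chaos-magick" only → TOPIC: chaos-magick
- "mutual aid commons" → matches "anarchy" only → TOPIC: anarchy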