Librarian - Semantic Research Skill

Version: 2.0.0 (Protocol-driven)
Status: 🚧 Development
Architecture: Sandwich (🎤 Skill → 👷 Wrapper → ⚙️ Python)

What This Skill Does

Search your book library using natural language. Ask questions like "What does Graeber say about debt?" and get precise citations with page numbers.

Protocol Flow

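```mermaid
flowchart TB
    TRIGGER["🎤 Trigger + context"]:::ready
    TRIGGER --> METADATA["👷 Load metadata 1️⃣"]:::ready
    METADATA --> CHECK{"👷 Metadata exists?"}:::ready

    CHECK -->|No| ERROR["🎤 🤚 No metadata found:<br/>Run librarian index 5️⃣"]:::ready
    CHECK -->|Yes| INFER{"🎤 Infer scope? 2️⃣"}:::ready

    INFER -->|confidence below 75%| CLARIFY["🎤 🤚 Say again? 5️⃣"]:::ready
    INFER -->|confidence above 75%| BUILD["👷 Build command 3️⃣"]:::ready

    BUILD --> CHECK_SYSTEM{"⚙️ System working?"}:::ready

    CHECK_SYSTEM -->|No| BROKEN["🎤 🤚 System broken 5️⃣"]:::ready
    CHECK_SYSTEM -->|Yes| EXEC["⚙️ Run Python with flags"]:::ready

    EXEC --> JSON["⚙️ Return JSON"]:::ready
    JSON --> CHECK_RESULTS{"👷 Results found?"}:::ready

    CHECK_RESULTS -->|No| EMPTY["🎤 🤚 No results found 5️⃣"]:::ready
    CHECK_RESULTS -->|Yes| FORMAT["🎤 Format output 4️⃣"]:::ready

    FORMAT --> RESPONSE["🎤 Librarian response"]:::ready

    classDef ready fill:#c8e6c9,stroke:#81c784,color:#2e7d32
```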

Status: ✅ All nodes ready (v0.15.0 complete)

Protocol Nodes:

1. Load Metadata: Reads .library-index.json + .topic-index.json files
2. Infer Scope: Confidence >75% → proceed | <75% → ask for clarification
3. Build Command: python3 research.py "query" --topic topic_id
4. Format Output: Synthesized answer + emoji citations + sources
5. 🤚 Hard Stop: Honest failure > invented answer (VISION.md principle)

Sandwich Architecture:

Flow: 🎤 Skill → 👷 Sh → ⚙️ Py → 👷 Sh → 🎤 Skill

Why this pattern:

1. 🎤 Skill interprets user intent (conversational, flexible, handles ambiguity)
2. 👷 Sh builds the correct command syntax (the skill often gets syntax wrong; sh hardens the protocol)
3. ⚙️ Py executes deterministic work (search, embeddings, JSON output)
4. 👷 Sh formats py output into structured syntax (protocol compliance)
5. 🎤 Skill presents to human (natural language, citations, formatting)

Symbols:

- 🎤 = Skill (you, AI conversational layer)
- 👷 = Wrapper (librarian.sh, protocol enforcement)
- ⚙️ = Python (research.py, heavy lifting)
- 🤚 = Hard stop (honest failure > invented answer)

🤚 Hard Stop Protocol (CRITICAL)

You are a messenger, not the system.

When wrapper returns error codes:

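- ERROR_NO_METADATA → "Não tem metadata. Roda librarian index." (No metadata. Run librarian index.)
- ERROR_INVALID_SCOPE → "Não entendi. Reformula? (topic ou book?)" (I didn't understand. Rephrase? Topic or book?)
- ERROR_EXECUTION_FAILED → "Sistema quebrado." (System broken.)
- ERROR_NO_RESULTS → "Não achei nada sobre [query]." (Found nothing about [query].)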

STOP THERE. Do NOT:

- ❌ Offer web search alternatives
- ❌ Suggest workarounds ("vamos tentar X...", i.e. "let's try X...")
- ❌ Hallucinate ("maybe the book says...")
- ❌ Apologize or frame as your failure

Hard stop = SUCCESS. You detected system state and reported honestly.

You didn't create the problem. You're just telling the truth:

- "Tem goteira." (The roof is leaking.) ← Bad news, but not your fault.
- "Não tem resultados." (There are no results.) ← Reality, not failure.

Reporting hard stops IS your job done. ✅
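A minimal sketch of the messenger rule above, assuming the skill receives the error name as a string. The names `HARD_STOP_MESSAGES` and `hard_stop_message` are hypothetical, the error names other than ERROR_NO_METADATA follow the same spelling pattern by assumption, and falling back to "Sistema quebrado." for an unknown code is a choice made for this sketch, not something the protocol states.

```python
# Fixed user-facing messages, one per wrapper error code.
# The table is the whole behavior: no retries, no web search, no guesses.
HARD_STOP_MESSAGES = {
    "ERROR_NO_METADATA": "Não tem metadata. Roda librarian index.",
    "ERROR_INVALID_SCOPE": "Não entendi. Reformula? (topic ou book?)",
    "ERROR_EXECUTION_FAILED": "Sistema quebrado.",
    "ERROR_NO_RESULTS": "Não achei nada sobre {query}.",
}


def hard_stop_message(error_code: str, query: str = "") -> str:
    """Return the fixed message for a wrapper error and nothing else."""
    template = HARD_STOP_MESSAGES.get(error_code)
    if template is None:
        # Assumption: an unlisted code is reported as a broken system
        # rather than explained with an invented cause.
        return HARD_STOP_MESSAGES["ERROR_EXECUTION_FAILED"]
    return template.format(query=query)
```

Keeping the messages in a lookup table means the skill has nothing to improvise: the reply is either a search result or one of these strings.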

Metadata Structure (Subway Map)

How metadata is organized:

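```
.library-index.json (big picture)
├─ 73 topics total
├─ Each topic: {id, path}
└─ No book lists (avoids JSON bloat)

Each topic folder:
└─ .topic-index.json (narrow view)
   └─ books: [{id, title, filename, author, tags, filetype}, ...]
```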

Navigation:

- Topic scope = 1 step (scan .library-index.json only)
- Book scope = 2 steps (.library-index.json → infer topics → scan .topic-index.json files)
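The two navigation paths can be sketched as two small loaders. This is an illustration under assumptions: the key names `"topics"` and `"books"`, the per-topic `"id"` and `"path"` fields, and the function names are not confirmed by this document, and a topic folder without an index is simply skipped.

```python
import json
from pathlib import Path


def load_topics(library_root: Path) -> list:
    """Step 1 (topic scope stops here): read the global index."""
    index_path = library_root / ".library-index.json"
    index = json.loads(index_path.read_text(encoding="utf-8"))
    return index["topics"]  # assumed shape: [{"id": ..., "path": ...}, ...]


def load_books(library_root: Path, topics: list) -> list:
    """Step 2 (book scope only): read each topic's local index."""
    books = []
    for topic in topics:
        topic_index = library_root / topic["path"] / ".topic-index.json"
        if not topic_index.exists():
            continue  # topic folder has not been indexed
        data = json.loads(topic_index.read_text(encoding="utf-8"))
        for book in data["books"]:
            # Remember which topic each book came from.
            books.append({**book, "topic": topic["id"]})
    return books
```

A topic search only ever needs `load_topics`; a book search calls both, which is why it counts as two steps.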

🔴 CRITICAL: Extension Handling

User NEVER mentions file extensions.

Examples:

- ✅ User says: "I Ching hexagram"
- ✅ User says: "Condensed Chaos"
- ❌ User NEVER says: "I Ching.epub"

Why: Extension = metadata detail (epub vs pdf), irrelevant to user.

Your job:

1. Match query → book title (NO extension)
2. Pass filename to wrapper (WITH extension: "I Ching.epub")
3. Results show title only (NO extension in output)

Metadata fields:

- .library-index.json → topics list (big picture)
- .topic-index.json → books list per topic (narrow view)
- Book metadata: title (user-facing, no ext) + filename (internal, with ext)
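The title/filename split above can be illustrated with a small resolver. It is a sketch, assuming each book record is a dict with `title` and `filename` keys; `resolve_filename` is a hypothetical helper name, and "query contains the title" is only one possible reading of the matching rule.

```python
from typing import Optional


def resolve_filename(query: str, books: list) -> Optional[str]:
    """Match on the user-facing title, return the internal filename."""
    needle = query.casefold()
    for book in books:
        title = book["title"].casefold()
        if title and title in needle:
            return book["filename"]  # with extension, for the wrapper only
    return None  # no match: the caller should ask for clarification


books = [
    {"title": "I Ching", "filename": "I Ching.epub"},
    {"title": "Condensed Chaos", "filename": "Condensed Chaos.epub"},
]
```

With this data, `resolve_filename("I Ching hexagram", books)` yields the filename that goes to the wrapper, while anything shown to the user keeps using `book["title"]`.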

Full taxonomy: See backstage/epic-notes/metadata-taxonomy.md

How To Use This Skill

Trigger Detection

Activate when user query matches ANY of these patterns:

Book/Author references:

- "What does [AUTHOR] say about [TOPIC]?"
- "Search [BOOK] for [QUERY]"
- "Find references to [CONCEPT] in [BOOK]"

Topic keywords (with confidence >75%):

- "tarot", "I Ching", "divination" → chaos-magick
- "debt", "finance", "money", "banking" → finance
- "anarchism", "mutual aid", "commons" → anarchy

Explicit commands:

- "pesquisa [QUERY]" / "search [QUERY]"
- "procura [CONCEPT]" / "find [CONCEPT]"
- "librarian: [QUERY]"

If confidence <75% → CLARIFY (ask user)
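As a rough illustration of the explicit-command and keyword triggers listed above, the sketch below uses a regular expression and a keyword table. The names `COMMAND_PATTERN`, `explicit_command`, and `keyword_topic` are hypothetical, plain substring matching stands in for whatever the skill actually does, and the 75% confidence threshold is not modelled here.

```python
import re
from typing import Optional

# Keyword lists copied from the trigger table above.
TOPIC_KEYWORDS = {
    "chaos-magick": ["tarot", "i ching", "divination"],
    "finance": ["debt", "finance", "money", "banking"],
    "anarchy": ["anarchism", "mutual aid", "commons"],
}

# "pesquisa X" / "search X" / "procura X" / "find X" / "librarian: X"
COMMAND_PATTERN = re.compile(
    r"^\s*(?:(?:pesquisa|search|procura|find)\s+|librarian:\s*)(.+)$",
    re.IGNORECASE,
)


def explicit_command(text: str) -> Optional[str]:
    """Return the query part of an explicit command, or None."""
    match = COMMAND_PATTERN.match(text)
    return match.group(1).strip() if match else None


def keyword_topic(text: str) -> Optional[str]:
    """Return the first topic whose keyword appears in the text."""
    lowered = text.casefold()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return topic
    return None
```

A message that yields neither an explicit command nor a keyword topic would fall through to the clarification path.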

Node 2: 🎤 Infer Scope

Determine WHAT to search (topic or book) from user intent.

AI = router. Intelligence is in the index (embeddings). You just match query → scope.

Confidence Logic (Binary)

Read metadata (.library-index.json):
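```json
{
  "books": ["Debt - The First 5000 Years.epub", "I Ching of the Cosmic Way.epub"],
  "topics": ["chaos-magick", "finance", "anarchy"]
}
```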

Fuzzy match query against metadata:

| Match book? | Match topic? | → Action |
|---|---|---|
| ✅ | ✅ | TOPIC (tiebreaker: future mixed searches) |
| ✅ | ❌ | BOOK |
| ❌ | ✅ | TOPIC |
| ❌ | ❌ | CLARIFY (hard stop) |

Match rules:

- Book: Query contains book title substring OR author name (case-insensitive)
- Topic: Query contains topic keyword (case-insensitive)
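The decision table and match rules above can be sketched as one function. This is an illustration under assumptions: book records are dicts with `title`, `author`, and `filename` keys, author matching uses only the surname, and the keyword lists are supplied by the caller. The outcome depends on those lists; for example, if "I Ching" is registered as a chaos-magick keyword, a query containing it resolves to TOPIC rather than BOOK.

```python
from typing import NamedTuple, Optional


class Scope(NamedTuple):
    kind: str             # "topic", "book", or "clarify"
    value: Optional[str]  # topic id, book filename, or None


def _matches_book(q: str, book: dict) -> bool:
    """Title substring or author surname, compared case-insensitively."""
    title = book["title"].casefold()
    author_parts = book.get("author", "").casefold().split()
    surname = author_parts[-1] if author_parts else ""
    return (title != "" and title in q) or (surname != "" and surname in q)


def infer_scope(query: str, books: list, topic_keywords: dict) -> Scope:
    q = query.casefold()
    book = next((b for b in books if _matches_book(q, b)), None)
    topic = next(
        (topic_id for topic_id, words in topic_keywords.items()
         if any(word.casefold() in q for word in words)),
        None,
    )
    if topic is not None:
        return Scope("topic", topic)            # rows 1 and 3: TOPIC wins
    if book is not None:
        return Scope("book", book["filename"])  # row 2: BOOK only
    return Scope("clarify", None)               # row 4: hard stop, ask the user
```

Checking the topic first is what implements the tiebreaker: a query that matches both a book and a topic never reaches the book branch.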

Examples

TOPIC wins (tiebreaker):

- "Graeber debt finance" → matches both "Debt.epub" + "finance" → TOPIC: finance

BOOK only:

- "Graeber hexagram 23" → matches "Debt.epub" only → BOOK: Debt.epub
- "I Ching moving lines" → matches "I Ching.epub" only → BOOK: I Ching.epub

TOPIC only:

- "chaos magick sigils" → matches "chaos-magick" only → TOPIC: chaos-magick
- "mutual aid commons" → matches "anarchy" only → TOPIC: anarchy

CLARIFY (no match):

- "philosophy" → no match → CLARIFY: "Search which topic or book?"
- "systems" → no match → CLARIFY: "Need more context - which area?"

Scope Types

1. Topic scope: INLINECODE19

- Available topics: chaos-magick, finance, anarchy (check .library-index.json for the full list)

2. Book scope: INLINECODE20

- The wrapper requires the exact filename (e.g., "Condensed Chaos.epub")
- Use fuzzy matching to resolve partial titles: "Condensed" → "Condensed Chaos.epub"

Node 3-5: 👷 Call Wrapper

Execute wrapper script with inferred scope:

CODEBLOCK3

Arguments:

- QUERY: User's search query (exact string)
- INLINECODE22: "topic" or "book"
- INLINECODE23: topic_id or book filename
- INLINECODE24: Number of results (default: 5)

Example calls:

CODEBLOCK4
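For callers that drive the wrapper from Python, the call might look like the sketch below. The script path `./librarian.sh`, the positional argument order, and the function names are assumptions made for illustration; the real argument names are the ones listed above, so check the wrapper before relying on this layout.

```python
import subprocess


def build_command(query: str, scope_type: str, scope_value: str,
                  top_k: int = 5) -> list:
    """Build the argument list for the wrapper (hypothetical layout)."""
    if scope_type not in ("topic", "book"):
        raise ValueError(f"scope_type must be 'topic' or 'book', got {scope_type!r}")
    # Assumption: query, scope type, scope value, result count, in that order.
    return ["./librarian.sh", query, scope_type, scope_value, str(top_k)]


def call_wrapper(command: list) -> tuple:
    """Run the wrapper and return (exit code, stdout)."""
    # Passing a list (no shell=True) keeps the user's query as one
    # argument, so quotes and spaces in it are not reinterpreted.
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout
```

Validating the scope type before the call mirrors the wrapper's role: malformed commands are rejected early instead of reaching the search engine.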

Wrapper Exit Codes

The wrapper returns structured status via exit codes:

- 0: Success (JSON results on stdout)
- 1: ERROR_NO_METADATA (🤚 stop: tell user to run librarian index)
- 2: ERROR_BROKEN (🤚 stop: system issue, report to Nicholas)
- 3: ERROR_NO_RESULTS (🤚 stop: query returned 0 results)
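One way to turn the exit codes above into a status the skill can branch on is an `IntEnum`. This is a sketch: `WrapperExit` and `interpret` are hypothetical names, and treating an unlisted exit code or unparseable stdout as ERROR_BROKEN is an assumption, not documented wrapper behavior.

```python
import json
from enum import IntEnum


class WrapperExit(IntEnum):
    SUCCESS = 0
    ERROR_NO_METADATA = 1
    ERROR_BROKEN = 2
    ERROR_NO_RESULTS = 3


def interpret(returncode: int, stdout: str) -> tuple:
    """Return (status, results); results is None on every hard stop."""
    try:
        status = WrapperExit(returncode)
    except ValueError:
        # Assumption: an exit code outside the table means a broken system.
        status = WrapperExit.ERROR_BROKEN
    if status is not WrapperExit.SUCCESS:
        return status, None
    try:
        return status, json.loads(stdout)
    except json.JSONDecodeError:
        # Exit 0 without valid JSON is also reported, never patched over.
        return WrapperExit.ERROR_BROKEN, None
```

Every non-success path returns `None` for the results, so there is no partial data for the skill to improvise an answer from.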

Handle Each Error

Exit 1 (NO_METADATA):
CODEBLOCK5

Exit 2 (BROKEN):
CODEBLOCK6

Exit 3 (NO_RESULTS):

🤚 No results found for "[QUERY]"

Try:
- Broader terms (e.g., "debt" instead of "sovereign debt crisis")
- Different scope (search topic instead of single book?)
- Check spelling

Node 4: 🎤 Format Output

When wrapper returns success (exit 0), format the JSON results for the user.

JSON Structure

CODEBLOCK8

Formatting Rules

1. Synthesize answer (don't just list chunks)
- Read all results
- Extract key points
- Write coherent paragraph(s)

2. Cite sources with emojis
- 📕 = book citation
- Use location (page/paragraph) when available
- Format: INLINECODE26

3. Show similarity scores (optional, if useful)
- ⭐⭐⭐⭐⭐ (0.9-1.0) = Highly relevant
- ⭐⭐⭐⭐ (0.8-0.89) = Very relevant
- ⭐⭐⭐ (0.7-0.79) = Relevant

4. Keep original query context
- If user asked about debt, frame answer around debt
- Don't drift to tangential topics
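The citation and star rules above can be sketched as two helpers. The function names are hypothetical, the bracketed citation layout is an assumption modelled on the sample answer in this document, and scores between the listed bands (for example 0.895) are assigned to the lower band by using simple thresholds.

```python
from typing import Optional


def stars(similarity: float) -> str:
    """Map a similarity score to the star scale from the formatting rules."""
    if similarity >= 0.9:
        return "⭐" * 5  # highly relevant
    if similarity >= 0.8:
        return "⭐" * 4  # very relevant
    if similarity >= 0.7:
        return "⭐" * 3  # relevant
    return ""            # the scale defines no rating below 0.7


def cite(title: str, page: Optional[int] = None,
         paragraph: Optional[int] = None) -> str:
    """Build a book citation, adding location only when it is available."""
    parts = [title]
    if page is not None:
        parts.append(f"p.{page}")
    if paragraph is not None:
        parts.append(f"¶{paragraph}")
    return "📕 [" + ", ".join(parts) + "]"
```

Leaving the location parts out when they are missing keeps the citation honest: a page number appears only if the search result actually carried one.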

Example Output

User: "What does Graeber say about the origins of money?"

Librarian:

Graeber argues that money did NOT originate from barter (the myth Adam Smith popularized). Instead, credit and debt systems came first — people kept track of obligations long before coins existed. 📕 [Debt: The First 5000 Years, p.21, ¶2]
He traces debt back to ancient Mesopotamia (~3500 BCE), where temple administrators recorded loans in cuneiform tablets. Money as we know it (coins) only appeared around 600 BCE in Lydia. 📕 [Debt, p.40, ¶5]
Key insight: Debt is older than money. Markets emerged from moral obligations, not rational barter. 📕 [Debt, p.89, ¶1]
Sources:

- 📕 Debt: The First 5000 Years (David Graeber) - 3 passages
Similarity: ⭐⭐⭐⭐⭐

Hard Stops (🤚 Honest Failures)

NEVER invent answers. If system fails, STOP and tell user exactly what's wrong.

When to Stop

1. Metadata missing → Tell user to run INLINECODE27
2. Low confidence (<75%) → Ask clarifying question
3. System broken → Report error, don't guess
4. No results → Say "no results", suggest alternatives

Why Hard Stops Matter

From VISION.md: "Honest incompetence > false competence"

A broken skill that TELLS you it's broken is more trustworthy than one that invents plausible-sounding nonsense.

Installation & Setup

Requirements

- Python 3.9+
- Dependencies: sentence-transformers, faiss-cpu, pypdf, INLINECODE31

Install

CODEBLOCK9

Index Your Library

CODEBLOCK10

Troubleshooting

"No metadata found"

- Run index_library.py first
- Check books/.topic-index.json exists

"No results" but book exists

- Check topic ID matches (e.g., "chaos-magick" not "chaos magick")
- Verify book is in correct topic folder
- Try broader query terms

"System broken"

- Check Python dependencies: INLINECODE34
- Verify research.py syntax: INLINECODE35
- Check FAISS index integrity
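A quick way to see which dependencies are missing, sketched with the standard library only. It covers the three packages named in Requirements; the mapping from pip names to import names is stated here as known for these packages, while `missing_dependencies` is a hypothetical helper and not part of the project.

```python
import importlib.util

# pip package name -> module name used in "import ..."
REQUIRED = {
    "sentence-transformers": "sentence_transformers",
    "faiss-cpu": "faiss",
    "pypdf": "pypdf",
}


def missing_dependencies(required: dict = REQUIRED) -> list:
    """Return the pip names of packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Because it only looks modules up and never imports them, the check stays fast even though loading sentence-transformers itself is slow.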

References

Architecture:

- Agentic Design Patterns (Andrew Ng, 2024) - Agentic workflows
- OpenClaw skill best practices - Protocol-driven skills

Sandwich pattern:

- 🎤 Skill = Conversational I/O (trigger, infer, format, respond)
- 👷 Wrapper = Protocol enforcement (validate, build, check)
- ⚙️ Python = Heavy lifting (embeddings, search, ranking)

Why this works:

- AI is good at: interpreting intent, formatting output, human communication
- AI is bad at: following syntax exactly, deterministic execution
- Wrapper hardens protocol: same query → same command → same behavior

Emoji Legend

- 🎤 = Skill (AI conversational layer)
- 👷 = Wrapper (shell script protocol)
- ⚙️ = Python (research engine)
- 🤚 = Hard stop (honest failure)
- 📕 = Book citation
- ⭐ = Relevance score

Last updated: 2026-02-20 Epic: v0.15.0 Skill as Protocol
