Provide a strict, auditable EPUB workflow that safely handles long books through explicit task routing instead of loading full-book text by default.
- The user mentions an `.epub` file or ebook
- The user wants a quick structural overview
- The user wants chapter-specific or chunk-specific reading
- The user wants full-book sequential reading with chunking
- The user wants structured extraction
- The user wants to inspect images, tables, or other complex content
- The user wants to batch-process multiple EPUB files
STEP 0 - Choose exactly one task mode before doing anything else
| Mode | Purpose | Use when |
|---|---|---|
| `overview` | Fast structural overview | Metadata, TOC, themes, structure only |
| `targeted_read` | Focused reading | Specific chapters, chunk ranges, or keyword hits |
| `full_read` | Sequential reading | Long-book chunked reading with saved progress |
| `extract` | Structured extraction | Keywords, definitions, quotes, examples, action items, entities, tables, lists |
| `complex_content` | Complex-layout inspection | Images, tables, SVG, low-text sections |
| `batch` | Multi-book planning | Multiple EPUB files or folders |
Default to overview or targeted_read when the user intent is ambiguous. Never load a long book's full body text by default.
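The "exactly one mode, with a safe default" rule can be sketched as a small selector. This is only an illustration: the `Mode` enum mirrors the table, but the trigger phrases in `_HINTS` and the `choose_mode` function are hypothetical, and this sketch always falls back to `overview` rather than choosing between the two defaults.

```python
from enum import Enum


class Mode(str, Enum):
    OVERVIEW = "overview"
    TARGETED_READ = "targeted_read"
    FULL_READ = "full_read"
    EXTRACT = "extract"
    COMPLEX_CONTENT = "complex_content"
    BATCH = "batch"


# Hypothetical trigger phrases, checked in order; a real router would need more.
_HINTS = [
    (Mode.BATCH, ("all epubs", "folder", "batch")),
    (Mode.COMPLEX_CONTENT, ("image", "table layout", "svg")),
    (Mode.EXTRACT, ("extract", "quotes", "definitions")),
    (Mode.FULL_READ, ("read the whole", "continue reading", "from the start")),
    (Mode.TARGETED_READ, ("chapter", "chunk", "search for")),
]


def choose_mode(request: str) -> Mode:
    """Return exactly one mode; fall back to overview when nothing matches."""
    text = request.lower()
    for mode, phrases in _HINTS:
        if any(phrase in text for phrase in phrases):
            return mode
    # Ambiguous intent: pick the cheapest mode, never a full-body load.
    return Mode.OVERVIEW
```

Because the fallback is `overview`, an unrecognized request can never trigger a full-body load by accident.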
STEP 1 - Parse if needed
1. Check whether the output directory already exists and contains `manifest.json`.
2. If not, run `parse_epub.py`.
3. After parsing, report:
   - title
   - author
   - chapter count
   - chunk count if available
   - image count
   - table count
   - output directory
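The check in the first item can be sketched as follows. The helper name `needs_parsing` is hypothetical, and the sketch deliberately does not invoke `parse_epub.py`, since its command-line arguments are not specified here.

```python
from pathlib import Path


def needs_parsing(output_dir: Path) -> bool:
    """True when the output directory or its manifest.json is missing."""
    # is_file() is False for a missing directory as well as a missing file,
    # so one test covers both cases.
    return not (output_dir / "manifest.json").is_file()
```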
STEP 2 - Build an execution plan
Use task_router.py to decide whether parsing, chunking, or state updates are required:
```bash
python3 task_router.py <output_dir> --mode <mode> [params...]
```
The plan should tell you:
- whether parsing is required
- whether chunking is required
- which files are recommended to read
- whether session state must be updated
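One way to picture the four answers a plan carries is a small record. The field names and the `build_plan` logic below are assumptions for illustration, not the actual output schema of `task_router.py`. In particular, treating `full_read` as the only mode that needs chunking and state updates is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionPlan:
    # Field names are illustrative, not task_router.py's real schema.
    mode: str
    needs_parsing: bool
    needs_chunking: bool
    recommended_files: list[str] = field(default_factory=list)
    update_session_state: bool = False


def build_plan(mode: str, has_manifest: bool, has_chunks: bool) -> ExecutionPlan:
    """Derive the four plan answers from the mode and what already exists."""
    progress_modes = {"full_read"}  # assumption: only full_read tracks progress
    return ExecutionPlan(
        mode=mode,
        needs_parsing=not has_manifest,
        needs_chunking=mode in progress_modes and not has_chunks,
        recommended_files=["manifest.json"] if has_manifest else [],
        update_session_state=mode in progress_modes,
    )
```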
STEP 3 - Mode-specific behavior
overview
- Read only metadata, TOC, reading index, and other structural outputs
- Do not load the whole book body by default
- Return:
  - title
  - author
  - chapter count
  - TOC structure
  - theme overview
  - suggested next actions
targeted_read
Support:
- `--chapter`
- `--chapter-id`
- `--chunk-start`
- `--chunk-end`
- `--keyword`
Return:
- the requested section
- short context
- concise summary
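A minimal sketch of how the `targeted_read` selectors (`--chapter`, `--chapter-id`, `--chunk-start`, `--chunk-end`, `--keyword`) might be parsed and sanity-checked with `argparse`. The flag names follow this document. The value types, help strings, and the `validate` rules are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="targeted_read")
    # Types and help text below are assumptions, not the real CLI contract.
    parser.add_argument("--chapter", type=int, help="chapter number")
    parser.add_argument("--chapter-id", help="chapter identifier")
    parser.add_argument("--chunk-start", type=int, help="first chunk index")
    parser.add_argument("--chunk-end", type=int, help="last chunk index, inclusive")
    parser.add_argument("--keyword", help="keyword to search for")
    return parser


def validate(args: argparse.Namespace) -> None:
    """Reject an empty selection and an inverted or open-started chunk range."""
    selectors = (args.chapter, args.chapter_id, args.chunk_start, args.keyword)
    if all(s is None for s in selectors):
        raise ValueError("targeted_read needs a chapter, chunk range, or keyword")
    if args.chunk_end is not None and args.chunk_start is None:
        raise ValueError("--chunk-end requires --chunk-start")
    if args.chunk_start is not None and args.chunk_end is not None:
        if args.chunk_end < args.chunk_start:
            raise ValueError("--chunk-end must be >= --chunk-start")
```

Requiring at least one selector keeps a bare `targeted_read` call from silently turning into a full-book read.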
full_read
- Prefer chunk-based reading for long books
- Support continue, previous, next, and jump flows
- Always update `session_state.json`
- Never pretend progress exists if the session state is missing
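The four navigation flows can be sketched as a pure function over chunk indices. This assumes 0-based indices and that the saved position is the chunk the reader is currently on, so "continue" re-opens it and "next" advances. Neither convention is stated in this document, and `next_chunk` is a hypothetical name.

```python
from typing import Optional


def next_chunk(current: Optional[int], total: int, action: str,
               target: Optional[int] = None) -> int:
    """Return the 0-based chunk index to open for a navigation action.

    `current` is the saved position, or None when no session state exists.
    """
    if total <= 0:
        raise ValueError("book has no chunks")
    if action == "jump":
        if target is None or not 0 <= target < total:
            raise ValueError("jump target out of range")
        return target
    if current is None:
        # No saved progress: refuse to invent a position.
        raise LookupError("no session state; use jump or start a new session")
    if action == "continue":
        return current
    if action == "next":
        return min(current + 1, total - 1)  # clamp at the last chunk
    if action == "previous":
        return max(current - 1, 0)  # clamp at the first chunk
    raise ValueError(f"unknown action: {action}")
```

Raising on a missing position, instead of defaulting to chunk 0, is one way to honor the "never pretend progress exists" rule.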
extract
Support extracting:
- keywords
- definitions
- quotes
- examples
- action_items
- names
- locations
- organizations
- tables
- lists
Return a hit list with chapter references and short context.
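As a rough illustration of a hit list for the keyword case, the sketch below scans already-parsed chapter text and records a chapter reference plus a short context window for each match. The input shape (a dict of chapter id to text) and the output keys are assumptions.

```python
import re


def find_hits(chapters: dict[str, str], keyword: str, width: int = 40) -> list[dict]:
    """Return one hit per keyword occurrence, with chapter reference and context."""
    if not keyword:
        return []
    # Escape the keyword so it is matched literally, ignoring case.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for chapter_id, text in chapters.items():
        for match in pattern.finditer(text):
            lo = max(match.start() - width, 0)
            hi = min(match.end() + width, len(text))
            hits.append({
                "chapter": chapter_id,
                "offset": match.start(),
                "context": text[lo:hi].strip(),
            })
    return hits
```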
complex_content
Inspect:
- images
- SVG
- tables
- image-heavy sections
- low-text / high-resource sections
Return a structured report. OCR is not required by default.
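A structured report could start from per-section counts, with no OCR involved. In the sketch below, the `SectionStats` fields, the label names, and both thresholds (three images, 500 characters) are illustrative assumptions rather than values taken from the skill.

```python
from dataclasses import dataclass


@dataclass
class SectionStats:
    chapter_id: str
    text_chars: int
    images: int
    tables: int
    svgs: int


def classify(section: SectionStats, min_chars: int = 500) -> list[str]:
    """Return labels for a section; thresholds are illustrative defaults."""
    labels = []
    resources = section.images + section.tables + section.svgs
    if section.tables:
        labels.append("has_tables")
    if section.svgs:
        labels.append("has_svg")
    if section.images >= 3:
        labels.append("image_heavy")
    # Little text plus at least one resource suggests a layout-driven section.
    if resources and section.text_chars < min_chars:
        labels.append("low_text_high_resource")
    return labels
```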
batch
Support:
- multiple EPUB file paths
- directory scanning
- batch planning
- batch extraction requests
Return success / failure counts and a concise overview.
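Directory scanning and the success / failure tally might look like the sketch below. `collect_epubs` and `run_batch` are hypothetical helpers, and `process` stands in for whatever per-book step the batch performs.

```python
from pathlib import Path
from typing import Callable, Iterable


def collect_epubs(inputs: Iterable[Path]) -> list[Path]:
    """Expand files and directories into a sorted, de-duplicated list of .epub paths."""
    found = set()
    for item in inputs:
        if item.is_dir():
            found.update(p for p in item.rglob("*.epub") if p.is_file())
        elif item.suffix.lower() == ".epub":
            found.add(item)
    return sorted(found)


def run_batch(paths: list[Path], process: Callable[[Path], None]) -> dict:
    """Apply `process` to each book and tally successes and failures."""
    summary = {"succeeded": 0, "failed": 0, "errors": {}}
    for path in paths:
        try:
            process(path)
            summary["succeeded"] += 1
        except Exception as exc:  # keep going so one bad file does not stop the batch
            summary["failed"] += 1
            summary["errors"][str(path)] = str(exc)
    return summary
```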
STEP 4 - Long-book safety rules
- Never push the full body of a long book into context at once
- Prefer `chunks/` over chapter markdown for full sequential reading
- When chunking is required, run `chunk_book.py` first
- Use `reading_index.json` to map chapters to chunk ranges
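Mapping a chapter to its chunk range could be sketched as below. The JSON layout shown in the docstring is a guess made for illustration. The real structure of `reading_index.json` is whatever `chunk_book.py` writes, so the key names would need to be adapted.

```python
import json
from pathlib import Path


def chunk_range(index_path: Path, chapter_id: str) -> range:
    """Look up a chapter's chunk range in a reading index.

    Assumed layout (hypothetical, not confirmed by the skill):
    {"chapters": [{"id": "ch01", "chunk_start": 0, "chunk_end": 3}, ...]}
    where chunk_end is inclusive.
    """
    index = json.loads(index_path.read_text(encoding="utf-8"))
    for entry in index["chapters"]:
        if entry["id"] == chapter_id:
            return range(entry["chunk_start"], entry["chunk_end"] + 1)
    raise KeyError(f"chapter not in reading index: {chapter_id}")
```

Reading only the chunks in the returned range keeps a single chapter request from pulling in the rest of the book.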
STEP 5 - State management rules
When running full_read or any progress-sensitive flow:
1. Read `session_state.json` first
2. Update it after every progress-changing action
3. Respect existing saved progress unless the user explicitly asks to restart
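These three rules can be sketched as read-first, write-after helpers. The `current_chunk` key and the function names are assumptions, since the actual contents of `session_state.json` are not described here.

```python
import json
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> Optional[dict]:
    """Return saved state, or None when no session_state.json exists."""
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write via a temp file so an interrupted write cannot corrupt progress."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def start_or_resume(path: Path, restart: bool = False) -> dict:
    """Respect saved progress unless the caller explicitly restarts."""
    state = None if restart else load_state(path)
    if state is None:
        state = {"current_chunk": 0}  # key name is an assumption
        save_state(path, state)
    return state
```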
STEP 6 - Output style
Be explicit about:
- what files were used
- what mode was selected
- why a long book was chunked instead of loaded fully
- what the user can do next
When possible, point the user toward the safest next step:
- continue reading
- jump to a chapter
- inspect a chunk range
- extract a structure
- review complex content
Before considering the task complete, check:
- parsing outputs exist
- chunk files exist when required
- reading index and session state are coherent
- extraction targets match the requested type
- complex-content reports are generated from real parsed outputs
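The file-existence part of this checklist can be sketched as a function that returns a list of problems, with an empty list meaning the checks passed. It assumes that `manifest.json`, `chunks/`, `reading_index.json`, and `session_state.json` all sit directly under the output directory, which is not confirmed here. Coherence between the index and the state, and the extraction-type check, are left out.

```python
from pathlib import Path


def completion_problems(output_dir: Path, chunking_required: bool,
                        uses_progress: bool) -> list[str]:
    """List missing artifacts; an empty list means the existence checks passed."""
    problems = []
    if not (output_dir / "manifest.json").is_file():
        problems.append("manifest.json is missing")
    if chunking_required:
        chunks = output_dir / "chunks"
        if not chunks.is_dir() or not any(chunks.iterdir()):
            problems.append("chunks/ is missing or empty")
        if not (output_dir / "reading_index.json").is_file():
            problems.append("reading_index.json is missing")
    if uses_progress and not (output_dir / "session_state.json").is_file():
        problems.append("session_state.json is missing")
    return problems
```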
Skill name: epub-read