epub-read

Provide a strict, auditable EPUB workflow that safely handles long books through explicit task routing instead of loading full-book text by default.

Use this skill when:

- The user mentions an .epub file or ebook
- The user wants a quick structural overview
- The user wants chapter-specific or chunk-specific reading
- The user wants full-book sequential reading with chunking
- The user wants structured extraction
- The user wants to inspect images, tables, or other complex content
- The user wants to batch-process multiple EPUB files

STEP 0 - Choose exactly one task mode before doing anything else

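Mode	Purpose	Use when
overview	Fast structural overview	Metadata, TOC, themes, structure only
targeted_read

The available modes are overview, targeted_read, full_read, extract, complex_content, and batch. STEP 3 describes each one.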

Default to overview or targeted_read when the user intent is ambiguous. Never load a long book's full body text by default.
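The default rule can be sketched as a small function. This is only an illustration: `choose_mode` and its `has_target` parameter are hypothetical names, not part of the skill's scripts; only the mode names come from this document.

```python
from typing import Optional

# Mode names come from this document; everything else is an assumption.
MODES = ("overview", "targeted_read", "full_read",
         "extract", "complex_content", "batch")


def choose_mode(requested: Optional[str], has_target: bool = False) -> str:
    """Return exactly one mode, never defaulting to full_read.

    requested:  mode the user explicitly asked for, or None if ambiguous
    has_target: True if the user named a chapter, chunk range, or keyword
    """
    if requested is not None:
        if requested not in MODES:
            raise ValueError(f"unknown mode: {requested!r}")
        return requested
    # Ambiguous intent: fall back to one of the two safe defaults.
    return "targeted_read" if has_target else "overview"
```

With no explicit request, the function can only return overview or targeted_read, so the full body of a long book is never loaded implicitly.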

STEP 1 - Parse if needed

1. Check whether the output directory already exists and contains manifest.json.
2. If not, run parse_epub.py.
3. After parsing, report:

- title
- author
- chapter count
- chunk count if available
- image count
- table count
- output directory
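A minimal sketch of the parse check and the post-parse report, assuming the summary is available as a plain dictionary. The helper names and the dictionary keys are hypothetical; the real layout of manifest.json is not specified here.

```python
from pathlib import Path

# Hypothetical key names for the values listed in STEP 1.
REPORT_FIELDS = ("title", "author", "chapter_count", "chunk_count",
                 "image_count", "table_count", "output_dir")


def needs_parsing(output_dir: str) -> bool:
    """True when the output directory or its manifest.json is missing."""
    return not (Path(output_dir) / "manifest.json").is_file()


def format_report(summary: dict) -> str:
    """One line per known field; fields that are absent are skipped."""
    lines = [f"{name}: {summary[name]}"
             for name in REPORT_FIELDS if name in summary]
    return "\n".join(lines)
```

Skipping absent fields covers the "chunk count if available" case without inventing a value.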

STEP 2 - Build an execution plan

Use task_router.py to decide whether parsing, chunking, or state updates are required:

    python3 task_router.py <dir> --mode <mode> [params...]

The plan should tell you:

- whether parsing is required
- whether chunking is required
- which files are recommended to read
- whether session state must be updated
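The four answers above could be modelled as a small record. The sketch below is not the output format of task_router.py, which is not documented here; `Plan`, `build_plan`, and the rule that only full_read needs chunks and state updates are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class Plan:
    parse_required: bool
    chunk_required: bool
    update_state: bool
    recommended_files: List[str] = field(default_factory=list)


def build_plan(output_dir: str, mode: str) -> Plan:
    root = Path(output_dir)
    chunks = root / "chunks"
    has_chunks = chunks.is_dir() and any(chunks.iterdir())
    sequential = mode == "full_read"  # assumption: only this mode tracks progress
    # Recommend only structural files that actually exist on disk.
    candidates = ["manifest.json", "reading_index.json"]
    recommended = [name for name in candidates if (root / name).is_file()]
    return Plan(
        parse_required=not (root / "manifest.json").is_file(),
        chunk_required=sequential and not has_chunks,
        update_state=sequential,
        recommended_files=recommended,
    )
```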

STEP 3 - Mode-specific behavior

overview

- Read only metadata, TOC, reading index, and other structural outputs
- Do not load the whole book body by default
- Return:
  - title
  - author
  - chapter count
  - TOC structure
  - theme overview
  - suggested next actions

targeted_read

Support:

- --chapter
- --chapter-id
- --chunk-start
- --chunk-end
- --keyword

Return:

- the requested section
- short context
- concise summary
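As an illustration, the selectors --chapter, --chapter-id, --chunk-start, --chunk-end, and --keyword can be parsed with the standard argparse module. The argument types, help texts, and validation rules below are assumptions, not the behavior of the skill's own scripts.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="targeted_read selectors (sketch)")
    parser.add_argument("--chapter", help="chapter title or number (assumed meaning)")
    parser.add_argument("--chapter-id", help="chapter identifier (assumed meaning)")
    parser.add_argument("--chunk-start", type=int, help="first chunk, inclusive (assumed)")
    parser.add_argument("--chunk-end", type=int, help="last chunk, inclusive (assumed)")
    parser.add_argument("--keyword", help="term to locate (assumed meaning)")
    return parser


def validate(args: argparse.Namespace) -> None:
    """Reject requests that do not narrow the read to a specific target."""
    selectors = [args.chapter, args.chapter_id, args.chunk_start, args.keyword]
    if all(s is None for s in selectors):
        raise ValueError("targeted_read needs at least one selector")
    if args.chunk_end is not None:
        if args.chunk_start is None:
            raise ValueError("--chunk-end requires --chunk-start")
        if args.chunk_end < args.chunk_start:
            raise ValueError("--chunk-end must not precede --chunk-start")
```

Rejecting a request with no selector keeps targeted_read from silently turning into a full-book read.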

full_read

- Prefer chunk-based reading for long books
- Support continue, previous, next, and jump flows
- Always update session_state.json
- Never pretend progress exists if the session state is missing
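The four navigation flows can be sketched as a pure function over chunk positions. Zero-based chunk numbering, clamping at the ends of the book, and the name `resolve_position` are assumptions; the point of the sketch is that a missing saved position raises an error instead of being guessed.

```python
from typing import Optional


def resolve_position(saved: Optional[int], action: str, total: int,
                     target: Optional[int] = None) -> int:
    """Return the chunk to read next.

    saved:  last chunk recorded in the session state, or None if missing
    action: "continue", "next", "previous", or "jump"
    total:  number of chunks in the book
    """
    if total <= 0:
        raise ValueError("book has no chunks")
    if action == "jump":
        if target is None or not 0 <= target < total:
            raise ValueError("jump needs a target inside the chunk range")
        return target
    if saved is None:
        # Never pretend progress exists when the state is missing.
        raise LookupError("no saved progress; ask the user where to start")
    steps = {"continue": 0, "next": 1, "previous": -1}
    if action not in steps:
        raise ValueError(f"unknown action: {action!r}")
    # Clamp so that next/previous cannot leave the book.
    return min(max(saved + steps[action], 0), total - 1)
```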

extract

Support extracting:

- keywords
- definitions
- quotes
- examples
- action_items
- names
- locations
- organizations
- tables
- lists

Return a hit list with chapter references and short context.
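A hit list with chapter references and short context might look like the following. The sketch covers only a literal, case-insensitive term search over chapter text held in memory; the `Hit` record and `find_hits` function are hypothetical, and other extraction types would need their own matching logic.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    chapter: str   # chapter reference, e.g. an id or title
    match: str     # the text that matched
    context: str   # a short window of surrounding text


def find_hits(chapters: Dict[str, str], term: str, width: int = 40) -> List[Hit]:
    """Find every occurrence of term and keep `width` characters on each side."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits: List[Hit] = []
    for chapter, text in chapters.items():
        for m in pattern.finditer(text):
            start = max(0, m.start() - width)
            end = min(len(text), m.end() + width)
            # Collapse whitespace so the context stays on one line.
            context = " ".join(text[start:end].split())
            hits.append(Hit(chapter, m.group(0), context))
    return hits
```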

complex_content

Inspect:

- images
- SVG
- tables
- image-heavy sections
- low-text / high-resource sections

Return a structured report. OCR is not required by default.
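One way to build such a report without OCR is to count resource tags and visible text in each section's markup. The sketch uses the standard html.parser module; the threshold of 200 characters and the report keys are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Counts img, svg, and table elements plus visible text characters."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = {"img": 0, "svg": 0, "table": 0}
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def inspect_section(markup: str, low_text_threshold: int = 200) -> dict:
    counter = ResourceCounter()
    counter.feed(markup)
    counter.close()
    resources = sum(counter.counts.values())
    return {
        "images": counter.counts["img"],
        "svg": counter.counts["svg"],
        "tables": counter.counts["table"],
        "text_chars": counter.text_chars,
        # Flag sections that carry resources but very little text.
        "low_text_high_resource": resources > 0
        and counter.text_chars < low_text_threshold,
    }
```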

batch

Support:

- multiple EPUB file paths
- directory scanning
- batch planning
- batch extraction requests

Return success / failure counts and a concise overview.
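Directory scanning and the success / failure tally can be sketched as follows. Both function names are hypothetical, and `process` stands in for whatever per-book work the batch performs.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def find_epubs(paths: Iterable[str]) -> List[Path]:
    """Accept explicit .epub paths and scan directories recursively."""
    found: List[Path] = []
    for raw in paths:
        p = Path(raw)
        if p.is_dir():
            found.extend(sorted(q for q in p.rglob("*")
                                if q.is_file() and q.suffix.lower() == ".epub"))
        elif p.suffix.lower() == ".epub":
            found.append(p)
    return found


def run_batch(files: Iterable[Path],
              process: Callable[[Path], None]) -> Dict[str, object]:
    """Run process on each file; one failure does not stop the batch."""
    failures: Dict[str, str] = {}
    succeeded = 0
    for f in files:
        try:
            process(f)
            succeeded += 1
        except Exception as exc:  # record the failure and keep going
            failures[str(f)] = str(exc)
    return {"succeeded": succeeded, "failed": len(failures), "failures": failures}
```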

STEP 4 - Long-book safety rules

- Never push the full body of a long book into context at once
- Prefer chunks/ over chapter markdown for full sequential reading
- When chunking is required, run chunk_book.py first
- Use reading_index.json to map chapters to chunk ranges
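Mapping a chapter to its chunk range might look like this. The schema shown (a "chapters" list whose entries carry "id", "chunk_start", and "chunk_end", with an inclusive end) is purely an assumption; the actual structure of reading_index.json is not described here.

```python
from typing import Iterator, Tuple


def chunk_range(index: dict, chapter_id: str) -> Tuple[int, int]:
    """Return (first, last) chunk numbers for a chapter, both inclusive."""
    for entry in index["chapters"]:  # hypothetical schema
        if entry["id"] == chapter_id:
            return entry["chunk_start"], entry["chunk_end"]
    raise KeyError(chapter_id)


def iter_chunk_numbers(index: dict, chapter_id: str) -> Iterator[int]:
    """Yield chunk numbers one at a time so chunks can be read sequentially."""
    start, end = chunk_range(index, chapter_id)
    yield from range(start, end + 1)
```

Yielding one chunk number at a time matches the rule above: the caller reads a chunk, reports on it, and only then asks for the next.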

STEP 5 - State management rules

When running full_read or any progress-sensitive flow:

1. Read session_state.json first
2. Update it after every progress-changing action
3. Respect existing saved progress unless the user explicitly asks to restart
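These three rules can be sketched with a read, merge, and atomic write cycle. The `current_chunk` field and the helper names are hypothetical; the real contents of session_state.json are not specified in this document.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

STATE_FILE = "session_state.json"


def load_state(output_dir: str) -> Optional[dict]:
    """Return the saved state, or None when no state file exists."""
    path = Path(output_dir) / STATE_FILE
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_state(output_dir: str, state: dict) -> None:
    """Write to a temporary file, then replace, so a crash cannot truncate the state."""
    root = Path(output_dir)
    fd, tmp = tempfile.mkstemp(dir=root, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, ensure_ascii=False, indent=2)
    os.replace(tmp, root / STATE_FILE)


def record_progress(output_dir: str, chunk: int, restart: bool = False) -> dict:
    """Keep existing fields unless the user explicitly asked to restart."""
    state = {} if restart else (load_state(output_dir) or {})
    state["current_chunk"] = chunk  # hypothetical field name
    save_state(output_dir, state)
    return state
```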

STEP 6 - Output style

Be explicit about:

- what files were used
- what mode was selected
- why a long book was chunked instead of loaded fully
- what the user can do next

When possible, point the user toward the safest next step:

- continue reading
- jump to a chapter
- inspect a chunk range
- extract a structure
- review complex content

Before considering the task complete, check:

- parsing outputs exist
- chunk files exist when required
- reading index and session state are coherent
- extraction targets match the requested type
- complex-content reports are generated from real parsed outputs
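Part of this checklist can be automated. The sketch below covers only the file-level checks (parsing outputs, chunk files, and a basic state coherence test); it assumes zero-based chunk numbering and a hypothetical `current_chunk` field in session_state.json, and `completion_issues` is an illustrative name.

```python
import json
from pathlib import Path
from typing import List


def completion_issues(output_dir: str, chunks_required: bool) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    root = Path(output_dir)
    issues: List[str] = []
    if not (root / "manifest.json").is_file():
        issues.append("manifest.json is missing")
    chunks = root / "chunks"
    chunk_count = (sum(1 for p in chunks.iterdir() if p.is_file())
                   if chunks.is_dir() else 0)
    if chunks_required and chunk_count == 0:
        issues.append("chunks/ is missing or empty")
    state_path = root / "session_state.json"
    if state_path.is_file():
        state = json.loads(state_path.read_text(encoding="utf-8"))
        current = state.get("current_chunk")  # hypothetical field name
        if isinstance(current, int) and chunk_count and current >= chunk_count:
            issues.append("session state points past the last chunk")
    return issues
```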
