Content Alchemy
Skill Purpose
Use this skill to transform reading input into reusable personal outcomes rather than plain summaries.
Supported input types:
- article text
- web URLs
- extracted web text
- PDF files
- book excerpts
- long explanatory passages
Expected output shape:
- structured notes
- key insights
- actionable next steps
- reusable takeaway
When To Use
Prefer this skill when the user wants something like:
- "Turn this article into something I can keep"
- "Extract the useful takeaways from this page"
- "Turn this PDF into notes and actions"
- "Help me continue reading this long PDF"
- "Summarize this content, but make it more useful than a plain summary"
Differentiation Rules
Always follow these rules:
1. Do not treat the task as plain summarization.
2. Reconstruct value, structure, and usefulness instead of merely compressing content.
3. The output should feel like a saved personal artifact, not a model paraphrase.
4. Every result should improve at least one of these:
   - easier to revisit
   - easier to retain
   - easier to act on
   - easier to reuse
5. If the result still reads like a generic summary, restructure it again.
Scope and Limits
This release supports three routes:
- `plain_text`
- `web_url`
- `pdf_file`
This release does not directly handle:
- OCR for scanned PDFs
- code analysis workflows
- pure table-first analysis
- fragmented, image-first inputs with little readable text
If text extraction fails or text quality is too low, say so clearly and recommend OCR or source text.
Script Rules
When running bundled scripts:
- always use `python3`
- prefer absolute paths from the installed skill directory
- do not assume the current working directory is the skill directory
Recommended setup:
```bash
SKILL_ROOT="$HOME/.claude/skills/content-alchemy"
```
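The script rules can be sketched as a small helper that always invokes `python3` with an absolute script path, independent of the current working directory. This is only an illustration: `script_command` and `DEFAULT_SKILL_ROOT` are hypothetical names, and the fallback location assumes the skill is installed under `~/.claude/skills/content-alchemy`.

```python
import os
import shlex
from pathlib import Path

# Assumption: the skill lives here unless SKILL_ROOT says otherwise.
DEFAULT_SKILL_ROOT = Path.home() / ".claude" / "skills" / "content-alchemy"


def script_command(script_name, *args, skill_root=None):
    """Return an argv list that runs a bundled script via python3 with an absolute path."""
    root = Path(skill_root or os.environ.get("SKILL_ROOT") or DEFAULT_SKILL_ROOT)
    # Resolve to an absolute path so the command works from any working directory.
    script = root.expanduser().resolve() / "scripts" / script_name
    return ["python3", str(script), *map(str, args)]


cmd = script_command(
    "extract_web_text.py",
    "https://example.com/article",
    skill_root="/opt/skills/content-alchemy",  # hypothetical install location
)
print(shlex.join(cmd))
```

Returning an argv list rather than a shell string keeps paths with spaces safe when the list is passed to `subprocess.run`.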
Content transformation is performed directly by the model.
- There is no `process_content_alchemy.py` script.
- Do not invent a hidden processing script.
- If you need a fixed output structure, use:
  - `templates/result_template.md`
  - `templates/checkpoint_template.md`
Input Route A: plain_text
Use this route for:
- article bodies
- extracted web content
- excerpts
- long explanatory text
Process directly in-model using the outcome-oriented structure.
Input Route B: web_url
When the input is a URL:
1. Run `extract_web_text.py`
2. Extract title, site, author, publication time, and body text
3. Check whether the extraction is strong enough to support transformation
4. If not, explain the limit and ask for source text
Command:
```bash
python3 "$SKILL_ROOT/scripts/extract_web_text.py" https://example.com/article
```
Troubleshooting only:
```bash
python3 "$SKILL_ROOT/scripts/extract_web_text.py" https://example.com/article --insecure
```
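The "strong enough to support transformation" check is left to judgment. One possible heuristic is sketched below. Everything in it is an assumption: the `extracted` dict shape, the `body` key, and both thresholds are hypothetical and do not describe the real output of the extraction script.

```python
def extraction_is_usable(extracted, min_chars=800, min_paragraphs=3):
    """Return (ok, reasons) for a hypothetical extraction result dict."""
    body = (extracted.get("body") or "").strip()
    paragraphs = [line for line in body.split("\n") if line.strip()]
    reasons = []
    if len(body) < min_chars:
        reasons.append(f"body has {len(body)} characters, expected at least {min_chars}")
    if len(paragraphs) < min_paragraphs:
        reasons.append(f"body has {len(paragraphs)} paragraphs, expected at least {min_paragraphs}")
    return (not reasons, reasons)


ok, reasons = extraction_is_usable({"title": "Example", "body": "Subscribe to continue reading."})
if not ok:
    # Explain the limit and ask for source text instead of transforming thin content.
    print("Extraction too weak:", "; ".join(reasons))
```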
Input Route C: pdf_file
When the input is a PDF:
1. Run `plan_pdf_reading.py`
2. Determine the strategy from page count and text quality
3. Use `extract_pdf_text.py` for the appropriate page range
4. For longer PDFs, initialize or restore state and proceed segment by segment
Plan command:
```bash
python3 "$SKILL_ROOT/scripts/plan_pdf_reading.py" /path/to/file.pdf
```
The planning result returns:
- `session_root`
- `plan_file`
- `state_file`
- `commands`
- `segment_results_dir`
- `checkpoint_results_dir`
Prefer the exact returned paths and commands instead of guessing filenames.
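As an illustration of preferring returned values over guessed filenames, the sketch below validates a planning result before anything else uses it. It assumes the planner prints a JSON object with the keys `session_root`, `plan_file`, `state_file`, `commands`, `segment_results_dir`, and `checkpoint_results_dir`. The JSON format, the sample paths, and the `load_plan` helper are all hypothetical.

```python
import json

REQUIRED_KEYS = (
    "session_root",
    "plan_file",
    "state_file",
    "commands",
    "segment_results_dir",
    "checkpoint_results_dir",
)


def load_plan(raw_json):
    """Parse a planning result and fail loudly if any expected key is absent."""
    plan = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in plan]
    if missing:
        raise KeyError("planning result is missing: " + ", ".join(missing))
    return plan


# Hypothetical sample; real values come from the planner, never from guesses.
sample_json = json.dumps({
    "session_root": "/tmp/session",
    "plan_file": "/tmp/session/plan.json",
    "state_file": "/tmp/session/state.json",
    "commands": {},
    "segment_results_dir": "/tmp/session/segments",
    "checkpoint_results_dir": "/tmp/session/checkpoints",
})
plan = load_plan(sample_json)
print(plan["state_file"])
```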
Extract a page range:
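```bash
python3 "$SKILL_ROOT/scripts/extract_pdf_text.py" /path/to/file.pdf --page-start 1 --page-end 5
```
Initialize or restore state:
```bash
python3 "$SKILL_ROOT/scripts/update_pdf_session_state.py" init --plan-file <returned plan_file> --state-file <returned state_file>
```
Force reset only when the user explicitly wants to restart:
```bash
python3 "$SKILL_ROOT/scripts/update_pdf_session_state.py" init --plan-file <returned plan_file> --state-file <returned state_file> --force-reset
```
Move to the next segment:
```bash
python3 "$SKILL_ROOT/scripts/update_pdf_session_state.py" next --state-file <returned state_file>
```
Save the current segment result:
```bash
python3 "$SKILL_ROOT/scripts/record_pdf_segment_result.py" --state-file <returned state_file> --content-file /path/to/segment-result.md
```
Build the next checkpoint package:
```bash
python3 "$SKILL_ROOT/scripts/build_pdf_checkpoint.py" --state-file <returned state_file>
```
Save a checkpoint summary:
```bash
python3 "$SKILL_ROOT/scripts/record_pdf_checkpoint.py" --state-file <returned state_file> --content-file /path/to/checkpoint-summary.md
```
Show session progress:
```bash
python3 "$SKILL_ROOT/scripts/summarize_pdf_session.py" --state-file <returned state_file>
```
Find the most recent saved PDF session:
```bash
python3 "$SKILL_ROOT/scripts/find_recent_pdf_session.py"
```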
PDF Routing Rules
Default routing by page count:
- 1-40 pages -> `single_pass`
- 41-150 pages -> `segmented_read`
- 151-400 pages -> `long_form_read`
- 401+ pages -> `book_mode`
If multi-window sampling still reports low_text_pdf = true, treat the PDF as likely scanned, image-based, or low-quality text.
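The default page-count routing can be expressed as a small lookup. This is a minimal sketch of the thresholds (1-40, 41-150, 151-400, 401+), not the planner's actual code. `route_by_page_count` is a hypothetical name, and the low-text check is handled separately.

```python
def route_by_page_count(page_count):
    """Map a PDF page count to the default reading mode."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count <= 40:
        return "single_pass"
    if page_count <= 150:
        return "segmented_read"
    if page_count <= 400:
        return "long_form_read"
    return "book_mode"


for pages in (12, 96, 320, 750):
    print(pages, "->", route_by_page_count(pages))
```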
Session State Rules
For segmented_read, long_form_read, and book_mode:
1. Initialize state before the first reading step.
2. Read state before continuing.
3. Update state before previous / next / jump actions.
4. Do not rely on chat memory alone in a new session.
5. If state is missing, re-plan or re-initialize instead of pretending progress exists.
6. Prefer returned commands from the planning result whenever available.
7. Restore saved progress by default unless the user explicitly asks to restart.
8. Save every completed segment result immediately.
9. Build checkpoint source material before writing a checkpoint summary.
Existing Session Behavior
If plan_pdf_reading.py returns an existing_session:
1. "Continue next segment" should restore state and then move forward.
2. "Resume from last position" should restore state and read the current segment without advancing.
3. "Where am I?" or "reading status" should call `summarize_pdf_session.py`.
4. Only use `--force-reset` when the user explicitly wants to restart from the beginning.
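The four behaviors above can be sketched as a mapping from user intent to the ordered script calls. The intent labels (`continue_next`, `resume`, `status`, `restart`) and the `steps_for_intent` helper are hypothetical. The subcommands and flags (`init`, `next`, `--plan-file`, `--state-file`, `--force-reset`) are the ones shown in the commands for this route, and the script name `update_pdf_session_state.py` is assumed.

```python
def steps_for_intent(intent, plan_file, state_file):
    """Return the ordered argument lists to run for a given user intent."""
    state_script = "update_pdf_session_state.py"  # assumed script name
    restore = [state_script, "init", "--plan-file", plan_file, "--state-file", state_file]
    if intent == "continue_next":
        # Restore first, then advance exactly one segment.
        return [restore, [state_script, "next", "--state-file", state_file]]
    if intent == "resume":
        # Restore only; the current segment is read without advancing.
        return [restore]
    if intent == "status":
        return [["summarize_pdf_session.py", "--state-file", state_file]]
    if intent == "restart":
        # The only path that is allowed to discard saved progress.
        return [restore + ["--force-reset"]]
    raise ValueError(f"unknown intent: {intent}")


for step in steps_for_intent("continue_next", "plan.json", "state.json"):
    print(" ".join(step))
```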
In status summaries, distinguish clearly between:
- total completed segments
- contiguous completion from the beginning
- the earliest incomplete checkpoint window
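These three figures can diverge when segments were read out of order. The sketch below computes them under stated assumptions: segments are numbered from 1, checkpoint windows have a fixed size, and a window counts as incomplete if any of its segments is unfinished. The function and parameter names are hypothetical, and the real session summary may define these differently.

```python
def progress_summary(completed, total_segments, checkpoint_every):
    """Summarize progress from a collection of completed segment numbers."""
    done = {s for s in completed if 1 <= s <= total_segments}

    # Contiguous completion: how far the unbroken run from segment 1 extends.
    contiguous = 0
    while contiguous + 1 in done:
        contiguous += 1

    # Earliest checkpoint window that still contains an unfinished segment.
    earliest_incomplete_window = None
    for start in range(1, total_segments + 1, checkpoint_every):
        end = min(start + checkpoint_every - 1, total_segments)
        if any(s not in done for s in range(start, end + 1)):
            earliest_incomplete_window = (start, end)
            break

    return {
        "total_completed": len(done),
        "contiguous_from_start": contiguous,
        "earliest_incomplete_window": earliest_incomplete_window,
    }


print(progress_summary({1, 2, 4, 5, 6}, total_segments=9, checkpoint_every=3))
```

In this example five segments are complete, but contiguous completion stops at segment 2, so the first checkpoint window (segments 1 to 3) is still the earliest incomplete one.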
Output Structure
Default segment results should use this shape:
1. Source information
2. Content theme
3. Three core ideas
4. Reconstructed structure
5. Key insights
6. Actionable next steps
7. Reusable takeaway
For checkpoints:
1. Stage range
2. Stage theme
3. Core findings
4. Reconstructed structure
5. Key insights
6. Follow-up actions or reading guidance
7. Reusable checkpoint takeaway
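To keep both shapes consistent, the section order can be held as data and rendered in one place. This is a sketch only: the `## ` heading markup and the `render` helper are assumptions, and the bundled templates remain the authority on the actual layout.

```python
SEGMENT_SECTIONS = [
    "Source information",
    "Content theme",
    "Three core ideas",
    "Reconstructed structure",
    "Key insights",
    "Actionable next steps",
    "Reusable takeaway",
]

CHECKPOINT_SECTIONS = [
    "Stage range",
    "Stage theme",
    "Core findings",
    "Reconstructed structure",
    "Key insights",
    "Follow-up actions or reading guidance",
    "Reusable checkpoint takeaway",
]


def render(sections, content):
    """Render sections in order; refuse to emit a result with empty sections."""
    missing = [name for name in sections if not content.get(name, "").strip()]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    blocks = [f"## {name}\n\n{content[name].strip()}" for name in sections]
    return "\n\n".join(blocks) + "\n"


example = {name: f"(notes for {name.lower()})" for name in SEGMENT_SECTIONS}
print(render(SEGMENT_SECTIONS, example))
```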
Writing Rules
- The model writes the transformation directly.
- Write result content to a temporary markdown file first.
- Then call the correct record script to save it into the official session structure.
- Do not manually write final `segment-XXX.md` or `checkpoint-XXX.md` files unless the record script is intentionally bypassed for debugging.
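The write-then-record flow might look like the sketch below. It stages the markdown in a temporary file and returns the arguments for the matching record script, without writing any final session file itself. `stage_result` is a hypothetical helper, and the script names `record_pdf_segment_result.py` and `record_pdf_checkpoint.py` are assumed.

```python
import tempfile
from pathlib import Path

RECORD_SCRIPTS = {
    "segment": "record_pdf_segment_result.py",  # assumed script name
    "checkpoint": "record_pdf_checkpoint.py",  # assumed script name
}


def stage_result(markdown_text, state_file, kind="segment"):
    """Write markdown to a temp file and return the record-script arguments."""
    script = RECORD_SCRIPTS[kind]
    with tempfile.NamedTemporaryFile(
        "w", suffix=".md", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(markdown_text)
        content_file = handle.name
    # The record script, not this helper, decides the final file name and location.
    return [script, "--state-file", state_file, "--content-file", content_file]


args = stage_result("## Content theme\n\nExample notes.\n", "/tmp/session/state.json")
print(args[0], Path(args[-1]).suffix)
```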
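Content Alchemy
Skill Purpose
Use this skill to transform reading input into reusable personal outcomes rather than plain summaries.
Supported input types:
- article text
- web URLs
- extracted web text
- PDF files
- book excerpts
- long explanatory passages
Expected output shape:
- structured notes
- key insights
- actionable next steps
- reusable takeaway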
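When To Use
Prefer this skill when the user asks for something like:
- "Turn this article into something I can keep"
- "Extract the useful takeaways from this page"
- "Turn this PDF into notes and action items"
- "Help me continue reading this long PDF"
- "Summarize this content, but make it more useful than a plain summary"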
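Differentiation Rules
Always follow these rules:
1. Do not treat the task as plain summarization.
2. Reconstruct value, structure, and usefulness instead of merely compressing content.
3. The output should feel like a saved personal artifact, not a model paraphrase.
4. Every result should improve at least one of these:
   - easier to revisit
   - easier to retain
   - easier to act on
   - easier to reuse
5. If the result still reads like a generic summary, restructure it again.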
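Scope and Limits
This release supports three input routes:
- `plain_text`
- `web_url`
- `pdf_file`
This release does not directly handle:
- OCR for scanned PDFs
- code analysis workflows
- pure table-first analysis
- fragmented, image-first inputs with little readable text
If text extraction fails or text quality is too low, say so clearly and recommend OCR or source text.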
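Script Rules
When running bundled scripts:
- always use `python3`
- prefer absolute paths from the installed skill directory
- do not assume the current working directory is the skill directory
Recommended setup:
```bash
SKILL_ROOT="$HOME/.claude/skills/content-alchemy"
```
Content transformation is performed directly by the model.
- There is no `process_content_alchemy.py` script.
- Do not invent a hidden processing script.
- If you need a fixed output structure, use:
  - `templates/result_template.md`
  - `templates/checkpoint_template.md`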
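Input Route A: plain_text
Use this route for:
- article bodies
- extracted web content
- excerpts
- long explanatory text
Process directly in-model using the outcome-oriented output structure.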
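Input Route B: web_url
When the input is a URL:
1. Run `extract_web_text.py`
2. Extract title, site, author, publication time, and body text
3. Check whether the extraction is strong enough to support transformation
4. If not, explain the limit and ask for source text
Command:
```bash
python3 "$SKILL_ROOT/scripts/extract_web_text.py" https://example.com/article
```
Troubleshooting only:
```bash
python3 "$SKILL_ROOT/scripts/extract_web_text.py" https://example.com/article --insecure
```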
Input Route C: pdf_file
When the input is a PDF:
1. Run `plan_pdf_reading.py`
2. Determine the strategy from page count and text quality
3. Use `extract_pdf_text.py` for the appropriate page range
4. For longer PDFs, initialize or restore state and proceed segment by segment
Plan command:
```bash
python3 "$SKILL_ROOT/scripts/plan_pdf_reading.py" /path/to/file.pdf
```
The planning result returns:
- `session_root`
- `plan_file`
- `state_file`
- `commands`
- `segment_results_dir`
- `checkpoint_results_dir`
Prefer the exact returned paths and commands instead of guessing filenames.
Extract a page range:
```bash
python3 "$SKILL_ROOT/scripts/extract_pdf_text.py" /path/to/file.pdf --page-start 1 --page-end 5
```
Initialize or restore state:
```bash
python3 "$SKILL_ROOT/scripts/update_pdf_session_state.py" init --plan-file <returned plan_file> --state-file <returned state_file>
```
Force reset only when the user explicitly wants to restart:
```bash
python3 "$SKILL_ROOT/scripts/update_pdf_session_state.py" init --plan-file <returned plan_file> --state-file <returned state_file> --force-reset
```
Move to the next segment:
```bash
python3 "$SKILL_ROOT/scripts/update_pdf_session_state.py" next --state-file <returned state_file>
```
Save the current segment result:
```bash
python3 "$SKILL_ROOT/scripts/record_pdf_segment_result.py" --state-file <returned state_file> --content-file /path/to/segment-result.md
```
Build the next checkpoint package:
```bash
python3 "$SKILL_ROOT/scripts/build_pdf_checkpoint.py" --state-file <returned state_file>
```
Save a checkpoint summary:
```bash
python3 "$SKILL_ROOT/scripts/record_pdf_checkpoint.py" --state-file <returned state_file> --content-file /path/to/checkpoint-summary.md
```
Show session progress:
```bash
python3 "$SKILL_ROOT/scripts/summarize_pdf_session.py" --state-file <returned state_file>
```
Find the most recent saved PDF session:
```bash
python3 "$SKILL_ROOT/scripts/find_recent_pdf_session.py"
```
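PDF Routing Rules
Default routing by page count:
- 1-40 pages -> `single_pass`
- 41-150 pages -> `segmented_read`
- 151-400 pages -> `long_form_read`
- 401+ pages -> `book_mode`
If multi-window sampling still reports `low_text_pdf = true`, treat the PDF as likely scanned, image-based, or low-quality text.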
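Session State Rules
For `segmented_read`, `long_form_read`, and `book_mode`:
1. Initialize state before the first reading step.
2. Read state before continuing.
3. Update state before previous / next / jump actions.
4. Do not rely on chat memory alone in a new session.
5. If state is missing, re-plan or re-initialize instead of pretending progress exists.
6. Prefer returned commands from the planning result whenever available.
7. Restore saved progress by default unless the user explicitly asks to restart.
8. Save every completed segment result immediately.
9. Build checkpoint source material before writing a checkpoint summary.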
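Existing Session Behavior
If `plan_pdf_reading.py` returns an `existing_session`:
1. "Continue next segment" should restore state and then move forward.
2. "Resume from last position" should restore state and read the current segment without advancing.
3. "Where am I?" or "reading status" should call `summarize_pdf_session.py`.
4. Only use `--force-reset` when the user explicitly wants to restart from the beginning.
In status summaries, distinguish clearly between:
- total completed segments
- contiguous completion from the beginning
- the earliest incomplete checkpoint window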
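Output Structure
Default segment results should use this shape:
1. Source information
2. Content theme
3. Three core ideas
4. Reconstructed structure
5. Key insights
6. Actionable next steps
7. Reusable takeaway
For checkpoints:
1. Stage range
2. Stage theme
3. Core findings
4. Reconstructed structure
5. Key insights
6. Follow-up actions or reading guidance
7. Reusable checkpoint takeaway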
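Writing Rules
- The model performs the transformation directly.
- Write result content to a temporary markdown file first.
- Then call the correct record script to save it into the official session structure.
- Do not manually write final `segment-XXX.md` or `checkpoint-XXX.md` files unless the record script is intentionally bypassed for debugging.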