Content Extraction — Executable Skill
This is the local, executable version of the skill. It keeps the source-aware routing design and restores a concrete extraction workflow.
What it does
- Detects the input source
- Selects the best extraction channel
- Produces clean Markdown
- Saves long content locally when needed
- Explains fallback failures instead of hiding them
Main entrypoints
- scripts/extract_router.py: classify input and build a route plan
- scripts/extract.py: generate an executable extraction spec
Route priorities
1. WeChat → browser chain
2. Feishu doc/wiki → Feishu tools
3. YouTube → transcript chain
4. Generic URL → r.jina.ai → defuddle.md → web_fetch → browser fallback
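The priority order above can be sketched as a small classifier. This is a minimal illustration, not the actual contents of scripts/extract_router.py: the function names, the channel labels, and the hostnames used to recognise each source (mp.weixin.qq.com, feishu.cn, youtube.com, youtu.be) are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical channel labels; the real router may name them differently.
GENERIC_CHAIN = ["r.jina.ai", "defuddle.md", "web_fetch", "browser"]


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def build_route_plan(url: str) -> dict:
    """Classify a URL and return the ordered channels to try."""
    host = (urlparse(url).hostname or "").lower()
    # Rules are checked in priority order; the first match wins.
    if host_matches(host, "mp.weixin.qq.com"):  # assumed WeChat article host
        return {"source": "wechat", "channels": ["browser"]}
    if host_matches(host, "feishu.cn"):  # assumed Feishu doc/wiki host
        return {"source": "feishu", "channels": ["feishu_tools"]}
    if host_matches(host, "youtube.com") or host == "youtu.be":
        return {"source": "youtube", "channels": ["transcript"]}
    # Anything else falls through to the generic chain.
    return {"source": "generic", "channels": list(GENERIC_CHAIN)}
```

Returning the whole ordered channel list, rather than a single channel, lets the caller walk the generic chain one layer at a time when an earlier layer fails.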
Output contract
Always return:
- title
- author when available
- source
- url
- summary
- Markdown body
- save path when content is long
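One way to make the contract above hard to violate is to model it as a typed record. The sketch below is illustrative only: the class name and field names are hypothetical, and the skill itself does not prescribe a data structure. The two conditional fields default to None.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ExtractionResult:
    """Hypothetical container for the fields listed in the output contract."""
    title: str
    source: str
    url: str
    summary: str
    markdown: str                     # the Markdown body
    author: Optional[str] = None      # set only when available
    save_path: Optional[str] = None   # set only when content is long


result = ExtractionResult(
    title="Example post",
    source="generic",
    url="https://example.com/post",
    summary="A short placeholder summary.",
    markdown="# Example post\n\nBody text.",
)
payload = asdict(result)  # plain dict, ready to serialise or print
```

Because the required fields have no defaults, constructing a result without a title, source, url, summary, or body raises a TypeError immediately.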
Fallback rule
Never claim success when extraction is partial. If a layer fails, report:
- where it failed
- why it failed
- what fallback was tried next
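The reporting rule can be sketched as a loop over extraction layers that records every failure instead of swallowing it. This is a minimal sketch under stated assumptions: run_chain, the report keys, and the stub extractors are hypothetical, and treating an empty body as a failure is an illustrative choice rather than something the skill specifies.

```python
from typing import Callable, List, Tuple

Extractor = Callable[[str], str]


def run_chain(url: str, layers: List[Tuple[str, Extractor]]) -> dict:
    """Try each layer in order; record where and why each one failed."""
    failures = []
    for i, (name, extract) in enumerate(layers):
        try:
            body = extract(url)
            if not body.strip():
                # Assumption: an empty body counts as a failed extraction.
                raise ValueError("empty body")
            return {"ok": True, "layer": name, "markdown": body, "failures": failures}
        except Exception as exc:
            next_layer = layers[i + 1][0] if i + 1 < len(layers) else None
            failures.append({"where": name, "why": str(exc), "next": next_layer})
    # Every layer failed: report that plainly instead of claiming success.
    return {"ok": False, "layer": None, "markdown": "", "failures": failures}


# Stub extractors standing in for real channels.
def failing(url: str) -> str:
    raise TimeoutError("request timed out")


def working(url: str) -> str:
    return "# Title\n\nBody"


report = run_chain(
    "https://example.com/post",
    [("r.jina.ai", failing), ("defuddle.md", working)],
)
```

Here report["failures"] keeps the r.jina.ai timeout even though defuddle.md succeeded, so the caller can still state which layer failed, why, and what was tried next.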
Notes
- The ClawHub abstracted package stays abstract.
- This local version restores the executable workflow for OpenClaw use and ClawDex publishing.