Decompose
Decompose any text or URL into classified semantic units. Each unit gets an authority level, a risk category, an attention score, extracted entities, and an irreducibility flag. No LLM is required: processing is deterministic and runs locally.
Setup
1. Install
    pip install decompose-mcp
2. Configure MCP Server
Add to your OpenClaw MCP config:
    {
      "mcpServers": {
        "decompose": {
          "command": "python3",
          "args": ["-m", "decompose", "--serve"]
        }
      }
    }
3. Verify
    python3 -m decompose --text "The contractor shall provide all materials in accordance with ASTM C150-20."
Available Tools
decompose_text
Decompose any text into classified semantic units.
Parameters:
- text (required): the text to decompose
- compact (optional, default: false): omit zero-value fields for smaller output
- chunk_size (optional, default: 2000): maximum characters per unit
Example prompt: "Decompose this spec and tell me which sections are mandatory"
Returns: JSON with a units array. Each unit contains:
- authority: mandatory, prohibitive, directive, permissive, conditional, informational
- risk: safety_critical, security, compliance, financial, contractual, advisory, informational
- attention: priority score from 0.0 to 10.0
- actionable: whether someone needs to act on this
- irreducible: whether the content must be preserved verbatim
- entities: referenced standards and codes (ASTM, ASCE, IBC, OSHA, etc.)
- dates: extracted date references
- financial: extracted dollar amounts and percentages
- heading_path: document structure hierarchy
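As a rough illustration of consuming this output, the sketch below filters a hand-written sample payload for mandatory units. The sample values and the "text" key are assumptions made up for the example, not real tool output; only the other field names come from the list above.

```python
import json

# Hand-written sample shaped like the fields listed above. The "text" key
# and every value here are illustrative assumptions, not real tool output.
SAMPLE = json.loads("""
{
  "units": [
    {"text": "The contractor shall provide all materials per ASTM C150-20.",
     "authority": "mandatory", "risk": "compliance", "attention": 6.0,
     "actionable": true, "irreducible": false, "entities": ["ASTM C150-20"]},
    {"text": "This section describes the project background.",
     "authority": "informational", "risk": "informational", "attention": 0.5,
     "actionable": false, "irreducible": false, "entities": []}
  ]
}
""")


def mandatory_units(result, min_attention=0.0):
    """Return units whose authority is 'mandatory', highest attention first."""
    picked = [
        u for u in result.get("units", [])
        if u.get("authority") == "mandatory"
        and u.get("attention", 0.0) >= min_attention
    ]
    return sorted(picked, key=lambda u: u.get("attention", 0.0), reverse=True)


if __name__ == "__main__":
    for unit in mandatory_units(SAMPLE):
        print(unit["attention"], unit["text"])
```

The lookups use .get with defaults because compact output omits zero-value fields, so a missing key should be read as its zero value rather than raise.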
decompose_url
Fetch a URL and decompose its content. Handles HTML, Markdown, and plain text.
Parameters:
- url (required): URL to fetch and decompose
- compact (optional, default: false): omit zero-value fields
Example prompt: "Decompose https://spec.example.com/transport and show me the security requirements"
What It Detects
- Authority levels: RFC 2119 keywords, where "shall" = mandatory, "should" = directive, "may" = permissive
- Risk categories: safety-critical, security, compliance, financial, contractual
- Attention scoring: authority weight × risk multiplier, on a 0-10 scale
- Standards references: ASTM, ASCE, IBC, OSHA, ACI, AISC, AWS, ISO, EN
- Financial values: dollar amounts, percentages, retainage, liquidated damages
- Dates: deadlines, milestones, notice periods
- Irreducibility: legal mandates, threshold values, and formulas that cannot be paraphrased
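To make the keyword mapping and the scoring rule concrete, here is a minimal sketch of how such a classifier could look. It is not the package's implementation: the weights and multipliers are hypothetical numbers chosen for illustration, and the sketch ignores the prohibitive and conditional levels.

```python
import re

# Keyword-to-authority mapping taken from the list above, strongest first.
AUTHORITY_KEYWORDS = [
    ("mandatory", re.compile(r"\bshall\b", re.IGNORECASE)),
    ("directive", re.compile(r"\bshould\b", re.IGNORECASE)),
    ("permissive", re.compile(r"\bmay\b", re.IGNORECASE)),
]

# Hypothetical weights and multipliers, chosen only for this illustration.
AUTHORITY_WEIGHT = {
    "mandatory": 4.0, "directive": 2.5, "permissive": 1.0, "informational": 0.5,
}
RISK_MULTIPLIER = {
    "safety_critical": 2.5, "security": 2.0, "compliance": 1.5, "informational": 1.0,
}


def classify_authority(text):
    """Return the first (strongest) authority level whose keyword appears."""
    for label, pattern in AUTHORITY_KEYWORDS:
        if pattern.search(text):
            return label
    return "informational"


def attention_score(authority, risk):
    """Authority weight times risk multiplier, clamped to the 0-10 scale."""
    raw = AUTHORITY_WEIGHT.get(authority, 0.5) * RISK_MULTIPLIER.get(risk, 1.0)
    return max(0.0, min(10.0, raw))


if __name__ == "__main__":
    sentence = "The contractor shall provide all materials per ASTM C150-20."
    level = classify_authority(sentence)
    print(level, attention_score(level, "compliance"))
```

Because the rules are plain pattern matches and arithmetic, the same input always yields the same label and score. A real classifier would need more care than this sketch takes; for example, a case-insensitive match on "may" also fires on the month name.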
Use Cases
- Pre-process documents before sending them to your LLM, saving 60-80% of the context window
- Classify specs, contracts, policies, and regulations by obligation level
- Extract standards references and compliance requirements
- Route high-attention content to specialized analysis chains
- Build structured training data from raw documents
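The pre-processing use case can be sketched as a small filter over the units array. This is an illustrative sketch only: the "text" key, the threshold of 5.0, and the sample units are assumptions, and the actual savings depend entirely on the document.

```python
def build_context(units, min_attention=5.0):
    """Keep high-attention units plus anything flagged irreducible.

    Returns the joined text and the fraction of characters saved.
    Assumes each unit carries its content under a "text" key.
    """
    kept = [
        u for u in units
        if u.get("irreducible", False) or u.get("attention", 0.0) >= min_attention
    ]
    context = "\n\n".join(u["text"] for u in kept)
    original_chars = sum(len(u["text"]) for u in units)
    saved = 1.0 - (len(context) / original_chars) if original_chars else 0.0
    return context, saved


if __name__ == "__main__":
    # Invented sample units, not real tool output.
    sample_units = [
        {"text": "Retainage shall be 10% of each payment.", "attention": 8.0,
         "irreducible": True},
        {"text": "The project is located near the river and has a long history.",
         "attention": 0.5},
    ]
    context, saved = build_context(sample_units)
    print(f"kept {len(context)} chars, saved {saved:.0%}")
```

Irreducible units are kept regardless of score, since they are the ones flagged as unsafe to paraphrase or drop.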
Performance
- ~14 ms average per document on Apple Silicon
- 1,000+ chars/ms throughput
- Zero API calls, zero cost, works offline
- Deterministic: the same input always produces the same output
Security & Trust
Text classification is fully local. The decompose_text tool performs all processing in-process with no network I/O. No data leaves your machine.
URL fetching performs outbound HTTP requests. The decompose_url tool fetches the target URL, which necessarily involves network I/O to the specified host. This is why the skill declares the network permission in claw.json. If you do not need URL fetching, you can use decompose_text exclusively with no network access required.
SSRF protection. URL fetching blocks private/internal IP ranges before connecting: 0.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 127.0.0.0/8, 169.254.0.0/16, 172.16.0.0/12, 192.168.0.0/16, ::1/128, fc00::/7, fe80::/10. The implementation resolves the hostname via DNS before connecting and checks all returned addresses against the blocklist. See src/decompose/mcp_server.py lines 19-49.
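The check described above can be approximated with the standard library alone. The following is a sketch under stated assumptions, not a copy of the code in mcp_server.py: the function names are hypothetical, and only the CIDR ranges are taken from the paragraph above.

```python
import ipaddress
import socket

# CIDR ranges copied from the blocklist described above.
BLOCKED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in (
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.168.0.0/16",
    "::1/128", "fc00::/7", "fe80::/10",
)]


def is_blocked_ip(address):
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip.version == net.version and ip in net for net in BLOCKED_NETWORKS)


def resolve_and_check(hostname, port=443):
    """Resolve hostname and raise if any returned address is blocked.

    Hypothetical helper: returns the sorted list of resolved addresses.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = {info[4][0] for info in infos}
    for address in addresses:
        # Strip an IPv6 zone index such as "fe80::1%eth0" before parsing.
        if is_blocked_ip(address.split("%", 1)[0]):
            raise ValueError(f"blocked address for {hostname}: {address}")
    return sorted(addresses)


if __name__ == "__main__":
    print(is_blocked_ip("192.168.1.10"))
    print(is_blocked_ip("8.8.8.8"))
```

Checking every resolved address, rather than only the first, matters because a hostname can resolve to a mix of public and internal addresses. A check of this kind only holds if the subsequent connection goes to one of the addresses that was checked.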
No API keys or credentials required. No external services are contacted except when using decompose_url to fetch user-specified URLs.
Source code is fully auditable. The complete source is published at github.com/echology-io/decompose. The PyPI package is built from this repo via GitHub Actions (publish.yml) using PyPI Trusted Publishers (OIDC), so the published artifact is traceable to a specific commit.