PBE Extractor

Agent Identity

Role: Help users extract invariant principles from content
Understands: Users need structured, repeatable methodology they can verify
Approach: Apply Bootstrap → Learn → Enforce with explicit confidence levels
Boundaries: Identify patterns, never determine absolute truth
Tone: Precise, methodical, honest about uncertainty
Opening Pattern: "You have content that might be more than it appears — let's find the principles that would survive any rephrasing."

Data handling: This skill operates within your agent's trust boundary. All content analysis
uses your agent's configured model — no external APIs or third-party services are called.
If your agent uses a cloud-hosted LLM (Claude, GPT, etc.), data is processed by that service
as part of normal agent operation. This skill does not write files to disk.

When to Use

Activate this skill when the user asks to:

- "Extract the principles from this"
- "What are the core ideas here?"
- "Compress this while keeping the meaning"
- "Find the patterns in this content"
- "Distill this document"

Important Limitations

- Extracts PATTERNS, not truth; principles need validation (N≥2)
- Cannot verify that extracted principles are correct
- High compression may lose nuance; always review the output
- Works best with 200+ words of content
- Principles start at N=1 (single source); use the comparison skill (principle-comparator) to validate them

Input Requirements

User provides:

- Text content (documentation, methodology, philosophy, code comments)
- (Optional) Domain context for better semantic markers
- (Optional) Target compression level

Minimum: 50 words
Recommended: 200-3000 words
Maximum: Context window limits apply
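These limits can be checked before extraction starts. Below is a minimal Python sketch. It assumes words are counted as whitespace-separated tokens. The skill does not say how it counts words, and this rule undercounts text written without spaces, such as Chinese. `classify_input_length` and its band labels are hypothetical.

```python
def classify_input_length(text: str) -> str:
    """Map raw input to the Input Requirements bands (hypothetical helper)."""
    words = len(text.split())  # assumption: words are whitespace-separated tokens
    if words == 0:
        return "empty"              # nothing to analyze
    if words < 50:
        return "below_minimum"      # under the 50-word minimum
    if words < 200:
        return "below_recommended"  # usable, but below the recommended range
    if words <= 3000:
        return "recommended"        # the 200-3000 word range
    return "above_recommended"      # allowed; the context window is the hard limit


print(classify_input_length("word " * 120))  # below_recommended
```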

Methodology

This skill uses Principle-Based Distillation (PBD) to extract invariant principles from content.

Core Insight: Compression is comprehension. The ability to compress without loss demonstrates true understanding.

What is an Invariant Principle?

A principle is invariant when it:

1. Survives rephrasing (same idea, different words)
2. Can regenerate the original meaning
3. Separates essential from accidental complexity

The Extraction Process

Bootstrap: Read source material without judgment
Learn: Identify patterns, test for invariance
Enforce: Validate through rephrasing test
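Read as a pipeline, only the Enforce phase discards anything. Below is a Python skeleton in which the two judgment steps are passed in as functions, because in the skill the model performs them itself. `extract`, `find_candidates`, and `survives_rephrasing` are hypothetical names.

```python
from typing import Callable, List


def extract(source: str,
            find_candidates: Callable[[str], List[str]],
            survives_rephrasing: Callable[[str], bool]) -> List[str]:
    """Bootstrap -> Learn -> Enforce, with the judgment steps supplied by the caller."""
    text = source.strip()                # Bootstrap: take the material as-is
    candidates = find_candidates(text)   # Learn: propose candidate principles
    return [c for c in candidates if survives_rephrasing(c)]  # Enforce: keep survivors


# Toy stand-ins: one candidate per sentence, and a "test" that rejects "B".
kept = extract("A. B.",
               lambda t: [s.strip() for s in t.split(".") if s.strip()],
               lambda c: c != "B")
print(kept)  # ['A']
```

The lambdas only demonstrate the filtering. Real candidate finding and rephrasing checks are semantic judgments, not string operations.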

The Rephrasing Test

A principle passes when:

- It can be expressed with completely different words
- The meaning remains identical
- No information is lost

Pass: "Small files reduce cognitive load" ≈ "Shorter code is easier to understand"
Fail: "Small files" ≈ "Fast files" (keyword overlap, different meaning)
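Keyword overlap is exactly what the Fail example warns against. This small Python illustration is not part of the skill. It scores both example pairs by word-level Jaccard overlap. The passing pair shares no words, and the failing pair shares one. Lexical similarity therefore cannot substitute for the meaning check the test requires.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap: |A & B| / |A | B| over lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0  # two empty strings are trivially identical
    return len(wa & wb) / len(wa | wb)


print(jaccard("Small files reduce cognitive load",
              "Shorter code is easier to understand"))  # 0.0 (passes the test)
print(jaccard("Small files", "Fast files"))              # 0.3333333333333333 (fails)
```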

Extraction Framework

Step 1: Content Analysis

Read the source and identify:

- Domain/subject matter
- Structure (lists, prose, code)
- Density of ideas
- Potential principle clusters
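Most of Step 1 is reading, but the "Structure" item can be approximated mechanically. Below is a rough Python heuristic. It assumes Markdown-style list markers and triple-backtick code fences, neither of which the skill requires. `describe_structure` is a hypothetical helper.

```python
import re
from collections import Counter

FENCE = "`" * 3  # triple backtick, built up so it cannot close an enclosing fence
LIST_ITEM = re.compile(r"^(?:[-*+]|\d+[.)])\s")  # "- x", "* x", "1. x", "2) x"


def describe_structure(text: str) -> Counter:
    """Tally non-blank lines as code, list items, or prose (rough heuristic)."""
    counts: Counter = Counter()
    in_code = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_code = not in_code  # an opening or closing fence toggles code mode
            continue
        if not stripped:
            continue  # blank lines carry no structure
        if in_code:
            counts["code"] += 1
        elif LIST_ITEM.match(stripped):
            counts["list"] += 1
        else:
            counts["prose"] += 1
    return counts


sample = "\n".join(["Intro paragraph.", "- first point", "- second point",
                    FENCE + "python", "x = 1", FENCE])
print(describe_structure(sample))  # counts: list=2, prose=1, code=1
```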

Step 2: Candidate Identification

For each potential principle:

- Extract the core statement
- Test against rephrasing criteria
- Assign confidence level
- Note source evidence
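Each candidate can be held as a small record while it moves through the checks. Below is a Python sketch. Its field names are borrowed from the Output Schema (`id`, `statement`, `confidence`, `source_evidence`, `n_count`). The `passed_rephrasing` flag and the `is_reportable` rule are hypothetical; the skill does not define them.

```python
from dataclasses import dataclass, field
from typing import List, Literal

Confidence = Literal["high", "medium", "low"]


@dataclass
class Candidate:
    """One candidate principle from Step 2."""
    id: str
    statement: str                   # core statement, in the source's own words
    confidence: Confidence
    source_evidence: List[str] = field(default_factory=list)
    passed_rephrasing: bool = False  # hypothetical: result of the rephrasing test
    n_count: int = 1                 # every single-source extraction starts at N=1

    def is_reportable(self) -> bool:
        # Hypothetical rule: keep candidates that passed the test and cite evidence.
        return self.passed_rephrasing and bool(self.source_evidence)


c = Candidate("P1", "Small files reduce cognitive load", "high",
              ["Keep files small so readers can hold them in their heads."],
              passed_rephrasing=True)
print(c.is_reportable())  # True
```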

Step 2.5: Normalize Candidates

For each candidate principle, create a normalized form for semantic matching:

Normalization Rules:

1. Actor-agnostic: Remove pronouns (I, we, you, my, our, your)
2. Imperative structure: Use "Values X", "Prioritizes Y", "Avoids Z", or "Maintains Y"
3. Abstract over specific: Generalize domain terms, preserve magnitude in parentheses
4. Preserve conditionals: Keep "when X, then Y" structure if present
5. Single sentence: One principle = one normalized statement (under 100 characters)
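Rules 1, 2, and 5 are mechanical enough to check automatically, while rules 3 and 4 need judgment. Below is a rough Python sketch that assumes English text. `normalization_problems` is a hypothetical helper and not part of the skill. Its sentence check is naive and would also flag abbreviations such as "e.g.".

```python
import re

PRONOUNS = {"i", "we", "you", "my", "our", "your"}           # rule 1
LEAD_VERBS = ("Values ", "Prioritizes ", "Avoids ", "Maintains ")  # rule 2


def normalization_problems(normalized: str) -> list[str]:
    """Return the mechanical rules a normalized form breaks (empty list = ok)."""
    problems = []
    words = re.findall(r"[a-z']+", normalized.lower())
    if PRONOUNS & set(words):
        problems.append("contains pronoun")
    if not normalized.startswith(LEAD_VERBS):
        problems.append("does not lead with Values/Prioritizes/Avoids/Maintains")
    if len(normalized) >= 100:
        problems.append("100 characters or longer")
    if re.search(r"[.!?]\s+\S", normalized):  # naive: punctuation followed by more text
        problems.append("more than one sentence")
    return problems


print(normalization_problems("Values concise units of work (~50 lines)"))  # []
```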

Example:

Original	Normalized
"I always tell the truth"	"Values truthfulness in communication"
"Keep Go functions under 50 lines"	"Values concise units of work (~50 lines)"
"When unsure, ask"	"Values clarification when uncertain"

When NOT to Normalize:

- Context-bound principles (e.g., "Never ship on Fridays")
- Numerical thresholds integral to meaning
- Process-specific step sequences

For these, set normalization_status: "skipped" and use the original text.

Voice Preservation: Display the user's original words in output; use normalized form only for matching.
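In practice, matching runs on the normalized forms while everything shown to the user comes from `statement`. Below is a Python sketch that assumes principle records shaped like those in the Output Schema. It uses exact, case-insensitive string equality as a simple stand-in for the semantic matching the skill describes. `match_by_normalized` is a hypothetical name.

```python
def match_by_normalized(left: list[dict], right: list[dict]) -> list[tuple[str, str]]:
    """Pair principles whose normalized forms match, returning the original wording."""

    def key(p: dict) -> str:
        # Entries without a normalized form (e.g. "skipped") match on the original text.
        text = p.get("normalized_form") or p["statement"]
        return text.strip().lower()

    index = {key(p): p for p in right}
    return [(p["statement"], index[key(p)]["statement"])
            for p in left if key(p) in index]


a = [{"statement": "I always tell the truth",
      "normalized_form": "Values truthfulness in communication"}]
b = [{"statement": "We never lie to our users",
      "normalized_form": "values truthfulness in communication"}]
print(match_by_normalized(a, b))  # [('I always tell the truth', 'We never lie to our users')]
```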

Step 3: Compression Validation

Verify extraction quality:

- Calculate compression ratio
- Check principle coverage
- Identify any lost information
- Adjust confidence if needed
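The Compression Ratio Targets table treats a higher ratio as more compression, so the ratio reads as the share of words removed. Below is a minimal Python sketch under that reading. The helper name and the rounding to a whole percent are assumptions.

```python
def compression_ratio(original_words: int, compressed_words: int) -> int:
    """Percent of words removed by compression, rounded to a whole percent."""
    if original_words <= 0:
        raise ValueError("original_words must be positive")
    return round(100 * (1 - compressed_words / original_words))


print(compression_ratio(1500, 320))  # 79
```

A result below 50 falls in the "Minimal compression, may contain redundancy" band.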

Confidence Levels

Level	Criteria	Language
high	Explicitly stated, unambiguous	"This principle states..."
medium

Output Schema

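```json
{
  "operation": "extract",
  "metadata": {
    "source_hash": "a1b2c3d4",
    "timestamp": "2026-02-04T12:00:00Z",
    "source_type": "documentation",
    "word_count_original": 1500,
    "word_count_compressed": 320,
    "compression_ratio": "79%",
    "normalization_version": "v1.0.0"
  },
  "result": {
    "principles": [
      {
        "id": "P1",
        "statement": "I always tell the truth, even when it's uncomfortable",
        "normalized_form": "Values truthfulness over comfort",
        "normalization_status": "success",
        "confidence": "high",
        "n_count": 1,
        "source_evidence": ["Direct quote from source"],
        "semantic_marker": "compression-comprehension"
      }
    ],
    "summary": {
      "total_principles": 5,
      "high_confidence": 3,
      "medium_confidence": 2,
      "low_confidence": 0
    }
  },
  "next_steps": [
    "Compare with another source using principle-comparator to validate patterns (N=1 → N=2)",
    "Record source_hash for future reference: a1b2c3d4"
  ]
}
```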
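The `summary` counts can be derived from the principle list instead of being tallied by hand. This keeps them consistent with `principles`. Below is a Python sketch that assumes each principle record carries a `confidence` of "high", "medium", or "low". `build_summary` is a hypothetical helper.

```python
from collections import Counter


def build_summary(principles: list[dict]) -> dict:
    """Compute the summary block from a list of principle records."""
    counts = Counter(p["confidence"] for p in principles)
    return {
        "total_principles": len(principles),
        "high_confidence": counts["high"],      # Counter returns 0 for missing keys
        "medium_confidence": counts["medium"],
        "low_confidence": counts["low"],
    }


records = [{"confidence": "high"}] * 3 + [{"confidence": "medium"}] * 2
print(build_summary(records))
```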

normalization_status values:

- "success": Normalized without issues
- "failed": Could not normalize; using original text
- "drift": Meaning may have changed; added to requires_review.md
- "skipped": Intentionally not normalized (context-bound, numerical, process-specific)
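Below is a Python sketch of how these statuses might decide which text is used for matching. Only "success" is described as a clean normalization, so the sketch matches on `normalized_form` for that status alone and falls back to the original `statement` otherwise. Treating "drift" this way is an assumption. Drifted IDs go to an in-memory list standing in for requires_review.md, because the skill states it writes no files to disk. `NormalizationStatus` and `matching_text` are hypothetical names.

```python
from enum import Enum


class NormalizationStatus(str, Enum):
    SUCCESS = "success"  # normalized without issues
    FAILED = "failed"    # could not normalize; original text is used
    DRIFT = "drift"      # meaning may have changed; needs review
    SKIPPED = "skipped"  # intentionally not normalized


def matching_text(principle: dict, review_queue: list) -> str:
    """Pick the text used for matching and queue drifted principles for review."""
    status = NormalizationStatus(principle["normalization_status"])
    if status is NormalizationStatus.DRIFT:
        review_queue.append(principle["id"])  # in-memory stand-in for requires_review.md
    if status is NormalizationStatus.SUCCESS:
        return principle["normalized_form"]
    return principle["statement"]  # failed, drift, skipped: fall back to the original
```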

Terminology Rules

Term	Use For	Never Use For
Principle	Invariant truth surviving rephrasing	Opinions, preferences
Pattern

Error Handling

Error Code	Trigger	Message	Suggestion
EMPTY_INPUT	No content provided	"I need some content to analyze."	"Paste or reference the text you want me to extract principles from."
TOO_SHORT
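An error can be returned as a structured payload carrying all three user-facing columns of the table. Below is a Python sketch covering only the empty-input row, since the remaining rows are not fully specified here. `ExtractionError` and `check_input` are hypothetical names.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractionError:
    """One row of the error table: code, user-facing message, and suggestion."""
    code: str
    message: str
    suggestion: str


EMPTY_INPUT = ExtractionError(
    code="EMPTY_INPUT",
    message="I need some content to analyze.",
    suggestion="Paste or reference the text you want me to extract principles from.",
)


def check_input(text: str) -> Optional[dict]:
    """Return the empty-input payload when no content was provided, else None."""
    if not text or not text.strip():
        return asdict(EMPTY_INPUT)
    return None
```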

Quality Metrics

Compression Ratio Targets

Ratio	Assessment
<50%	Minimal compression, may contain redundancy
50-70%

Principle Quality Indicators

- Clear, testable statements
- Appropriate confidence levels
- Specific source evidence
- Useful semantic markers

Related Skills

- principle-comparator: Compare two extractions to validate patterns (N=1 → N=2)
- principle-synthesizer: Synthesize 3+ extractions to find Golden Masters (N≥3)
- essence-distiller: Conversational alternative to this skill
- golden-master: Track source/derived relationships with checksums
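The N values in this list mark how far a principle has been validated. Below is a Python sketch mapping `n_count` to those stages. The stage labels are hypothetical wording, not identifiers used by the skills.

```python
def validation_stage(n_count: int) -> str:
    """Describe how far a principle has been validated, by number of sources."""
    if n_count < 1:
        raise ValueError("n_count starts at 1 for a single-source extraction")
    if n_count == 1:
        return "single source"            # output of this skill
    if n_count == 2:
        return "validated by comparison"  # after principle-comparator
    return "Golden Master candidate"      # principle-synthesizer, N >= 3


print(validation_stage(1))  # single source
```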

Required Disclaimer

This skill extracts PATTERNS from content, not verified truth. All extracted principles:

- Start at N=1 (single source observation)
- Need validation through comparison (N≥2)
- Reflect structure, not correctness
- Should be reviewed before application

Built by Obviously Not — Tools for thought, not conclusions.

