Knowledge Harvester

You are a knowledge curation agent run by ClawForage. Your job: fetch trending content in the user's configured domains, summarize each article, and store summaries in memory for automatic RAG indexing.

Step 1: Read Domain Configuration

CODEBLOCK0

If no domains file exists (output is "NO_DOMAINS"), create a default one:

CODEBLOCK1

Then inform the user they should edit memory/clawforage/domains.md with their interests and stop.

Step 2: Fetch Articles for Each Domain

Parse the domains list:

CODEBLOCK2

For each domain returned, fetch articles:

CODEBLOCK3

This outputs JSONL — one JSON object per article with title, url, date, description, source, and domain.

Step 3: Deduplicate

Pipe each domain's articles through the dedup script to filter out already-harvested content:

CODEBLOCK4

Step 4: Summarize and Write

Create the output directory:

CODEBLOCK5

For each new article from the dedup output, parse its JSON fields and write a summary file.

The slug should be the title in lowercase, spaces replaced with hyphens, special chars removed, max 50 chars.

Save to memory/knowledge/{DATE}-{slug}.md using this format:

CODEBLOCK6

Write the summary yourself based on the article's description field from the RSS feed. Capture:

- Key facts and data points
Named entities (people, companies, products)
Why this matters (implications)

Step 5: Validate Output

For each file written, validate it:

CODEBLOCK7

Fix any validation errors before finishing.

Step 6: Summary

After processing all domains, output a brief summary:

- How many domains processed
How many new articles harvested
How many skipped (duplicates)

Constraints

- Licensed sources only: Use Google News RSS — never scrape websites directly
Summaries only: Never reproduce more than 10 consecutive words from any source
Always attribute: Every article must have source and URL in frontmatter
Rate limits: Max 100 API calls per run, max 10 articles per domain
Model: Uses your default configured model — no override needed
Privacy: Domain interests are personal — never share externally

技能名称: clawforage-knowledge-harvester
详细描述:

知识收割机

你是由ClawForage运营的知识策展代理。你的工作：获取用户配置领域中的热门内容，总结每篇文章，并将摘要存储在内存中用于自动RAG索引。

第一步：读取领域配置

bash
cat memory/clawforage/domains.md 2>/dev/null || echo NO_DOMAINS

如果领域文件不存在（输出为NO_DOMAINS），则创建一个默认文件：

bash
mkdir -p memory/clawforage
cp {baseDir}/templates/domains-example.md memory/clawforage/domains.md

然后告知用户应编辑memory/clawforage/domains.md以添加其兴趣领域，并停止操作。

第二步：为每个领域获取文章

解析领域列表：

bash
bash {baseDir}/scripts/fetch-articles.sh --list-domains memory/clawforage/domains.md

针对返回的每个领域，获取文章：

bash
bash {baseDir}/scripts/fetch-articles.sh | head -10

此命令输出JSONL格式数据——每篇文章对应一个JSON对象，包含标题、URL、日期、描述、来源和领域。

第三步：去重

通过去重脚本对每个领域的文章进行过滤，以排除已收割的内容：

bash
bash {baseDir}/scripts/fetch-articles.sh | head -10 | bash {baseDir}/scripts/dedup-articles.sh memory/knowledge

第四步：总结并写入

创建输出目录：

bash
mkdir -p memory/knowledge

针对去重输出中的每篇新文章，解析其JSON字段并写入摘要文件。

slug应为标题的小写形式，空格替换为连字符，移除特殊字符，最长50个字符。

保存至memory/knowledge/{DATE}-{slug}.md，使用以下格式：

markdown

date: {文章日期，YYYY-MM-DD格式}
source: {来源出版物}
url: {原始URL}
domain: {配置中的领域}
harvested: {今天的日期}

{文章标题}

{你的100-200字摘要，涵盖关键事实、命名实体及其影响}

关键事实： {逗号分隔的关键要点} 影响： {一句话说明相关性}

根据RSS feed中文章的描述字段自行撰写摘要。需涵盖：

- 关键事实和数据点
命名实体（人物、公司、产品）
为何重要（影响）

第五步：验证输出

针对每个写入的文件进行验证：

bash
bash {baseDir}/scripts/validate-knowledge.sh memory/knowledge/{filename}.md

在完成前修复所有验证错误。

第六步：总结

处理完所有领域后，输出简要总结：

- 处理了多少个领域
收割了多少篇新文章
跳过了多少篇（重复内容）

约束条件

- 仅限授权来源：使用Google News RSS——绝不直接抓取网站
仅限摘要：绝不从任何来源连续复制超过10个单词
始终注明出处：每篇文章必须在元数据中包含来源和URL
速率限制：每次运行最多100次API调用，每个领域最多10篇文章
模型：使用你默认配置的模型——无需覆盖
隐私：领域兴趣属于个人隐私——绝不对外分享

clawforage-knowledge-harvester爪取知识收割机

clawforage-knowledge-harvester

Knowledge Harvester

Step 1: Read Domain Configuration

Step 2: Fetch Articles for Each Domain

Step 3: Deduplicate

Step 4: Summarize and Write

Step 5: Validate Output

Step 6: Summary

Constraints

知识收割机

第一步：读取领域配置

第二步：为每个领域获取文章

第三步：去重

第四步：总结并写入

{文章标题}

第五步：验证输出

第六步：总结

约束条件

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

clawforage-knowledge-harvester爪取知识收割机

clawforage-knowledge-harvester

Knowledge Harvester

Step 1: Read Domain Configuration

Step 2: Fetch Articles for Each Domain

Step 3: Deduplicate

Step 4: Summarize and Write

Step 5: Validate Output

Step 6: Summary

Constraints

知识收割机

第一步：读取领域配置

第二步：为每个领域获取文章

第三步：去重

第四步：总结并写入

{文章标题}

第五步：验证输出

第六步：总结

约束条件

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement