WeChat Article Summarize

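Turn one or more WeChat Official Account article links into structured markdown. Supports both single-article summaries and multi-article daily reports.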

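Features

- Read one or more mp.weixin.qq.com article links
- Extract the article body, title, publish time, and optionally image links
- Automatically repair common mojibake in WeChat article bodies
- Call summarize to summarize the full text in Chinese
- Generate structured markdown files
  - Single-article summary
  - Multi-article roundup / daily report
- Name files by date + title, or by date + article count + summary label
- Save files to a user-specified directory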

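Confirm before use

Before fetching any article, confirm:

1. summarize has an API key configured and works
2. Whether image links should be kept in the final markdown
3. Which directory the final file should be saved to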

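Use cases

- Summarizing a single WeChat article
- Combining multiple WeChat articles into one daily report
- Producing markdown files suitable for further reading, archiving, or reworking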

Workflow

Step 0: Confirm prerequisites before fetching anything

Do not fetch article content until all three items are clear:

1. summarize is ready

- Ask the user to configure summarize API access first if needed.
- Verify summarize by running a tiny Chinese test.
- Proceed only if summarize returns a usable summary.

2. Image preference

- Ask whether the final markdown should include image links.
- Map user intent to include_images=true|false.

3. Output directory

- Ask where to save the final markdown file.
- If the user says “下载文件夹” ("Downloads folder"), use ~/Downloads.
- Create the target directory if it does not exist.

If any of the three items is missing, stop and ask before continuing.
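
The three checks above can be sketched in Python as follows. This is a minimal illustration, not the bundled workflow script: the function names, the alias table, and the list of "yes" answers are all hypothetical. Only the “下载文件夹” to ~/Downloads mapping and the create-if-missing behavior come from the steps above.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical alias table; only this one mapping is stated in the workflow.
DIRECTORY_ALIASES = {"下载文件夹": "~/Downloads"}

# Hypothetical set of answers treated as "include images".
YES_ANSWERS = {"yes", "y", "true", "是", "要", "保留"}


def resolve_output_dir(user_answer: str) -> Path:
    """Map the user's answer to a directory and create it if it is missing."""
    answer = user_answer.strip()
    target = Path(DIRECTORY_ALIASES.get(answer, answer)).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    return target


def parse_include_images(user_answer: str) -> bool:
    """Map a free-form answer to the include_images flag."""
    return user_answer.strip().lower() in YES_ANSWERS


def missing_prerequisites(
    summarize_ok: bool,
    include_images: bool | None,
    output_dir: Path | None,
) -> list[str]:
    """Return the items that are still unconfirmed (None means 'not asked yet')."""
    missing = []
    if not summarize_ok:
        missing.append("summarize")
    if include_images is None:
        missing.append("include_images")
    if output_dir is None:
        missing.append("output_dir")
    return missing
```

A caller would stop and ask the user whenever `missing_prerequisites(...)` returns a non-empty list, and only start fetching once it is empty.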

Step 1: Extract each WeChat article

For each mp.weixin.qq.com URL, run:

    python3 scripts/read_wechat_article.py <url> --out <dir>

This produces structured metadata, raw HTML, and a first-pass markdown export.
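
As a rough illustration of how structured metadata might be pulled out of the saved raw HTML, the sketch below collects Open Graph `<meta>` tags with the standard library. It assumes the page exposes tags such as `og:title`; that assumption, and the names `MetaTagCollector` and `extract_open_graph`, are hypothetical and do not describe how the bundled script works.

```python
from __future__ import annotations

from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta property="og:..."> values from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property")
        content = attributes.get("content")
        # Keep only Open Graph properties that carry a value.
        if key and key.startswith("og:") and content is not None:
            self.meta[key] = content


def extract_open_graph(raw_html: str) -> dict[str, str]:
    """Return a mapping such as {"og:title": "..."} found in raw_html."""
    parser = MetaTagCollector()
    parser.feed(raw_html)
    return parser.meta
```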

Step 2: Clean the body text

Do not trust the first-pass article markdown blindly.

If the body contains mojibake or obvious encoding corruption, repair it from raw.html by running:

    python3 scripts/fix_wechat_body.py --out

Use the cleaned body text as the canonical input for summarization.
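
One common form of mojibake is UTF-8 text that was mistakenly decoded as Latin-1. The sketch below reverses only that specific case and is an assumption about the kind of corruption involved; the bundled repair script may handle other cases or work differently. The function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as Latin-1.

    Text that is not this kind of mojibake is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them with the right codec.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text holds characters outside Latin-1 (already proper
        # Unicode) or the recovered bytes are not valid UTF-8: leave it alone.
        return text
```

Because both failure modes fall back to the input, the function is safe to run on text that is already clean.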

Step 3: Summarize in Chinese

Always summarize the cleaned local text, not the original WeChat URL.

Run:

    python3 scripts/summarize_cn.py --out --length short

or for a combined report:

    python3 scripts/summarize_cn.py --out --length medium

The script enforces Chinese output and fails if the returned summary is not sufficiently Chinese.
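
A language check of this kind could be approximated by measuring the share of CJK characters among all letters. The sketch below is one possible heuristic, not the script's actual rule: the 0.5 threshold and the function names are assumptions.

```python
def chinese_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are CJK unified ideographs."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(letters)


def is_mainly_chinese(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical gate: True when at least `threshold` of the letters are CJK."""
    return chinese_ratio(text) >= threshold
```

Digits, punctuation, and whitespace are ignored, so a summary that mixes Chinese prose with numbers or URLs punctuation is not penalized for them.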

Step 4: Normalize summary text before writing markdown

Never write summarize output directly into the final file.

Normalize paragraph breaks and spacing with:

    python3 scripts/normalize_markdown_text.py --out

Use this for:

- each single-article summary
- the combined daily-report overview

This prevents ugly line wrapping and mixed-language formatting artifacts.
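
The normalization described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the bundled script: it joins wrapped prose lines, leaves headings and list items on their own lines, and collapses runs of blank lines. The regular expression and the spacing rule (a space is inserted only between two ASCII characters) are hypothetical choices.

```python
import re

# Lines that must stay on their own: headings, bullets, numbered items.
STRUCTURAL = re.compile(r"^\s*(#{1,6}\s|[-*+]\s|\d+\.\s)")


def join_wrapped(left: str, right: str) -> str:
    """Join two halves of a wrapped line; add a space only between ASCII text."""
    if left[-1].isascii() and right[0].isascii():
        return left + " " + right
    return left + right


def normalize_paragraphs(text: str) -> str:
    out = []
    prose_open = False  # True while out[-1] is a prose paragraph still being built
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # Collapse consecutive blank lines into a single one.
            if out and out[-1] != "":
                out.append("")
            prose_open = False
        elif STRUCTURAL.match(line):
            out.append(line.rstrip())
            prose_open = False
        elif prose_open:
            out[-1] = join_wrapped(out[-1], stripped)
        else:
            out.append(stripped)
            prose_open = True
    return "\n".join(out).strip("\n")
```

Note that a continuation line directly under a list item is treated as a new prose paragraph here rather than being merged into the item.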

Step 5: Build the final markdown

Single article

Run:

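    python3 scripts/build_mindmap_markdown.py \
      --result \
      --body \
      --summary \
      --output-dir \
      --include-images true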

Multiple articles / daily report

Run:

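    python3 scripts/build_batch_report.py \
      --inputs \
      --output-dir \
      --include-images true \
      --report-label 微信文章日报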

The batch report must:

1. summarize all articles individually
2. summarize the full set as one combined overview
3. place the combined overview first
4. then append each single article section
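
The required ordering (combined overview first, then one section per article) can be illustrated with a small assembly function. The `ArticleSummary` type, the heading text, and the section layout are hypothetical; the sketch assumes the per-article summaries and the combined overview have already been produced and normalized.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ArticleSummary:
    """Hypothetical container for one already-summarized article."""
    title: str
    url: str
    date: str
    summary: str


def build_batch_report(overview: str, articles: list[ArticleSummary]) -> str:
    """Place the combined overview first, then append each article section."""
    lines = ["## Combined overview", "", overview.strip(), ""]
    for article in articles:
        lines += [
            f"## {article.title}",
            "",
            f"- URL: {article.url}",
            f"- Date: {article.date}",
            "",
            article.summary.strip(),
            "",
        ]
    return "\n".join(lines).rstrip() + "\n"
```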

Output rules

Naming

Single article

    YYYYMMDD-<article title>.md

Multiple articles

    YYYYMMDD-<total article count>篇-<summary label>.md
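
The two naming schemes (date + title for a single article, date + article count + summary label for a batch) could be generated as in the sketch below. The function names and the sanitizing rule are assumptions added for illustration; the source only specifies the date, title, count, and label components.

```python
import re
from datetime import date

# Hypothetical rule: replace whitespace and characters that are unsafe in
# file names on common filesystems with a single hyphen.
UNSAFE = re.compile(r'[\\/:*?"<>|\s]+')


def sanitize(label: str) -> str:
    return UNSAFE.sub("-", label).strip("-")


def single_article_filename(day: date, title: str) -> str:
    """Date + title, e.g. 20240305-<title>.md."""
    return f"{day:%Y%m%d}-{sanitize(title)}.md"


def batch_report_filename(day: date, count: int, label: str) -> str:
    """Date + article count + summary label, e.g. 20240305-3篇-<label>.md."""
    return f"{day:%Y%m%d}-{count}篇-{sanitize(label)}.md"
```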

Content rules

Single article output should contain

- title
- source URL
- publish time
- summarize-generated Chinese summary
- mindmap-style structure
- optional image section

Batch report output should contain

- combined daily overview at the top
- combined mindmap
- per-article title, URL, date, summary, and mindmap
- optional image overview

Non-negotiable quality gates

Before writing the final markdown:

1. Summary language check

- If the summary is not mainly Chinese, retry or fail.

2. Paragraph normalization

- Collapse unnatural line breaks inside prose.
- Keep markdown headings and bullet lists intact.

3. Clean body source

- Prefer repaired text from raw.html when the extracted body is corrupted.

Bundled scripts

- scripts/read_wechat_article.py: fetch WeChat article metadata, body, raw HTML, and image links
- scripts/fix_wechat_body.py: repair mojibake and extract clean text from raw HTML
- scripts/summarize_cn.py: run summarize in Chinese and enforce a language check
- scripts/normalize_markdown_text.py: normalize prose paragraphs and line breaks
- scripts/build_mindmap_markdown.py: generate single-article markdown files
- scripts/build_batch_report.py: generate multi-article combined reports
- scripts/run_wechat_mindmap_workflow.py: orchestrate the full workflow end to end after the required user confirmations
