WeChat → Obsidian Clipper
Clips a WeChat public article to your Obsidian vault:
- - Headless — uses
agent-browser (no visible window, fast) - Full fidelity — extracts text and images in their original DOM order
- Images downloaded — saves to
attachments/ next to the note using curl with correct Referer header (required to avoid 403) - Clean Markdown — Obsidian
![[filename]] embed syntax
Requirements
- -
agent-browser ≥ 0.17: INLINECODE6 - INLINECODE7 (pre-installed on macOS/Linux)
- An Obsidian vault accessible on the local filesystem
Trigger
User sends a mp.weixin.qq.com link and says something like:
- - "存 Obsidian" / "剪藏" / "帮我存"
- "save to Obsidian" / "clip this" / "save article"
Standard Workflow
Step 1 — Open page (headless)
CODEBLOCK0
Step 2 — Get title
CODEBLOCK1
Step 3 — Scroll to trigger lazy-loaded images
WeChat uses lazy loading — must scroll before extracting image URLs.
CODEBLOCK2
Step 4 — Extract content in DOM order
Important: Use classic function(){} syntax, not arrow functions — arrow functions inside JSON.stringify() cause a syntax error in agent-browser eval.
CODEBLOCK3
Image filter rules (skip these):
- -
mmbiz.qlogo — author avatar - INLINECODE12 — account profile image
- height < 50px — decorative dividers
- alt containing
二维码 / 引导 / 赞赏 — QR codes and tip prompts
Step 5 — Confirm save location
Always ask the user before writing. Suggest a directory based on topic:
📂 Suggested: <vault_root>/<topic_directory>/
📄 Filename: <title-keywords-YYYY-MM-DD>.md
🖼 Images → same folder's attachments/ directory
Confirm, or pick a different location?
Step 6 — Download images
CODEBLOCK4
Naming convention: <slug>-图00.png, 图01.jpg … numbered in DOM order.
Extension: check wx_fmt=jpeg → .jpg, otherwise → .png.
Shell tip for batch download — use a function, not declare -A (zsh doesn't support associative arrays):
CODEBLOCK5
Step 7 — Write the note
Strictly follow the DOM order from Step 4 — insert images at their original positions, not at the top or bottom.
CODEBLOCK6
Step 8 — Close browser
CODEBLOCK7
Step 9 — Report to user
Tell the user:
- - Note path
- Number of images downloaded
- ⚠️ Reminder: if you move the note, move the
attachments/ folder with it. Obsidian's ![[]] does a global vault search so links won't break immediately, but it's cleaner to keep them together.
Gotchas (hard-won lessons)
| # | Issue | Fix |
|---|
| 1 | Lazy-loaded images return empty INLINECODE27 | Always scroll the full page before extracting |
| 2 |
WeChat images return 403 |
curl must pass
-e 'https://mp.weixin.qq.com/' as Referer |
| 3 | Image order wrong | Traverse DOM nodes in order — don't collect images separately |
| 4 |
zsh: bad substitution | zsh doesn't support
declare -A; use a shell function instead |
| 5 |
SyntaxError: missing ) after argument list in agent-browser eval | Use classic
function(){} not arrow functions inside
JSON.stringify() |
| 6 |
async eval hangs | Wrap in
(async () => { ... })() — no outer
await needed |
WeChat → Obsidian 剪藏工具
将微信公众号文章剪藏到你的 Obsidian 仓库:
- - 无头模式 — 使用 agent-browser(无可见窗口,速度快)
- 完整保真 — 按原始 DOM 顺序提取文本和图片
- 图片下载 — 使用带正确 Referer 标头的 curl 保存到笔记旁的 attachments/ 目录(避免 403 错误)
- 干净 Markdown — Obsidian ![[filename]] 嵌入语法
前提条件
- - agent-browser ≥ 0.17:npm install -g agent-browser && agent-browser install
- curl(macOS/Linux 预装)
- 本地文件系统可访问的 Obsidian 仓库
触发方式
用户发送 mp.weixin.qq.com 链接并说类似:
- - 存 Obsidian / 剪藏 / 帮我存
- save to Obsidian / clip this / save article
标准工作流程
步骤 1 — 打开页面(无头模式)
bash
agent-browser open
agent-browser wait --load networkidle
步骤 2 — 获取标题
bash
agent-browser get title
步骤 3 — 滚动以触发懒加载图片
微信公众号使用懒加载 — 提取图片 URL 前必须先滚动。
bash
agent-browser eval
(async () => {
window.scrollTo(0, document.body.scrollHeight);
await new Promise(r => setTimeout(r, 2000));
const step = 600;
for (let y = 0; y < document.body.scrollHeight; y += step) {
window.scrollTo(0, y);
await new Promise(r => setTimeout(r, 300));
}
return done;
})()
步骤 4 — 按 DOM 顺序提取内容
重要: 使用经典 function(){} 语法,不要使用箭头函数 — 在 JSON.stringify() 内部使用箭头函数会导致 agent-browser eval 出现语法错误。
bash
agent-browser eval
(function() {
var nodes = document.querySelectorAll(#jscontent p, #jscontent section, #js_content img);
var result = [];
var imgIdx = 0;
nodes.forEach(function(node) {
if (node.tagName === IMG) {
var src = node.currentSrc || node.src || node.dataset.src || ;
if (src && src.includes(mmbiz) &&
!src.includes(mmbiz.qlogo) &&
!src.includes(profile)) {
var h = node.naturalHeight || node.height || 0;
var alt = (node.alt || ).toLowerCase();
if ((h >= 50 || h === 0) &&
!alt.includes(二维码) &&
!alt.includes(引导) &&
!alt.includes(赞赏)) {
result.push({ type: img, idx: imgIdx++, src: src });
}
}
} else {
var text = node.innerText ? node.innerText.trim() : ;
if (text && text.length > 3) result.push({ type: text, text: text });
}
});
return JSON.stringify(result);
})()
图片过滤规则(跳过以下内容):
- - mmbiz.qlogo — 作者头像
- mp_profile — 公众号头像
- 高度 < 50px — 装饰性分隔线
- alt 包含 二维码 / 引导 / 赞赏 — 二维码和打赏提示
步骤 5 — 确认保存位置
写入前务必询问用户。根据主题建议目录:
📂 建议位置:root>/directory>/
📄 文件名:.md
🖼 图片 → 同一文件夹的 attachments/ 目录
确认,或选择其他位置?
步骤 6 — 下载图片
bash
mkdir -p /attachments
必须包含 Referer 标头 — 否则微信公众号返回 403
curl -s -L --fail \
-A Mozilla/5.0 (Macintosh; Intel Mac OS X 10
157) AppleWebKit/537.36 \
-e https://mp.weixin.qq.com/ \
url> -o dir>/attachments/
命名规则:-图00.png、图01.jpg … 按 DOM 顺序编号。
扩展名:检查 wx_fmt=jpeg → .jpg,否则 → .png。
批量下载的 Shell 技巧 — 使用函数,不要用 declare -A(zsh 不支持关联数组):
bash
download_img() {
local idx=$1 url=$2
local ext=png
echo $url | grep -q wx_fmt=jpeg && ext=jpg
local fname=$(printf -图%02d.%s $idx $ext)
curl -s -L --fail \
-A Mozilla/5.0 \
-e https://mp.weixin.qq.com/ \
$url -o $fname \
&& echo OK $fname || echo FAIL $fname
}
步骤 7 — 写入笔记
严格遵循步骤 4 的 DOM 顺序 — 在原始位置插入图片,而不是放在顶部或底部。
markdown
{文章标题}
来源: 微信公众号 — {作者/公众号名称}
原文链接: {URL}
剪藏时间: {YYYY-MM-DD}
标签: #{标签1} #{标签2}
{段落文本}
![[slug-图00.jpg]]
{更多段落文本}
![[slug-图01.png]]
...
参考链接:
步骤 8 — 关闭浏览器
bash
agent-browser close
步骤 9 — 向用户报告
告知用户:
- - 笔记路径
- 下载的图片数量
- ⚠️ 提醒:如果移动笔记,请同时移动 attachments/ 文件夹。Obsidian 的 ![[]] 会进行全局仓库搜索,链接不会立即断开,但保持在一起更整洁。
注意事项(经验教训)
| # | 问题 | 解决方法 |
|---|
| 1 | 懒加载图片返回空 src | 提取前务必滚动整个页面 |
| 2 |
微信公众号图片返回 403 | curl 必须传递 -e https://mp.weixin.qq.com/ 作为 Referer |
| 3 | 图片顺序错误 | 按顺序遍历 DOM 节点 — 不要单独收集图片 |
| 4 | zsh: bad substitution | zsh 不支持 declare -A;改用 shell 函数 |
| 5 | agent-browser eval 中出现 SyntaxError: missing ) after argument list | 在 JSON.stringify() 内部使用经典 function(){},不要用箭头函数 |
| 6 | async eval 挂起 | 包裹在 (async () => { ... })() 中 — 不需要外层 await |