Obsidian Clipper

Universal clipper — saves URLs from any platform to your Obsidian Vault with local media, tags, and wikilinks.

Configuration

On first run, read config.yml from the same directory as this SKILL.md file. If missing, tell the user to copy config.yml.example to config.yml and fill in vault.base_path.

Key paths derived from config:

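    BASE        = config.vault.base_path
    ATTACHMENTS = BASE / config.vault.attachments_dir
    X_DIR       = BASE / config.vault.dirs.x
    WECHAT_DIR  = BASE / config.vault.dirs.wechat
    XHS_DIR     = BASE / config.vault.dirs.xiaohongshu
    DOUYIN_DIR  = BASE / config.vault.dirs.douyin
    GITHUB_DIR  = BASE / config.vault.dirs.github
    WEB_DIR     = BASE / config.vault.dirs.web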
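The derivation can be sketched in Python as follows. This is only an illustration: it assumes config.yml has already been parsed into a nested dict (for example with a YAML library), and `derive_paths` is a hypothetical helper name, not part of the skill.

```python
from pathlib import Path

def derive_paths(config: dict) -> dict:
    """Build the vault paths used by the handlers from a parsed config dict."""
    vault = config["vault"]
    base = Path(vault["base_path"])
    paths = {"BASE": base, "ATTACHMENTS": base / vault["attachments_dir"]}
    # One output directory per handler, keyed the same way as config.vault.dirs.
    names = {
        "x": "X_DIR", "wechat": "WECHAT_DIR", "xiaohongshu": "XHS_DIR",
        "douyin": "DOUYIN_DIR", "github": "GITHUB_DIR", "web": "WEB_DIR",
    }
    for key, name in names.items():
        paths[name] = base / vault["dirs"][key]
    return paths
```

A missing key raises `KeyError`, which is a reasonable point at which to tell the user to complete config.yml.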

URL Router

Match the URL and dispatch to the correct handler:

| URL pattern | Handler |
|---|---|
| x.com/* or twitter.com/* | X Handler |
| mp.weixin.qq.com/* | WeChat Handler |
| xhslink.com/* or xiaohongshu.com/* | Xiaohongshu Handler |
| v.douyin.com/* or douyin.com/video/* | Douyin Handler |
| github.com/{owner}/{repo} (no deeper path) | GitHub Handler |
| Everything else | Web Handler |
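A minimal Python sketch of this dispatch is shown below. The function name `route` and the choice to strip a leading `www.` are assumptions made for illustration; other subdomains fall through to the Web Handler here.

```python
import re
from urllib.parse import urlparse

def route(url: str) -> str:
    """Return the handler name for a URL, following the routing table."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path
    if host in ("x.com", "twitter.com"):
        return "X Handler"
    if host == "mp.weixin.qq.com":
        return "WeChat Handler"
    if host in ("xhslink.com", "xiaohongshu.com"):
        return "Xiaohongshu Handler"
    if host == "v.douyin.com" or (host == "douyin.com" and path.startswith("/video/")):
        return "Douyin Handler"
    # GitHub Handler applies only to exactly /{owner}/{repo}, nothing deeper.
    if host == "github.com" and re.fullmatch(r"/[^/]+/[^/]+/?", path):
        return "GitHub Handler"
    return "Web Handler"
```

Deeper GitHub paths such as issues or file views deliberately end up in the Web Handler, matching the "no deeper path" rule.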

Shared Rules

These apply to ALL handlers:

File naming

- Use the content title as filename
- Strip /\:*?"<>| and all emoji / special Unicode
- Truncate if >60 characters
- If a file with the same name exists, ask the user before overwriting
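The naming rules above could be approximated as follows. The emoji ranges are a rough assumption rather than a complete Unicode emoji list, and `sanitize_filename` is a hypothetical helper name.

```python
import re

# Characters the naming rules forbid in filenames.
FORBIDDEN = r'[/\\:*?"<>|]'
# Rough emoji / pictograph ranges; an approximation, not a full emoji table.
EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")

def sanitize_filename(title: str, max_len: int = 60) -> str:
    """Strip forbidden characters and emoji, then truncate to max_len."""
    name = re.sub(FORBIDDEN, "", title)
    name = EMOJI.sub("", name)
    # Collapse whitespace left behind by the removed characters.
    name = re.sub(r"\s+", " ", name).strip()
    return name[:max_len].rstrip()
```

The overwrite check is left out on purpose, since it requires asking the user rather than deciding in code.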

Media download

- Download all images and videos to ATTACHMENTS/
- Image naming: {title-slug}-{n}.{ext} (slug ≤20 chars, no emoji)
- Video naming: {title-slug}.mp4 or {title-slug}-video-{n}.mp4
- Replace remote URLs with Obsidian wikilinks: ![[filename.ext]]
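One possible way to build the local names and swap remote image tags for wikilinks is sketched below. Lowercasing the slug and joining words with hyphens are assumptions (the rules only bound the slug length and exclude emoji), and both helper names are hypothetical.

```python
import re

def title_slug(title: str, max_len: int = 20) -> str:
    """Hyphen-separated slug of at most max_len characters, without emoji."""
    # \w keeps letters, digits and underscore (CJK included); the rest becomes "-".
    slug = re.sub(r"[^\w]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-")

def localize_images(markdown: str, title: str, urls_with_ext: list) -> str:
    """Replace Markdown image tags for the given (url, ext) pairs with wikilinks."""
    slug = title_slug(title)
    for n, (url, ext) in enumerate(urls_with_ext, start=1):
        filename = f"{slug}-{n}.{ext}"
        pattern = r"!\[[^\]]*\]\(" + re.escape(url) + r"\)"
        markdown = re.sub(pattern, lambda _m, f=filename: f"![[{f}]]", markdown)
    return markdown
```

The download itself is not shown; the sketch only covers naming and link replacement.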

"Why clipped" field

- Infer from conversation context if possible
- If context is insufficient, ask the user

Cross-platform linking

- If the content contains a github.com/{owner}/{repo} link, auto-trigger the GitHub Handler to create a GitHub note, then add bidirectional wikilinks

X Handler

Saves X (Twitter) posts and articles.

Step 1: Fetch post data

    curl -s "https://api.fxtwitter.com/{handle}/status/{tweet_id}"

Extract from URL: x.com/{handle}/status/{id} → handle, tweet_id.
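The URL parsing might look like this in Python. The regular expression and the helper names are illustrative assumptions; only the api.fxtwitter.com URL shape comes from this document.

```python
import re

STATUS_RE = re.compile(r"^https?://(?:www\.)?(?:x|twitter)\.com/([^/]+)/status/(\d+)")

def parse_status_url(url: str) -> tuple:
    """Return (handle, tweet_id) from an x.com or twitter.com status URL."""
    m = STATUS_RE.match(url)
    if not m:
        raise ValueError(f"not an X status URL: {url}")
    return m.group(1), m.group(2)

def fxtwitter_api_url(url: str) -> str:
    """Build the fxtwitter API URL for a status URL."""
    handle, tweet_id = parse_status_url(url)
    return f"https://api.fxtwitter.com/{handle}/status/{tweet_id}"
```

Query strings such as `?s=20` are ignored because the pattern stops after the numeric id.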

Fields:

- tweet.author.name → author name
- tweet.author.screen_name → @handle
- tweet.text → post body (short posts)
- tweet.article → long-form article (if present)
- tweet.article.title → article title
- tweet.article.content.blocks → structured content (Draft.js format)
- tweet.created_at → publish date
- tweet.likes / tweet.retweets / tweet.views → engagement
- tweet.media → short post images
- tweet.article.media_entities → article inline images

Step 2: Determine content type

Short post (tweet.article is null): use tweet.text, images from tweet.media.photos.

Article (tweet.article exists): parse tweet.article.content.blocks (Draft.js):

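| block type | Markdown |
|---|---|
| unstyled | paragraph |
| header-one | # Heading |
| header-two | ## Heading |
| header-three | ### Heading |
| unordered-list-item | - item |
| ordered-list-item | 1. item |
| atomic | insert image ![[filename]] |
| blockquote | > quote |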

Inline styles: Bold → **text**, Italic → *text*.

For atomic blocks: entityMap → value.data.mediaItems[].mediaId → match media_entities → media_info.original_img_url.
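A simplified sketch of the block conversion follows. It handles block types only and omits inline styles. The `image_for_block` callback is a hypothetical hook standing in for the entityMap → media_entities lookup described above.

```python
PREFIXES = {
    "unstyled": "",
    "header-one": "# ",
    "header-two": "## ",
    "header-three": "### ",
    "unordered-list-item": "- ",
    "ordered-list-item": "1. ",
    "blockquote": "> ",
}

def blocks_to_markdown(blocks, image_for_block=None) -> str:
    """Convert Draft.js raw blocks to Markdown paragraphs."""
    lines = []
    for block in blocks:
        btype = block.get("type", "unstyled")
        if btype == "atomic":
            # Atomic blocks carry media; emit a wikilink if a local file is known.
            filename = image_for_block(block) if image_for_block else None
            if filename:
                lines.append(f"![[{filename}]]")
            continue
        lines.append(PREFIXES.get(btype, "") + block.get("text", ""))
    return "\n\n".join(lines)
```

Unknown block types fall back to plain paragraphs, so unexpected input degrades gracefully instead of failing.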

Step 3: Download images

Download all images to ATTACHMENTS/.

Step 4: Generate file

Write to X_DIR/{title}.md:

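Short post:

    ---
    title: "{author name}'s tweet - {first 30 chars}"
    author: {author name}
    handle: "@{screen_name}"
    source: {original URL}
    date: {publish date YYYY-MM-DD}
    tags:
      - clipping
      - {auto tags}
    ---

    # {author name}'s tweet

    X: @{handle} ({name}) | {date} | {likes} likes · {retweets} retweets · {views} views

    {body text}

    {images ![[filename]]}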

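Article:

    ---
    title: "{article.title}"
    author: {author name}
    handle: "@{screen_name}"
    source: {original URL}
    date: {publish date YYYY-MM-DD}
    tags:
      - clipping
      - {auto tags}
    ---

    # {article.title}

    X: @{handle} ({name}) | {date} | {likes} likes · {retweets} retweets · {views} views

    {parsed Markdown body, including ![[local images]]}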

Notes

- fxtwitter API needs no auth but may rate-limit; try vxtwitter.com as fallback
- Title: articles use article.title; short posts use {name}-{first 15 chars of text}

WeChat Handler

Saves WeChat Official Account (公众号) articles.

Step 1: Fetch article data

    curl -s "https://down.mptext.top/api/public/v1/download?url={URL-encoded link}&format=json"

Extract: title, nick_name, create_time, content_noencode, link.

If API returns 204 or fails, fall back to defuddle or WebFetch.
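The request and the fallback signal could be sketched as below. The endpoint is the one used in this handler. The helper names, the 30-second timeout, and returning `None` to mean "use defuddle or WebFetch" are assumptions.

```python
import json
import urllib.request
from urllib.parse import quote

API = "https://down.mptext.top/api/public/v1/download"

def api_url(article_url: str, fmt: str = "json") -> str:
    # safe="" also encodes "/" and ":" so the link travels as one query value.
    return f"{API}?url={quote(article_url, safe='')}&format={fmt}"

def fetch_article(article_url: str):
    """Return the parsed JSON, or None to signal that the fallback should run."""
    try:
        with urllib.request.urlopen(api_url(article_url), timeout=30) as resp:
            if resp.status == 204:
                return None
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Network errors, HTTP errors and invalid JSON all trigger the fallback.
        return None
```

Encoding the article link matters because WeChat URLs often carry their own `?` and `&` characters, which would otherwise be read as parameters of the API call.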

Step 2: Fetch HTML (if rich content needed)

If content has images or complex formatting:

    curl -s "https://down.mptext.top/api/public/v1/download?url={URL-encoded link}&format=html"

Extract image URLs and formatted content from HTML.

Step 3: Download images

- Download each image to ATTACHMENTS/
- WeChat image URLs: mmbiz.qpic.cn/mmbiz_png/... → .png, mmbiz_jpg/... → .jpg
- Replace each remote image URL with a wikilink such as ![[filename.jpg]]
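The extension rule can be expressed as a small helper. Treating other `mmbiz_*` segments (for example gif) the same way and defaulting to jpg when no segment is present are assumptions beyond the two cases listed above.

```python
import re

def wechat_image_ext(url: str, default: str = "jpg") -> str:
    """Infer the file extension from the mmbiz_<format> segment of the URL."""
    m = re.search(r"/mmbiz_([a-z0-9]+)/", url)
    if not m:
        return default
    fmt = m.group(1)
    return "jpg" if fmt == "jpeg" else fmt
```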

Step 4: Generate file

Write to WECHAT_DIR/{title}.md:

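    ---
    title: "{article title}"
    author: {official account name}
    source: {original link}
    date: {publish date YYYY-MM-DD}
    tags:
      - clipping
      - {auto tags}
    ---

    # {article title}

    Official Account: {name} | {publish date}

    {body, including ![[local images]]}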

Xiaohongshu Handler

Saves 小红书 notes (images + video).

Step 1: Resolve short link

If URL is xhslink.com, follow redirects to get the real URL with full query parameters (especially xsec_token):

    curl -sL "<short link>" -o /dev/null -w '%{url_effective}'

Important: The full query string is required. Bare URLs return empty noteDetailMap.

Step 2: Fetch SSR data

Xiaohongshu is an SPA, but SSR embeds data in window.__INITIAL_STATE__:

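    curl -sL "<full URL with query parameters>" \
      -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36" \
      -H "Accept: text/html"

Pipe the HTML into a short python3 -c script (importing sys, re, and json) that uses a regular expression to extract the object assigned to window.__INITIAL_STATE__.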
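A hedged sketch of the extraction step in Python is given below. It assumes the state object is followed directly by a closing script tag and that bare `undefined` values may appear in it. The global replacement is crude and would also alter the word inside string values. `extract_initial_state` is a hypothetical name.

```python
import json
import re

def extract_initial_state(html: str) -> dict:
    """Pull the JSON assigned to window.__INITIAL_STATE__ out of the page HTML."""
    m = re.search(
        r"window\.__INITIAL_STATE__\s*=\s*(\{.*?\})\s*</script>", html, re.DOTALL
    )
    if not m:
        raise ValueError("window.__INITIAL_STATE__ not found")
    # Bare `undefined` is valid JavaScript but not valid JSON.
    raw = re.sub(r"\bundefined\b", "null", m.group(1))
    return json.loads(raw)
```

If the returned state has an empty noteDetailMap, the most likely cause is a URL missing its query parameters, as noted in Step 1.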

Step 3: Download media

- Images: curl -L -o each image to ATTACHMENTS/
- Video (if present): curl -L -o to ATTACHMENTS/

Step 4: Generate file

Write to XHS_DIR/{title}.md:

CODEBLOCK9

Notes

- If curl can't reach Xiaohongshu, add INLINECODE95
- INLINECODE96 often contains [话题]#tag[话题]# markers; strip them for clean text

Douyin Handler

Saves 抖音 videos. Requires douyin-downloader tool — check config.douyin.enabled first. If disabled, tell the user to install douyin-downloader and enable it in config.

Step 1: Extract and resolve link

From share text like 7.94 复制打开抖音...https://v.douyin.com/xxxxx/ DUL:/..., extract the v.douyin.com short link.

Resolve to full video ID:

CODEBLOCK10

Construct: INLINECODE101
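Pulling the short link out of share text, and the numeric ID out of the resolved URL, might be done as follows. Both regular expressions are assumptions about the link shapes (in particular that the resolved URL contains `/video/{id}`), and the helper names are hypothetical.

```python
import re

def extract_short_link(share_text: str):
    """Return the first v.douyin.com short link in the share text, or None."""
    m = re.search(r"https?://v\.douyin\.com/[A-Za-z0-9_-]+/?", share_text)
    return m.group(0) if m else None

def video_id_from_url(resolved_url: str):
    """Return the numeric video ID from a resolved URL, or None."""
    m = re.search(r"/video/(\d+)", resolved_url)
    return m.group(1) if m else None
```

Resolving the short link itself still requires following redirects over the network, which this sketch leaves out.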

Step 2: Download with douyin-downloader

1. Edit {config.douyin.tool_path}/config.yml: set link to the full URL
2. Run:

       cd {config.douyin.tool_path} && {config.douyin.python} run.py

3. Find files in Downloaded/: .mp4, _cover.jpg, _data.json

Step 3: Extract metadata

From _data.json:

- desc → description (title + hashtags)
- INLINECODE110 → author
- INLINECODE111 → likes
- INLINECODE112 → comments
- INLINECODE113 → shares
- create_time → Unix timestamp → convert to YYYY-MM-DD
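The timestamp conversion is a one-liner in Python. The default of UTC+8 (China Standard Time) is an assumption about which calendar day the note should show; pass a different offset if the vault uses another convention.

```python
from datetime import datetime, timedelta, timezone

def unix_to_date(ts: int, utc_offset_hours: int = 8) -> str:
    """Convert a Unix timestamp (seconds) to YYYY-MM-DD in a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ts, tz=tz).strftime("%Y-%m-%d")
```

Using an explicit offset keeps the result independent of the machine's local timezone.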

Step 4: Copy media to vault

Copy video and cover to ATTACHMENTS/.

Step 5: Generate file

Write to DOUYIN_DIR/{title}.md:

CODEBLOCK12

Notes

- Short links MUST be resolved first — douyin-downloader cannot handle them
- If download fails (0% success), cookies may be expired; tell the user to re-run the cookie fetcher:

      cd {config.douyin.tool_path} && {config.douyin.python} -m tools.cookie_fetcher --config config.yml

- create_time is a Unix timestamp: INLINECODE118

GitHub Handler

Saves GitHub repositories.

Step 1: Fetch repo metadata

CODEBLOCK14

Extract: name, full_name, description, stargazers_count, language, license.spdx_id, topics, created_at, html_url, homepage.

Step 2: Fetch README

CODEBLOCK15

Decode base64 content to get README markdown.
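Decoding could look like the sketch below. It assumes the README response is a JSON object with `content` and `encoding` fields, as the GitHub REST API returns for repository contents. `decode_readme` is a hypothetical helper name.

```python
import base64

def decode_readme(payload: dict) -> str:
    """Decode the base64 README body from a parsed API response."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')}")
    # b64decode ignores the line breaks embedded in the encoded content.
    return base64.b64decode(payload["content"]).decode("utf-8")
```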

Step 3: Summarize README

Do NOT copy the full README. Extract:

1. Core features: 3-5 bullet points, one sentence each
2. Quick start: only the most essential command or API snippet
3. Notes: limitations, dependencies, special requirements

Step 4: Generate file

Write to GITHUB_DIR/{repo-name}.md:

CODEBLOCK16

Notes

- Star count as raw number (no 1.2k formatting, so Dataview can sort it)
- Use name not full_name for filename
- If API returns 403 (rate limit), tell user to wait
- Strip emoji from filenames

Web Handler

Saves generic web pages. Fallback for URLs that don't match any platform handler.

Step 1: Extract content with defuddle

CODEBLOCK17

Extract: title, author, content (HTML), description, published, domain.

If defuddle is not installed, run npm install -g defuddle-cli first.

Step 2: Handle empty content (SPA fallback)

If content is empty or just <body></body> (client-rendered SPA), try fallback paths in order:

Path A: CDP browser (if config.web.cdp_enabled):

1. Open page: curl -s "{config.web.cdp_url}/new?url=<URL>" → get INLINECODE145
2. Scroll: INLINECODE146
3. Extract title: INLINECODE147
4. Extract body: INLINECODE148
5. Extract images: INLINECODE149
6. Extract videos: INLINECODE150
7. Close tab: INLINECODE151

Path B: WebFetch — use the WebFetch tool as fallback.

Path C: Bookmark mode — if URL is a web app/tool/dashboard or all paths fail, create a bookmark note from meta info only.
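The empty-content check and the fallback order can be sketched as follows. Treating "no visible text after removing tags" as empty is an assumption (an image-only page would also count as empty), and both function names are hypothetical.

```python
import re

def is_empty_content(html) -> bool:
    """True when the extracted content has no visible text."""
    if not html:
        return True
    # Drop tags; anything other than whitespace counts as real content.
    text = re.sub(r"<[^>]+>", "", html)
    return not text.strip()

def choose_fallback(cdp_enabled: bool, is_web_app: bool = False) -> list:
    """Return the fallback paths in the order they should be tried."""
    if is_web_app:
        return ["bookmark"]
    order = ["cdp"] if cdp_enabled else []
    return order + ["webfetch", "bookmark"]
```

Bookmark mode is always last in the list, so a note is still produced when every extraction path fails.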

Step 3: Download media

Download images and videos to ATTACHMENTS/, replace with ![[filename]].

Step 4: Generate file

Write to WEB_DIR/{title}.md:

Content page:
CODEBLOCK18

Bookmark page (tool/app/SPA):
CODEBLOCK19

Notes

- defuddle does not work on SPAs (React/Vue client-rendered); use the CDP path
- Always close CDP tabs after use
