Gmail Link Archiver
Archive web content from your email links. This skill connects to Gmail via IMAP, filters emails by a subject prefix keyword, crawls every link using Playwright (headless Chromium), converts pages to Markdown, and saves them to your OpenClaw workspace.
Quick Start
1. Install dependencies (one-time)
```bash
bash references/setup.sh
```
This automatically installs:
- playwright (Python) + Chromium browser binary
- `html2text` for HTML→Markdown conversion
2. First run — interactive setup
```bash
python3 references/gmaillinkarchiver.py
```
The first run will prompt you for:
| Setting | Description | Default |
|---|---|---|
| IMAP server | Gmail IMAP host | `imap.gmail.com` |
| IMAP port | SSL port | `993` |
| Gmail address | Your full email address | (none) |
| App password | Gmail App Password (NOT your regular password) | (none) |
| Default mailbox | IMAP folder to search | `INBOX` |
| Subject prefix | Filter emails whose subject starts with this | (none) |
| Workspace path | Where to save Markdown files | `~/openclaw-workspace/mail-archive` |
Credentials are saved locally to `~/.config/gmail-link-archiver/config.json` with `0600` permissions. They are never logged and are sent only to the IMAP server.
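As an illustration of the permission scheme described here, the following is a minimal sketch of how a config file could be written with `0600` permissions inside a `0700` directory. The function name `save_config` and the config keys are hypothetical; the script's actual implementation is not shown in this document and may differ.

```python
import json
import os
from pathlib import Path


def save_config(config: dict, path: Path) -> None:
    """Write config as JSON, readable and writable by the owner only."""
    # Assumption: the config directory is dedicated to this tool.
    path.parent.mkdir(parents=True, exist_ok=True)
    os.chmod(path.parent, 0o700)
    # Create the file with mode 0600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    # Also tighten the mode if the file already existed with looser bits.
    os.chmod(path, 0o600)
```

Passing the mode to `os.open` rather than calling `chmod` after a plain `open()` avoids a short window in which the file exists with default permissions.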
Gmail App Password: Generate an App Password at https://myaccount.google.com/apppasswords (requires 2-Step Verification to be enabled).
3. Subsequent runs
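After setup, later runs read credentials from the saved config:
```bash
# Use saved config defaults
python3 references/gmaillinkarchiver.py

# Override mailbox and prefix for one run
python3 references/gmaillinkarchiver.py --mailbox INBOX --subject-prefix "[Newsletter]"

# Save to a different workspace
python3 references/gmaillinkarchiver.py --workspace ~/my-archive

# Limit the number of links crawled
python3 references/gmaillinkarchiver.py --max-links 10

# Re-run the setup wizard
python3 references/gmaillinkarchiver.py --reconfigure
```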
How It Works
1. Connect: Authenticates to Gmail over IMAP with SSL
2. Filter: Searches the specified mailbox for emails matching the subject prefix
3. Extract: Parses email bodies (HTML + plain text) to find HTTP/HTTPS links
4. Crawl: Opens each link in headless Chromium via Playwright, which renders JavaScript and helps with pages that block simple HTTP clients
5. Convert: Transforms the crawled HTML into clean Markdown with metadata headers
6. Save: Writes each Markdown file to the workspace directory
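To make the Extract step concrete, here is a minimal standard-library sketch that collects HTTP/HTTPS links from an HTML body and a plain-text body and removes duplicates. The names `HrefCollector`, `extract_links`, and `URL_RE` are hypothetical, and the regular expression is a simplification; the actual script may extract links differently.

```python
import re
from html.parser import HTMLParser
from typing import List

# Assumption: a URL ends at whitespace or a common closing delimiter.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


class HrefCollector(HTMLParser):
    """Collects the http(s) targets of <a href="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.hrefs.append(value)


def extract_links(html_body: str = "", text_body: str = "") -> List[str]:
    """Return unique links in first-seen order, HTML anchors first."""
    collector = HrefCollector()
    collector.feed(html_body)
    collector.close()
    links: List[str] = []
    seen = set()
    for url in collector.hrefs + URL_RE.findall(text_body):
        url = url.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Non-HTTP targets such as `mailto:` links are skipped, which matches the stated focus on HTTP/HTTPS links.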
Pipeline Diagram
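```
Gmail IMAP ──► Filter by subject ──► Extract links
                                          │
                                          ▼
                          Playwright + Chromium (headless)
                                          │
                                          ▼
                            HTML → Markdown (html2text)
                                          │
                                          ▼
                            Save to OpenClaw workspace
```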
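The first two stages of the pipeline (connect and filter) can be sketched with Python's standard `imaplib` module. This is an assumption about how such a step might look, not the script's actual code; `quote_imap` and `find_candidate_ids` are hypothetical names, and the sketch handles ASCII prefixes only.

```python
import imaplib
from typing import List


def quote_imap(value: str) -> str:
    """Wrap a value as an IMAP quoted string, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def find_candidate_ids(host: str, port: int, user: str, app_password: str,
                       mailbox: str, prefix: str) -> List[bytes]:
    """Return message sequence numbers whose subject contains the prefix."""
    with imaplib.IMAP4_SSL(host, port) as conn:
        conn.login(user, app_password)
        # Mailbox names with spaces, e.g. "[Gmail]/All Mail", must be quoted.
        status, _ = conn.select(quote_imap(mailbox), readonly=True)
        if status != "OK":
            raise RuntimeError(f"cannot open mailbox {mailbox!r}")
        # SEARCH SUBJECT is a substring match on the server side.
        status, data = conn.search(None, "SUBJECT", quote_imap(prefix))
        if status != "OK" or not data or not data[0]:
            return []
        return data[0].split()
```

Opening the mailbox with `readonly=True` keeps the search from marking messages as read.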
CLI Reference
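```
usage: gmaillinkarchiver.py [-h] [--mailbox MAILBOX]
                            [--subject-prefix PREFIX]
                            [--workspace PATH]
                            [--max-links N]
                            [--reconfigure]

Options:
  --mailbox, -m          IMAP mailbox to search (default: from config)
  --subject-prefix, -s   Subject prefix to filter emails
  --workspace, -w        Directory to save Markdown files
  --max-links            Maximum number of links to crawl (default: 50)
  --reconfigure          Re-run the setup wizard
```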
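For readers who want to see how these options could be wired up, the following `argparse` sketch mirrors the flags listed in the CLI reference and shows one way command-line values might override saved config values for a single run. It is an illustration only: `build_parser`, `merge_with_config`, and the config key names are hypothetical and not taken from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="gmaillinkarchiver.py")
    parser.add_argument("--mailbox", "-m",
                        help="IMAP mailbox to search (default: from config)")
    parser.add_argument("--subject-prefix", "-s", metavar="PREFIX",
                        help="Subject prefix to filter emails")
    parser.add_argument("--workspace", "-w", metavar="PATH",
                        help="Directory to save Markdown files")
    parser.add_argument("--max-links", type=int, default=50, metavar="N",
                        help="Maximum number of links to crawl")
    parser.add_argument("--reconfigure", action="store_true",
                        help="Re-run the setup wizard")
    return parser


def merge_with_config(args: argparse.Namespace, config: dict) -> dict:
    """Command-line values win over saved config values when provided."""
    merged = dict(config)
    for key in ("mailbox", "subject_prefix", "workspace"):
        value = getattr(args, key)
        if value is not None:
            merged[key] = value
    merged["max_links"] = args.max_links
    return merged
```

Leaving the string options without defaults lets `None` signal "not given", so the saved config is only overridden when a flag is actually passed.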
Output Format
Each crawled page is saved as a Markdown file with YAML frontmatter:
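```markdown
---
source: https://example.com/article
crawled_at: 2026-03-27T12:00:00Z
---

# Article Title

Article content converted to clean Markdown...
```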
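A file in this shape could be assembled along the following lines. This is a sketch under the assumption that the frontmatter holds only the `source` and `crawled_at` fields named in this document; `render_page` is a hypothetical helper, not the script's own function.

```python
from datetime import datetime, timezone
from typing import Optional


def render_page(source_url: str, title: str, body_markdown: str,
                crawled_at: Optional[datetime] = None) -> str:
    """Return Markdown text with a YAML frontmatter header."""
    # Default to the current time; always express the timestamp in UTC.
    when = (crawled_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "---\n"
        f"source: {source_url}\n"
        f"crawled_at: {stamp}\n"
        "---\n\n"
        f"# {title}\n\n"
        f"{body_markdown.strip()}\n"
    )
```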
Files are named using a sanitized version of the URL plus a short hash for uniqueness.
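One plausible way to build such a name is shown below. The exact sanitizing rules, hash function, and hash length used by the script are not documented here, so the choices in this sketch (lowercase slug, SHA-256, eight hex characters) are assumptions, and `filename_for` is a hypothetical name.

```python
import hashlib
import re
from urllib.parse import urlparse


def filename_for(url: str, max_len: int = 80) -> str:
    """Build a filesystem-safe name from a URL plus a short hash."""
    parsed = urlparse(url)
    base = f"{parsed.netloc}{parsed.path}"
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", base).strip("-").lower()[:max_len]
    # Hash the full URL so pages differing only by query string stay distinct.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return f"{slug or 'page'}-{digest}.md"
```

Hashing the full URL rather than the slug keeps two links such as `?id=1` and `?id=2` from overwriting each other.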
Example Usage with Claude
Ask Claude to run the archiver:
"Run the Gmail Link Archiver to crawl links from my emails with subject starting with '[ReadLater]'"
Claude will execute:
```bash
python3 references/gmaillinkarchiver.py --subject-prefix "[ReadLater]"
```
Or to set up fresh:
"Set up the Gmail Link Archiver with my credentials"
```bash
python3 references/gmaillinkarchiver.py --reconfigure
```
Troubleshooting
"App password" rejected?
- Ensure 2-Step Verification is enabled on your Google account
- Generate a new App Password at https://myaccount.google.com/apppasswords
- Use the 16-character password without spaces
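App Passwords are often copied with the spaces that separate their character groups. A setup wizard could normalize the input before saving it, as in this small sketch; `normalize_app_password` is a hypothetical helper, and whether the script performs this step is not stated here.

```python
def normalize_app_password(raw: str) -> str:
    """Remove all whitespace and check for the expected 16 characters."""
    compact = "".join(raw.split())
    if len(compact) != 16:
        raise ValueError(
            f"expected 16 characters after removing spaces, got {len(compact)}"
        )
    return compact
```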
Playwright/Chromium issues?
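```bash
# Reinstall Chromium
python3 -m playwright install chromium

# Install system dependencies (Linux)
sudo python3 -m playwright install-deps chromium
```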
No emails found?
- Check the mailbox name (use `INBOX`, `[Gmail]/All Mail`, etc.)
- Verify the subject prefix matches exactly (case-sensitive)
- Try a broader prefix
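When debugging a prefix that does not match, it can help to reproduce the comparison by hand. The IMAP `SEARCH SUBJECT` key is specified as a case-insensitive substring match, so a strict, case-sensitive prefix test has to happen on the client after decoding the header. The sketch below shows one way to do that with the standard library; `decode_subject` and `subject_has_prefix` are hypothetical names, and the script's own check may differ.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject: str) -> str:
    """Decode RFC 2047 encoded words (e.g. =?utf-8?b?...?=) to text."""
    return str(make_header(decode_header(raw_subject)))


def subject_has_prefix(raw_subject: str, prefix: str) -> bool:
    """Case-sensitive check that the decoded subject starts with prefix."""
    return decode_subject(raw_subject).startswith(prefix)
```

For example, a subject of `Re: [ReadLater] article` contains the prefix but does not start with it, so it would be rejected by this check.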
Permission denied on config file?
```bash
chmod 600 ~/.config/gmail-link-archiver/config.json
```
Security
- Credentials are stored locally at `~/.config/gmail-link-archiver/config.json`
- File permissions are set to `0600` (owner read/write only)
- Credentials are never transmitted anywhere except to the IMAP server
- Credentials are never logged or printed to stdout
- Use Gmail App Passwords (not your main Google password)
- The config directory has `0700` permissions
Requirements
- Python 3.8+
- Linux (Ubuntu/Debian) for MVP
- Gmail account with IMAP enabled and App Password
- Internet connection for IMAP and web crawling
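Gmail Link Archiver
Archive web content from the links in your email. This skill connects to Gmail over IMAP, filters emails by a subject prefix keyword, crawls every link with Playwright (headless Chromium), converts the pages to Markdown, and saves them to your OpenClaw workspace.
Quick Start
1. Install dependencies (one-time)
```bash
bash references/setup.sh
```
This automatically installs:
- playwright (Python) + Chromium browser binary
- `html2text` for HTML→Markdown conversion
2. First run: interactive setup
```bash
python3 references/gmaillinkarchiver.py
```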
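The first run will prompt you for:
| Setting | Description | Default |
|---|---|---|
| IMAP server | Gmail IMAP host | `imap.gmail.com` |
| IMAP port | SSL port | `993` |
| Gmail address | Your full email address | (none) |
| App password | Gmail App Password (not your regular password) | (none) |
| Default mailbox | IMAP folder to search | `INBOX` |
| Subject prefix | Filter emails whose subject starts with this | (none) |
| Workspace path | Where to save Markdown files | `~/openclaw-workspace/mail-archive` |
Credentials are saved locally to `~/.config/gmail-link-archiver/config.json` with `0600` permissions. They are never logged and are sent only to the IMAP server.
Gmail App Password: Generate an App Password at https://myaccount.google.com/apppasswords (requires 2-Step Verification to be enabled).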
3. Subsequent runs
After setup, later runs read credentials from the saved config:
```bash
# Use saved config defaults
python3 references/gmaillinkarchiver.py

# Override mailbox and prefix for one run
python3 references/gmaillinkarchiver.py --mailbox INBOX --subject-prefix "[Newsletter]"

# Save to a different workspace
python3 references/gmaillinkarchiver.py --workspace ~/my-archive

# Limit the number of links crawled
python3 references/gmaillinkarchiver.py --max-links 10

# Re-run the setup wizard
python3 references/gmaillinkarchiver.py --reconfigure
```
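How It Works
1. Connect: Authenticates to Gmail over IMAP with SSL
2. Filter: Searches the specified mailbox for emails matching the subject prefix
3. Extract: Parses email bodies (HTML + plain text) to find HTTP/HTTPS links
4. Crawl: Opens each link in headless Chromium via Playwright, which renders JavaScript and helps with pages that block simple HTTP clients
5. Convert: Transforms the crawled HTML into clean Markdown with metadata headers
6. Save: Writes each Markdown file to the workspace directory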
Pipeline Diagram
```
Gmail IMAP ──► Filter by subject ──► Extract links
                                          │
                                          ▼
                          Playwright + Chromium (headless)
                                          │
                                          ▼
                            HTML → Markdown (html2text)
                                          │
                                          ▼
                            Save to OpenClaw workspace
```
CLI Reference
```
usage: gmaillinkarchiver.py [-h] [--mailbox MAILBOX]
                            [--subject-prefix PREFIX]
                            [--workspace PATH]
                            [--max-links N]
                            [--reconfigure]

Options:
  --mailbox, -m          IMAP mailbox to search (default: from config)
  --subject-prefix, -s   Subject prefix to filter emails
  --workspace, -w        Directory to save Markdown files
  --max-links            Maximum number of links to crawl (default: 50)
  --reconfigure          Re-run the setup wizard
```
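Output Format
Each crawled page is saved as a Markdown file with YAML frontmatter:
```markdown
---
source: https://example.com/article
crawled_at: 2026-03-27T12:00:00Z
---

# Article Title

Article content converted to clean Markdown...
```
Files are named using a sanitized version of the URL plus a short hash for uniqueness.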
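Example Usage with Claude
Ask Claude to run the archiver:
"Run the Gmail Link Archiver to crawl links from my emails with subject starting with '[ReadLater]'"
Claude will execute:
```bash
python3 references/gmaillinkarchiver.py --subject-prefix "[ReadLater]"
```
Or to set up fresh:
"Set up the Gmail Link Archiver with my credentials"
```bash
python3 references/gmaillinkarchiver.py --reconfigure
```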
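Troubleshooting
App Password rejected?
- Ensure 2-Step Verification is enabled on your Google account
- Generate a new App Password at https://myaccount.google.com/apppasswords
- Use the 16-character password without spaces
Playwright/Chromium issues?
```bash
# Reinstall Chromium
python3 -m playwright install chromium

# Install system dependencies (Linux)
sudo python3 -m playwright install-deps chromium
```
No emails found?
- Check the mailbox name (use `INBOX`, `[Gmail]/All Mail`, etc.)
- Verify the subject prefix matches exactly (case-sensitive)
- Try a broader prefix
Permission denied on config file?
```bash
chmod 600 ~/.config/gmail-link-archiver/config.json
```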
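Security
- Credentials are stored locally at `~/.config/gmail-link-archiver/config.json`
- File permissions are set to `0600` (owner read/write only)
- Credentials are never transmitted anywhere except to the IMAP server
- Credentials are never logged or printed to stdout
- Use Gmail App Passwords (not your main Google password)
- The config directory has `0700` permissions
Requirements
- Python 3.8+
- Linux (Ubuntu/Debian) for MVP
- Gmail account with IMAP enabled and an App Password
- Internet connection for IMAP and web crawling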