Save Article with Images

Save web articles to local storage, supporting articles with images. Automatically downloads images, generates Markdown, and converts to PDF.

Triggers

- "save article"
- "save this article"
- "download article"
- "clip article"

Quick Execution

Articles Without Images

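1. Fetch the article content (Jina Reader or browser)
2. Save it to saved-articles/{title}-{date}.md
3. Send the file to Feishu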

Articles With Images

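1. Create the directory reports/{article-name}/
2. Create the images/ subdirectory
3. Download all images into images/
4. Generate Markdown (referencing images by relative path)
5. Convert to PDF
6. Send the PDF to Feishu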

Complete Workflow

Step 1: Check if Article Has Images

Methods:

- Jina Reader returns content containing ![Image](URL) references
- Or the original webpage contains <img> tags

Decision:

- Images < 3 → Save the Markdown directly; don't download images separately
- Images ≥ 3 → Process with the image workflow
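The decision rule above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the names `extract_image_urls` and `choose_workflow` are hypothetical, and counting each distinct image URL once is an assumption, since the rule does not say how repeated images are counted.

```python
import re

# Matches Markdown image syntax: ![alt](url), optionally followed by a "title".
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)')

IMAGE_THRESHOLD = 3  # taken from the decision rule: 3 or more images


def extract_image_urls(markdown: str) -> list[str]:
    """Return image URLs in document order, each URL only once (assumption)."""
    seen = set()
    urls = []
    for match in IMAGE_PATTERN.finditer(markdown):
        url = match.group(2)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def choose_workflow(markdown: str) -> str:
    """Return "simple" for fewer than 3 images, otherwise "with-images"."""
    count = len(extract_image_urls(markdown))
    return "with-images" if count >= IMAGE_THRESHOLD else "simple"
```

The list returned by `extract_image_urls` can also serve as the input for the image download step.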

Step 2: Create Directory Structure

mkdir -p ~/.openclaw/workspace/reports/{article-name}/images/

Directory Structure:

reports/{article-name}/
├── {article-name}.md      # Markdown file
├── {article-name}.html    # HTML intermediate (optional)
├── {article-name}.pdf     # Final output (optional)
└── images/                # Image directory
    ├── image1.jpg
    ├── image2.png
    └── ...
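When the workflow is driven from Python rather than the shell, the same layout could be created with `pathlib`. The sketch below is an illustration under assumptions: `safe_name` and `create_article_dirs` are hypothetical helpers, and replacing path-unsafe characters with hyphens is a naming choice this document does not prescribe.

```python
import re
from pathlib import Path


def safe_name(title: str) -> str:
    """Replace whitespace and characters that are unsafe in file names with "-"."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "-", title).strip("-")
    return cleaned or "untitled"


def create_article_dirs(base_dir: str, article_name: str) -> Path:
    """Create reports/{article-name}/images/ under base_dir; return the article directory."""
    article_dir = Path(base_dir).expanduser() / "reports" / safe_name(article_name)
    (article_dir / "images").mkdir(parents=True, exist_ok=True)
    return article_dir
```

For example, `create_article_dirs("~/.openclaw/workspace", "My Article")` would create `reports/My-Article/images/` inside the workspace.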

Step 3: Fetch Article Content

Method A: Jina Reader (Recommended)

curl -s "https://r.jina.ai/URL"

Pros: Auto-converts to Markdown, extracts image links
Cons: Some sites blocked
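A Python equivalent of the Jina Reader request might look like the sketch below. It only assumes what this document states, namely that the article URL is appended to `https://r.jina.ai/`; the function names, the User-Agent value, and the timeout are illustrative choices.

```python
from urllib.request import Request, urlopen

JINA_PREFIX = "https://r.jina.ai/"


def jina_url(article_url: str) -> str:
    """Build the Jina Reader URL by prefixing the article URL."""
    return JINA_PREFIX + article_url.strip()


def fetch_markdown(article_url: str, timeout: float = 30.0) -> str:
    """Fetch the article as Markdown text; raises on network or HTTP errors."""
    request = Request(jina_url(article_url), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

If `fetch_markdown` raises for a site that blocks the reader, falling back to the browser method is the natural next step.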

Method B: Browser Fetch

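# Open the page
browser action=open url="URL"

# Get the content
browser action=act kind=evaluate fn="() => document.body.innerText"

# Get the images
browser action=act kind=evaluate fn="() => { const imgs = document.querySelectorAll('img'); return JSON.stringify(Array.from(imgs).map(img => ({ src: img.src, alt: img.alt }))); }"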

Step 4: Download Images

Single Image:

curl -o "images/image1.jpg" "https://example.com/image.jpg"

Batch Download (Python):

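import requests
from pathlib import Path

def download_images(image_urls, output_dir):
    """Download a list of images into output_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for i, url in enumerate(image_urls, 1):
        try:
            # Get the extension, falling back to jpg
            ext = url.split(".")[-1].split("?")[0]
            if ext not in ["jpg", "jpeg", "png", "gif", "webp"]:
                ext = "jpg"

            # Download
            resp = requests.get(url, timeout=30, headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
            })

            if resp.status_code == 200:
                filename = f"image{i}.{ext}"
                (output_dir / filename).write_bytes(resp.content)
                print(f"✅ {filename}")
            else:
                print(f"❌ HTTP {resp.status_code}: {url}")
        except Exception as e:
            print(f"❌ {e}: {url}")

# Usage
download_images(["url1", "url2"], "images/")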

Image Naming:

- Sequential: image1.jpg, image2.png, ...
- By content: cover.jpg, screenshot.png, ...
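Sequential names need a file extension, and some image URLs carry it in the query string rather than in the path (the pbs.twimg.com URLs in this document use `?format=jpg`). The sketch below shows one way to handle both cases; `image_extension` and `sequential_name` are hypothetical helpers, and defaulting to `jpg` mirrors the fallback used in the batch download script.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlparse

ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "webp"}


def image_extension(url: str, default: str = "jpg") -> str:
    """Pick an extension from the URL path, then from ?format=, else the default."""
    parsed = urlparse(url)
    ext = PurePosixPath(parsed.path).suffix.lstrip(".").lower()
    if ext in ALLOWED_EXTENSIONS:
        return ext
    fmt = parse_qs(parsed.query).get("format", [""])[0].lower()
    return fmt if fmt in ALLOWED_EXTENSIONS else default


def sequential_name(index: int, url: str) -> str:
    """Build names such as image1.jpg, image2.png."""
    return f"image{index}.{image_extension(url)}"
```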

Step 5: Generate Markdown

Template:

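{Article Title}

Source: {URL}
Author: {Author}
Published: {Date}

{Content}

Images

Figure 1: {Description}
Figure 2: {Description}

Saved: {Timestamp}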

Image Reference Format:

![Description](images/filename.ext)
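After the images are downloaded, the remote URLs in the Markdown have to be swapped for this relative format. A minimal sketch of that rewrite follows; `localize_images` and the `url_to_file` mapping (remote URL to saved file name) are assumptions made for illustration.

```python
import re

# Matches ![alt](url) and captures the alt text and the URL.
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')


def localize_images(markdown: str, url_to_file: dict[str, str]) -> str:
    """Point downloaded images at images/{filename}; leave other images untouched."""

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        filename = url_to_file.get(url)
        if filename is None:
            return match.group(0)  # not downloaded, keep the remote reference
        return f"![{alt}](images/{filename})"

    return IMAGE_PATTERN.sub(replace, markdown)
```

Leaving unknown URLs untouched means a failed download still renders from the remote source instead of becoming a broken local link.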

Step 6: Convert to PDF (Optional)

Using Preset Styles:

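# CSS file
CSS_FILE=~/.openclaw/workspace/templates/mobile-friendly.css

# Convert to HTML
pandoc {article-name}.md -o {article-name}.html --standalone --css="$CSS_FILE"

# Generate PDF
weasyprint {article-name}.html {article-name}.pdf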

PDF Configuration:

- Body: 16pt, line-height 1.8
- Page: 6×9 inches, margins 1.5cm
- Font: Noto Sans CJK SC

⚠️ Image Overflow Solution (Important)

Problem: Large images (e.g., 1200px wide) exceed the PDF page width (~432pt, i.e. 6 inches).

Solution: Create a CSS file that limits the maximum image width.

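Required CSS:

/* Prevent image overflow */
img {
  max-width: 100%;
  height: auto;
  display: block;
  margin: 1em auto;
}

/* Images in the images/ directory: 90% width */
img[src^="images/"] {
  max-width: 90%;
  margin: 0.5em auto;
}

/* Body style */
body {
  max-width: 100%;
  padding: 1cm;
}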

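Correct PDF Generation Flow:

# 1. Create the CSS file (in the article directory)
cat > style.css << 'EOF'
img { max-width: 100%; height: auto; }
img[src^="images/"] { max-width: 90%; }
EOF

# 2. Generate HTML using the CSS
pandoc {article-name}.md -o {article-name}.html --standalone --css=style.css

# 3. Generate the PDF
weasyprint {article-name}.html {article-name}.pdf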

Key Points:

- ✅ Must add max-width: 100% or max-width: 90%
- ✅ Use relative paths such as images/xxx.jpg
- ❌ Don't render images at original size (they will overflow)
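The PDF flow could also be scripted from Python. The sketch below mirrors the pandoc and weasyprint commands this document uses and writes a CSS file with the max-width limits; it assumes both tools are installed and on the PATH, and `build_commands` and `generate_pdf` are hypothetical names.

```python
import subprocess
from pathlib import Path

# Same image rules as the CSS in this section.
IMAGE_CSS = """\
img { max-width: 100%; height: auto; }
img[src^="images/"] { max-width: 90%; }
"""


def build_commands(article_name: str) -> list[list[str]]:
    """Return the pandoc and weasyprint invocations for one article."""
    md, html, pdf = (f"{article_name}.{ext}" for ext in ("md", "html", "pdf"))
    return [
        ["pandoc", md, "-o", html, "--standalone", "--css=style.css"],
        ["weasyprint", html, pdf],
    ]


def generate_pdf(article_dir: str, article_name: str) -> Path:
    """Write style.css, then run both tools inside the article directory."""
    directory = Path(article_dir)
    (directory / "style.css").write_text(IMAGE_CSS, encoding="utf-8")
    for command in build_commands(article_name):
        # check=True raises CalledProcessError if a tool exits with an error
        subprocess.run(command, cwd=directory, check=True)
    return directory / f"{article_name}.pdf"
```

Running the tools with the article directory as the working directory keeps the relative `images/` paths resolvable.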

Step 7: Send to Feishu

Send Markdown:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.md"

Send PDF:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.pdf"

Platform-Specific Handling

Source	Fetch Method	Image Handling
Twitter/X	Jina Reader	Download pbs.twimg.com images
WeChat Official Account

Twitter/X Articles

Image URL Format:

https://pbs.twimg.com/media/XXXXX?format=jpg&name=small

Download Command:

# Get best quality
curl -o "images/image1.jpg" "https://pbs.twimg.com/media/XXXXX?format=jpg&name=large"
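Switching a pbs.twimg.com URL to its large variant amounts to setting the `name` query parameter, as in the download command above. A small sketch using only the standard library is shown below; `twitter_large_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def twitter_large_url(url: str) -> str:
    """Return the same image URL with name=large set in the query string."""
    parsed = urlparse(url)
    params = dict(parse_qsl(parsed.query))
    params["name"] = "large"  # overwrites name=small, or adds the parameter
    return urlunparse(parsed._replace(query=urlencode(params)))
```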

WeChat Official Account Articles

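Problem: WeChat uses anti-hotlinking protection, so direct image downloads fail.

Solutions:

1. Use a browser to open the article
2. Save a screenshot
3. Or use the Camoufox tool

# Use the agent-reach tool
cd ~/.agent-reach/tools/wechat-article-for-ai
python3 main.py "https://mp.weixin.qq.com/s/ARTICLE_ID"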

Checklist

After saving, verify:

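□ Markdown file generated
□ All images downloaded successfully
□ Image relative paths are correct
□ Images display correctly (local preview)
□ PDF generated successfully (optional)
□ File sent to Feishu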
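Part of this verification can be automated. The sketch below checks that the Markdown file exists and that every `images/...` reference in it points to a non-empty file; `verify_article` is a hypothetical helper, and it does not cover the preview, PDF, or Feishu checks.

```python
import re
from pathlib import Path

# Captures the relative path of images referenced as ![alt](images/...).
LOCAL_IMAGE = re.compile(r'!\[[^\]]*\]\((images/[^)\s]+)\)')


def verify_article(article_dir: str, article_name: str) -> list[str]:
    """Return a list of problems; an empty list means the file checks passed."""
    directory = Path(article_dir)
    md_path = directory / f"{article_name}.md"
    if not md_path.is_file():
        return [f"missing Markdown file: {md_path.name}"]

    problems = []
    text = md_path.read_text(encoding="utf-8")
    for rel in LOCAL_IMAGE.findall(text):
        target = directory / rel
        if not target.is_file():
            problems.append(f"missing image: {rel}")
        elif target.stat().st_size == 0:
            problems.append(f"empty image: {rel}")
    return problems
```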

Error Handling

Error	Cause	Solution
Image download failed	Anti-hotlinking/network issues	Use a browser or lower the quality
PDF generation failed	Missing fonts/dependencies	Check weasyprint

File Locations

Type	Directory
Simple articles	saved-articles/
Articles with images	reports/{article-name}/
Temporary files	/tmp/article-{id}/
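For simple articles, the quick-execution steps name the target file saved-articles/{title}-{date}.md. A sketch of building that path is shown below; the ISO date format (YYYY-MM-DD) is an assumption, since this document does not specify one, and `simple_article_path` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def simple_article_path(title: str, saved_on: date) -> Path:
    """Build saved-articles/{title}-{date}.md with an ISO-formatted date (assumption)."""
    return Path("saved-articles") / f"{title}-{saved_on.isoformat()}.md"
```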

Skill Version: 1.0.0
Created: 2026-03-17

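Save Article with Images

Save web articles to local storage, including articles with images. Automatically downloads images, generates Markdown, and converts to PDF.

Triggers

- 保存文章 (save article)
- 保存这篇文章 (save this article)
- 下载文章 (download article)
- 剪藏文章 (clip article)

Quick Execution

Articles Without Images

1. Fetch the article content (Jina Reader or browser)
2. Save it to saved-articles/{title}-{date}.md
3. Send the file to Feishu

Articles With Images

1. Create the directory reports/{article-name}/
2. Create the images/ subdirectory
3. Download all images into images/
4. Generate Markdown (referencing images by relative path)
5. Convert to PDF
6. Send the PDF to Feishu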

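Complete Workflow

Step 1: Check if Article Has Images

Methods:

- Jina Reader returns content containing ![Image](URL) references
- Or the original webpage contains <img> tags

Decision:

- Images < 3 → Save the Markdown directly; don't download images separately
- Images ≥ 3 → Process with the image workflow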

Step 2: Create Directory Structure

mkdir -p ~/.openclaw/workspace/reports/{article-name}/images/

Directory Structure:

reports/{article-name}/
├── {article-name}.md      # Markdown file
├── {article-name}.html    # HTML intermediate (optional)
├── {article-name}.pdf     # Final output (optional)
└── images/                # Image directory
    ├── image1.jpg
    ├── image2.png
    └── ...

Step 3: Fetch Article Content

Method A: Jina Reader (Recommended)

curl -s "https://r.jina.ai/URL"

Pros: Auto-converts to Markdown, extracts image links
Cons: Some sites blocked

Method B: Browser Fetch

# Open the page
browser action=open url="URL"

# Get the content
browser action=act kind=evaluate fn="() => document.body.innerText"

# Get the images
browser action=act kind=evaluate fn="() => { const imgs = document.querySelectorAll('img'); return JSON.stringify(Array.from(imgs).map(img => ({ src: img.src, alt: img.alt }))); }"

Step 4: Download Images

Single Image:

curl -o "images/image1.jpg" "https://example.com/image.jpg"

Batch Download (Python):

import requests
from pathlib import Path

def download_images(image_urls, output_dir):
    """Download a list of images into output_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for i, url in enumerate(image_urls, 1):
        try:
            # Get the extension, falling back to jpg
            ext = url.split(".")[-1].split("?")[0]
            if ext not in ["jpg", "jpeg", "png", "gif", "webp"]:
                ext = "jpg"

            # Download
            resp = requests.get(url, timeout=30, headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
            })

            if resp.status_code == 200:
                filename = f"image{i}.{ext}"
                (output_dir / filename).write_bytes(resp.content)
                print(f"✅ {filename}")
            else:
                print(f"❌ HTTP {resp.status_code}: {url}")
        except Exception as e:
            print(f"❌ {e}: {url}")

# Usage
download_images(["url1", "url2"], "images/")

Image Naming:

- Sequential: image1.jpg, image2.png, ...
- By content: cover.jpg, screenshot.png, ...

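Step 5: Generate Markdown

Template:

{Article Title}

Source: {URL}
Author: {Author}
Published: {Date}

{Content}

Images

Figure 1: {Description}
Figure 2: {Description}

Saved: {Timestamp}

Image Reference Format:

![Description](images/filename.ext)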

Step 6: Convert to PDF (Optional)

Using Preset Styles:

# CSS file
CSS_FILE=~/.openclaw/workspace/templates/mobile-friendly.css

# Convert to HTML
pandoc {article-name}.md -o {article-name}.html --standalone --css="$CSS_FILE"

# Generate PDF
weasyprint {article-name}.html {article-name}.pdf

PDF Configuration:

- Body: 16pt, line-height 1.8
- Page: 6×9 inches, margins 1.5cm
- Font: Noto Sans CJK SC

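⚠️ Image Overflow Solution (Important)

Problem: Large images (e.g., 1200px wide) exceed the PDF page width (~432pt, i.e. 6 inches).

Solution: Create a CSS file that limits the maximum image width.

Required CSS:

/* Prevent image overflow */
img {
  max-width: 100%;
  height: auto;
  display: block;
  margin: 1em auto;
}

/* Images in the images/ directory: 90% width */
img[src^="images/"] {
  max-width: 90%;
  margin: 0.5em auto;
}

/* Body style */
body {
  max-width: 100%;
  padding: 1cm;
}

Correct PDF Generation Flow:

# 1. Create the CSS file (in the article directory)
cat > style.css << 'EOF'
img { max-width: 100%; height: auto; }
img[src^="images/"] { max-width: 90%; }
EOF

# 2. Generate HTML using the CSS
pandoc {article-name}.md -o {article-name}.html --standalone --css=style.css

# 3. Generate the PDF
weasyprint {article-name}.html {article-name}.pdf

Key Points:

- ✅ Must add max-width: 100% or max-width: 90%
- ✅ Use relative paths such as images/xxx.jpg
- ❌ Don't render images at original size (they will overflow)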

Step 7: Send to Feishu

Send Markdown:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.md"

Send PDF:

message action=send channel=feishu target="user:ou_xxx" filePath="path/to/file.pdf"

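Platform-Specific Handling

Source	Fetch Method	Image Handling
Twitter/X	Jina Reader	Download pbs.twimg.com images
WeChat Official Account

Twitter/X Articles

Image URL Format:

https://pbs.twimg.com/media/XXXXX?format=jpg&name=small

Download Command:

# Get best quality
curl -o "images/image1.jpg" "https://pbs.twimg.com/media/XXXXX?format=jpg&name=large"

WeChat Official Account Articles

Problem: WeChat uses anti-hotlinking protection, so direct image downloads fail.

Solutions:

1. Use a browser to open the article
2. Save a screenshot
3. Or use the Camoufox tool

# Use the agent-reach tool
cd ~/.agent-reach/tools/wechat-article-for-ai
python3 main.py "https://mp.weixin.qq.com/s/ARTICLE_ID"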

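Checklist

After saving, verify:

□ Markdown file generated
□ All images downloaded successfully
□ Image relative paths are correct
□ Images display correctly (local preview)
□ PDF generated successfully (optional)
□ File sent to Feishu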

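Error Handling

Error	Cause	Solution
Image download failed	Anti-hotlinking/network issues	Use a browser or lower the quality
PDF generation failed	Missing fonts/dependencies	Check weasyprint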
