youtube-apify-transcript

Fetch YouTube transcripts through the APIFY API. It works from cloud IPs and gets past YouTube's bot detection.

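Why APIFY?

YouTube blocks transcript requests that come from cloud IPs (AWS, GCP, and so on). APIFY sends requests through residential proxies, which reliably gets past bot detection.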

Free tier

- $5 of free credit per month (about 714 videos)
- No credit card required
- Well suited to personal use

Cost

- $0.007 per video (less than 1 cent)
- Check usage: https://console.apify.com/billing
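
The "about 714 videos" figure follows from dividing the monthly credit by the per-video price. A minimal sketch of that arithmetic, using only the two prices quoted above (the function names are illustrative, not part of the skill):

```python
PRICE_PER_VIDEO_USD = 0.007  # per-video price quoted above
FREE_CREDIT_USD = 5.00       # monthly free credit quoted above


def videos_covered(credit_usd: float, price_usd: float = PRICE_PER_VIDEO_USD) -> int:
    """Return how many whole videos a given credit pays for."""
    # Work in tenths of a cent so float rounding cannot shift the result.
    return round(credit_usd * 1000) // round(price_usd * 1000)


def batch_cost(api_calls: int, price_usd: float = PRICE_PER_VIDEO_USD) -> float:
    """Return the cost of a number of uncached fetches, rounded to 3 decimals."""
    return round(api_calls * price_usd, 3)


print(videos_covered(FREE_CREDIT_USD))  # 714
print(batch_cost(2))                    # 0.014
```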

Setup

1. Create a free APIFY account: https://apify.com/
2. Get your API token: https://console.apify.com/account/integrations
3. Set the environment variable:

bash
# Add to ~/.bashrc or ~/.zshrc
export APIFY_API_TOKEN=apify_api_YOUR_TOKEN_HERE

# Install the Python dependency
pip install requests

# Or use a .env file (never commit this file!)
echo APIFY_API_TOKEN=apify_api_YOUR_TOKEN_HERE >> .env
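
The two ways of supplying the token (shell environment or a .env file) could be resolved along these lines. This is a sketch under stated assumptions, not the script's actual code: the variable name `APIFY_API_TOKEN` is the assumed spelling, and `read_dotenv` and `load_token` are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Optional

TOKEN_VAR = "APIFY_API_TOKEN"  # assumed spelling of the variable named above


def read_dotenv(path: Path) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_token(env_file: Path = Path(".env")) -> Optional[str]:
    """Prefer the process environment, then fall back to the .env file."""
    return os.environ.get(TOKEN_VAR) or read_dotenv(env_file).get(TOKEN_VAR)
```

Checking the environment first means a token exported in the shell overrides whatever is stored in .env, which is the usual precedence for this kind of setup.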

Usage

Basic usage

bash
# Fetch the transcript as text (uses the cache by default)
python3 scripts/fetch_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"

# Short links work too
python3 scripts/fetch_transcript.py "https://youtu.be/VIDEO_ID"
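
Both URL forms above carry the same video ID in different places. A minimal sketch of how the ID could be pulled out of either form with the standard library; `extract_video_id` is a hypothetical helper, and the script's own parsing may differ:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch URL or a youtu.be short link, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short link: the ID is the first path segment.
        return parsed.path.strip("/").split("/")[0] or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        # Watch URL: the ID is the "v" query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))                 # dQw4w9WgXcQ
```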

Options

bash
# Write the output to a file
python3 scripts/fetch_transcript.py URL --output transcript.txt

# JSON format (includes timestamps)
python3 scripts/fetch_transcript.py URL --json

# Both: write JSON to a file
python3 scripts/fetch_transcript.py URL --json --output transcript.json

# Set a preferred language
python3 scripts/fetch_transcript.py URL --lang de
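
For readers who want to see how these flags combine, here is one way such a command line could be declared with `argparse`. It mirrors only the flags listed above; the help strings, metavars, and defaults are assumptions, and this is not the script's actual source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional URL and the three options shown above."""
    parser = argparse.ArgumentParser(description="Fetch a YouTube transcript")
    parser.add_argument("url", help="YouTube watch URL or youtu.be short link")
    parser.add_argument("--output", metavar="FILE", help="write to FILE instead of stdout")
    parser.add_argument("--json", action="store_true", help="emit JSON with timestamps")
    parser.add_argument("--lang", metavar="CODE", help="preferred transcript language, e.g. de")
    return parser


args = build_parser().parse_args(["URL", "--json", "--output", "transcript.json"])
print(args.json, args.output, args.lang)  # True transcript.json None
```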

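Caching (saves money)

Transcripts are cached locally by default. Repeated requests for the same video cost nothing.

bash
# First request: fetched from APIFY ($0.007)
python3 scripts/fetch_transcript.py URL

# Second request: served from the cache (free)
python3 scripts/fetch_transcript.py URL
# Output: [cached] Transcript for: VIDEO_ID

# Bypass the cache (force a fresh fetch)
python3 scripts/fetch_transcript.py URL --no-cache

# Show cache statistics
python3 scripts/fetch_transcript.py --cache-stats

# Clear all cached transcripts
python3 scripts/fetch_transcript.py --clear-cache

Cache location: .cache/ in the skill directory (override it with the YT_TRANSCRIPT_CACHE_DIR environment variable).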
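
The caching behavior described above amounts to a lookup keyed by video ID before any paid request is made. A minimal sketch, assuming one JSON file per video and the override variable spelled `YT_TRANSCRIPT_CACHE_DIR`; the file layout and the helper names are hypothetical and may not match the script's real cache format.

```python
import json
import os
from pathlib import Path
from typing import Optional

CACHE_ENV_VAR = "YT_TRANSCRIPT_CACHE_DIR"  # assumed spelling of the override variable


def cache_dir(default: Path = Path(".cache")) -> Path:
    """Return the cache directory, honoring the environment override if set."""
    return Path(os.environ.get(CACHE_ENV_VAR) or default)


def cache_get(video_id: str) -> Optional[dict]:
    """Return the cached transcript for video_id, or None on a cache miss."""
    path = cache_dir() / f"{video_id}.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def cache_put(video_id: str, transcript: dict) -> None:
    """Store a transcript so later requests for the same video cost nothing."""
    directory = cache_dir()
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{video_id}.json").write_text(
        json.dumps(transcript, ensure_ascii=False), encoding="utf-8"
    )
```

A caller would try `cache_get` first and only fall through to the paid request, followed by `cache_put`, on a miss.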

Batch mode

Process several videos in one run:

bash
# Create a file of URLs, one per line
cat > urls.txt << EOF
https://youtube.com/watch?v=VIDEO1
https://youtu.be/VIDEO2
https://youtube.com/watch?v=VIDEO3
EOF

# Process every URL
python3 scripts/fetch_transcript.py --batch urls.txt

# Output:
# [1/3] Fetching VIDEO1...
# [2/3] [cached] VIDEO2
# [3/3] Fetching VIDEO3...
# Batch complete: 2 fetched, 1 cached, 0 failed
# [Cost: ~$0.014 for 2 API call(s)]

# Batch process and write JSON to a file
python3 scripts/fetch_transcript.py --batch urls.txt --json --output all_transcripts.json
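
The summary line in the sample output (fetched, cached, failed, and the cost of the uncached calls) can be reproduced with a small tally loop. The sketch below takes the cache lookup and the fetch as injected callables, so it makes no claim about how the script talks to APIFY; `run_batch` is a hypothetical name, and charging only for successful fetches is an assumption.

```python
from typing import Callable, Iterable, Optional

PRICE_PER_VIDEO_USD = 0.007  # per-video price quoted in this document


def run_batch(
    urls: Iterable[str],
    lookup_cache: Callable[[str], Optional[dict]],
    fetch: Callable[[str], dict],
) -> dict:
    """Process URLs in order and count fetched, cached, and failed items."""
    counts = {"fetched": 0, "cached": 0, "failed": 0}
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines in the URL file
        if lookup_cache(url) is not None:
            counts["cached"] += 1
            continue
        try:
            fetch(url)
            counts["fetched"] += 1
        except Exception:
            # One bad URL should not abort the rest of the batch.
            counts["failed"] += 1
    # Assumption: only successful, uncached fetches are billed.
    counts["cost_usd"] = round(counts["fetched"] * PRICE_PER_VIDEO_USD, 3)
    return counts
```

With three URLs of which one is already cached, this yields 2 fetched, 1 cached, 0 failed, and a cost of 0.014, matching the sample output.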

Output formats

Text (default):

Hello and welcome to this video.
Today we're going to talk about...

JSON (--json):
json
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "transcript": [
    {"start": 0.0, "duration": 2.5, "text": "Hello and welcome"},
    {"start": 2.5, "duration": 3.0, "text": "to this video"}
  ],
  "full_text": "Hello and welcome to this video..."
}
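
The JSON form is the one to use when timestamps matter. As an illustration of consuming it, the sketch below renders each segment as a timestamped line. It assumes the field names shown above with standard JSON quoting; `format_timestamp` and `render_segments` are illustrative helpers, not part of the skill.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Render a start offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_segments(document: dict) -> list:
    """Return one "[MM:SS] text" line per transcript segment."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text']}"
        for seg in document["transcript"]
    ]


sample = json.loads("""
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "transcript": [
    {"start": 0.0, "duration": 2.5, "text": "Hello and welcome"},
    {"start": 2.5, "duration": 3.0, "text": "to this video"}
  ],
  "full_text": "Hello and welcome to this video..."
}
""")

for line in render_segments(sample):
    print(line)
# [00:00] Hello and welcome
# [00:02] to this video
```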

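Agent instructions

When a user asks for a summary of a YouTube video, first fetch the transcript with the script, then summarize the transcript text directly with your own model capabilities. Do not use the --summarize flag.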

Error handling

The script handles these common errors:

- Invalid YouTube URL
- Video has no transcript
- API quota exceeded
- Network errors

Metadata

yaml
metadata:
  clawdbot:
    emoji: "📹"
    requires:
      env: [APIFY_API_TOKEN]
      bins: [python3]
      python:
        packages: [requests]
