youtube-apify-transcript
Fetch YouTube transcripts via the APIFY API (works from cloud IPs and bypasses YouTube bot detection).
Why APIFY?
YouTube blocks transcript requests from cloud IPs (AWS, GCP, etc.). APIFY runs the request through residential proxies, bypassing bot detection reliably.
Free Tier
- $5/month free credits (~714 videos)
- No credit card required
- Perfect for personal use
Cost
- $0.007 per video (less than 1 cent!)
- Track usage at: https://console.apify.com/billing
Links
Setup
1. Create a free APIFY account: https://apify.com/
2. Get your API token: https://console.apify.com/account/integrations
3. Set the environment variable:
CODEBLOCK0
Usage
Basic Usage
CODEBLOCK1
Options
CODEBLOCK2
Caching (saves money!)
Transcripts are cached locally by default. Repeat requests for the same video cost $0.
CODEBLOCK3
Cache location: .cache/ in the skill directory (override with the YT_TRANSCRIPT_CACHE_DIR environment variable)
Batch Mode
Process multiple videos at once:
CODEBLOCK4
Output Formats
Text (default):
CODEBLOCK5
JSON (--json):
CODEBLOCK6
Agent Instructions
When the user asks for a summary of a YouTube video, first fetch the transcript with the script, then summarize the transcript text directly using your own model capabilities. Do NOT use the --summarize flag.
Error Handling
The script handles common errors:
- Invalid YouTube URL
- Video has no transcript
- API quota exceeded
- Network errors
Metadata
CODEBLOCK7
youtube-apify-transcript
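Fetch YouTube transcripts through the APIFY API (works from cloud IPs and gets past YouTube's bot detection).
Why APIFY?
YouTube blocks transcript requests from cloud IPs (AWS, GCP, and so on). APIFY sends the request through residential proxies, which reliably gets around bot detection.
Free Tier
- $5 of free credits per month (about 714 videos)
- No credit card required
- Well suited to personal use
Cost
- $0.007 per video (less than 1 cent!)
- Check usage at: https://console.apify.com/billing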
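The "about 714 videos" figure follows from dividing the monthly credit by the per-video price. A minimal sketch of that arithmetic, using only the prices quoted here; the function name `videos_per_month` is hypothetical and not part of the skill:

```python
from decimal import Decimal

# Prices as quoted in this README; adjust them if APIFY pricing changes.
FREE_CREDIT_USD = Decimal("5")
COST_PER_VIDEO_USD = Decimal("0.007")


def videos_per_month(credit: Decimal = FREE_CREDIT_USD,
                     cost: Decimal = COST_PER_VIDEO_USD) -> int:
    """Return how many whole videos the given credit covers."""
    return int(credit // cost)


print(videos_per_month())  # 714
```

Decimal is used instead of float so the division is not affected by binary rounding.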
Links
Setup
1. Create a free APIFY account: https://apify.com/
2. Get your API token: https://console.apify.com/account/integrations
3. Set the environment variable:
```bash
# Add to ~/.bashrc or ~/.zshrc
export APIFY_API_TOKEN="apify_api_YOUR_TOKEN_HERE"

# Install the Python dependency
pip install requests

# Or use a .env file (never commit this file!)
echo 'APIFY_API_TOKEN=apify_api_YOUR_TOKEN_HERE' >> .env
```
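Before spending time on a fetch, it can help to confirm that the token is actually visible to Python. The following is a small sketch, not code from the skill itself; `get_apify_token` is a hypothetical helper, and only the variable name `APIFY_API_TOKEN` comes from this document:

```python
import os
import sys


def get_apify_token(env=os.environ) -> str:
    """Return the APIFY token from the environment, or raise if it is missing."""
    token = env.get("APIFY_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("APIFY_API_TOKEN is not set; see the Setup steps.")
    return token


if __name__ == "__main__":
    try:
        get_apify_token()
        print("APIFY_API_TOKEN is set")
    except RuntimeError as exc:
        print(exc, file=sys.stderr)
```

The check only reads the process environment. It does not parse a .env file, so a token stored there must be exported by whatever tool loads that file.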
Usage
Basic Usage
```bash
# Fetch the transcript as text (uses the cache by default)
python3 scripts/fetch_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"

# Short links work too
python3 scripts/fetch_transcript.py "https://youtu.be/VIDEO_ID"
```
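Both URL forms above point at the same video ID. How the script parses them is not shown in this document, so the following is only an illustrative sketch of one way to do it with the standard library; `extract_video_id` is a hypothetical name:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch URL or a youtu.be short link, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short link: the ID is the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Watch URL: the ID is the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))                 # dQw4w9WgXcQ
```

Other URL shapes (embeds, shorts, playlists) are deliberately left out of this sketch and return None.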
Options
```bash
# Write the output to a file
python3 scripts/fetch_transcript.py URL --output transcript.txt

# JSON format (includes timestamps)
python3 scripts/fetch_transcript.py URL --json

# Combine both: write JSON to a file
python3 scripts/fetch_transcript.py URL --json --output transcript.json

# Set a preferred language
python3 scripts/fetch_transcript.py URL --lang de
```
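Caching (saves money!)
Transcripts are cached locally by default. Repeat requests for the same video cost nothing.
```bash
# First request: fetched from APIFY ($0.007)
python3 scripts/fetch_transcript.py URL

# Second request: served from the cache (free!)
python3 scripts/fetch_transcript.py URL
# Output: [cached] Transcript for: VIDEO_ID

# Bypass the cache (force a fresh fetch)
python3 scripts/fetch_transcript.py URL --no-cache

# Show cache statistics
python3 scripts/fetch_transcript.py --cache-stats

# Clear all cached transcripts
python3 scripts/fetch_transcript.py --clear-cache
```
Cache location: .cache/ in the skill directory (override with the YT_TRANSCRIPT_CACHE_DIR environment variable)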
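The caching behavior described above amounts to a read-through cache keyed by video ID. The sketch below illustrates the idea under stated assumptions: the one-JSON-file-per-video layout, the `get_transcript` helper, and the `fetch` callable are all hypothetical, and the fallback path here is relative to the working directory rather than the skill directory. Only the `YT_TRANSCRIPT_CACHE_DIR` variable comes from this document.

```python
import json
import os
from pathlib import Path


def cache_dir() -> Path:
    """Resolve the cache directory, honoring YT_TRANSCRIPT_CACHE_DIR if set."""
    return Path(os.environ.get("YT_TRANSCRIPT_CACHE_DIR", ".cache"))


def get_transcript(video_id, fetch, use_cache=True):
    """Return (transcript, was_cached).

    `fetch` is any callable that takes a video ID and returns the transcript
    as a JSON-serializable object; it stands in for the paid API call.
    """
    path = cache_dir() / f"{video_id}.json"  # assumed layout: one file per video
    if use_cache and path.exists():
        return json.loads(path.read_text(encoding="utf-8")), True

    data = fetch(video_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data, False
```

Passing `use_cache=False` mirrors the effect of `--no-cache`: the stored copy is ignored, a fresh fetch is made, and the cache entry is overwritten.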
Batch Mode
Process multiple videos at once:
```bash
# Create a file of URLs (one per line)
cat > urls.txt << EOF
https://youtube.com/watch?v=VIDEO1
https://youtu.be/VIDEO2
https://youtube.com/watch?v=VIDEO3
EOF

# Process all URLs
python3 scripts/fetch_transcript.py --batch urls.txt

# Output:
# [1/3] Fetching VIDEO1...
# [2/3] [cached] VIDEO2
# [3/3] Fetching VIDEO3...
# Batch complete: 2 fetched, 1 cached, 0 failed
# [Cost: ~$0.014 for 2 API call(s)]

# Batch-process and write JSON to a file
python3 scripts/fetch_transcript.py --batch urls.txt --json --output all_transcripts.json
```
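The summary lines in the batch output can be reproduced from per-URL results. This is a sketch of that bookkeeping, not the script's actual code: `read_urls` and `summarize_batch` are hypothetical helpers, and it assumes that only successful fetches are billed while cached and failed entries cost nothing.

```python
COST_PER_VIDEO_USD = 0.007  # price quoted in this README


def read_urls(text: str):
    """Return the non-empty, stripped lines of a URL list (one URL per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def summarize_batch(statuses):
    """Build the two summary lines from per-URL statuses.

    Each status is one of "fetched", "cached", or "failed".
    """
    fetched = statuses.count("fetched")
    cached = statuses.count("cached")
    failed = statuses.count("failed")
    cost = fetched * COST_PER_VIDEO_USD  # assumption: only fetched videos are billed
    return (
        f"Batch complete: {fetched} fetched, {cached} cached, {failed} failed",
        f"[Cost: ~${cost:.3f} for {fetched} API call(s)]",
    )


for line in summarize_batch(["fetched", "cached", "fetched"]):
    print(line)
```

With the three-URL example above (two fetched, one cached) this prints the same two summary lines shown in the batch output.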
Output Formats
Text (default):
```
Hello and welcome to this video.
Today we're going to talk about...
```
JSON (--json):
```json
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "transcript": [
    {"start": 0.0, "duration": 2.5, "text": "Hello and welcome"},
    {"start": 2.5, "duration": 3.0, "text": "to this video"}
  ],
  "full_text": "Hello and welcome to this video..."
}
```
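The JSON form is convenient when timestamps matter. As an illustration of consuming it, the sketch below turns each transcript segment into a timestamped line. The field names come from the sample above; `format_timestamp` and `timestamped_lines` are hypothetical helpers.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format a start offset in seconds as M:SS, dropping fractions."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


def timestamped_lines(doc: dict):
    """Return one "[M:SS] text" line per transcript segment."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text']}"
        for seg in doc["transcript"]
    ]


sample = json.loads("""
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "transcript": [
    {"start": 0.0, "duration": 2.5, "text": "Hello and welcome"},
    {"start": 2.5, "duration": 3.0, "text": "to this video"}
  ],
  "full_text": "Hello and welcome to this video..."
}
""")

for line in timestamped_lines(sample):
    print(line)
# [0:00] Hello and welcome
# [0:02] to this video
```

For a file written with `--json --output transcript.json`, the same functions apply after loading it with `json.load`.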
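Agent Instructions
When the user asks for a summary of a YouTube video, first fetch the transcript with the script, then summarize the transcript text directly using your own model capabilities. Do not use the --summarize flag.
Error Handling
The script handles common errors:
- Invalid YouTube URL
- Video has no transcript
- API quota exceeded
- Network errors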
Metadata
```yaml
metadata:
  clawdbot:
    emoji: "📹"
    requires:
      env: ["APIFY_API_TOKEN"]
      bins: ["python3"]
      python:
        packages: ["requests"]
```