YouTube Transcript (Captions-Only)
This skill extracts transcripts from existing YouTube captions.
Primary behavior
- Prefer manual subtitles when available.
- Fall back to auto-generated captions.
- Output either:
  - JSON segments (default), or
  - plain text (--text)
- Cache results locally in SQLite for speed.
Reliability behavior
- If YouTube blocks anonymous access (bot-check), provide cookies.txt.
- If yt-dlp reports no captions for a video, the script tries a fallback:
  1) YouTube’s transcript panel (youtubei get_transcript) when accessible
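The preference order described above (manual subtitles, then auto-generated captions, then the transcript panel) can be sketched as follows. This is a minimal illustration, not the script's actual code: pick_transcript and fetch_panel are hypothetical names, and the source labels follow the source field documented for the JSON output.

```python
from typing import Callable, Optional

Segments = list[dict]


def pick_transcript(
    manual: Optional[Segments],
    auto: Optional[Segments],
    fetch_panel: Callable[[], Optional[Segments]],
) -> tuple[str, Segments]:
    """Return (source, segments) following the documented preference order."""
    if manual:
        return "manual", manual
    if auto:
        return "auto", auto
    # Reached only when no caption track was found; the panel lookup is
    # passed in as a callable so it runs lazily (hypothetical helper).
    panel = fetch_panel()
    if panel:
        return "panel", panel
    raise LookupError("no captions available from yt-dlp or the transcript panel")
```

Passing the panel lookup as a callable keeps the extra network request off the common path where captions already exist.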
This published version intentionally does not call third-party transcript providers.
Privacy note: This published version only contacts YouTube directly (via yt-dlp and the transcript panel fallback). It does not send video IDs/URLs to third-party transcript providers.
Cookies: Cookies are treated as secrets.
- The script supports --cookies / YT_TRANSCRIPT_COOKIES, but does not auto-load cookies from inside the skill directory.
- Store cookies under ~/.config/yt-transcript/.
Path safety: This skill restricts --cookies and --cache paths to approved directories.
- cookies allowed under: ~/.config/yt-transcript/
- cache allowed under: {baseDir}/cache/ and ~/.config/yt-transcript/
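One way to enforce this kind of restriction is to resolve the supplied path and check that it falls under an approved directory. The sketch below assumes that approach; is_within is a hypothetical helper and may differ from what the script does.

```python
from pathlib import Path


def is_within(path: str, allowed_dirs: list[Path]) -> bool:
    """True if `path` resolves to a location inside one of `allowed_dirs`."""
    # resolve() collapses ".." components and follows symlinks, so a path
    # such as "<allowed>/../elsewhere" is judged by where it really points.
    resolved = Path(path).expanduser().resolve()
    for base in allowed_dirs:
        base_resolved = base.expanduser().resolve()
        if resolved == base_resolved or base_resolved in resolved.parents:
            return True
    return False
```

For example, is_within(cookies_arg, [Path("~/.config/yt-transcript")]) would reject a cookies file placed inside the skill folder.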
How to run
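Script path:
- {baseDir}/scripts/yt_transcript.py
Typical usage:
- python3 {baseDir}/scripts/yt_transcript.py <url_or_id>
- python3 {baseDir}/scripts/yt_transcript.py <url_or_id> --lang en
- python3 {baseDir}/scripts/yt_transcript.py <url_or_id> --text
- python3 {baseDir}/scripts/yt_transcript.py <url_or_id> --no-ts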
Cookies (optional, but often required on VPS IPs):
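- python3 {baseDir}/scripts/yt_transcript.py <url_or_id> --cookies /path/to/youtube-cookies.txt
- or set env var: YT_TRANSCRIPT_COOKIES=/path/to/youtube-cookies.txt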
Publishing safety note: Cookies are optional, so YT_TRANSCRIPT_COOKIES is intentionally not required by skill metadata. Only set it if you need authenticated access.
Best practice: store cookies outside the skill folder (so you never accidentally publish them), e.g. ~/.config/yt-transcript/youtube-cookies.txt, and point to it via --cookies or YT_TRANSCRIPT_COOKIES.
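When calling the script from other code, building the argument list from a fixed set of options keeps user input out of the flag positions. The sketch below uses only the flags named in this document (--lang, --text, --no-ts, --cookies). build_command is a hypothetical helper, and placing the URL or ID as the first positional argument is an assumption.

```python
import sys
from typing import Optional


def build_command(
    script_path: str,
    url_or_id: str,
    *,
    lang: Optional[str] = None,
    text: bool = False,
    no_ts: bool = False,
    cookies: Optional[str] = None,
) -> list[str]:
    """Build an argv list for the transcript script from known options only."""
    cmd = [sys.executable, script_path, url_or_id]
    if lang:
        cmd += ["--lang", lang]
    if text:
        cmd.append("--text")
    if no_ts:
        cmd.append("--no-ts")
    if cookies:
        cmd += ["--cookies", cookies]
    return cmd
```

The resulting list can be handed to subprocess.run without shell=True, so nothing in url_or_id is interpreted by a shell.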
What the script returns
JSON mode (default)
A JSON object:
- video_id: 11-char id
- lang: chosen language
- source: manual | auto | panel
- segments: list of { start, duration, text } (or text-only when --no-ts)
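A consumer of the JSON output might read it as sketched below. The sample payload is invented for illustration (placeholder video ID and made-up segments), and summarize is a hypothetical helper. Only the field names video_id, lang, source, segments, start, duration, and text come from this document.

```python
import json

# Invented sample in the documented shape; not real output.
SAMPLE = """
{
  "video_id": "abc123XYZ_-",
  "lang": "en",
  "source": "manual",
  "segments": [
    {"start": 0.0, "duration": 2.5, "text": "Hello"},
    {"start": 2.5, "duration": 1.5, "text": "world"}
  ]
}
"""


def summarize(raw: str) -> dict:
    """Parse the script's JSON output and collect a few basic facts."""
    data = json.loads(raw)
    segments = data["segments"]
    return {
        "video_id": data["video_id"],
        "source": data["source"],
        "segment_count": len(segments),
        "text": " ".join(seg["text"] for seg in segments),
    }
```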
Text mode (--text)
A newline-separated transcript.
- By default timestamps are included as [12.34s].
- Use --no-ts to output only the text lines.
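The text rendering described above can be approximated from the JSON segments as follows. render_text is a hypothetical helper. The two-decimal timestamp matches the [12.34s] example, while the single space between timestamp and text is an assumption.

```python
def render_text(segments: list[dict], include_timestamps: bool = True) -> str:
    """Render segments as newline-separated lines, optionally with timestamps."""
    lines = []
    for seg in segments:
        if include_timestamps:
            # Two decimals, matching the documented [12.34s] form.
            lines.append(f"[{seg['start']:.2f}s] {seg['text']}")
        else:
            lines.append(seg["text"])
    return "\n".join(lines)
```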
Caching
Default cache DB:
- {baseDir}/cache/transcripts.sqlite
Cache key includes:
- video_id, lang, source, include_timestamp, format
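A SQLite cache keyed on these components could look like the sketch below. The table layout and function names are assumptions made for illustration, not the script's actual schema. The fifth key component is taken to be the output format and is stored in a column named fmt.

```python
import json
import sqlite3

# Hypothetical schema: one row per distinct cache key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    video_id          TEXT    NOT NULL,
    lang              TEXT    NOT NULL,
    source            TEXT    NOT NULL,
    include_timestamp INTEGER NOT NULL,
    fmt               TEXT    NOT NULL,
    payload           TEXT    NOT NULL,
    PRIMARY KEY (video_id, lang, source, include_timestamp, fmt)
)
"""


def open_cache(path: str) -> sqlite3.Connection:
    """Open (or create) the cache database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def cache_put(conn: sqlite3.Connection, key: tuple, payload) -> None:
    """Store a JSON-serializable payload under the five-part key."""
    conn.execute(
        "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?, ?, ?, ?)",
        (*key, json.dumps(payload)),
    )
    conn.commit()


def cache_get(conn: sqlite3.Connection, key: tuple):
    """Return the cached payload for the key, or None on a miss."""
    row = conn.execute(
        "SELECT payload FROM transcripts WHERE video_id = ? AND lang = ? "
        "AND source = ? AND include_timestamp = ? AND fmt = ?",
        key,
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Including the timestamp flag and the format in the key means a --text --no-ts result is never served for a JSON request on the same video.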
Cookie handling (important)
- Cookies must be in Netscape cookies.txt format.
- Treat cookies as secrets.
- Never commit / publish cookies to ClawHub.
Recommended local path (ignored by git/publish):
- {baseDir}/cache/youtube-cookies.txt (chmod 600)
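Both requirements, Netscape format and owner-only permissions, can be checked before a cookies file is used. The sketch below relies on the standard library's MozillaCookieJar, which reads the Netscape cookies.txt format. The helper names are hypothetical, and the permission check is only meaningful on POSIX systems.

```python
import os
import stat
from http.cookiejar import LoadError, MozillaCookieJar


def is_netscape_cookie_file(path: str) -> bool:
    """True if the file parses as a Netscape-format cookies.txt."""
    try:
        MozillaCookieJar(path).load(ignore_discard=True, ignore_expires=True)
    except LoadError:
        return False
    return True


def is_private(path: str) -> bool:
    """True if group and others have no access bits set (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0
```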
Notes (safety + reliability)
- Only accept a YouTube URL or an 11-character video ID.
- Do not forward arbitrary user-provided flags into the command.
- If yt-dlp is missing, instruct the user to install it (recommended):
  - install pipx
  - pipx install yt-dlp
  - ensure yt-dlp is on PATH
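The first two rules above can be enforced by normalizing whatever the user supplies into a bare video ID before anything reaches the command line. The sketch below is one possible check, not the script's own validation: extract_video_id is a hypothetical helper, and the accepted host list and URL shapes (watch?v= and youtu.be/) are assumptions.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "_" and "-".
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
# Assumed set of accepted hosts.
_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID, or None if the input is not acceptable."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or parsed.hostname not in _HOSTS:
        return None
    if parsed.hostname == "youtu.be":
        candidate = parsed.path.lstrip("/")
    else:
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    return candidate if _ID_RE.fullmatch(candidate) else None
```

Because valid IDs may begin with a dash, an 11-character string that looks like a flag still passes this check. Passing arguments as a list rather than a shell string remains necessary.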
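Skill name: yt_transcript
Detailed description:
YouTube Transcript (Captions-Only)
This skill extracts transcripts from existing YouTube captions.
Primary behavior
- Prefer manual subtitles when available.
- Fall back to auto-generated captions.
- Output format:
  - JSON segments (default), or
  - plain text (--text)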
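Reliability behavior
- If YouTube blocks anonymous access (bot-check), provide cookies.txt.
- If yt-dlp reports no captions for a video, the script tries a fallback:
  1) YouTube's transcript panel (youtubei get_transcript) when accessible
This published version intentionally does not call third-party transcript providers.
Privacy note: This published version only contacts YouTube directly (via yt-dlp and the transcript panel fallback). It does not send video IDs/URLs to third-party transcript providers.
Cookies: Cookies are treated as secrets.
- The script supports --cookies / YT_TRANSCRIPT_COOKIES, but does not auto-load cookies from inside the skill directory.
- Store cookies under ~/.config/yt-transcript/.
Path safety: This skill restricts --cookies and --cache paths to approved directories.
- cookies allowed under: ~/.config/yt-transcript/
- cache allowed under: {baseDir}/cache/ and ~/.config/yt-transcript/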
How to run
Script path:
- {baseDir}/scripts/yt_transcript.py
Typical usage:
- python3 {baseDir}/scripts/yt_transcript.py <url_or_id>
- python3 {baseDir}/scripts/yt_transcript.py <url_or_id> --lang en
- python3 {baseDir}/scripts/yt_transcript.py <url_or_id> --text
- python3 {baseDir}/scripts/yt_transcript.py <url_or_id> --no-ts
Cookies (optional, but often required on VPS IPs):
- python3 {baseDir}/scripts/yt_transcript.py <url_or_id> --cookies /path/to/youtube-cookies.txt
- or set env var: YT_TRANSCRIPT_COOKIES=/path/to/youtube-cookies.txt
Publishing safety note: Cookies are optional, so YT_TRANSCRIPT_COOKIES is intentionally not required by skill metadata. Only set it if you need authenticated access.
Best practice: store cookies outside the skill folder (so you never accidentally publish them), e.g. ~/.config/yt-transcript/youtube-cookies.txt, and point to it via --cookies or YT_TRANSCRIPT_COOKIES.
What the script returns
JSON mode (default)
A JSON object:
- video_id: 11-char id
- lang: chosen language
- source: manual | auto | panel
- segments: list of { start, duration, text } (text-only when --no-ts)
Text mode (--text)
A newline-separated transcript.
- By default timestamps are included as [12.34s].
- Use --no-ts to output only the text lines.
Caching
Default cache DB:
- {baseDir}/cache/transcripts.sqlite
Cache key includes:
- video_id, lang, source, include_timestamp, format
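Cookie handling (important)
- Cookies must be in Netscape cookies.txt format.
- Treat cookies as secrets.
- Never commit or publish cookies to ClawHub.
Recommended local path (ignored by git/publish):
- {baseDir}/cache/youtube-cookies.txt (chmod 600)
Notes (safety + reliability)
- Only accept a YouTube URL or an 11-character video ID.
- Do not forward arbitrary user-provided flags into the command.
- If yt-dlp is missing, instruct the user to install it (recommended):
  - install pipx
  - pipx install yt-dlp
  - ensure yt-dlp is on PATH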