TubeScribe 🎬

Turn any YouTube video into a polished document + audio summary.

Drop a YouTube link → get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.

💸 Free & No Paid APIs

- No subscriptions or API keys: works out of the box
- Local processing: transcription, speaker detection, and TTS run on your machine
- Network access: fetching from YouTube (captions, metadata, comments) requires internet
- No data uploaded: nothing is sent to external services; all processing stays on your machine
- Safe sub-agent: the spawned sub-agent has strict instructions (no software installation, no network calls beyond YouTube)

✨ Features

- 📄 Transcript with summary and key quotes: export as DOCX, HTML, or Markdown
- 🎯 Smart Speaker Detection: automatically identifies participants
- 🔊 Audio Summaries: listen to key points (MP3/WAV)
- 📝 Clickable Timestamps: every quote links directly to that moment in the video
- 💬 YouTube Comments: viewer sentiment analysis and best comments
- 📋 Queue Support: send multiple links and they are processed in order
- 🚀 Non-Blocking Workflow: the conversation continues while the video processes in the background

🎬 Works With Any Video

- Interviews & podcasts (multi-speaker detection)
- Lectures & tutorials (single speaker)
- Music videos (lyrics extraction)
- News & documentaries
- Any YouTube content with captions

Quick Start

When user sends a YouTube URL:

1. Spawn sub-agent with the full pipeline task immediately
2. Reply: "🎬 TubeScribe is processing - I'll let you know when it's ready!"
3. Continue conversation (don't wait!)
4. Sub-agent notification will announce completion with title and details

DO NOT BLOCK — spawn and move on instantly.
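The spawn-and-move-on pattern can be illustrated in plain Python. This is only an analogy, assuming a thread pool in place of the host agent's real sub-agent mechanism; `run_pipeline`, `handle_url`, and the `release` event are hypothetical names used for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One worker: jobs run in the background, one at a time.
executor = ThreadPoolExecutor(max_workers=1)


def run_pipeline(url: str, release: threading.Event) -> str:
    """Hypothetical stand-in for the sub-agent's long-running work."""
    release.wait()  # simulates a job that takes a while
    return f"done: {url}"


def handle_url(url: str, release: threading.Event, notifications: list) -> str:
    """Start the job, register a completion callback, and reply at once."""
    future = executor.submit(run_pipeline, url, release)
    # The callback plays the role of the sub-agent's completion notification.
    future.add_done_callback(lambda f: notifications.append(f.result()))
    return "🎬 TubeScribe is processing - I'll let you know when it's ready!"
```

The reply is returned before the job finishes; the completion message arrives later through the callback.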

First-Time Setup

Run setup to check dependencies and configure defaults:

```bash
python skills/tubescribe/scripts/setup.py
```

This checks: summarize CLI, pandoc, ffmpeg, Kokoro TTS

Full Workflow (Single Sub-Agent)

Spawn ONE sub-agent that does the entire pipeline:

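The task passed to `sessions_spawn` reads as follows (`{youtube_url}` is replaced with the real link):

## TubeScribe: Process {youtube_url}

⚠️ CRITICAL: Do not install any software.
Do not use pip, brew, curl, venv, or binary downloads.
If a tool is missing, stop and report what is needed.

Run the complete pipeline - do not stop until all steps are finished.

### Step 1: Extract

```bash
python3 skills/tubescribe/scripts/tubescribe.py "{youtube_url}"
```

Note the **Source** and **Output** paths printed by the script. Use those exact paths in subsequent steps.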

### Step 2: Read source JSON
Read the Source path from Step 1 output and note:
- metadata.title (for filename)
- metadata.video_id
- metadata.channel, upload_date, duration_string
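A minimal sketch of Step 2, assuming the source file is a JSON object with a top-level `metadata` key holding the fields listed above. The function names are hypothetical, and missing fields come back as `None` rather than raising.

```python
import json
from pathlib import Path

# Fields named in Step 2; anything else in the file is ignored here.
WANTED_FIELDS = ("title", "video_id", "channel", "upload_date", "duration_string")


def extract_metadata(data: dict) -> dict:
    """Pick the metadata fields the later steps need."""
    meta = data.get("metadata", {})
    return {key: meta.get(key) for key in WANTED_FIELDS}


def read_metadata(source_path: str) -> dict:
    """Load the source JSON from disk and extract its metadata."""
    text = Path(source_path).expanduser().read_text(encoding="utf-8")
    return extract_metadata(json.loads(text))
```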

### Step 3: Create formatted markdown
Write to the Output path from Step 1:

1. `# **<title>**`
---
2. Video info block — Channel, Date, Duration, URL (clickable). Empty line between each field.
---
3. `## **Participants**` — table with bold headers:

| **Name** | **Role** | **Description** |
|----------|----------|-----------------|

---
4. `## **Summary**` — 3-5 paragraphs of prose
---
5. `## **Key Quotes**`: the 5 best quotes, each ending with a clickable YouTube timestamp. Format each as:

   "Quote text here." - 12:34

   "Another quote." - 25:10

   Use a regular dash `-`, NOT an em dash `—`. Do NOT use blockquotes `>`. Plain paragraphs only.
---
6. `## **Viewer Sentiment**` (if comments exist)
---
7. `## **Best Comments**` (if comments exist): top 5, with no horizontal rules between them:

   Comment text here.

   *- ▲ 123 @AuthorName*

   Next comment text here.

   *- ▲ 45 @AnotherAuthor*

   Attribution line: dash + italic. Separate comments with a single blank line only, NO `---` separators.

---
8. `## **Full Transcript**` — merge segments, speaker labels, clickable timestamps
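One way to produce the clickable timestamps used in Key Quotes and the Full Transcript is sketched below. It assumes the standard `watch?v=<id>&t=<seconds>s` URL form; the helper names are hypothetical and not taken from the TubeScribe scripts.

```python
def timestamp_to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def timestamp_link(video_id: str, stamp: str) -> str:
    """Return a markdown link that opens the video at the given moment."""
    offset = timestamp_to_seconds(stamp)
    return f"[{stamp}](https://www.youtube.com/watch?v={video_id}&t={offset}s)"


def format_quote(text: str, video_id: str, stamp: str) -> str:
    """Format a key quote as a plain paragraph with a regular dash."""
    return f'"{text}" - {timestamp_link(video_id, stamp)}'
```

For example, `format_quote("Quote text here.", "VIDEO_ID", "12:34")` yields a paragraph whose timestamp links to second 754 of the video.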

### Step 4: Create DOCX
Clean the title for filename (remove special chars), then:

```bash
pandoc <output_path> -o ~/Documents/TubeScribe/<safe_title>.docx
```
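Cleaning the title could look like the sketch below. The exact rules TubeScribe applies are not stated here, so the character set, the length cap, and the `untitled` fallback are all assumptions.

```python
import re


def safe_title(title: str, max_length: int = 100) -> str:
    """Strip characters that are awkward in filenames (assumed rule set)."""
    cleaned = re.sub(r"[^\w\s-]", "", title)        # drop punctuation and symbols
    cleaned = re.sub(r"\s+", " ", cleaned).strip()  # collapse runs of whitespace
    return cleaned[:max_length].rstrip() or "untitled"
```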


### Step 5: Generate audio
Write the summary text to a temp file, then use TubeScribe's built-in audio generation:

```bash
# Write summary to temp file (use python3 to write, avoids shell escaping issues)
python3 -c "
text = '''YOUR SUMMARY TEXT HERE'''
with open('<temp_dir>/tubescribe_<video_id>_summary.txt', 'w') as f:
    f.write(text)
"

# Generate audio (auto-detects engine, voice, format from config)
python3 skills/tubescribe/scripts/tubescribe.py \
  --generate-audio <temp_dir>/tubescribe_<video_id>_summary.txt \
  --audio-output ~/Documents/TubeScribe/<safe_title>_summary
```

This reads `~/.tubescribe/config.json` and uses the configured TTS engine (mlx/kokoro/builtin), voice blend, and speed automatically. Output format (mp3/wav) comes from config.
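The engine selection might work roughly as sketched here. The fallback order (configured engine first, then mlx, kokoro, builtin) is an assumption based on the option descriptions, not a statement of what the script does; `pick_tts_engine` and `audio_extension` are hypothetical names.

```python
ENGINE_ORDER = ("mlx", "kokoro", "builtin")


def pick_tts_engine(config: dict, available: set) -> str:
    """Return the configured engine if usable, else the next one in the assumed order."""
    preferred = config.get("audio", {}).get("tts_engine", "mlx")
    candidates = [preferred] + [e for e in ENGINE_ORDER if e != preferred]
    for engine in candidates:
        if engine in available:
            return engine
    raise RuntimeError("no TTS engine available")


def audio_extension(config: dict) -> str:
    """Output format comes from config; mp3 is the documented default."""
    return config.get("audio", {}).get("format", "mp3")
```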

### Step 6: Cleanup

```bash
python3 skills/tubescribe/scripts/tubescribe.py --cleanup <video_id>
```


### Step 7: Open folder

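```bash
open ~/Documents/TubeScribe/
```

### Report
Tell what was created: DOCX name, MP3 name + duration, video stats.

Pass the task above to `sessions_spawn` with `label="tubescribe"`, `runTimeoutSeconds=900`, and `cleanup="delete"`.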

After spawning, reply immediately:

🎬 TubeScribe is processing - I'll let you know when it's ready!

Then continue the conversation. The sub-agent notification announces completion.

Configuration

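Config file: `~/.tubescribe/config.json`

```json
{
  "output": {
    "folder": "~/Documents/TubeScribe",
    "open_folder_after": true,
    "open_document_after": false,
    "open_audio_after": false
  },
  "document": {
    "format": "docx",
    "engine": "pandoc"
  },
  "audio": {
    "enabled": true,
    "format": "mp3",
    "tts_engine": "mlx"
  },
  "mlx_audio": {
    "path": "~/.openclaw/tools/mlx-audio",
    "model": "mlx-community/Kokoro-82M-bf16",
    "voice": "af_heart",
    "lang_code": "a",
    "speed": 1.05
  },
  "kokoro": {
    "path": "~/.openclaw/tools/kokoro",
    "voice_blend": { "af_heart": 0.6, "af_sky": 0.4 },
    "speed": 1.05
  },
  "processing": {
    "subagent_timeout": 600,
    "cleanup_temp_files": true
  }
}
```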
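A sketch of how a caller might read this file, assuming user values are layered over built-in defaults so that a partial config still works. `DEFAULTS` holds only a small subset of the documented options, and `deep_merge` and `load_config` are hypothetical helpers.

```python
import json
from pathlib import Path

# Subset of the documented defaults, for illustration only.
DEFAULTS = {
    "output": {"folder": "~/Documents/TubeScribe"},
    "document": {"format": "docx", "engine": "pandoc"},
    "audio": {"enabled": True, "format": "mp3"},
    "processing": {"subagent_timeout": 600},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override applied, recursing into nested sections."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: str = "~/.tubescribe/config.json") -> dict:
    """Load the user config if present, falling back to defaults."""
    config_path = Path(path).expanduser()
    if not config_path.exists():
        return deep_merge(DEFAULTS, {})
    user = json.loads(config_path.read_text(encoding="utf-8"))
    return deep_merge(DEFAULTS, user)
```

Merging section by section means a user file that sets only `audio.format` keeps every other default intact.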

Output Options

| Option | Default | Description |
|--------|---------|-------------|
| `output.folder` | `~/Documents/TubeScribe` | Where to save files |
| `output.open_folder_after` | `true` | Open output folder when done |

Document Options

| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| `document.format` | `docx` | `docx`, `html`, `md` | Output format |
| `document.engine` | `pandoc` | `pandoc` | Converter for DOCX (falls back to HTML) |

Audio Options

| Option | Default | Values | Description |
|--------|---------|--------|-------------|
| `audio.enabled` | `true` | `true`, `false` | Generate audio summary |
| `audio.format` | `mp3` | `mp3`, `wav` | Audio format (mp3 needs ffmpeg) |

MLX-Audio Options (preferred on Apple Silicon)

| Option | Default | Description |
|--------|---------|-------------|
| `mlx_audio.path` | `~/.openclaw/tools/mlx-audio` | mlx-audio venv location |
| `mlx_audio.model` | `mlx-community/Kokoro-82M-bf16` | MLX model to use |
| `mlx_audio.voice` | `af_heart` | Voice preset (used if no voice_blend) |
| `mlx_audio.voice_blend` | `{af_heart: 0.6, af_sky: 0.4}` | Custom voice mix (weighted blend) |
| `mlx_audio.lang_code` | `a` | Language code (a = US English) |
| `mlx_audio.speed` | `1.05` | Playback speed (1.0 = normal, 1.05 = 5% faster) |

Kokoro PyTorch Options (fallback)

| Option | Default | Description |
|--------|---------|-------------|
| `kokoro.path` | `~/.openclaw/tools/kokoro` | Kokoro repo location |
| `kokoro.voice_blend` | `{af_heart: 0.6, af_sky: 0.4}` | Custom voice mix |
| `kokoro.speed` | `1.05` | Playback speed (1.0 = normal, 1.05 = 5% faster) |

Processing Options

| Option | Default | Description |
|--------|---------|-------------|
| `processing.subagent_timeout` | `600` | Seconds for sub-agent (increase for long videos) |
| `processing.cleanup_temp_files` | `true` | Remove /tmp files after completion |

Comment Options

| Option | Default | Description |
|--------|---------|-------------|
| INLINECODE56 | INLINECODE57 | Number of comments to fetch |
| INLINECODE58 | `90` | Timeout for comment fetching (seconds) |

Queue Options

| Option | Default | Description |
|--------|---------|-------------|
| INLINECODE60 | INLINECODE61 | Consider a processing job stale after this many minutes |
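The stale-job check can be sketched as a simple age comparison. How TubeScribe records a job's start time is not described here, so the `started_at` argument and the function name are assumptions; the threshold is passed in rather than guessed.

```python
from datetime import datetime, timedelta


def is_stale(started_at: datetime, now: datetime, stale_minutes: int) -> bool:
    """A job is stale once it has been processing longer than the threshold."""
    return now - started_at > timedelta(minutes=stale_minutes)
```

A stale job can then be treated as abandoned so that the next queued URL is allowed to start.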

Output Structure

CODEBLOCK11

After generation, opens the folder (not individual files) so you can access everything.

Dependencies

Required:

- summarize CLI: INLINECODE63
- Python 3.8+

Optional (better quality):

- pandoc, for DOCX output: INLINECODE65
- INLINECODE66, for MP3 audio: INLINECODE67
- INLINECODE68, for YouTube comments: INLINECODE69
- mlx-audio, the fastest TTS on Apple Silicon: `pip install mlx-audio` (uses the MLX backend for Kokoro)
- Kokoro TTS, the PyTorch fallback: see https://github.com/hexgrad/kokoro

yt-dlp Search Paths

TubeScribe checks these locations (in order):

| Priority | Path | Source |
|----------|------|--------|
| 1 | INLINECODE71 | System PATH |
| 2 | | |

If not found, setup downloads a standalone binary to the tools directory.
The tools directory version doesn't conflict with system installations.
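An ordered lookup of this kind might be written as below. The first check (system PATH) follows the table; treating the tools directory as the second location is an assumption, and `tools_dir` is a hypothetical parameter because the actual path is not given here. The `which` argument exists only so the lookup can be exercised without touching the real PATH.

```python
import shutil
from pathlib import Path


def find_yt_dlp(tools_dir: str, which=shutil.which):
    """Return the first yt-dlp found: system PATH first, then the tools directory."""
    on_path = which("yt-dlp")
    if on_path:
        return on_path
    candidate = Path(tools_dir).expanduser() / "yt-dlp"
    if candidate.is_file():
        return str(candidate)
    return None  # caller decides what to do when nothing is found
```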

Queue Handling

When user sends multiple YouTube URLs while one is processing:

Check Before Starting

CODEBLOCK12

If Already Processing

CODEBLOCK13

After Completion

CODEBLOCK14

Queue Commands

| Command | Description |
|---------|-------------|
| INLINECODE77 | Show what's processing + queued items |
| INLINECODE78 | Add URL to queue |
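The queue behaviour described in this section amounts to a first-in, first-out list with one active job. The class below is an in-memory sketch under that assumption; it is not TubeScribe's queue implementation, and every name in it is hypothetical.

```python
from collections import deque


class VideoQueue:
    """One job processes at a time; later URLs wait in arrival order."""

    def __init__(self):
        self.current = None
        self.pending = deque()

    def add(self, url: str) -> int:
        """Start at once if idle (returns 0), else enqueue and return the position."""
        if self.current is None:
            self.current = url
            return 0
        self.pending.append(url)
        return len(self.pending)

    def finish(self):
        """Mark the current job done and promote the next URL, if any."""
        self.current = self.pending.popleft() if self.pending else None
        return self.current

    def status(self) -> dict:
        """Report what is processing and what is queued."""
        return {"processing": self.current, "queued": list(self.pending)}
```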

Batch Processing (multiple URLs at once)

```bash
python skills/tubescribe/scripts/tubescribe.py url1 url2 url3
```

Processes all URLs sequentially with a summary at the end.

Error Handling

The script detects and reports these errors with clear messages:

| Error | Message |
|-------|---------|
| Invalid URL | ❌ Not a valid YouTube URL |
| Private video | |

When an error occurs, report it to the user and don't proceed with that video.
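A URL pre-check along these lines could catch the invalid-URL case before any work starts. It is a sketch that recognises only `watch?v=` and `youtu.be` links, which is narrower than what the script may accept; the host list and function names are assumptions.

```python
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def extract_video_id(url: str):
    """Return the video ID for a recognised YouTube URL, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.strip("/").split("/")[0] or None
    if parsed.path == "/watch":
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    return None


def check_url(url: str):
    """Return an error message for the user, or None if the URL looks fine."""
    if extract_video_id(url) is None:
        return "❌ Not a valid YouTube URL"
    return None
```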

Tips

- For long videos (>30 min), increase sub-agent timeout to 900s
- Speaker detection works best with clear interview/podcast formats
- Single-speaker videos (tutorials, lectures) skip speaker labels automatically
- Timestamps link directly to YouTube at that moment
- Use batch mode for multiple videos: INLINECODE81
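The timeout tip can be turned into a small rule. The sketch assumes `duration_string` has the form `MM:SS` or `H:MM:SS` and uses 600 seconds as the normal timeout; both are assumptions, and the function names are hypothetical.

```python
def duration_to_seconds(duration_string: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' into seconds (assumed format)."""
    seconds = 0
    for part in duration_string.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def pick_timeout(duration_string: str, default: int = 600,
                 long_video: int = 900, threshold_minutes: int = 30) -> int:
    """Use the longer timeout for videos over the threshold length."""
    if duration_to_seconds(duration_string) > threshold_minutes * 60:
        return long_video
    return default
```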
