Getting Started
Welcome! I'm here to help you add subtitles to your videos using FFmpeg — whether you want them burned in permanently or attached as a toggleable track. Tell me about your video file and subtitle source, and let's get started!
Try saying:
- "Burn in subtitles from an SRT file onto my MP4 video and keep the original audio quality"
- "Attach a soft VTT subtitle track to an MKV file without re-encoding the video stream"
- "Add styled ASS subtitles to a video with a custom font and position them at the top of the frame"
Getting Connected
Before handling any user request, establish a connection to the backend API. Show a brief status like "Connecting...".
If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:
- Ensure a client identifier exists at ~/.config/ffmpeg-add-subtitle/client_id (create one as a UUID if needed)
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
- The response includes a token with 100 free credits valid for 7 days; use it as NEMO_TOKEN
Then create a session by POSTing to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
Tell the user you're ready. Keep the technical details out of the chat.
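The connection steps above can be sketched as plain request descriptions. This is a minimal sketch, not the skill's actual implementation: the function names, the dictionary layout, and the Content-Type header are assumptions made for illustration, while the URLs, the X-Client-Id header, and the session body come from the text above.

```python
import json
import uuid
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"


def ensure_client_id(path: Path) -> str:
    """Return the stored client ID, creating a new UUID if the file is missing or empty."""
    if path.exists():
        existing = path.read_text(encoding="utf-8").strip()
        if existing:
            return existing
    path.parent.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    path.write_text(client_id, encoding="utf-8")
    return client_id


def anonymous_token_request(client_id: str) -> dict:
    """Describe the POST that acquires a free starter token."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
        "body": None,
    }


def create_session_request(token: str, language: str = "en") -> dict:
    """Describe the POST that creates a session with Bearer authorization."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption: JSON body is declared explicitly
        },
        "body": json.dumps({"task_name": "project", "language": language}),
    }
```

The dictionaries can be handed to whichever HTTP client is available; keeping them as data makes the flow easy to inspect without touching the network.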
Embed Subtitles Into Any Video With Precision
Adding subtitles to a video sounds simple — until you're staring at FFmpeg documentation trying to figure out why your SRT file isn't rendering, or why the font looks wrong, or why soft subs aren't showing up in your media player. That's exactly what this skill is built to solve.
With the FFmpeg Add Subtitle skill, you can embed subtitles in two distinct ways: as a hard-coded burn-in (permanently baked into the video frames) or as a soft subtitle stream (a separate track viewers can toggle on or off). Both approaches have real-world use cases, and this skill helps you choose the right one and execute it correctly.
Whether you're localizing a course video, captioning a short film, adding translated subs to a documentary, or preparing accessible content for social media, this skill generates the exact FFmpeg commands you need — with the right filters, codec flags, and file path handling — so you get clean results the first time.
Routing Subtitle Burn-In Requests
When you submit a subtitle embedding job, the skill parses your input to determine whether you're hardcoding subtitles directly into the video stream via the libass-based subtitles filter or attaching a soft subtitle track as a separately muxed stream, then routes the request accordingly.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
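The routing table can be read as a keyword lookup with a default. The sketch below is one possible reading, assuming case-insensitive substring matching and a hypothetical has_file flag for the "user sends file" case; neither detail is specified by the table itself.

```python
# Keyword groups in the same order as the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "§3.5 Export"),
    (("credits", "积分", "balance", "余额"), "§3.3 Credits"),
    (("status", "状态", "show tracks"), "§3.4 State"),
    (("upload", "上传"), "§3.2 Upload"),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (target section, skip_sse) for a user message."""
    text = message.lower()
    for keywords, target in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (target, True)
    if has_file:
        # Assumption: an attached file with no matching keyword counts as an upload.
        return ("§3.2 Upload", True)
    return ("§3.1 SSE", False)
```

Substring matching is deliberately crude; a message such as "burn in subtitles from my SRT" matches no keyword and falls through to the SSE path.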
FFmpeg Cloud API Reference
The backend spins up an isolated FFmpeg processing node that handles subtitle codec mapping, filter graph construction, and container muxing — supporting SRT, ASS, VTT, and PGS formats across MP4, MKV, and MOV containers. Transcoding jobs run server-side so your local machine never touches the bitstream.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: ffmpeg-add-subtitle
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
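A small helper can assemble the required headers. This is a sketch under stated assumptions: the function names are hypothetical, the version string is expected to have been read from the frontmatter by the caller, and platform detection is done by prefix-matching the expanded install path.

```python
from pathlib import Path


def detect_platform(install_path: str) -> str:
    """Map the skill's install path to a platform label."""
    normalized = Path(install_path).expanduser().as_posix()
    home = Path.home().as_posix()
    if normalized.startswith(f"{home}/.clawhub/"):
        return "clawhub"
    if normalized.startswith(f"{home}/.cursor/skills/"):
        return "cursor"
    return "unknown"


def attribution_headers(token: str, version: str, install_path: str) -> dict:
    """Build the headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "ffmpeg-add-subtitle",
        "X-Skill-Version": version,  # caller supplies the frontmatter version
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to forget an attribution header on the export call, which is where a missing header surfaces as a 402.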
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple — returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
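The export flow is a submit-then-poll loop. The sketch below assumes the poll response is a JSON object with a status field and an output.url field, as the description suggests; the fetch_status callable, the max_polls cap, and the injected sleep function are hypothetical conveniences, and failure statuses are not handled because the text does not name any.

```python
import time
from typing import Callable


def export_body(session_id: str, draft: dict, timestamp: int) -> dict:
    """Build the render request body described above."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_export(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 60,  # assumption: give up after a bounded number of polls
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is 'completed' and return the download URL."""
    for attempt in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError("export did not complete within the polling budget")
```

Here fetch_status would wrap a GET to /api/render/proxy/lambda/<id>; passing it in as a callable keeps the loop independent of any particular HTTP client.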
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
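Filtering keep-alives out of the stream might look like the sketch below. It follows the general Server-Sent Events line format (field, colon, value, with a blank line ending each event). How the backend actually marks a heartbeat is not stated, so treating an event named heartbeat or a data payload of "heartbeat" as a keep-alive is an assumption, as is the function name.

```python
from itertools import chain
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, skipping heartbeats and empty data."""
    event_name, data_parts = "", []
    # The extra "" flushes a final event if the stream closes without a blank line.
    for raw in chain(lines, [""]):
        line = raw.rstrip("\r\n")
        if line:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space after the colon
            if field == "event":
                event_name = value
            elif field == "data":
                data_parts.append(value)
            continue  # comment lines (empty field name) and unknown fields are ignored
        payload = "\n".join(data_parts)
        is_keepalive = event_name == "heartbeat" or payload.strip() in ("", "heartbeat")
        if not is_keepalive:
            yield payload
        event_name, data_parts = "", []
```

If the generator yields nothing before the stream closes, that is the no-text case described above, and the caller would fall back to polling session state.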
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
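Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)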
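A track summary can be derived from the abbreviated draft fields. The exact draft schema is not documented here, so this sketch assumes that t is a list of track objects, each with a tt type code and an sg list of segments carrying a d duration in milliseconds; the function name and the output wording are illustrative only.

```python
# Track type codes from the field mapping above.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_tracks(draft: dict) -> list[str]:
    """Produce one human-readable line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Assumption: each segment stores its own duration in milliseconds under "d".
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```

A summary like this is what the "preview in timeline" translation would show the user in place of a GUI view.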
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
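The error table maps naturally onto a lookup. In the sketch below the action labels are hypothetical names chosen for illustration, and the fallback for unlisted codes is an assumption, since the table does not say what to do with them.

```python
# Codes from the error table mapped to illustrative action labels.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "prompt_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Return the action label for a response code."""
    # Assumption: unlisted codes are surfaced to the user rather than retried.
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 and 2001 as separate entries reflects the table's point that a blocked export is a plan issue, not a credits issue.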
Use Cases
Content creators publishing multilingual videos often use ffmpeg-add-subtitle to batch-process translated SRT files onto the same base video, producing multiple language versions efficiently without re-editing the source.
E-learning developers rely on subtitle embedding to meet accessibility requirements, ensuring that course videos include accurate captions for learners with hearing impairments or those watching in sound-sensitive environments.
Film and video editors working on indie productions use the ASS/SSA format support to embed richly styled subtitles — with custom fonts, colors, and animations — directly into screener copies or festival submissions.
Social media managers frequently need to burn subtitles into square or vertical crops of longer videos, where the subtitle position and size need to be adjusted to avoid overlapping with on-screen graphics. This skill helps dial in those positioning parameters precisely using FFmpeg's filter options.
Common Workflows
The most frequent workflow is burning an SRT file directly into a video using FFmpeg's subtitles filter. This permanently renders the text onto each frame, making it ideal for platforms like Instagram or YouTube Shorts where external subtitle tracks aren't supported. You specify the subtitle file path, optionally define a font name and size, and FFmpeg handles the rest during re-encoding.
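As a rough illustration, a burn-in command could be assembled like this. The sketch uses FFmpeg's subtitles filter with its force_style option and copies the audio stream unchanged; the helper name is hypothetical, and it assumes plain file names, since paths containing colons, quotes, or backslashes need extra filter-level escaping that is not handled here.

```python
from typing import Optional


def burn_in_command(
    video: str,
    subtitles: str,
    output: str,
    font_name: Optional[str] = None,
    font_size: Optional[int] = None,
) -> list[str]:
    """Build an ffmpeg argument list that renders subtitles onto the frames."""
    video_filter = f"subtitles={subtitles}"
    styles = []
    if font_name:
        styles.append(f"FontName={font_name}")
    if font_size:
        styles.append(f"FontSize={font_size}")
    if styles:
        # Single quotes keep the comma-separated style list inside one filter option.
        video_filter += ":force_style='" + ",".join(styles) + "'"
    # Video is re-encoded (required for burn-in); audio is copied untouched.
    return ["ffmpeg", "-i", video, "-vf", video_filter, "-c:a", "copy", output]
```

Returning an argument list rather than a shell string means it can be passed to subprocess.run without an extra layer of shell quoting.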
For more flexible distribution — such as MKV files intended for media center playback — soft subtitle embedding is the better choice. Here, FFmpeg muxes the subtitle stream alongside the video and audio without touching the actual picture quality. Viewers can enable or disable subtitles in their player.
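A soft-subtitle mux might be built along these lines. The helper name and the container-to-codec table are assumptions for illustration: the sketch converts text subtitles to SubRip for MKV and to mov_text for MP4 and MOV, and copies video and audio without re-encoding.

```python
from pathlib import Path

# Assumption: one subtitle codec per output container, chosen for broad player support.
SUBTITLE_CODECS = {".mkv": "srt", ".mp4": "mov_text", ".mov": "mov_text"}


def soft_sub_command(video: str, subtitles: str, output: str) -> list[str]:
    """Build an ffmpeg argument list that adds a toggleable subtitle track."""
    codec = SUBTITLE_CODECS.get(Path(output).suffix.lower())
    if codec is None:
        raise ValueError(f"no subtitle codec configured for {output!r}")
    return [
        "ffmpeg", "-i", video, "-i", subtitles,
        "-map", "0:v",   # video from the first input
        "-map", "0:a?",  # audio from the first input, if any
        "-map", "1:0",   # first stream of the subtitle file
        "-c:v", "copy", "-c:a", "copy",  # picture and sound are not re-encoded
        "-c:s", codec,
        output,
    ]
```

Because only the subtitle stream is converted, the operation is typically much faster than a burn-in and leaves picture quality unchanged.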
Another common need is converting subtitle formats before embedding. If you have a WebVTT file but need SRT compatibility, or an ASS file with complex styling you want to simplify, this skill can walk you through the conversion step before the final embed — keeping your workflow clean and your output predictable.
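The conversion step can be as small as letting FFmpeg infer both formats from the file extensions. The helper below is a hypothetical sketch limited to the three text formats mentioned above; note that converting from ASS to a plainer format generally drops the styling.

```python
from pathlib import Path

TEXT_SUBTITLE_EXTENSIONS = {".srt", ".vtt", ".ass"}


def convert_subtitles_command(source: str, target: str) -> list[str]:
    """Build an ffmpeg argument list that converts one text subtitle format to another."""
    for name in (source, target):
        if Path(name).suffix.lower() not in TEXT_SUBTITLE_EXTENSIONS:
            raise ValueError(f"unsupported subtitle extension: {name!r}")
    # ffmpeg picks the input and output subtitle formats from the extensions.
    return ["ffmpeg", "-i", source, target]
```

Running the conversion first and inspecting the result keeps any format problems separate from the final embed.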