Getting Started
Welcome! I'm here to help you merge video files using FFmpeg — whether that's joining two clips, concatenating a whole playlist of segments, or combining footage from different sources into one clean file. Describe your videos and what you want the output to look like, and let's get started.
Try saying:
- "I have 5 MP4 clips recorded from the same camera. How do I merge them in order into one file without re-encoding?"
- "I need to combine an MKV and an MP4 file into a single MP4. They have different resolutions — what's the best FFmpeg command?"
- "My GoPro split a long recording into 4 parts. How do I join them back into one seamless video using FFmpeg?"
Automatic Setup
On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".
Token: If the NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).
Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
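The setup steps above can be sketched with Python's standard library. This is a minimal sketch, not an official client: the function names are hypothetical, the Content-Type header is an assumption the text does not state, and the requests are only built here, so nothing is sent unless `resolve_token` has to fetch a free token.

```python
import json
import os
import uuid
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id=None):
    """Build (not send) the anonymous-token request with an X-Client-Id header."""
    client_id = client_id or str(uuid.uuid4())
    return urllib.request.Request(
        API_BASE + "/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_session_request(token):
    """Build (not send) the create-session request with Bearer auth."""
    body = json.dumps({"task_name": "project"}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def resolve_token(send=urllib.request.urlopen):
    """Prefer NEMO_TOKEN from the environment; otherwise request a free token."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    with send(build_token_request()) as resp:
        return json.load(resp)["data"]["token"]
```

Keeping request construction separate from sending makes it easy to check headers and bodies without touching the network, and it keeps the token out of any printed output.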
Stitch Any Video Files Together Without the Hassle
Merging video clips sounds simple until you're dealing with files from different cameras, varying codecs, or mismatched resolutions. That's where this skill steps in. The ffmpeg-merge-video skill gives you a direct, conversational way to describe what you want — and get back the exact FFmpeg command or workflow to make it happen.
Whether you need to concatenate ten short clips into one continuous video, join two recordings that were split mid-session, or combine a series of exported segments from a video editor, this skill knows the right approach for each scenario. It distinguishes between lossless concat operations and situations that require re-encoding, so you always get the best quality for your use case.
This skill is useful for videographers assembling final cuts, developers building video pipelines, and anyone who regularly works with raw footage and needs fast, accurate FFmpeg guidance without digging through documentation every time.
Routing Your Merge Requests
When you submit a merge job, ClawHub parses your clip list, concat strategy, and output codec preferences to route the request to the appropriate FFmpeg processing pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
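The routing table above could be expressed as a small lookup. This is an illustrative sketch only: the action labels and the case-insensitive substring matching are assumptions, since the text lists trigger phrases but does not say how they are matched.

```python
# Trigger phrases copied from the routing table; action labels are hypothetical.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_file=False):
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A file sent by the user always goes to the upload path.
        return ("upload", True)
    text = message.lower()
    for phrases, action in ROUTES:
        if any(phrase in text for phrase in phrases):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", False)
```

For example, `route("Please export the video")` yields `("export", True)`, while `route("add BGM")` falls through to `("sse", False)`.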
Cloud FFmpeg API Reference
The backend spins up an isolated FFmpeg worker that ingests your source segments, builds a concat demuxer manifest or filter_complex chain depending on stream compatibility, then encodes the muxed output to your specified container. Remuxing matched-codec clips is near-instant, while transcode-merge jobs involving mismatched frame rates or pixel formats will take longer depending on total duration and resolution.
Skill attribution, read from this file's YAML frontmatter at runtime:
- X-Skill-Source: ffmpeg-merge-video
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
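One way to assemble the four required headers is sketched below. The helper names are hypothetical, and matching the install path by substring is an assumption made for simplicity.

```python
def detect_platform(install_path):
    """Map the skill's install path to a platform label."""
    path = install_path.replace("\\", "/")
    if "/.clawhub/" in path:
        return "clawhub"
    if "/.cursor/skills/" in path:
        return "cursor"
    return "unknown"


def attribution_headers(token, source, version, install_path):
    """Build the headers that every request must carry."""
    return {
        "Authorization": "Bearer " + token,
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send a request without attribution, which the text says causes export to fail with 402.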
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>; file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
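The export flow described above (submit a render, then poll every 30 seconds) might be sketched as follows. Several details are assumptions: `<ts>` is taken to be a Unix timestamp in seconds, the poll response is assumed to expose `status` and `output.url` at the top level, and the `max_polls` cap is invented to keep the loop bounded. `fetch_status` is a hypothetical callable that performs the GET and returns parsed JSON.

```python
import time


def build_render_body(session_id, draft, now=None):
    """Build the JSON body for POST /api/render/proxy/lambda."""
    ts = int(now if now is not None else time.time())
    return {
        "id": "render_%d" % ts,  # assumption: <ts> is Unix seconds
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status, render_id, interval=30, max_polls=60, sleep=time.sleep):
    """Poll until the render reports completed, then return the download URL."""
    for _ in range(max_polls):
        result = fetch_status(render_id)
        if result.get("status") == "completed":
            return result["output"]["url"]
        sleep(interval)
    raise TimeoutError("render %s did not complete in time" % render_id)
```

Passing `sleep` and `fetch_status` as parameters keeps the loop testable without real waiting or network calls.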
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
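A quick pre-upload check against the supported formats listed above could look like this. It is a sketch that only inspects the file extension; the helper name is hypothetical, and the backend may apply stricter validation.

```python
import os

# Extensions copied from the supported formats list.
SUPPORTED_FORMATS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}


def is_supported(filename):
    """Return True when the file extension is in the supported list."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return extension in SUPPORTED_FORMATS
```

Checking locally first avoids a round trip that would otherwise end in a 4001 (unsupported file) error.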
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
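The event handling above can be sketched as a line filter over the SSE stream. The payload format is not specified in the text, so this sketch stays generic: it treats a literal `heartbeat` payload and empty `data:` lines as keep-alives, which is an assumption, and both function names are hypothetical.

```python
def iter_sse_data(lines):
    """Yield the payload of each meaningful `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload and payload != "heartbeat":
            yield payload


def needs_state_poll(text_chunks):
    """True when the stream closed without any text to show the user."""
    return not any(chunk.strip() for chunk in text_chunks)
```

When `needs_state_poll` returns True after the stream closes, the next step is the session state request, followed by a summary of what changed.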
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
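Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "City Dreams" (0-3s)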
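The draft field mapping can be turned into a track summary along these lines. The nesting is an assumption (a draft holding a list `t` of tracks, each with a type `tt` and a list `sg` of segments carrying `d` in milliseconds), as is the idea that segment durations on one track simply add up; the text gives only the abbreviations.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Return one human-readable line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            "%d. %s: %d segment(s), %.1fs"
            % (index, kind, len(segments), total_ms / 1000)
        )
    return lines
```

A summary like this is what the "preview in timeline" translation would show in place of a GUI timeline.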
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
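The error table above maps naturally onto a dispatch dictionary. In this sketch the action labels are hypothetical, and the registration URL is passed in as a parameter because the text does not give it.

```python
# Codes from the error table; action labels are hypothetical.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth",
    1002: "new_session",
    2001: "no_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def action_for(code):
    """Look up the handling action for a response code."""
    return ERROR_ACTIONS.get(code, "unknown")


def no_credits_message(anonymous, registration_url="", bind_id=""):
    """Compose the user-facing message for code 2001."""
    if anonymous:
        return "Register to keep going: %s?bind=%s" % (registration_url, bind_id)
    return "Top up credits in your account"
```

Keeping 402 and 2001 as separate actions reflects the table's point that a blocked export is a plan issue, not a credit shortage.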
Use Cases
The ffmpeg-merge-video skill covers a wide range of real-world scenarios where combining video files is necessary. Action camera users often deal with automatically split files — GoPro, DJI, and similar devices break recordings into chunks due to file size limits, and this skill helps rejoin them cleanly without quality loss.
Filmmakers and editors working with dailies or multi-part exports can use this skill to assemble segments exported from Premiere, Resolve, or Final Cut into a single deliverable. Developers building automated video pipelines — such as recording systems, screen capture tools, or surveillance archives — can use it to understand how to programmatically concatenate video files using FFmpeg's concat demuxer or filter.
Content creators who record in multiple takes, podcasters who edit out sections and need to rejoin the remaining parts, and educators assembling lecture clips into a single course video all benefit from the precise, format-aware merge workflows this skill provides.
Common Workflows
One of the most common ffmpeg-merge-video workflows is the lossless concat using a file list — ideal when all clips share the same codec, resolution, and frame rate. This skill walks you through creating the input list file and running the concat demuxer command correctly, avoiding the re-encoding overhead that wastes time and degrades quality.
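The lossless workflow above can be sketched as two helpers: one renders the list file for the concat demuxer, the other builds the FFmpeg argument list. The helper names are hypothetical; the `-f concat`, `-safe 0`, and `-c copy` options are standard FFmpeg flags.

```python
def concat_list_text(paths):
    """Render the concat demuxer list: one `file '<path>'` line per clip."""
    lines = []
    for path in paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append("file '%s'" % escaped)
    return "\n".join(lines) + "\n"


def concat_copy_command(list_path, output_path):
    """Build the argument list for a stream-copy (no re-encode) merge."""
    return [
        "ffmpeg",
        "-f", "concat",
        "-safe", "0",  # allow absolute or otherwise "unsafe" paths in the list
        "-i", list_path,
        "-c", "copy",
        output_path,
    ]
```

Write the text from `concat_list_text` to a file, then pass the argument list to `subprocess.run`. Using a list rather than a shell string avoids quoting problems with spaces in file names.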
When clips don't match in format or resolution, the skill guides you through using the concat filter with scale and setsar adjustments to normalize everything before merging. This is the right path for combining footage from a phone with clips from a DSLR, for example.
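For mismatched clips, the concat filter approach above might be assembled like this. It is a sketch under stated assumptions: every input has exactly one video and one audio stream, the 1280x720 target is an arbitrary default, and a plain `scale` will stretch clips whose aspect ratio differs from the target. The function name is hypothetical.

```python
def concat_filter_command(inputs, output_path, width=1280, height=720):
    """Build an FFmpeg command that normalizes each clip, then concatenates."""
    args = ["ffmpeg"]
    for path in inputs:
        args += ["-i", path]

    chains = []
    pads = ""
    for i in range(len(inputs)):
        # Scale each video stream to a common size and reset the pixel aspect ratio.
        chains.append("[%d:v]scale=%d:%d,setsar=1[v%d]" % (i, width, height, i))
        pads += "[v%d][%d:a]" % (i, i)
    # Feed the normalized video and the original audio pads into concat.
    chains.append("%sconcat=n=%d:v=1:a=1[v][a]" % (pads, len(inputs)))

    args += ["-filter_complex", ";".join(chains)]
    args += ["-map", "[v]", "-map", "[a]", output_path]
    return args
```

Unlike the stream-copy path, this re-encodes the output, so expect it to take longer and to use FFmpeg's default encoder settings unless codec options are added.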
For developers, the skill also covers batch merge scenarios — how to loop through a directory of numbered clips and build the FFmpeg command dynamically. Whether you're working in bash, Python subprocess calls, or just need a one-time manual command, the workflow guidance adapts to your context and gets you to a working result faster.
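For the batch case, the main pitfall is ordering: a plain alphabetical sort puts `clip10.mp4` before `clip2.mp4`. A minimal sketch of natural ordering follows; the function names and the `.mp4` default are illustrative assumptions.

```python
import re


def natural_key(name):
    """Split a name into text and number parts so 2 sorts before 10."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


def ordered_clips(filenames, extension=".mp4"):
    """Keep files with the given extension and sort them in natural order."""
    clips = [name for name in filenames if name.lower().endswith(extension)]
    return sorted(clips, key=natural_key)
```

The result of `ordered_clips(os.listdir(directory))` can then be written to a concat list file or passed as inputs to a dynamically built FFmpeg command.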