Getting Started
Welcome! Ready to turn your clips, photos, or ideas into a real video — completely free? Tell me what you're making and share your assets or concept, and let's build something worth watching together.
Try saying:
- "I have 12 vacation photos and a song I love. Help me make a 60-second slideshow video with smooth transitions and text overlays."
- "Create a script and shot-by-shot breakdown for a 30-second promotional video for my bakery using only footage I can shoot on my phone."
- "I recorded 5 short screen capture clips for a software tutorial — help me structure them into a cohesive how-to video with intro text and chapter titles."
Quick Start Setup
This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
- Generate a UUID as the client identifier.
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header.
- Extract data.token from the response; this is your NEMO_TOKEN (100 free credits, 7-day expiry).
Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
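The setup flow above can be sketched as two request builders plus a token check. This is a minimal illustration, not an official client: the function names and the dict-based request description are hypothetical, and the Content-Type header is an assumption. Only the URLs, the X-Client-Id header, the Bearer auth, and the {"task_name":"project"} body come from the text.

```python
import json
import os
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def anonymous_token_request(client_id=None):
    """Describe the anonymous-token call (hypothetical helper)."""
    client_id = client_id or str(uuid.uuid4())
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/auth/anonymous-token",
        "headers": {"X-Client-Id": client_id},
        "body": None,
    }


def create_session_request(token):
    """Describe the session-creation call; the token goes in a Bearer header."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, not stated in the text
        },
        "body": json.dumps({"task_name": "project"}),
    }


def first_setup_step(environ=os.environ):
    """Skip token creation when NEMO_TOKEN is already present."""
    token = environ.get("NEMO_TOKEN")
    if token:
        return create_session_request(token)
    return anonymous_token_request()
```

Returning plain dicts keeps the sketch independent of any HTTP library; the caller decides how to send the request and must still keep the token out of user-facing output.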
Turn Raw Footage Into Finished Videos Instantly
Making a video used to mean downloading software, wrestling with timelines, and burning hours on export settings. This free-video-maker skill cuts all of that out. Describe what you want — a product promo, a birthday slideshow, a how-to tutorial — and it builds the structure, suggests pacing, and helps you produce something that actually looks intentional.
Whether you're working with a handful of phone photos or a folder of screen recordings, the skill helps you sequence content logically, write captions that match your tone, and choose background music that fits the mood. It handles the decisions that usually slow creators down, so you spend your time on the message rather than the mechanics.
This is built for people who aren't professional editors but still need professional-looking output. Social posts, YouTube intros, event recaps, classroom projects — the free-video-maker skill adapts to your goal and guides you through every step without requiring any prior editing experience.
Routing Your Video Requests
When you describe a video project — whether it's a slideshow from photos, a trimmed clip, or a text-animated reel — your request is parsed and routed to the matching video creation endpoint based on media type, style preferences, and output format.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
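The routing table can be approximated with a keyword lookup. The sketch below is an assumption about how matching might work (case-insensitive substring search, first match wins); the table itself does not say how phrases are detected, and the names `ROUTES` and `route` are hypothetical.

```python
# Keyword groups taken from the routing table, checked in table order.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message, has_file=False):
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A message that carries a file is treated as an upload.
        return ("upload", True)
    text = message.lower()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", False)
```

Substring matching is deliberately naive: a request such as "add a download icon to the intro" would be routed to export, so a real implementation would likely need stricter intent detection.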
Cloud Rendering API Reference
Free Video Maker runs on a cloud-based rendering backend that processes your photos, clips, and text overlays through distributed encoding pipelines, returning a downloadable video URL once the job completes. Rendering times vary by project length, resolution, and transition complexity.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: free-video-maker
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
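One way to assemble the required headers is sketched below. The helper names are hypothetical, and matching the install path by substring is an assumption; the text only states which directories map to which platform value. The source and version values are passed in by the caller, since they come from the file's frontmatter.

```python
def detect_platform(install_path):
    """Map an install path to the X-Skill-Platform value."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def attribution_headers(token, source, version, install_path):
    """Build the headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send a request without attribution, which the text says causes export to fail with 402.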
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>, either as a multipart file (-F "files=@/path") or as a JSON body with URLs: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple, which returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest, with key fields data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
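The export call and its polling loop could look roughly like this. It is a sketch under stated assumptions: `<ts>` is taken to be a Unix timestamp in seconds, the status response is assumed to expose `status` and `output.url` at the top level, and `max_polls` is an invented safety limit. `fetch_status` is a hypothetical callable that performs GET /api/render/proxy/lambda/<id> and returns the decoded JSON.

```python
import time


def export_body(session_id, draft, timestamp=None):
    """Build the POST /api/render/proxy/lambda body."""
    ts = int(time.time()) if timestamp is None else timestamp
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(fetch_status, render_id, interval=30, max_polls=30,
                    sleep=time.sleep):
    """Poll every `interval` seconds until status is 'completed'.

    Returns the download URL, or None if max_polls is exhausted.
    """
    for _ in range(max_polls):
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval)
    return None
```

Passing `sleep` and `fetch_status` in as parameters keeps the loop testable without network access. The sketch does not handle a failed render, because the text does not say which status value signals failure.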
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
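A request to /run_sse and the stream it returns might be handled as in the sketch below. The body layout comes from the API reference; the assumption that each `data:` line carries a JSON document is not confirmed by the text, and both function names are hypothetical.

```python
import json


def run_sse_body(session_id, message):
    """Build the POST /run_sse request body."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def parse_sse_lines(lines):
    """Yield decoded payloads from `data:` lines.

    Blank lines, comment lines, empty `data:` lines, and payloads that
    are not valid JSON are skipped, so the caller simply keeps waiting.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        try:
            yield json.loads(payload)
        except json.JSONDecodeError:
            continue
```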
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
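The fallback for silent edits, together with the two-minute progress notice, can be sketched as a single loop. The event shape used here (a dict with a `type` of "text" or "heartbeat") is purely hypothetical, as are the `poll_state` and `notify` callbacks; the text does not document the real event schema.

```python
import time


def handle_events(events, poll_state, notify, clock=time.monotonic,
                  interval=120.0):
    """Collect text from a stream; fall back to session state if none arrives."""
    texts = []
    last_notice = clock()
    for event in events:
        kind = event.get("type")
        if kind == "text":
            texts.append(event["text"])
        elif kind == "heartbeat":
            now = clock()
            if now - last_notice >= interval:
                notify("⏳ Still working...")
                last_notice = now
        # Tool calls and results are processed internally, never forwarded.
    if texts:
        return "\n".join(texts)
    # No text in the stream: verify the edit through session state instead.
    return poll_state()
```

The caller would summarize whatever `poll_state` returns before showing it to the user, rather than exposing raw API output.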
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
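The translation table can be expressed as an ordered rule list. This is an illustrative sketch: the action labels are invented names for the table's right-hand column, and substring matching is an assumption. Export is checked first so that "Click the Export button" maps to the export workflow rather than a generic click.

```python
# (phrases, action) pairs; more specific rules come first.
GUI_RULES = [
    (("export button", "导出"), "run_export_workflow"),
    (("preview in timeline",), "show_track_summary"),
    (("click", "点击"), "execute_via_api"),
    (("open", "打开"), "query_session_state"),
    (("drag", "drop", "拖拽"), "send_edit_via_sse"),
]


def translate_gui_instruction(text):
    """Return the API-side action for a GUI-style backend message, or None."""
    lowered = text.lower()
    for phrases, action in GUI_RULES:
        if any(phrase in lowered for phrase in phrases):
            return action
    return None
```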
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
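Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "City Dreams" (0-3s)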
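A track summary can be derived from the abbreviated draft fields. The nesting assumed here (each track in `t` carries `tt` and `sg`, and each segment carries `d`) is a guess based on the field mapping; the real draft schema may differ, and `summarize_draft` is a hypothetical helper.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft):
    """Render a short, user-facing track summary from a draft dict."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Unknown")
        segments = track.get("sg", [])
        # d is a duration in milliseconds; sum it across the track's segments.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s"
        )
    return "\n".join(lines)
```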
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
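The error table maps naturally onto a lookup. In this sketch the action labels are invented shorthand for the table's Action column, and the fallback for unlisted codes is an assumption, since the table does not say what to do with them.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "show_credit_options",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def action_for(code):
    """Return the handling action for a response code."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 and 2001 as separate actions mirrors the table's distinction between a subscription-tier block and an exhausted credit balance.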
Performance Notes
The free-video-maker skill performs best when you give it clear context upfront — intended platform (Instagram Reels, YouTube, TikTok, presentation), approximate target length, and the mood or tone you're going for. Vague prompts like 'make a video' will produce generic structures, while specific ones like 'a 45-second energetic product reveal for TikTok targeting Gen Z' produce tight, usable results.
For photo-based slideshows, providing the number of images and any preferred order helps the skill pace transitions accurately. For footage-based projects, describing each clip briefly (even just 'clip 1: person walking into store, 5 seconds') allows the skill to build a proper edit sequence rather than guessing at content.
Export format suggestions are optimized for common platforms by default. If you have a specific resolution, aspect ratio, or file format requirement, mention it early so recommendations stay aligned throughout the session.
Troubleshooting
If the generated video structure feels off-paced or too long, try specifying a hard time cap in your prompt (e.g., 'keep it under 90 seconds'). The skill defaults to completeness over brevity, so setting a limit forces tighter editing decisions.
If captions or text overlays don't match your brand voice, share a few examples of your existing content or describe your tone explicitly — 'casual and funny' versus 'formal and informative' produces noticeably different caption styles.
For music sync issues where the beat doesn't feel matched to visual cuts, ask the skill to generate a 'cut list timed to BPM' and provide the song's tempo if you know it. This gives the edit a rhythmic backbone.
If you're getting output that feels too templated, try rephrasing your request as a story rather than a task — describe the viewer's experience from start to finish, and the skill will generate a more narrative-driven structure.