Getting Started
Share your text prompts and I'll get started on AI video generation. Or just tell me what you're thinking.
Try saying:
- "generate my text prompts"
- "export 1080p MP4"
- "generate a 10-second video of a"
First-Time Connection
When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").
Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 2.
1. Obtain a free token: Generate a random UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to that UUID. The response field data.token is your NEMO_TOKEN (100 free credits, valid for 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
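The two setup steps above can be sketched as request builders. This is a minimal illustration that only constructs the requests and does not send them; the function names are hypothetical, and the attribution headers this document requires on every call are left out for brevity.

```python
import json
import uuid
from urllib.request import Request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def new_client_id() -> str:
    """Random UUID used as the X-Client-Id value."""
    return str(uuid.uuid4())


def build_token_request(client_id: str) -> Request:
    """Step 1: request an anonymous token for this client id."""
    return Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_session_request(token: str, language: str) -> Request:
    """Step 2: create a session using the token from step 1."""
    body = json.dumps({"task_name": "project", "language": language})
    return Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing either request to `urllib.request.urlopen` would perform the call. The token would then be read from `data.token` in the first response and `session_id` from the second, as described above.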
Keep setup communication brief. Don't display raw API responses or token values to the user.
Sora Video Generation — Generate Videos From Text Prompts
This tool takes your text prompts and runs AI video generation through a cloud rendering pipeline. You upload, describe what you want, and download the result.
Say you have a 10-word text prompt describing a scene and want to generate a 10-second video of a sunset over the ocean with waves crashing — the backend processes it in about 1-3 minutes and hands you a 1080p MP4.
Tip: shorter and more specific prompts tend to produce more accurate results.
Matching Input to Actions
User prompts referencing sora video generation, aspect ratio, text overlays, or audio tracks get routed to the corresponding action via keyword and intent classification.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
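As a rough illustration of the routing table, the sketch below matches keywords by substring only. It does not attempt the intent classification mentioned above, so a prompt such as "generate a status screen" would be misrouted. The `route` function and the action names are hypothetical.

```python
# Keyword groups taken from the routing table; order defines priority.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # A user-sent file always goes to the upload action.
        return ("upload", True)
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (action, True)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", False)
```

For example, `route("export 1080p MP4")` yields the export action with SSE skipped, while `route("add BGM")` falls through to the SSE path.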
Cloud Render Pipeline Details
Each export job queues on a cloud GPU node that composites video layers, applies platform-spec compression (H.264, up to 1080x1920), and returns a download URL within 30-90 seconds. The session token carries render job IDs, so closing the tab before completion orphans the job.
Every API call needs Authorization: Bearer <NEMO_TOKEN> plus three attribution headers: X-Skill-Source, X-Skill-Version, and X-Skill-Platform. If any of them is missing, exports return 402.
Headers are derived from this file's YAML frontmatter. X-Skill-Source is sora-video-generation, X-Skill-Version comes from the version field, and X-Skill-Platform is detected from the install path (~/.clawhub/ = clawhub, ~/.cursor/skills/ = cursor, otherwise unknown).
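The header derivation can be sketched as follows. The helper names are hypothetical, the version string is assumed to have been read from the frontmatter already, and paths are treated as POSIX paths.

```python
from pathlib import PurePosixPath


def detect_platform(install_path: str, home: str) -> str:
    """Map the install path to a platform name per the rules above."""
    path = PurePosixPath(install_path)
    home_path = PurePosixPath(home)
    if path.is_relative_to(home_path / ".clawhub"):
        return "clawhub"
    if path.is_relative_to(home_path / ".cursor" / "skills"):
        return "cursor"
    return "unknown"


def attribution_headers(version: str, install_path: str, home: str) -> dict[str, str]:
    """Build the three attribution headers sent with every API call."""
    return {
        "X-Skill-Source": "sora-video-generation",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path, home),
    }
```

`PurePosixPath.is_relative_to` needs Python 3.9 or later; on older versions a string prefix check would serve the same purpose.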
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent; body {"task_name":"project","language":"<lang>"}; returns task_id, session_id.
Send message (SSE): POST /run_sse; body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>; file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda; body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
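The export-and-poll loop might look like the sketch below. It assumes that `status` and `output.url` sit at the top level of the parsed poll response, and that the caller supplies a `fetch_status` callable performing the GET. The `max_polls` cap is an illustrative safeguard, not something the API specifies.

```python
import time
from typing import Callable


def build_export_body(session_id: str, draft: dict, timestamp: int) -> dict:
    """Body for POST /api/render/proxy/lambda."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_export(
    fetch_status: Callable[[str], dict],
    render_id: str,
    interval_s: float = 30.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is 'completed', then return the download URL."""
    for _ in range(max_polls):
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval_s)
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access or real waiting.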
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
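One way to keep this table in code is a plain lookup from code to a handler name. The handler names below are hypothetical labels, and the fallback for unlisted codes is an assumption, since the table does not say what to do with them.

```python
# Error code -> hypothetical handler name, following the table above.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth_anonymous_token",
    1002: "create_new_session",
    2001: "prompt_for_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Return the handler name for a code; unlisted codes map to 'unhandled'."""
    return ERROR_ACTIONS.get(code, "unhandled")
```

Note that 2001 and 402 are kept distinct: the first is a credits problem, the second a subscription tier problem.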
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
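A minimal sketch of this handling is shown below. It parses `data:` lines from the stream, skips empty payloads as heartbeats, and reports whether a state poll is needed because no text arrived. The event payload shape is not documented here, so `parts_text` is a hypothetical extractor that assumes events mirror the request's `parts[].text` layout. The 2-minute progress message is not implemented.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the joined data payload of each event (may be empty)."""
    data_lines: list[str] = []
    saw_data = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if saw_data:
                yield "\n".join(data_lines)
            data_lines, saw_data = [], False
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
            saw_data = True
        # Other fields (event:, id:, comments) are ignored in this sketch.
    if saw_data:
        yield "\n".join(data_lines)


def parts_text(payload: str) -> Optional[str]:
    """Hypothetical extractor: assumes {"parts": [{"text": ...}]} events."""
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    parts = event.get("parts") or []
    texts = [p["text"] for p in parts if isinstance(p, dict) and p.get("text")]
    return " ".join(texts) or None


def collect_text(
    lines: Iterable[str], extract_text: Callable[[str], Optional[str]]
) -> tuple[list[str], bool]:
    """Return (texts, needs_state_poll) once the stream has closed."""
    texts: list[str] = []
    for payload in iter_sse_data(lines):
        if not payload.strip():
            continue  # heartbeat or empty data: keep waiting
        text = extract_text(payload)
        if text:
            texts.append(text)  # tool calls and results yield no text
    return texts, not texts
```

When `needs_state_poll` comes back true, the caller would fetch the session state, confirm the edit was applied, and summarize the changes for the user.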
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
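Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "City Dreams" (0-3s)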
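The draft field mapping can be turned into a short track summary along these lines. The nesting is an assumption: the sketch treats `t` as a list of tracks, each with a `tt` type and an `sg` list of segments that carry `d` in milliseconds. The function name is hypothetical.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list[str]:
    """Build one summary line per track from the abbreviated draft keys."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Segment durations are in milliseconds; sum them per track.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s"
        )
    return lines
```

Summing segment durations gives total content length per track, not the track's end time on the timeline; computing the latter would need segment start offsets, which the mapping above does not list.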
Common Workflows
Quick edit: Upload → "generate a 10-second video of a sunset over the ocean with waves crashing" → Download MP4. Takes 1-3 minutes for a 30-second clip.
Batch style: Upload multiple files in one session. Process them one by one with different instructions. Each gets its own render.
Iterative: Start with a rough cut, preview the result, then refine. The session keeps your timeline state so you can keep tweaking.
Tips and Tricks
The backend processes faster when you're specific. Instead of "make it look better", try "generate a 10-second video of a sunset over the ocean with waves crashing" — concrete instructions get better results.
Max file size is 200MB. Stick to TXT, PNG, JPG, MP4 for the smoothest experience.
Export as MP4 for widest compatibility.