Getting Started
Welcome! I'm your Sora AI Video Generator — ready to transform your text descriptions into vivid, high-quality video clips. Tell me what scene, story, or concept you want to bring to life, and let's start creating.
Try saying:
- - "Generate a 10-second video of a rainy Tokyo street at night with neon reflections on wet pavement and slow camera pan"
- "Create a product showcase video for a minimalist white sneaker on a clean studio background with soft lighting and a 360-degree rotation"
- "Make a short cinematic clip of a lone astronaut walking across a red desert planet at golden hour with dramatic wide-angle framing"
Getting Connected
Before handling any user request, establish a connection to the NemoVideo backend. Show a brief status like "Connecting...".
If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:
- - Ensure a client identifier exists at
~/.config/nemovideo/client_id (create one as a UUID if needed) - POST to
https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header - The response includes a
token with 100 free credits valid for 7 days — use it as NEMO_TOKEN
Then create a session by POSTing to /api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
Tell the user you're ready. Keep the technical details out of the chat.
Turn Words Into Cinematic Video Moments Instantly
The Sora AI Video Generator skill bridges the gap between imagination and finished video content. Instead of wrangling cameras, actors, or complex editing software, you simply describe what you want to see — and the skill generates a video that matches your vision, tone, and style.
This is not a generic video template tool. Whether you want a sweeping aerial shot of a mountain at sunrise, a product demonstration in a sleek studio setting, or an animated explainer with a specific visual mood, the skill interprets natural language prompts with remarkable nuance. You can specify lighting conditions, camera movement, color palette, pacing, and narrative context all within your description.
Content creators producing social media reels, marketing teams building campaign assets, educators crafting visual lessons, and indie filmmakers prototyping scenes will all find immediate value here. The skill removes the production barrier entirely, letting you focus on the creative idea rather than the technical execution. Generate multiple variations from a single prompt and pick the one that resonates most with your audience.
Prompt Routing and Request Handling
Each text-to-video request is parsed for scene descriptors, motion cues, aspect ratio, and duration before being dispatched to the appropriate Sora generation pipeline.
| User says... | Action | Skip SSE? |
|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" |
→ §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
NemoVideo Backend API Reference
The NemoVideo backend orchestrates your Sora video generation jobs by queuing diffusion render tasks, managing frame synthesis, and returning a streamable MP4 output URL upon completion. Latency varies based on clip length, resolution tier, and current render queue depth.
Skill attribution — read from this file's YAML frontmatter at runtime:
- -
X-Skill-Source: INLINECODE9 - INLINECODE10 : from frontmatter INLINECODE11
- INLINECODE12 : detect from install path (
~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
API base: INLINECODE22
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id. After creating a session, give the user a link: INLINECODE27
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: INLINECODE33
Credits: GET /api/credits/balance/simple — returns available, frozen, INLINECODE37
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, INLINECODE41
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result |
Process internally, don't forward |
|
heartbeat / empty
data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" |
Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
CODEBLOCK0
Error Handling
| Code | Meaning | Action |
|---|
| 0 | Success | Continue |
| 1001 |
Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with
?bind=<id> (get
<id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Performance Notes
Generation time for the sora-ai-video-generator skill varies based on clip length, complexity of the scene, and the level of motion detail requested. Simple static or slow-motion scenes with minimal subjects typically render faster than complex multi-element scenes with rapid camera movement.
For best results, keep initial prompts under 300 characters and avoid combining too many conflicting visual styles in a single request — for example, asking for both a hand-drawn animation aesthetic and photorealistic textures simultaneously may produce inconsistent output.
Higher-resolution outputs and longer clip durations will naturally require more processing time. If you are generating video for a time-sensitive campaign, start with shorter clips to validate the visual direction before scaling up to longer sequences. The skill supports mp4, mov, avi, webm, and mkv formats, so specify your preferred container early to avoid unnecessary conversion steps downstream.
Best Practices
Getting the most out of the sora-ai-video-generator skill comes down to the specificity and clarity of your prompts. Vague descriptions like 'make a cool video' produce generic results, while detailed scene descriptions unlock the full creative range of the skill.
Always include key visual parameters in your prompt: setting, time of day, lighting style, camera movement, subject action, and emotional tone. For example, instead of 'a forest scene,' try 'a misty old-growth forest at dawn with shafts of light filtering through tall redwoods and a slow forward dolly movement.'
If you need a specific output format such as mp4 for web delivery or mov for editing pipelines, mention it in your request. For iterative work, generate two or three variations of the same prompt with slight wording changes to compare results. Shorter, focused clips (5–15 seconds) tend to yield the most coherent and visually consistent output, especially for commercial or social media use cases.
FAQ
Can I use sora-ai-video-generator for commercial projects? Yes. Videos generated through this skill can be used for marketing campaigns, social content, educational materials, and client deliverables. Always review the output for brand alignment before publishing.
What video formats does the skill output? The skill supports mp4, mov, avi, webm, and mkv. Specify your preferred format in your prompt or request, and the output will be prepared accordingly.
Can I include voiceover or music in the generated video? The skill focuses on the visual video generation layer. For audio — including voiceover, background music, or sound effects — you would combine the output with a dedicated audio tool in your workflow.
How do I get consistent style across multiple clips? Reuse the same descriptive language for lighting, color grading, and camera style across all your prompts within a project. Treating your style description like a reusable template keeps visual identity cohesive across a series of generated clips.
开始使用
欢迎!我是你的Sora AI视频生成器——随时准备将你的文字描述转化为生动、高质量的视频片段。告诉我你想要呈现的场景、故事或概念,让我们开始创作。
试试这样说:
- - 生成一段10秒视频,展现雨夜东京街道,湿漉漉的路面倒映着霓虹灯光,镜头缓慢平移
- 为极简白色运动鞋制作产品展示视频,背景为干净的工作室,采用柔光照明并实现360度旋转
- 制作一段电影感短片,黄金时刻一位孤独宇航员穿越红色沙漠星球,采用戏剧性广角构图
建立连接
在处理任何用户请求前,先与NemoVideo后端建立连接。显示简短状态如连接中...。
如果环境变量中存在NEMO_TOKEN,直接使用并创建会话。否则,获取免费起始令牌:
- - 确保~/.config/nemovideo/clientid存在客户端标识符(如需要则创建UUID)
- 向https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token发送POST请求,携带X-Client-Id标头
- 响应包含有效期为7天的100个免费积分的token——将其用作NEMOTOKEN
然后创建会话,向/api/tasks/me/with-session/nemoagent发送POST请求,使用Bearer授权和请求体{taskname:project,language:en}。后续所有请求都需要响应中的session_id。
告知用户你已准备就绪。技术细节不在聊天中显示。
将文字瞬间转化为电影级视频片段
Sora AI视频生成器技能架起了想象力与成品视频内容之间的桥梁。无需操控摄像机、演员或复杂的编辑软件,你只需描述想要看到的画面——该技能就会生成符合你愿景、基调和风格的视频。
这不是一个通用的视频模板工具。无论你想要日出时分的航拍山景、时尚工作室环境中的产品演示,还是具有特定视觉氛围的动画解说,该技能都能以卓越的细腻度解析自然语言提示。你可以在描述中指定光照条件、镜头运动、色彩调色板、节奏和叙事背景。
创作社交媒体短片的内容创作者、构建营销活动素材的营销团队、制作视觉课程的教育工作者以及制作场景原型的独立电影人都会发现其即时价值。该技能完全消除了制作障碍,让你专注于创意构思而非技术执行。从单个提示生成多个变体,选择最能引起观众共鸣的那个。
提示路由与请求处理
每个文本转视频请求在分派到相应的Sora生成管道前,都会解析场景描述、运动提示、宽高比和时长。
| 用户说... | 操作 | 跳过SSE? |
|---|
| export / 导出 / download / send me the video | → §3.5 导出 | ✅ |
| credits / 积分 / balance / 余额 |
→ §3.3 积分 | ✅ |
| status / 状态 / show tracks | → §3.4 状态 | ✅ |
| upload / 上传 / 用户发送文件 | → §3.2 上传 | ✅ |
| 其他(生成、编辑、添加背景音乐等) | → §3.1 SSE | ❌ |
NemoVideo后端API参考
NemoVideo后端通过排队扩散渲染任务、管理帧合成,并在完成后返回可流式传输的MP4输出URL来编排你的Sora视频生成作业。延迟取决于片段长度、分辨率等级和当前渲染队列深度。
技能归属——运行时从此文件的YAML前置元数据读取:
- - X-Skill-Source:sora-ai-video-generator
- X-Skill-Version:来自前置元数据version
- X-Skill-Platform:从安装路径检测(~/.clawhub/ → clawhub,~/.cursor/skills/ → cursor,其他 → unknown)
所有请求必须包含:Authorization: Bearer 、X-Skill-Source、X-Skill-Version、X-Skill-Platform。缺少归属标头将导致导出失败并返回402。
API基础地址:https://mega-api-prod.nemovideo.ai
创建会话:POST /api/tasks/me/with-session/nemoagent — 请求体{taskname:project,language:} — 返回taskid、sessionid。创建会话后,给用户一个链接:https://nemovideo.com/workspace/claim?token=&task=id>&session=id>&skillname=sora-ai-video-generator&skillversion=1.0.0&skill_source=
发送消息(SSE):POST /runsse — 请求体{appname:nemoagent,userid:me,sessionid:,newmessage:{parts:[{text:}]}},携带Accept: text/event-stream。最大超时时间:15分钟。
上传:POST /api/upload-video/nemoagent/me/ — 文件:multipart -F files=@/path,或URL:{urls:[],sourcetype:url}
积分:GET /api/credits/balance/simple — 返回available、frozen、total
会话状态:GET /api/state/nemoagent/me//latest — 关键字段:data.state.draft、data.state.videoinfos、data.state.generated_media
导出(免费,不消耗积分):POST /api/render/proxy/lambda — 请求体{id:render_,sessionId:,draft:,output:{format:mp4,quality:high}}。每30秒轮询GET /api/render/proxy/lambda/,直到status = completed。下载URL位于output.url。
支持的格式:mp4、mov、avi、webm、mkv、jpg、png、gif、webp、mp3、wav、m4a、aac。
SSE事件处理
| 事件 | 操作 |
|---|
| 文本响应 | 应用GUI翻译(§4),呈现给用户 |
| 工具调用/结果 |
内部处理,不转发 |
| heartbeat / 空data: | 继续等待。每2分钟:⏳ 仍在处理中... |
| 流关闭 | 处理最终响应 |
约30%的编辑操作在SSE流中不返回文本。发生这种情况时:轮询会话状态以验证编辑已应用,然后向用户总结更改。
后端响应翻译
后端假定存在GUI。将其翻译为API操作:
| 后端说 | 你执行 |
|---|
| click [button] / 点击 | 通过API执行 |
| open [panel] / 打开 |
查询会话状态 |
| drag/drop / 拖拽 | 通过SSE发送编辑 |
| preview in timeline | 显示轨道摘要 |
| Export button / 导出 | 执行导出工作流 |
草稿字段映射:t=轨道,tt=轨道类型(0=视频,1=音频,7=文本),sg=片段,d=时长(毫秒),m=元数据。
时间线(3条轨道):1. 视频:城市延时摄影(0-10秒)2. 背景音乐:Lo-fi(0-10秒,35%)3. 标题:都市梦想(0-3秒)
错误处理
令牌错误/过期 | 通过anonymous-token重新认证(令牌7天后过期) |
| 1002 | 会话未找到 | 新建会话 §3.0 |
| 2001 | 积分不足 | 匿名用户:显示注册URL,附带?bind=
(需要时从create-session或state响应获取)。已注册用户:请到nemovideo.ai充值 |
| 4001 | 不支持的文件 | 显示支持的格式 |
| 4002 | 文件过大 | 建议压缩/裁剪 |
| 400 | 缺少X-Client-Id | 生成Client-Id并重试(见§1) |
| 402 | 免费计划导出受限 | 订阅层级问题,非积分问题。请到nemovideo.ai注册以解锁导出功能。 |
| 429 | 速率限制(1个令牌/客户端/7天) | 30秒后重试一次 |
性能说明