Getting Started

Welcome! I'm your Sora AI Video Generator — ready to transform your text descriptions into vivid, high-quality video clips. Tell me what scene, story, or concept you want to bring to life, and let's start creating.

Try saying:

- "Generate a 10-second video of a rainy Tokyo street at night with neon reflections on wet pavement and slow camera pan"
- "Create a product showcase video for a minimalist white sneaker on a clean studio background with soft lighting and a 360-degree rotation"
- "Make a short cinematic clip of a lone astronaut walking across a red desert planet at golden hour with dramatic wide-angle framing"

Getting Connected

Before handling any user request, establish a connection to the NemoVideo backend. Show a brief status like "Connecting...".

If NEMO_TOKEN is in the environment, use it directly and create a session. Otherwise, acquire a free starter token:

- Ensure a client identifier exists at ~/.config/nemovideo/client_id (create one as a UUID if needed)
- POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the X-Client-Id header
- The response includes a token with 100 free credits, valid for 7 days; use it as NEMO_TOKEN

Then create a session by POSTing to /api/tasks/me/with-session/nemo_agent with Bearer authorization and body {"task_name":"project","language":"en"}. The session_id in the response is needed for all following requests.
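The connection steps above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names, the Content-Type header, and the choice of urllib are assumptions. The attribution headers required on every request are left out to keep the sketch short, and the response field names are not parsed because this document does not specify them.

```python
import json
import os
import uuid
from pathlib import Path
from typing import Optional
from urllib import request

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def token_from_env() -> Optional[str]:
    """Return NEMO_TOKEN if it is set and non-empty, else None."""
    return os.environ.get("NEMO_TOKEN") or None


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Return the stored client identifier, creating a UUID if none exists."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    path.parent.mkdir(parents=True, exist_ok=True)
    client_id = str(uuid.uuid4())
    path.write_text(client_id, encoding="utf-8")
    return client_id


def anonymous_token_request(client_id: str) -> request.Request:
    """Build the POST that asks for a free starter token."""
    return request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def create_session_request(token: str, language: str = "en") -> request.Request:
    """Build the POST that creates a session with Bearer authorization."""
    body = json.dumps({"task_name": "project", "language": language}).encode("utf-8")
    return request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        # Content-Type is an assumption; the document only gives the JSON body.
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def send_json(req: request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A caller would check token_from_env() first, fall back to anonymous_token_request only when it returns None, and then pass the token to create_session_request.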

Tell the user you're ready. Keep the technical details out of the chat.

Turn Words Into Cinematic Video Moments Instantly

The Sora AI Video Generator skill bridges the gap between imagination and finished video content. Instead of wrangling cameras, actors, or complex editing software, you simply describe what you want to see — and the skill generates a video that matches your vision, tone, and style.

This is not a generic video template tool. Whether you want a sweeping aerial shot of a mountain at sunrise, a product demonstration in a sleek studio setting, or an animated explainer with a specific visual mood, the skill interprets natural language prompts with remarkable nuance. You can specify lighting conditions, camera movement, color palette, pacing, and narrative context all within your description.

Content creators producing social media reels, marketing teams building campaign assets, educators crafting visual lessons, and indie filmmakers prototyping scenes will all find immediate value here. The skill removes the production barrier entirely, letting you focus on the creative idea rather than the technical execution. Generate multiple variations from a single prompt and pick the one that resonates most with your audience.

Prompt Routing and Request Handling

Each text-to-video request is parsed for scene descriptors, motion cues, aspect ratio, and duration before being dispatched to the appropriate Sora generation pipeline.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
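The routing rules can be sketched as a small keyword lookup. The function name, the has_file flag, and the case-insensitive substring matching are assumptions made for illustration; the document lists the trigger phrases but not how they are matched, so a real router may need stricter matching (for example, "download" inside a generation prompt would be routed to export here).

```python
# Trigger phrases per action, checked in the order listed in the routing rules.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message.

    has_file is a hypothetical flag meaning the user attached a file.
    """
    if has_file:
        return "upload", True
    text = message.casefold()
    for action, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM) goes through the SSE endpoint.
    return "sse", False
```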

NemoVideo Backend API Reference

The NemoVideo backend orchestrates your Sora video generation jobs by queuing diffusion render tasks, managing frame synthesis, and returning a streamable MP4 output URL upon completion. Latency varies based on clip length, resolution tier, and current render queue depth.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: sora-ai-video-generator
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
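A minimal sketch of assembling these headers is shown below. The function names are hypothetical, the version is passed in rather than parsed from the frontmatter, and the platform check simply looks for the marker directories anywhere in a POSIX-style install path, which is an assumption about how detection should work.

```python
from pathlib import PurePosixPath


def detect_platform(install_path: str) -> str:
    """Map an install path to a platform name, defaulting to 'unknown'."""
    parts = PurePosixPath(install_path).parts
    if ".clawhub" in parts:
        return "clawhub"
    # Look for the consecutive directories ".cursor/skills".
    for first, second in zip(parts, parts[1:]):
        if (first, second) == (".cursor", "skills"):
            return "cursor"
    return "unknown"


def build_headers(token: str, version: str, install_path: str,
                  source: str = "sora-ai-video-generator") -> dict:
    """Return the authorization and attribution headers for a request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```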

API base: https://mega-api-prod.nemovideo.ai

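Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<token>&task=<task_id>&session=<session_id>&skill_name=sora-ai-video-generator&skill_version=1.0.0&skill_source=<skill_source>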

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
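As an illustration, the request body and a generic reader for the event stream might look like the sketch below. The parser follows the standard server-sent events framing (data lines, blank-line dispatch, comment lines starting with a colon). It treats each event payload as an opaque string, because the shape of NemoVideo's events is not described here. All function names are hypothetical.

```python
from typing import Iterable, Iterator

SSE_TIMEOUT_SECONDS = 15 * 60  # the documented maximum timeout
SSE_HEADERS = {"Accept": "text/event-stream"}


def build_run_sse_body(session_id: str, text: str) -> dict:
    """Build the JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE stream."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            buffer.append(value)
    # An event without a terminating blank line is incomplete and is dropped.
```

Feeding the decoded response lines into iter_sse_data keeps the network layer separate from the parsing, so the parser can be exercised with canned input.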

Upload: POST /api/upload-video/nemo_agent/me/<sid> with either a file (multipart: -F "files=@/path") or a URL (body {"urls":["<url>"],"source_type":"url"})

Credits: GET /api/credits/balance/simple; returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
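The export flow can be sketched as a body builder plus a polling loop. This is an illustration under stated assumptions: the timestamp in the render id is taken to be whole seconds since the epoch, status and output.url are assumed to sit at the top level of the poll response, and fetch_status is a hypothetical callable that performs the GET and returns the decoded JSON. Failure statuses are not handled because they are not documented here.

```python
import time
from typing import Callable, Optional


def build_export_body(session_id: str, draft: dict, ts: Optional[int] = None) -> dict:
    """Build the JSON body for POST /api/render/proxy/lambda."""
    if ts is None:
        ts = int(time.time())  # assumption: <ts> is seconds since the epoch
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(fetch_status: Callable[[str], dict], render_id: str,
                    interval_s: float = 30.0, max_polls: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the render reports completed, then return the download URL."""
    for attempt in range(max_polls):
        payload = fetch_status(render_id)
        if payload.get("status") == "completed":
            return payload["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"render {render_id} did not complete after {max_polls} polls")
```

Passing sleep and fetch_status as parameters keeps the loop testable without waiting or touching the network; max_polls is a hypothetical safety limit, not a documented one.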

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

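Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: Urban Dreams (0-3s)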
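To show how the field mapping might be used, the sketch below turns a draft into a short track summary. The nesting is an assumption: the draft is taken to hold a list of tracks under t, each with a type under tt and a list of segments under sg, each segment carrying a duration in milliseconds under d. The real draft layout may differ, and the function name is hypothetical.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> str:
    """Return a human-readable summary of the tracks in a draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for index, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # d is a duration in milliseconds; sum the segments and show seconds.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return "\n".join(lines)
```

A summary like this is one way to report changes to the user when the SSE stream returns no text and the session state has to be polled instead.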

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Performance Notes

Generation time for the sora-ai-video-generator skill varies based on clip length, complexity of the scene, and the level of motion detail requested. Simple static or slow-motion scenes with minimal subjects typically render faster than complex multi-element scenes with rapid camera movement.

For best results, keep initial prompts under 300 characters and avoid combining too many conflicting visual styles in a single request — for example, asking for both a hand-drawn animation aesthetic and photorealistic textures simultaneously may produce inconsistent output.

Higher-resolution outputs and longer clip durations will naturally require more processing time. If you are generating video for a time-sensitive campaign, start with shorter clips to validate the visual direction before scaling up to longer sequences. The skill supports mp4, mov, avi, webm, and mkv formats, so specify your preferred container early to avoid unnecessary conversion steps downstream.

Best Practices

Getting the most out of the sora-ai-video-generator skill comes down to the specificity and clarity of your prompts. Vague descriptions like 'make a cool video' produce generic results, while detailed scene descriptions unlock the full creative range of the skill.

Always include key visual parameters in your prompt: setting, time of day, lighting style, camera movement, subject action, and emotional tone. For example, instead of 'a forest scene,' try 'a misty old-growth forest at dawn with shafts of light filtering through tall redwoods and a slow forward dolly movement.'
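One way to keep those parameters consistent is a small prompt builder, sketched below. The function names and the comma-joined format are assumptions for illustration only; the 300-character check mirrors the prompt-length guidance in the Performance Notes.

```python
MAX_PROMPT_CHARS = 300  # initial prompts should stay under this length


def build_prompt(setting: str, time_of_day: str = "", lighting: str = "",
                 camera: str = "", action: str = "", tone: str = "") -> str:
    """Join the non-empty visual parameters into one comma-separated prompt."""
    parts = [setting, time_of_day, lighting, camera, action, tone]
    return ", ".join(part.strip() for part in parts if part and part.strip())


def within_limit(prompt: str) -> bool:
    """Return True when the prompt is under the recommended length."""
    return len(prompt) < MAX_PROMPT_CHARS
```

For example, build_prompt("a misty old-growth forest", "at dawn", "shafts of light filtering through tall redwoods", "slow forward dolly movement") produces a single description that can be reused as a style template across clips.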

If you need a specific output format such as mp4 for web delivery or mov for editing pipelines, mention it in your request. For iterative work, generate two or three variations of the same prompt with slight wording changes to compare results. Shorter, focused clips (5–15 seconds) tend to yield the most coherent and visually consistent output, especially for commercial or social media use cases.

FAQ

Can I use sora-ai-video-generator for commercial projects? Yes. Videos generated through this skill can be used for marketing campaigns, social content, educational materials, and client deliverables. Always review the output for brand alignment before publishing.

What video formats does the skill output? The skill supports mp4, mov, avi, webm, and mkv. Specify your preferred format in your prompt or request, and the output will be prepared accordingly.

Can I include voiceover or music in the generated video? The skill focuses on the visual video generation layer. For audio — including voiceover, background music, or sound effects — you would combine the output with a dedicated audio tool in your workflow.

How do I get consistent style across multiple clips? Reuse the same descriptive language for lighting, color grading, and camera style across all your prompts within a project. Treating your style description like a reusable template keeps visual identity cohesive across a series of generated clips.
