Getting Started
Send me your script, concept, or images and I'll craft the best AI video generation prompt to bring it to life. No assets? Just describe your video idea and we'll start from scratch.
Try saying:
- "I have a 200-word product description for a skincare brand. Help me turn it into a 30-second AI-generated video with a clean, modern aesthetic for Instagram Reels."
- "Generate a detailed text-to-video prompt for a cinematic nature documentary intro showing a misty forest at dawn with slow camera movement and ambient sound cues."
- "I want to create an AI explainer video for a SaaS onboarding flow — help me structure the scenes, choose a visual style, and write generation prompts for each segment."
First-Time Connection
When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").
Authentication: Check whether NEMO_TOKEN is set in the environment. If it is, skip to step 2.
1. Obtain a free token: Generate a random UUID as the client identifier. POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the header X-Client-Id set to that UUID. The response field data.token is your NEMO_TOKEN (100 free credits, valid for 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
Keep setup communication brief. Don't display raw API responses or token values to the user.
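The two setup steps can be sketched with Python's standard library. This is a minimal illustration, not the backend's official client: the function names are hypothetical, and the functions only build the requests so the shape is easy to inspect. Sending them would be a matter of passing each one to urllib.request.urlopen.

```python
import json
import os
import uuid
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def needs_new_token(env=os.environ) -> bool:
    """True when NEMO_TOKEN is absent or empty, so step 1 cannot be skipped."""
    return not env.get("NEMO_TOKEN")


def build_token_request(client_id: str) -> urllib.request.Request:
    """Step 1: anonymous-token request, identified by the X-Client-Id header."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def build_session_request(token: str, language: str) -> urllib.request.Request:
    """Step 2: create-session request carrying the documented JSON body."""
    body = json.dumps({"task_name": "project", "language": language}).encode()
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A fresh client identifier for step 1.
client_id = str(uuid.uuid4())
token_request = build_token_request(client_id)
```

Keeping request construction separate from sending makes it easier to avoid logging token values by accident, in line with the note above about not displaying them.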
Generate Professional Videos From Any Starting Point
Whether you have a fully written script, a rough idea, or just a handful of images, this skill transforms your input into compelling video content. Using the best AI video generation approaches available, it guides you through prompt crafting, scene sequencing, visual style selection, and output formatting — so your final video matches your vision and your platform.
This isn't a one-size-fits-all generator. The skill adapts to your specific use case — whether you're producing a product demo, a short-form social reel, an explainer video, a cinematic brand story, or a training module. You describe what you want, and the skill helps you build the most effective generation prompt or workflow to get there.
Creators who are new to AI video tools will find clear guidance on how to structure requests for maximum quality. Experienced users can go deeper — fine-tuning motion styles, aspect ratios, pacing cues, and narrative arcs. The goal is always the same: fewer iterations, better results, and video that actually gets used.
Routing Your Video Prompts
Each request — whether text-to-video, image-to-video, or style transfer — is parsed and dispatched to the optimal generation model based on your input type, resolution target, and motion complexity.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
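One way to express the routing table in code is a keyword lookup. The sketch below assumes case-insensitive substring matching, which the table does not specify. The action names and the route function are hypothetical labels for the referenced sections.

```python
# Keyword groups from the routing table, checked in table order.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        return "upload", True  # "user sends file" row
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    return "sse", False  # everything else goes through the SSE endpoint
```

Substring matching is deliberately simple and will misroute a message such as "make a video about export markets". A real implementation would likely need intent detection rather than keyword hits.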
Video Generation API Reference
All rendering jobs run on distributed GPU clusters in the cloud, handling diffusion sampling, temporal coherence, and frame interpolation server-side so your device never bottlenecks the output quality. Latency scales with clip duration, resolution, and the number of motion keyframes requested.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: best-ai-video-generation
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
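The header set and the platform detection rule can be sketched as follows. The function names and the explicit home parameter are assumptions made for illustration. The caller is expected to pass an already expanded absolute install path and the user's home directory.

```python
from pathlib import PurePosixPath


def detect_platform(install_path: str, home: str) -> str:
    """Map the skill's install path to an X-Skill-Platform value."""
    path = PurePosixPath(install_path)
    root = PurePosixPath(home)
    if path.is_relative_to(root / ".clawhub"):
        return "clawhub"
    if path.is_relative_to(root / ".cursor" / "skills"):
        return "cursor"
    return "unknown"


def build_headers(token: str, source: str, version: str,
                  install_path: str, home: str) -> dict:
    """Headers required on every request: auth plus the three attribution fields."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path, home),
    }
```

Building the headers in one place makes it harder to omit an attribution field, which the text above says leads to a 402 on export.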
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent; body {"task_name":"project","language":"<lang>"}; returns task_id, session_id.
Send message (SSE): POST /run_sse; body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>; file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple; returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda; body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
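The export flow (submit a render job, then poll every 30 seconds) might look like the sketch below. It makes several assumptions: fetch_status is a hypothetical callable that performs the GET and returns parsed JSON, status and output.url are read from the top level of that JSON, and max_polls is an invented safety limit. Failure statuses are not documented here, so the loop only recognises completed.

```python
import time
from typing import Callable


def build_export_body(session_id: str, draft: dict, timestamp: int) -> dict:
    """Export request body in the documented shape; the id is render_<ts>."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_export(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 40,  # hypothetical cap, not from the API reference
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is 'completed', then return the download URL."""
    for _ in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval_s)
    raise TimeoutError("render did not complete within the polling budget")
```

Injecting the sleep function keeps the loop testable without waiting, and the poll cap prevents an agent from hanging forever on a stuck job.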
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
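A rough sketch of the event handling follows. The request body matches the API reference. The response shape, however, is an assumption: the code guesses that each data: line carries JSON with content.parts[].text, mirroring the request's new_message, which the reference does not confirm. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def build_sse_body(session_id: str, message: str) -> dict:
    """Request body for POST /run_sse, as given in the API reference."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_text_parts(lines: Iterable[str]) -> Iterator[str]:
    """Yield text parts from SSE lines, skipping heartbeats and tool events."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, event names
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: line, keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # non-JSON keep-alive such as "heartbeat"
        content = event.get("content") if isinstance(event, dict) else None
        if not isinstance(content, dict):
            continue
        for part in content.get("parts") or []:
            # Parts without text (assumed tool calls/results) are not forwarded.
            if isinstance(part, dict) and part.get("text"):
                yield part["text"]
```

If the stream closes and this yields nothing, that corresponds to the no-text case described above: poll the session state and summarize what changed.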
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
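Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)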
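The draft field mapping can be turned into a track summary along these lines. The nesting is an assumption: the sketch supposes that each entry in t carries tt and sg, and that each segment carries its own d. The mapping lists the keys but not where they sit, so adjust to the actual draft shape.

```python
# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_tracks(draft: dict) -> list[str]:
    """One line per track: type, segment count, total duration in seconds."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)  # d is in ms
        lines.append(
            f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s"
        )
    return lines
```

A summary like this is what the agent would show when the backend says "preview in timeline", since there is no GUI to preview in.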
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
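The error table maps naturally onto a lookup. In this sketch the action labels are hypothetical names for the behaviours in the table, and registration_url is a placeholder parameter because the registration URL itself is not given here.

```python
# Codes from the error table mapped to hypothetical action labels.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "show_credit_options",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def next_action(code: int) -> str:
    """Look up the recovery action; unlisted codes fall through to a generic report."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")


def no_credits_message(is_anonymous: bool, registration_url: str, bind_id: str) -> str:
    """Message for code 2001, which differs for anonymous and registered users."""
    if is_anonymous:
        return f"Register to continue: {registration_url}?bind={bind_id}"
    return "Top up credits in your account"
```

Keeping 402 and 2001 as separate actions preserves the distinction the table draws between a subscription tier block and an empty credit balance.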
Performance Notes
AI video generation quality varies significantly based on how prompts are structured. Vague inputs like 'make a cool video' produce inconsistent, generic results — while specific prompts that define subject, motion, lighting, mood, camera angle, and duration consistently yield higher-quality outputs across all major generation platforms.
This skill is optimized to help you extract the best performance from tools like Runway, Sora, Kling, Pika, and Luma by building prompts with the right level of detail. Longer videos (over 10 seconds) benefit from scene-by-scene breakdowns rather than single monolithic prompts. For character consistency across scenes, the skill will flag when you need to use reference images or seed locking features.
Expect the best results when you provide clear context: the platform you're publishing to, the audience you're targeting, and any brand style guidelines. The more specific your input, the fewer generation attempts you'll need to reach a usable final output.
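To make the six prompt elements concrete, here is a small sketch that assembles them into one shot description. The ShotPrompt class and its output format are purely illustrative and do not follow the syntax of any particular generation platform.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """One shot, described by the elements a specific prompt should define."""
    subject: str
    motion: str
    lighting: str
    mood: str
    camera: str
    duration_s: int

    def render(self) -> str:
        """Join the fields into a single prompt string."""
        return (
            f"{self.subject}, {self.motion}. "
            f"Lighting: {self.lighting}. Mood: {self.mood}. "
            f"Camera: {self.camera}. Duration: {self.duration_s}s."
        )


shot = ShotPrompt(
    subject="Misty forest at dawn",
    motion="fog drifting between trees",
    lighting="soft golden backlight",
    mood="calm",
    camera="slow dolly forward",
    duration_s=8,
)
print(shot.render())
```

For videos longer than about 10 seconds, a list of such shots, one per scene, fits the scene-by-scene breakdown recommended above.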
Integration Guide
This skill works as a prompt engineering and creative direction layer that sits in front of any AI video generation platform you're already using. You don't need to switch tools — you use this skill to build better inputs for the generator of your choice.
For teams using Runway Gen-3 or Kling, the skill produces motion-aware prompts that respect each platform's syntax preferences, including camera motion descriptors and negative prompt fields. For text-to-video workflows in Pika or Luma Dream Machine, it structures prompts around their scene duration limits and style tokens.
If you're building a content pipeline — for example, generating weekly social videos from a content calendar — this skill can help you create a reusable prompt template system so your outputs stay visually consistent across batches. Bring your content brief, brand guidelines, or existing video examples and the skill will reverse-engineer a repeatable generation framework for your team.