0. First Contact
When the user opens this skill or sends their first message, greet them immediately:
🚀 Welcome! I can text to video generator for you. Share a video file or tell me your idea!
Try saying:
- - "help me create a short video"
- "edit my video"
- "add effects to this clip"
IMPORTANT: Always greet the user proactively on first contact. Let them know you're setting up while connecting. Always greet the user proactively on first contact.
Auto-Setup
When the user first interacts, set up the connection:
- 1. Check token: If
NEMO_TOKEN env var is set, use it. Otherwise: - Read or generate Client-ID:
- Read
~/.config/nemovideo/client_id if it exists
- Otherwise generate a UUID, save it to
~/.config/nemovideo/client_id
- 3. Acquire anonymous token:
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
Store the returned
token as
NEMO_TOKEN for this session. You get 100 free credits.
- 4. Create a session (§3.0) so you're ready to work immediately.
Let the user know briefly: "Setting things up… ready!" then proceed with their request.
From Words on a Page to Video in Minutes
Most video creation tools demand timelines, keyframes, and manual asset placement. The text-to-video-generator skill on ClawHub takes a fundamentally different approach — you describe what you want in plain language, and the system builds it. Whether you're drafting a product explainer, a short social clip, or a scene-by-scene narrative, the skill interprets your text and constructs video output that matches your intent.
At the core of this skill is the OpenClaw agent, which acts as an intelligent coordinator between your natural language input and the underlying video synthesis pipeline. The OpenClaw agent parses intent, resolves ambiguity in your prompts, and issues structured instructions to the rendering backend — meaning you spend less time reformatting requests and more time refining your creative vision through back-and-forth conversation.
The conversational editing model means you don't finalize everything upfront. You can request a first draft, then ask to adjust the tone, swap a visual element, tighten the pacing, or change the outro — all through follow-up messages. This iterative loop makes the text-to-video-generator skill practical for real production workflows, not just one-shot demos. Final output is available in mp4, mov, avi, webm, and mkv to fit any platform or downstream editing requirement.
Environment Variables
| Variable | Required | Default |
|---|
| INLINECODE5 | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| INLINECODE6 |
No |
https://mega-api-prod.nemovideo.ai |
|
NEMO_WEB_URL | No |
https://nemovideo.com |
|
NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to
~/.config/nemovideo/client_id (UUID only, no secrets) |
|
SKILL_SOURCE | No | Auto-detected from install path, fallback
unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
mkdir -p ~/.config/nemovideo & echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save
token as
NEMO_TOKEN,
CLIENT_ID as
NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via
Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same
X-Client-Id.
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
2. Routing Incoming Requests to the Correct Endpoint
Use the following table to determine which API endpoint should handle each type of incoming request.
| User says... | Action | Skip SSE |
|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" |
→ §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
3. Primary Operation Workflows
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables — do NOT hardcode:
- -
$SKILL_NAME = this file's YAML frontmatter INLINECODE26 - INLINECODE27 = this file's YAML frontmatter INLINECODE28
- INLINECODE29 = detect at runtime using the first match:
1. Env
SKILL_SOURCE if set
2. If this file's path contains a known platform directory (e.g.
~/.clawhub/ →
clawhub,
~/.cursor/skills/ →
cursor)
3. Fallback: INLINECODE35
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
CODEBLOCK2
3.0 Establishing a New Session
curl -s -X POST "https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
Before any operations can begin, a session must be created to obtain a valid session identifier. All subsequent requests within the same conversation depend on this session being initialized first.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
INLINECODE36
3.1 Delivering Messages Over SSE
curl -s -X POST "https://mega-api-prod.nemovideo.ai/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All conversational messages are transmitted to the backend through a persistent Server-Sent Events connection.
SSE Handling
| Event | Action |
|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result |
Process internally, don't forward |
|
heartbeat / empty
data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
Silent Response Fallback (CRITICAL)
Approximately 30% of edit operations return no text content in the response. When this occurs: (1) do not treat the empty response as a failure, (2) poll the task state endpoint to confirm processing is underway, (3) surface a status update to the user rather than waiting for text that will never arrive, and (4) proceed with the normal completion flow once the task reaches a finished state.
Two-stage generation: After the raw video is produced, the backend automatically initiates a second processing stage that overlays background music and renders the title sequence — no additional action is required to trigger this. Expect two distinct completion events: one when the raw cut is ready and a second when the fully composed version with audio and titles is finalized.
3.2 Handling File Uploads
File upload: INLINECODE39
URL upload: INLINECODE40
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
The upload endpoint accepts media assets that users wish to incorporate into their video projects.
3.3 Retrieving Credit Balance
curl -s "https://mega-api-prod.nemovideo.ai/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint to confirm the user has a sufficient balance before initiating any generation task.
3.4 Checking Current Task Status
curl -s "https://mega-api-prod.nemovideo.ai/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use
me for user in path; backend resolves from token.
Key fields:
data.state.draft,
data.state.video_infos,
data.state.canvas_config,
data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
Track summary format:
CODEBLOCK7
3.5 Exporting and Delivering the Final Video
Export does NOT cost credits. Only generation/editing consumes credits.
Exporting the finished video does not deduct any credits from the user's balance. The delivery sequence is as follows: (a) call the export endpoint with the completed task identifier, (b) await the export job confirmation, (c) retrieve the download or streaming URL from the response payload, (d) verify the URL is accessible before presenting it, and (e) deliver the final link to the user along with a brief summary of the completed project.
b) Submit: INLINECODE52
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE55
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: https://mega-api-prod.nemovideo.ai/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE60
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
3.6 Recovering from an SSE Disconnection
If the SSE stream drops unexpectedly, follow these five steps to restore continuity: (1) detect the disconnection event and log the last received task identifier, (2) wait a brief interval before attempting to reconnect to avoid hammering the server, (3) re-establish the SSE connection using the existing session identifier rather than creating a new session, (4) resume polling the task state endpoint for any tasks that were in progress at the time of disconnection, and (5) notify the user that a reconnection occurred and confirm whether their pending operation is still being processed.
4. Translating Backend GUI References for Users
The backend is built with the assumption that a graphical interface is present, so any GUI-specific instructions or interface element references must never be forwarded directly to the user.
| Backend says | You do |
|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" |
Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
5. Recommended Conversational Interaction Patterns
• Always confirm the user's intent and gather all required parameters before dispatching a generation request, rather than submitting with incomplete inputs.
• Provide incremental progress updates during long-running tasks so the user is never left wondering whether their request is still being processed.
• When a task completes, summarize what was produced and present the deliverable clearly before asking if further edits are needed.
• If the user requests a change, identify whether it constitutes a new generation or an edit to an existing task, and route accordingly.
• Proactively surface credit balance warnings when the remaining balance is low, prior to initiating any operation that would consume credits.
6. Known Constraints and Limitations
• Video generation tasks are asynchronous and may take several minutes to complete; synchronous responses should never be expected.
• The silent response behavior on edit operations is by design and must be handled gracefully rather than flagged as an error.
• Credit balances are read-only through the API and cannot be topped up or adjusted programmatically.
• File uploads are subject to size and format restrictions defined by the upload endpoint; exceeding these limits will result in a rejection response.
• Session identifiers are not permanent and may expire after a period of inactivity, requiring a new session to be created.
7. Error Response Handling
The table below maps each error code to its probable cause and the recommended recovery action.
| Code | Meaning | Action |
|---|
| 0 | Success | Continue |
| 1001 |
Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with
?bind=<id> (get
<id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
8. API Version and Required Token Scopes
Before making any calls, confirm that the integration is targeting the correct API version by checking the version field in the service manifest. All requests must be authenticated using a token that includes the required scopes for generation, upload, and export operations; tokens lacking any of these scopes will receive authorization errors regardless of whether the endpoint path and payload are otherwise valid.
0. 首次接触
当用户打开此技能或发送第一条消息时,立即问候他们:
🚀 欢迎!我可以为您进行文字转视频生成。分享一个视频文件或告诉我您的想法!
尝试说:
- - 帮我创建一个短视频
- 编辑我的视频
- 为这个片段添加特效
重要提示:首次接触时务必主动问候用户。让他们知道您正在连接的同时进行设置。首次接触时务必主动问候用户。
自动设置
当用户首次交互时,建立连接:
- 1. 检查令牌:如果设置了 NEMO_TOKEN 环境变量,则使用它。否则:
- 读取或生成客户端ID:
- 如果存在则读取 ~/.config/nemovideo/client_id
- 否则生成一个UUID,保存到 ~/.config/nemovideo/client_id
- 3. 获取匿名令牌:
bash
curl -s -X POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token -H X-Client-Id: $CLIENT_ID
将返回的 token 存储为此会话的 NEMO_TOKEN。您将获得100个免费积分。
- 4. 创建会话(§3.0)以便立即开始工作。
简要告知用户:正在设置…准备就绪!然后继续处理他们的请求。
从页面文字到分钟级视频
大多数视频创建工具需要时间线、关键帧和手动资源放置。ClawHub上的文字转视频生成器技能采用了一种根本不同的方法——您用自然语言描述您想要的内容,系统就会构建它。无论您是在起草产品说明、短视频片段还是逐场景叙述,该技能都会解读您的文本并构建符合您意图的视频输出。
该技能的核心是OpenClaw代理,它在您的自然语言输入和底层视频合成管道之间充当智能协调器。OpenClaw代理解析意图,解决提示中的歧义,并向渲染后端发出结构化指令——这意味着您花更少的时间重新格式化请求,而花更多的时间通过来回对话来完善您的创意构想。
对话式编辑模式意味着您不需要事先确定所有内容。您可以请求初稿,然后要求调整基调、替换视觉元素、收紧节奏或更改片尾——所有这些都可以通过后续消息完成。这种迭代循环使文字转视频生成器技能适用于实际生产工作流程,而不仅仅是一次性演示。最终输出支持mp4、mov、avi、webm和mkv格式,以适应任何平台或下游编辑需求。
环境变量
| 变量 | 必需 | 默认值 |
|---|
| NEMOTOKEN | 否 | 自动生成(100个免费积分,7天后过期,可通过设置→API令牌撤销) |
| NEMOAPI_URL |
否 | https://mega-api-prod.nemovideo.ai |
| NEMO
WEBURL | 否 | https://nemovideo.com |
| NEMO
CLIENTID | 否 | 自动生成的UUID,持久化到 ~/.config/nemovideo/client_id(仅UUID,无密钥) |
| SKILL_SOURCE | 否 | 从安装路径自动检测,回退为 unknown |
如果未设置 NEMO_TOKEN,则获取一个(需要 X-Client-Id 头):
bash
生成或读取持久化的客户端ID
CLIENT
ID=${NEMOCLIENT
ID:-$(cat ~/.config/nemovideo/clientid 2>/dev/null)}
if [ -z $CLIENT_ID ]; then
CLIENT_ID=$(uuidgen 2>/dev/null || echo client-$(date +%s)-$RANDOM)
mkdir -p ~/.config/nemovideo & echo $CLIENT
ID > ~/.config/nemovideo/clientid
fi
curl -s -X POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token -H X-Client-Id: $CLIENT_ID
→ {code:0,data:{token:nmvusrxxx,credits:100,...}}
将 token 保存为 NEMOTOKEN,CLIENTID 保存为 NEMOCLIENTID。匿名:每个客户端每7天1个令牌;令牌7天后过期,可随时通过nemovideo.com上的设置→API令牌撤销。如果您的令牌过期,请使用相同的 X-Client-Id 请求新令牌。
本地持久化: 此技能写入 ~/.config/nemovideo/client_id 以跨会话持久化客户端ID。这避免了在每次请求时生成新ID,否则会很快达到每IP速率限制(默认每个IP每7天10个令牌)。该文件仅包含一个UUID——本地不存储任何凭据。
2. 将传入请求路由到正确的端点
使用下表确定每个类型的传入请求应由哪个API端点处理。
| 用户说... | 操作 | 跳过SSE |
|---|
| export / 导出 / download / send me the video | → §3.5 导出 | ✅ |
| credits / 积分 / balance / 余额 |
→ §3.3 积分 | ✅ |
| status / 状态 / show tracks | → §3.4 状态 | ✅ |
| upload / 上传 / 用户发送文件 | → §3.2 上传 | ✅ |
| 其他所有内容(生成、编辑、添加背景音乐…) | → §3.1 SSE | ❌ |
3. 主要操作工作流程
$API = ${NEMOAPIURL:-https://mega-api-prod.nemovideo.ai},$TOKEN = ${NEMOTOKEN},$WEB = ${NEMOWEB_URL:-https://nemovideo.com}。
技能归属变量——请勿硬编码:
- - $SKILLNAME = 此文件的YAML前置元数据 name
- $SKILLVERSION = 此文件的YAML前置元数据 version
- $SKILL_SOURCE = 在运行时使用第一个匹配项检测:
1. 如果设置了环境变量 SKILL_SOURCE
2. 如果此文件的路径包含已知平台目录(例如 ~/.clawhub/ → clawhub,~/.cursor/skills/ → cursor)
3. 回退:unknown
关键:所有API请求(包括渲染/导出/上传/状态/积分)都必须包含这些头。缺少它们将导致导出失败并返回402。
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
3.0 建立新会话
bash
curl -s -X POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent \
-H Authorization: Bearer $TOKEN -H Content-Type: application/json \
-H X-Skill-Source: $SKILL
NAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILL_SOURCE \
-d {task_name:project,language:
}
→ {code:0,data:{taskid:...,sessionid:...}}
在任何操作开始之前,必须先创建会话以获取有效的会话标识符。同一对话中的所有后续请求都依赖于首先初始化此会话。
在浏览器中打开:创建会话后,给用户一个链接,用于在NemoVideo中查看/编辑任务:
$WEB/workspace/claim?task={taskid}&session={sessionid}&skillname=$SKILLNAME&skillversion=$SKILLVERSION&skillsource=$SKILLSOURCE
3.1 通过SSE传递消息
bash
curl -s -X POST https://mega-api-prod.nemovideo.ai/run_sse \
-H Authorization: Bearer $TOKEN -H Content-Type: application/json \
-H Accept: text/event-stream -H X-Skill-Source: $SKILLNAME -H X-Skill-Version: $SKILLVERSION -H X-Skill-Platform: $SKILL_SOURCE --max-time 900 \
-d {appname:nemoagent,userid:me,sessionid:,new_message:{parts:[{text:}]}}
所有对话消息都通过持久的服务器发送事件连接传输到后端。
SSE处理
| 事件 | 操作 |
|---|
| 文本响应 | 应用GUI翻译(§4),呈现给用户 |
| 工具调用/结果 |
内部处理,不转发 |
| heartbeat / 空