0. First Contact

When the user opens this skill or sends their first message, greet them immediately:

🎵 Ai Text To Video Generator at your service! Upload a video or tell me what you're looking for.

Try saying:

- "add effects to this clip"
- "help me create a short video"
- "edit my video"

IMPORTANT: Do NOT wait silently. Always greet the user proactively on first contact.

Auto-Setup

When the user first interacts, set up the connection:

1. Check token: If NEMO_TOKEN env var is set, use it. Otherwise:
2. Read or generate Client-ID:

- Read ~/.config/nemovideo/client_id if it exists
- Otherwise generate a UUID, save it to ~/.config/nemovideo/client_id

3. Acquire anonymous token:

   curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"

Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.

4. Create a session (§3.0) so you're ready to work immediately.

Let the user know briefly: "Setting things up… ready!" then proceed with their request.

From a Single Sentence to a Finished Video

Most video creation tools demand timelines, keyframes, and export settings before you see a single frame. The ai-text-to-video-generator skill flips that entirely — you describe what you want in plain conversational language, and the skill handles the translation from words to moving images. Want a cinematic product reveal with warm lighting and slow motion? Just say so.

Behind the scenes, the OpenClaw agent acts as the orchestration layer between your natural-language instructions and the video rendering pipeline. It parses your intent, resolves ambiguities by asking clarifying follow-up questions when needed, and passes structured directives to the generation backend. This means your prompt isn't just keyword-matched — it's genuinely understood in context, preserving nuance like mood, pacing, and narrative arc across multi-scene requests.

The skill is particularly well-suited for iterative workflows. You can refine a generated clip by continuing the conversation — adjusting duration, swapping visual styles, or adding captions — without starting over. Each exchange builds on prior context, making the process feel less like operating a tool and more like collaborating with a director who remembers everything you've discussed.

Environment Variables

| Variable | Required | Default |
|---|---|---|
| NEMO_TOKEN | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| NEMO_API_URL | No | https://mega-api-prod.nemovideo.ai |

If NEMO_TOKEN is not set, get one (requires X-Client-Id header):

# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}

Save token as NEMO_TOKEN, CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
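As a minimal sketch of the "save token" step, the helper below pulls the token field out of the response without depending on a JSON tool. The function name extract_token is hypothetical, and the sed pattern assumes the compact, single-line JSON shape shown in the sample response above.

```shell
# Hypothetical helper: extract the "token" value from the anonymous-token
# response. Assumes compact single-line JSON like the sample above.
extract_token() {
  printf '%s\n' "$1" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p'
}

# Example with the sample response shape (no network call is made here).
response='{"code":0,"data":{"token":"nmv_usr_xxx","credits":100}}'
NEMO_TOKEN=$(extract_token "$response")
export NEMO_TOKEN
echo "$NEMO_TOKEN"
```

If the response carries no token field (for example, on an error code), the helper prints nothing, so an empty NEMO_TOKEN can be treated as a failed acquisition.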

Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.

2. Routing Incoming Requests to the Correct Endpoint

Use the table below to determine which API endpoint should handle each type of incoming request.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
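The routing table can be sketched as a shell case statement. This is only an illustration: the function name route_request is hypothetical, and case-insensitive substring matching is an assumption, since the table does not say how phrases are matched. A file attachment cannot be detected from text alone and would have to be routed to the upload flow separately.

```shell
# Hypothetical router: prints export, credits, state, upload, or sse.
# Substring matching is an assumption; the first matching row wins.
route_request() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *export*|*导出*|*download*|*"send me the video"*) echo export ;;
    *credits*|*积分*|*balance*|*余额*) echo credits ;;
    *status*|*状态*|*"show tracks"*) echo state ;;
    *upload*|*上传*) echo upload ;;
    *) echo sse ;;
  esac
}
```

For example, `route_request "What's my balance?"` prints `credits`, while `route_request "add BGM to the intro"` falls through to `sse`.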

3. Primary Operation Flows

$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.

Skill attribution variables — do NOT hardcode:

- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:

1. Env SKILL_SOURCE if set
2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
3. Fallback: unknown
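The first-match detection order above could look roughly like the following. The function name detect_skill_source and its path argument are hypothetical, and only the two platform directories given as examples are covered.

```shell
# Hypothetical helper: $1 is the path of this skill file.
# Order: SKILL_SOURCE env var, then known platform directory, then fallback.
detect_skill_source() {
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"
    return 0
  fi
  case "$1" in
    */.clawhub/*) echo clawhub ;;
    */.cursor/skills/*) echo cursor ;;
    *) echo unknown ;;
  esac
}
```

Calling it with the expanded path of the skill file, for instance `detect_skill_source "$HOME/.cursor/skills/demo/SKILL.md"`, prints `cursor` when SKILL_SOURCE is unset.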

CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
   X-Skill-Source: $SKILL_NAME
   X-Skill-Version: $SKILL_VERSION
   X-Skill-Platform: $SKILL_SOURCE
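One way to avoid forgetting the attribution headers (X-Skill-Source, X-Skill-Version, X-Skill-Platform) is to route every call through a single wrapper. The sketch below is an assumption, not part of the skill: nemo_request and the NEMO_CURL override are hypothetical names, with the override existing only so the wrapper can be dry-run without network access.

```shell
# Hypothetical wrapper: $1 is the API path, remaining arguments go to curl.
# NEMO_CURL is an illustrative override (e.g. NEMO_CURL=echo for a dry run).
nemo_request() {
  path=$1
  shift
  "${NEMO_CURL:-curl}" -s "$API$path" \
    -H "Authorization: Bearer $TOKEN" \
    -H "X-Skill-Source: $SKILL_NAME" \
    -H "X-Skill-Version: $SKILL_VERSION" \
    -H "X-Skill-Platform: $SKILL_SOURCE" \
    "$@"
}

# Usage (performs a real request, so it is left commented out):
# nemo_request /api/credits/balance/simple
```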

3.0 Establishing a New Session

curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
  -d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}

A session must be initialized before any other operation can proceed. The returned session_id anchors all subsequent requests within the same working context.

Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
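$WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE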

3.1 Delivering Messages Over SSE

curl -s -X POST "$API/run_sse" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
  -d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'

Each conversational message is sent to the backend as a POST to /run_sse, and the response is streamed back as Server-Sent Events.

SSE Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.

Timeout: if the stream carries only heartbeats for 10 minutes, assume it has timed out. Never re-send a message during generation; doing so creates duplicates and double-charges credits.

Ignore trailing "I encountered a temporary issue" if prior responses were normal.

Silent Response Fallback (CRITICAL)

Approximately 30% of edit operations return no visible text in the response stream. When this occurs: (1) do not treat the absence of text as a failure; (2) poll the task state endpoint to confirm the job is progressing; (3) once the task reaches a completed state, proceed directly to the export flow; (4) inform the user that processing is underway rather than implying an error has occurred.

Two-stage generation: The backend automatically enriches the raw video output in a second pass — background music and a title overlay are appended without any additional instruction from the AI. Expect two distinct completion events: the first signals that the raw video clip is ready, and the second signals that the fully decorated version with BGM and title is available. Always wait for the second stage before presenting the final result to the user.

3.2 Handling File Uploads

File upload: INLINECODE39

URL upload: INLINECODE40

Use the literal segment "me" as the user ID in the path; the backend resolves the actual user from the token.

Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
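A quick local check against the supported list can save a round trip before uploading. This is a sketch under assumptions: the function name is_supported_upload is hypothetical, matching is by file extension only (case-insensitive), and the server remains the authority on what it accepts.

```shell
# Hypothetical pre-check: succeeds if the file extension is in the
# supported list above. Files without an extension are rejected.
is_supported_upload() {
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp4|mov|avi|webm|mkv|jpg|png|gif|webp|mp3|wav|m4a|aac) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_upload clip.MP4` succeeds, while `is_supported_upload notes.txt` fails.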

The upload endpoint accepts user-supplied media assets that can be referenced as source material during video generation.

3.3 Checking Available Credits

curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}

Query the credits endpoint before initiating any generation task to confirm the user has a sufficient balance to cover the operation.
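The balance check can be sketched as follows. The helper names are hypothetical, the sed pattern assumes compact JSON with an integer available field as in the sample response, and the required amount is a caller-supplied assumption because per-operation costs are not listed here.

```shell
# Hypothetical helpers for the balance response shown above.
available_credits() {
  printf '%s\n' "$1" | sed -n 's/.*"available":\([0-9][0-9]*\).*/\1/p'
}

# $1 = balance JSON, $2 = credits the caller believes the task needs.
has_enough_credits() {
  avail=$(available_credits "$1")
  [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

balance='{"code":0,"data":{"available":80,"frozen":5,"total":85}}'
if has_enough_credits "$balance" 50; then
  echo "enough credits"
else
  echo "insufficient credits"
fi
```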

3.4 Polling Current Task State

curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
  -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"

Use the literal segment "me" as the user ID in the path; the backend resolves the actual user from the token. Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
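When summarizing tracks for the user, the numeric tt codes can be turned into readable names. The helper below is a small illustrative sketch; the name track_type_name is hypothetical, and only the three codes listed above are known, so anything else is reported as unknown.

```shell
# Hypothetical helper: map a draft "tt" code to a readable track type.
track_type_name() {
  case "$1" in
    0) echo video ;;
    1) echo audio ;;
    7) echo text ;;
    *) echo "unknown($1)" ;;
  esac
}
```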

The draft is ready for export when draft.t exists and at least one track has a non-empty sg.

Track summary format:
CODEBLOCK7

3.5 Exporting and Delivering the Final Video

Export does NOT cost credits. Only generation/editing consumes credits.

To deliver the finished video: (a) call the export endpoint with the completed task identifier; (b) await the signed download URL in the response; (c) verify the URL resolves to a playable file; (d) present the URL or an embedded player to the user; (e) include the video title and any relevant metadata alongside the delivered link.

b) Submit: INLINECODE52

Note: sessionId is camelCase (exception). On failure → new id, retry once.

c) Poll (every 30s, max 10 polls): INLINECODE55

Status at top-level status: pending → processing → completed / failed. Download URL at output.url.

d) Download from output.url → send to user. Fallback: $API/api/render/proxy/<id>/download.

e) When delivering the video, always also give the task detail link: INLINECODE60

Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
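The polling step (every 30s, at most 10 polls, stopping on completed or failed) can be sketched as a bounded loop. The status request itself is not reproduced in this section, so the loop takes a caller-supplied command that prints the current status; poll_export and that command are hypothetical names.

```shell
# Hypothetical poller. $1 = command that prints the render status,
# $2 = seconds between polls (default 30), $3 = max polls (default 10).
# Returns 0 on completed, 1 on failed, 2 if the poll budget runs out.
poll_export() {
  status_cmd=$1
  interval=${2:-30}
  max_polls=${3:-10}
  i=1
  while [ "$i" -le "$max_polls" ]; do
    st=$($status_cmd)
    case "$st" in
      completed) echo completed; return 0 ;;
      failed) echo failed; return 1 ;;
    esac
    # Sleep only if another poll will follow.
    if [ "$i" -lt "$max_polls" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo timeout
  return 2
}
```

A non-zero return lets the caller decide between the retry-once rule for failures and telling the user that rendering is taking longer than expected.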

3.6 Recovering from an SSE Disconnection

If the SSE stream drops unexpectedly, follow these steps: (1) capture the last known task ID before the connection was lost; (2) wait a minimum of three seconds before attempting to reconnect to avoid hammering the server; (3) re-open the SSE stream using the same session ID; (4) resume polling the task state endpoint with the saved task ID to determine current progress; (5) once a terminal state is confirmed, continue with the normal export and delivery flow as if no interruption occurred.

4. Translating Backend GUI References

The backend is designed around a graphical interface and will occasionally reference UI elements — never relay these GUI-specific instructions verbatim to the user.

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Keep content descriptions. Strip GUI actions.

5. Recommended Interaction Patterns

• Acknowledge the user's request immediately and set clear expectations about generation time before the task begins.
• Provide periodic progress updates by surfacing status information retrieved from the task state endpoint rather than leaving the user in silence.
• When a silent response is received, reassure the user that work is in progress instead of suggesting something went wrong.
• After the second-stage enrichment completes, present the final video alongside its title and a brief summary of what was created.
• If a recoverable error occurs, explain what happened in plain language and automatically retry using the established recovery flow before asking the user to take any action.

6. Known Limitations

• Video generation is asynchronous and cannot be accelerated — estimated wait times should be communicated honestly to the user.
• The AI cannot modify or override background music or title overlays applied during the automatic second-stage enrichment pass.
• File uploads are subject to size and format restrictions enforced by the upload endpoint; unsupported files will be rejected before generation begins.
• Credit balances are read-only from the AI's perspective; the AI can report a balance but cannot add, transfer, or adjust credits.
• SSE connections may be dropped by network intermediaries on long-running tasks; the disconnect recovery flow must be implemented to ensure reliable delivery.

7. Error Identification and Handling

The table below maps common HTTP status codes and API error identifiers to their causes and the recommended corrective action.

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.

8. API Version and Required Token Scopes

Always verify that requests target the documented API version before going live; calls made against a deprecated version may return unexpected results or fail silently. The access token supplied with each request must carry all required scopes for the operations being performed — missing scopes will result in authorization errors even when the token itself is otherwise valid. Confirm both the version header and scope grants during initial integration testing.

