0. First Contact
When the user opens this skill or sends their first message, greet them immediately:
💬 Qwen AI is ready! Just send me a video or describe your project.
Try saying:
- "edit my video"
- "add effects to this clip"
- "help me create a short video"
IMPORTANT: Always greet the user proactively on first contact, and let them know you are setting things up while the connection is established.
Auto-Setup
When the user first interacts, set up the connection:
1. Check token: If the NEMO_TOKEN env var is set, use it. Otherwise:
2. Read or generate Client-Id:
   - Read ~/.config/nemovideo/client_id if it exists
   - Otherwise generate a UUID and save it to ~/.config/nemovideo/client_id
3. Acquire anonymous token:
   curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
   Store the returned token as NEMO_TOKEN for this session. You get 100 free credits.
4. Create a session (§3.0) so you're ready to work immediately.
Let the user know briefly: "Setting things up… ready!" then proceed with their request.
Ask Your Videos Anything With Qwen AI
The qwen-ai skill brings Alibaba's Qwen multimodal model into your video workflow, letting you interact with video content the way you'd chat with a knowledgeable colleague. Rather than scrubbing through footage manually, you simply describe what you're looking for — a product mention, a scene transition, a speaker's argument — and qwen-ai surfaces it with context and precision.
Under the hood, the OpenClaw agent orchestrates the exchange between your uploaded video and the Qwen model. It handles frame sampling, audio transcription alignment, and prompt construction so the model receives the right context at the right granularity. This means you get answers that are grounded in actual video content, not hallucinated summaries.
Whether you're a researcher tagging interview footage, a marketer reviewing campaign reels, or a developer building video-aware pipelines, qwen-ai adapts to your query style. You can ask broad questions like 'What is this video about?' or precise ones like 'At what point does the presenter mention pricing?' The skill handles mp4, mov, avi, webm, and mkv files, making it compatible with virtually any production or consumer video source.
Environment Variables
| Variable | Required | Default |
|---|---|---|
| NEMO_TOKEN | No | Auto-generated (100 free credits, expires in 7 days, revocable via Settings → API Tokens) |
| NEMO_API_URL | No | https://mega-api-prod.nemovideo.ai |
| NEMO_WEB_URL | No | https://nemovideo.com |
| NEMO_CLIENT_ID | No | Auto-generated UUID, persisted to ~/.config/nemovideo/client_id (UUID only, no secrets) |
| SKILL_SOURCE | No | Auto-detected from install path, fallback unknown |
If NEMO_TOKEN is not set, get one (requires X-Client-Id header):
# Resolve the API base URL (same default as $API in §3)
API="${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}"
# Generate or read persisted Client-Id
CLIENT_ID="${NEMO_CLIENT_ID:-$(cat ~/.config/nemovideo/client_id 2>/dev/null)}"
if [ -z "$CLIENT_ID" ]; then
  CLIENT_ID=$(uuidgen 2>/dev/null || echo "client-$(date +%s)-$RANDOM")
  # Create the config directory first, then persist the ID
  mkdir -p ~/.config/nemovideo && echo "$CLIENT_ID" > ~/.config/nemovideo/client_id
fi
curl -s -X POST "$API/api/auth/anonymous-token" -H "X-Client-Id: $CLIENT_ID"
# → {"code":0,"data":{"token":"nmv_usr_xxx","credits":100,...}}
Save token as NEMO_TOKEN, CLIENT_ID as NEMO_CLIENT_ID. Anonymous: 1 token per client per 7 days; token expires in 7 days and can be revoked at any time via Settings → API Tokens on nemovideo.com. If your token expires, request a new one with the same X-Client-Id.
Local persistence: This skill writes ~/.config/nemovideo/client_id to persist the Client-Id across sessions. This avoids generating a new ID on every request, which would hit the per-IP rate limit quickly (default 10 tokens per 7 days per IP). The file contains only a UUID — no credentials are stored locally.
2. Routing Logic for Incoming Requests
Use the following table to determine which endpoint each type of user request should be directed to.
| User says... | Action | Skip SSE |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
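The routing table can be sketched as a small shell function. This is only an illustration: the function name `route_request`, the returned labels, and the case-insensitive substring matching are assumptions, not something the backend or the skill defines. The "user sends file" case depends on an attachment rather than on text, so it is not covered here.

```shell
# Hypothetical helper: map a user message to the section that handles it.
# Keywords come from the routing table; substring matching is an assumption.
route_request() {
  local msg
  # Lowercase ASCII letters so "Export" and "export" match alike
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *export*|*导出*|*download*|*"send me the video"*) echo "3.5-export" ;;
    *credits*|*积分*|*balance*|*余额*)                 echo "3.3-credits" ;;
    *status*|*状态*|*"show tracks"*)                   echo "3.4-state" ;;
    *upload*|*上传*)                                   echo "3.2-upload" ;;
    *)                                                 echo "3.1-sse" ;;
  esac
}
```

Only the last branch goes through SSE; the first four are plain API calls, which matches the "Skip SSE" column.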
3. Primary Operational Flows
$API = ${NEMO_API_URL:-https://mega-api-prod.nemovideo.ai}, $TOKEN = ${NEMO_TOKEN}, $WEB = ${NEMO_WEB_URL:-https://nemovideo.com}.
Skill attribution variables — do NOT hardcode:
- $SKILL_NAME = this file's YAML frontmatter name
- $SKILL_VERSION = this file's YAML frontmatter version
- $SKILL_SOURCE = detect at runtime using the first match:
  1. Env SKILL_SOURCE if set
  2. If this file's path contains a known platform directory (e.g. ~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor)
  3. Fallback: unknown
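The first-match detection order for $SKILL_SOURCE could look roughly like the sketch below. The function name `detect_skill_source` and passing the skill file's path as an argument are assumptions made for illustration. Only the two platform directories named above are handled.

```shell
# Hypothetical helper: resolve the platform label for X-Skill-Platform.
# $1 is the path of this skill file (how it is obtained is left to the caller).
detect_skill_source() {
  local skill_path="$1"
  # 1. An explicit SKILL_SOURCE environment variable wins
  if [ -n "${SKILL_SOURCE:-}" ]; then
    echo "$SKILL_SOURCE"
    return 0
  fi
  # 2. Otherwise look for a known platform directory in the path
  case "$skill_path" in
    */.clawhub/*)       echo "clawhub" ;;
    */.cursor/skills/*) echo "cursor" ;;
    # 3. Fallback
    *)                  echo "unknown" ;;
  esac
}
```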
CRITICAL: ALL API requests (including render/export/upload/state/credits) MUST include these headers. Missing them will cause export to fail with 402.
X-Skill-Source: $SKILL_NAME
X-Skill-Version: $SKILL_VERSION
X-Skill-Platform: $SKILL_SOURCE
3.0 Initializing a New Session
curl -s -X POST "$API/api/tasks/me/with-session/nemo_agent" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" \
-d '{"task_name":"project","language":"<lang>"}'
# → {"code":0,"data":{"task_id":"...","session_id":"..."}}
Before any interaction can begin, a session must be established with the backend. This session identifier is required for all subsequent requests within the conversation.
Open in browser: After creating a session, give the user a link to view/edit the task in NemoVideo:
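$WEB/workspace/claim?task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE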
3.1 Delivering Messages Over SSE
curl -s -X POST "$API/run_sse" \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-H "Accept: text/event-stream" -H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE" --max-time 900 \
-d '{"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}}'
All chat messages are transmitted to the backend and responses are streamed back to the client using Server-Sent Events.
SSE Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Wait silently, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
Typical durations: text 5-15s, video generation 100-300s, editing 10-30s.
Timeout: 10 min heartbeats-only → assume timeout. Never re-send during generation (duplicates + double-charge).
Ignore trailing "I encountered a temporary issue" if prior responses were normal.
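A rough sketch of the keepalive handling and the 10-minute rule follows. The exact wire format of the backend's events is not documented here, so the matching rules are assumptions: a line counts as a keepalive if it is an empty data: line or contains the word heartbeat. The helper names `classify_sse_line` and `heartbeat_timed_out` are hypothetical.

```shell
# Hypothetical classifier for one raw line read from the SSE stream.
# Assumes heartbeat events contain the literal word "heartbeat".
classify_sse_line() {
  local line
  line=$(printf '%s' "$1" | tr -d '\r')   # drop a trailing CR, if any
  case "$line" in
    ""|:*)            echo "ignore" ;;     # blank separator or SSE comment line
    "data:"|"data: ") echo "keepalive" ;;  # empty data: payload
    *heartbeat*)      echo "keepalive" ;;
    data:*)           echo "payload" ;;
    *)                echo "ignore" ;;
  esac
}

# Succeeds once 10 minutes (600 s) have passed since the last real payload.
# Both arguments are Unix timestamps in seconds.
heartbeat_timed_out() {
  local last_payload_ts="$1"
  local now_ts="$2"
  [ $(( now_ts - last_payload_ts )) -ge 600 ]
}
```

A caller would update the last-payload timestamp whenever `classify_sse_line` prints payload, and treat the request as timed out when `heartbeat_timed_out` succeeds. One limitation of this matching: a text payload that happens to contain the word heartbeat would be misread as a keepalive.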
Silent Response Fallback (CRITICAL)
Approximately 30% of edit-type requests will result in the backend returning no visible text content. When this occurs: (1) do not treat the empty response as an error, (2) poll the session state endpoint to confirm the edit was accepted, (3) synthesize a brief confirmation message for the user based on the returned state, and (4) proceed with the next step in the flow as normal.
Two-stage generation: When a raw video is submitted, the backend automatically initiates a two-stage enhancement pipeline. In the first stage, the unprocessed video is analyzed and accepted. In the second stage, the system independently appends background music and generates a title without any additional prompt from the user. Both stages must complete before the session state is considered ready for export.
3.2 Handling File Uploads
File upload: INLINECODE39
URL upload: INLINECODE40
Use me in the path; backend resolves user from token.
Supported: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
The upload endpoint accepts video, image, and audio files in the formats listed above, and returns a reference identifier that must be included in subsequent message payloads.
3.3 Checking Available Credits
curl -s "$API/api/credits/balance/simple" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
# → {"code":0,"data":{"available":XXX,"frozen":XX,"total":XXX}}
Query the credits endpoint before initiating any generation task to confirm the user has a sufficient balance to proceed.
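A pre-flight balance check might be sketched as below. It reads the available field from the response shape shown above using sed, so it needs no JSON tool. The helper names and the required-credits argument are hypothetical: this document does not state how many credits a given operation costs.

```shell
# Hypothetical helper: print data.available from a credits response body.
# Returns non-zero if the field is missing (e.g. an error response).
credits_available() {
  local value
  value=$(printf '%s' "$1" | sed -n 's/.*"available":[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  [ -n "$value" ] || return 1
  echo "$value"
}

# Hypothetical helper: succeed if the balance covers the given amount.
# $1 = response body, $2 = credits needed (the caller supplies the figure).
has_enough_credits() {
  local available
  available=$(credits_available "$1") || return 1
  [ "$available" -ge "$2" ]
}
```

In practice the first argument would be the output of the curl call above, and a failed check would lead to the 2001 handling in the error table.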
3.4 Retrieving Current Session State
curl -s "$API/api/state/nemo_agent/me/<sid>/latest" -H "Authorization: Bearer $TOKEN" \
-H "X-Skill-Source: $SKILL_NAME" -H "X-Skill-Version: $SKILL_VERSION" -H "X-Skill-Platform: $SKILL_SOURCE"
Use me for user in path; backend resolves from token.
Key fields: data.state.draft, data.state.video_infos, data.state.canvas_config, data.state.generated_media.
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Draft ready for export when draft.t exists with at least one track with non-empty sg.
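When turning the abbreviated draft fields into something readable, two small helpers can decode the track type code and the millisecond duration. Both function names are hypothetical, and the output wording is illustrative only; it is not the skill's own track summary format.

```shell
# Hypothetical helper: decode the tt (track type) code from the draft.
track_type_name() {
  case "$1" in
    0) echo "video" ;;
    1) echo "audio" ;;
    7) echo "text" ;;
    *) echo "unknown($1)" ;;   # codes not listed in the mapping above
  esac
}

# Hypothetical helper: render a d (duration in ms) value as seconds.
format_duration_ms() {
  local ms="$1"
  printf '%d.%03ds\n' $(( ms / 1000 )) $(( ms % 1000 ))
}
```

For example, `track_type_name 7` prints text and `format_duration_ms 12500` prints 12.500s.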
Track summary format:
CODEBLOCK7
3.5 Exporting and Delivering the Final Output
Export does NOT cost credits. Only generation/editing consumes credits.
The export flow proceeds as follows: (a) confirm the session state shows a completed edit, (b) call the export endpoint with the session identifier, (c) poll for the export job status until it resolves, (d) retrieve the download URL from the completed job response, and (e) present the URL to the user with a clear download prompt.
b) Submit: INLINECODE52
Note: sessionId is camelCase (exception). On failure → new id, retry once.
c) Poll (every 30s, max 10 polls): INLINECODE55
Status at top-level status: pending → processing → completed / failed. Download URL at output.url.
d) Download from output.url → send to user. Fallback: $API/api/render/proxy/<id>/download.
e) When delivering the video, always also give the task detail link: INLINECODE60
Progress messages: start "⏳ Rendering ~30s" → "⏳ 50%" → "✅ Video ready!" + file + task detail link.
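The polling step (every 30s, at most 10 polls) can be sketched as a loop that stops on the first terminal status. To keep the sketch independent of the actual poll endpoint, the status lookup is injected: the first argument is the name of a caller-supplied command that prints the current top-level status. The function name `poll_export`, its argument order, and the timeout label are all assumptions.

```shell
# Hypothetical poller.
# $1 = command that prints the job status (pending/processing/completed/failed)
# $2 = seconds between polls (default 30), $3 = maximum polls (default 10)
poll_export() {
  local fetch_status="$1"
  local interval="${2:-30}"
  local max_polls="${3:-10}"
  local i=1
  local state
  while [ "$i" -le "$max_polls" ]; do
    state=$("$fetch_status") || state="error"
    case "$state" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    # Sleep between polls, but not after the final attempt
    if [ "$i" -lt "$max_polls" ]; then
      sleep "$interval"
    fi
    i=$(( i + 1 ))
  done
  echo "timeout"
  return 2
}
```

The retry-once rule for a failed render is left to the caller, which would submit again with a new id and call the poller a second time.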
3.6 Recovering from SSE Disconnections
If the SSE stream drops unexpectedly, follow these recovery steps: (1) detect the disconnection event and pause any UI updates immediately, (2) wait a minimum of two seconds before attempting to reconnect to avoid rapid retry loops, (3) re-establish the SSE connection using the existing session identifier rather than creating a new session, (4) request the current session state to reconcile any events that may have been missed during the outage, and (5) resume normal operation and notify the user only if the interruption lasted long enough to affect their experience.
4. Translating Backend GUI References
The backend is designed around a graphical interface and will occasionally reference UI elements in its responses — these references must never be passed through to the user as-is.
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Show state via §3.4 |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute §3.5 |
| "check account/billing" | Check §3.3 |
Keep content descriptions. Strip GUI actions.
5. Recommended Interaction Patterns
• Always confirm a session is active before forwarding any user message to the backend.
• When the user's intent involves editing, check the credit balance first and surface any shortfall before the request is submitted.
• After each backend response, validate the session state and use it to inform the next conversational turn rather than relying solely on the streamed text.
• Translate all GUI-specific language from backend responses into plain, action-oriented instructions before presenting them to the user.
• If a two-stage processing pipeline is detected, keep the user informed of progress between stages rather than waiting silently for final completion.
6. Known Constraints and Limitations
• The backend does not support real-time collaborative sessions; only one active session per user token is permitted at a time.
• File uploads are subject to size and format restrictions defined by the upload endpoint; exceeding these limits will return a validation error.
• Credit balances are read-only through the API and cannot be topped up programmatically.
• SSE streams may silently drop under poor network conditions; the disconnect recovery flow must always be implemented.
• Export URLs are time-limited and will expire after the period specified in the job response; users should be advised to download promptly.
7. Error Handling Reference
The table below maps the backend's response codes (the code field in the JSON body) and common HTTP status codes to their likely causes and the recommended recovery action for each.
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
Common: no video → generate first; render fail → retry new id; SSE timeout → §3.6; silent edit → §3.1 fallback.
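The error table lends itself to a simple dispatch function. The sketch below maps each code to a short action label; the function name `error_action` and the labels are hypothetical, and the caller decides what each label triggers.

```shell
# Hypothetical helper: map a backend or HTTP code to a recovery action label.
error_action() {
  case "$1" in
    0)    echo "continue" ;;
    1001) echo "reauth" ;;               # request a new anonymous token
    1002) echo "new-session" ;;          # create a session again (§3.0)
    2001) echo "no-credits" ;;
    4001) echo "show-formats" ;;
    4002) echo "suggest-compress" ;;
    400)  echo "generate-client-id" ;;
    402)  echo "register-to-export" ;;   # subscription tier, not credits
    429)  echo "retry-after-30s" ;;      # retry once only
    *)    echo "unhandled" ;;
  esac
}
```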
8. API Version and Token Scopes
Before going live, verify that the integration is targeting the correct API version by inspecting the version field in the root endpoint response. Token scopes must include read access for session state and credits, write access for messages and uploads, and export access for the delivery endpoint. Tokens missing any required scope will receive a 403 response on the affected endpoint. If a version mismatch is detected, halt further requests and surface an actionable error to the operator.