Getting Started

Welcome to vivideo — your AI assistant for understanding and working with video content. Drop in your footage and tell me what you need: a scene breakdown, a content summary, caption ideas, or editing direction. Let's get into it — what are you working on today?

Try saying:

- "Summarize the key moments in this interview footage and suggest where to cut for a 90-second highlight reel"
- "Describe what's happening in each scene of this product demo video and flag any moments that seem off-brand"
- "Draft subtitles and a short social media caption based on the dialogue in this mp4 clip"

First-Time Connection

When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").

Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.

1. Client identifier: Look for ~/.config/nemovideo/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response field data.token is your NEMO_TOKEN: 100 free credits, valid for 7 days.
3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
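The three setup steps can be sketched in Python with only the standard library. This is a minimal sketch, not the skill's own code: the helper names are hypothetical, only `get_token` may touch the network, and the attribution headers that the Backend Reference requires on every request are left out here for brevity.

```python
import json
import os
import uuid
import urllib.request
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Step 1: reuse the stored UUID, or generate one and save it."""
    if path.exists():
        return path.read_text().strip()
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def token_request(client_id: str) -> urllib.request.Request:
    """Step 2: anonymous-token request carrying the X-Client-Id header."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def session_request(token: str, language: str) -> urllib.request.Request:
    """Step 3: session-creation request; the response carries session_id."""
    body = json.dumps({"task_name": "project", "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def get_token() -> str:
    """Use NEMO_TOKEN if set; otherwise fetch an anonymous token (network call)."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    with urllib.request.urlopen(token_request(load_or_create_client_id())) as resp:
        return json.load(resp)["data"]["token"]
```

A caller would pass `session_request(get_token(), "en")` to `urllib.request.urlopen` and keep the session_id from the JSON response. Where that field sits inside the response is not specified here, so inspect the payload before hard-coding a path.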

Keep setup communication brief. Don't display raw API responses or token values to the user.

See Your Footage in a Whole New Way

Vivideo is designed for anyone who works with video and needs more than just a playback tool. Whether you're a solo content creator reviewing raw clips, a marketing team pulling insights from product demos, or an editor trying to structure a long-form piece, vivideo gives you an intelligent layer on top of your footage.

Upload your video and ask vivideo to break down what's happening scene by scene, identify key moments, draft descriptive summaries, or suggest how to structure your edit. It reads your content the way a sharp-eyed collaborator would — noticing pacing, subject matter, dialogue cues, and visual context — then translates that into actionable language you can actually use.

Vivideo supports mp4, mov, avi, webm, and mkv files, making it flexible enough to fit into almost any production pipeline. Think of it less as a filter or effect tool, and more as the thoughtful assistant who watches your footage so you don't have to watch it five times before making a decision.

Routing Your Edit Requests

Every prompt you send — whether you're trimming a timeline, generating captions, or asking for scene analysis — gets parsed by Vivideo's intent engine and dispatched to the appropriate NemoVideo processing endpoint automatically.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
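A keyword router that mirrors the table could look like the following. It is a hypothetical sketch: the actual intent engine is not documented here, and plain substring matching is a simplifying assumption that will misfire on phrases such as "export settings".

```python
# Hypothetical keyword router mirroring the routing table; first match wins.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_attachment: bool = False) -> tuple[str, bool]:
    """Return (route name, skip_sse) for a user message."""
    if has_attachment:
        # A file sent by the user always goes to the upload path.
        return "upload", True
    text = message.lower()
    for keywords, name in ROUTES:
        if any(keyword in text for keyword in keywords):
            return name, True
    # Everything else is an editing or generation request handled over SSE.
    return "sse", False
```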

NemoVideo Backend Reference

Vivideo runs on the NemoVideo API, which handles all heavy lifting: frame extraction, AI inference, and render queuing. Each call is stateful within your session, so context like your project settings and clip history carries through without you needing to repeat yourself.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: vivideo
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
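Building the header set once keeps every request consistent. The sketch below assumes the version string has already been read from the frontmatter and that X-Skill-Source is the skill name, `vivideo`; both function names are hypothetical.

```python
import os


def detect_platform(install_path: str) -> str:
    """Map the skill's install path to an X-Skill-Platform value."""
    path = os.path.expanduser(install_path)
    home = os.path.expanduser("~")
    if path.startswith(os.path.join(home, ".clawhub") + os.sep):
        return "clawhub"
    if path.startswith(os.path.join(home, ".cursor", "skills") + os.sep):
        return "cursor"
    return "unknown"


def attribution_headers(token: str, version: str, install_path: str) -> dict[str, str]:
    """Headers that every request must carry, per the reference above."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "vivideo",  # assumed: the skill's name
        "X-Skill-Version": version,   # read from the frontmatter beforehand
        "X-Skill-Platform": detect_platform(install_path),
    }
```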

API base: https://mega-api-prod.nemovideo.ai

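Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=$TOKEN&task=<task_id>&session=<session_id>&skill_name=vivideo&skill_version=1.0.0&skill_source=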

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
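The request body and the stream can both be handled with the standard library. The parsing follows the general event-stream format (`data:` lines, blank-line terminators, `:` comment lines); the shape of each event's JSON, with text under `content.parts[].text`, is an assumption about this backend, as are the helper names.

```python
import json
from typing import Iterable, Iterator


def sse_body(session_id: str, text: str) -> bytes:
    """JSON body for POST /run_sse, matching the reference above."""
    return json.dumps({
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }).encode("utf-8")


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield each event's data payload from decoded event-stream lines."""
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buf:  # a blank line dispatches the pending event
                yield "\n".join(buf)
                buf = []
        elif line.startswith("data:"):
            value = line[5:]
            buf.append(value[1:] if value.startswith(" ") else value)
        # ":" comment lines (heartbeats) and other fields are ignored
    # per the event-stream format, an unterminated trailing event is discarded


def event_texts(payloads: Iterable[str]) -> list[str]:
    """Collect text parts, assuming events like {"content": {"parts": [...]}}."""
    texts: list[str] = []
    for payload in payloads:
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip non-JSON payloads
        if not isinstance(event, dict):
            continue
        for part in (event.get("content") or {}).get("parts", []):
            if isinstance(part, dict) and "text" in part:
                texts.append(part["text"])
    return texts
```

With `urllib.request.urlopen`, the response yields bytes, so decode each line first, for example `iter_sse_data(line.decode("utf-8") for line in resp)`, and pass a generous `timeout` given the 15-minute maximum.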

Upload: POST /api/upload-video/nemo_agent/me/<sid>, either as a file (multipart: -F "files=@/path") or by URL: {"urls":["<url>"],"source_type":"url"}
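For the file variant, the multipart body that `curl -F "files=@/path"` produces can be assembled by hand. This is a minimal sketch with a hypothetical function name; `headers` is expected to hold the auth and attribution headers. It reads the whole file into memory, so for clips near the 500MB guideline a streaming HTTP client would be the better choice.

```python
import mimetypes
import uuid
import urllib.request
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"


def upload_file_request(session_id: str, path: Path,
                        headers: dict[str, str]) -> urllib.request.Request:
    """Build a multipart POST with the file under the form field "files"."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{path.name}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/upload-video/nemo_agent/me/{session_id}",
        data=head + path.read_bytes() + tail,
        method="POST",
        headers={**headers,
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```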

Credits: GET /api/credits/balance/simple returns available, frozen, and total.

Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
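The export flow is a submit followed by polling. In this sketch, `fetch_status` stands in for a GET on /api/render/proxy/lambda/<id> that returns parsed JSON; reading `status` and `output.url` from the top level of that JSON, using milliseconds for `<ts>`, and the `max_polls` cap are all assumptions.

```python
import time
from typing import Callable


def render_body(session_id: str, draft: dict) -> dict:
    """Body for POST /api/render/proxy/lambda (the <ts> unit is an assumption)."""
    return {
        "id": f"render_{int(time.time() * 1000)}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status: Callable[[], dict], interval: float = 30.0,
                max_polls: int = 60,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll every `interval` seconds until status is "completed"; return output.url."""
    for _ in range(max_polls):
        job = fetch_status()
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval)
    raise TimeoutError("render did not complete within max_polls checks")
```

With the defaults, polling gives up after 60 checks at 30-second intervals (about 30 minutes). The `draft` argument would presumably be data.state.draft from the session-state endpoint.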

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
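One way to implement this fallback is to snapshot the session state before sending the message and compare drafts afterwards. This is a hypothetical sketch; the returned strings are placeholders, and a real reply would describe what actually changed.

```python
from typing import Callable


def _draft(state: dict):
    """Pull data.state.draft out of a session-state response (None if absent)."""
    return state.get("data", {}).get("state", {}).get("draft")


def finish_turn(texts: list[str], state_before: dict,
                fetch_state: Callable[[], dict]) -> str:
    """Decide what to tell the user after an SSE turn.

    texts: text parts collected from the stream.
    state_before: session-state response captured before the message was sent.
    fetch_state: callable that fetches the latest session state again.
    """
    if texts:
        # The backend replied in words; GUI translation still applies.
        return "\n".join(texts)
    before, after = _draft(state_before), _draft(fetch_state())
    if after is None or after == before:
        return "No change detected in the draft yet."
    tracks = after.get("t", [])
    return f"Edit applied; the draft now has {len(tracks)} track(s)."
```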

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

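Example timeline summary:

Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)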
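Because the draft uses short keys, a readable summary can be derived mechanically from the field mapping. This is a sketch under assumptions: the `name` key inside `m` is hypothetical, and durations are summed per track because segment start times are not part of the documented mapping.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}


def summarize_draft(draft: dict) -> list[str]:
    """Turn a short-key draft (t, tt, sg, d, m) into readable timeline lines."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for i, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), f"Type {track.get('tt')}")
        segments = track.get("sg", [])
        total_ms = sum(seg.get("d", 0) for seg in segments)  # d is in milliseconds
        label = (track.get("m") or {}).get("name")  # hypothetical metadata key
        name = f": {label}" if label else ""
        lines.append(f"{i}. {kind}{name} ({len(segments)} segment(s), {total_ms / 1000:g}s)")
    return lines
```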

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Performance Notes

Vivideo works across mp4, mov, avi, webm, and mkv formats, but file size and duration can affect response depth and speed. For best results, keep individual clips under 500MB when possible, and trim footage to the relevant section before uploading if you're working with a longer raw file.

Highly compressed video files or those with very low bitrates may result in less precise visual analysis — vivideo reads what's actually in the file, so quality in generally means quality out. If you're exporting from an editing timeline specifically for vivideo analysis, a mid-range export preset (not ultra-compressed) will give you the most accurate results.

Vivideo processes one file per request, so if you have multiple clips to analyze, submit them in separate sessions. This keeps each analysis clean and ensures you get focused, file-specific output rather than blended responses across different pieces of footage.

Best Practices

To get the most out of vivideo, be specific about what you need from your footage. Asking 'summarize this video' will get you a general overview, but asking 'identify the three strongest talking points from this interview and note the timestamps' will get you something you can act on immediately.

For longer videos, consider breaking your request into focused questions rather than asking for everything at once. Vivideo handles complex requests well, but targeted prompts tend to produce tighter, more useful output — especially when you're working toward a specific deliverable like a social cut or a transcript.

If you're working with footage that has background noise, overlapping audio, or heavy visual motion, mention that context upfront. Vivideo can adjust its analysis framing when it knows what kind of environment or production style it's dealing with. The more context you give, the sharper the output.

