Getting Started
Welcome! Ready to let Gemini AI do the creative work? Upload your video and tell me what you want to generate — a highlight reel, scene description, caption track, or something entirely new. Let's get started.
Try saying:
- "Here's a 10-minute product demo video. Generate a 60-second highlight reel focusing on the key features shown"
- "Analyze this lecture recording and write a structured summary with timestamps for each major topic covered"
- "Watch this travel footage and generate a narration script I can record as a voiceover"
First-Time Connection
When a user first opens this skill, connect to the NemoVideo backend automatically. Briefly let them know (e.g. "Connecting to NemoVideo...").
Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 3.
1. Client identifier: Look for ~/.config/nemovideo/client_id. If missing, generate a UUID and save it there. This file stores only a UUID for rate-limit tracking.
2. Obtain a free token: POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to your identifier. The response data.token is your NEMO_TOKEN (100 free credits, valid 7 days).
3. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Authorization: Bearer <token>, Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Store the returned session_id for all subsequent requests.
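The connection steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, the requests are only built (not sent), and the response handling is left out because only `data.token` and `session_id` are documented here.

```python
import json
import uuid
import urllib.request
from pathlib import Path

API_BASE = "https://mega-api-prod.nemovideo.ai"


def load_or_create_client_id(path: Path) -> str:
    """Return the stored client UUID, creating the file if it is missing or empty."""
    if path.exists():
        stored = path.read_text().strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def anonymous_token_request(client_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks for a free anonymous token."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def session_request(token: str, language: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates a session."""
    body = json.dumps({"task_name": "project", "language": language}).encode()
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

A caller would pass `Path.home() / ".config/nemovideo/client_id"` to `load_or_create_client_id`, send each request with `urllib.request.urlopen`, and keep the token out of anything shown to the user.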
Keep setup communication brief. Don't display raw API responses or token values to the user.
Turn Your Words Into Video With Gemini AI
The Gemini AI Video Generator skill brings Google's multimodal Gemini model directly into your video workflow. Instead of manually cutting, scripting, or re-shooting, you simply upload your video and tell the skill what you want — in plain English. Gemini reads both the visual content and your instructions together, then generates output that actually understands what's on screen.
This skill is built for people who have video and a vision but don't want to spend hours in editing software. Whether you're a solo creator repurposing long-form content into short clips, a marketing team generating product highlight reels, or an educator turning lecture recordings into structured summaries, this skill handles the heavy lifting.
You're not limited to trimming or adding filters. Gemini AI Video Generator can describe what's happening in a scene, suggest narrative structure, generate spoken-word scripts based on visual cues, or produce entirely new content framed around your uploaded footage. It's a fundamentally different kind of video tool — one that listens before it creates.
Prompt Routing and Generation Flow
When you describe a scene, your natural-language prompt is parsed, enriched with cinematic parameters, and dispatched directly to the Gemini video synthesis pipeline for frame-by-frame generation.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
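One way to express the routing table in code is a keyword lookup. The sketch below assumes simple case-insensitive substring matching, which the table does not specify; the `route` function and its action labels are hypothetical names used only for illustration.

```python
# Keyword groups copied from the routing table; order matters only if a
# message happens to contain keywords from more than one row.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_file:
        # "user sends file" routes to upload regardless of the text.
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse", False
```

Substring matching is deliberately naive: a request such as "add music, then export" would be routed to export, so a real router may need to inspect intent more carefully.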
NemoVideo Backend Reference
ClawHub routes all video generation requests through the NemoVideo API, which handles Gemini model orchestration, render queuing, and secure video delivery. NemoVideo manages diffusion sampling, temporal coherence, and output encoding so your clips stay smooth and consistent across every generation.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: gemini-ai-video-generator
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
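The header requirements above might be assembled as in this sketch. It assumes the platform can be detected by looking for the `.clawhub/` or `.cursor/skills/` path segment in the install path; the function names and parameters are hypothetical, and the skill source and version are passed in rather than hard-coded.

```python
def detect_platform(install_path: str) -> str:
    """Map an install path to a platform label, defaulting to 'unknown'."""
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token: str, source: str, version: str, install_path: str) -> dict:
    """Return the headers that every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to send an export request without attribution, which is the case that fails with 402.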
API base: https://mega-api-prod.nemovideo.ai
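Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; returns task_id, session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<token>&task=<task_id>&session=<session_id>&skill_name=gemini-ai-video-generator&skill_version=1.0.0&skill_source=<platform>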
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart (-F "files=@/path"); for a URL, send body {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple, which returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest, with key fields data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
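The export flow could be sketched as a request body builder plus a polling loop. This is an illustration under assumptions: `status` and `output.url` are read from the top level of the poll response, `max_polls` is an arbitrary cap that the reference does not define, and `fetch_status` stands in for whatever function performs the GET.

```python
import time
from typing import Callable


def build_render_body(session_id: str, draft: dict, timestamp: int) -> dict:
    """Build the export request body described above."""
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 30,  # assumption: the reference gives no upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the render is completed and return the download URL."""
    for _ in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result["output"]["url"]
        sleep(interval_s)
    raise TimeoutError("render did not complete within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without waiting 30 seconds per poll.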
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
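A rough sketch of the event handling above: collect user-facing text from the stream and treat an empty result as the signal to poll session state. The event payload shape is an assumption (a JSON object with `content.parts[].text`, mirroring the request's `new_message.parts`); the actual backend format is not documented here, and `collect_text` is a hypothetical helper.

```python
import json
from typing import Iterable


def collect_text(lines: Iterable[str]) -> str:
    """Join the text parts found in SSE 'data:' lines.

    Returns an empty string when the stream carried no text, in which case
    the caller should poll session state to verify the edit.
    """
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # empty data: line, keep waiting
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # non-JSON payloads (e.g. a bare heartbeat) are skipped
        if not isinstance(event, dict):
            continue
        parts = (event.get("content") or {}).get("parts") or []
        for part in parts:
            # Tool calls and results carry no "text" key and are not forwarded.
            if isinstance(part, dict) and "text" in part:
                chunks.append(part["text"])
    return "".join(chunks)
```

If `collect_text` returns an empty string after the stream closes, fetch the latest session state and summarize what changed instead of reporting a failure.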
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
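Example track summary:
Timeline (3 tracks):
1. Video: City timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)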
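The field mapping can be turned into a small summarizer. The sketch assumes a draft shaped like `{"t": [{"tt": 0, "sg": [{"d": 10000}]}]}` and treats a track's length as the sum of its segment durations; both are assumptions for illustration, since the real draft schema is not shown here.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list[str]:
    """Return one human-readable line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "other")
        segments = track.get("sg", [])
        # d is in milliseconds; summing assumes segments do not overlap.
        total_s = sum(segment.get("d", 0) for segment in segments) / 1000
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_s:.1f}s")
    return lines
```

For example, a draft with one 10-second video segment and one 3-second text segment yields two lines, one per track.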
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
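The error table lends itself to a lookup that a client can consult before deciding what to do next. The action labels below are hypothetical names chosen for this sketch; the table does not say whether a code arrives in the response body or as an HTTP status, so that distinction is left to the caller.

```python
# Codes and meanings come from the table above; the action labels are
# illustrative names, not identifiers defined by the backend.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "show_credit_options",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_registration",
    429: "retry_once_after_30s",
}


def next_action(code: int) -> str:
    """Return the follow-up action for a code, or a fallback for unknown codes."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```

Keeping 402 separate from 2001 matters because the first is a subscription tier issue and the second is a credit balance issue, so they lead to different user messages.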
FAQ
What video formats does Gemini AI Video Generator support? You can upload mp4, mov, avi, webm, and mkv files. Mp4 is the most reliably processed format across all generation task types.
Can it generate entirely new video footage? The skill generates text-based outputs — scripts, descriptions, summaries, captions, and structured content — derived from your uploaded video. It does not render new video frames or visual animations.
Does it understand spoken audio in the video? Yes. Gemini processes both the visual content and any audible speech in your video, which means it can cross-reference what's being said with what's on screen for more accurate generation.
What if my video has no dialogue? No problem — Gemini AI Video Generator analyzes visual cues, movement, scene transitions, and on-screen text independently. Silent product demos, tutorials, and b-roll footage all work well with descriptive or script-generation prompts.
Tips and Tricks
Be specific in your prompt — Gemini AI Video Generator responds much better to 'generate a 3-sentence product description focusing on the unboxing moment at the start' than to 'summarize this video.' The more context you give about your intended audience or output format, the sharper the results.
If you're generating captions or subtitles, mention the tone you want (formal, casual, punchy) and any terminology specific to your industry. Gemini will adapt its language accordingly rather than defaulting to generic phrasing.
For content repurposing workflows, try uploading the same video with different prompts — one for a short-form social caption, one for a blog summary, one for an email teaser. You'll get distinct outputs tailored to each format without re-editing the source file.
When generating scripts or voiceovers, ask Gemini to match the pacing of the original footage. This produces scripts that actually fit the visual rhythm rather than running long or cutting short.
Performance Notes
Gemini AI Video Generator performs best on videos under 10 minutes in length, where the model can maintain full visual context throughout the clip. Longer videos may be processed in segments, which can occasionally affect continuity in generated outputs like scripts or summaries.
File format matters less than resolution and clarity — mp4 and webm files with clear audio tracks tend to produce the most accurate scene analysis and generation results. Heavily compressed or low-light footage may result in less precise visual descriptions.
Generation time scales with video length and task complexity. A simple scene description on a 2-minute clip returns quickly, while generating a full narration script for a 15-minute video will take noticeably longer. Plan accordingly if you're working with batch content.