Getting Started
Welcome! I'm ready to help you unlock what's inside your video using in-video-ai — upload your file and ask me anything about its content, key moments, or themes. What would you like to explore?
Try saying:
- "Summarize this recorded team meeting and list the action items that were mentioned"
- "What are the main topics covered in this lecture video, and roughly when does each one start?"
- "Identify the key arguments made in this interview clip and give me a 3-sentence overview"
On first use, the skill connects to NemoVideo automatically:
1. Check for the NEMO_TOKEN environment variable, or acquire a token via https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token (100 free credits).
2. Persist a Client-ID to ~/.config/nemovideo/client_id for rate-limit tracking (UUID only, no secrets).
3. Create a session and start working immediately.
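The first two bootstrap steps can be sketched as follows. This is a minimal illustration, not the skill's own code: `fetch_anonymous_token` is a hypothetical callable standing in for the HTTP call to the anonymous-token endpoint, whose request and response formats are not described here.

```python
import os
import uuid
from pathlib import Path

# Location named in the first-use steps; the file stores only a UUID, never a secret.
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"
ANON_TOKEN_URL = "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token"


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Return the persisted Client-ID, generating and saving a new UUID4 if none exists."""
    if path.exists():
        existing = path.read_text().strip()
        if existing:
            return existing
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def resolve_token(fetch_anonymous_token) -> str:
    """Prefer NEMO_TOKEN from the environment; otherwise request an anonymous token.

    `fetch_anonymous_token` is hypothetical: it receives the endpoint URL and
    must return the token string.
    """
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    return fetch_anonymous_token(ANON_TOKEN_URL)
```

Reusing the same Client-ID across runs is what makes rate-limit tracking work, which is why the helper reads the file before ever generating a new UUID.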
Let AI Watch Your Videos So You Don't Have To
Most people have more video content than they can realistically consume — recorded meetings, lecture recordings, interview footage, product demos, or raw clips waiting to be understood. The in-video-ai skill changes that by giving you a conversational interface to your video content. Drop in a file, ask a question, and get a meaningful answer without manually watching every second.
This skill goes beyond basic transcription. It understands context, identifies themes, tracks narrative structure, and can surface specific moments that match what you're looking for. Need a one-paragraph summary of a 45-minute webinar? Done. Want to know what topics were covered in a training video? Ask it directly.
Whether you're a journalist reviewing interview recordings, a student revisiting lecture videos, or a team lead catching up on missed calls, in-video-ai turns passive video archives into active, searchable knowledge. It's designed for people who work with real footage and need real answers — fast.
Environment
| Variable | Default | Purpose |
|---|---|---|
| `NEMO_TOKEN` | Auto-acquired on first use (100 free credits, 7-day expiry) | API authentication |
| `NEMO_API_URL` | https://mega-api-prod.nemovideo.ai | API base URL |
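A small loader makes the defaults explicit. This sketch assumes the two variables are named `NEMO_TOKEN` and `NEMO_API_URL`; a missing token is kept as `None` to signal that one should be acquired on first use.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_API_URL = "https://mega-api-prod.nemovideo.ai"


@dataclass(frozen=True)
class NemoConfig:
    token: Optional[str]  # None means "acquire an anonymous token on first use"
    api_url: str


def load_config(env: Optional[Mapping[str, str]] = None) -> NemoConfig:
    """Read the environment variables, falling back to the documented default base URL."""
    env = os.environ if env is None else env
    api_url = env.get("NEMO_API_URL") or DEFAULT_API_URL
    # Strip a trailing slash so paths like "/api/..." can be appended safely.
    return NemoConfig(token=env.get("NEMO_TOKEN") or None, api_url=api_url.rstrip("/"))
```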
Routing Your Video Analysis Requests
Every prompt you send — whether asking for a summary, extracting key moments, or querying specific dialogue — is routed to the active NemoVideo session tied to your authenticated context.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
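The routing table can be read as a first-match lookup. In this sketch the keyword lists come straight from the table, but the matching rule (case-insensitive substring, checked in table order) is an assumption, since the table does not say how phrases are matched.

```python
from typing import Tuple

# Rows in table order; the action names mirror the § references.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_attachment: bool = False) -> Tuple[str, bool]:
    """Return (action, skip_sse) for a user message."""
    if has_attachment:  # "user sends file" routes to upload
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(k in text for k in keywords):
            return action, True
    # Everything else goes through the SSE message endpoint.
    return "sse", False
```

Because the first matching row wins, a request such as "add BGM and export" lands on export; a production router would likely need stricter intent matching.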
NemoVideo API Reference Guide
ClawHub interfaces directly with the NemoVideo backend, which processes video content through frame-level indexing and transcript-aware AI to deliver timestamped insights, scene breakdowns, and semantic search across any uploaded video. All analysis calls are stateful, meaning your session retains video context between follow-up queries.
Required headers on all requests: X-Skill-Source: $SKILL_NAME, X-Skill-Version: $SKILL_VERSION, X-Skill-Platform: $SKILL_SOURCE
Create session: POST $API/api/tasks/me/with-session/nemo_agent. Returns task_id and session_id.
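The headers and the session-creation call can be sketched with the standard library's `urllib.request`; the code only builds the request object, so nothing is sent. Two assumptions: the third required header is `X-Skill-Platform` carrying `$SKILL_SOURCE`, and the token travels as an `Authorization: Bearer` header, which this reference does not state. Values such as `"clawhub"` in the example are hypothetical.

```python
import urllib.request


def skill_headers(skill_name: str, skill_version: str, skill_source: str, token: str) -> dict:
    """Required skill headers plus an assumed bearer-token header."""
    return {
        "X-Skill-Source": skill_name,
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": skill_source,
        # Assumption: authentication uses a standard Bearer credential.
        "Authorization": f"Bearer {token}",
    }


def create_session_request(api: str, headers: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that returns task_id and session_id."""
    return urllib.request.Request(
        f"{api}/api/tasks/me/with-session/nemo_agent",
        method="POST",
        headers=headers,
    )
```

Note that `urllib.request.Request` stores header names via `str.capitalize()`, so lookups on the request object use forms like `"X-skill-source"`.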
Send message (SSE): POST $API/run_sse with session_id and the user message, then stream the response. About 30% of edit requests return no text, so query state to confirm that the changes were applied.
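Because some edits stream back no text, a client needs a fallback. The sketch below applies generic Server-Sent Events framing (`data:` lines, a blank line ends an event) and calls a state query when nothing came back. Treating each payload as plain text is an assumption: real NemoVideo events may carry JSON, and `query_state` is a hypothetical wrapper around the state endpoint.

```python
from typing import Callable, Iterable


def collect_sse_text(lines: Iterable[str]) -> str:
    """Concatenate the data payloads of complete events in a text/event-stream."""
    events, data = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if data:
                events.append("\n".join(data))
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
        # event:, id:, retry: fields and ":" comment lines are ignored in this sketch.
    # An unterminated event at end of stream is discarded, as SSE requires.
    return "".join(events)


def reply_or_state(lines: Iterable[str], query_state: Callable[[], dict]) -> str:
    """Return the streamed text, or summarize the latest state when the stream was silent."""
    text = collect_sse_text(lines)
    if text.strip():
        return text
    state = query_state()  # hypothetical: GET .../api/state/nemo_agent/me/<sid>/latest
    return f"No text returned; latest state has keys: {sorted(state)}"
```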
Upload: POST $API/api/upload-video/nemo_agent/me/<sid> — file or URL upload. Supports: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
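Checking the extension locally before uploading avoids a wasted round trip. This sketch uses only the formats listed above; whether the server also accepts other spellings (for example `.jpeg`) is not documented, so they are rejected here.

```python
from pathlib import PurePath

# Formats listed for the upload endpoint.
SUPPORTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",  # video
    "jpg", "png", "gif", "webp",         # image
    "mp3", "wav", "m4a", "aac",          # audio
}


def upload_url(api: str, session_id: str, filename: str) -> str:
    """Return the session's upload endpoint after validating the file extension."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    return f"{api}/api/upload-video/nemo_agent/me/{session_id}"
```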
Check credits: GET $API/api/credits/balance/simple
Query state: GET $API/api/state/nemo_agent/me/<sid>/latest — check draft, tracks, generated media
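The two read-only calls differ only in their paths. This sketch assumes the credits balance is served at `GET /api/credits/balance/simple` and, like the other examples, only builds the requests; the shapes of the JSON responses are not described here.

```python
import urllib.request


def get_request(api: str, path: str, headers: dict) -> urllib.request.Request:
    """Build a GET request against the API base URL."""
    return urllib.request.Request(f"{api}{path}", method="GET", headers=headers)


def credits_request(api: str, headers: dict) -> urllib.request.Request:
    return get_request(api, "/api/credits/balance/simple", headers)


def state_request(api: str, session_id: str, headers: dict) -> urllib.request.Request:
    # The latest state describes the draft, tracks and generated media for the session.
    return get_request(api, f"/api/state/nemo_agent/me/{session_id}/latest", headers)
```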
Export: POST $API/api/render/proxy/lambda — export does NOT cost credits. Poll GET $API/api/render/proxy/lambda/<id> until status: completed.
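Polling until `status: completed` can be written as a bounded loop. In this sketch `fetch_status` is a hypothetical wrapper around `GET $API/api/render/proxy/lambda/<id>` that returns the decoded JSON body; the 2-second interval, the 10-minute timeout and the `"failed"` status are assumptions, not documented values. Injecting `sleep` keeps the loop testable.

```python
import time
from typing import Callable


def wait_for_export(render_id: str, fetch_status: Callable[[str], dict],
                    interval: float = 2.0, timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll a render job until it reports "completed", failing on error or timeout."""
    waited = 0.0
    while True:
        body = fetch_status(render_id)
        status = body.get("status")
        if status == "completed":
            return body
        if status == "failed":  # assumed failure status
            raise RuntimeError(f"render {render_id} failed: {body}")
        if waited >= timeout:
            raise TimeoutError(f"render {render_id} still {status!r} after {timeout}s")
        sleep(interval)
        waited += interval
```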
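Task link: $WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE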
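Building the task link with `urllib.parse.urlencode` escapes any characters that are unsafe in a query string. This sketch assumes the link has the form `<web>/workspace/claim?` followed by the query parameters `token`, `task`, `session`, `skill_name`, `skill_version` and `skill_source`; `$WEB` is not defined in this reference, so the base is passed in by the caller.

```python
from urllib.parse import urlencode


def task_link(web: str, token: str, task_id: str, session_id: str,
              skill_name: str, skill_version: str, skill_source: str) -> str:
    """Assemble the workspace claim link for a task and session."""
    query = urlencode({
        "token": token,
        "task": task_id,
        "session": session_id,
        "skill_name": skill_name,
        "skill_version": skill_version,
        "skill_source": skill_source,
    })
    return f"{web}/workspace/claim?{query}"
```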
Common Errors
If your token has expired, simply re-authenticate through the skill to restore your session and resume analysis. A 'session not found' error means your previous video context was cleared — start a new session and re-upload or re-link your video. If you're out of credits, head to nemovideo.ai to register or top up before continuing.
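The three recovery paths can be kept in one lookup table. The error labels below are hypothetical: the API's actual status codes and messages are not documented here, so a real client would first have to normalize responses into these labels.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    REAUTHENTICATE = "re-authenticate through the skill, then resume"
    NEW_SESSION = "start a new session and re-upload or re-link the video"
    TOP_UP = "register or top up at nemovideo.ai"


# Hypothetical, normalized error labels mapped to the documented recovery steps.
RECOVERY_BY_ERROR = {
    "token_expired": Recovery.REAUTHENTICATE,
    "session_not_found": Recovery.NEW_SESSION,
    "out_of_credits": Recovery.TOP_UP,
}


def recovery_for(error_label: str) -> Optional[Recovery]:
    """Return the recovery step for a known error label, or None if it is not a known case."""
    return RECOVERY_BY_ERROR.get(error_label)
```

Only the new-session path loses the video context, so it is the one case where the upload step must run again before the question is re-asked.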
Performance Notes
In-video-ai performs best with videos that have clear audio — strong speech intelligibility directly improves the accuracy of summaries, topic extraction, and Q&A responses. Videos with heavy background noise, multiple overlapping speakers, or very low audio levels may produce less precise results, though the skill will still attempt a best-effort analysis.
For longer videos (over 30 minutes), expect processing to take additional time before responses are returned. The skill handles mp4, mov, avi, webm, and mkv formats natively — no conversion needed before uploading.
For videos with on-screen text, slides, or visual demonstrations, describe specifically what you're looking for; depending on the file, the skill can cross-reference spoken content with visual cues. Focused questions, rather than open-ended ones, tend to yield sharper, more actionable answers.
Quick Start Guide
Getting started with in-video-ai takes less than a minute. Upload your video file directly into the chat — supported formats include mp4, mov, avi, webm, and mkv. Once your file is attached, type your first question or request in plain language. You don't need any special syntax or commands.
Good first prompts include: asking for a summary, requesting a list of topics or speakers, or asking about a specific part of the video (e.g., 'What was said about the budget in this recording?'). You can follow up with additional questions in the same conversation — the skill retains context across your session.
If you're unsure where to start, try: 'Give me a summary of this video and highlight the three most important points.' That single prompt covers the most common use case and gives you a strong foundation to dig deeper from there.