Getting Started
Welcome! Let's build a meaningful anniversary video from the photos and memories you already have. Tell me what you're working with — photos, clips, a song — and let's get started.
Try saying:
- "Make a 2-minute anniversary slideshow"
- "Add captions to anniversary photos"
- "Sequence clips for wedding anniversary video"
Automatic Setup
On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".
Token: If the NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).
Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
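The setup steps above can be sketched as two request builders plus a small connect helper. This is a minimal sketch using only the Python standard library; the helper names (`build_token_request`, `build_session_request`, `send`, `connect`) are hypothetical, and where `session_id` sits in the response is an assumption. The attribution headers required by the API reference are left out here for brevity.

```python
import json
import os
import uuid
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id: str) -> urllib.request.Request:
    """POST request for an anonymous token, identified by X-Client-Id."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def build_session_request(token: str) -> urllib.request.Request:
    """POST request that opens a session, authenticated with a Bearer token."""
    body = json.dumps({"task_name": "project"}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def connect() -> tuple:
    """Return (token, session_id), reusing NEMO_TOKEN when it is set."""
    token = os.environ.get("NEMO_TOKEN")
    if not token:
        reply = send(build_token_request(str(uuid.uuid4())))
        token = reply["data"]["token"]
    session = send(build_session_request(token))
    # Assumption: session_id may sit at the top level or under "data".
    session_id = session.get("session_id") or (session.get("data") or {}).get("session_id")
    if not session_id:
        raise RuntimeError("no session_id in the create-session response")
    return token, session_id
```

Keeping the builders separate from `send` makes it possible to inspect a request without touching the network, and nothing here prints the token or the raw JSON.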
Turn Scattered Memories Into One Beautiful Anniversary Video
Most people have hundreds of photos and clips scattered across phones and drives but no clean way to turn them into something worth watching. The anniversary-video-maker skill is built specifically for that problem — taking a loose collection of memories and producing a cohesive, emotionally resonant video you can actually share at a party, post online, or give as a gift.
You describe what you have — a set of photos, a date range, a song you love, maybe a few captions you want on screen — and the skill maps out a structure: an opening title card, a chronological or thematic sequence of images, timed transitions, and a closing message. It's not a generic slideshow template; the pacing and layout adapt to how many assets you have and what tone you're going for, whether that's sentimental, celebratory, or somewhere in between.
The output is a ready-to-render video plan or script you can take directly into a video editor or automated rendering tool. You get full control over duration, aspect ratio (vertical for Reels, widescreen for TV), and text style — without needing to know anything about video production.
Routing Your Video Requests
When you describe your anniversary video, whether it's a 25th (silver) anniversary or a first-year milestone, your request is parsed for key details like photo count, music mood, and dedication text, then routed to the appropriate rendering pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
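One way to read the routing table is as a keyword lookup with an SSE fallback. The sketch below is illustrative only: the `route` function, the action names, and the plain substring matching are assumptions, not the skill's actual parser.

```python
# Keyword groups from the routing table, checked in table order.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message: str, has_file: bool = False) -> tuple:
    """Return (action, skip_sse) for a user message."""
    # Assumption: an attached file always means an upload.
    if has_file:
        return "upload", True
    text = message.lower()
    for keywords, action in ROUTES:
        if any(keyword in text for keyword in keywords):
            return action, True
    # Everything else (generate, edit, add BGM) goes through SSE.
    return "sse", False
```

Substring matching is deliberately naive; a request such as "make a video about our download party" would be misrouted, so a real router would likely need intent detection rather than keywords alone.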
Cloud Rendering API Reference
Anniversary Video Maker processes your uploaded photos and memory captions through a cloud-based media rendering backend that handles transitions, soundtrack syncing, and title card generation at scale. Each render job is queued, processed asynchronously, and returned as a downloadable MP4 with your chosen aspect ratio and resolution.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: anniversary-video-maker
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
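The header requirements above could be assembled along these lines. This is a sketch under stated assumptions: `detect_platform` and `build_headers` are hypothetical names, the default source value is the skill name used in this document, and the path check matches the marker directories anywhere in the path rather than strictly under the home directory.

```python
from pathlib import PurePosixPath


def detect_platform(install_path: str) -> str:
    """Map the skill's install path to a platform name."""
    parts = PurePosixPath(install_path).parts
    if ".clawhub" in parts:
        return "clawhub"
    # ~/.cursor/skills/ requires "skills" directly under ".cursor".
    for parent, child in zip(parts, parts[1:]):
        if parent == ".cursor" and child == "skills":
            return "cursor"
    return "unknown"


def build_headers(
    token: str,
    version: str,
    install_path: str,
    source: str = "anniversary-video-maker",
) -> dict:
    """Return the four headers every request is expected to carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

The `version` argument is expected to come from the YAML frontmatter; parsing the frontmatter itself is out of scope for this sketch.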
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
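As an illustration of the message call, the request could be built like this with the standard library. The function name `build_message_request` and the `extra_headers` parameter are hypothetical; the body fields and the Accept header come from the description above.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_message_request(
    token: str,
    session_id: str,
    text: str,
    extra_headers: Optional[dict] = None,
) -> urllib.request.Request:
    """POST /run_sse with the user's message and an event-stream Accept header."""
    body = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
    }
    # Attribution headers (X-Skill-*) can be passed in by the caller.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        f"{API_BASE}/run_sse",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

When the request is opened, the caller would pass a timeout that respects the 15-minute ceiling, for example `urllib.request.urlopen(request, timeout=900)`.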
Upload: POST /api/upload-video/nemo_agent/me/<sid>. File: multipart -F "files=@/path"; URL: JSON body {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple, which returns available, frozen, total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
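The export-and-poll flow can be sketched with the HTTP call injected as a function, which keeps the loop easy to test. Several details are assumptions: that `<ts>` is a Unix timestamp in seconds, that `status` and `output.url` sit at the top level of the poll response, and the `max_polls` budget, which the source does not specify. The names `build_export_body` and `wait_for_render` are hypothetical.

```python
import time
from typing import Callable, Optional


def build_export_body(session_id: str, draft: dict, now: Optional[float] = None) -> dict:
    """Assemble the render request body; the id is render_<timestamp>."""
    # Assumption: <ts> is a Unix timestamp in whole seconds.
    timestamp = int(now if now is not None else time.time())
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(
    fetch_status: Callable[[], dict],
    interval_s: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is "completed" and return output.url."""
    for attempt in range(max_polls):
        reply = fetch_status()
        if reply.get("status") == "completed":
            return reply["output"]["url"]
        # Wait between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError("render did not complete within the polling budget")
```

In practice `fetch_status` would wrap a GET to /api/render/proxy/lambda/<id>. Failure statuses are not documented in the source, so this sketch only recognizes `completed` and otherwise runs out its budget.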
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
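The event handling above might look like the following when applied to the raw lines of a stream. The JSON event shape (`content.parts[].text`) is an assumption, since the source does not document the payload, and `classify` and `read_stream` are hypothetical names. The two-minute progress message is omitted to keep the sketch short.

```python
import json
from typing import Iterable


def classify(line: str) -> tuple:
    """Classify one SSE line as ("text", value), ("tool", "") or ("wait", "")."""
    if not line.startswith("data:"):
        return "wait", ""  # comments, heartbeats, blank separators
    payload = line[len("data:"):].strip()
    if not payload:
        return "wait", ""  # empty data: line
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return "wait", ""  # non-JSON keep-alive payloads
    if not isinstance(event, dict):
        return "wait", ""
    # Assumed shape: {"content": {"parts": [{"text": "..."}, ...]}}
    parts = (event.get("content") or {}).get("parts") or []
    text = "".join((p.get("text") or "") for p in parts if isinstance(p, dict))
    # Events without text are treated as tool calls or results.
    return ("text", text) if text else ("tool", "")


def read_stream(lines: Iterable[str]) -> tuple:
    """Return (reply_text, needs_state_poll) once the stream closes."""
    chunks = [value for kind, value in map(classify, lines) if kind == "text"]
    reply = "".join(chunks)
    # No text at all means the edit must be verified via session state.
    return reply, not reply
```

When `needs_state_poll` is true, the caller would fetch the latest session state and summarize the change from the draft instead of from the stream.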
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
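Example track summary:
Timeline (3 tracks): 1. Video: city timelapse (0-10s) 2. BGM: Lo-fi (0-10s, 35%) 3. Title: "Urban Dreams" (0-3s)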
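The draft field mapping above can be turned into a short track summary along these lines. The nesting is an assumption: the sketch supposes that `t` is a list of tracks, each with a `tt` type and an `sg` list whose items carry their own `d` in milliseconds. The function name `summarize_draft` is hypothetical.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: dict) -> list:
    """Describe each track using the short keys t, tt, sg and d."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # Assumption: each segment carries its own duration d in milliseconds.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s")
    return lines
```

A summary like this is what would be shown to the user in place of "preview in timeline", since there is no GUI to open.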
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
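The error table maps naturally onto a lookup from code to next step. In this sketch the action labels, the function names, and the fallback for unlisted codes are all hypothetical. The registration URL is not given in the source, so it is passed in as a parameter.

```python
from urllib.parse import urlencode

ACTIONS = {
    0: "continue",
    1001: "reauthenticate",
    1002: "create_session",
    2001: "show_credit_options",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def next_action(code: int) -> str:
    """Look up the next step for a response code; unknown codes get a fallback."""
    return ACTIONS.get(code, "report_unknown_error")


def no_credits_message(anonymous: bool, registration_url: str = "", bind_id: str = "") -> str:
    """Build the user-facing message for code 2001."""
    if anonymous:
        # The bind id comes from the create-session or state response.
        return f"Register to continue: {registration_url}?{urlencode({'bind': bind_id})}"
    return "Top up credits in your account"
```

Keeping 402 separate from 2001 matters because, as the table notes, a blocked export is a plan issue rather than a credit shortage, so the two should produce different messages.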
Quick Start Guide
Getting your first anniversary video off the ground takes less than five minutes if you come prepared. Start by telling the skill three things: how many photos or clips you have, the length you want the final video to be, and the tone — sentimental, upbeat, nostalgic, romantic. That's the minimum needed to generate a usable structure.
If you have a specific song in mind, mention it early — the skill will time the photo transitions to match the song's approximate BPM and natural breaks like chorus drops or quiet bridges. If you don't have a song yet, describe the mood and you'll get a genre and tempo recommendation you can search for.
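To make the beat-matching idea concrete, here is a rough sketch of how an even per-photo duration could be snapped to whole beats. It is not the skill's actual timing logic; the function name `beats_per_photo` and the example numbers (180 seconds, 32 photos, 120 BPM) are illustrative assumptions.

```python
def beats_per_photo(runtime_s: float, photo_count: int, bpm: float) -> tuple:
    """Return (beats, seconds) each photo stays on screen, snapped to whole beats."""
    if photo_count <= 0 or bpm <= 0:
        raise ValueError("photo_count and bpm must be positive")
    seconds_per_photo = runtime_s / photo_count   # even split of the runtime
    beat_s = 60.0 / bpm                           # length of one beat in seconds
    beats = max(1, round(seconds_per_photo / beat_s))
    return beats, beats * beat_s


# 180 s / 32 photos = 5.625 s each; at 120 BPM that is 11.25 beats,
# which rounds to 11 beats, or 5.5 s per photo.
beats, seconds = beats_per_photo(180, 32, 120)
```

Snapping shortens the slideshow slightly in this example (32 photos at 5.5 s is 176 s), which leaves a few seconds for a title card or closing message.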
For best results, group your photos loosely before you start — early memories, middle years, recent moments. You don't need them in perfect order; just knowing the rough categories helps the skill build a narrative arc rather than a random shuffle. Once you have the structure, you can take it into any video editor — CapCut, DaVinci Resolve, iMovie — or use it with an automated rendering tool.
Performance Notes
The quality of the output scales directly with the specificity of your input. Vague requests like 'make me an anniversary video' will produce a generic structure. Specific requests, such as '32 photos, 3-minute runtime, our wedding song is Flightless Bird, American Mouth by Iron & Wine, we want the first 30 seconds to be just the early dating years', produce a detailed, usable plan with precise timing per photo and labeled segments.
For videos with more than 60 photos, it helps to specify whether you want every photo included or a curated selection. The skill can either fit all assets into the timeline or recommend which ones to cut for pacing. Mixing portrait and landscape photos in the same video works fine — just flag it so the layout accounts for different aspect ratios without cropping faces.
If you're targeting a specific platform — Instagram Reels, YouTube, a TV slideshow at a party — mention it upfront. Vertical 9:16, square 1:1, and widescreen 16:9 each have different pacing norms and text safe zones, and the output will be tailored accordingly.