AI Video Remix Skill
This is an instruction-only skill — it provides guidance and reference documentation for the AI Video Remix CLI tool. The runtime source code lives in the GitHub repository and must be cloned separately (see Quick Start below).
Generate styled video compositions from a local ShotAI video library using natural language.
Important: Video Library Requirement
This skill can only search and use videos that have been imported into ShotAI. Videos simply stored on your hard drive are not searchable — they must be added to a ShotAI collection and fully indexed first.
Before using this skill, make sure you have:
1. Opened ShotAI and created a collection
2. Added your video files or folders to the collection
3. Waited for indexing to complete (shot detection + semantic analysis; progress is shown in ShotAI)
If the search returns no results or low-quality matches, the most common reason is that the relevant videos have not been imported into ShotAI yet.
Prerequisites
See references/setup.md for full installation instructions, including:
- ShotAI download and setup
- ffmpeg installation
- yt-dlp installation (for automatic background music download)
- Node.js dependencies
Quick Start
Note: This skill does not bundle runtime code. Clone the source repository first.
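```bash
git clone https://github.com/abu-ShotAI/ai-video-remix.git
cd ai-video-remix
npm install
cp .env.example .env   # fill in SHOTAI_URL, SHOTAI_TOKEN, and optionally AGENT_PROVIDER
npx tsx src/skill/cli.ts "Make me a travel remix"
```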
Pipeline (8 steps)
1. Agent `parseIntent`: the LLM extracts the theme, selects a composition, and optionally overrides the music style
2. Agent `refineQueries`: the LLM rewrites each slot's search terms to match the library content
3. ShotAI `pickShots`: semantic search per slot via the local ShotAI MCP server (localhost only); the best shot is selected
4. Music `resolveMusic`: uses a local MP3 via `--bgm` (recommended), or optionally downloads a track from YouTube via yt-dlp
5. ffmpeg `extractClip`: each shot is trimmed into an independent `.mp4` clip file (local processing only)
6. Agent `annotateClips`: the LLM assigns per-clip visual effect parameters (`tone`, `dramatic`, `kenBurns`, `caption`)
7. File server: a localhost-only HTTP server (127.0.0.1) serves the clips to the Remotion renderer on the same machine
8. Remotion `render`: the composition is rendered to the final MP4
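The eight steps can be read as a linear chain in which each stage consumes the previous stage's output. The sketch below is in TypeScript because the CLI lives in `src/skill/cli.ts`, and it is illustrative only. The step names come from the list above. The `Steps` interface, the data shapes, and `runPipeline` are hypothetical and are not the repository's API. The sketch also includes the heuristic fallback described under Error handling: when no LLM is usable, `refineQueries` and `annotateClips` are skipped.

```typescript
// Hypothetical data shapes; the real project's types may differ.
interface Slot { id: string; query: string }
interface Shot { slotId: string; videoPath: string; start: number; end: number; score: number }
interface Clip { slotId: string; file: string; effects?: Record<string, unknown> }

// Each step is injected so the orchestration can be read (and tested) on its own.
interface Steps {
  parseIntent(request: string): Promise<{ composition: string; slots: Slot[] }>;
  refineQueries(slots: Slot[], request: string): Promise<Slot[]>;
  pickShots(slots: Slot[]): Promise<Shot[]>;
  resolveMusic(bgm?: string): Promise<string>;
  extractClip(shot: Shot): Promise<Clip>;
  annotateClips(clips: Clip[]): Promise<Clip[]>;
  serveClips(clips: Clip[]): Promise<{ baseUrl: string; close(): Promise<void> }>;
  render(composition: string, clips: Clip[], music: string, baseUrl: string): Promise<string>;
}

async function runPipeline(
  steps: Steps,
  request: string,
  opts: { bgm?: string; llmAvailable: boolean },
): Promise<string> {
  const { composition, slots } = await steps.parseIntent(request);
  // Heuristic mode: use the composition's default queries without refinement.
  const queries = opts.llmAvailable ? await steps.refineQueries(slots, request) : slots;
  const shots = await steps.pickShots(queries);
  const music = await steps.resolveMusic(opts.bgm);
  const clips: Clip[] = [];
  for (const shot of shots) clips.push(await steps.extractClip(shot));
  // Heuristic mode: skip annotation so clips keep the default effect params.
  const annotated = opts.llmAvailable ? await steps.annotateClips(clips) : clips;
  const server = await steps.serveClips(annotated);
  try {
    return await steps.render(composition, annotated, music, server.baseUrl);
  } finally {
    await server.close(); // shut the file server down whether or not rendering succeeds
  }
}
```

Injecting the steps keeps the control flow testable with stubs. The actual project may wire the stages together differently.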
CLI Usage
After cloning the repository and running npm install:
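```bash
npx tsx src/skill/cli.ts <request> [options]

Options:
  --composition     Override the composition (skips LLM selection)
  --bgm <path>      Path to a local MP3 (skips YouTube search)
  --output <dir>    Output directory (default: ./output)
  --lang            Output language: zh = Chinese (default) / en = English
                    Affects the video title, per-clip captions and location labels, and the credits line
  --probe           Scan the library first and let the LLM plan slots from the actual content
```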
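As an illustration of how these flags combine, here is a minimal TypeScript sketch of an argument parser for them. This is not the repository's `cli.ts`. `CliOptions` and `parseCliArgs` are hypothetical names. The sketch assumes that every non-flag word belongs to the natural-language request and that `--lang` accepts only `zh` or `en`.

```typescript
// Hypothetical option shape mirroring the documented flags.
interface CliOptions {
  request: string;
  composition?: string;
  bgm?: string;
  output: string;
  lang: "zh" | "en";
  probe: boolean;
}

function parseCliArgs(argv: string[]): CliOptions {
  const opts: CliOptions = { request: "", output: "./output", lang: "zh", probe: false };
  const requestWords: string[] = [];
  let i = 0;

  // Consumes the argument after a value-taking flag such as `--bgm <path>`.
  const valueFor = (flag: string): string => {
    i += 1;
    const value = argv[i];
    if (value === undefined || value.startsWith("--")) {
      throw new Error(`${flag} expects a value`);
    }
    return value;
  };

  for (; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "--composition":
        opts.composition = valueFor(arg);
        break;
      case "--bgm":
        opts.bgm = valueFor(arg);
        break;
      case "--output":
        opts.output = valueFor(arg);
        break;
      case "--lang": {
        const lang = valueFor(arg);
        if (lang !== "zh" && lang !== "en") throw new Error(`--lang must be zh or en, got "${lang}"`);
        opts.lang = lang === "en" ? "en" : "zh";
        break;
      }
      case "--probe":
        opts.probe = true;
        break;
      default:
        requestWords.push(arg); // anything else is part of the natural-language request
    }
  }

  opts.request = requestWords.join(" ");
  if (opts.request === "") throw new Error("missing <request>");
  return opts;
}
```

Usage: `parseCliArgs(process.argv.slice(2))` in a Node entry point. Quoting the request as a single shell argument works just as well, because the words are joined back together.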
Compositions
| ID | Label | Best For |
|---|---|---|
| `CyberpunkCity` | 赛博朋克夜景 (Cyberpunk Night Scene) | Neon city, night scenes, sci-fi |
| `TravelVlog` | 旅行 Vlog (Travel Vlog) | Multi-city travel with location cards |
| `MoodDriven` | 情绪驱动混剪 (Mood-Driven Remix) | Fast/slow emotion cuts |
| `NatureWild` | 自然野生动物 (Nature & Wildlife) | BBC nature documentary style |
| `SwitzerlandScenic` | 瑞士风光 (Swiss Scenery) | Alpine/scenic travel with captions |
| `SportsHighlight` | 体育集锦 (Sports Highlights) | ESPN-style with goal captions |
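When `--composition` is given, the value has to name one of the IDs above. The TypeScript sketch below shows one way to validate it. The `COMPOSITIONS` object and `resolveComposition` are hypothetical, not the project's registry, and the sketch assumes IDs are matched as exact, case-sensitive strings.

```typescript
// Hypothetical registry; IDs and "best for" notes are taken from the table above.
const COMPOSITIONS = {
  CyberpunkCity: "Neon city, night scenes, sci-fi",
  TravelVlog: "Multi-city travel with location cards",
  MoodDriven: "Fast/slow emotion cuts",
  NatureWild: "BBC nature documentary style",
  SwitzerlandScenic: "Alpine/scenic travel with captions",
  SportsHighlight: "ESPN-style with goal captions",
} as const;

type CompositionId = keyof typeof COMPOSITIONS;

// Type guard: true only for an exact registry ID.
function isCompositionId(value: string): value is CompositionId {
  return Object.prototype.hasOwnProperty.call(COMPOSITIONS, value);
}

// A user override wins over the LLM's choice, but only if it is a known ID.
function resolveComposition(override: string | undefined, llmChoice: CompositionId): CompositionId {
  if (override === undefined) return llmChoice;
  if (!isCompositionId(override)) {
    throw new Error(`Unknown composition "${override}". Valid IDs: ${Object.keys(COMPOSITIONS).join(", ")}`);
  }
  return override;
}
```

Failing fast on an unknown ID avoids running search and extraction for a composition that cannot be rendered.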
Modes
Standard mode (default): the LLM picks a composition and generates search queries from the registry templates.
Probe mode (`--probe`): scans the library videos first (names, shot samples, mood/scene tags), then the LLM generates custom slots tailored to what actually exists.
Choose probe mode when the library content is unknown, the user wants the "best of my library", or standard slots return low-quality shots.
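The three criteria above can be written as a small decision function. The TypeScript sketch below is a rough heuristic, not the project's logic. `ModeSignals` and `shouldUseProbeMode` are hypothetical. The phrase match is deliberately crude, and reusing 0.5 (the recommended `MIN_SCORE`) as the "low-quality" threshold is an assumption.

```typescript
// Hypothetical inputs for the mode decision.
interface ModeSignals {
  libraryContentKnown: boolean;
  request: string;
  standardSlotScores?: number[]; // best score per slot from a previous standard-mode run
}

function shouldUseProbeMode(signals: ModeSignals, minScore = 0.5): boolean {
  // Criterion 1: we don't know what the library contains.
  if (!signals.libraryContentKnown) return true;
  // Criterion 2: the user asks for the best of whatever exists.
  if (/best of my library/i.test(signals.request)) return true;
  // Criterion 3: any standard slot came back below the quality threshold.
  return (signals.standardSlotScores ?? []).some((score) => score < minScore);
}
```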
Environment Variables
See references/config.md for all environment variables and LLM provider setup.
Troubleshooting & Quality Tuning
See references/tuning.md for solutions to:
- Clip boundary flicker / 1–2 frame flash at cuts
- Red flash artifact in CyberpunkCity (GlitchFlicker on short clips)
- Low-quality or off-topic shots
- Music download failures
Recommended .env defaults for best quality:
```env
MIN_SCORE=0.5   # filter out short/low-quality shots
```
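To see what a threshold like `MIN_SCORE` does, here is a TypeScript sketch of the filtering step. It assumes `MIN_SCORE` is compared against each candidate's similarity score, with 0.5 as the fallback. The `Candidate` shape and both functions are hypothetical. How the project actually scores or penalizes short shots is not specified here.

```typescript
// Hypothetical search-result shape.
interface Candidate { videoPath: string; start: number; end: number; score: number }

const DEFAULT_MIN_SCORE = 0.5;

// Reads MIN_SCORE from an env-like map; a missing or non-numeric value falls back to 0.5.
function readMinScore(env: Record<string, string | undefined>): number {
  const parsed = Number.parseFloat(env.MIN_SCORE ?? "");
  return Number.isFinite(parsed) ? parsed : DEFAULT_MIN_SCORE;
}

// Keeps candidates at or above the threshold, best score first.
function filterCandidates(candidates: Candidate[], minScore: number): Candidate[] {
  return candidates
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```

Raising the threshold trades recall for precision. If a slot ends up with no candidates, that is a hint to refine its query or try `--probe`.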
Writing ShotAI Search Queries
ShotAI uses semantic search powered by AI-generated tags and embedding vectors. Query quality is the single biggest factor in shot relevance — invest time here.
Query construction rules
Always write full sentences or rich phrases, never bare keywords.
The search engine understands semantic similarity ("ocean" matches "sea", "waves", "shoreline"), so richer context produces better recall.
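The "never bare keywords" rule can be enforced with a cheap pre-flight check before a query reaches ShotAI. The TypeScript sketch below uses word count as a rough stand-in for the quality tiers. `classifyQuery` is hypothetical, and its cut-offs (1, 4, and 12 words) are arbitrary assumptions rather than anything ShotAI defines.

```typescript
type QueryTier = "detailed" | "sentence" | "phrase" | "keyword";

// Word-count heuristic; a real check might also look for subject/action/setting words.
function classifyQuery(query: string): QueryTier {
  const words = query.trim().split(/\s+/).filter(Boolean);
  if (words.length <= 1) return "keyword";
  if (words.length <= 4) return "phrase";
  if (words.length <= 12) return "sentence";
  return "detailed";
}
```

An agent could refuse or rewrite any query classified as `"keyword"` before running the search.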
| Quality | Example | When to use |
|---|---|---|
| ⭐ Detailed description | "A white seagull with wings spread wide gliding smoothly over a calm blue ocean, golden sunset reflecting on the waves" | Best precision; use for hero shots |
| ⭐ Full sentence | "A seagull flying gracefully over the ocean at sunset" | Good balance of precision and recall |
| Short phrase | "seagull flying over ocean" | Acceptable fallback |
| Single keyword | "seagull" | Avoid: low precision, noisy results |
What to include in a query
Describe the visual content of the ideal shot across these dimensions:
- Subject: what or who is in frame (a lone hiker, city traffic at night, athlete celebrating)
- Action: what is happening (walking slowly through fog, speeding through an intersection, jumping with arms raised)
- Environment: location, setting, time of day (rain-soaked Tokyo street, mountain meadow at golden hour, empty stadium under floodlights)
- Mood / atmosphere: emotional tone (melancholic, tense, euphoric, serene)
- Camera feel: implied movement or framing (wide establishing shot, tight close-up, slow pan, handheld shaky)
Not all dimensions are needed every time — include whichever are most distinctive for the shot you want.
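The dimensions can be treated as optional fields that are joined into one rich phrase. The TypeScript sketch below does exactly that. `ShotDescription` and `buildQuery` are hypothetical helpers, and comma-joining is just one simple way to combine the parts.

```typescript
// Hypothetical helper; the field names mirror the five dimensions listed above.
interface ShotDescription {
  subject: string;
  action?: string;
  environment?: string;
  mood?: string;
  camera?: string;
}

// Joins whichever dimensions are present, in a fixed order, skipping blanks.
function buildQuery(d: ShotDescription): string {
  return [d.subject, d.action, d.environment, d.mood, d.camera]
    .map((part) => part?.trim() ?? "")
    .filter((part) => part.length > 0)
    .join(", ");
}
```

For example, a subject, an action, an environment, a mood, and a camera feel combine into "a lone hiker, walking slowly through fog, mountain meadow at golden hour, serene, wide establishing shot".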
The refineQueries step
When the agent runs refineQueries, it rewrites the composition's default slot queries to better match the user's actual library. Apply these principles:
1. Start from the slot's semantic intent: what emotional or narrative role does this shot play in the composition?
2. Incorporate any context from the user's request: location names, event names, specific subjects mentioned.
3. Expand synonyms: if the slot says "water", try "river flowing through forest" or "lake reflecting mountains", depending on what the library likely contains.
4. Avoid negations: "not indoors" does not work; describe the positive version instead ("outdoor daylight scene").
5. One query per slot: make it specific rather than trying to cover multiple scenarios.
Examples: slot query → refined query
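```
Slot default:  "city at night"
User request:  "Make me a Tokyo travel remix"
Refined:       "Neon-lit Tokyo street at night, pedestrians walking beneath glowing signs, rain reflecting on the pavement"

Slot default:  "natural scenery"
User request:  "My Patagonia trip last month"
Refined:       "Dramatic Patagonian mountain landscape, snow-capped peaks under storm clouds, vast wilderness"

Slot default:  "athlete in action"
User request:  "Basketball highlights from the last game"
Refined:       "Basketball player driving to the hoop, explosive movement, blurred crowd in the background"
```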
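Before refined queries are sent to ShotAI, they can be linted against the negation and one-query-per-slot principles above. The TypeScript sketch below is a keyword heuristic under stated assumptions. The negation word list and the idea of treating a standalone "or" as a sign of multiple scenarios are illustrative choices, and `lintRefinedQueries` is a hypothetical helper.

```typescript
// Illustrative word lists; real queries may need a smarter check.
const NEGATION = /\b(not|no|without|never|except)\b/i;
const ALTERNATIVES = /\bor\b/i;

interface QueryIssue { slotId: string; reason: string }

// Flags refined queries that use negations or try to cover several scenarios at once.
function lintRefinedQueries(queries: Record<string, string>): QueryIssue[] {
  const issues: QueryIssue[] = [];
  for (const slotId of Object.keys(queries)) {
    const query = queries[slotId];
    if (NEGATION.test(query)) {
      issues.push({ slotId, reason: "negation: describe the positive version instead" });
    }
    if (ALTERNATIVES.test(query)) {
      issues.push({ slotId, reason: "alternatives: keep one specific scenario per slot" });
    }
  }
  return issues;
}
```

Word boundaries keep words like "outdoor" or "snow" from triggering false matches. The agent can send flagged slots back through `refineQueries` for another rewrite.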
Adding a New Composition
See references/composition-guide.md to add a new Remotion composition to the registry.
Safety and Fallback
Network & credential scope
- Credentials are sent only to the service that needs them. `SHOTAI_TOKEN` is sent only to the local ShotAI MCP server (127.0.0.1). LLM API keys (if configured) are sent only to their respective provider endpoints, never to ShotAI, YouTube, or any other service.
- The clip file server binds to 127.0.0.1 only (default port 8080). It is not accessible from other machines on the network. It serves temporary clip files to the Remotion renderer running on the same machine and shuts down after rendering completes.
- yt-dlp is optional. Use `--bgm /path/to/local.mp3` to skip all YouTube network access. When yt-dlp is used, it downloads only a single background music track; no other data is sent to YouTube.
- LLM access is optional. Set `AGENT_PROVIDER=none` to run in heuristic mode with zero external network calls (aside from the local ShotAI MCP server).
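One way to hold the "localhost only" promise is to refuse to send `SHOTAI_TOKEN` unless the configured URL resolves to a loopback host. The TypeScript sketch below is a hypothetical guard, not part of the project. It relies on the WHATWG `URL` parser and treats `localhost`, `127.x.x.x`, and `[::1]` as loopback.

```typescript
// True only for URLs whose host is this machine's loopback interface.
function isLoopbackUrl(raw: string): boolean {
  let host: string;
  try {
    host = new URL(raw).hostname; // IPv6 hosts keep their brackets, e.g. "[::1]"
  } catch {
    return false; // not a parseable absolute URL
  }
  return host === "localhost" || host === "[::1]" || /^127(\.\d{1,3}){3}$/.test(host);
}
```

The anchored pattern rejects look-alikes such as `127.0.0.1.example.com`. On the server side, Node's `server.listen(8080, "127.0.0.1")` is the usual way to bind only to loopback.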
Error handling
- If `SHOTAI_URL` or `SHOTAI_TOKEN` is unset, display a warning: "ShotAI MCP server is not configured. Set SHOTAI_URL and SHOTAI_TOKEN in your .env file. Download ShotAI at https://www.shotai.io."
- If the ShotAI MCP server returns an error (connection refused, HTTP 4xx/5xx), display the error message and stop. Do not fabricate shot results.
- Never fabricate video file paths, shot timestamps, or similarity scores.
- If the music download fails (yt-dlp error or network unreachable), suggest using `--bgm <local.mp3>` to provide a local audio file instead.
- If the Remotion render fails, display the error output and suggest checking the Node.js version (18+) and that all clip files were extracted successfully.
- If the LLM provider is unreachable, fall back to heuristic mode: use the composition's default queries directly without refinement, and skip `annotateClips` (use the composition's default effect params).
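The configuration check and the heuristic fallback rule can be expressed as two small TypeScript functions. The warning text is copied from the first rule above. Everything else is a hypothetical sketch: the function names, treating a blank value as unset, and treating an unset `AGENT_PROVIDER` as "use the LLM if reachable" are all assumptions.

```typescript
const SHOTAI_NOT_CONFIGURED =
  "ShotAI MCP server is not configured. Set SHOTAI_URL and SHOTAI_TOKEN in your .env file. " +
  "Download ShotAI at https://www.shotai.io.";

type Env = Record<string, string | undefined>;

// Returns the documented warning when either ShotAI variable is missing or blank, else null.
function checkShotAIConfig(env: Env): string | null {
  const missing = !env.SHOTAI_URL?.trim() || !env.SHOTAI_TOKEN?.trim();
  return missing ? SHOTAI_NOT_CONFIGURED : null;
}

// Heuristic mode (default queries, default effect params) when the LLM is disabled or unreachable.
function useHeuristicMode(env: Env, llmReachable: boolean): boolean {
  return env.AGENT_PROVIDER === "none" || !llmReachable;
}
```

Checking the configuration before any search runs means the agent stops with a clear message instead of reporting empty or invented results.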
License
MIT-0 — Free to use, modify, and redistribute. No attribution required.
See https://spdx.org/licenses/MIT-0.html