pixcli
Creative toolkit for AI agents. Generate images, videos, voiceover, music, and sound effects — then assemble polished output via Remotion.
Philosophy: The CLI handles complexity (task classification, prompt enrichment, model selection). You just describe what you want.
Setup
1. Install the CLI
CODEBLOCK0
Or use without installing:
CODEBLOCK1
2. Authenticate
CODEBLOCK2
Get your API key at https://pixcli.shellbot.sh. The key covers all capabilities: images, video, voice, music, and sound effects.
3. Verify
CODEBLOCK3
Commands
pixcli image <prompt> — Generate images
CODEBLOCK4
| Option | Default | Description |
|---|---|---|
| `-r, --ratio <ratio>` | 1:1 | Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 |
| `-q, --quality <level>` | standard | Quality: draft, standard, high |
| `-t, --transparent` | false | Transparent background (PNG) |
| `-n, --count <number>` | 1 | Number of images (1-4) |
| `--from <path-or-url>` | none | Source image for image-to-image or reference generation (repeatable, up to 5: `--from a.png --from b.png`) |
| `--search` | false | Enable Google Search grounding for real-world accuracy (logos, brands, current events). Only works with Nano Banana models |
| `-m, --model <model>` | auto | Specific model ID |
| `-o, --output <path>` | auto | Output file or directory |
| `--json` | false | Machine-readable JSON output |
| `--no-enrich` | none | Skip prompt enrichment |

Models: flux-pro, flux-dev, seedream-v5, nano-banana-pro, nano-banana-2, imagen-4, imagen-4-fast, gpt-image-1
pixcli edit <prompt> — Edit images
CODEBLOCK5
| Option | Default | Description |
|---|---|---|
| `-i, --image <path-or-url>` | required | Source image (repeatable: `-i a.png -i b.png`) |
| `-q, --quality <level>` | standard | Quality: draft, standard, high |
| `-m, --model <model>` | auto | Specific model ID |
| `-o, --output <path>` | auto | Output file or directory |
| `--json` | false | Machine-readable JSON output |
| `--no-enrich` | none | Skip prompt enrichment |

Models: seedream-v5-edit, phota-enhance, rembg, recraft-upscale, aura-sr
pixcli video <prompt> — Generate video
CODEBLOCK6
| Option | Default | Description |
|---|---|---|
| `--from <path-or-url>` | none | Source image (I2V) or video (extend) |
| `--to <path-or-url>` | none | End image: the video transitions from `--from` to `--to` (Kling/PixVerse transition models) |
| `--negative <prompt>` | none | Negative prompt describing what to avoid |
| `--audio` | false | Enable native audio generation (BGM, SFX, dialogue) on supported models |
| `-d, --duration <seconds>` | 5 | Duration: 1-15 seconds |
| `-r, --ratio <ratio>` | 16:9 | Aspect ratio: 16:9, 9:16, 1:1, 4:3, 3:4 |
| `-q, --quality <level>` | standard | Quality: draft, standard, high |
| `-m, --model <model>` | auto | Specific model ID |
| `-o, --output <path>` | auto | Output file (.mp4) |
| `--json` | false | Machine-readable JSON output |
| `--extend` | false | Extend the source video instead of I2V |
Models: kling-v3-pro-i2v (cinematic, best quality), veo3-i2v (Google, native audio), wan-v2-i2v (cheap, good motion), minimax-i2v (fast), ltx-t2v (text-to-video, cheap), veo3-t2v (text-to-video, premium), grok-extend-video (extend), pixverse-v6-i2v (I2V with audio, multi-clip, styles, $0.075/sec), pixverse-v6-t2v (T2V with audio, multi-clip, styles), pixverse-v6-transition (start-to-end frame transition), pixverse-v6-extend (video extension with audio)
Opinionated approach: Always generate a still first with pixcli image, review it, then animate with pixcli video --from. This gives you control over the starting frame.
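The still-first approach above can be sketched as a small shell helper. This is only an illustration: the function name `still_then_animate` and the output file names are hypothetical, and the only `pixcli` flags used are ones listed in the option tables (`-r`, `-o`, `--from`, `-d`).

```bash
# Sketch: generate a still, then animate it.
# still_then_animate is a hypothetical helper, not part of pixcli.
still_then_animate() {
  local still_prompt="$1" motion_prompt="$2" out_dir="${3:-.}"

  # Step 1: generate the starting frame at the video's aspect ratio.
  pixcli image "$still_prompt" -r 16:9 -o "$out_dir/still.png" || return 1

  # Step 2: animate the still (image-to-video) for 5 seconds.
  pixcli video "$motion_prompt" --from "$out_dir/still.png" -d 5 -o "$out_dir/clip.mp4"
}
```

In practice you would probably run the two steps separately so you can review `still.png` before spending time on the video job; the helper simply shows how the output of one command feeds the next.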
pixcli voice <text> — Text-to-speech
CODEBLOCK7
| Option | Default | Description |
|---|---|---|
| `--voice <name>` | Rachel | Voice preset: Rachel, Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill |
| `--language <code>` | auto | Three-letter ISO 639-3 language code (eng, spa, fra, deu, jpn, etc.) |
| `-o, --output <path>` | auto | Output file (.mp3) |
| `--json` | false | Machine-readable JSON output |
pixcli music <prompt> — Generate music
CODEBLOCK8
| Option | Default | Description |
|---|---|---|
| `-d, --duration <seconds>` | 30 | Duration: 3-120 seconds |
| `-o, --output <path>` | auto | Output file (.mp3) |
| `--json` | false | Machine-readable JSON output |
pixcli sfx <prompt> — Generate sound effects
CODEBLOCK9
| Option | Default | Description |
|---|---|---|
| `-d, --duration <seconds>` | 5 | Duration: 0.5-22 seconds |
| `-o, --output <path>` | auto | Output file (.mp3) |
| `--json` | false | Machine-readable JSON output |
pixcli job <id> — Check job status and download results
CODEBLOCK10
| Option | Default | Description |
|---|---|---|
| `--wait` | INLINECODE116 | Wait for the job to complete before returning |
| INLINECODE117 | auto | Output file path for downloaded result |
| `--json` | false | Machine-readable JSON output |
Use case: Recover timed-out jobs. Video generation can take 5-8 minutes — if the CLI times out, it prints the job ID and a recovery command. Run pixcli job <id> --wait to pick up where you left off.
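A minimal sketch of that recovery step, assuming you have copied the job ID from the CLI's timeout message. The helper name `recover_job` and the retry count are hypothetical; the only `pixcli` invocation used is `pixcli job <id> --wait` as described above.

```bash
# Sketch: retry `pixcli job <id> --wait` a few times.
# recover_job is a hypothetical helper, not part of pixcli.
recover_job() {
  local job_id="$1" attempts="${2:-3}" i=1

  while [ "$i" -le "$attempts" ]; do
    # --wait blocks until the job completes, then downloads the result.
    if pixcli job "$job_id" --wait; then
      return 0
    fi
    echo "attempt $i/$attempts failed for job $job_id" >&2
    i=$((i + 1))
  done
  return 1
}
```

Usage would look like `recover_job <id> 5`, with `<id>` replaced by the ID printed by the CLI.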
Global options
| Option | Description |
|---|---|
| INLINECODE121 | Override `PIXCLI_API_KEY` env var |
| INLINECODE123 | Override API URL (default: https://pixcli.shellbot.sh) |
| `--version` | Show CLI version |
| `--help` | Show help |
Read references/command-reference.md for the full parameter reference.
Opinionated creative workflow
The full production pipeline
1. Generate scene stills with `pixcli image`: use `-n 4` for variations and pick the best. Use `--search` for real-world accuracy (correct logos, current brands). Use `--from` with multiple images to blend references.
2. Edit heroes with `pixcli edit`: upscale, remove backgrounds, enhance.
3. Animate 2-3 hero stills with `pixcli video --from`: cinematic motion for key moments.
4. Generate voiceover with `pixcli voice`: one file per scene.
5. Generate background music with `pixcli music`: one track for the full composition.
6. Generate sound effects with `pixcli sfx`: transition whooshes, UI sounds (use sparingly).
7. Assemble everything in Remotion: timing, text, transitions, branding, audio mix.
8. Render final video with INLINECODE137
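The voiceover step above (one file per scene) lends itself to a loop. The sketch below assumes a plain-text script file with one scene's narration per line; the helper name `generate_scene_voiceovers` and the `vo-NN.mp3` naming scheme are hypothetical choices, not pixcli conventions.

```bash
# Sketch: one voiceover file per scene, read from a text file (one line per scene).
# generate_scene_voiceovers is a hypothetical helper, not part of pixcli.
generate_scene_voiceovers() {
  local script_file="$1" out_dir="${2:-.}" n=0 line

  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines so scene numbering stays contiguous.
    [ -n "$line" ] || continue
    n=$((n + 1))
    # </dev/null keeps the CLI from consuming the rest of the script file.
    pixcli voice "$line" -o "$(printf '%s/vo-%02d.mp3' "$out_dir" "$n")" < /dev/null || return 1
  done < "$script_file"
}
```

Numbered file names keep the clips in scene order when they are later imported into the Remotion composition.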
When to use AI video vs Remotion
Use pixcli video for:
- Hero moments: product reveals, cinematic openings, emotional beats (3-8s clips)
- Organic motion that's hard to code: water, fire, fabric, hair, camera orbits
- Image-to-video: animate a still into a living scene
- Transition inserts: short clips between Remotion scenes
Use Remotion for:
- Text animations, captions, kinetic typography
- Precise timing synced to voiceover
- Brand overlays, logos, consistent color grading
- Data visualizations, metric counters, charts
- Scene transitions (cuts, wipes, dissolves — deterministic)
- Final assembly: compositing AI video clips + stills + audio + text
The ideal combined workflow:
1. Generate scene stills with `pixcli image` (consistency via shared style prompts)
2. Animate 2-3 hero stills with `pixcli video --from` (cinematic motion)
3. Generate voiceover + music + SFX
4. Assemble everything in Remotion (timing, text, transitions, audio mix)
Audio layering strategy
- Voiceover at volume 1.0: clear, intelligible, primary channel
- Music at 0.15-0.25: duck under voiceover, never compete
- SFX sparse and purposeful: only when they reinforce movement
- Avoid dense music during problem framing
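For a quick local preview of these levels outside Remotion, the same ratios can be approximated with an ffmpeg filter graph. This is a sketch under assumptions: the helper names `mix_filter` and `mix_audio` are hypothetical, the music is held at a fixed volume rather than dynamically ducked, and `normalize=0` on `amix` requires a reasonably recent ffmpeg build.

```bash
# Sketch: static voiceover/music mix using the levels listed above.
# mix_filter and mix_audio are hypothetical helpers.

# Print an ffmpeg filter graph: voiceover at 1.0, music at the given volume.
mix_filter() {
  local music_vol="${1:-0.2}"
  printf '[0:a]volume=1.0[vo];[1:a]volume=%s[bg];[vo][bg]amix=inputs=2:duration=first:normalize=0[mix]' "$music_vol"
}

# Mix a voiceover and a music track into one file.
# duration=first ends the mix when the voiceover (first input) ends.
mix_audio() {
  local voiceover="$1" music="$2" output="$3" music_vol="${4:-0.2}"
  ffmpeg -y -i "$voiceover" -i "$music" \
    -filter_complex "$(mix_filter "$music_vol")" \
    -map '[mix]' "$output"
}
```

For example, `mix_audio voiceover.mp3 bg-music.mp3 preview.mp3 0.15` would produce a rough preview; the final mix still belongs in Remotion, where volumes can follow the timeline.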
Quality tiers
- `draft`: Fast iteration, concepting, throwaway tests
- `standard`: Good for most production work (default)
- `high`: Hero shots, final delivery assets
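One way to use the tiers together is to explore cheaply at `draft` and re-run the chosen prompt at `high`. The helpers below are a hypothetical sketch (`explore_drafts` and `render_final` are not pixcli commands), and whether `-o` creates a missing directory is not documented here, so the sketch creates it first.

```bash
# Sketch: iterate at draft quality, then render the final asset at high quality.
# explore_drafts and render_final are hypothetical helpers.

# Generate four draft variations of a prompt into <out_dir>/drafts.
explore_drafts() {
  local prompt="$1" out_dir="${2:-.}"
  mkdir -p "$out_dir/drafts" || return 1
  pixcli image "$prompt" -q draft -n 4 -o "$out_dir/drafts"
}

# Render the final version of the refined prompt at high quality.
render_final() {
  local prompt="$1" out_file="${2:-final.png}"
  pixcli image "$prompt" -q high -o "$out_file"
}
```

Note that the `high` render is a fresh generation from the prompt, not an upscale of a specific draft, so the drafts mainly serve to refine the prompt.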
ffmpeg local editing
Use ffmpeg for quick video/audio edits without a full Remotion project. These run locally — no API calls needed.
Video operations
CODEBLOCK11
Audio operations
CODEBLOCK12
Remotion video production
Remotion is the source of truth for timing, layout, animation, and render. Use pixcli to generate the visual and audio assets, then assemble everything in Remotion.
Bootstrapping a Remotion project
CODEBLOCK13
Templates
| Template | Best for | Resolution |
|---|---|---|
| INLINECODE145 | Product marketing (AIDA framework) | 1920x1080 |
| INLINECODE146 | Premium product launches | 1920x1080 |
| `saas-metrics-16x9` | B2B SaaS, dashboard metrics | 1920x1080 |
| `mobile-ugc-9x16` | Reels, TikTok, Stories | 1080x1920 |
| `blank-16x9` | Custom projects | 1920x1080 |
| `explainer-16x9` | How-it-works, tutorials | 1920x1080 |
Integrating AI video clips in Remotion
Use OffthreadVideo for AI-generated clips inside Remotion compositions:
CODEBLOCK14
Remotion principles
- Keep all Remotion packages on the same pinned version
- Transitions: 8-18 frames, purposeful (not decorative)
- Load fonts explicitly with INLINECODE152
- Always run `npm run verify` before INLINECODE154
- Load reference rules from `references/remotion-rules/` as needed
Read references/remotion-playbook.md for detailed Remotion implementation guidance.
Output convention
- `pixcli` downloads generated files to the current directory (or the path specified with `-o`)
- Use `--json` for machine-readable output (pipe to `jq` or parse in scripts)
- All operations are synchronous from the CLI perspective (the CLI handles async polling internally)
- Video jobs may take 1-8 minutes (the CLI shows progress). If a job times out, use `pixcli job <id> --wait` to recover
References
Creative guidance
- `references/command-reference.md`: Full parameter docs for all pixcli commands
- INLINECODE163: Quality standards for productions
- INLINECODE164: Proven prompt patterns for every task
- INLINECODE165: End-to-end recipe examples
- INLINECODE166: Asset generation strategy for Remotion scenes
Remotion
- `references/remotion-playbook.md`: Remotion implementation guide
- INLINECODE168: Template selector guide
- INLINECODE169: Index of 30+ Remotion rule files
- INLINECODE170: Detailed rules (animations, audio, text, transitions, etc.)
Templates
- INLINECODE171
- INLINECODE172
- INLINECODE173
- INLINECODE174
- INLINECODE175
- INLINECODE176
pixcli
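A creative toolkit for AI agents. Generate images, videos, voiceover, music, and sound effects, then assemble them into polished output with Remotion.
Philosophy: The CLI handles all the complexity (task classification, prompt enrichment, model selection). You just describe what you want.
Setup
1. Install the CLI
```bash
npm install -g pixcli
```
Or use it without installing:
```bash
npx pixcli image "a red fox in a forest"
```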
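2. Authenticate
```bash
export PIXCLI_API_KEY=pxlive...
```
Get your API key at https://pixcli.shellbot.sh. The key covers all capabilities: images, video, voice, music, and sound effects.
3. Verify
```bash
pixcli --version
pixcli image "test: a simple blue circle on a white background" -o test.png
```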
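Commands
pixcli image <prompt>: Generate images
```bash
pixcli image "studio product shot of wireless earbuds, soft lighting, white background"
```
| Option | Default | Description |
|---|---|---|
| `-r, --ratio <ratio>` | 1:1 | Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 |
| `-q, --quality <level>` | standard | Quality: draft, standard, high |
| `-t, --transparent` | false | Transparent background (PNG) |
| `-n, --count <number>` | 1 | Number of images (1-4) |
| `--from <path-or-url>` | none | Source image for image-to-image or reference generation (repeatable, up to 5: `--from a.png --from b.png`) |
| `--search` | false | Enable Google Search grounding for real-world accuracy (logos, brands, current events). Only works with Nano Banana models |
| `-m, --model <model>` | auto | Specific model ID |
| `-o, --output <path>` | auto | Output file or directory |
| `--json` | false | Machine-readable JSON output |
| `--no-enrich` | none | Skip prompt enrichment |

Models: flux-pro, flux-dev, seedream-v5, nano-banana-pro, nano-banana-2, imagen-4, imagen-4-fast, gpt-image-1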
pixcli edit <prompt>: Edit images
```bash
pixcli edit "remove the background" -i product.jpg -o product-nobg.png
```
| Option | Default | Description |
|---|---|---|
| `-i, --image <path-or-url>` | required | Source image (repeatable: `-i a.png -i b.png`) |
| `-q, --quality <level>` | standard | Quality: draft, standard, high |
| `-m, --model <model>` | auto | Specific model ID |
| `-o, --output <path>` | auto | Output file or directory |
| `--json` | false | Machine-readable JSON output |
| `--no-enrich` | none | Skip prompt enrichment |

Models: seedream-v5-edit, phota-enhance, rembg, recraft-upscale, aura-sr
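pixcli video <prompt>: Generate video
```bash
# Image-to-video (recommended: generate a still first, then animate it)
pixcli video "slow camera orbit around the product" --from product.png -o reveal.mp4
# Text-to-video (generates an image automatically, then animates it)
pixcli video "a cat walking through a garden at sunset" -o cat.mp4
# Extend an existing video
pixcli video "the cat jumps over the fence" --from cat.mp4 --extend -o cat-extended.mp4
```
| Option | Default | Description |
|---|---|---|
| `--from <path-or-url>` | none | Source image (I2V) or video (extend) |
| `--to <path-or-url>` | none | End image: the video transitions from `--from` to `--to` (Kling/PixVerse transition models) |
| `--negative <prompt>` | none | Negative prompt describing what to avoid |
| `--audio` | false | Enable native audio generation (BGM, SFX, dialogue) on supported models |
| `-d, --duration <seconds>` | 5 | Duration: 1-15 seconds |
| `-r, --ratio <ratio>` | 16:9 | Aspect ratio: 16:9, 9:16, 1:1, 4:3, 3:4 |
| `-q, --quality <level>` | standard | Quality: draft, standard, high |
| `-m, --model <model>` | auto | Specific model ID |
| `-o, --output <path>` | auto | Output file (.mp4) |
| `--json` | false | Machine-readable JSON output |
| `--extend` | false | Extend the source video instead of I2V |

Models: kling-v3-pro-i2v (cinematic, best quality), veo3-i2v (Google, native audio), wan-v2-i2v (cheap, good motion), minimax-i2v (fast), ltx-t2v (text-to-video, cheap), veo3-t2v (text-to-video, premium), grok-extend-video (extend), pixverse-v6-i2v (I2V with audio, multi-clip, styles, $0.075/sec), pixverse-v6-t2v (T2V with audio, multi-clip, styles), pixverse-v6-transition (start-to-end frame transition), pixverse-v6-extend (video extension with audio)
Opinionated approach: Always generate a still first with `pixcli image`, review it, then animate it with `pixcli video --from`. This gives you control over the starting frame.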
pixcli voice <text>: Text-to-speech
```bash
pixcli voice "Welcome to the future of productivity." -o voiceover.mp3
pixcli voice "Welcome to the future." --voice Sarah --language spa -o vo-spanish.mp3
```
| Option | Default | Description |
|---|---|---|
| `--voice <name>` | Rachel | Voice preset: Rachel, Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill |
| `--language <code>` | auto | Three-letter ISO 639-3 language code (eng, spa, fra, deu, jpn, etc.) |
| `-o, --output <path>` | auto | Output file (.mp3) |
| `--json` | false | Machine-readable JSON output |
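pixcli music <prompt>: Generate music
```bash
pixcli music "subtle ambient electronic, minimal beat, corporate tech feel" -d 45 -o bg-music.mp3
```
| Option | Default | Description |
|---|---|---|
| `-d, --duration <seconds>` | 30 | Duration: 3-120 seconds |
| `-o, --output <path>` | auto | Output file (.mp3) |
| `--json` | false | Machine-readable JSON output |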
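pixcli sfx <prompt>: Generate sound effects
```bash
pixcli sfx "smooth cinematic whoosh transition" -d 1.5 -o whoosh.mp3
pixcli sfx "soft digital click, subtle UI interaction" -d 0.5 -o click.mp3
```
| Option | Default | Description |
|---|---|---|
| `-d, --duration <seconds>` | 5 | Duration: 0.5-22 seconds |
| `-o, --output <path>` | auto | Output file (.mp3) |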
| --json | false