Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), MiniMax, Jimeng (即梦), Seedream (豆包) and Replicate providers.
Script Directory
Agent Execution:
1. `{baseDir}` = this SKILL.md file's directory
2. Script path = `{baseDir}/scripts/main.ts`
3. Resolve the `${BUN_X}` runtime: if `bun` is installed → `bun`; if `npx` is available → `npx -y bun`; otherwise suggest installing bun
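
The runtime fallback order can be sketched as a small pure function. This is only an illustration: `resolveBunX`, `RuntimeChoice`, and the two boolean inputs are hypothetical names, and how an agent actually detects `bun` or `npx` is not specified here.

```typescript
// Hypothetical helper: the skill text only states the fallback order.
type RuntimeChoice =
  | { kind: "command"; command: string }
  | { kind: "suggest-install"; message: string };

function resolveBunX(hasBun: boolean, hasNpx: boolean): RuntimeChoice {
  // Prefer a locally installed bun.
  if (hasBun) return { kind: "command", command: "bun" };
  // Otherwise run bun through npx.
  if (hasNpx) return { kind: "command", command: "npx -y bun" };
  // Neither is available: nothing to run, so ask the user to install bun.
  return { kind: "suggest-install", message: "bun not found; please install bun" };
}
```

Returning a tagged result instead of throwing lets the caller decide how to phrase the install suggestion.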
Step 0: Load Preferences ⛔ BLOCKING
CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence (priority: project → user):
CODEBLOCK0
CODEBLOCK1
| Result | Action |
|---|---|
| Found | Load, parse, apply settings. If `default_model.[provider]` is null → ask model only (Flow 2) |
| Not found | ⛔ Run first-time setup (`references/config/first-time-setup.md`) → Save EXTEND.md → Then continue |
CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
| Path | Location |
|---|---|
| `.baoyu-skills/baoyu-imagine/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-imagine/EXTEND.md` | User home |
Legacy compatibility: if .baoyu-skills/baoyu-image-gen/EXTEND.md exists and the new path does not, runtime renames it to baoyu-imagine. If both files exist, runtime leaves them unchanged and uses the new path.
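
The legacy-path rule reduces to a decision on two existence checks. A minimal sketch, assuming the caller has already tested both paths; `legacyAction` and its return labels are hypothetical names, not part of the runtime.

```typescript
// Hypothetical decision helper for the legacy EXTEND.md location.
type LegacyAction = "rename-legacy" | "use-new" | "none";

function legacyAction(legacyExists: boolean, newExists: boolean): LegacyAction {
  // New path present (alone or alongside the legacy file): leave files unchanged, use the new path.
  if (newExists) return "use-new";
  // Only the legacy baoyu-image-gen file exists: it gets renamed to baoyu-imagine.
  if (legacyExists) return "rename-legacy";
  // Neither exists: nothing to migrate at this location.
  return "none";
}
```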
EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models | Batch worker cap | Provider-specific batch limits
Schema: `references/config/preferences-schema.md`
Usage
CODEBLOCK2
Batch File Format
CODEBLOCK3
Paths in promptFiles, image, and ref are resolved relative to the batch file's directory. jobs is optional (overridden by CLI --jobs). Top-level array format (without jobs wrapper) is also accepted.
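
The batch-file rules (optional `jobs`, CLI override, top-level array form, paths relative to the batch file) could be normalized roughly as follows. This is a sketch under assumptions: `normalizeBatch`, `resolveFrom`, and the interfaces are illustrative names, and the path join is a simplified POSIX-style stand-in for whatever the script really uses.

```typescript
interface BatchTask {
  id?: string;
  promptFiles?: string[];
  image: string;
  ref?: string[];
}

interface Batch {
  jobs?: number;
  tasks: BatchTask[];
}

// Simplified POSIX-style join for illustration; a real script would likely use node:path.
function resolveFrom(dir: string, p: string): string {
  return p.startsWith("/") ? p : `${dir.replace(/\/+$/, "")}/${p}`;
}

function normalizeBatch(parsed: Batch | BatchTask[], batchDir: string, cliJobs?: number): Batch {
  // Accept both the wrapped form and a bare top-level array of tasks.
  const wrapped: Batch = Array.isArray(parsed) ? { tasks: parsed } : parsed;
  // Resolve promptFiles, image, and ref against the batch file's directory.
  const tasks = wrapped.tasks.map((t) => ({
    ...t,
    image: resolveFrom(batchDir, t.image),
    promptFiles: t.promptFiles?.map((f) => resolveFrom(batchDir, f)),
    ref: t.ref?.map((f) => resolveFrom(batchDir, f)),
  }));
  // A CLI --jobs value overrides the jobs field in the file.
  return { jobs: cliJobs ?? wrapped.jobs, tasks };
}
```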
Options
| Option | Description |
|---|---|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required in single-image mode) |
| `--batchfile <path>` | JSON batch file for multi-image generation |
| `--jobs <count>` | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| `--provider google\|openai\|azure\|openrouter\|dashscope\|minimax\|jimeng\|seedream\|replicate` | Force provider (default: auto-detect) |
| `--model <id>`, `-m` | Model ID (Google: `gemini-3-pro-image-preview`; OpenAI: `gpt-image-1.5`; Azure: deployment name such as `gpt-image-1.5` or `image-prod`; OpenRouter: `google/gemini-3.1-flash-image-preview`; DashScope: `qwen-image-2.0-pro`; MiniMax: `image-01`) |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: `2k`) |
| `--imageSize 1K\|2K\|4K` | Image size for Google/OpenRouter (default: from quality) |
| `--ref <files...>` | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate, MiniMax subject-reference, and Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, or removed SeedEdit 3.0 |
| `--n <count>` | Number of images |
| `--json` | JSON output |
Environment Variables
| Variable | Description |
|---|---|
| INLINECODE47 | OpenAI API key |
| INLINECODE48 | Azure OpenAI API key |
| `OPENROUTER_API_KEY` | OpenRouter API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key (Alibaba Cloud, 阿里云) |
| `MINIMAX_API_KEY` | MiniMax API key |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `JIMENG_ACCESS_KEY_ID` | Jimeng (即梦) Volcengine access key |
| `JIMENG_SECRET_ACCESS_KEY` | Jimeng (即梦) Volcengine secret key |
| `ARK_API_KEY` | Seedream (豆包) Volcengine ARK API key |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `AZURE_OPENAI_DEPLOYMENT` | Azure default deployment name |
| `AZURE_OPENAI_IMAGE_MODEL` | Backward-compatible alias for Azure default deployment/model name |
| `OPENROUTER_IMAGE_MODEL` | OpenRouter model override (default: `google/gemini-3.1-flash-image-preview`) |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: `qwen-image-2.0-pro`) |
| `MINIMAX_IMAGE_MODEL` | MiniMax model override (default: `image-01`) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: `google/nano-banana-pro`) |
| `JIMENG_IMAGE_MODEL` | Jimeng model override (default: `jimeng_t2i_v40`) |
| `SEEDREAM_IMAGE_MODEL` | Seedream model override (default: `doubao-seedream-5-0-260128`) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `AZURE_OPENAI_BASE_URL` | Azure resource endpoint or deployment endpoint |
| `AZURE_API_VERSION` | Azure image API version (default: `2025-04-01-preview`) |
| `OPENROUTER_BASE_URL` | Custom OpenRouter endpoint (default: `https://openrouter.ai/api/v1`) |
| `OPENROUTER_HTTP_REFERER` | Optional app/site URL for OpenRouter attribution |
| `OPENROUTER_TITLE` | Optional app name for OpenRouter attribution |
| `GOOGLE_BASE_URL` | Custom Google endpoint |
| `DASHSCOPE_BASE_URL` | Custom DashScope endpoint |
| `MINIMAX_BASE_URL` | Custom MiniMax endpoint (default: `https://api.minimax.io`) |
| `REPLICATE_BASE_URL` | Custom Replicate endpoint |
| `JIMENG_BASE_URL` | Custom Jimeng endpoint (default: `https://visual.volcengineapi.com`) |
| `JIMENG_REGION` | Jimeng region (default: `cn-north-1`) |
| `SEEDREAM_BASE_URL` | Custom Seedream endpoint (default: `https://ark.cn-beijing.volces.com/api/v3`) |
| `BAOYU_IMAGE_GEN_MAX_WORKERS` | Override batch worker cap |
| `BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY` | Override provider concurrency, e.g. `BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY` |
| `BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS` | Override provider start gap, e.g. `BAOYU_IMAGE_GEN_REPLICATE_START_INTERVAL_MS` |
Load Priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > INLINECODE95
Model Resolution
Model priority (highest → lowest), applies to all providers:
1. CLI flag: INLINECODE96
2. EXTEND.md: INLINECODE97
3. Env var: `<PROVIDER>_IMAGE_MODEL` (e.g., `GOOGLE_IMAGE_MODEL`)
4. Built-in default
For Azure, --model / default_model.azure should be the Azure deployment name. AZURE_OPENAI_DEPLOYMENT is the preferred env var, and AZURE_OPENAI_IMAGE_MODEL remains as a backward-compatible alias.
EXTEND.md overrides env vars. If both EXTEND.md default_model.google: "gemini-3-pro-image-preview" and env var GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview exist, EXTEND.md wins.
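
The priority chain can be sketched as a first-match lookup. `resolveModel` and `ModelSources` are hypothetical names, and a `null` EXTEND.md entry simply falls through here; the Step 0 flow that asks the user for a model is not modeled.

```typescript
interface ModelSources {
  cliModel?: string;           // value of --model, if passed
  extendModel?: string | null; // default_model.<provider> from EXTEND.md
  envModel?: string;           // <PROVIDER>_IMAGE_MODEL
  builtInDefault: string;      // provider's built-in default
}

type ModelSource = "cli" | "extend" | "env" | "default";

function resolveModel(s: ModelSources): { model: string; source: ModelSource } {
  // Highest priority first; the first non-empty value wins.
  if (s.cliModel) return { model: s.cliModel, source: "cli" };
  if (s.extendModel) return { model: s.extendModel, source: "extend" };
  if (s.envModel) return { model: s.envModel, source: "env" };
  return { model: s.builtInDefault, source: "default" };
}
```

Returning the source next to the model makes it easy to show the user where the active model came from.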
Agent MUST display model info before each generation:
- Show: INLINECODE106
- Show switch hint: INLINECODE107
DashScope Models
Use --model qwen-image-2.0-pro or set default_model.dashscope / DASHSCOPE_IMAGE_MODEL when the user wants official Qwen-Image behavior.
Official DashScope model families:
- `qwen-image-2.0-pro`, `qwen-image-2.0-pro-2026-03-03`, `qwen-image-2.0`, INLINECODE114
  - Free-form `size` in `width*height` format
  - Total pixels must stay between `512*512` and `2048*2048`
  - Default size is approximately `1024*1024`
  - Best choice for custom ratios such as `21:9` and text-heavy Chinese/English layouts
- `qwen-image-max`, `qwen-image-max-2025-12-30`, `qwen-image-plus`, `qwen-image-plus-2026-01-09`, INLINECODE125
  - Fixed sizes only: `1664*928`, `1472*1104`, `1328*1328`, `1104*1472`, `928*1664`
  - Default size is `1664*928`
  - `qwen-image` currently has the same capability as `qwen-image-plus`
- Legacy DashScope models such as `z-image-turbo`, `z-image-ultra`, INLINECODE136
  - Keep using them only when the user explicitly asks for legacy behavior or compatibility
When translating CLI args into DashScope behavior:
- `--size` wins over INLINECODE138
- For `qwen-image-2.0*`, prefer explicit `--size`; otherwise infer from `--ar` and use the official recommended resolutions below
- For `qwen-image-max/plus/image`, only use the five official fixed sizes; if the requested ratio is not covered, switch to INLINECODE143
- `--quality` is a baoyu-imagine compatibility preset, not a native DashScope API field. Mapping `normal` / `2k` onto the `qwen-image-2.0*` table below is an implementation inference, not an official API guarantee
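
The size constraints for the two Qwen-Image families can be checked before a request is sent. The sketch below is an assumption-laden illustration: `checkDashScopeSize` and `parseSize` are hypothetical helpers, the family detection by model-name prefix is a guess at a reasonable rule, and other (legacy) models are passed through unchecked.

```typescript
const QWEN_FIXED_SIZES = ["1664*928", "1472*1104", "1328*1328", "1104*1472", "928*1664"];
const MIN_PIXELS = 512 * 512;
const MAX_PIXELS = 2048 * 2048;

interface SizeCheck {
  ok: boolean;
  size?: string;   // size in DashScope's width*height form
  reason?: string; // why the size was rejected
}

// Accepts CLI-style "WxH" as well as DashScope-style "W*H".
function parseSize(size: string): { width: number; height: number } | null {
  const m = /^(\d+)[x*](\d+)$/i.exec(size.trim());
  return m ? { width: Number(m[1]), height: Number(m[2]) } : null;
}

function checkDashScopeSize(model: string, size: string): SizeCheck {
  const parsed = parseSize(size);
  if (!parsed) return { ok: false, reason: `Cannot parse size "${size}"` };
  const apiSize = `${parsed.width}*${parsed.height}`;

  if (model.startsWith("qwen-image-2.0")) {
    // Free-form size, bounded by total pixel count.
    const pixels = parsed.width * parsed.height;
    return pixels >= MIN_PIXELS && pixels <= MAX_PIXELS
      ? { ok: true, size: apiSize }
      : { ok: false, reason: "Total pixels must be between 512*512 and 2048*2048" };
  }
  const isFixedFamily =
    model.startsWith("qwen-image-max") || model.startsWith("qwen-image-plus") || model === "qwen-image";
  if (isFixedFamily) {
    // Only the five documented sizes are accepted.
    return QWEN_FIXED_SIZES.includes(apiSize)
      ? { ok: true, size: apiSize }
      : { ok: false, reason: `Size must be one of ${QWEN_FIXED_SIZES.join(", ")}` };
  }
  // Other (legacy) models: no rule stated here, so the size is passed through unchecked.
  return { ok: true, size: apiSize };
}
```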
Recommended qwen-image-2.0* sizes for common aspect ratios:
| Ratio | `normal` | `2k` |
|---|---|---|
| INLINECODE151 | INLINECODE152 | INLINECODE153 |
| `2:3` | `768*1152` | `1024*1536` |
| `3:2` | `1152*768` | `1536*1024` |
| `3:4` | `960*1280` | `1080*1440` |
| `4:3` | `1280*960` | `1440*1080` |
| `9:16` | `720*1280` | `1080*1920` |
| `16:9` | `1280*720` | `1920*1080` |
| `21:9` | `1344*576` | `2048*872` |
DashScope official APIs also expose negative_prompt, prompt_extend, and watermark, but baoyu-imagine does not expose them as dedicated CLI flags today.
MiniMax Models
Use --model image-01 or set default_model.minimax / MINIMAX_IMAGE_MODEL when the user wants MiniMax image generation.
Official MiniMax image model options currently documented in the API reference:
- `image-01` (recommended default)
  - Supports text-to-image and subject-reference image generation
  - Supports official `aspect_ratio` values: `1:1`, `16:9`, `4:3`, `3:2`, `2:3`, `3:4`, `9:16`, `21:9`
  - Supports documented custom `width` / `height` output sizes when using `--size <WxH>`
  - `width` and `height` must both be between `512` and `2048`, and both must be divisible by `8`
- Lower-latency variant
  - Use `--ar` for sizing; MiniMax documents custom `width` / `height` as only effective for INLINECODE204
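
The custom-size constraints (each side between 512 and 2048, divisible by 8) are easy to validate up front. A minimal sketch; `validateMiniMaxSize` is a hypothetical helper, not a function from the script.

```typescript
// Returns a list of problems; an empty list means the size satisfies the stated constraints.
function validateMiniMaxSize(width: number, height: number): string[] {
  const problems: string[] = [];
  const sides: Array<[string, number]> = [
    ["width", width],
    ["height", height],
  ];
  for (const [name, value] of sides) {
    if (!Number.isInteger(value)) {
      problems.push(`${name} must be an integer`);
      continue;
    }
    if (value < 512 || value > 2048) problems.push(`${name} must be between 512 and 2048`);
    if (value % 8 !== 0) problems.push(`${name} must be divisible by 8`);
  }
  return problems;
}
```

Collecting all problems rather than stopping at the first gives the user one complete error message per request.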
MiniMax subject reference notes:
- `--ref` files are sent as MiniMax INLINECODE206
- MiniMax docs currently describe `subject_reference[].type` as INLINECODE208
- Official docs say `image_file` supports public URLs or Base64 Data URLs; baoyu-imagine sends local refs as Data URLs
- Official docs recommend front-facing portrait references in JPG/JPEG/PNG under 10MB
OpenRouter Models
Use full OpenRouter model IDs, e.g.:
- `google/gemini-3.1-flash-image-preview` (recommended, supports image output and reference-image workflows)
- INLINECODE212
- INLINECODE213
- Other OpenRouter image-capable model IDs
Notes:
- OpenRouter image generation uses `/chat/completions`, not the OpenAI `/images` endpoints
- If `--ref` is used, choose a multimodal model that supports image input and image output
- INLINECODE217 maps to OpenRouter `imageGenerationOptions.size`; `--size <WxH>` is converted to the nearest OpenRouter size and inferred aspect ratio when possible
Replicate Models
Supported model formats:
- `owner/name` (recommended for official models), e.g. INLINECODE221
- INLINECODE222 (community models by version), e.g. INLINECODE223
Examples:
CODEBLOCK4
Provider Selection
1. `--ref` provided + no `--provider` → auto-select Google first, then OpenAI, then Azure, then OpenRouter, then Replicate, then Seedream, then MiniMax (MiniMax subject reference is more specialized toward character/portrait consistency)
2. INLINECODE226 specified → use it (if `--ref`, must be `google`, `openai`, `azure`, `openrouter`, `replicate`, `seedream`, or `minimax`)
3. Only one API key available → use that provider
4. Multiple available → default to Google
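
These selection rules can be sketched as one function. Everything named here (`selectProvider`, `SelectionInput`, `REF_CAPABLE`) is hypothetical, and the behavior when several keys are configured but Google is not among them is an assumption, since the rules above do not cover that case.

```typescript
type Provider =
  | "google" | "openai" | "azure" | "openrouter" | "dashscope"
  | "minimax" | "jimeng" | "seedream" | "replicate";

// Order used when --ref is given without --provider.
const REF_CAPABLE: Provider[] = [
  "google", "openai", "azure", "openrouter", "replicate", "seedream", "minimax",
];

interface SelectionInput {
  explicit?: Provider;   // value of --provider, if any
  hasRef: boolean;       // whether --ref was passed
  available: Provider[]; // providers with credentials configured
}

function selectProvider({ explicit, hasRef, available }: SelectionInput): Provider {
  if (explicit) {
    if (hasRef && !REF_CAPABLE.includes(explicit)) {
      throw new Error(`--ref is not supported by provider "${explicit}"`);
    }
    return explicit;
  }
  if (hasRef) {
    // First configured provider in the documented preference order.
    const match = REF_CAPABLE.find((p) => available.includes(p));
    if (match) return match;
    throw new Error("No configured provider supports --ref");
  }
  const [first] = available;
  if (first === undefined) throw new Error("No API key configured");
  if (available.length === 1) return first;
  // Assumption: fall back to the first configured provider when Google has no key.
  return available.includes("google") ? "google" : first;
}
```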
Quality Presets
| Preset | Google imageSize | OpenAI Size | OpenRouter size | Replicate resolution | Use Case |
|---|---|---|---|---|---|
| `normal` | 1K | 1024px | 1K | 1K | Quick previews |
| `2k` (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics |
Google/OpenRouter imageSize: Can be overridden with INLINECODE237
Aspect Ratios
Supported: 1:1, 16:9, 9:16, 4:3, 3:4, INLINECODE243
- Google multimodal: uses INLINECODE244
- OpenAI: maps to closest supported size
- OpenRouter: sends `imageGenerationOptions.aspect_ratio`; if only `--size <WxH>` is given, aspect ratio is inferred automatically
- Replicate: passes `aspect_ratio` to model; when `--ref` is provided without `--ar`, defaults to INLINECODE250
- MiniMax: sends official `aspect_ratio` values directly; if `--size <WxH>` is given without `--ar`, `width` / `height` are sent for INLINECODE256
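
One plausible way to infer an aspect ratio from `--size <WxH>` is to pick the closest candidate on a logarithmic scale, so that wide and tall mismatches are treated symmetrically. How the script actually infers the ratio is not described here; `nearestAspectRatio`, `ratioValue`, and the default candidate list are assumptions for illustration.

```typescript
// Candidate ratios; only the five ratios listed explicitly above are included.
const CANDIDATE_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4"];

function ratioValue(ratio: string): number {
  const parts = ratio.split(":");
  return Number(parts[0]) / Number(parts[1]);
}

function nearestAspectRatio(size: string, candidates: string[] = CANDIDATE_RATIOS): string | null {
  const m = /^(\d+)x(\d+)$/i.exec(size.trim());
  if (!m) return null;
  const width = Number(m[1]);
  const height = Number(m[2]);
  if (width === 0 || height === 0) return null;

  // Compare in log space so 2:1 and 1:2 are equally far from 1:1.
  const target = Math.log(width / height);
  let best: string | null = null;
  let bestDistance = Infinity;
  for (const candidate of candidates) {
    const distance = Math.abs(Math.log(ratioValue(candidate)) - target);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}
```

Passing a provider-specific candidate list (for example, one that adds `21:9`) keeps the same function usable for providers with different supported ratios.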
Generation Mode
Default: Sequential generation.
Batch Parallel Generation: When --batchfile contains 2 or more pending tasks, the script automatically enables parallel generation.
| Mode | When to Use |
|---|---|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel batch | Batch mode with 2+ tasks |
Execution choice:
| Situation | Preferred approach | Why |
|---|---|---|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead and easier debugging |
| Multiple images already have saved prompt files | Batch (`--batchfile`) | Reuses finalized prompts, applies shared throttling/retries, and gives predictable throughput |
| Each image still needs separate reasoning, prompt writing, or style exploration | Subagents | The work is still exploratory, so each image may need independent analysis before generation |
| Output comes from `baoyu-article-illustrator` with `outline.md` + `prompts/` | Batch (`build-batch.ts` -> `--batchfile`) | That workflow already produces prompt files, so direct batch execution is the intended path |
Rule of thumb:
- Prefer batch over subagents once prompt files are already saved and the task is "generate all of these"
- Use subagents only when generation is coupled with per-image thinking, rewriting, or divergent creative exploration
Parallel behavior:
- Default worker count is automatic, capped by config, built-in default 10
- Provider-specific throttling is applied only in batch mode, and the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
- You can override worker count with `--jobs <count>`
- Each image retries automatically up to 3 attempts
- Final output includes success count, failure count, and per-image failure reasons
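
The worker-count and retry behavior might look roughly like the sketch below. It rests on assumptions: "automatic" is read as "one worker per pending task, up to the cap", an explicit `--jobs` value is also held to the cap, and `effectiveWorkers` and `withRetries` are hypothetical names.

```typescript
const BUILT_IN_WORKER_CAP = 10;

function effectiveWorkers(pendingTasks: number, jobs?: number, cap: number = BUILT_IN_WORKER_CAP): number {
  // Fewer than 2 pending tasks: stay sequential.
  if (pendingTasks < 2) return 1;
  const requested = jobs ?? pendingTasks;
  // Never exceed the cap or the number of tasks, and never drop below 1.
  return Math.max(1, Math.min(requested, cap, pendingTasks));
}

// Runs a task up to maxAttempts times and rethrows the last error if every attempt fails.
async function withRetries<T>(task: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

A batch runner would wrap each image's generation call in `withRetries` and record the final error as that image's failure reason.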
Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint
Extension Support
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
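Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), MiniMax, Jimeng (即梦), Seedream (豆包), and Replicate providers.
Script Directory
Agent Execution:
1. `{baseDir}` = this SKILL.md file's directory
2. Script path = `{baseDir}/scripts/main.ts`
3. Resolve the `${BUN_X}` runtime: if `bun` is installed → `bun`; if `npx` is available → `npx -y bun`; otherwise suggest installing bun
Step 0: Load Preferences ⛔ BLOCKING
CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence (priority: project → user):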
```bash
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-imagine/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-imagine/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-imagine/EXTEND.md" && echo "user"
```
```powershell
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-imagine/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-imagine/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-imagine/EXTEND.md") { "user" }
```
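| Result | Action |
|---|---|
| Found | Load, parse, apply settings. If `default_model.[provider]` is null → ask model only (Flow 2) |
| Not found | ⛔ Run first-time setup (`references/config/first-time-setup.md`) → Save EXTEND.md → Then continue |
CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.
| Path | Location |
|---|---|
| `.baoyu-skills/baoyu-imagine/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-imagine/EXTEND.md` | User home |
Legacy compatibility: if `.baoyu-skills/baoyu-image-gen/EXTEND.md` exists and the new path does not, runtime renames it to `baoyu-imagine`. If both files exist, runtime leaves them unchanged and uses the new path.
EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models | Batch worker cap | Provider-specific batch limits
Schema: `references/config/preferences-schema.md`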
Usage
```bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9
# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k
# From prompt files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference images (Google, OpenAI, Azure OpenAI, OpenRouter, Replicate, MiniMax, or Seedream 4.0/4.5/5.0)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make it blue" --image out.png --ref source.png
# With reference images (explicit provider/model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make it blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png
# Azure OpenAI (model means deployment name)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider azure --model gpt-image-1.5
# OpenRouter (recommended default model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openrouter
# OpenRouter with reference images
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make it blue" --image out.png --provider openrouter --model google/gemini-3.1-flash-image-preview --ref source.png
# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai
# DashScope (阿里通义万象)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cute cat" --image out.png --provider dashscope
# DashScope Qwen-Image 2.0 Pro (recommended for custom sizes and text rendering)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Design a 21:9 banner poster for a coffee brand with a clear Chinese title" --image out.png --provider dashscope --model qwen-image-2.0-pro --size 2048x872
# DashScope legacy Qwen fixed-size models
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cinematic poster" --image out.png --provider dashscope --model qwen-image-max --size 1664x928
# MiniMax
${BUN_X} {baseDir}/scripts/main.ts --prompt "A fashion editorial portrait by a bright studio window" --image out.jpg --provider minimax
# MiniMax with subject reference (best for character/portrait consistency)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A girl stands by the library window, cinematic lighting" --image out.jpg --provider minimax --model image-01 --ref portrait.png --ar 16:9
# MiniMax custom size (documented for image-01)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cinematic poster" --image out.jpg --provider minimax --model image-01 --size 1536x1024
# Replicate (google/nano-banana-pro)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate
# Replicate with specific model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
# Batch mode with saved prompt files
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json
# Batch mode with explicit worker count
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json
```
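Batch File Format
```json
{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "provider": "replicate",
      "model": "google/nano-banana-pro",
      "ar": "16:9",
      "quality": "2k"
    },
    {
      "id": "diagram",
      "promptFiles": ["prompts/diagram.md"],
      "image": "out/diagram.png",
      "ref": ["references/original.png"]
    }
  ]
}
```
Paths in `promptFiles`, `image`, and `ref` are resolved relative to the batch file's directory. `jobs` is optional (overridden by CLI `--jobs`). Top-level array format (without the `jobs` wrapper) is also accepted.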
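Options
| Option | Description |
|---|---|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required in single-image mode) |
| `--batchfile <path>` | JSON batch file for multi-image generation |
| `--jobs <count>` | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| `--provider google\|openai\|azure\|openrouter\|dashscope\|minimax\|jimeng\|seedream\|replicate` | Force provider (default: auto-detect) |
| `--model <id>`, `-m` | Model ID (Google: `gemini-3-pro-image-preview`; OpenAI: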