Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), MiniMax, Jimeng (即梦), Seedream (豆包) and Replicate providers.

Script Directory

Agent Execution:

1. {baseDir} = this SKILL.md file's directory
2. Script path = {baseDir}/scripts/main.ts
3. Resolve ${BUN_X} runtime: if bun is installed → bun; if npx is available → npx -y bun; otherwise suggest installing bun
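The runtime fallback above can be sketched as a small decision function. This is only an illustration: `resolveBunRuntime` and its boolean inputs are hypothetical names, and how the agent actually detects bun or npx is not specified here.

```typescript
// Hypothetical helper: the caller decides how to detect bun and npx.
type RuntimeChoice =
  | { kind: "command"; command: string[] }
  | { kind: "missing"; hint: string };

function resolveBunRuntime(hasBun: boolean, hasNpx: boolean): RuntimeChoice {
  // 1. Prefer a locally installed bun.
  if (hasBun) return { kind: "command", command: ["bun"] };
  // 2. Fall back to running bun through npx.
  if (hasNpx) return { kind: "command", command: ["npx", "-y", "bun"] };
  // 3. Nothing usable: suggest installing bun instead of guessing.
  return { kind: "missing", hint: "Install bun to run this skill" };
}
```

The returned command array is what would stand in for ${BUN_X} in the usage commands.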

Step 0: Load Preferences ⛔ BLOCKING

CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.

Check EXTEND.md existence (priority: project → user):

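```bash
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-imagine/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-imagine/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-imagine/EXTEND.md" && echo "user"
```

```powershell
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-imagine/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-imagine/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-imagine/EXTEND.md") { "user" }
```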

| Result | Action |
|--------|--------|
| Found | Load, parse, apply settings. If `default_model.[provider]` is null → ask model only (Flow 2) |
| Not found | ⛔ Run first-time setup (references/config/first-time-setup.md) → Save EXTEND.md → Then continue |

CRITICAL: If not found, complete the full setup (provider + model + quality + save location) using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.

| Path | Location |
|------|----------|
| .baoyu-skills/baoyu-imagine/EXTEND.md | Project directory |
| $HOME/.baoyu-skills/baoyu-imagine/EXTEND.md | User home |
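A minimal sketch of the lookup order, assuming the first existing file wins. `findExtendMd` and the injected `exists` callback are hypothetical names, and the XDG config location checked between the project and home paths is an assumption of this sketch.

```typescript
type PrefsScope = "project" | "xdg" | "user";
interface PrefsHit { scope: PrefsScope; path: string }

// `exists` is injected so the sketch stays independent of any file-system API.
function findExtendMd(
  exists: (path: string) => boolean,
  home: string,
  xdgConfigHome?: string,
): PrefsHit | null {
  const xdgBase = xdgConfigHome ? xdgConfigHome : home + "/.config";
  const candidates: PrefsHit[] = [
    { scope: "project", path: ".baoyu-skills/baoyu-imagine/EXTEND.md" },
    { scope: "xdg", path: xdgBase + "/baoyu-skills/baoyu-imagine/EXTEND.md" },
    { scope: "user", path: home + "/.baoyu-skills/baoyu-imagine/EXTEND.md" },
  ];
  for (const candidate of candidates) {
    if (exists(candidate.path)) return candidate; // highest-priority hit wins
  }
  return null; // not found: first-time setup must run before any generation
}
```

A `null` result corresponds to the "Not found" case, where generation stays blocked until EXTEND.md is saved.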

Legacy compatibility: if .baoyu-skills/baoyu-image-gen/EXTEND.md exists and the new path does not, runtime renames it to baoyu-imagine. If both files exist, runtime leaves them unchanged and uses the new path.
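The legacy rule reduces to a two-flag decision. The sketch below is illustrative only; `legacyMigrationAction` and its result labels are hypothetical, not names from the runtime.

```typescript
type LegacyAction = "rename-legacy-to-new" | "use-new" | "nothing-to-migrate";

// legacyExists: .baoyu-skills/baoyu-image-gen/EXTEND.md is present
// newExists:    .baoyu-skills/baoyu-imagine/EXTEND.md is present
function legacyMigrationAction(legacyExists: boolean, newExists: boolean): LegacyAction {
  // New path present (alone or alongside the legacy file): leave files unchanged.
  if (newExists) return "use-new";
  // Only the legacy file exists: it gets renamed to the new location.
  if (legacyExists) return "rename-legacy-to-new";
  return "nothing-to-migrate";
}
```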

Schema: references/config/preferences-schema.md

Usage

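```bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# From prompt files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google, OpenAI, Azure OpenAI, OpenRouter, Replicate, MiniMax, or Seedream 4.0/4.5/5.0)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make it blue" --image out.png --ref source.png

# With reference images (explicit provider/model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make it blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# Azure OpenAI (model means deployment name)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider azure --model gpt-image-1.5

# OpenRouter (recommended default model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openrouter

# OpenRouter with reference images
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make it blue" --image out.png --provider openrouter --model google/gemini-3.1-flash-image-preview --ref source.png

# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# DashScope (阿里通义万象)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cute cat" --image out.png --provider dashscope

# DashScope Qwen-Image 2.0 Pro (recommended for custom sizes and text rendering)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Design a 21:9 banner poster for a coffee brand with a clear Chinese headline" --image out.png --provider dashscope --model qwen-image-2.0-pro --size 2048x872

# DashScope legacy Qwen fixed-size models
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cinematic poster" --image out.png --provider dashscope --model qwen-image-max --size 1664x928

# MiniMax
${BUN_X} {baseDir}/scripts/main.ts --prompt "A fashion editorial portrait by a bright studio window" --image out.jpg --provider minimax

# MiniMax with subject reference (best for character/portrait consistency)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A girl standing by a library window, cinematic lighting" --image out.jpg --provider minimax --model image-01 --ref portrait.png --ar 16:9

# MiniMax custom size (documented for image-01)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cinematic poster" --image out.jpg --provider minimax --model image-01 --size 1536x1024

# Replicate (google/nano-banana-pro)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Replicate with a specific model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

# Batch mode with saved prompt files
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json

# Batch mode with explicit worker count
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json
```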

Batch File Format

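```json
{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "provider": "replicate",
      "model": "google/nano-banana-pro",
      "ar": "16:9",
      "quality": "2k"
    },
    {
      "id": "diagram",
      "promptFiles": ["prompts/diagram.md"],
      "image": "out/diagram.png",
      "ref": ["references/original.png"]
    }
  ]
}
```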

Paths in promptFiles, image, and ref are resolved relative to the batch file's directory. jobs is optional (overridden by CLI --jobs). Top-level array format (without jobs wrapper) is also accepted.
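One way to read these rules is as a normalization step applied after the JSON is parsed. The sketch below is an assumption-laden illustration: `normalizeBatch`, `resolveFromBatchDir`, the `tasks` wrapper key, and the POSIX-only absolute-path check are all hypothetical, not the script's real implementation.

```typescript
interface BatchTask {
  id?: string;
  promptFiles?: string[];
  image: string;
  provider?: string;
  model?: string;
  ar?: string;
  quality?: string;
  ref?: string[];
}
interface NormalizedBatch { jobs?: number; tasks: BatchTask[] }

// Resolve a path against the batch file's directory unless it is already absolute.
function resolveFromBatchDir(batchDir: string, p: string): string {
  return p.charAt(0) === "/" ? p : batchDir.replace(/\/+$/, "") + "/" + p;
}

function normalizeBatch(
  parsed: BatchTask[] | { jobs?: number; tasks: BatchTask[] },
  batchDir: string,
  cliJobs?: number,
): NormalizedBatch {
  // Accept both the wrapped form and a top-level array of tasks.
  const fileJobs = Array.isArray(parsed) ? undefined : parsed.jobs;
  const rawTasks = Array.isArray(parsed) ? parsed : parsed.tasks;
  const tasks = rawTasks.map((task) => {
    const resolved: BatchTask = { ...task, image: resolveFromBatchDir(batchDir, task.image) };
    if (task.promptFiles) resolved.promptFiles = task.promptFiles.map((p) => resolveFromBatchDir(batchDir, p));
    if (task.ref) resolved.ref = task.ref.map((p) => resolveFromBatchDir(batchDir, p));
    return resolved;
  });
  // CLI --jobs overrides the value stored in the batch file.
  return { jobs: cliJobs !== undefined ? cliJobs : fileJobs, tasks };
}
```

Keeping path resolution in one place means a batch file can be moved together with its prompts/ and references/ folders without editing any task.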

Options

| Option | Description |
|--------|-------------|
| --prompt <text>, -p | Prompt text |
| --promptfiles <files...> | |

Environment Variables

Variable	Description
INLINECODE47	OpenAI API key
INLINECODE48

Load Priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > INLINECODE95

Model Resolution

Model priority (highest → lowest), applies to all providers:

1. CLI flag: --model
2. EXTEND.md: default_model.[provider]
3. Env var: <PROVIDER>_IMAGE_MODEL (e.g., GOOGLE_IMAGE_MODEL)
4. Built-in default

For Azure, --model / default_model.azure should be the Azure deployment name. AZURE_OPENAI_DEPLOYMENT is the preferred env var, and AZURE_OPENAI_IMAGE_MODEL remains as a backward-compatible alias.

EXTEND.md overrides env vars. If both EXTEND.md default_model.google: "gemini-3-pro-image-preview" and env var GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview exist, EXTEND.md wins.
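The priority chain can be sketched as a single lookup function. `resolveModel` and the `ModelSources` shape are hypothetical; the built-in default is passed in by the caller because the actual default model IDs are not listed here.

```typescript
interface ModelSources {
  cliModel?: string;                                              // --model
  extendDefaultModel?: Record<string, string | null | undefined>; // default_model.[provider]
  env?: Record<string, string | undefined>;                       // process environment
  builtinDefault: string;                                         // supplied by the caller
}

function resolveModel(provider: string, src: ModelSources): string {
  // 1. CLI flag
  if (src.cliModel) return src.cliModel;
  // 2. EXTEND.md (a null entry falls through)
  const fromExtend = src.extendDefaultModel ? src.extendDefaultModel[provider] : undefined;
  if (fromExtend) return fromExtend;
  // 3. Env var; Azure prefers the deployment variable and keeps the alias as fallback
  const env = src.env || {};
  const envNames = provider === "azure"
    ? ["AZURE_OPENAI_DEPLOYMENT", "AZURE_OPENAI_IMAGE_MODEL"]
    : [provider.toUpperCase() + "_IMAGE_MODEL"];
  for (const name of envNames) {
    const value = env[name];
    if (value) return value;
  }
  // 4. Built-in default
  return src.builtinDefault;
}
```

With `default_model.google` set to gemini-3-pro-image-preview and GOOGLE_IMAGE_MODEL set to gemini-3.1-flash-image-preview, this sketch returns the EXTEND.md value, matching the rule above.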

Agent MUST display model info before each generation:

- Show: INLINECODE106
- Show switch hint: INLINECODE107

DashScope Models

Use --model qwen-image-2.0-pro or set default_model.dashscope / DASHSCOPE_IMAGE_MODEL when the user wants official Qwen-Image behavior.

Official DashScope model families:

- qwen-image-2.0-pro, qwen-image-2.0-pro-2026-03-03, qwen-image-2.0, INLINECODE114
  - Free-form size in width*height format
  - Total pixels must stay between 512*512 and 2048*2048
  - Default size is approximately 1024*1024
  - Best choice for custom ratios such as 21:9 and text-heavy Chinese/English layouts
- qwen-image-max, qwen-image-max-2025-12-30, qwen-image-plus, qwen-image-plus-2026-01-09, qwen-image
  - Fixed sizes only: 1664*928, 1472*1104, 1328*1328, 1104*1472, 928*1664
  - Default size is 1664*928
  - qwen-image currently has the same capability as qwen-image-plus
- Legacy DashScope models such as z-image-turbo, z-image-ultra, INLINECODE136
  - Keep using them only when the user explicitly asks for legacy behavior or compatibility
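The size rules for the two Qwen families can be checked before a request is sent. The sketch below is a hedged illustration: `checkDashScopeSize` is a hypothetical helper, the family detection by model-name pattern is an assumption, and legacy models are passed through because no size rule is stated for them.

```typescript
const QWEN_FIXED_SIZES = ["1664*928", "1472*1104", "1328*1328", "1104*1472", "928*1664"];

type SizeCheck = { ok: true } | { ok: false; reason: string };

function checkDashScopeSize(model: string, width: number, height: number): SizeCheck {
  // qwen-image-2.0 family: free-form size, bounded by total pixel count.
  if (model.indexOf("qwen-image-2.0") === 0) {
    const pixels = width * height;
    if (pixels < 512 * 512 || pixels > 2048 * 2048) {
      return { ok: false, reason: "total pixels must stay between 512*512 and 2048*2048" };
    }
    return { ok: true };
  }
  // qwen-image / -max / -plus (optionally date-suffixed): five fixed sizes only.
  if (/^qwen-image(-max|-plus)?(-\d{4}-\d{2}-\d{2})?$/.test(model)) {
    if (QWEN_FIXED_SIZES.indexOf(width + "*" + height) < 0) {
      return { ok: false, reason: "size must be one of " + QWEN_FIXED_SIZES.join(", ") };
    }
    return { ok: true };
  }
  // Other (legacy) models: no size rule is given, so nothing is enforced here.
  return { ok: true };
}
```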

When translating CLI args into DashScope behavior:

- --size wins over INLINECODE138
- For qwen-image-2.0*, prefer explicit --size; otherwise infer from --ar and use the official recommended resolutions below
- For qwen-image-max/plus/image, only use the five official fixed sizes; if the requested ratio is not covered, switch to INLINECODE143
- --quality is a baoyu-imagine compatibility preset, not a native DashScope API field. Mapping normal / 2k onto the qwen-image-2.0* table below is an implementation inference, not an official API guarantee

Recommended qwen-image-2.0* sizes for common aspect ratios:

| Ratio | normal | 2k |
|-------|--------|----|
| INLINECODE151 | INLINECODE152 | INLINECODE153 |
| 2:3 | 768*1152 | 1024*1536 |
| 3:2 | 1152*768 | 1536*1024 |
| 3:4 | 960*1280 | 1080*1440 |
| 4:3 | 1280*960 | 1440*1080 |
| 9:16 | 720*1280 | 1080*1920 |
| 16:9 | 1280*720 | 1920*1080 |
| 21:9 | 1344*576 | 2048*872 |
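A sketch of how the table could drive size selection for qwen-image-2.0* models. `pickQwen2Size` is hypothetical; the column names normal and 2k, the default of 2k, and the conversion of a CLI WxH value to the width*height form are assumptions. Only rows whose values are legible in the table are included, and the 2:3 label is derived from the 768*1152 dimensions.

```typescript
type QualityPreset = "normal" | "2k";

// Rows copied from the recommended-size table (the first row is not legible and is omitted).
const QWEN2_RECOMMENDED: Record<string, Record<QualityPreset, string>> = {
  "2:3": { normal: "768*1152", "2k": "1024*1536" },
  "3:2": { normal: "1152*768", "2k": "1536*1024" },
  "3:4": { normal: "960*1280", "2k": "1080*1440" },
  "4:3": { normal: "1280*960", "2k": "1440*1080" },
  "9:16": { normal: "720*1280", "2k": "1080*1920" },
  "16:9": { normal: "1280*720", "2k": "1920*1080" },
  "21:9": { normal: "1344*576", "2k": "2048*872" },
};

function pickQwen2Size(opts: { size?: string; ar?: string; quality?: QualityPreset }): string | undefined {
  // An explicit --size always wins; convert WxH to the W*H form.
  if (opts.size) return opts.size.replace(/x/i, "*");
  if (!opts.ar) return undefined;
  const row = QWEN2_RECOMMENDED[opts.ar];
  // Unknown ratio: let the caller decide (for example, fall back to the model default).
  return row ? row[opts.quality || "2k"] : undefined;
}
```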

DashScope official APIs also expose negative_prompt, prompt_extend, and watermark, but baoyu-imagine does not expose them as dedicated CLI flags today.


MiniMax Models

Use --model image-01 or set default_model.minimax / MINIMAX_IMAGE_MODEL when the user wants MiniMax image generation.

Official MiniMax image model options currently documented in the API reference:

- image-01 (recommended default)
  - Supports text-to-image and subject-reference image generation
  - Supports official aspect_ratio values: 1:1, 16:9, 4:3, 3:2, 2:3, 3:4, 9:16, 21:9
  - Supports documented custom width / height output sizes when using --size <WxH>
  - width and height must both be between 512 and 2048, and both must be divisible by 8
- INLINECODE200
  - Lower-latency variant
  - Use --ar for sizing; MiniMax documents custom width / height as only effective for image-01
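The custom-size constraints can be validated up front. This is a minimal sketch: `checkMiniMaxSize` is a hypothetical helper, and it assumes image-01 is the only model for which custom width / height takes effect.

```typescript
// Returns a list of problems; an empty list means the size is acceptable.
function checkMiniMaxSize(model: string, width: number, height: number): string[] {
  const problems: string[] = [];
  if (model !== "image-01") {
    problems.push("custom width/height is documented only for image-01; use --ar instead");
  }
  const dims = [
    { name: "width", value: width },
    { name: "height", value: height },
  ];
  for (const d of dims) {
    if (d.value < 512 || d.value > 2048) problems.push(d.name + " must be between 512 and 2048");
    if (d.value % 8 !== 0) problems.push(d.name + " must be divisible by 8");
  }
  return problems;
}
```

For example, 1536x1024 passes for image-01, while 1000x500 fails twice on height (below 512 and not divisible by 8).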

MiniMax subject reference notes:

- --ref files are sent as MiniMax INLINECODE206
- MiniMax docs currently describe subject_reference[].type as INLINECODE208
- Official docs say image_file supports public URLs or Base64 Data URLs; baoyu-imagine sends local refs as Data URLs
- Official docs recommend front-facing portrait references in JPG/JPEG/PNG under 10MB


OpenRouter Models

Use full OpenRouter model IDs, e.g.:

- google/gemini-3.1-flash-image-preview (recommended, supports image output and reference-image workflows)
- INLINECODE212
- INLINECODE213
- Other OpenRouter image-capable model IDs

Notes:

- OpenRouter image generation uses /chat/completions, not the OpenAI /images endpoints
- If --ref is used, choose a multimodal model that supports image input and image output
- INLINECODE217 maps to OpenRouter imageGenerationOptions.size; --size <WxH> is converted to the nearest OpenRouter size and inferred aspect ratio when possible

Replicate Models

Supported model formats:

- owner/name (recommended for official models), e.g. INLINECODE221
- INLINECODE222 (community models by version), e.g. INLINECODE223

Examples:

CODEBLOCK4

Provider Selection

1. --ref provided + no --provider → auto-select Google first, then OpenAI, then Azure, then OpenRouter, then Replicate, then Seedream, then MiniMax (MiniMax subject reference is more specialized toward character/portrait consistency)
2. --provider specified → use it (if --ref, must be google, openai, azure, openrouter, replicate, seedream, or minimax)
3. Only one API key available → use that provider
4. Multiple available → default to Google
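The four rules can be sketched as one function. `selectProvider` is a hypothetical helper, and two behaviours are assumptions of this sketch rather than stated rules: throwing when no reference-capable provider is available, and falling back to the first available provider when several keys exist but none of them is Google.

```typescript
// Reference-capable providers, in auto-selection order.
const REF_CAPABLE = ["google", "openai", "azure", "openrouter", "replicate", "seedream", "minimax"];

function selectProvider(opts: { provider?: string; hasRef: boolean; available: string[] }): string {
  // Rule 2: an explicit --provider is used as-is, but must support --ref if refs are given.
  if (opts.provider) {
    if (opts.hasRef && REF_CAPABLE.indexOf(opts.provider) < 0) {
      throw new Error("--ref is not supported by provider " + opts.provider);
    }
    return opts.provider;
  }
  // Rule 1: with --ref and no --provider, take the first available reference-capable provider.
  if (opts.hasRef) {
    for (const candidate of REF_CAPABLE) {
      if (opts.available.indexOf(candidate) >= 0) return candidate;
    }
    throw new Error("no available provider supports --ref");
  }
  if (opts.available.length === 0) throw new Error("no API key configured");
  // Rule 3: a single configured key decides the provider.
  if (opts.available.length === 1) return opts.available[0];
  // Rule 4: several keys, default to Google (fallback to the first one is an assumption).
  return opts.available.indexOf("google") >= 0 ? "google" : opts.available[0];
}
```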

Quality Presets

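| Preset | Google imageSize | OpenAI Size | OpenRouter size | Replicate resolution | Use Case |
|--------|------------------|-------------|-----------------|----------------------|----------|
| normal | 1K | 1024px | 1K | 1K | Quick previews |
| 2k (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics |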

Google/OpenRouter imageSize: Can be overridden with INLINECODE237

Aspect Ratios

Supported: 1:1, 16:9, 9:16, 4:3, 3:4, INLINECODE243

- Google multimodal: uses INLINECODE244
- OpenAI: maps to closest supported size
- OpenRouter: sends imageGenerationOptions.aspect_ratio; if only --size <WxH> is given, aspect ratio is inferred automatically
- Replicate: passes aspect_ratio to model; when --ref is provided without --ar, defaults to INLINECODE250
- MiniMax: sends official aspect_ratio values directly; if --size <WxH> is given without --ar, width / height are sent for image-01
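Inferring an aspect ratio from a WxH size, and snapping it to the closest supported ratio, can be sketched as follows. All function names are hypothetical, the log-ratio distance is one reasonable choice rather than the script's documented method, and the supported list contains only the ratios legible above.

```typescript
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

// Accepts "1920x1080", "1920X1080" or "1920*1080".
function parseSize(size: string): { width: number; height: number } | null {
  const m = /^(\d+)\s*[xX*]\s*(\d+)$/.exec(size.trim());
  if (!m) return null;
  const width = parseInt(m[1], 10);
  const height = parseInt(m[2], 10);
  return width > 0 && height > 0 ? { width, height } : null;
}

// Exact reduced ratio, e.g. 1920x1080 -> "16:9".
function inferAspectRatio(width: number, height: number): string {
  const d = gcd(width, height);
  return width / d + ":" + height / d;
}

const SUPPORTED_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4"];

// Closest supported ratio, compared on a log scale so wide and tall shapes are treated symmetrically.
function nearestSupportedRatio(width: number, height: number): string {
  const target = Math.log(width / height);
  let best = SUPPORTED_RATIOS[0];
  let bestDist = Infinity;
  for (const ratio of SUPPORTED_RATIOS) {
    const parts = ratio.split(":");
    const dist = Math.abs(Math.log(Number(parts[0]) / Number(parts[1])) - target);
    if (dist < bestDist) {
      best = ratio;
      bestDist = dist;
    }
  }
  return best;
}
```

A size such as 1536x1024 reduces exactly to 3:2; since that ratio is not in the list used here, the nearest supported value is 4:3.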

Generation Mode

Default: Sequential generation.

Batch Parallel Generation: When --batchfile contains 2 or more pending tasks, the script automatically enables parallel generation.

| Mode | When to Use |
|------|-------------|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel batch | Batch mode with 2+ tasks |

Execution choice:

| Situation | Preferred approach | Why |
|-----------|--------------------|-----|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead and easier debugging |
| Multiple images already have saved prompt files | Batch (--batchfile) | Reuses finalized prompts, applies shared throttling/retries, and gives predictable throughput |
| Each image still needs separate reasoning, prompt writing, or style exploration | Subagents | The work is still exploratory, so each image may need independent analysis before generation |
| Output comes from baoyu-article-illustrator with outline.md + prompts/ | Batch (build-batch.ts -> --batchfile) | That workflow already produces prompt files, so direct batch execution is the intended path |

Rule of thumb:

- Prefer batch over subagents once prompt files are already saved and the task is "generate all of these"
- Use subagents only when generation is coupled with per-image thinking, rewriting, or divergent creative exploration

Parallel behavior:

- Default worker count is automatic, capped by config, built-in default 10
- Provider-specific throttling is applied only in batch mode, and the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
- You can override worker count with --jobs
- Each image retries automatically up to 3 attempts
- Final output includes success count, failure count, and per-image failure reasons
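A minimal sketch of how the mode and worker count might be derived. `planBatch` is hypothetical. The cap of 10 comes from the text; whether --jobs may exceed that cap is not stated, so this sketch assumes an override is limited only by the number of pending tasks.

```typescript
interface BatchPlan { mode: "sequential" | "parallel"; workers: number }

function planBatch(pendingTasks: number, jobsOverride?: number, workerCap = 10): BatchPlan {
  // Fewer than 2 pending tasks: stay sequential.
  if (pendingTasks < 2) return { mode: "sequential", workers: 1 };
  // Automatic count is capped; an explicit --jobs value replaces the cap (assumption).
  const wanted = jobsOverride !== undefined ? jobsOverride : workerCap;
  // Never start more workers than there are tasks, and always at least one.
  const workers = Math.max(1, Math.min(wanted, pendingTasks));
  return { mode: "parallel", workers };
}
```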

Error Handling

- Missing API key → error with setup instructions
- Generation failure → auto-retry up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint
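The retry rule and the final summary can be illustrated with a synchronous sketch. `generateWithRetries`, `summarize`, and the `ImageOutcome` shape are hypothetical; a real generator would await a network call, and this version uses a plain callback only to keep the control flow visible.

```typescript
interface ImageOutcome { id: string; ok: boolean; attempts: number; reason?: string }

// Calls `attempt` until it stops throwing, or until maxAttempts is reached.
function generateWithRetries(id: string, attempt: (n: number) => void, maxAttempts = 3): ImageOutcome {
  let reason = "unknown error";
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      attempt(n);
      return { id, ok: true, attempts: n };
    } catch (err) {
      // Keep the most recent failure so it can be reported per image.
      reason = err instanceof Error ? err.message : String(err);
    }
  }
  return { id, ok: false, attempts: maxAttempts, reason };
}

// Success count, failure count, and per-image failure reasons.
function summarize(outcomes: ImageOutcome[]) {
  const failures = outcomes.filter((o) => !o.ok);
  return {
    succeeded: outcomes.length - failures.length,
    failed: failures.length,
    reasons: failures.map((o) => o.id + ": " + o.reason),
  };
}
```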

Extension Support

Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
