IMA AI Creation

⚠️ Important: Model ID Reference

CRITICAL: When calling the script, you MUST use the exact model_id (second column), NOT the friendly model name. Do NOT infer the model_id from the friendly name (e.g., ❌ nano-banana-pro is WRONG; ✅ gemini-3-pro-image is CORRECT).

Quick Reference Table:

Image Models

Friendly Name	model_id	Notes
Nano Banana2	INLINECODE2	❌ NOT nano-banana-2; budget option, 4-13 pts
Nano Banana Pro

Video Models

Friendly Name	model_id (t2v)	model_id (i2v)	Notes
Wan 2.6	INLINECODE6	INLINECODE7	⚠️ Note the -t2v/-i2v suffix
IMA Video Pro (Sevio 1.0)

Music Models

Friendly Name	model_id	Notes
Suno (sonic v4)	INLINECODE26	⚠️ Simplified to sonic
DouBao BGM

Speech/TTS Models

Friendly Name	model_id	Notes
seed-tts-2.0	INLINECODE29	✅ Same as friendly name (default)

How to get the correct model_id:

1. Check this table first.
2. Use --list-models --task-type <type> to query available models.
3. Refer to the command examples in this SKILL.md.

Runtime truth source: GET /open/v1/product/list (or --list-models).
Any table in this document is guidance; actual availability depends on current product list.

Example:

# ❌ WRONG: Inferring from friendly name
--model-id nano-banana-pro

# ✅ CORRECT: Using exact model_id from table
--model-id gemini-3-pro-image
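
The lookup rule above can be sketched as a small table-driven resolver. This is a minimal illustration, not the bundled script's logic: the function name `resolve_model_id` is hypothetical, and the IDs are copied from the "Production Image Models" list in this document, so they may lag behind the runtime product list.

```python
# Friendly name -> model_id, copied from this document's image-model list.
# The runtime product list (--list-models) remains the source of truth.
IMAGE_MODEL_IDS = {
    "SeeDream 4.5": "doubao-seedream-4.5",
    "Midjourney": "midjourney",
    "Nano Banana2": "gemini-3.1-flash-image",
    "Nano Banana Pro": "gemini-3-pro-image",
}


def resolve_model_id(friendly_name: str) -> str:
    """Return the exact model_id for a friendly name; raise rather than guess."""
    key = friendly_name.strip().lower()
    for name, model_id in IMAGE_MODEL_IDS.items():
        if name.lower() == key:
            return model_id
    raise KeyError(
        f"Unknown model {friendly_name!r}; run --list-models to see current model_ids"
    )
```

Raising on an unknown name, instead of slugifying it, is what prevents the `nano-banana-pro` mistake shown above.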

📚 Optional Knowledge Enhancement (ima-knowledge-ai)

This skill is fully runnable as a standalone package.
If ima-knowledge-ai is installed, the agent may read its references for workflow decomposition and consistency guidance.

📥 User Input Parsing (Media Type & Task Routing)

Purpose: So that any agent parses user intent consistently, first determine the media type from the user's request, then choose task_type and model.

1. User phrasing → media type (do this first)

User intent / keywords	Media type	task_type examples
画 / 生成图 / 图片 / image / 画一张 / 图生图	image	INLINECODE38, INLINECODE39
视频 / 生成视频 / video / 图生视频 / 文生视频

If the request mixes media (e.g. "宣传片+配乐"), treat as multi-media workflow: read workflow-design.md, then plan image → video → music steps and use the correct task_type for each step.

2. Model and parameter parsing

- Image: For model name → model_id and size/aspect_ratio parsing, follow the same rules as the ima-image-ai skill (User Input Parsing section).
- Video: For task_type (t2v / i2v / firstlast / reference), model alias → model_id, and duration/resolution/aspect_ratio, follow the ima-video-ai skill (User Input Parsing section).

Sevio alias normalization in ima-all-ai:

- Ima Sevio 1.0 → ima-pro
- Ima Sevio 1.0-Fast / Ima Sevio 1.0 Fast → ima-pro-fast

Routing rule:

- Normalize the alias first.
- Then resolve it against the runtime product list for the selected task_type.
- If the model is absent from the current category, return the available model_ids from --list-models.
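
A minimal sketch of the Sevio routing rule, assuming the runtime product list has already been fetched into a plain list of model_ids. The names `SEVIO_ALIASES` and `route_model` are hypothetical and not part of the bundled script.

```python
from __future__ import annotations

# Alias table taken from the normalization rule in this section.
SEVIO_ALIASES = {
    "ima sevio 1.0": "ima-pro",
    "ima sevio 1.0-fast": "ima-pro-fast",
    "ima sevio 1.0 fast": "ima-pro-fast",
}


def route_model(user_model: str, available_ids: list[str]) -> dict:
    """Normalize the alias first, then resolve against the runtime list."""
    key = " ".join(user_model.lower().split())  # collapse case and whitespace
    model_id = SEVIO_ALIASES.get(key, user_model)
    if model_id in available_ids:
        return {"model_id": model_id}
    # Model absent from this category: hand back what is available instead.
    return {"model_id": None, "available": sorted(available_ids)}
```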

- Music: Suno (sonic) vs DouBao BGM/Song: infer from "BGM"/"背景音乐" (background music) → BGM; "带歌词"/"人声" (with lyrics / vocals) → Suno or Song. Use model_id sonic, GenBGM, or GenSong per the "Recommended Defaults" and "Music Generation" tables below.
- Speech (TTS): Get the model_id from GET /open/v1/product/list?category=text_to_speech or run the script with --task-type text_to_speech --list-models. Map user intent to parameters using the product form_config:

| User intent / phrasing | Parameter (if in form_config) | Notes |
|------------------------|--------------------------------|--------|
| 女声 / 女声朗读 / female voice | voice_id / voice_type | Use value from form_config options |
| 男声 / 男声朗读 / male voice | voice_id / voice_type | Use value from form_config options |
| 语速快/慢 / speed up/slow | speed | e.g. 0.8–1.2 |
| 音调 / pitch | pitch | If supported |
| 大声/小声 / volume | volume | If supported |

If the user does not specify, use form_config defaults. Pass extra params via --extra-params '{"speed":1.0}'. Only send parameters present in the product’s credit_rules/attributes or form_config (script reflection strips others on retry).
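
As an illustration of the "only send supported parameters" rule, the sketch below filters requested parameters against the field names found in form_config and builds the `--extra-params` argument. The helper name `build_extra_params` and the flat set of field names are assumptions; the real form_config structure may be richer.

```python
import json
import shlex


def build_extra_params(requested: dict, form_config_fields: set) -> str:
    """Keep only parameters the product declares, then shell-quote the JSON."""
    kept = {k: v for k, v in requested.items() if k in form_config_fields}
    return "--extra-params " + shlex.quote(json.dumps(kept, sort_keys=True))


# "foo" is dropped because the (hypothetical) product only declares speed and pitch.
print(build_extra_params({"speed": 1.0, "foo": 1}, {"speed", "pitch"}))
```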

⚙️ How This Skill Works

For transparency: This skill uses a bundled Python script (scripts/ima_create.py) to call the IMA Open API. The script:

- Sends your prompt to two IMA-owned domains (see "Network Endpoints" below)
- Uses --user-id only locally as a key for storing your model preferences
- Returns image/video/music URLs when generation is complete

What gets sent to IMA servers:

- ✅ Your prompt/description (image/video/music)
- ✅ Model selection (SeeDream/Wan/Suno/etc.)
- ✅ Generation parameters (size, duration, style, etc.)
- ❌ NO API key in prompts (key is used for authentication only)
- ❌ NO user_id (it's only used locally)

What's stored locally:

- ~/.openclaw/memory/ima_prefs.json - Your model preferences (< 1 KB)
- INLINECODE66 - Generation logs (auto-deleted after 7 days)

🌐 Network Endpoints Used

| Domain | Owner | Purpose | Data Sent | Privacy |
|--------|-------|---------|-----------|---------|
| INLINECODE67 | IMA Studio | Main API (product list, task creation, task polling) | Prompts, model IDs, generation params, your API key | Standard HTTPS, data processed for AI generation |
| INLINECODE68 | IMA Studio | Image/Video upload service (presigned URL generation) | Your API key, file metadata (MIME type, extension) | Standard HTTPS, used for image/video tasks only |
| *.aliyuncs.com, *.esxscloud.com | Alibaba Cloud (OSS) | Image/video storage (file upload, CDN delivery) | Raw image/video bytes (via presigned URL, NO API key) | IMA-managed OSS buckets, presigned URLs expire after 7 days |

Key Points:

- Music tasks (text_to_music) and TTS tasks (text_to_speech) only use api.imastudio.com.
- Image/video tasks require imapi.liveme.com to obtain presigned URLs for uploading input images.
- Your API key is sent to both api.imastudio.com and imapi.liveme.com (both owned by IMA Studio).
- Verify network calls: tcpdump -i any -n 'host api.imastudio.com or host imapi.liveme.com'. See "🌐 Network Endpoints Used" and "⚠️ Credential Security Notice" in this document for full disclosure.

⚠️ Credential Security Notice

Your API key is sent to both IMA-owned domains:

1. Authorization: Bearer ima_xxx... → api.imastudio.com (main API)
2. Query param appUid=ima_xxx... → imapi.liveme.com (upload service)

Security best practices:

- 🧪 Use test keys for experiments: Generate a separate API key for testing.
- 🔍 Monitor usage: Check https://imastudio.com/dashboard for unauthorized activity.
- ⏱️ Rotate keys: Regenerate your API key periodically (monthly recommended).
- 📊 Review logs: Check ~/.openclaw/logs/ima_skills/ for unexpected API calls.

Why two domains? IMA Studio uses a microservices architecture:

- api.imastudio.com: Core AI generation API
- imapi.liveme.com: Specialized image/video upload service (shared infrastructure)

Both domains are operated by IMA Studio. The same API key grants access to both services.

Agent Execution (Internal Reference)

Note for users: You can review the script source at scripts/ima_create.py anytime.
The agent uses this script to simplify API calls. Music tasks use only api.imastudio.com, while image/video tasks also call imapi.liveme.com for file uploads (see "Network Endpoints" above).

Use the bundled script internally for all task types — it ensures correct parameter construction:

CODEBLOCK3

The script outputs JSON with url, model_name, credit — use these values in the UX protocol messages below. The script internals (product list query, parameter construction, polling) are invisible to users.

Overview

Call IMA Open API to create AI-generated content. All endpoints require an ima_* API key. The core flow is: query products → create task → poll until done.
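
The final "poll until done" stage of that flow can be sketched independently of any HTTP details. This is a hedged illustration only: `fetch_status` is a hypothetical callable standing in for the task-detail request, and the terminal status strings "success" and "failed" follow the wording used elsewhere in this document.

```python
import time


def poll_until_done(fetch_status, poll_every_s, timeout_s,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in ("success", "failed"):
            return status
        if clock() >= deadline:
            return "timeout"
        sleep(poll_every_s)


# Usage with a fake task that finishes on the third poll.
fake = iter(["pending", "pending", "success"])
print(poll_until_done(lambda: next(fake), poll_every_s=0, timeout_s=10))
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.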

🔒 Security & Transparency Policy

This skill is community-maintained and open for inspection.

✅ What Users CAN Do

Full transparency:

- ✅ Review all source code: Check scripts/ima_create.py and ima_logger.py anytime
- ✅ Verify network calls: Music tasks use api.imastudio.com only; image/video tasks also use imapi.liveme.com (see "Network Endpoints" section)
- ✅ Inspect local data: View ~/.openclaw/memory/ima_prefs.json and log files
- ✅ Control privacy: Delete preferences/logs anytime, or disable file writes (see below)

Configuration allowed:

- ✅ Set API key in environment or agent config:
  - Environment variable: export IMA_API_KEY=ima_your_key_here
  - OpenClaw/MCP config: Add IMA_API_KEY to agent's environment configuration
  - Get your key at: https://imastudio.com
- ✅ Use scoped/test keys: Test with limited API keys, rotate after testing
- ✅ Disable file writes: Make prefs/logs read-only or symlink to INLINECODE100

Data control:

- ✅ View stored data: INLINECODE101
- ✅ Delete preferences: rm ~/.openclaw/memory/ima_prefs.json (resets to defaults)
- ✅ Delete logs: rm -rf ~/.openclaw/logs/ima_skills/ (auto-cleanup after 7 days anyway)

⚠️ Advanced Users: Fork & Modify

If you need to modify this skill for your use case:

1. Fork the repository (don't modify the original)
2. Update your fork with your changes
3. Test thoroughly with limited API keys
4. Document your changes for troubleshooting

Note: Modified skills may break API compatibility or introduce security issues. Official support only covers the unmodified version.

❌ What to AVOID (Security Risks)

Actions that could compromise security:

- ❌ Sharing API keys publicly or in skill files
- ❌ Modifying API endpoints to unknown servers
- ❌ Disabling SSL/TLS certificate verification
- ❌ Logging sensitive user data (prompts, IDs, etc.)
- ❌ Bypassing authentication or billing mechanisms

Why this matters:

1. API Compatibility: Skill logic aligns with IMA Open API schema
2. Security: Malicious modifications could leak credentials or bypass billing
3. Support: Modified skills may not be supported
4. Community: Breaking changes affect all users

📋 Privacy & Data Handling Summary

What this skill does with your data:

Data Type	Sent to IMA?	Stored Locally?	User Control
Prompts (image/video/music)	✅ Yes (required for generation)	❌ No	None (required)
API key

Privacy recommendations:

1. Use test/scoped API keys for initial testing
2. Note: --user-id is never sent to IMA servers - it's only used locally as a key for storing preferences in INLINECODE106
3. Review source code at scripts/ima_create.py to verify network calls (search for create_task function)
4. Rotate API keys after testing or if compromised

Get your IMA API key: Visit https://imastudio.com to register and get started.

🔧 For Skill Maintainers Only

Version control:

- All changes must go through Git with proper version bumps (semver)
- CHANGELOG.md must document all changes
- Production deployments require code review

File checksums (optional):
CODEBLOCK4

If users report issues, verify file integrity first.

🧠 User Preference Memory (Image)

User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.

Storage: `~/.openclaw/memory/ima_prefs.json`

Single file, shared across all IMA skills:

CODEBLOCK5

Model Selection Flow (Image Generation)

Step 1: Get knowledge-ai recommendation (if installed)
CODEBLOCK6

Step 2: Check user preference
CODEBLOCK7

Step 3: Decide which model to use
CODEBLOCK8

Step 4: Check for mismatch (for later hint)
CODEBLOCK9

When to Write (User Explicit Preference ONLY)

✅ Save preference when user explicitly specifies a model:

| User says | Action |
|-----------|--------|
| INLINECODE110 / `换成XXX` / INLINECODE112 | Switch to model XXX + save as preference |
| INLINECODE113 / `默认用XXX` / INLINECODE115 | Save + confirm: ✅ 已记住！以后图片生成默认用 [XXX] |
| 我喜欢XXX / 我更喜欢XXX | Save as preference |

❌ Do NOT save when:

- Agent auto-selects from knowledge-ai → not user preference
- Agent uses fallback default → not user preference
- User makes a generic quality request (see "When to Clear" below) → clear preference instead

When to Clear (User Abandons Preference)

🗑️ Clear preference when user wants automatic selection:

User says	Action
INLINECODE119 / `用最合适的` / `best` / INLINECODE122	Clear pref + use knowledge-ai recommendation
INLINECODE123 / `你选一个` / INLINECODE125

Implementation:

# Drop only this task type's saved model; a missing entry is not an error.
prefs.get(f"user_{user_id}", {}).pop(task_type, None)
save_prefs(prefs)

⭐ Model Selection Priority (Image)

Selection flow:

1. User preference (if exists) → Highest priority, always respect
2. ima-knowledge-ai skill (if installed) → Professional recommendation based on task
3. Fallback defaults → Use table below (only if neither 1 nor 2 exists)

Important notes:

- User preference is only saved when user explicitly specifies a model (see "When to Write" above)
- Knowledge-ai is always consulted (even when user pref exists) to detect mismatches
- When mismatch detected → add gentle hint in success message (does NOT interrupt generation)

The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.
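
The three-level priority and the mismatch check can be summarized in a few lines. This is a sketch under assumptions: `choose_model` and its return shape are hypothetical, and each argument is simply a model_id string or `None`.

```python
def choose_model(user_pref, knowledge_rec, fallback):
    """Return (model, source, mismatch) following the priority order above."""
    if user_pref:
        model, source = user_pref, "user_preference"
    elif knowledge_rec:
        model, source = knowledge_rec, "knowledge_ai"
    else:
        model, source = fallback, "fallback_default"
    # A mismatch only produces a hint later; it never overrides the user's choice.
    mismatch = bool(user_pref and knowledge_rec and user_pref != knowledge_rec)
    return model, source, mismatch


print(choose_model("midjourney", "doubao-seedream-4.5", "doubao-seedream-4.5"))
```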

When using user preference for image generation, show a line like:
CODEBLOCK11

Preference Change Confirmation

When user switches to a different model than their saved preference:

CODEBLOCK12

⭐ Recommended Defaults

These are fallback defaults — only used when no user preference exists.
Always default to the newest and most popular model. Do NOT default to the cheapest.

Task Type	Default Model	model_id	version_id	Cost	Why
text_to_image	SeeDream 4.5	INLINECODE131	INLINECODE132	5 pts	Latest doubao flagship, photorealistic 4K
text_to_image (budget)

Premium options:

- Image: Nano Banana Pro, highest quality with size control (1K/2K/4K), higher cost (10-18 pts for text_to_image, 10 pts for image_to_image)
- Video: Kling O1, Sora 2 Pro, Google Veo 3.1, premium quality with longer duration options

Quick selection guide (production as of 2026-02-27, sorted by popularity):

- Image (4 models available) → SeeDream 4.5 (5, default); artistic → Midjourney 🎨 (8-10); budget → Nano Banana2 (4, 512px); premium → Nano Banana Pro (10-18)
- 🔥 Video from text (most popular) → Wan 2.6 (25, balanced); premium → Hailuo 2.3 (38); budget → Vidu Q2 (5)
- 🔥 Video from image (most popular) → Wan 2.6 (25)
- Music → Suno (25); DouBao BGM/Song (30 each)
- Cheapest → Nano Banana2 512px (4) for image; Vidu Q2 (5) for video

Selection guide by use case:

Image Generation:

- General image generation → SeeDream 4.5 (5pts)
- Custom aspect ratio (16:9, 9:16, 4:3, etc.) → SeeDream 4.5 🌟 or Nano Banana Pro/2 🆕 (native support)
- Budget-conscious / fast generation → Nano Banana2 (4pts)
- Highest quality with size control (1K/2K/4K) → Nano Banana Pro (text_to_image: 10-18pts, image_to_image: 10pts)
- Artistic/creative styles, illustrations, paintings → Midjourney 🎨 (8-10pts)
- Style transfer / image editing → SeeDream 4.5 (5pts) or Midjourney 🎨 (artistic)

Video Generation:

- General video generation → Wan 2.6 (25pts, most popular)
- Premium cinematic quality → Google Veo 3.1 (70-330pts) or Sora 2 Pro (122+pts)
- Budget video → Vidu Q2 (5pts) or Hailuo 2.0 (5pts)
- With audio support → Kling O1 (48+pts) or Google Veo 3.1 (70+pts)
- First/last frame animation → Kling O1 (48+pts)
- Reference image consistency → Kling O1 (48+pts) or Google Veo 3.1 (70+pts)

Music Generation:

- Custom song with lyrics, vocals, style → Suno sonic-v5 (25pts, default, ~2min)

- Full control: custom_mode, lyrics, vocal_gender, tags, negative_tags
- Best for: complete songs, vocal tracks, artistic compositions

- Background music / ambient loop → DouBao BGM (30pts, ~30s)

- Simplified: prompt-only, no advanced parameters
- Best for: video backgrounds, ambient music, short loops

- Simple song generation → DouBao Song (30pts, ~30s)

- Simplified: prompt-only
- Best for: quick song generation, structured vocal compositions

- User explicitly asks for cheapest → DouBao BGM/Song (6pts each) — only if explicitly requested

Speech (TTS) Generation:

- Text-to-speech / 语音合成 (speech synthesis) / 朗读 (read aloud) → text_to_speech. Always query GET /open/v1/product/list?category=text_to_speech (or --list-models) to get the current model_id and credit cost. There is no fixed default; use the first available model or the user preference. For voice/speed/format parameters, see "Model and parameter parsing" (TTS table) and "Speech (TTS) - text_to_speech" in this document.

⚠️ Technical Note for Suno:

INLINECODE167 inside parameters.parameters (e.g., "sonic-v5") is different from the outer model_version field (which is "sonic"). Always set both correctly when creating Suno tasks.

⚠️ Production Image Models (4 available):

- SeeDream 4.5 (doubao-seedream-4.5): 5 pts, default
- Midjourney 🎨 (midjourney): 8/10 pts for 480p/720p, artistic styles
- Nano Banana2 (gemini-3.1-flash-image): 4/6/10/13 pts for 512px/1K/2K/4K
- Nano Banana Pro (gemini-3-pro-image): 10/10/18 pts for 1K/2K/4K

All other image models mentioned in older documentation are no longer available in production.

🌟 Parameter Support Notes (All Task Types):

Image Models (text_to_image / image_to_image)

🆕 MAJOR UPDATE: Nano Banana series now has NATIVE aspect_ratio support!

- Nano Banana Pro: ✅ Supports aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY
- Nano Banana2: ✅ Supports aspect_ratio (1:1, 16:9, 9:16, 4:3, 3:4) NATIVELY
- SeeDream 4.5: ✅ Supports 8 ratios via virtual params (1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9)
- Midjourney: ❌ 1:1 only (fixed 1024x1024)

aspect_ratio support details:

- ✅ aspect_ratio:

- SeeDream 4.5: ✅ Supports 8 ratios via virtual params (1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9)
- Nano Banana2: ✅ Native support for 5 ratios (1:1, 16:9, 9:16, 4:3, 3:4)
- Nano Banana Pro: ✅ Native support for 5 ratios (1:1, 16:9, 9:16, 4:3, 3:4)
- Midjourney: ❌ 1:1 only (fixed 1024x1024)

- ✅ size:

- Nano Banana2: 512px, 1K, 2K, 4K (via different attribute_ids, 4-13 pts)
- Nano Banana Pro: 1K, 2K, 4K (via different attribute_ids, 10-18 pts)
- SeeDream 4.5: Adaptive default (5 pts)
- Midjourney: 480p/720p (via attribute_id, 8/10 pts)

- ❌ 8K: No model supports 8K (max is 4K, available on Nano Banana Pro and Nano Banana2)
- ❌ Non-standard aspect ratios (7:3, 8:5, etc.): Not supported. Use the closest supported ratio or video models.
- ✅ n: Multiple outputs supported (1-4), credit × n

When user requests unsupported combinations for images:

- Midjourney + aspect_ratio (16:9, etc.): Recommend SeeDream 4.5 or Nano Banana series instead

  ❌ Midjourney 暂不支持自定义 aspect_ratio（仅支持 1024x1024 方形）
  
  ✅ 推荐方案：
    1. SeeDream 4.5（支持虚拟参数 aspect_ratio）
       • 支持比例：1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9
       • 成本：5 积分（性价比最佳）
    2. Nano Banana Pro/2（原生支持 aspect_ratio）
       • 支持比例：1:1, 16:9, 9:16, 4:3, 3:4
       • 成本：4-18 积分（按尺寸）
  
  需要我帮你用 SeeDream 4.5 生成吗？

- Any model + 8K: Inform the user that no model supports 8K; the maximum is 4K (Nano Banana Pro)
- Any model + non-standard ratio (7:3, 8:5, etc.): Not supported. Suggest the closest supported ratio (e.g., 21:9 for ultra-wide, 2:3 for portrait)
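
One way to pick the "closest supported ratio" is to compare ratios on a log scale, so that wide and tall shapes are treated symmetrically. This is a sketch under that assumption, not the skill's documented behavior; the function names are hypothetical, and the ratio list is the SeeDream 4.5 list from this section.

```python
from __future__ import annotations

import math

SEEDREAM_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "2:3", "3:2", "21:9"]


def parse_ratio(text: str) -> float:
    """Turn 'W:H' into the width/height quotient."""
    width, height = text.split(":")
    return float(width) / float(height)


def closest_supported_ratio(requested: str, supported: list[str]) -> str:
    """Return the supported ratio whose log-quotient is nearest the request."""
    target = math.log(parse_ratio(requested))
    return min(supported, key=lambda r: abs(math.log(parse_ratio(r)) - target))


print(closest_supported_ratio("8:5", SEEDREAM_RATIOS))
```

With these inputs, a request for 8:5 maps to 3:2, and 7:3 maps to 21:9 because the two quotients are equal.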

Video Models (text_to_video / image_to_video / first_last_frame / reference_image)

- ✅ resolution: 540P, 720P, 1080P, 2K, 4K (model-dependent, higher res = higher cost)
- ✅ aspect_ratio: 16:9, 9:16, 1:1, 4:3 (model-dependent, check form_config)
- ✅ duration: 4s, 5s, 10s, 15s (model-dependent, longer = higher cost)
- ⚠️ generate_audio: Supported by Veo 3.1, Kling O1, Hailuo (check form_config)
- ✅ prompt_extend: AI-powered prompt enhancement (most models support)
- ✅ negative_prompt: Content exclusion (most models support)
- ✅ shot_type: Single/multi-shot control (model-dependent)
- ✅ seed: Reproducibility control (most models support, -1 = random)
- ✅ n: Multiple outputs (1-4), credit × n

🆕 Special Case: Pixverse Model Parameter (v1.0.7+)

Auto-Inference Logic for Pixverse V5.5/V5/V4:

- Problem: Pixverse V5.5, V5, V4 lack the model field in form_config from the Product List API
- Backend Requirement: Backend requires the model parameter (e.g., "v5.5", "v5", "v4")
- Auto-Fix: The system automatically extracts the version from model_name and injects it
  - Example: model_name: "Pixverse V5.5" → auto-inject model: "v5.5"
  - Example: model_name: "Pixverse V4" → auto-inject model: "v4"
- Note: V4.5 and V3.5 include model in form_config (no auto-inference needed)
- Relevant Task Types: All video modes (text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video)

Error Prevention:

- Without auto-inference: INLINECODE196
- With auto-inference (v1.0.7+): Pixverse V5.5/V5/V4 work seamlessly ✅
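
The auto-inference described above amounts to extracting the version suffix from model_name. The sketch below shows one way to do that with a regular expression; the function names are hypothetical, and the bundled script may implement the step differently.

```python
import re


def infer_pixverse_model(model_name: str):
    """Return 'v5.5' for 'Pixverse V5.5', or None for non-Pixverse names."""
    match = re.fullmatch(
        r"\s*pixverse\s+v(\d+(?:\.\d+)?)\s*", model_name, flags=re.IGNORECASE
    )
    return f"v{match.group(1)}" if match else None


def inject_model_param(model_name: str, params: dict) -> dict:
    """Add the inferred 'model' value only when form_config did not supply one."""
    patched = dict(params)
    if "model" not in patched:
        version = infer_pixverse_model(model_name)
        if version:
            patched["model"] = version
    return patched
```

Leaving an existing `model` value untouched keeps V4.5 and V3.5, which already carry the field, unaffected.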

Music Models (text_to_music)

Suno sonic-v5 (Full-Featured):

- ✅ custom_mode: Suno only (enables vocal_gender, lyrics, tags support)
- ✅ vocal_gender: Suno only (male/female/mixed, requires custom_mode=True)
- ✅ lyrics: Suno only (custom lyrics support, requires custom_mode=True)
- ✅ make_instrumental: Suno only (force instrumental, no vocals)
- ✅ auto_lyrics: Suno only (AI-generated lyrics)
- ✅ tags: Suno only (genre/style tags)
- ✅ negative_tags: Suno only (exclude unwanted styles)
- ✅ title: Suno only (song title)
- ❌ duration: Fixed-length output (DouBao ~30s, Suno ~2min, not user-controllable)
- ✅ n: Multiple outputs supported (1-2), credit × n

DouBao BGM/Song (Simplified):

- ✅ prompt: Text description only
- ❌ No advanced parameters (no custom_mode, lyrics, vocal control)
- ❌ duration: Fixed ~30s output

🎵 Suno Prompt Writing Guide (for gpt_description_prompt):

When using Suno, structure your prompt with these elements:

1. Genre/Style:

- Examples: "lo-fi hip hop", "orchestral cinematic", "upbeat pop", "dark ambient", "indie folk", INLINECODE203

2. Tempo/BPM:

- Examples: "80 BPM", "fast tempo", "slow ballad", INLINECODE207

3. Vocals Control:

- No vocals: "no vocals" → set make_instrumental=true
- With vocals: "female vocals" → set vocal_gender="female"
- Male vocals: "male vocals" → set vocal_gender="male"
- Mixed: Set INLINECODE214

4. Mood/Emotion:

- Examples: "happy and energetic", "melancholic", "tense and dramatic", INLINECODE218

5. Negative Tags (exclude styles):

- Use negative_tags: "heavy metal, distortion, screaming" to exclude unwanted elements

6. Duration Hint:

- Examples: "60 seconds", "30 second loop", "2 minute track"
- Note: Suno typically generates ~2min, not strictly controllable
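
The Vocals Control mapping in item 3 can be sketched as a simple phrase check. This is illustrative only: `vocal_params_from_prompt` is a hypothetical helper, the phrase matching is deliberately naive, and `custom_mode` is set because the parameter notes in this document say `vocal_gender` requires it.

```python
def vocal_params_from_prompt(prompt: str) -> dict:
    """Map vocal phrases in a prompt to Suno parameters."""
    text = prompt.lower()
    if "no vocals" in text:
        return {"make_instrumental": True}
    # Check "female" first: the string "female vocals" also contains "male vocals".
    if "female vocals" in text:
        return {"custom_mode": True, "vocal_gender": "female"}
    if "male vocals" in text:
        return {"custom_mode": True, "vocal_gender": "male"}
    return {}  # nothing stated: leave the model defaults alone


print(vocal_params_from_prompt("lo-fi hip hop, 80 BPM, no vocals"))
```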

Example Suno prompts:
CODEBLOCK14

Common Parameter Patterns

- n (batch generation): Supported by ALL models. Cost = base_credit × n. Creates n independent resources.
- seed: Supported by most models (-1 = random, >0 = reproducible results)
- prompt_extend: AI-powered prompt enhancement (video models only)

Decision Tree: When User Requests Unsupported Features

CODEBLOCK15

When user requests unsupported combinations:

- Video + audio (unsupported model) → "该模型不支持音频。建议用 Veo 3.1 或 Kling O1 (支持 generate_audio 参数)"
- Music + custom duration → "音乐时长由模型固定(Suno约2分钟,DouBao约30秒),无法自定义"
- Video duration > 15s → "当前最长15秒。可选模型：Wan 2.6(15s, 75积分), Kling O1(10s, 96积分)"

Note: Image-specific unsupported combinations (Midjourney + aspect_ratio, 8K, non-standard ratios) are documented in the "Image Models" section above.

🧠 User Preference Memory (Video)

User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.

Storage: `~/.openclaw/memory/ima_prefs.json`

CODEBLOCK16

Model Selection Flow (Video Generation)

Step 1: Get knowledge-ai recommendation (if installed)
CODEBLOCK17

Step 2: Check user preference
CODEBLOCK18

Step 3: Decide which model to use
CODEBLOCK19

Step 4: Check for mismatch (for later hint)
CODEBLOCK20

When to Write (User Explicit Preference ONLY)

✅ Save preference when user explicitly specifies a model:

| User says | Action |
|-----------|--------|
| INLINECODE230 / `换成XXX` / INLINECODE232 | Switch to model XXX + save as preference |
| INLINECODE233 / `默认用XXX` / INLINECODE235 | Save + confirm: ✅ 已记住！以后视频生成默认用 [XXX] |
| 我喜欢XXX / 我更喜欢XXX | Save as preference |

❌ Do NOT save when:

- Agent auto-selects from knowledge-ai → not user preference
- Agent uses fallback default → not user preference
- User makes a generic quality request (see "When to Clear" below) → clear preference instead

When to Clear (User Abandons Preference)

🗑️ Clear preference when user wants automatic selection:

User says	Action
INLINECODE239 / `用最合适的` / `best` / INLINECODE242	Clear pref + use knowledge-ai recommendation
INLINECODE243 / `你选一个` / INLINECODE245

Implementation:

# Drop only this task type's saved model; a missing entry is not an error.
prefs.get(f"user_{user_id}", {}).pop(task_type, None)
save_prefs(prefs)

⭐ Model Selection Priority (Video)

Selection flow:

1. User preference (if exists) → Highest priority, always respect
2. ima-knowledge-ai skill (if installed) → Professional recommendation based on task
3. Fallback defaults → Use table below (only if neither 1 nor 2 exists)

Important notes:

- User preference is only saved when user explicitly specifies a model (see "When to Write" above)
- Knowledge-ai is always consulted (even when user pref exists) to detect mismatches
- When mismatch detected → add gentle hint in success message (does NOT interrupt generation)

The defaults below are FALLBACK only. User preferences have highest priority, then knowledge-ai recommendations.

💬 User Experience Protocol (IM / Feishu / Discord) v2.0 🆕

v2.0 Updates (aligned with ima-image-ai v1.3):

- Added Step 0 for correct message ordering (fixes group chat bug)
- Added Step 5 for explicit task completion
- Enhanced Midjourney support with proper timing estimates
- Now 6 steps total (0-5): Acknowledgment → Pre-Gen → Progress → Success/Failure → Done

This skill runs inside IM platforms (Feishu, Discord via OpenClaw).
Generation takes 10 seconds (music) up to 6 minutes (video). Never let users wait in silence.
Always follow all 6 steps below, every single time.

🗣️ User-Friendly First, Transparent on Request

Default to plain-language updates in normal user flows.
If users ask for technical details, provide them transparently (script name, endpoints, and key parameters).

In standard progress messages, prioritize: model name, estimated/actual time, credits consumed, result URL, and natural-language status updates.

Estimated Generation Time (All Task Types)

| Task Type | Model | Estimated Time | Poll Every | Send Progress Every |
|-----------|-------|----------------|------------|---------------------|
| text_to_image | SeeDream 4.5 | 25~60s | 5s | 20s |
| | Nano Banana2 💚 | 20~40s | 5s | 15s |
| | Nano Banana Pro | 60~120s | 5s | 30s |
| | Midjourney 🎨 | 40~90s | 8s | 25s |
| image_to_image | SeeDream 4.5 | 25~60s | 5s | 20s |
| | Nano Banana2 💚 | 20~40s | 5s | 15s |
| | Nano Banana Pro | 60~120s | 5s | 30s |
| | Midjourney 🎨 | 40~90s | 8s | 25s |
| text_to_video | Wan 2.6, Hailuo 2.0/2.3, Vidu Q2, Pixverse | 60~120s | 8s | 30s |
| | SeeDance 1.5 Pro, Kling 2.6, Veo 3.1 | 90~180s | 8s | 40s |
| | Kling O1, Sora 2 Pro | 180~360s | 8s | 60s |
| image_to_video | Same ranges as text_to_video | - | 8s | 40s |
| first_last_frame / reference | Kling O1, Veo 3.1 | 180~360s | 8s | 60s |
| text_to_music | DouBao BGM / Song | 10~25s | 5s | 10s |
| | Suno (sonic-v5) | 20~45s | 5s | 15s |
| text_to_speech | (varies by model) | 5~30s | 3s | 10s |

INLINECODE251 = upper bound of the range (e.g. 60 for SeeDream 4.5, 40 for Nano Banana2, 120 for Nano Banana Pro, 90 for Midjourney, 180 for Kling 2.6, 360 for Kling O1).

Step 0 — Initial Acknowledgment Reply (Normal Reply) 🆕

⚠️ CRITICAL: This step is essential for correct message ordering in IM platforms (Feishu, Discord).

Before doing anything else, reply to the user with a friendly acknowledgment message using your normal reply (not message tool). This reply will automatically appear FIRST in the conversation.

Example acknowledgment messages:

For images:

好的!来帮你画一只萌萌的猫咪 🐱

收到！马上为你生成一张 16:9 的风景照 🏔️

CODEBLOCK24

For videos:

好的!来帮你生成一段视频 🎬

CODEBLOCK26

For music:
CODEBLOCK27

Rules:

- Keep it short and warm (< 15 words)
- Match the user's language (Chinese/English)
- Include relevant emoji (🐱/🎨/🎬/🎵/✨)
- This is your ONLY normal reply; all subsequent updates use the message tool

Why this matters:

- Normal replies automatically appear FIRST in the conversation thread
- INLINECODE254 tool pushes appear in chronological order AFTER your initial reply
- This ensures users see: "好的!" → "🎨 开始生成..." → "⏳ 进度..." → "✅ 成功!" (correct order)
- Without Step 0, the confirmation might appear LAST, confusing users

Step 1 — Pre-Generation Notification (Push via message tool)

After Step 0 reply, use the message tool to push a notification immediately:

CODEBLOCK28

Emoji by content type:

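- Image (图片) → INLINECODE256
- Video (视频) → 🎬 (add the note 视频生成需要较长时间，我会定时汇报进度, i.e. "video generation takes a while; I will report progress regularly")
- Music (音乐) → INLINECODE258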

Cost transparency (new requirement):

- Always show credit cost with model tier context
- For expensive models (>50 pts), offer a cheaper alternative proactively
- Examples:
  - Balanced (default): "使用 Wan 2.6（25 积分，最新 Wan）"
  - Premium (user explicit): "使用高端模型 Kling O1（48-120 积分），质量最佳"
  - Premium (auto-selected): "使用 Wan 2.6（25 积分）。若需更高质量可选 Kling O1（48 积分起）"
  - Budget (user asked): "使用 Vidu Q2（5 积分，最省钱）"

Adapt language to match the user (Chinese / English). For video, always add a note that it takes longer. For expensive models, always mention cheaper alternatives unless user explicitly requested premium.

Step 2 — Progress Updates

Poll the task detail API every [Poll Every] seconds per the table.
Send a progress update every [Send Progress Every] seconds.

CODEBLOCK29

Progress formula:
CODEBLOCK30

- Cap at 95%: never reach 100% until the API confirms INLINECODE261
- If elapsed > estimated_max: freeze at 95%, append INLINECODE263
- For video with max=360s: at 120s → 33%, at 250s → 69%, at 400s → 95% (frozen)
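
The worked figures above (33%, 69%, 95% for max=360s) are consistent with a linear elapsed/max ratio, truncated and capped at 95. The sketch below assumes that form; `progress_percent` is a hypothetical name and the exact formula in the skill may differ in rounding.

```python
def progress_percent(elapsed_s: float, estimated_max_s: float) -> int:
    """Linear progress estimate, truncated to an integer and capped at 95."""
    if estimated_max_s <= 0:
        raise ValueError("estimated_max_s must be positive")
    return min(95, int(elapsed_s / estimated_max_s * 100))


for elapsed in (120, 250, 400):
    print(elapsed, progress_percent(elapsed, 360))
```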

Step 3 — Success Notification

When task status = success:

For Video Tasks (text_to_video / image_to_video / first_last_frame / reference_image)

3.1 Send video player first (IM platforms like Feishu will render inline player):
CODEBLOCK31

Important:

- Hint is non-intrusive and does NOT interrupt generation
- Only shown when user pref conflicts with knowledge-ai recommendation
- User can ignore the hint; video is already delivered

3.2 Then send link as text (for copying/sharing):
CODEBLOCK32

⚠️ Critical for video:

- Send video player FIRST (inline preview)
- Send text link SECOND (for copying)
- Include first-frame thumbnail URL if available: INLINECODE265

For Image Tasks (text_to_image / image_to_image)

CODEBLOCK33

Important:

- Hint is non-intrusive and does NOT interrupt generation
- Only shown when user pref conflicts with knowledge-ai recommendation
- User can ignore the hint; image is already delivered

For Music Tasks (text_to_music)

Send audio file with player:
CODEBLOCK34

For TTS Tasks (text_to_speech): Full UX Protocol (Steps 0–5)

Step 0 — Initial acknowledgment (normal reply)
First reply with a short acknowledgment, e.g.: 好的，正在帮你把这段文字转成语音。 / OK, converting this text to speech.

Step 1 — Pre-generation (message tool)
Push once:
CODEBLOCK35

Step 2 — Progress
Poll every 2–5s. Every 10–15s send: ⏳ 语音合成中… [P]%，已等待 [elapsed]s，预计最长 [max]s. Cap progress at 95% until API returns success.

Step 3 — Success (message tool)
When resource_status == 1 and status != "failed", send media = medias[0].url and caption:

✅ 语音合成成功！
• 模型：[Model Name]
• 耗时：实际 [actual]s
• 消耗积分：[N pts]
🔗 原始链接：[url]

Use the URL from the API (do not use local file paths).

Step 4 — Failure (message tool)
On failure, send user-friendly message. TTS error translation (do not expose raw API errors):

Technical	✅ Say (CN)	✅ Say (EN)
401 Unauthorized	密钥无效或未授权，请至 imaclaw.ai 生成新密钥	API key invalid; generate at imaclaw.ai
4008 Insufficient points

Links: API key — https://www.imaclaw.ai/imaclaw/apikey ；Credits — https://www.imaclaw.ai/imaclaw/subscription

Step 5 — Done
After Step 0–4, no further reply needed. Do not send duplicate confirmations.

Step 4 — Failure Notification

When task status = failed or any API/network error, send:

CODEBLOCK37

⚠️ CRITICAL: Error Message Translation

NEVER show technical error messages to users. Always translate API errors into natural language.
API key & credits: Key and subscription management is at imaclaw.ai (same IMA platform as imastudio.com).

Technical Error	❌ Never Say	✅ Say Instead (Chinese)	✅ Say Instead (English)
INLINECODE271 🆕	Invalid API key / 401 Unauthorized	❌ API密钥无效或未授权<br>💡 生成新密钥: https://www.imaclaw.ai/imaclaw/apikey	❌ API key is invalid or unauthorized<br>💡 Generate API Key: https://www.imaclaw.ai/imaclaw/apikey
INLINECODE272 🆕

Generic fallback (when error is unknown):

- Chinese: INLINECODE282
English: INLINECODE283

Best Practices:

1. Focus on user action: Tell users what to do next, not what went wrong technically
2. Be reassuring: Use phrases like "建议换个模型试试" ("try a different model") instead of "失败了" ("it failed")
3. Avoid blame: Never say "你的提示词有问题" ("your prompt is wrong"); say "提示词需要调整一下" ("the prompt needs a small adjustment") instead
4. Provide alternatives: Always suggest 1-2 alternative models in the failure message
5. 🆕 Include actionable links (v1.0.8+): For 401/4008 errors, provide clickable links to API key generation or credit purchase pages
6. 🎵 Music-specific (v1.2.0+):

- For Suno lyrics errors, suggest simplifying lyrics or using auto-generated lyrics (auto_lyrics=true)
- For prompt length errors, give example length (e.g., "建议20-100字")
- For BGM requests, recommend DouBao BGM over Suno

7. 🔊 TTS-specific: Use the TTS error translation table in "For TTS Tasks" above; suggest another model via --list-models, or suggest shortening the text.

Step 5 — Done (No Further Action Needed) 🆕

After sending Step 3 (success) or Step 4 (failure):

1. DO NOT send any additional messages unless the user asks a follow-up question
2. The task is complete; wait for the user's next request
3. User preference has been saved (if generation succeeded)
4. The conversation is ready for the next generation request

Why this step matters:

- Prevents unnecessary "anything else?" messages that clutter the chat
- Allows users to naturally continue the conversation when ready
- Respects the asynchronous nature of IM platforms

Exception: If the user explicitly asks "还有别的吗？" or similar, then respond naturally.

🆕 Enhanced Error Handling (v1.0.8):

The Reflection mechanism (3 automatic retries) now provides specific, actionable suggestions for common errors:

- 401 Unauthorized: System suggests generating a new API key with clickable link
- 4008 Insufficient Points: System suggests purchasing credits with clickable link
- 500 Internal Server Error: Automatic parameter degradation (size, resolution, duration, quality)
- 6009 No Rule Match: Automatic parameter completion from creditrules
- 6010 Attribute Mismatch: Automatic creditrule reselection
- Timeout: Helpful info with dashboard link for background task status

All error handling is automatic and transparent — users receive natural language explanations with next steps.
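The error-to-action mapping listed above can be pictured as a dispatch table. This is only an illustrative sketch: the action labels and function names are hypothetical, and the two links are the API key and credits pages given earlier in this document.

```python
from typing import Optional, Union

MAX_RETRIES = 3  # the Reflection mechanism is described as 3 automatic retries

# Right-hand labels are hypothetical names for the behaviours listed above.
RECOVERY_ACTIONS = {
    401: "suggest_new_api_key",
    4008: "suggest_buying_credits",
    500: "degrade_parameters",
    6009: "complete_parameters_from_rules",
    6010: "reselect_credit_rule",
    "timeout": "show_dashboard_link",
}

# Errors the list above describes as handled automatically.
AUTO_HANDLED = {500, 6009, 6010}

USER_LINKS = {
    401: "https://www.imaclaw.ai/imaclaw/apikey",
    4008: "https://www.imaclaw.ai/imaclaw/subscription",
}


def recovery_action(code: Union[int, str]) -> str:
    """Return the action label for an error code, or a generic fallback."""
    return RECOVERY_ACTIONS.get(code, "generic_fallback")


def is_auto_handled(code: Union[int, str]) -> bool:
    """True when the error is retried automatically rather than surfaced."""
    return code in AUTO_HANDLED


def user_link(code: Union[int, str]) -> Optional[str]:
    """Return the clickable link to include for 401/4008, if any."""
    return USER_LINKS.get(code)
```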

Failure fallback by task type:

Task Type	Failed Model	First Alt	Second Alt
text_to_image	SeeDream 4.5	Nano Banana2 (4pts, fast)	Nano Banana Pro (10-18pts, premium)
text_to_image

Music-specific failure guidance:

- If Suno fails → Recommend DouBao BGM (for background music) or DouBao Song (for songs)
- If DouBao BGM fails → Try DouBao Song first (similar pricing), then Suno (more powerful)
- If DouBao Song fails → Try DouBao BGM first (similar pricing), then Suno (more powerful)
- For lyrics errors in Suno → Suggest simplifying lyrics or using INLINECODE287
- For prompt length errors → Recommend 20-100 characters
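The fallback order above could be encoded as a simple lookup. The sketch uses friendly names rather than model_id values, because the DouBao model_ids are not given in this section; the function name and the `wants_background_music` flag are hypothetical.

```python
from typing import List

# Ordered alternatives per failed model, as described in the guidance above.
MUSIC_FALLBACKS = {
    "Suno": ["DouBao BGM", "DouBao Song"],
    "DouBao BGM": ["DouBao Song", "Suno"],
    "DouBao Song": ["DouBao BGM", "Suno"],
}


def music_alternatives(failed_model: str, wants_background_music: bool = True) -> List[str]:
    """Return alternative models to suggest, most relevant first.

    Assumption: when Suno fails, the request type decides whether
    DouBao BGM (background music) or DouBao Song (songs) is listed first.
    """
    alternatives = list(MUSIC_FALLBACKS.get(failed_model, []))
    if failed_model == "Suno" and not wants_background_music:
        alternatives.reverse()
    return alternatives
```

Before calling the script, each friendly name still has to be resolved to its exact model_id via the reference table or --list-models.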

TTS-specific failure guidance:

- If TTS fails → Run --task-type text_to_speech --list-models and suggest another model_id; or shorten text / simplify content. Use the TTS error translation table in "For TTS Tasks" above for user-facing messages.

Supported Models at a Glance

Source: production GET /open/v1/product/list (2026-02-27). Model count reduced significantly. Always query product list API at runtime.

Image Generation (4 models each)

Category	Name	model_id	Cost
text_to_image	SeeDream 4.5 🌟	INLINECODE290	5 pts
text_to_image

Midjourney attribute_ids: 5451/5452 (text_to_image), 5453/5454 (image_to_image)
Nano Banana2 size options: 512px (4pts), 1K (6pts), 2K (10pts), 4K (13pts)
Nano Banana Pro size options: 1K (10pts), 2K (10pts), 4K (18pts for t2i / 10pts for i2i)
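The size pricing above lends itself to a small cost lookup. This is a sketch using only the figures listed here; the function name is hypothetical, and it assumes Nano Banana2 prices are the same for both task types, which the source does not state. Actual costs should be confirmed against the product list at runtime.

```python
# Points per size, taken from the size options listed above.
IMAGE_SIZE_COSTS = {
    "Nano Banana2": {"512px": 4, "1K": 6, "2K": 10, "4K": 13},
    "Nano Banana Pro": {"1K": 10, "2K": 10, "4K": 18},
}

# Task-specific overrides: Nano Banana Pro 4K costs 10 pts for image_to_image.
TASK_OVERRIDES = {
    ("Nano Banana Pro", "4K", "image_to_image"): 10,
}


def estimate_cost(model: str, size: str, task_type: str = "text_to_image") -> int:
    """Return the listed cost in points; raise KeyError for unlisted combinations."""
    override = TASK_OVERRIDES.get((model, size, task_type))
    if override is not None:
        return override
    return IMAGE_SIZE_COSTS[model][size]
```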

Image Model Capabilities (Parameter Support)

⚠️ Critical: Models have varying parameter support. Custom aspect ratios are now supported by multiple models.

Model	Custom Aspect Ratio	Max Resolution	Size Options	Notes
SeeDream 4.5	✅ (via virtual params)	4K (adaptive)	8 aspect ratios	Supports 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 21:9 (5 pts)
Nano Banana2

Key Capabilities:

- ✅ Aspect ratio control: SeeDream 4.5 (virtual params), Nano Banana Pro/2 (native support)
- ❌ 8K: Not supported by any model (max is 4K)
- ✅ Size control: Nano Banana2, Nano Banana Pro, and Midjourney support multiple size options via different attribute_ids
