Monet AI Skill

Comprehensive AI content generation API designed for AI agents. Monet AI provides unified access to state-of-the-art AI generation models for video (Sora, Veo, Doubao Seedance, Wan, Hailuo, Kling), image (GPT-4o, Nano Banana, Seedream, Flux, Imagen, Ideogram), and music (MiniMax Music) generation. Build intelligent workflows that combine multiple AI capabilities for automated content creation pipelines.

When to Use

Use this skill when:

- Video Generation: Create AI-generated videos from text prompts using state-of-the-art models

  - Sora: OpenAI's video generation model for high-quality, realistic videos
  - Veo: Google's video generation model
  - Doubao Seedance: ByteDance's AI video model with audio-visual sync
  - Wan: Alibaba's video generation model with excellent localization support
  - Hailuo: Fast video generation with good quality-speed balance
  - Kling: Kuaishou's video generation model

- Image Generation: Generate images from text descriptions with various artistic styles

  - GPT-4o: OpenAI's multimodal model for image generation
  - Nano Banana: Google's image model with ultra-high character consistency
  - Seedream: ByteDance's intelligent visual reasoning model
  - Wan: Alibaba's visual model for high-quality and expressive image generation
  - Flux: High-quality photorealistic and artistic image generation
  - Imagen: Google's text-to-image model
  - Ideogram: Specialized in text rendering and precise composition

- Music Generation: Create original music and audio from text descriptions

  - MiniMax Music: AI music generation with support for custom lyrics and text-to-music conversion

- AI Agent Integration: Build intelligent workflows that combine multiple AI generation capabilities for automated content creation pipelines

Getting API Key

1. Visit https://monet.vision to register an account
2. After login, go to https://monet.vision/skills/keys to create an API Key
3. Configure the API Key in environment variables or code

If you don't have an API Key, ask your owner to apply at monet.vision.

Quick Start

Create a Video Generation Task

CODEBLOCK0

⚠️ Important: idempotency_key is required. Use a unique value (e.g., UUID) to prevent duplicate task creation if the request is retried.

Response:

CODEBLOCK1
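The request described above can also be assembled in TypeScript. This is a minimal sketch, not an official client: the endpoint and the field names (`type`, `input`, `idempotency_key`) follow this document's Quick Start request, while `buildCreateTaskRequest`, `VideoInput`, and `CreateTaskRequest` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper: builds the URL and fetch options for POST /api/v1/tasks/async.
// Field names follow the Quick Start request; the helper itself is an assumption.
interface VideoInput {
  model: string;
  prompt: string;
  duration?: number;
  aspect_ratio?: string;
}

interface CreateTaskRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildCreateTaskRequest(
  apiKey: string,
  input: VideoInput,
  idempotencyKey: string,
): CreateTaskRequest {
  if (!idempotencyKey) {
    throw new Error("idempotency_key is required");
  }
  return {
    url: "https://monet.vision/api/v1/tasks/async",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // Reuse the same idempotencyKey when retrying the same logical request.
      body: JSON.stringify({
        type: "video",
        input,
        idempotency_key: idempotencyKey,
      }),
    },
  };
}
```

A caller would generate the key once per logical task (for example, a UUID), pass `url` and `init` to `fetch`, and reuse the same key on every retry of that task.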

Get Task Status and Result

Task processing is asynchronous. You need to poll the task status until it becomes success or failed. Recommended polling interval: 5 seconds.

CODEBLOCK2

Response when completed:

CODEBLOCK3

Example: Poll until completion

CODEBLOCK4
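An unbounded polling loop never returns if a task stalls. The sketch below adds an attempt limit and separates the polling logic from the HTTP call so it can be exercised without a network. It assumes only what this document states: the terminal statuses are `success` and `failed`, and the recommended interval is 5 seconds. `pollUntilDone`, `httpGetTask`, and the `maxAttempts` default are hypothetical.

```typescript
// "success" and "failed" come from this document; any other status is treated as still running.
interface TaskSnapshot {
  id: string;
  status: string;
}

type GetTask = (taskId: string) => Promise<TaskSnapshot>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the task reaches a terminal status or maxAttempts is exhausted.
async function pollUntilDone(
  getTask: GetTask,
  taskId: string,
  intervalMs = 5000,
  maxAttempts = 120,
): Promise<TaskSnapshot> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const task = await getTask(taskId);
    if (task.status === "success" || task.status === "failed") {
      return task;
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Task ${taskId} did not finish after ${maxAttempts} attempts`);
}

// One possible GetTask implementation against GET /api/v1/tasks/{taskId}.
function httpGetTask(apiKey: string): GetTask {
  return async (taskId) => {
    const response = await fetch(`https://monet.vision/api/v1/tasks/${taskId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!response.ok) {
      throw new Error(`Unexpected HTTP status ${response.status}`);
    }
    return (await response.json()) as TaskSnapshot;
  };
}
```

Usage would look like `await pollUntilDone(httpGetTask(apiKey), "task_abc123")`, with the limit tuned to the longest generation time you expect.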

Supported Models

Video Generation

Sora (OpenAI)

sora-2 - Sora 2

OpenAI's latest video generation model

- 🎯 Use Cases: Video projects requiring OpenAI's latest technology
- ⏱️ Duration: 10-15 seconds
- 🎵 Features: Audio generation support, reference image support

CODEBLOCK5

sora-2-pro - Sora 2 Pro

Perfect quality for cinematic scenes

- 🎯 Use Cases: Professional film, advertising, and high-end production
- ⏱️ Duration: 15-25 seconds
- 🎵 Features: Audio generation support, reference image support

CODEBLOCK6
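The duration limits of the two Sora models can be checked client-side before a task is submitted. A minimal sketch, assuming the discrete options given in this document's parameter listings (10 or 15 seconds for `sora-2`, 15 or 25 seconds for `sora-2-pro`, with the first value as the default); `resolveSoraDuration` is a hypothetical helper name.

```typescript
// Allowed durations in seconds per Sora model; the first entry is the documented default.
const SORA_DURATIONS: Record<string, readonly number[]> = {
  "sora-2": [10, 15],
  "sora-2-pro": [15, 25],
};

// Returns the duration to send, falling back to the model default when none is given.
function resolveSoraDuration(model: string, requested?: number): number {
  const allowed = SORA_DURATIONS[model];
  if (!allowed) {
    throw new Error(`Unknown Sora model: ${model}`);
  }
  if (requested === undefined) {
    return allowed[0];
  }
  if (!allowed.includes(requested)) {
    throw new Error(`${model} supports durations ${allowed.join(" or ")}s, got ${requested}s`);
  }
  return requested;
}
```

Rejecting an unsupported value locally avoids spending a request, and an idempotency key, on a task that may be refused.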

Veo (Google)

veo-3-1-fast - Google Veo 3.1 Fast

Ultra-fast video generation

- 🎯 Use Cases: Video projects requiring fast generation
- ⏱️ Duration: 8 seconds
- 📺 Resolution: 1080p with audio generation support

CODEBLOCK7

veo-3-1 - Google Veo 3.1

Advanced AI video with sound

- 🎯 Use Cases: Professional-grade video production
- ⏱️ Duration: 8 seconds
- 📺 Resolution: 1080p with audio generation support

CODEBLOCK8

veo-3-fast - Google Veo 3 Fast

30% faster than standard Veo 3

- 🎯 Use Cases: Video projects requiring rapid iteration
- ⏱️ Duration: 8 seconds
- 📺 Resolution: 1080p, supports negative prompts

CODEBLOCK9

veo-3 - Google Veo 3

High-quality video generation

- 🎯 Use Cases: Standard high-quality video production
- ⏱️ Duration: 8 seconds
- 📺 Resolution: 1080p, supports negative prompts

CODEBLOCK10

Wan

wan-2-6 - Wan 2.6

Multi-shot and automatic audio

- 🎯 Use Cases: Video production requiring multi-shot switching
- ⏱️ Duration: 5-15 seconds
- 📺 Resolution: 720p-1080p with audio generation support

CODEBLOCK11

wan-2-5 - Wan 2.5

Supports automatic audio generation

- 🎯 Use Cases: Quickly generating videos with audio
- ⏱️ Duration: 5-10 seconds
- 📺 Resolution: 480p-1080p with audio support

CODEBLOCK12

wan-2-2-flash - Wan 2.2 Flash

Instruction understanding, controllable camera movement

- 🎯 Use Cases: Scenarios requiring precise camera movement control
- ⏱️ Duration: 5-10 seconds
- 📺 Resolution: 480p-1080p

CODEBLOCK13

wan-2-2 - Wan 2.2

Excellent image details, strong motion stability

- 🎯 Use Cases: Video production requiring high stability
- ⏱️ Duration: 5-10 seconds
- 📺 Resolution: 480p-1080p

CODEBLOCK14

Kling

kling-2-6 - Kling 2.6

Cinematic videos and audio

- 🎯 Use Cases: Cinematic video production
- ⏱️ Duration: 5-10 seconds
- ✨ Features: Strong visual realism, audio generation support

CODEBLOCK15

kling-2-5 - Kling 2.5 Turbo

Smooth motion, stronger consistency

- 🎯 Use Cases: Video production requiring high consistency
- ⏱️ Duration: 5-10 seconds
- ✨ Features: Supports negative prompts

CODEBLOCK16

kling-v2-1-master - Kling 2.1 Master

Strong visual realism with enhanced features

- 🎯 Use Cases: Professional-grade high-quality video production
- ⏱️ Duration: 5-10 seconds
- ✨ Features: Strength adjustment support, negative prompts

CODEBLOCK17

kling-v2-1 - Kling 2.1

Strong visual realism

- 🎯 Use Cases: High-realism video production
- ⏱️ Duration: 5-10 seconds
- ✨ Features: Strength adjustment, negative prompts

CODEBLOCK18

kling-v2 - Kling 2.0

Excellent aesthetics

- 🎯 Use Cases: Artistic creation and aesthetically-oriented videos
- ⏱️ Duration: 5-10 seconds
- ✨ Features: Strength adjustment, negative prompts

CODEBLOCK19

Hailuo

hailuo-2-3 - Hailuo 2.3

Excellent body movements and physics performance

- 🎯 Use Cases: Videos requiring realistic physics effects
- ⏱️ Duration: 6-10 seconds
- 📺 Resolution: 768p-1080p, extreme physics simulations

CODEBLOCK20

hailuo-2-3-fast - Hailuo 2.3 Fast

Fast generation speed

- 🎯 Use Cases: Projects requiring rapid iteration
- ⏱️ Duration: 6-10 seconds
- 📺 Resolution: 768p-1080p

CODEBLOCK21

hailuo-02 - Hailuo 02

Extreme physics simulations

- 🎯 Use Cases: Scenarios requiring accurate physics simulation
- ⏱️ Duration: 6-10 seconds
- 📺 Resolution: 768p-1080p

CODEBLOCK22

hailuo-01-live2d - Hailuo 01 Live2d

Hailuo Live2D model

- 🎯 Use Cases: 2D character animation production
- ✨ Features: Suitable for 2D character animation

CODEBLOCK23

hailuo-01 - Hailuo 01

Highest video quality

- 🎯 Use Cases: Video production requiring ultimate quality
- ✨ Features: Suitable for high-quality needs

CODEBLOCK24

Doubao Seedance

doubao-seedance-1-5-pro - Seedance 1.5 Pro

Pro-grade audio-visual sync

- 🎯 Use Cases: Professional production requiring audio-visual sync
- ⏱️ Duration: 4-12 seconds
- 📺 Resolution: 480p-720p with audio generation support

CODEBLOCK25

doubao-seedance-1-0-pro-fast - Seedance 1.0 Pro Fast

Premium quality & unbeatable efficiency

- 🎯 Use Cases: Scenarios requiring fast high-quality output
- ⏱️ Duration: 2-12 seconds
- 📺 Resolution: 720p-1080p, ByteDance's next-gen AI video model

CODEBLOCK26

doubao-seedance-1-0-pro - Seedance 1.0 Pro

Stable motion performance

- 🎯 Use Cases: Video production requiring stable motion
- ⏱️ Duration: 5-10 seconds
- 📺 Resolution: 480p-1080p

CODEBLOCK27

doubao-seedance-1-0-lite - Seedance 1.0 Lite

Precise semantic understanding

- 🎯 Use Cases: Scenarios requiring precise semantic understanding
- ⏱️ Duration: 5-10 seconds
- 📺 Resolution: 480p-1080p

CODEBLOCK28

Special Features

kling-motion-control - Kling Motion Control

Precision motion control via video references

- 🎯 Use Cases: Scenarios requiring motion replication from reference videos
- ⏱️ Duration: 3-30 seconds
- 📺 Resolution: 720p/1080p with audio generation support
- 💰 Pricing: 720p: 8 credits/s, 1080p: 15 credits/s

CODEBLOCK29

runway-act-two - Runway Act Two

Runway Next-Generation Motion Capture Model

- 🎯 Use Cases: Capturing motion from videos and applying to new characters
- ⏱️ Duration: 3-30 seconds
- ✨ Features: Motion transfer support
- 💰 Pricing: 10 credits/second

CODEBLOCK30

wan-animate-mix - Wan Animate Mix (Standard)

Perfect for character replacement scenarios

- 🎯 Use Cases: Video character replacement
- ⏱️ Duration: 3-30 seconds
- ✨ Features: Replace characters in videos with specified image characters
- 💰 Pricing: 10 credits/second

CODEBLOCK31

wan-animate-mix-pro - Wan Animate Mix Pro (Professional)

High animation fluidity with better results

- 🎯 Use Cases: Professional-grade video character replacement
- ⏱️ Duration: 3-30 seconds
- ✨ Features: Higher quality character replacement effects
- 💰 Pricing: 20 credits/second

CODEBLOCK32

wan-animate-move - Wan Animate Move (Standard)

Replicate dance and challenging body movements

- 🎯 Use Cases: Motion capture and transfer
- ⏱️ Duration: 3-30 seconds
- ✨ Features: Apply motion from reference videos to target images
- 💰 Pricing: 10 credits/second

CODEBLOCK33

wan-animate-move-pro - Wan Animate Move Pro (Professional)

High animation fluidity with better results

- 🎯 Use Cases: Professional-grade motion capture and transfer
- ⏱️ Duration: 3-30 seconds
- ✨ Features: Higher quality motion transfer effects
- 💰 Pricing: 20 credits/second

CODEBLOCK34
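The per-second prices listed for the Special Features models make it possible to estimate a task's cost before submitting it. A minimal sketch: the price table copies the figures above, while `estimateCredits` is a hypothetical helper and the assumption that cost scales linearly with duration (with no rounding or minimum charge) is not confirmed by this document.

```typescript
// Per-second credit prices as listed in the Special Features section.
// kling-motion-control is priced by resolution; the other models have a flat rate.
const CREDITS_PER_SECOND: Record<string, number | Record<string, number>> = {
  "kling-motion-control": { "720p": 8, "1080p": 15 },
  "runway-act-two": 10,
  "wan-animate-mix": 10,
  "wan-animate-mix-pro": 20,
  "wan-animate-move": 10,
  "wan-animate-move-pro": 20,
};

// Assumption: cost is price x seconds; actual billing rules may differ.
function estimateCredits(model: string, seconds: number, resolution?: string): number {
  const price = CREDITS_PER_SECOND[model];
  if (price === undefined) {
    throw new Error(`No per-second price listed for ${model}`);
  }
  if (seconds < 3 || seconds > 30) {
    throw new Error("Duration must be between 3 and 30 seconds");
  }
  if (typeof price === "number") {
    return price * seconds;
  }
  const rate = resolution === undefined ? undefined : price[resolution];
  if (rate === undefined) {
    throw new Error(`${model} requires a resolution of ${Object.keys(price).join(" or ")}`);
  }
  return rate * seconds;
}
```

For example, a 10-second `kling-motion-control` clip at 1080p would be estimated at 150 credits under this linear assumption.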

Image Generation

GPT (OpenAI)

gpt-4o - GPT 4o

Accurate, realistic output

- 🎯 Use Cases: High-quality, photorealistic image generation
- ✨ Features: Supports multiple reference images, multiple aspect ratios, customizable style

CODEBLOCK35

gpt-image-1-5 - GPT Image 1.5

True-color precision rendering

- 🎯 Use Cases: Professional image generation requiring color accuracy
- ✨ Features: Supports up to 10 reference images, adjustable quality

CODEBLOCK36

Nano Banana (Google)

nano-banana-1 - Google Nano Banana

Ultra-high character consistency

- 🎯 Use Cases: Image series requiring consistent character appearance
- ✨ Features: Supports up to 5 reference images, multiple aspect ratio options

CODEBLOCK37

nano-banana-1-pro - Nano Banana Pro

Google's flagship generation model

- 🎯 Use Cases: Professional-grade high-quality image generation
- ✨ Features: Supports 1K-4K resolution, up to 14 reference images, ultra-wide 21:9

CODEBLOCK38

nano-banana-2 - Nano Banana 2

Google's latest Gemini model

- 🎯 Use Cases: Latest technology for high-quality image generation
- ✨ Features: Supports 1K-4K resolution, up to 14 reference images, ultra-wide 8:1 ratio

CODEBLOCK39

Wan

wan-i-2-6 - Wan 2.6

High-quality and expressive

- 🎯 Use Cases: Creative image generation requiring high expressiveness
- ✨ Features: Supports up to 4 reference images, ultra-wide 21:9

CODEBLOCK40

wan-2-5 - Wan 2.5

Fast, creative image generation

- 🎯 Use Cases: Quick creation and iteration
- ✨ Features: Supports up to 2 reference images, ultra-wide 21:9

CODEBLOCK41

Seedream (ByteDance)

seedream-5-0 - Seedream 5.0 Lite

Intelligent visual reasoning

- 🎯 Use Cases: Complex scenarios requiring intelligent understanding and reasoning
- ✨ Features: 2K-3K resolution, up to 14 reference images, ultra-wide 21:9

CODEBLOCK42

seedream-4-5 - Seedream 4.5

ByteDance's 4K image model

- 🎯 Use Cases: High-resolution professional image generation
- ✨ Features: 2K-4K resolution, up to 14 reference images, ultra-wide 21:9

CODEBLOCK43

seedream-4-0 - Seedream 4.0

Supports image sets with cohesive styles

- 🎯 Use Cases: Image series requiring consistent style
- ✨ Features: Supports up to 10 reference images

CODEBLOCK44

Flux (Black Forest Labs)

flux-2-dev - Flux.2 Dev

Photorealistic output

- 🎯 Use Cases: Image generation requiring high photorealism
- ✨ Features: Model by Black Forest Labs, multiple aspect ratio options

CODEBLOCK45

flux-kontext-pro - Flux Kontext Pro

Perfect for editing, compositing

- 🎯 Use Cases: Professional image editing and compositing work
- ✨ Features: Supports reference images, customizable style

CODEBLOCK46

flux-kontext-max - Flux Kontext Max

Excellent for prompt accuracy

- 🎯 Use Cases: Scenarios requiring precise control of generation results
- ✨ Features: Supports reference images, customizable style

CODEBLOCK47

flux-1-schnell - Flux Schnell

Suitable for simple basic scenes

- 🎯 Use Cases: Quick prototyping and simple scenarios
- ✨ Features: Fast generation speed

CODEBLOCK48

Imagen (Google)

imagen-3-0 - Imagen 3.0

Fast, high-quality results

- 🎯 Use Cases: Fast high-quality image generation
- ✨ Features: Google's advanced image model, customizable style

CODEBLOCK49

imagen-4-0 - Imagen 4.0

Google's latest generation model

- 🎯 Use Cases: High-quality images requiring latest technology
- ✨ Features: Higher quality and precision, customizable style

CODEBLOCK50

Ideogram

ideogram-v2 - Ideogram V2

Highly recommended for text editing

- 🎯 Use Cases: Scenarios requiring text in images
- ✨ Features: Excellent text rendering performance

CODEBLOCK51

ideogram-v3 - Ideogram V3

Outstanding design capabilities

- 🎯 Use Cases: First choice for designers and creative professionals
- ✨ Features: Better text rendering and typography

CODEBLOCK52

Stability AI

stability-1-0 - Stability 1.0

Perfect for generating detailed images

- 🎯 Use Cases: Image generation requiring fine control and high detail
- ✨ Features: Supports negative prompts, customizable style

CODEBLOCK53

Music Generation

minimax-music - MiniMax Music

AI music generation from text with custom lyrics support

- 🎯 Provider: MiniMax
- ✨ Features: Text-to-music conversion, supports custom lyrics
- 🎵 Use Cases: Music creation from text descriptions or lyrics

CODEBLOCK54

API Reference

Create Task (Async)

POST /api/v1/tasks/async - Create an async task. Returns immediately with task ID.

Request:

CODEBLOCK55

⚠️ Important: idempotency_key is required. Use a unique value (e.g., UUID) to prevent duplicate task creation if the request is retried.

Response:

CODEBLOCK56

Create Task (Streaming)

POST /api/v1/tasks/sync - Create a task with SSE streaming. Waits for completion and streams progress.

Request:

CODEBLOCK57
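The streaming endpoint delivers progress as Server-Sent Events. The event payloads are not described in this section, so the sketch below covers only generic SSE framing: events are separated by a blank line and carry their payload in `data:` lines. `parseSseBuffer` and `SseParseResult` are hypothetical names, and the parser assumes LF or CRLF line endings.

```typescript
interface SseParseResult {
  events: string[]; // the data payload of each complete event
  rest: string; // trailing partial event to prepend to the next chunk
}

// Splits buffered SSE text into complete events and an unfinished remainder.
function parseSseBuffer(buffer: string): SseParseResult {
  const normalized = buffer.replace(/\r\n/g, "\n");
  const blocks = normalized.split("\n\n");
  // The last block has not been terminated by a blank line yet.
  const rest = blocks.pop() ?? "";
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""));
    if (dataLines.length > 0) {
      events.push(dataLines.join("\n"));
    }
  }
  return { events, rest };
}
```

In practice the text would come from reading the response body of `POST /api/v1/tasks/sync` chunk by chunk, with `rest` carried over between reads. Whether each payload is JSON is an assumption to verify against real responses.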

Get Task

GET /api/v1/tasks/{taskId} - Get task status and result.

Request:

CODEBLOCK58

Response:

CODEBLOCK59
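For typed clients, the task response can be modeled with a pair of interfaces. This is a sketch based only on the completed-task response shown in this document's Quick Start; any field not shown there is omitted, optional markers are guesses, and `successfulOutputUrls` is a hypothetical helper.

```typescript
// Shapes mirror the completed-task response in this document; optionality is an assumption.
interface TaskOutput {
  model: string;
  status: string;
  progress: number;
  url?: string;
}

interface Task {
  id: string;
  status: string;
  type: string;
  outputs?: TaskOutput[];
  created_at: string;
  updated_at?: string;
}

// Collects the result URLs of all outputs that finished successfully.
function successfulOutputUrls(task: Task): string[] {
  return (task.outputs ?? [])
    .filter((output) => output.status === "success" && typeof output.url === "string")
    .map((output) => output.url as string);
}
```

A pending task has no outputs yet under this model, so the helper returns an empty array rather than failing.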

List Tasks

GET /api/v1/tasks/list - List tasks with pagination.

Request:

CODEBLOCK60

Response:

CODEBLOCK61
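The pagination parameters for this endpoint are not named in this section. The sketch below therefore only shows how a query string could be attached to the documented path; `buildListTasksUrl` is a hypothetical helper, and `page` and `pageSize` are placeholder parameter names that must be checked against the actual API.

```typescript
// Builds a GET /api/v1/tasks/list URL from arbitrary query parameters.
function buildListTasksUrl(query: Record<string, string | number>): string {
  const url = new URL("https://monet.vision/api/v1/tasks/list");
  for (const [name, value] of Object.entries(query)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

// Hypothetical parameter names, for illustration only.
const listUrl = buildListTasksUrl({ page: 1, pageSize: 20 });
```

Using `URL` and `searchParams` keeps values correctly percent-encoded whatever the real parameter names turn out to be.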

Upload File

POST /api/v1/files - Upload a file to get an online access URL.

📁 File Storage: Uploaded files are stored for 24 hours and will be automatically deleted after expiration.

Request:

CODEBLOCK62

Use Cases:

- Upload reference images for video/image generation tasks
- Upload video files for video processing
- Upload audio files for music tasks
- Get temporary online URLs for file sharing

Response:

CODEBLOCK63
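Because uploaded files are deleted after 24 hours, a client that reuses upload URLs may want to track when each one stops being valid. A minimal sketch with hypothetical helper names; the expiry is computed from the local upload time, which only approximates the server's clock.

```typescript
const UPLOAD_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour retention stated in this section

// Returns the time at which a file uploaded at `uploadedAt` is expected to be deleted.
function uploadExpiresAt(uploadedAt: Date): Date {
  return new Date(uploadedAt.getTime() + UPLOAD_TTL_MS);
}

// True when the file should be uploaded again before it is referenced in a task.
function isUploadExpired(uploadedAt: Date, now: Date = new Date()): boolean {
  return now.getTime() >= uploadExpiresAt(uploadedAt).getTime();
}
```

Leaving a safety margin before the computed expiry is a reasonable precaution, since a long-running task may read the file some time after it is created.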

Configuration

Environment Variables

CODEBLOCK64

Authentication

All API requests require authentication via the Authorization header:

CODEBLOCK65
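Since every request needs the same header, it can be built in one place. A minimal sketch, assuming the Bearer scheme used in this document's request examples; `authHeaders` is a hypothetical helper name.

```typescript
// Builds the Authorization header sent with every API request.
function authHeaders(apiKey: string | undefined): { Authorization: string } {
  if (!apiKey || apiKey.trim() === "") {
    throw new Error("Missing API key: create one at https://monet.vision/skills/keys");
  }
  return { Authorization: `Bearer ${apiKey.trim()}` };
}
```

The key would typically be read from the environment variable configured above and passed in, so a missing key fails fast with a clear message instead of producing an authentication error from the server.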

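Monet AI Skill

A comprehensive AI content generation API designed for AI agents. Monet AI provides unified access to state-of-the-art AI generation models for video (Sora, Veo, Doubao Seedance, Wan, Hailuo, Kling), image (GPT-4o, Nano Banana, Seedream, Flux, Imagen, Ideogram), and music (MiniMax Music) generation. Build intelligent workflows that combine multiple AI capabilities into automated content creation pipelines.

When to Use

Use this skill in the following cases:

- Video Generation: Create AI-generated videos from text prompts using state-of-the-art models

  - Sora: OpenAI's video generation model for high-quality, realistic videos
  - Veo: Google's video generation model
  - Doubao Seedance: ByteDance's AI video model with audio-visual sync
  - Wan: Alibaba's video generation model with excellent localization support
  - Hailuo: Fast video generation with a good balance of quality and speed
  - Kling: Kuaishou's video generation model

- Image Generation: Generate images in various artistic styles from text descriptions

  - GPT-4o: OpenAI's multimodal image generation model
  - Nano Banana: Google's image model with very high character consistency
  - Seedream: ByteDance's intelligent visual reasoning model
  - Wan: Alibaba's visual model for high-quality, expressive image generation
  - Flux: High-quality photorealistic and artistic image generation
  - Imagen: Google's text-to-image model
  - Ideogram: Specialized in text rendering and precise composition

- Music Generation: Create original music and audio from text descriptions

  - MiniMax Music: AI music generation with support for custom lyrics and text-to-music conversion

- AI Agent Integration: Build intelligent workflows that combine multiple AI generation capabilities into automated content creation pipelines

Getting API Key

1. Visit https://monet.vision to register an account
2. After logging in, go to https://monet.vision/skills/keys to create an API key
3. Configure the API key in environment variables or code

If you don't have an API key, ask your owner to apply for one at monet.vision.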

Quick Start

Create a Video Generation Task

```bash
curl -X POST https://monet.vision/api/v1/tasks/async \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MONET_API_KEY" \
  -d '{
    "type": "video",
    "input": {
      "model": "sora-2",
      "prompt": "A cat running in the park",
      "duration": 10,
      "aspect_ratio": "16:9"
    },
    "idempotency_key": "unique-key-123"
  }'
```

⚠️ Important: idempotency_key is required. Use a unique value (e.g., a UUID) to prevent duplicate task creation if the request is retried.

Response:

```json
{
  "id": "task_abc123",
  "status": "pending",
  "type": "video",
  "created_at": "2026-02-27T10:00:00Z"
}
```

Get Task Status and Result

Task processing is asynchronous. Poll the task status until it becomes success or failed. Recommended polling interval: 5 seconds.

```bash
curl https://monet.vision/api/v1/tasks/task_abc123 \
  -H "Authorization: Bearer $MONET_API_KEY"
```

Response when completed:

```json
{
  "id": "task_abc123",
  "status": "success",
  "type": "video",
  "outputs": [
    {
      "model": "sora-2",
      "status": "success",
      "progress": 100,
      "url": "https://files.monet.vision/..."
    }
  ],
  "created_at": "2026-02-27T10:00:00Z",
  "updated_at": "2026-02-27T10:01:30Z"
}
```

Example: Poll until completion

```typescript
const TASK_ID = "task_abc123";
const MONET_API_KEY = process.env.MONET_API_KEY;

async function pollTask() {
  while (true) {
    const response = await fetch(
      `https://monet.vision/api/v1/tasks/${TASK_ID}`,
      {
        headers: {
          Authorization: `Bearer ${MONET_API_KEY}`,
        },
      },
    );

    const data = await response.json();
    const status = data.status;

    if (status === "success") {
      console.log("Task completed successfully!");
      console.log(JSON.stringify(data, null, 2));
      break;
    } else if (status === "failed") {
      console.log("Task failed!");
      console.log(JSON.stringify(data, null, 2));
      break;
    } else {
      console.log(`Task status: ${status}, waiting...`);
      await new Promise((resolve) => setTimeout(resolve, 5000)); // wait 5 seconds
    }
  }
}

pollTask();
```

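Supported Models

Video Generation

Sora (OpenAI)

sora-2 - Sora 2

OpenAI's latest video generation model

- 🎯 Use Cases: Video projects requiring OpenAI's latest technology
- ⏱️ Duration: 10-15 seconds
- 🎵 Features: Audio generation support, reference image support

```typescript
{
  model: "sora-2",
  prompt: string, // required
  images?: string[], // optional: reference images
  duration?: 10 | 15, // optional, default: 10
  aspect_ratio?: "16:9" | "9:16"
}
```

sora-2-pro - Sora 2 Pro

Perfect quality for cinematic scenes

- 🎯 Use Cases: Professional film, advertising, and high-end production
- ⏱️ Duration: 15-25 seconds
- 🎵 Features: Audio generation support, reference image support

```typescript
{
  model: "sora-2-pro",
  prompt: string,
  images?: string[],
  duration?: 15 | 25, // optional, default: 15
  aspect_ratio?: "16:9" | "9:16"
}
```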

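Veo (Google)

veo-3-1-fast - Google Veo 3.1 Fast

Ultra-fast video generation

- 🎯 Use Cases: Video projects requiring fast generation
- ⏱️ Duration: 8 seconds
- 📺 Resolution: 1080p with audio generation support

```typescript
{
  model: "veo-3-1-fast",
  prompt: string,
  images?: string[], // reference images
  aspect_ratio?: "16:9" | "9:16"
}
```

veo-3-1 - Google Veo 3.1

Advanced AI video with sound

- 🎯 Use Cases: Professional-grade video production
- ⏱️ Duration: 8 seconds
- 📺 Resolution: 1080p with audio generation support

```typescript
{
  model: "veo-3-1",
  prompt: string,
  images?: string[],
  aspect_ratio?: "16:9" | "9:16"
}
```

veo-3-fast - Google Veo 3 Fast

30% faster than standard Veo 3

- 🎯 Use Cases: Video projects requiring rapid iteration
- ⏱️ Duration: 8 seconds
- 📺 Resolution: 1080p, supports negative prompts

```typescript
{
  model: "veo-3-fast",
  prompt: string,
  images?: string[],
  negative_prompt?: string // describes content to avoid
}
```

veo-3 - Google Veo 3

High-quality video generation

- 🎯 Use Cases: Standard high-quality video production
- ⏱️ Duration: 8 seconds
- 📺 Resolution: 1080p, supports negative prompts

```typescript
{
  model: "veo-3",
  prompt: string,
  images?: string[],
  negative_prompt?: string
}
```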

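Wan

wan-2-6 - Wan 2.6

Multi-shot and automatic audio

- 🎯 Use Cases: Video production requiring multi-shot switching
- ⏱️ Duration: 5-15 seconds
- 📺 Resolution: 720p-1080p with audio generation support

```typescript
{
  model: "wan-2-6",
  prompt: string,
  images?: string[],
  duration?: 5 | 10 | 15,
  resolution?: "720p" | "1080p",
  aspect_ratio?: "16:9" | "9:16" | "4:3" | 3…
```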
