Agent setup: If your agent doesn't auto-load skills (e.g. Claude Code),
see agent-compatibility.md once per session.
Qwen Model Selector (Advisor)
This skill operates in two modes:
1. Interactive advisory: asks diagnostic questions to recommend the right model (see Diagnostic Flow).
2. Cross-skill resolution: provides a fast-path model lookup for execution skills that need a model decision
   without user interaction (see Cross-Skill Model Resolution).
Do not fabricate model names — only recommend models listed in this skill.
This skill is part of qwencloud/qwencloud-ai.
Skill directory
Use this skill's reference files for data and learning. Load on demand — do not fetch external URLs unless the user
explicitly asks for latest data.
| Location | Purpose |
|---|---|
| references/pricing.md | Pricing overview: model categories, billing units, and link to official pricing page |
| references/model-list.md | Model catalog (point-in-time snapshot) |
| references/sources.md | Official documentation URLs (manual lookup only) |
| references/agent-compatibility.md | Agent self-check: register skills in project config for agents that don't auto-load |
Security
NEVER output any API key or credential in plaintext. Always use variable references ($DASHSCOPE_API_KEY in shell,
os.environ["QWEN_API_KEY"] in Python). Any check or detection of credentials must be non-plaintext: report
only status (e.g. "set" / "not set", "valid" / "invalid"), never the value. Never display contents of .env or config
files that may contain secrets.
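
A minimal sketch of a non-plaintext credential check, assuming the two environment variable names mentioned above. The helper name `credential_status` is hypothetical and not part of this skill; it reports only "set" or "not set" and never returns or prints the value.

```python
import os


def credential_status(var_name: str) -> str:
    """Report whether an environment variable is set, without exposing its value."""
    value = os.environ.get(var_name)
    return "set" if value else "not set"


if __name__ == "__main__":
    # Only the variable name and its status are printed, never the secret itself.
    for name in ("DASHSCOPE_API_KEY", "QWEN_API_KEY"):
        print(f"{name}: {credential_status(name)}")
```

An empty string is treated as "not set" here, which is a design choice of this sketch rather than a documented rule.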
Coding Plan Models
Users with a Coding Plan subscription have access to a
limited set of models through their coding tools only:
| Model | Context | Thinking |
|---|---|---|
| qwen3.5-plus | 1M | Yes (budget: 81,920) |
| kimi-k2.5 | 256K | Yes (budget: 81,920) |
| glm-5 | 198K | Yes (budget: 32,768) |
| MiniMax-M2.5 | 192K | Yes (budget: 32,768) |
| qwen3-max-2026-01-23 | 256K | Yes (budget: 81,920) |
| qwen3-coder-next | 256K | No |
| qwen3-coder-plus | 1M | No |
| glm-4.7 | 198K | Yes (budget: 32,768) |
Coding Plan does not include image, video, TTS, or specialized vision models. When recommending models, note if the
user's chosen model falls outside this list and they are using a Coding Plan key (sk-sp-...). If qwencloud-ops-auth is
installed, see its references/codingplan.md for the full model list and error codes.
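
The check described above could be sketched as follows. The model names are copied from the Coding Plan table and the `sk-sp-` prefix from the text; the function names and the warning wording are hypothetical, and the key is only inspected by prefix, never echoed.

```python
from typing import Optional

# Model names copied from the Coding Plan table in this skill.
CODING_PLAN_MODELS = frozenset({
    "qwen3.5-plus", "kimi-k2.5", "glm-5", "MiniMax-M2.5",
    "qwen3-max-2026-01-23", "qwen3-coder-next", "qwen3-coder-plus", "glm-4.7",
})


def is_coding_plan_key(api_key: str) -> bool:
    """Check the key prefix only; the key itself is never printed or returned."""
    return api_key.startswith("sk-sp-")


def coding_plan_warning(model: str, api_key: str) -> Optional[str]:
    """Return a warning when a Coding Plan key is paired with a model outside the list."""
    if is_coding_plan_key(api_key) and model not in CODING_PLAN_MODELS:
        return f"{model} is not in the Coding Plan model list; suggest an alternative."
    return None
```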
Diagnostic Flow
Ask the user (in order):
1. Content type? text / image / video / audio / vision
2. Primary task? generation / understanding / coding / reasoning / translation
3. Priority? quality vs speed vs cost
4. Input size? short / medium / long context
5. Structured output? JSON / function calling needed?
Cross-Skill Model Resolution
When an execution skill needs to choose a model, evaluate across three dimensions: Requirement → Scenario →
Pricing. If the user explicitly specified a model, use it as given — but still verify availability; if
restricted, warn the user and suggest an alternative.
Dimension 1 · Requirement (select)
Match task capability to the right model. Use when the user's need points to a specialized model, or when the task is
ambiguous and you need to compare capabilities.
| Signal | Keywords | Model |
|---|---|---|
| Reasoning | "think step by step", "reason", "analyze" | qwq-plus (text) · qvq-max (vision) |
| Coding | "write code", "implement", "debug" | qwen3-coder-plus |
| OCR / document | "extract text", "OCR", "scan" | qwen-vl-ocr |
| Long context | "long document", "large file" | qwen3.5-plus (1M context) |
| Multimodal (text+image+video) | "analyze image", "understand video" + text | qwen3.5-plus (unified multimodal) |
| Voice interaction / omni | "voice chat", "speak", "listen" | qwen3-omni-flash |
| Built-in tools | "search the web", "run code", "use tools" | qwen3-max (web search, code interpreter) |
| Image editing / style transfer | "edit image", "style transfer", "reference image" | wan2.6-image (preferred) · wan2.5-i2i-preview |
| Image-to-image fusion | "place object", "combine images", "fuse images" | wan2.6-image · wan2.5-i2i-preview |
| Style TTS | "emotion", "tone", "pace" | qwen3-tts-instruct-flash |
| Ambiguous | task doesn't clearly map to one model | compare Recommendation Matrix; ask user to clarify if needed |
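
As a rough illustration of the signal table, the sketch below matches a few of its rows with naive substring search. The table does not prescribe a matching method, so the `SIGNALS` structure and `match_signals` function are assumptions for illustration only. Substring matching will produce false positives (for example, "reason" inside "reasonable"), so zero or multiple matches should be treated as the Ambiguous row.

```python
from typing import List, Tuple

# A subset of the Dimension 1 rows: (signal, keywords, model).
SIGNALS = [
    ("reasoning", ("think step by step", "reason", "analyze"), "qwq-plus"),
    ("coding", ("write code", "implement", "debug"), "qwen3-coder-plus"),
    ("ocr", ("extract text", "ocr", "scan"), "qwen-vl-ocr"),
    ("long context", ("long document", "large file"), "qwen3.5-plus"),
]


def match_signals(request: str) -> List[Tuple[str, str]]:
    """Return every (signal, model) pair whose keywords appear in the request."""
    text = request.lower()
    return [
        (signal, model)
        for signal, keywords, model in SIGNALS
        if any(keyword in text for keyword in keywords)
    ]


if __name__ == "__main__":
    matches = match_signals("Please debug this function")
    # Exactly one match is a clear signal; anything else needs clarification.
    print(matches[0][1] if len(matches) == 1 else "ambiguous")
```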
Dimension 2 · Scenario (tune)
Adjust model tier based on how the model will be used.
| Pattern | Signals | Guidance |
|---|---|---|
| Interactive / real-time | "chat", "real-time", "interactive" | Prefer flash/turbo variants; enable streaming |
| Batch / offline | "batch", "offline", "background" | Quality model + Batch API (50% off) |
| One-off trial | "try", "test", "experiment" | Quality model; check if free quota is still available in user's console |
| High-volume production | "production", "at scale", "high volume" | Cost-optimize: flash/turbo + context cache |
| Repeated context | "template", "same prompt", "repeated" | Enable context caching for input token discount |
Dimension 3 · Pricing (optimize)
Given the candidates from dimensions 1–2, compare costs and apply modifiers.
- Pricing reference: pricing.md. For the latest rates, check the official pricing page.
- Free quota: Some models offer a limited free quota after activation. However, quotas may have been consumed,
  expired, or changed. Never assume remaining free quota; always present the paid unit price.
- Batch API: 50% off both input and output tokens for non-realtime workloads.
- Context cache: Input token discount for repeated/templated contexts.
- Tiered pricing: Some models charge more per token as input length increases; check pricing tables for
  breakpoints.
- When cost is the user's primary concern, explicitly recommend the cheapest viable model and cite the price.
Default
No signals detected, clear task → use the Canonical Default for the domain.
| Domain | Default | Quality | Speed | Cost |
|---|---|---|---|---|
| text.chat | qwen3.5-plus | qwen3-max | qwen3.5-flash | qwen-turbo |
| vision.analyze | qwen3-vl-plus | qwen3-vl-plus | qwen3-vl-flash | qwen3-vl-flash |
| omni (voice+vision) | qwen3-omni-flash | qwen3-omni-flash | qwen3-omni-flash | — |
| image.generate | wan2.6-t2i | wan2.6-t2i | wan2.2-t2i-flash | wan2.2-t2i-flash |
| image.edit | wan2.6-image | wan2.6-image | wan2.5-i2i-preview | wan2.5-i2i-preview |
| video.t2v | wan2.6-t2v | wan2.6-t2v | — | — |
| video.i2v | wan2.6-i2v-flash | wan2.6-i2v | wan2.6-i2v-flash | — |
| audio.tts | qwen3-tts-flash | — | qwen3-tts-flash | — |
Degradation: If this skill is not loaded or not available, each execution skill falls back to its own built-in
default. This protocol is purely additive — it enhances model selection but never blocks execution.
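
The default lookup and its degradation behaviour might be sketched like this, using a few rows from the defaults table. The `resolve_model` function is hypothetical. Falling back to the Default column when a tier has no entry is an assumption of this sketch, and availability checks for a user-specified model are left out.

```python
from typing import Dict, Optional

# A subset of the defaults table; None stands for a tier with no entry.
DEFAULTS: Dict[str, Dict[str, Optional[str]]] = {
    "text.chat": {"default": "qwen3.5-plus", "quality": "qwen3-max",
                  "speed": "qwen3.5-flash", "cost": "qwen-turbo"},
    "vision.analyze": {"default": "qwen3-vl-plus", "quality": "qwen3-vl-plus",
                       "speed": "qwen3-vl-flash", "cost": "qwen3-vl-flash"},
    "video.t2v": {"default": "wan2.6-t2v", "quality": "wan2.6-t2v",
                  "speed": None, "cost": None},
    "audio.tts": {"default": "qwen3-tts-flash", "quality": None,
                  "speed": "qwen3-tts-flash", "cost": None},
}


def resolve_model(domain: str, priority: str = "default",
                  user_model: Optional[str] = None) -> Optional[str]:
    """Pick a model for a domain; None means the caller uses its own built-in default."""
    if user_model:
        return user_model  # an explicit user choice wins (availability check not shown)
    row = DEFAULTS.get(domain)
    if row is None:
        return None  # unknown domain: never block execution
    return row.get(priority) or row["default"]
```

Returning `None` instead of raising keeps the lookup purely additive: an execution skill can always continue with its own default.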
Model Recommendation Matrix
Text Models
| Use Case | Recommended | Why |
|---|---|---|
| General chat/assistant | qwen3.5-plus | Best balance of quality, speed, cost. Also accepts image/video input (multimodal). Thinking enabled by default. |
| Fast responses, low cost | qwen3.5-flash | 3x faster, 70% cheaper than Plus. Thinking enabled by default. |
| Highest quality | qwen3-max | Strongest reasoning. Built-in tools (web search, code interpreter). Supports thinking mode. |
| Code generation | qwen3-coder-next | Best balance of code quality, speed, cost. Agentic coding. qwen3-coder-plus for highest quality. |
| Complex reasoning | qwq-plus | Chain-of-thought reasoning specialist |
| Long documents | qwen3.5-plus | Up to 1M context. For >1M needs, see model-list.md. |
| Budget/high volume | qwen-turbo | Cheapest per-token cost |
Image Models
| Use Case | Recommended | Why |
|---|---|---|
| Best quality text-to-image | wan2.6-t2i | Latest model, sync support |
| Image editing / style transfer (1–4 refs) | wan2.6-image | Multi-image composition, subject consistency, 2K output, interleaved text-image |
| Image editing / multi-image fusion (1–3 refs) | wan2.5-i2i-preview | Simpler prompt-based editing, subject consistency, multi-image fusion |
| Interleaved text-image output (tutorials) | wan2.6-image | Mixed text+image generation |
| Fast iteration | wan2.2-t2i-flash | 50% faster generation |
| Flexible resolution | wan2.5-t2i-preview | Custom aspect ratios |
Video Models
| Use Case | Recommended | Why |
|---|---|---|
| Quick video creation | wan2.6-i2v-flash | Fast, multi-shot narrative |
| High quality | wan2.6-i2v | Best visual quality |
| With audio | wan2.5-i2v-preview | Auto-dubbing support |
Audio Models
| Use Case | Recommended | Why |
|---|---|---|
| Highest quality | INLINECODE10 | Best naturalness, emotional expression, professional scenarios |
| High quality + speed | cosyvoice-v3-flash | Good balance of quality and performance |
| Standard TTS | qwen3-tts-flash | Fast, reliable, multi-language, cost-effective |
| Controlled style | qwen3-tts-instruct-flash | Instruction-guided voice style (tone/emotion) |
Vision Models
| Use Case | Recommended | Why |
|---|---|---|
| Best accuracy | qwen3-vl-plus | Highest vision understanding. Thinking mode supported. 256K context. |
| Fast analysis | qwen3-vl-flash | Quick image understanding. Thinking mode supported. |
| Unified text+vision | qwen3.5-plus | Multimodal (text + image + video). Surpasses qwen3-vl series on many benchmarks. Use when both text quality and vision matter. |
Omni Models
| Use Case | Recommended | Why |
|---|---|---|
| Voice + vision chat | qwen3-omni-flash | Text/image/audio/video → text or speech. 49 voices, 10 languages. Thinking supported. |
| Real-time voice | qwen3-omni-flash-realtime | Streaming audio input + built-in VAD. 49 voices. |
Pricing Guidance
- Default pricing: pricing.md (International, USD). For the latest rates, check the official pricing page.
- Latest prices: When the user explicitly asks for exact/latest pricing, see sources.md for official URLs.
- Cost formula: Cost = Tokens ÷ 1,000,000 × Unit price. 1K Chinese chars ≈ 1,200-1,500 tokens.
- Free quota: Some models offer a limited free quota after activation, but quotas may have been consumed, expired,
  or changed without notice. Always present the paid unit price first. Mention free quota only as something the user
  should verify in their QwenCloud console.
- Use Batch calling for 50% off in non-realtime scenarios
- Enable context cache for repeated contexts
- Use flash/turbo series for non-critical tasks
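
The cost formula and the Batch discount above can be worked through in a short sketch. The function name is hypothetical, and the unit prices in the example call are placeholders chosen for easy arithmetic, not real rates. Real figures must come from references/pricing.md or the official pricing page.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float,
                  batch: bool = False) -> float:
    """Estimate cost as Tokens / 1,000,000 * Unit price, summed over input and output."""
    cost = (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)
    if batch:
        cost *= 0.5  # Batch calling: 50% off both input and output tokens
    return cost


if __name__ == "__main__":
    # PLACEHOLDER prices (2.0 and 4.0 per million tokens), for illustration only.
    approx = estimate_cost(1_000_000, 500_000, 2.0, 4.0)
    print(f"approximately {approx:.2f} (estimate; verify against the actual bill)")
```

The sketch ignores tiered pricing and context-cache discounts, so any estimate built on it should state that assumption explicitly.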
Cost Estimation Disclaimer (MANDATORY)
🚨 CRITICAL — NO EXCEPTIONS: NEVER fabricate, invent, or guess any price figure. If you do not have a
confirmed price from references/pricing.md or the official pricing page, you MUST NOT output any number.
Instead, direct the user to
the official pricing page.
Outputting a made-up price is a critical failure — worse than saying "I don't know."
When responding to any cost-related query — including but not limited to price evaluation, usage estimation, budget
forecasting, or cost comparison — you MUST append a professional disclaimer. This applies regardless of language or
response format.
Required disclaimer (Chinese response):
⚠️ 费用说明:以上费用为基于官方公示单价的预估价格,仅供参考。实际费用受 Token
消耗量、上下文长度阶梯定价、Batch/缓存折扣及计费策略调整等因素影响,请以QwenCloud控制台的实际账单为准。部分模型可能提供限时免费额度,但免费额度的可用性、额度量及有效期随时可能调整,请在控制台确认您的账户是否仍有剩余额度,切勿假设本次调用免费。最新定价详见 模型定价页。
Required disclaimer (English response):
⚠️ Pricing Notice: The cost figures above are estimates calculated from officially published unit prices and
are provided for reference only. Actual charges depend on token consumption, tiered context-length pricing,
Batch/cache discounts, and billing policy updates. Some models may offer a time-limited free quota, but
quota availability, amounts, and validity periods are subject to change — do not assume this call is free. Please
verify your remaining quota in
the QwenCloud console and refer to the actual
bill for definitive costs. See Model Pricing for
the latest rates.
Rules:
- The disclaimer must appear at the end of every cost-related response, clearly separated from the main content.
- When the estimate involves assumptions (e.g., average tokens per character, assumed context length tier), explicitly
  state each assumption used in the calculation.
- Never present estimated costs as exact or guaranteed amounts. Use hedging language such as "approximately", "estimated
  at", "roughly" (or Chinese equivalents "约", "预估", "约合") throughout the cost breakdown.
- Never tell the user a call will be free or cost $0/¥0. Even if a free quota exists, the user may have already
  consumed it. Always present the paid price and note that a free quota may apply, subject to the user verifying in
  their console.
- If pricing data is unavailable or uncertain, say so explicitly and link to the official pricing page. Never fill
  the gap with a guess.
Available Models
All standard text, vision, image, video, audio, and coding models are available. Some models offer free
quota (verify in console).
- Text: qwen3-max, qwen3.5-plus, qwen3.5-flash, qwen-turbo, qwq-plus, qwen3-coder-next/plus/flash, qwen-plus-character, qwen-plus-character-ja, qwen-flash-character
- Vision: qwen3-vl-plus, qwen3-vl-flash, qvq-max, qwen-vl-ocr, qwen-vl-max, qwen-vl-plus
- Omni: qwen3-omni-flash (+ realtime), qwen-omni-turbo (+ realtime)
- Image generation (text-to-image): wan2.6-t2i, wan2.5-t2i-preview, wan2.2-t2i-flash, z-image-turbo
- Image editing (requires reference images): wan2.6-image, wan2.5-i2i-preview
- Video generation: wan2.6 series (t2v, i2v, i2v-flash, r2v, r2v-flash), wan2.5/2.2 series, vace
- TTS: qwen3-tts-flash, qwen3-tts-instruct-flash, cosyvoice-v3 series
- ASR: qwen3-asr-flash, fun-asr
- Embedding/Rerank: text-embedding-v4, qwen3-rerank
- Translation: qwen-mt-plus/flash/lite/turbo
⚠️ Important: The model list above is a point-in-time snapshot and may be outdated. Model availability
changes frequently. Always check the official model list
for the authoritative, up-to-date catalog before making model decisions.
See model-list.md for a more detailed local reference.
Thinking Mode
Several models support hybrid thinking/non-thinking modes:
| Model | Thinking Default | Notes |
|---|---|---|
| qwen3.5-plus | On | Thinking enabled by default. Use enable_thinking: false to disable. |
| qwen3.5-flash | On | Thinking enabled by default. |
| qwen3-max | Off | Use enable_thinking: true for complex reasoning. Built-in tools available in thinking mode. |
| qwen-plus / qwen-flash / qwen-turbo | Off | Hybrid; enable for deeper reasoning at higher output cost. |
| qwen3-vl-plus / qwen3-vl-flash | Off | Vision + thinking for complex visual analysis. |
| qwen3-omni-flash | Off | Thinking supported; audio output not available in thinking mode. |
| qwq-plus / qvq-max | Always on | Pure reasoning models; CoT always active. |
Guidance: Do not enable thinking by default for simple or conversational tasks — it increases latency and output
token cost. Enable only when the user explicitly asks for deep reasoning or the task requires multi-step analysis.
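
One way to apply this guidance is to send an explicit enable_thinking value only when the desired behaviour differs from the model's default. The sketch below covers a subset of the table. The helper name and the returned dict shape are assumptions, and how the parameter is attached to an actual API request is not specified in this skill.

```python
from typing import Dict

# Thinking defaults copied from the table above (subset).
THINKING_DEFAULT: Dict[str, bool] = {
    "qwen3.5-plus": True,
    "qwen3.5-flash": True,
    "qwen3-max": False,
    "qwen3-vl-plus": False,
    "qwen3-vl-flash": False,
    "qwen3-omni-flash": False,
}
# Pure reasoning models: chain-of-thought is always active and cannot be toggled.
ALWAYS_ON = frozenset({"qwq-plus", "qvq-max"})


def thinking_override(model: str, needs_deep_reasoning: bool) -> Dict[str, bool]:
    """Return {'enable_thinking': ...} only when the model default must be changed."""
    if model in ALWAYS_ON or model not in THINKING_DEFAULT:
        return {}
    if THINKING_DEFAULT[model] == needs_deep_reasoning:
        return {}  # the default already matches; send nothing
    return {"enable_thinking": needs_deep_reasoning}
```

For a simple conversational request on qwen3.5-plus this yields `{"enable_thinking": False}`, which avoids the extra latency and output tokens noted above.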
Anti-Patterns
- Only recommend models listed in this skill; never fabricate model names.
- When unsure, use qwen3.5-plus as a safe default for text tasks.
- 🚨 NEVER invent or guess any price figure. Only use pricing from references/pricing.md or the official pricing
  page. If the data is not available, say so and link to the official page. Fabricating a price is a critical failure.
- Always cite data source when providing pricing info.
- Default currency is USD. Check the official pricing page for the latest rates. Writing in Chinese does NOT imply CNY.
- All cost estimates must include the mandatory disclaimer and use hedging language ("approximately", "estimated").
  Omitting the disclaimer is a critical violation.
- Never assume free quota is available: free quotas may have been consumed, expired, or removed. Never tell the
  user a call will cost $0/¥0. Always present the paid unit price first; mention free quota only as a possibility the
  user should verify in their console.
- Do NOT proactively fetch URLs or trigger web searches. Only access online sources when the user explicitly asks
  for "latest" data AND reference files cannot answer.
- Never output API keys in plaintext (see Security section above).
Online Information Sources
Reference files in this skill are point-in-time snapshots. For official documentation URLs,
see sources.md.
Do NOT proactively fetch or search online. Only access online sources when ALL of these conditions are met:
1. The user explicitly asks for "latest", "real-time", or "current" data.
2. The reference files in this skill cannot answer the question.
3. The user confirms they want an online lookup.
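
The three conditions above form a simple conjunction, which could be expressed as a gate like the one below. The function and parameter names are hypothetical.

```python
def may_fetch_online(user_asked_for_latest: bool,
                     references_cannot_answer: bool,
                     user_confirmed_lookup: bool) -> bool:
    """Allow an online lookup only when all three conditions hold."""
    return (user_asked_for_latest
            and references_cannot_answer
            and user_confirmed_lookup)
```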
Update Check (MANDATORY Post-Execution)
When the user asks to check for updates (e.g. "check for updates", "check version", "is there a new version", "update
skills"):
1. Find qwencloud-update-check: Look for qwencloud-update-check/SKILL.md in sibling skill directories.
2. If found, run:
   python3 <qwencloud-update-check-dir>/scripts/check_update.py --print-response and report the result.
   Use --force if the user asks to force-check.
3. If not found, you MUST present the following options and wait for the user's response:

The update-check skill is not installed. It keeps you informed about new qwencloud/qwencloud-ai features and bug fixes.
1) Install now: I'll run the install command for you
2) Skip for now: do nothing this time
3) Never remind me: permanently suppress this prompt

After the user responds:
- User chooses 1 → run:
  npx skills add QwenCloud/qwencloud-ai --skill qwencloud-update-check -y, then run:
  python3 <this-skill-scripts-dir>/gossamer.py --dismiss
- User chooses 2 → run:
  python3 <this-skill-scripts-dir>/gossamer.py --dismiss, then continue normally
- User chooses 3 → run:
  python3 <this-skill-scripts-dir>/gossamer.py --never-install, then continue normally
References
- pricing.md: Pricing overview with model categories, billing units, and link to official pricing page
- model-list.md: Model catalog (point-in-time snapshot; check official model list for latest)
- sources.md: Official documentation URLs (for manual lookup only)
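Agent setup: If your agent doesn't auto-load skills (e.g. Claude Code),
see agent-compatibility.md once per session.
Qwen Model Selector (Advisor)
This skill operates in two modes:
1. Interactive advisory: asks diagnostic questions to recommend the right model (see Diagnostic Flow).
2. Cross-skill resolution: provides a fast-path model lookup for execution skills that need a model decision without user interaction (see Cross-Skill Model Resolution).
Do not fabricate model names; only recommend models listed in this skill.
This skill is part of qwencloud/qwencloud-ai.
Skill directory
Use this skill's reference files for data and learning. Load on demand; do not fetch external URLs unless the user explicitly asks for latest data.
| Location | Purpose |
|---|---|
| references/pricing.md | Pricing overview: model categories, billing units, and link to official pricing page |
| references/model-list.md | Model catalog (point-in-time snapshot) |
| references/sources.md | Official documentation URLs (manual lookup only) |
| references/agent-compatibility.md | Agent self-check: register skills in project config for agents that don't auto-load |
Security
NEVER output any API key or credential in plaintext. Always use variable references ($DASHSCOPE_API_KEY in shell, os.environ["QWEN_API_KEY"] in Python). Any check or detection of credentials must be non-plaintext: report only status (e.g. "set" / "not set", "valid" / "invalid"), never the value. Never display contents of .env or config files that may contain secrets.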
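Coding Plan Models
Users with a Coding Plan subscription have access to a limited set of models through their coding tools only:
| Model | Context | Thinking |
|---|---|---|
| qwen3.5-plus | 1M | Yes (budget: 81,920) |
| kimi-k2.5 | 256K | Yes (budget: 81,920) |
| glm-5 | 198K | Yes (budget: 32,768) |
| MiniMax-M2.5 | 192K | Yes (budget: 32,768) |
| qwen3-max-2026-01-23 | 256K | Yes (budget: 81,920) |
| qwen3-coder-next | 256K | No |
| qwen3-coder-plus | 1M | No |
| glm-4.7 | 198K | Yes (budget: 32,768) |
Coding Plan does not include image, video, TTS, or specialized vision models. When recommending models, note if the user's chosen model falls outside this list and they are using a Coding Plan key (sk-sp-...). If qwencloud-ops-auth is installed, see its references/codingplan.md for the full model list and error codes.
Diagnostic Flow
Ask the user (in order):
1. Content type? text / image / video / audio / vision
2. Primary task? generation / understanding / coding / reasoning / translation
3. Priority? quality vs speed vs cost
4. Input size? short / medium / long context
5. Structured output? JSON / function calling needed?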
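Cross-Skill Model Resolution
When an execution skill needs to choose a model, evaluate across three dimensions: Requirement → Scenario → Pricing. If the user explicitly specified a model, use it as given, but still verify availability; if restricted, warn the user and suggest an alternative.
Dimension 1 · Requirement (select)
Match task capability to the right model. Use when the user's need points to a specialized model, or when the task is ambiguous and you need to compare capabilities.
| Signal | Keywords | Model |
|---|---|---|
| Reasoning | "think step by step", "reason", "analyze" | qwq-plus (text) · qvq-max (vision) |
| Coding | "write code", "implement", "debug" | qwen3-coder-plus |
| OCR / document | "extract text", "OCR", "scan" | qwen-vl-ocr |
| Long context | "long document", "large file" | qwen3.5-plus (1M context) |
| Multimodal (text+image+video) | "analyze image", "understand video" + text | qwen3.5-plus (unified multimodal) |
| Voice interaction / omni | "voice chat", "speak", "listen" | qwen3-omni-flash |
| Built-in tools | "search the web", "run code", "use tools" | qwen3-max (web search, code interpreter) |
| Image editing / style transfer | "edit image", "style transfer", "reference image" | wan2.6-image (preferred) · wan2.5-i2i-preview |
| Image-to-image fusion | "place object", "combine images", "fuse images" | wan2.6-image · wan2.5-i2i-preview |
| Style TTS | "emotion", "tone", "pace" | qwen3-tts-instruct-flash |
| Ambiguous | task doesn't clearly map to one model | compare Recommendation Matrix; ask user to clarify if needed |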
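Dimension 2 · Scenario (tune)
Adjust model tier based on how the model will be used.
| Pattern | Signals | Guidance |
|---|---|---|
| Interactive / real-time | "chat", "real-time", "interactive" | Prefer flash/turbo variants; enable streaming |
| Batch / offline | "batch", "offline", "background" | Quality model + Batch API (50% off) |
| One-off trial | "try", "test", "experiment" | Quality model; check if free quota is still available in user's console |
| High-volume production | "production", "at scale", "high volume" | Cost-optimize: flash/turbo + context cache |
| Repeated context | "template", "same prompt", "repeated" | Enable context caching for input token discount |
Dimension 3 · Pricing (optimize)
Given the candidates from dimensions 1–2, compare costs and apply modifiers.
- Pricing reference: pricing.md. For the latest rates, check the official pricing page.
- Free quota: Some models offer a limited free quota after activation. However, quotas may have been consumed, expired, or changed. Never assume remaining free quota; always present the paid unit price.
- Batch API: 50% off both input and output tokens for non-realtime workloads.
- Context cache: Input token discount for repeated/templated contexts.
- Tiered pricing: Some models charge more per token as input length increases; check pricing tables for breakpoints.
- When cost is the user's primary concern, explicitly recommend the cheapest viable model and cite the price.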
Default
No signals detected, clear task → use the Canonical Default for the domain.
| Domain | Default | Quality | Speed | Cost |
|---|---|---|---|---|
| text.chat | qwen3.5-plus | qwen3-max | qwen3.5-flash | qwen-turbo |
| vision.analyze | qwen3-vl-plus | qwen3-vl-plus | qwen3-vl-flash | qwen3-vl-flash |
| omni (voice+vision) | qwen3-omni-flash | qwen3-omni-flash | qwen3-omni-flash | — |
| image.generate | wan2.6-t2i | wan2.6-t2i | wan2.2-t2i-flash | wan2.2-t2i-flash |
| image.edit | wan2.6-image | wan2.6-image | wan2.5-i2i-preview | wan2.5-i2i-preview |
| video.t2v | wan2.6-t2v | wan2.6-t2v | — | — |
| video.i2v | wan2.6-i2v-flash | wan2.6-i2v | wan2.6