Category: task
Alibaba Cloud Model Studio Entry (Routing)
Route requests to existing local skills to avoid duplicating model/parameter details.
Prerequisites
- Install the SDK (a virtual environment is recommended to avoid PEP 668 restrictions):
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
```
- Configure `DASHSCOPE_API_KEY` (environment variable preferred; or `dashscope_api_key` in `~/.alibabacloud/credentials`).
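The lookup order described above (environment variable first, credentials file second) can be sketched as follows. This is a minimal illustration, not the SDK's own resolution logic: the helper name `resolve_api_key` is hypothetical, and the sketch assumes the credentials file is INI-formatted with a `[default]` section, which the text above does not confirm.

```python
import configparser
import os
from pathlib import Path


def resolve_api_key(env=None, credentials_path=None, profile="default"):
    """Return the DashScope API key, or None if it cannot be found.

    Hypothetical helper: prefers the DASHSCOPE_API_KEY environment variable,
    then falls back to dashscope_api_key in the credentials file.
    """
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if key:
        return key

    # Assumption: the file is INI-formatted and the key sits in [default].
    if credentials_path is None:
        credentials_path = Path.home() / ".alibabacloud" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(credentials_path)  # a missing file is silently skipped
    if parser.has_option(profile, "dashscope_api_key"):
        return parser.get(profile, "dashscope_api_key").strip()
    return None
```

Calling `resolve_api_key()` with no arguments reads the real environment and the default credentials path; the parameters exist only to make the lookup easy to test.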
Routing Table (currently supported in this repo)
| Need | Target skill |
|---|---|
| Text generation / reasoning / tool-calling | `skills/ai/text/aliyun-qwen-generation/` |
| Coding / repository reasoning | `skills/ai/code/aliyun-qwen-coder/` |
| Deep multi-step research | `skills/ai/research/aliyun-qwen-deep-research/` |
| Text-to-image / image generation | `skills/ai/image/aliyun-qwen-image/` |
| Image editing | `skills/ai/image/aliyun-qwen-image-edit/` |
| Text-to-video / image-to-video (t2v/i2v) | `skills/ai/video/aliyun-wan-video/` |
| Non-Wan PixVerse video generation | `skills/ai/video/aliyun-pixverse-generation/` |
| Reference-to-video (r2v) | `skills/ai/video/aliyun-wan-r2v/` |
| Digital human talking / singing avatar | `skills/ai/video/aliyun-wan-digital-human/` |
| Expressive portrait video (EMO) | `skills/ai/video/aliyun-emo/` |
| Lightweight portrait animation (LivePortrait) | `skills/ai/video/aliyun-liveportrait/` |
| Motion transfer / dancing avatar (AnimateAnyone) | `skills/ai/video/aliyun-animate-anyone/` |
| Emoji / meme portrait video | `skills/ai/video/aliyun-emoji/` |
| Text-to-speech (TTS) | `skills/ai/audio/aliyun-qwen-tts/` |
| Speech recognition / transcription (ASR) | `skills/ai/audio/aliyun-qwen-asr/` |
| Realtime speech recognition | `skills/ai/audio/aliyun-qwen-asr-realtime/` |
| Realtime TTS | `skills/ai/audio/aliyun-qwen-tts-realtime/` |
| Live speech translation | `skills/ai/audio/aliyun-qwen-livetranslate/` |
| CosyVoice voice clone | `skills/ai/audio/aliyun-cosyvoice-voice-clone/` |
| CosyVoice voice design | `skills/ai/audio/aliyun-cosyvoice-voice-design/` |
| Voice clone | `skills/ai/audio/aliyun-qwen-tts-voice-clone/` |
| Voice design | `skills/ai/audio/aliyun-qwen-tts-voice-design/` |
| Omni multimodal interaction | `skills/ai/multimodal/aliyun-qwen-omni/` |
| Visual reasoning | `skills/ai/multimodal/aliyun-qvq/` |
| OCR / document parsing / table parsing | `skills/ai/multimodal/aliyun-qwen-ocr/` |
| Text embeddings | `skills/ai/search/aliyun-qwen-text-embedding/` |
| Multimodal embeddings | `skills/ai/search/aliyun-qwen-multimodal-embedding/` |
| Rerank | `skills/ai/search/aliyun-qwen-rerank/` |
| Vector retrieval | `skills/ai/search/aliyun-dashvector-search/` or `skills/ai/search/aliyun-opensearch-search/` or `skills/ai/search/aliyun-milvus-search/` |
| Document understanding | `skills/ai/text/aliyun-docmind-extract/` |
| Video editing | `skills/ai/video/aliyun-wan-edit/` |
| Video lip-sync replacement / retalk | `skills/ai/video/aliyun-videoretalk/` |
| Model list crawl/update | `skills/ai/misc/aliyun-modelstudio-crawl-and-skill/` |
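If the routing needs to be done programmatically, the table can be treated as a plain lookup. The sketch below covers only a few rows; the short keys (`"rerank"`, `"tts"`, and so on) and the `route` function are hypothetical names chosen for illustration, while the paths are copied from the table.

```python
# Subset of the routing table; the keys are illustrative labels, not repo identifiers.
ROUTES = {
    "text-generation": "skills/ai/text/aliyun-qwen-generation/",
    "coding": "skills/ai/code/aliyun-qwen-coder/",
    "image-editing": "skills/ai/image/aliyun-qwen-image-edit/",
    "tts": "skills/ai/audio/aliyun-qwen-tts/",
    "asr": "skills/ai/audio/aliyun-qwen-asr/",
    "rerank": "skills/ai/search/aliyun-qwen-rerank/",
}


def route(need):
    """Return the target skill path for a need, or None when nothing matches."""
    return ROUTES.get(need.strip().lower())
```

A `None` result corresponds to the unmatched case: clarify the capability first rather than guessing a skill.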
When Not Matched
- Clarify the model capability and input/output type first.
- If the capability is missing from the repo, add a new skill first.
Common Missing Capabilities In This Repo (remaining gaps)
- image translation
- virtual try-on / digital human / advanced video personas

- For multimodal/ASR download failures, prefer public URLs listed above.
- For ASR parameter errors, use a data URI in `input_audio.data`.
- For multimodal embedding 400 errors, ensure `input.contents` is an array.
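The last two fixes can be illustrated with a short sketch. The helper names `to_data_uri` and `as_contents` are hypothetical, and the `{"text": ...}` item shape is an assumption; only the `input_audio.data` and `input.contents` field names come from the notes above, so check the target sub-skill for the full request shape.

```python
import base64


def to_data_uri(audio_bytes, mime="audio/wav"):
    """Encode raw audio bytes as a data URI suitable for input_audio.data."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def as_contents(items):
    """Wrap a single item in a list so input.contents is always an array."""
    return list(items) if isinstance(items, (list, tuple)) else [items]


# Request fragments only; surrounding fields depend on the target skill.
asr_fragment = {"input_audio": {"data": to_data_uri(b"abc")}}
embedding_fragment = {"input": {"contents": as_contents({"text": "hello"})}}
```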
Async Task Polling Template (video/long-running tasks)
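When a request sent with the `X-DashScope-Async: enable` header returns a `task_id`, poll the task as follows:
```
GET https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>
Authorization: Bearer $DASHSCOPE_API_KEY
```
Example result fields (success):
```json
{
  "output": {
    "task_status": "SUCCEEDED",
    "video_url": "https://..."
  }
}
```
Notes:
- Recommended polling interval: 15-20 seconds, with a maximum of 10 attempts.
- After success, download `output.video_url`.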
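A minimal polling loop following the template and notes above might look like this. It is a sketch under stated assumptions: `fetch_task` and `poll_task` are hypothetical names, the request uses only the standard library rather than the `dashscope` SDK, and the terminal failure statuses (`FAILED`, `CANCELED`, `UNKNOWN`) are assumed rather than taken from this document.

```python
import json
import time
import urllib.request

TASKS_URL = "https://dashscope.aliyuncs.com/api/v1/tasks/"


def fetch_task(task_id, api_key):
    """GET the task record and return the decoded JSON body."""
    req = urllib.request.Request(
        TASKS_URL + task_id,
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_task(task_id, api_key, fetch=fetch_task, interval=15,
              max_attempts=10, sleep=time.sleep):
    """Poll until the task succeeds; return its `output` object."""
    for attempt in range(max_attempts):
        output = fetch(task_id, api_key).get("output", {})
        status = output.get("task_status")
        if status == "SUCCEEDED":
            return output
        # Assumption: these are the terminal non-success statuses.
        if status in ("FAILED", "CANCELED", "UNKNOWN"):
            raise RuntimeError(f"task {task_id} ended with status {status}")
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next attempt
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} attempts")
```

On success, the returned dictionary carries `video_url`, which can then be downloaded. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access or real delays.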
Clarifying questions (ask when uncertain)
1. Are you working with text, image, audio, or video?
2. Is this generation, editing/understanding, or retrieval?
3. Do you need speech (TTS/ASR/live translate) or retrieval (embedding/rerank/vector DB)?
4. Do you want runnable SDK scripts or just API/parameter guidance?
References
- Model list and links: `output/alicloud-model-studio-models-summary.md`
- API/parameters/examples: see the target sub-skill's `SKILL.md` and `references/*.md`
- Official source list: `references/sources.md`
Validation
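```bash
mkdir -p output/aliyun-modelstudio-entry
echo validation_placeholder > output/aliyun-modelstudio-entry/validate.txt
```
Pass criteria: the command exits with code 0 and `output/aliyun-modelstudio-entry/validate.txt` is generated.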
Output And Evidence
- Save artifacts, command outputs, and API response summaries under `output/aliyun-modelstudio-entry/`.
- Include key parameters (region/resource id/time range) in evidence files for reproducibility.
Workflow
1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
2) Run one minimal read-only query first to verify connectivity and permissions.
3) Execute the target operation with explicit parameters and bounded scope.
4) Verify results and save output/evidence files.
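Skill name: aliyun-modelstudio-entry
Detailed description:
Category: task
Alibaba Cloud Model Studio Entry (Routing)
Route requests to existing local skills to avoid duplicating model/parameter details.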
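Prerequisites
- Install the SDK (a virtual environment is recommended to avoid PEP 668 restrictions):
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
```
- Configure `DASHSCOPE_API_KEY` (environment variable preferred; or `dashscope_api_key` in `~/.alibabacloud/credentials`).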
Routing Table (currently supported in this repo)
| Need | Target skill |
|---|---|
| Text generation / reasoning / tool-calling | `skills/ai/text/aliyun-qwen-generation/` |
| Coding / repository reasoning | `skills/ai/code/aliyun-qwen-coder/` |
| Deep multi-step research | `skills/ai/research/aliyun-qwen-deep-research/` |
| Text-to-image / image generation | `skills/ai/image/aliyun-qwen-image/` |
| Image editing | `skills/ai/image/aliyun-qwen-image-edit/` |
| Text-to-video / image-to-video (t2v/i2v) | `skills/ai/video/aliyun-wan-video/` |
| Non-Wan PixVerse video generation | `skills/ai/video/aliyun-pixverse-generation/` |
| Reference-to-video (r2v) | `skills/ai/video/aliyun-wan-r2v/` |
| Digital human talking / singing avatar | `skills/ai/video/aliyun-wan-digital-human/` |
| Expressive portrait video (EMO) | `skills/ai/video/aliyun-emo/` |
| Lightweight portrait animation (LivePortrait) | `skills/ai/video/aliyun-liveportrait/` |
| Motion transfer / dancing avatar (AnimateAnyone) | `skills/ai/video/aliyun-animate-anyone/` |
| Emoji / meme portrait video | `skills/ai/video/aliyun-emoji/` |
| Text-to-speech (TTS) | `skills/ai/audio/aliyun-qwen-tts/` |
| Speech recognition / transcription (ASR) | `skills/ai/audio/aliyun-qwen-asr/` |
| Realtime speech recognition | `skills/ai/audio/aliyun-qwen-asr-realtime/` |
| Realtime TTS | `skills/ai/audio/aliyun-qwen-tts-realtime/` |
| Live speech translation | `skills/ai/audio/aliyun-qwen-livetranslate/` |
| CosyVoice voice clone | `skills/ai/audio/aliyun-cosyvoice-voice-clone/` |
| CosyVoice voice design | `skills/ai/audio/aliyun-cosyvoice-voice-design/` |
| Voice clone | `skills/ai/audio/aliyun-qwen-tts-voice-clone/` |
| Voice design | `skills/ai/audio/aliyun-qwen-tts-voice-design/` |
| Omni multimodal interaction | `skills/ai/multimodal/aliyun-qwen-omni/` |
| Visual reasoning | `skills/ai/multimodal/aliyun-qvq/` |
| OCR / document parsing / table parsing | `skills/ai/multimodal/aliyun-qwen-ocr/` |
| Text embeddings | `skills/ai/search/aliyun-qwen-text-embedding/` |
| Multimodal embeddings | `skills/ai/search/aliyun-qwen-multimodal-embedding/` |
| Rerank | `skills/ai/search/aliyun-qwen-rerank/` |
| Vector retrieval | `skills/ai/search/aliyun-dashvector-search/` or `skills/ai/search/aliyun-opensearch-search/` or `skills/ai/search/aliyun-milvus-search/` |
| Document understanding | `skills/ai/text/aliyun-docmind-extract/` |
| Video editing | `skills/ai/video/aliyun-wan-edit/` |
| Video lip-sync replacement / retalk | `skills/ai/video/aliyun-videoretalk/` |
| Model list crawl/update | `skills/ai/misc/aliyun-modelstudio-crawl-and-skill/` |
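When Not Matched
- Clarify the model capability and input/output type first.
- If the capability is missing from the repo, add a new skill first.
Common Missing Capabilities In This Repo (remaining gaps)
- image translation
- virtual try-on / digital human / advanced video personas

- For multimodal/ASR download failures, prefer public URLs listed above.
- For ASR parameter errors, use a data URI in `input_audio.data`.
- For multimodal embedding 400 errors, ensure `input.contents` is an array.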
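Async Task Polling Template (video/long-running tasks)
When a request sent with the `X-DashScope-Async: enable` header returns a `task_id`, poll the task as follows:
```
GET https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>
Authorization: Bearer $DASHSCOPE_API_KEY
```
Example result fields (success):
```json
{
  "output": {
    "task_status": "SUCCEEDED",
    "video_url": "https://..."
  }
}
```
Notes:
- Recommended polling interval: 15-20 seconds, with a maximum of 10 attempts.
- After success, download `output.video_url`.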
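Clarifying questions (ask when uncertain)
1. Are you working with text, image, audio, or video?
2. Is this generation, editing/understanding, or retrieval?
3. Do you need speech (TTS/ASR/live translate) or retrieval (embedding/rerank/vector DB)?
4. Do you want runnable SDK scripts or just API/parameter guidance?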
References
- Model list and links: `output/alicloud-model-studio-models-summary.md`
- API/parameters/examples: see the target sub-skill's `SKILL.md` and `references/*.md`
- Official source list: `references/sources.md`
Validation
```bash
mkdir -p output/aliyun-modelstudio-entry
echo validation_placeholder > output/aliyun-modelstudio-entry/validate.txt
```
Pass criteria: the command exits with code 0 and `output/aliyun-modelstudio-entry/validate.txt` is generated.
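Output And Evidence
- Save artifacts, command outputs, and API response summaries under `output/aliyun-modelstudio-entry/`.
- Include key parameters (region/resource id/time range) in evidence files for reproducibility.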
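Workflow
1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
2) Run one minimal read-only query first to verify connectivity and permissions.
3) Execute the target operation with explicit parameters and bounded scope.
4) Verify results and save output/evidence files.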