Category: task
Alibaba Cloud Model Studio Entry (Routing)
Route requests to existing local skills to avoid duplicating model/parameter details.
Prerequisites
- Install the SDK (a virtual environment is recommended to avoid PEP 668 restrictions):

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install dashscope
```
- Configure `DASHSCOPE_API_KEY` (environment variable preferred; alternatively, set `dashscope_api_key` in `~/.alibabacloud/credentials`).
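
The lookup order described above (environment variable first, credentials file second) could be sketched as follows. This is a minimal illustration, not the repo's actual loader: it assumes the credentials file is INI-style with a `[default]` section, which this document does not confirm, and `resolve_api_key` is a hypothetical helper name.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    credentials_path: Path = Path.home() / ".alibabacloud" / "credentials",
    profile: str = "default",  # assumption: INI section name, not confirmed here
) -> Optional[str]:
    """Return the API key, preferring the environment variable over the file."""
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if key:
        return key
    if not credentials_path.is_file():
        return None
    parser = configparser.ConfigParser()
    try:
        parser.read(credentials_path, encoding="utf-8")
    except configparser.Error:
        # The file is not INI-formatted, so the assumption above does not hold.
        return None
    if parser.has_option(profile, "dashscope_api_key"):
        return parser.get(profile, "dashscope_api_key").strip() or None
    return None
```

Passing `env` and `credentials_path` as arguments keeps the helper testable without touching the real environment or home directory.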
Routing Table (currently supported in this repo)
| Need | Target skill |
|---|---|
| Text-to-image / image generation | `skills/ai/image/alicloud-ai-image-qwen-image/` |
| Image editing | `skills/ai/image/alicloud-ai-image-qwen-image-edit/` |
| Text-to-video / image-to-video (i2v) | `skills/ai/video/alicloud-ai-video-wan-video/` |
| Reference-to-video (r2v) | `skills/ai/video/alicloud-ai-video-wan-r2v/` |
| Text-to-speech (TTS) | `skills/ai/audio/alicloud-ai-audio-tts/` |
| Speech recognition/transcription (ASR) | `skills/ai/audio/alicloud-ai-audio-asr/` |
| Realtime speech recognition | `skills/ai/audio/alicloud-ai-audio-asr-realtime/` |
| Realtime TTS | `skills/ai/audio/alicloud-ai-audio-tts-realtime/` |
| Live speech translation | `skills/ai/audio/alicloud-ai-audio-livetranslate/` |
| CosyVoice voice clone | `skills/ai/audio/alicloud-ai-audio-cosyvoice-voice-clone/` |
| CosyVoice voice design | `skills/ai/audio/alicloud-ai-audio-cosyvoice-voice-design/` |
| Voice clone | `skills/ai/audio/alicloud-ai-audio-tts-voice-clone/` |
| Voice design | `skills/ai/audio/alicloud-ai-audio-tts-voice-design/` |
| Omni multimodal interaction | `skills/ai/multimodal/alicloud-ai-multimodal-qwen-omni/` |
| Visual reasoning | `skills/ai/multimodal/alicloud-ai-multimodal-qvq/` |
| Text embeddings | `skills/ai/search/alicloud-ai-search-text-embedding/` |
| Rerank | `skills/ai/search/alicloud-ai-search-rerank/` |
| Vector retrieval | `skills/ai/search/alicloud-ai-search-dashvector/` or `skills/ai/search/alicloud-ai-search-opensearch/` or `skills/ai/search/alicloud-ai-search-milvus/` |
| Document understanding | `skills/ai/text/alicloud-ai-text-document-mind/` |
| Video editing | `skills/ai/video/alicloud-ai-video-wan-edit/` |
| Model list crawl/update | `skills/ai/misc/alicloud-ai-misc-crawl-and-skill/` |
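
For illustration, the routing table can be treated as a plain lookup from a need to one or more skill directories. The sketch below covers only a subset of the rows, and the key names (`text-to-image`, `vector-retrieval`, and so on) and the `route` helper are hypothetical identifiers chosen for this example, not names defined by the repo.

```python
from typing import Dict, List

# Subset of the routing table; keys are hypothetical normalized need names.
ROUTES: Dict[str, List[str]] = {
    "text-to-image": ["skills/ai/image/alicloud-ai-image-qwen-image/"],
    "image-editing": ["skills/ai/image/alicloud-ai-image-qwen-image-edit/"],
    "text-to-video": ["skills/ai/video/alicloud-ai-video-wan-video/"],
    "text-to-speech": ["skills/ai/audio/alicloud-ai-audio-tts/"],
    "rerank": ["skills/ai/search/alicloud-ai-search-rerank/"],
    # Vector retrieval has several candidate skills; the caller picks one.
    "vector-retrieval": [
        "skills/ai/search/alicloud-ai-search-dashvector/",
        "skills/ai/search/alicloud-ai-search-opensearch/",
        "skills/ai/search/alicloud-ai-search-milvus/",
    ],
}


def route(need: str) -> List[str]:
    """Return candidate skill paths for a need, or an empty list if unmatched."""
    key = need.strip().lower().replace(" ", "-")
    return list(ROUTES.get(key, []))
```

An empty result corresponds to the unmatched case: clarify the capability first rather than guessing a skill.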
When Not Matched
- Clarify the model capability and the input/output type first.
- If the capability is missing from the repo, add a new skill first.
Common Missing Capabilities In This Repo (remaining gaps)
- Text generation/chat (LLM)
- Multimodal embeddings
- OCR-specialized extraction and image translation
- Virtual try-on / digital human / advanced video personas

- For multimodal/ASR download failures, prefer public URLs listed above.
- For ASR parameter errors, use a data URI in `input_audio.data`.
- For a multimodal embedding 400 error, ensure `input.contents` is an array.
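
The last two notes can be sketched as small helpers: one builds a data URI for `input_audio.data`, the other guarantees that `input.contents` is an array. Both function names are hypothetical, the default MIME type `audio/wav` is an assumption, and the rest of the request payload is left to the target sub-skill.

```python
import base64
from typing import Any, List


def to_data_uri(audio: bytes, mime_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as a base64 data URI (MIME type is an assumption)."""
    encoded = base64.b64encode(audio).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def as_contents_array(contents: Any) -> List[Any]:
    """Wrap a single item in a list so that input.contents is always an array."""
    if isinstance(contents, list):
        return contents
    if isinstance(contents, tuple):
        return list(contents)
    return [contents]
```

For example, `as_contents_array({"text": "hello"})` yields a one-element list, while an existing list is passed through unchanged.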
Async Task Polling Template (video/long-running tasks)
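When a request sent with the header `X-DashScope-Async: enable` returns a `task_id`, poll as follows:

```
GET https://dashscope.aliyuncs.com/api/v1/tasks/
Authorization: Bearer $DASHSCOPE_API_KEY
```

Example result fields (success):

```
{
  "output": {
    "task_status": "SUCCEEDED",
    "video_url": "https://..."
  }
}
```

Notes:
- Recommended polling interval: 15-20 seconds, with a maximum of 10 attempts.
- After success, download `output.video_url`.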
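
A polling loop that follows these notes might look like the sketch below. It rests on a few assumptions that this document does not state: the `task_id` is appended to the tasks URL, and `FAILED` and `CANCELED` are terminal failure statuses. The function names and the injectable `fetch`/`sleep` parameters are hypothetical and exist mainly to keep the loop testable.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict

TASKS_URL = "https://dashscope.aliyuncs.com/api/v1/tasks/"
# Assumption: only SUCCEEDED is named in this document; these are guesses.
FAILURE_STATES = {"FAILED", "CANCELED"}


def fetch_task(task_id: str) -> Dict[str, Any]:
    """GET the task record; assumes the task_id is appended to the tasks URL."""
    request = urllib.request.Request(
        TASKS_URL + task_id,
        headers={"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_task(
    task_id: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_task,
    interval: float = 15.0,   # recommended range is 15-20 seconds
    max_attempts: int = 10,   # recommended upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task succeeds and return its `output` object."""
    for attempt in range(1, max_attempts + 1):
        output = fetch(task_id).get("output", {})
        status = output.get("task_status")
        if status == "SUCCEEDED":
            return output
        if status in FAILURE_STATES:
            raise RuntimeError(f"task {task_id} ended with status {status}")
        if attempt < max_attempts:
            sleep(interval)  # wait only when another attempt will follow
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} attempts")
```

On success the caller can read `video_url` from the returned object and download it.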
Clarifying questions (ask when uncertain)
1. Are you working with text, image, audio, or video?
2. Is this generation, editing/understanding, or retrieval?
3. Do you need speech (TTS/ASR/live translate) or retrieval (embedding/rerank/vector DB)?
4. Do you want runnable SDK scripts or just API/parameter guidance?
References
- Model list and links: `output/alicloud-model-studio-models-summary.md`
- API/parameters/examples: see the target sub-skill's `SKILL.md` and `references/*.md`
- Official source list: `references/sources.md`
Validation
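```bash
mkdir -p output/alicloud-ai-entry-modelstudio
echo validation_placeholder > output/alicloud-ai-entry-modelstudio/validate.txt
```

Pass criteria: the command exits with code 0 and `output/alicloud-ai-entry-modelstudio/validate.txt` is generated.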
Output And Evidence
- Save artifacts, command outputs, and API response summaries under `output/alicloud-ai-entry-modelstudio/`.
- Include key parameters (region/resource id/time range) in evidence files for reproducibility.
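
One possible way to follow these two rules is to write each evidence record as a JSON file that stores the key parameters next to the response summary. The file layout, the field names (`recorded_at`, `parameters`, `summary`), and the `save_evidence` helper are assumptions made for this sketch; the repo may use a different format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict

OUTPUT_DIR = Path("output/alicloud-ai-entry-modelstudio")


def save_evidence(
    name: str,
    parameters: Dict[str, Any],
    summary: Dict[str, Any],
    output_dir: Path = OUTPUT_DIR,
) -> Path:
    """Write one evidence record as JSON and return the path of the file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "parameters": parameters,  # e.g. region, resource id, time range
        "summary": summary,        # condensed API response, not the raw body
    }
    path = output_dir / f"{name}.json"
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Keeping the parameters in the same file as the summary means a later reader can rerun the operation without consulting other logs.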
Workflow
1) Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
2) Run one minimal read-only query first to verify connectivity and permissions.
3) Execute the target operation with explicit parameters and bounded scope.
4) Verify results and save output/evidence files.