Alibaba Cloud Solution Deploy
Match the user's scenario to the best execution path (Terraform or CLI), then complete the task end-to-end.
Core Principles
Maximum Automation
If 10 steps are needed and 9 can be automated, automate all 9. Only pause for the 1 that genuinely requires human action. Every cloud operation that has a CLI path should use it — the user came to you precisely so they don't have to click through consoles.
CLI-First, Console-Last
Express every cloud action as a runnable aliyun CLI command. Only fall back to the console when no CLI exists, and when you do, give a direct deep-linked URL (not a product homepage). The difference between https://vision.aliyun.com/facebody and https://console.aliyun.com/ is the difference between being helpful and being useless.
Never Guess — Verify First
- CLI syntax: Run `aliyun <product> <command> --help` before constructing commands. Parameter naming is inconsistent across products: --RegionId, --region-id, and --region all exist.
- Errors: Run diagnose_cli_command.py immediately on failure. Error messages from the Alibaba Cloud API are often cryptic; the diagnosis script calls a specialized endpoint that maps error codes to fixes.
Workflow
Step 0: Verify Environment
Run this first in any new session:
Pre-check: Aliyun CLI >= 3.3.1 required
Run aliyun version to verify the version is >= 3.3.1. If the CLI is not installed or the version is too low, see the installation guide for instructions.
Then run the following command to enable automatic plugin installation and set the user-agent string:
```bash
aliyun configure set --auto-plugin-install true --user-agent AlibabaCloud-Agent-Skills
```
This ensures product-specific CLI plugins are downloaded on first use (no manual install needed) and identifies requests as coming from this skill.
Then run the full environment check:
```bash
bash {{SKILL_PATH}}/scripts/verify_env.sh
```
This checks: CLI version, valid credentials, auto-plugin-install, Python3 + SDK. If any check fails, tell the user what to fix and stop — a broken environment means every subsequent command will fail.
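The version gate can also be scripted on its own. A minimal POSIX-shell sketch, assuming `aliyun version` prints a bare dotted version such as 3.3.1 (check your CLI's actual output); version_ge is a hypothetical helper, not one of the skill's scripts.

```bash
# version_ge ACTUAL MINIMUM: succeed if dotted version ACTUAL >= MINIMUM.
# Compares major.minor.patch numerically; missing fields count as 0.
version_ge() {
  actual=$1 minimum=$2
  old_ifs=$IFS
  IFS=.
  set -- $actual            # split ACTUAL on dots into $1 $2 $3
  a1=${1:-0} a2=${2:-0} a3=${3:-0}
  set -- $minimum           # split MINIMUM the same way
  m1=${1:-0} m2=${2:-0} m3=${3:-0}
  IFS=$old_ifs
  [ "$a1" -gt "$m1" ] && return 0
  [ "$a1" -lt "$m1" ] && return 1
  [ "$a2" -gt "$m2" ] && return 0
  [ "$a2" -lt "$m2" ] && return 1
  [ "$a3" -ge "$m3" ]
}

# Usage (requires the aliyun CLI on PATH):
#   if ! version_ge "$(aliyun version)" 3.3.1; then
#     echo "Aliyun CLI >= 3.3.1 required" >&2; exit 1
#   fi
```

Comparing field by field matters: a plain string comparison would rank 3.10.0 below 3.3.1.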
RAM Permission Pre-check
Before executing any commands, verify the current user has the required permissions:
1. Compare the user's permissions against references/ram-policies.md.
2. If any permission is missing, abort and prompt the user to attach the required policy.
Minimum required permissions are listed in references/ram-policies.md.
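Once you have two lists, the comparison in step 1 can be approximated mechanically: the RAM actions the task needs and the actions the user's policies grant. A minimal sketch; how the granted list is obtained (for example, by reading the attached policy documents) is left open, and missing_actions is a hypothetical helper. Granted entries are treated as shell patterns so wildcards such as ecs:* match, but Deny statements and Condition blocks are ignored.

```bash
# missing_actions "REQUIRED..." "GRANTED...": print each required RAM action
# (e.g. vpc:CreateVpc) that no granted entry covers. Granted entries may be
# exact actions or wildcard patterns such as "ecs:*" or "*".
missing_actions() (
  set -f   # subshell body: disable filename globbing so "*" stays a pattern
  for action in $1; do
    covered=no
    for pattern in $2; do
      case $action in
        $pattern) covered=yes; break ;;
      esac
    done
    if [ "$covered" = no ]; then
      echo "$action"
    fi
  done
)

# Usage:
#   missing_actions "vpc:CreateVpc vpc:CreateVSwitch ecs:RunInstances" "vpc:* ecs:Describe*"
#   # prints ecs:RunInstances
```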
Step 1: Understand the Scenario
Extract from the user's request:
- What they want to build or configure
- Which Alibaba Cloud products are involved (or can be inferred)
- Key requirements: region, instance specs, budget, HA needs, environment (dev/staging/prod)
Distill these into search keywords (Chinese + English) for Step 2. For example, "我要搭个RAG知识库" ("I want to build a RAG knowledge base") → keywords: RAG, 知识库 (knowledge base), AnalyticDB, 百炼 (Bailian).
Step 2: Route to the Right Path
Check references/alicloud-tech-solutions-all.md — the master catalog of 187 Alibaba Cloud tech solutions. Search by keyword match against the solution names and descriptions.
Each row has a Terraform Module 名称 (module name) column:
- Column has a value (e.g., analyticdb-rag, deepseek-personal-website) → Path A: Terraform
- Column is empty or no matching solution is found → Path B: CLI-First
Also use intent-mapping.md for fuzzy keyword → solution matching (e.g., "小程序" (mini program) → develop-your-wechat-mini-program-in-10-minutes).
Tell the user which path you're taking and why before proceeding.
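The routing rule amounts to "find a matching row, then check whether its Terraform module cell is empty". A rough sketch over a pipe-delimited markdown table, assuming you pass the awk field number of the Terraform Module 名称 column yourself (its real position in references/alicloud-tech-solutions-all.md is not stated here); route_for_keyword is hypothetical. It does plain, case-sensitive substring matching, so treat it as a first pass before the fuzzy matching in intent-mapping.md.

```bash
# route_for_keyword CATALOG KEYWORD FIELD
# Prints "A <module>" if the first row containing KEYWORD has a non-empty
# Terraform module cell in awk field FIELD, otherwise "B".
# With "| name | desc | module |", awk -F'|' treats the empty text before the
# first pipe as field 1, so the third column is field 4.
route_for_keyword() {
  awk -F'|' -v kw="$2" -v col="$3" '
    index($0, kw) > 0 {
      cell = $col                              # empty if the row is short
      gsub(/^[ \t]+|[ \t]+$/, "", cell)        # trim surrounding whitespace
      found = 1
      print (cell != "" ? "A " cell : "B")
      exit
    }
    END { if (!found) print "B" }
  ' "$1"
}

# Usage (field number is an assumption about the catalog layout):
#   route_for_keyword references/alicloud-tech-solutions-all.md RAG 4
```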
Path A: Terraform Solution
When a Terraform module matches, deploy through the IaCService remote runtime — no local terraform binary needed.
A.1: Locate the Module
Look up the Module 名称 (name) and Module 地址 (address) in references/tf-plan/tf-solutions.md. Match by:
1. Exact module name from the master catalog
2. Keyword match against the 描述 (description) column
3. Intent mapping
A.2: Fetch Example Parameters
Every module has a GitHub repo with tested examples. Derive the URLs:
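```
Module address: https://registry.terraform.io/modules/alibabacloud-automation//alicloud/latest
GitHub repo: https://github.com/alibabacloud-automation/terraform-alicloud-
Example: https://raw.githubusercontent.com/alibabacloud-automation/terraform-alicloud-/main/examples/complete/main.tf
```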
Fetch the example main.tf via WebFetch. These values come from real tested deployments — they're far more reliable than generic defaults.
Parameter priority:
1. User explicitly specified → always use
2. Example main.tf from examples/complete/ → use as default
3. Fallback defaults (only if the fetch fails): see terraform-defaults.md
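The priority order is a simple fallback chain. A sketch with a hypothetical helper; the values in the usage line are placeholders, not recommended defaults.

```bash
# pick_param USER_VALUE EXAMPLE_VALUE FALLBACK_VALUE
# Prints the first non-empty value, mirroring the priority order:
# user-specified > examples/complete/main.tf > fallback default.
pick_param() {
  printf '%s\n' "${1:-${2:-$3}}"
}

# Usage (illustrative variable names and values):
#   region=$(pick_param "$USER_REGION" "$EXAMPLE_REGION" "cn-hangzhou")
```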
A.3: Confirm with User
Show the parameters and ask for confirmation. Never silently apply them — cloud resources cost real money.
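```
Here are the deployment parameters based on the official example. Please confirm or modify:
• Region: cn-hangzhou
• Instance type: ecs.c7.large
• VPC CIDR: 172.16.0.0/12
• Password: (please provide)
```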
Sensitive values like passwords and API keys: never generate them yourself. The user provides these.
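When confirmation happens inside a script rather than in the chat, a yes/no gate that defaults to "no" keeps resources from being created by accident. A minimal sketch; confirm is a hypothetical helper.

```bash
# confirm PROMPT: ask on stderr, read one line from stdin.
# Succeeds only on an explicit yes; empty input, anything else, or EOF means no.
confirm() {
  printf '%s [y/N] ' "$1" >&2
  read -r answer || return 1
  case $answer in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   confirm "Apply main.tf with these parameters?" || exit 1
```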
A.4: Write main.tf and Deploy
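```hcl
# Based on: https://github.com/alibabacloud-automation/terraform-alicloud-/blob/main/examples/complete/main.tf
module  {
  source  = "alibabacloud-automation//alicloud"
  version = "~> 1.0"
  # Parameters adjusted per user confirmation
}
```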
Deploy using the remote runtime — see terraform-online-runtime.md for full usage:
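```bash
SKILL_DIR={{SKILL_PATH}}
TF=${SKILL_DIR}/scripts/terraform_runtime_online.sh
STATE_ID=$($TF apply main.tf | grep '^STATE_ID=' | cut -d= -f2)
echo "STATE_ID=$STATE_ID" >> terraform_state_ids.env
```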
The STATE_ID is required for any future update or destroy. Losing it means you lose control over the resources.
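Because the STATE_ID is the only handle on the deployment, it is worth reading it back defensively before any update or destroy. A sketch assuming the IDs were appended as STATE_ID=... lines to a local env file (terraform_state_ids.env is an assumed name). It returns the most recent entry and treats an empty STATE_ID= line as missing. The actual update and destroy invocations are documented in terraform-online-runtime.md and are not reproduced here.

```bash
# latest_state_id FILE: print the most recently appended STATE_ID from FILE.
# Fails (status 1) if the file is missing or its last STATE_ID is empty/absent.
latest_state_id() {
  [ -f "$1" ] || return 1
  id=$(grep '^STATE_ID=' "$1" | tail -n 1 | cut -d= -f2-)
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}

# Usage:
#   STATE_ID=$(latest_state_id terraform_state_ids.env) || {
#     echo "No STATE_ID recorded; cannot safely update or destroy" >&2; exit 1; }
```

If one file holds several deployments, "most recent" may not be the one the user means; confirm before acting on it.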
A.5: Verify and Report
Confirm resources exist. Provide the destroy command for cleanup.
Path B: CLI-First Execution
This path handles everything without a Terraform template. The approach: understand the architecture → decompose into steps → find the CLI command for each step → execute.
B.1: Understand the Architecture
Before writing any commands, understand what you're building:
- If the master catalog had a matching solution (just without a TF module), it still has tutorial links in the 部署教程 (deployment tutorial) column. Fetch that page to understand the target architecture, required products, and deployment sequence. This gives you the blueprint; you then translate each step into CLI commands.
- If no solution matched at all, reason from the user's description: which products are needed, what depends on what, and what the end state is.
B.2: Decompose into Steps
Break the goal into atomic steps ordered by dependency. Think through:
- Resource creation order: VPC → VSwitch → Security Group → ECS is almost always the foundation
- ID chaining: which steps output IDs that later steps need (VpcId → CreateVSwitch, VSwitchId → RunInstances)
- Async operations: some create calls return immediately while the resource is still being provisioned, so you'll need to poll
- What might not have a CLI: some product activations and some console-only features
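Dependency ordering can be delegated to tsort, the POSIX topological-sort utility. Feed it "before after" pairs and it prints an order that respects every pair. A sketch using the VPC → VSwitch → Security Group → ECS chain above. The pairs are written by hand here. VSwitch and Security Group each depend only on the VPC, so their relative order in the output is not fixed.

```bash
# deploy_deps: one dependency per line, "A B" meaning A must exist before B.
deploy_deps() {
  cat <<'EOF'
VPC VSwitch
VPC SecurityGroup
VSwitch ECS
SecurityGroup ECS
EOF
}

# tsort prints a valid creation order, one resource per line
# (VPC first, ECS last).
deploy_deps | tsort
```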
B.3: Research CLI Commands
For each step, use the scripts to find the correct API name and parameters. This is critical — don't rely on memory. Alibaba Cloud has thousands of APIs, and parameter names are inconsistent across products.
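```bash
python3 {{SKILL_PATH}}/scripts/lsit_products.py 'ECS'                # Find product code + API version
python3 {{SKILL_PATH}}/scripts/search_apis.py '创建ECS实例'           # Natural language → API
python3 {{SKILL_PATH}}/scripts/search_documents.py 'ECS实例规格'      # Parameter details, valid values, constraints
python3 {{SKILL_PATH}}/scripts/lsit_api_overview.py Ecs 2014-05-26   # Full API list for a product
```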
Run scripts in parallel when researching multiple products — don't serialize what can be parallelized.
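One way to run research scripts in parallel is with background jobs and wait. A sketch with a hypothetical helper that runs the same command once per query and writes each result to its own file. CMD is word-split on purpose so it can include the interpreter and script path, which also means paths containing spaces are not supported.

```bash
# run_parallel OUTDIR CMD QUERY...: run "CMD QUERY" for every query at once,
# saving stdout+stderr of query i to OUTDIR/i.out, then wait for all of them.
run_parallel() {
  outdir=$1 cmd=$2
  shift 2
  i=0
  for query in "$@"; do
    i=$((i + 1))
    $cmd "$query" > "$outdir/$i.out" 2>&1 &
  done
  wait
}

# Usage (queries are illustrative):
#   run_parallel /tmp/research "python3 {{SKILL_PATH}}/scripts/search_apis.py" '创建ECS实例' '创建VPC'
```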
Common CLI shortcuts that avoid console entirely:
| Scenario | CLI Command | Notes |
|---|---|---|
| Get Bailian (百炼) API Key | `aliyun modelstudio list-workspaces` → INLINECODE24 | Avoids console entirely. Almost every AI solution needs this. |
| Run commands on ECS | `aliyun ecs RunCommand --Type RunShellScript --CommandContent '<script>' --InstanceId.1 <id>` | Use Cloud Assistant instead of asking the user to SSH in. |
| OSS operations | `aliyun ossutil cp/ls/mb ...` | Use the `ossutil` subcommand, not `oss`. |
The Bailian API Key pattern is especially important — nearly every AI-related solution needs a DashScope/Bailian SK, and users often don't know it can be obtained programmatically. Whenever a plan involves 百炼/Bailian/DashScope, proactively use the modelstudio commands to get the key.
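The Cloud Assistant row can be wrapped for reuse. A sketch assuming RunCommand also takes --RegionId (verify with `aliyun ecs RunCommand --help`, per the Never Guess rule); run_on_ecs is hypothetical, and the ALIYUN variable exists only so the command can be dry-run with echo.

```bash
# run_on_ecs REGION INSTANCE_ID SCRIPT: run a shell script on one ECS instance
# through Cloud Assistant. Set ALIYUN=echo to print the command instead.
run_on_ecs() {
  ${ALIYUN:-aliyun} ecs RunCommand \
    --RegionId "$1" \
    --Type RunShellScript \
    --CommandContent "$3" \
    --InstanceId.1 "$2"
}

# Usage: the JSON response includes an InvokeId; fetch the script's output with
#   aliyun ecs DescribeInvocationResults --RegionId <region> --InvokeId <invoke-id>
```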
B.4: Present Plan and Confirm
Before running any write operations, show the complete execution plan. The plan MUST include a RAM permissions section listing all permissions the current account needs — this lets the user verify access before execution starts, avoiding mid-deploy Forbidden.RAM errors.
Derive the required permissions from the planned CLI commands: each aliyun <product> <API> call maps to a RAM action in the form <product>:<API> (e.g., aliyun vpc CreateVpc → vpc:CreateVpc).
CODEBLOCK7
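The command-to-action rule above is mechanical enough to script. A sketch that reads planned commands one per line and prints the deduplicated RAM actions. It only handles RPC-style calls whose third word is a PascalCase API name. Plugin-style subcommands such as `ossutil cp` are skipped, so their RAM actions have to be looked up separately (for example in references/ram-policies.md). LC_ALL=C keeps the sort order locale-independent.

```bash
# ram_actions: read planned CLI commands on stdin (one per line) and print the
# RAM actions they need in <product>:<API> form, sorted and deduplicated.
ram_actions() {
  awk '$1 == "aliyun" && $3 ~ /^[A-Z]/ { print $2 ":" $3 }' | LC_ALL=C sort -u
}

# Usage:
#   printf '%s\n' 'aliyun vpc CreateVpc --RegionId cn-hangzhou' \
#                 'aliyun ecs RunInstances --RegionId cn-hangzhou' | ram_actions
```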
Wait for user approval. Cloud resources cost money, and some operations (like deleting RDS instances) are irreversible.
B.5: Execute
For each step:
1. Verify syntax first: `aliyun <product> <api> --help` catches parameter errors before they hit the API.
2. Run the command.
3. Verify the result: poll async operations; describe the resource to confirm it exists.
4. Capture output: save IDs, endpoints, and connection strings for subsequent steps and the final report.
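Polling in step 3 can be a small retry loop around any command that prints a status. A sketch with a hypothetical helper. The usage lines assume jq is installed and that DescribeInstances reports the status under Instances.Instance[0].Status. The aliyun CLI also has a built-in --waiter option that may replace such loops; check `aliyun --help` for its syntax.

```bash
# poll_until EXPECTED MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Re-runs CMD until its stdout equals EXPECTED; fails after MAX_TRIES attempts.
poll_until() {
  expected=$1 tries=$2 delay=$3
  shift 3
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    status=$("$@" 2>/dev/null)
    [ "$status" = "$expected" ] && return 0
    attempt=$((attempt + 1))
    [ "$attempt" -le "$tries" ] && sleep "$delay"   # no sleep after the last try
  done
  return 1
}

# Usage (assumes jq; the instance ID is a placeholder):
#   instance_status() {
#     aliyun ecs DescribeInstances --RegionId cn-hangzhou --InstanceIds "[\"$1\"]" \
#       | jq -r '.Instances.Instance[0].Status'
#   }
#   poll_until Running 30 10 instance_status i-xxxxxxxx
```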
B.6: Handle Errors
When a command fails:
```bash
python3 {{SKILL_PATH}}/scripts/diagnose_cli_command.py '<cmd>' '<err>'
```
The diagnosis script calls a specialized API that maps error codes to actionable fixes. Apply the fix and retry. If the same error persists after the fix, report to the user with the diagnosis — don't keep retrying blindly.
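Wiring the diagnosis into every call ensures the error text is never lost. A sketch assuming the diagnose script takes the failed command and the error text as two positional arguments, as in the Script Reference table. DIAGNOSE can be overridden so the wrapper can be tested with echo. Because $* flattens the arguments, the command string passed on loses its original quoting.

```bash
# Diagnosis command; override (e.g. DIAGNOSE=echo) for dry runs.
DIAGNOSE=${DIAGNOSE:-"python3 {{SKILL_PATH}}/scripts/diagnose_cli_command.py"}

# run_or_diagnose CMD [ARGS...]: run a command; on failure, pass the command
# text and its stderr to the diagnosis script, then return the original status.
run_or_diagnose() {
  errfile=$(mktemp)
  "$@" 2>"$errfile"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    $DIAGNOSE "$*" "$(cat "$errfile")"
  fi
  rm -f "$errfile"
  return "$rc"
}

# Usage:
#   run_or_diagnose aliyun vpc CreateVpc --RegionId cn-hangzhou --CidrBlock 172.16.0.0/12
```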
Resume from the failed step. Never re-run steps that already succeeded — those resources already exist and re-running would either fail (duplicate) or create unwanted duplicates.
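One way to make "resume from the failed step" mechanical is to record each step's name once it succeeds and to skip recorded steps on the next run. A sketch in which the checkpoint file name and step names are hypothetical. The skip relies only on the recorded name, so it will not notice resources that were deleted out of band.

```bash
# Checkpoint file listing completed step names, one per line (assumed name).
STEP_LOG=${STEP_LOG:-./deploy_steps.done}

# run_step NAME CMD [ARGS...]: skip NAME if it already succeeded; otherwise run
# CMD and record NAME only when CMD exits 0. Returns CMD's status on failure.
run_step() {
  name=$1
  shift
  if [ -f "$STEP_LOG" ] && grep -qx "$name" "$STEP_LOG"; then
    echo "skip: $name (already succeeded)" >&2
    return 0
  fi
  "$@" || return $?
  echo "$name" >> "$STEP_LOG"
}

# Usage:
#   run_step create-vpc aliyun vpc CreateVpc --RegionId cn-hangzhou --CidrBlock 172.16.0.0/12
```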
B.7: Report
Summarize:
- Resources created (with IDs)
- Access endpoints / connection strings
- How to use what was built
- Cleanup commands (delete in reverse dependency order: ECS → Security Group → VSwitch → VPC)
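Cleanup order is the creation order reversed. If the creation order was kept as one resource per line, reversing it takes one awk call; awk is used here because tac is GNU-only and tail -r is BSD-only. The list below simply restates the order from the text.

```bash
# reverse_lines: print stdin lines in reverse order (portable alternative to tac).
reverse_lines() {
  awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }'
}

# Creation order, one resource per line; the output is the deletion order:
# ECS, SecurityGroup, VSwitch, VPC
printf '%s\n' VPC VSwitch SecurityGroup ECS | reverse_lines
```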
Script Reference
| Script | Purpose | Example |
|---|---|---|
| `verify_env.sh` | Environment check | `bash {{SKILL_PATH}}/scripts/verify_env.sh` |
| `lsit_products.py` | Find product code + version | `python3 {{SKILL_PATH}}/scripts/lsit_products.py 'ECS'` |
| `search_apis.py` | Natural language → API | `python3 {{SKILL_PATH}}/scripts/search_apis.py '创建ECS实例'` |
| `search_documents.py` | Doc search for details | `python3 {{SKILL_PATH}}/scripts/search_documents.py 'ECS实例规格'` |
| `lsit_api_overview.py` | Full API list for a product | `python3 {{SKILL_PATH}}/scripts/lsit_api_overview.py Ecs 2014-05-26` |
| `diagnose_cli_command.py` | Diagnose CLI errors | `python3 {{SKILL_PATH}}/scripts/diagnose_cli_command.py '<cmd>' '<err>'` |
| `terraform_runtime_online.sh` | Remote TF execution | See terraform-online-runtime.md |
References