Alibaba Cloud Solution Deploy

Match the user's scenario to the best execution path (Terraform or CLI), then complete the task end-to-end.

Core Principles

Maximum Automation

If 10 steps are needed and 9 can be automated, automate all 9. Only pause for the 1 that genuinely requires human action. Every cloud operation that has a CLI path should use it — the user came to you precisely so they don't have to click through consoles.

CLI-First, Console-Last

Express every cloud action as a runnable aliyun CLI command. Only fall back to console when no CLI exists — and when you do, give a direct deep-linked URL (not a product homepage). The difference between https://vision.aliyun.com/facebody and https://console.aliyun.com/ is the difference between being helpful and being useless.

Never Guess — Verify First

- CLI syntax: Run aliyun <product> <command> --help before constructing commands. Parameter naming is inconsistent across products — --RegionId vs --region-id vs --region all exist.
- Errors: Run diagnose_cli_command.py immediately on failure. Error messages from the Alibaba Cloud API are often cryptic; the diagnosis script calls a specialized endpoint that maps error codes to fixes.

Workflow

Step 0: Verify Environment

Run this first in any new session:

Pre-check: Aliyun CLI >= 3.3.1 required
Run aliyun version and confirm the version is 3.3.1 or later. If the CLI is not installed or the version is too low, see the installation guide.
Then run the following command to enable automatic plugin installation and set the user-agent string that identifies this skill:

    aliyun configure set --auto-plugin-install true --user-agent AlibabaCloud-Agent-Skills

This ensures product-specific CLI plugins are downloaded on first use (no manual install needed), and identifies requests as coming from this skill.

Then run the full environment check:

    bash {{SKILL_PATH}}/scripts/verify_env.sh

This checks: CLI version, valid credentials, auto-plugin-install, Python3 + SDK. If any check fails, tell the user what to fix and stop — a broken environment means every subsequent command will fail.
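The version gate in this step can be sketched in Python. This is a minimal sketch, not the logic of the skill's own check script: it assumes the text printed by aliyun version contains a dotted number such as 3.3.1, and the names parse_version and meets_minimum are hypothetical.

```python
import re

MINIMUM = (3, 3, 1)  # minimum CLI version stated in the pre-check


def parse_version(text):
    """Extract the first dotted version number (e.g. '3.3.1') from text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM):
    """Compare numerically, so that 3.10.0 counts as newer than 3.3.1."""
    return parse_version(text) >= minimum
```

Comparing integer tuples rather than strings matters here: a plain string comparison would rank 3.10.0 below 3.3.1.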

RAM Permission Pre-check

Before executing any commands, verify the current user has the required permissions:

1. Compare the user's permissions against references/ram-policies.md
2. If any permission is missing, abort and prompt the user to attach the required policy

Minimum required permissions are listed in references/ram-policies.md.
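The comparison step can be sketched as a set check with wildcard support. This is an illustration under stated assumptions: the required and granted action lists are already available as plain strings (how they are obtained is not shown), missing_permissions is a hypothetical helper, and Deny statements, conditions, and resource scoping are ignored.

```python
from fnmatch import fnmatchcase


def missing_permissions(required, granted):
    """Return required actions not covered by any granted action pattern.

    Granted entries may contain '*' wildcards (for example 'ecs:*').
    Matching is case-sensitive to keep the sketch simple.
    """
    return sorted(
        action
        for action in set(required)
        if not any(fnmatchcase(action, pattern) for pattern in granted)
    )
```

An empty result means the pre-check passes; a non-empty result is the list to show the user before aborting.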

Step 1: Understand the Scenario

Extract from the user's request:

- What they want to build or configure
- Which Alibaba Cloud products are involved (or can be inferred)
- Key requirements: region, instance specs, budget, HA needs, environment (dev/staging/prod)

Distill these into search keywords (Chinese + English) for Step 2. For example, "我要搭个RAG知识库" ("I want to set up a RAG knowledge base") → keywords: RAG, 知识库, AnalyticDB, 百炼.

Step 2: Route to the Right Path

Check references/alicloud-tech-solutions-all.md — the master catalog of 187 Alibaba Cloud tech solutions. Search by keyword match against the solution names and descriptions.

Each row has a Terraform Module 名称 column:

- Column has a value (e.g., analyticdb-rag, deepseek-personal-website) → Path A: Terraform
- Column is empty or no matching solution found → Path B: CLI-First

Also use intent-mapping.md for fuzzy keyword → solution matching (e.g., "小程序" → develop-your-wechat-mini-program-in-10-minutes).

Tell the user which path you're taking and why before proceeding.
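The routing rule can be sketched as a small function. This is a simplified illustration: it assumes the catalog has already been parsed into dictionaries, the keys name, description, and module are hypothetical stand-ins for the catalog columns, and matching is a plain case-insensitive substring test rather than the fuzzy intent mapping.

```python
def route(rows, keywords):
    """Pick Path A when a keyword-matching row names a Terraform module.

    rows: list of dicts with 'name', 'description', and 'module' keys.
    Returns ('A', module_name) or ('B', None).
    """
    lowered = [keyword.lower() for keyword in keywords]
    for row in rows:
        haystack = f"{row['name']} {row['description']}".lower()
        matched = any(keyword in haystack for keyword in lowered)
        if matched and row.get("module"):
            return ("A", row["module"])
    # No match, or only matches without a module: fall through to CLI-first.
    return ("B", None)
```

Returning the path together with the module name makes it easy to tell the user both which path was chosen and why.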

Path A: Terraform Solution

When a Terraform module matches, deploy through the IaCService remote runtime — no local terraform binary needed.

A.1: Locate the Module

Look up the Module 名称 and Module 地址 in references/tf-plan/tf-solutions.md. Match by:

1. Exact module name from the master catalog
2. Keyword match against the 描述 column
3. Intent mapping

A.2: Fetch Example Parameters

Every module has a GitHub repo with tested examples. Derive the URLs:

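    Module address: https://registry.terraform.io/modules/alibabacloud-automation/<module-name>/alicloud/latest
    GitHub repo:    https://github.com/alibabacloud-automation/terraform-alicloud-<module-name>
    Example:        https://raw.githubusercontent.com/alibabacloud-automation/terraform-alicloud-<module-name>/main/examples/complete/main.tf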

Fetch the example main.tf via WebFetch. These values come from real tested deployments — they're far more reliable than generic defaults.
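The URL derivation can be written as a small helper. This is a sketch that assumes every module lives under the alibabacloud-automation organization in a repository named terraform-alicloud- followed by the module name, with examples on the main branch; derive_urls is a hypothetical name, and individual repositories may deviate from this layout.

```python
ORG = "alibabacloud-automation"


def derive_urls(module_name):
    """Build registry, repository, and example URLs for one module name."""
    repo = f"terraform-alicloud-{module_name}"
    return {
        "registry": f"https://registry.terraform.io/modules/{ORG}/{module_name}/alicloud/latest",
        "github": f"https://github.com/{ORG}/{repo}",
        # Assumes the tested example sits under examples/complete on main.
        "example": f"https://raw.githubusercontent.com/{ORG}/{repo}/main/examples/complete/main.tf",
    }
```

If fetching the example URL fails, that is the signal to fall back to the defaults rather than to guess parameter values.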

Parameter priority:

1. User explicitly specified → always use
2. Example main.tf from examples/complete/ → use as default
3. Fallback defaults (only if fetch fails): see terraform-defaults.md
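The priority order can be expressed as a merge. In this sketch, resolve_params is a hypothetical helper, parameters are plain dictionaries, and a failed fetch of the example is represented by None; the fallback defaults are consulted only in that case.

```python
def resolve_params(user, example, fallback):
    """Merge parameter sources according to the stated priority.

    user:     values the user explicitly specified (always win)
    example:  values from the fetched example main.tf, or None if the fetch failed
    fallback: generic defaults, used only when example is None
    """
    base = fallback if example is None else example
    return {**base, **user}
```

For instance, a user-supplied region overrides the example's region, while the example's instance type is kept as the default.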

A.3: Confirm with User

Show the parameters and ask for confirmation. Never silently apply them — cloud resources cost real money.

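    Below are the deployment parameters based on the official example. Please confirm or modify:
    • Region: cn-hangzhou
    • Instance type: ecs.c7.large
    • VPC CIDR: 172.16.0.0/12
    • Password: (please provide)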

Sensitive values like passwords and API keys: never generate them yourself. The user provides these.

A.4: Write main.tf and Deploy

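    # Based on: https://github.com/alibabacloud-automation/terraform-alicloud-<module-name>/blob/main/examples/complete/main.tf

    module "<module-name>" {
      source  = "alibabacloud-automation/<module-name>/alicloud"
      version = "~> 1.0"
      # Parameters adjusted per user confirmation
    }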

Deploy using the remote runtime — see terraform-online-runtime.md for full usage:

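    SKILL_DIR={{SKILL_PATH}}
    TF="${SKILL_DIR}/scripts/terraform_runtime_online.sh"
    STATE_ID=$("$TF" apply main.tf | grep '^STATE_ID=' | cut -d= -f2)
    echo "STATE_ID=$STATE_ID" >> terraform_state_ids.env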

The STATE_ID is required for any future update or destroy. Losing it means you lose control over the resources.
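Because losing the STATE_ID is costly, it helps to fail loudly when it cannot be captured. The sketch below assumes the runtime script prints a line of the form STATE_ID=<value> somewhere in its output; extract_state_id is a hypothetical helper, not part of the skill's scripts.

```python
def extract_state_id(output):
    """Return the value of the first 'STATE_ID=' line in the apply output.

    Raises ValueError instead of returning an empty value, so a missing
    ID is noticed immediately rather than at destroy time.
    """
    for line in output.splitlines():
        if line.startswith("STATE_ID="):
            value = line.split("=", 1)[1].strip()
            if value:
                return value
    raise ValueError("no STATE_ID line found in runtime output")
```

The extracted value should be written to persistent storage right away, since it is needed for every later update or destroy.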

A.5: Verify and Report

Confirm resources exist. Provide the destroy command for cleanup.

Path B: CLI-First Execution

This path handles everything without a Terraform template. The approach: understand the architecture → decompose into steps → find the CLI command for each step → execute.

B.1: Understand the Architecture

Before writing any commands, understand what you're building:

- If the master catalog had a matching solution (just without TF Module), it still has tutorial links (部署教程 column). Fetch that page to understand the target architecture, required products, and deployment sequence. This gives you the blueprint — you'll then translate each step into CLI commands.
- If no solution matched at all, reason from the user's description: what products are needed, what depends on what, what's the end state.

B.2: Decompose into Steps

Break the goal into atomic steps ordered by dependency. Think through:

- Resource creation order: VPC → VSwitch → Security Group → ECS is almost always the foundation
- ID chaining: which step outputs IDs that later steps need (VpcId → CreateVSwitch, VSwitchId → RunInstances)
- Async operations: some create calls return immediately but the resource takes time, so you'll need to poll
- What might not have a CLI: some product activations, some console-only features
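Ordering steps by dependency is a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9+); the DEPENDENCIES mapping is an illustrative example of the foundation described above, not an exhaustive model of Alibaba Cloud resource dependencies.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
DEPENDENCIES = {
    "VPC": set(),
    "VSwitch": {"VPC"},
    "SecurityGroup": {"VPC"},
    "ECS": {"VSwitch", "SecurityGroup"},
}


def creation_order(dependencies):
    """Return resources ordered so each one follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

TopologicalSorter raises CycleError if the plan contains a circular dependency, which is a useful early check before any command runs.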

B.3: Research CLI Commands

For each step, use the scripts to find the correct API name and parameters. This is critical — don't rely on memory. Alibaba Cloud has thousands of APIs, and parameter names are inconsistent across products.

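    python3 {{SKILL_PATH}}/scripts/lsit_products.py      # Find product code + API version
    python3 {{SKILL_PATH}}/scripts/search_apis.py        # Natural language → API
    python3 {{SKILL_PATH}}/scripts/search_documents.py   # Parameter details, valid values, constraints
    python3 {{SKILL_PATH}}/scripts/lsit_api_overview.py  # Full API list for a product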

Run scripts in parallel when researching multiple products — don't serialize what can be parallelized.
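One way to run several research commands concurrently is a thread pool around subprocess calls. This is a generic sketch: run_parallel is a hypothetical helper, and the demo uses stand-in Python one-liners rather than the research scripts themselves.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, max_workers=4):
    """Run argv lists concurrently; return stdout strings in input order."""

    def run(argv):
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input commands.
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Stand-in commands; in practice each entry would invoke one research script.
    demo = [[sys.executable, "-c", f"print({n})"] for n in (1, 2, 3)]
    print(run_parallel(demo))
```

Threads are sufficient here because each worker spends its time waiting on a child process rather than executing Python code.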

Common CLI shortcuts that avoid console entirely:

| Scenario | CLI Command | Notes |
|---|---|---|
| Get Bailian (百炼) API Key | aliyun modelstudio list-workspaces → INLINECODE24 | Avoids console entirely. Almost every AI solution needs this. |
| Run commands on ECS | aliyun ecs RunCommand --Type RunShellScript --CommandContent '<script>' --InstanceId.1 <id> | Use Cloud Assistant instead of asking the user to SSH in. |
| OSS operations | aliyun ossutil cp/ls/mb ... | Use ossutil subcommand, not oss. |

The Bailian API Key pattern is especially important — nearly every AI-related solution needs a DashScope/Bailian SK, and users often don't know it can be obtained programmatically. Whenever a plan involves 百炼/Bailian/DashScope, proactively use the modelstudio commands to get the key.

B.4: Present Plan and Confirm

Before running any write operations, show the complete execution plan. The plan MUST include a RAM permissions section listing all permissions the current account needs — this lets the user verify access before execution starts, avoiding mid-deploy Forbidden.RAM errors.

Derive the required permissions from the planned CLI commands: each aliyun <product> <API> call maps to a RAM action in the form <product>:<API> (e.g., aliyun vpc CreateVpc → vpc:CreateVpc).

CODEBLOCK7
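The command-to-permission mapping can be derived mechanically from the planned commands. This sketch covers only the aliyun <product> <API> form described above; ram_action and required_permissions are hypothetical helpers, and commands that use other styles (for example ossutil subcommands) would need separate handling.

```python
import shlex


def ram_action(command):
    """Map 'aliyun <product> <API> ...' to the RAM action '<product>:<API>'."""
    parts = shlex.split(command)
    if len(parts) < 3 or parts[0] != "aliyun":
        raise ValueError(f"not an aliyun API call: {command!r}")
    product, api = parts[1], parts[2]
    return f"{product}:{api}"


def required_permissions(commands):
    """Deduplicated, sorted RAM actions for a list of planned commands."""
    return sorted({ram_action(command) for command in commands})
```

The resulting list can be pasted directly into the RAM permissions section of the plan shown to the user.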

Wait for user approval. Cloud resources cost money, and some operations (like deleting RDS instances) are irreversible.

B.5: Execute

For each step:

1. Verify syntax first: aliyun <product> <api> --help — catch parameter errors before they hit the API
2. Run the command
3. Verify result: poll async operations; describe the resource to confirm it exists
4. Capture output: save IDs, endpoints, connection strings for subsequent steps and final report
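Polling an asynchronous operation can be sketched as a generic wait loop. The helper name wait_until and its default timeout and interval are assumptions; check stands for whatever describe call reports the resource status.

```python
import time


def wait_until(check, timeout=300.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value or the timeout elapses.

    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)
```

A bounded timeout keeps a stuck resource from blocking the whole run; on TimeoutError, the step should be reported as failed rather than retried indefinitely.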

B.6: Handle Errors

When a command fails:

CODEBLOCK8

The diagnosis script calls a specialized API that maps error codes to actionable fixes. Apply the fix and retry. If the same error persists after the fix, report to the user with the diagnosis — don't keep retrying blindly.

Resume from the failed step. Never re-run steps that already succeeded — those resources already exist and re-running would either fail (duplicate) or create unwanted duplicates.
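Resuming without re-running finished steps amounts to keeping a record of what already succeeded. In this sketch, run_steps is a hypothetical helper and the record is an in-memory dictionary; persisting it (for example as JSON) would let a new session resume as well.

```python
def run_steps(steps, done):
    """Run (name, func) steps in order, skipping names already in done.

    done maps step name -> output and is updated in place. If a step
    raises, earlier results stay recorded, so calling run_steps again
    with the same dict resumes at the first unfinished step.
    """
    for name, func in steps:
        if name in done:
            continue  # already succeeded; do not create a duplicate
        done[name] = func(done)  # each step can read earlier outputs (IDs)
    return done
```

Passing done into each step also covers ID chaining: a later step reads the IDs that earlier steps recorded.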

B.7: Report

Summarize:

- Resources created (with IDs)
- Access endpoints / connection strings
- How to use what was built
- Cleanup commands (delete in reverse dependency order: ECS → Security Group → VSwitch → VPC)

Script Reference

| Script | Purpose | Example |
|---|---|---|
| verify_env.sh | Environment check | bash {{SKILL_PATH}}/scripts/verify_env.sh |
| lsit_products.py | Find product code + version | python3 {{SKILL_PATH}}/scripts/lsit_products.py 'ECS' |
| search_apis.py | Natural language → API | python3 {{SKILL_PATH}}/scripts/search_apis.py '创建ECS实例' |
| search_documents.py | Doc search for details | python3 {{SKILL_PATH}}/scripts/search_documents.py 'ECS实例规格' |
| lsit_api_overview.py | Full API list for a product | python3 {{SKILL_PATH}}/scripts/lsit_api_overview.py Ecs 2014-05-26 |
| diagnose_cli_command.py | Diagnose CLI errors | python3 {{SKILL_PATH}}/scripts/diagnose_cli_command.py '<cmd>' '<err>' |
| terraform_runtime_online.sh | Remote TF execution | See terraform-online-runtime.md |

References

- Intent Mapping: keyword → solution mapping
- Terraform Defaults: default parameter values
- Terraform Online Runtime: IaCService script usage
- All Tech Solutions Catalog: 187 solutions with TF Module availability
- TF Solutions Detail: 48 Terraform modules with Registry addresses
