DataWorks Infrastructure Management
Unified management of Data Sources, Compute Resources, and Resource Groups in Alibaba Cloud DataWorks workspaces, supporting create and query operations.
Architecture
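DataWorks
├── Workspace ─── Query and search workspaces
│   ├── Data Sources ─── 50 types: MySQL, Hologres, MaxCompute, ...
│   └── Compute Resources ─── Hologres, MaxCompute, Flink, Spark
└── Resource Groups ─── Serverless resource group management (cross-workspace)
Dependencies:
Workspace ◀── Data Sources, Compute Resources (must belong to a workspace)
Workspace ◀── Resource Groups (associated through binding; one resource group can be bound to multiple workspaces)
Connectivity test ──depends on──▶ Resource Group (must be bound to the workspace that owns the data source)
Standard Mode ──requires──▶ Dev + Prod pairs of data sources and compute resources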
Global Rules
Prerequisites
1. Aliyun CLI >= 3.3.1: aliyun version (Installation guide: references/cli-installation-guide.md)
   - First-time use: aliyun configure set --auto-plugin-install true
2. jq (required for resource group operations): which jq
3. Credential status: aliyun configure list, verify that valid credentials exist
4. DataWorks edition: Basic edition or above required
Security Rules: DO NOT read/print/echo AK/SK values, DO NOT let users input AK/SK directly, ONLY use aliyun configure list to check credential status.
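The prerequisite checks can be collected into one preflight helper. The following is a minimal sketch, assuming a Bash shell; the function names version_ge and preflight are hypothetical, and the version parsing assumes that aliyun version prints a plain dotted version such as 3.3.1.

```shell
# Hypothetical helper: succeeds when dotted version $1 is >= dotted version $2.
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)   # split both versions on "."
  local i h w
  for i in 0 1 2; do
    h=${have[i]:-0}; w=${want[i]:-0}
    h=${h%%[!0-9]*}; w=${w%%[!0-9]*}   # drop any non-numeric suffix
    if (( 10#${h:-0} > 10#${w:-0} )); then return 0; fi
    if (( 10#${h:-0} < 10#${w:-0} )); then return 1; fi
  done
  return 0
}

# Hypothetical helper: runs the checks listed under Prerequisites.
preflight() {
  command -v aliyun >/dev/null || { echo "aliyun CLI not found" >&2; return 1; }
  command -v jq >/dev/null || { echo "jq not found" >&2; return 1; }
  local v
  v=$(aliyun version) || return 1
  version_ge "$v" "3.3.1" || { echo "aliyun CLI $v is older than 3.3.1" >&2; return 1; }
  # The only credential command the security rule permits.
  aliyun configure list
}
```

Calling preflight before any other step stops early when a tool is missing or too old. The DataWorks edition check is not covered here and still has to be confirmed separately.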
Command Formatting
- User-Agent (mandatory): All aliyun CLI commands must include the --user-agent AlibabaCloud-Agent-Skills parameter to identify the source.
- Single-line commands: When executing Bash commands, construct each command as a single-line string; do not use \ for line breaks.
- jq step-by-step execution: First execute the aliyun command to get the JSON, then format it with jq (this avoids multi-line security prompts).
- Endpoint mandatory: When specifying the --region parameter, you must also add --endpoint dataworks.<REGION_ID>.aliyuncs.com. This is not needed when --region is not specified.
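The formatting rules can be applied mechanically by a small command builder. This is a sketch under stated assumptions: dw_cmd is a hypothetical helper name, and it only prints the command string so that the result can be reviewed before it is run.

```shell
# Hypothetical helper: prints a single-line aliyun command for a DataWorks API.
# Usage: dw_cmd <ApiName> <region-or-empty-string> [extra arguments...]
dw_cmd() {
  local api=$1 region=$2
  shift 2
  # The user-agent parameter is always included.
  local cmd="aliyun dataworks-public $api --user-agent AlibabaCloud-Agent-Skills"
  if [ -n "$region" ]; then
    # --region and --endpoint are always added together.
    cmd="$cmd --region $region --endpoint dataworks.$region.aliyuncs.com"
  fi
  if [ "$#" -gt 0 ]; then
    cmd="$cmd $*"
  fi
  printf '%s\n' "$cmd"
}
```

For example, dw_cmd ListProjects cn-hangzhou --PageSize 100 prints one line containing both --region cn-hangzhou and the matching endpoint, while passing an empty string as the region leaves both flags out.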
Parameter Confirmation
Before executing any command, all user-customizable parameters must be confirmed by the user. Do not assume or use default values.
Exception: When the user has explicitly specified parameter values in the conversation, use them directly without re-confirmation.
Resource group related parameters (mandatory user selection): VPC, VSwitch, Resource Group ID (for binding/connectivity testing) — involve networking and billing, DO NOT auto-select; must display a list for the user to explicitly choose. Confirm even if there is only one option.
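The explicit-selection rule can be enforced with a small guard that rejects empty input and anything not in the displayed list. This is a minimal sketch; confirm_choice is a hypothetical helper, and the IDs in the usage note are placeholders.

```shell
# Hypothetical helper: accepts a choice only if the user typed it and it was listed.
# Usage: confirm_choice <user-input> <option>...
confirm_choice() {
  local choice=$1
  shift
  # No input means no selection: never fall back to a default,
  # even when the list holds a single option.
  [ -n "$choice" ] || return 1
  local opt
  for opt in "$@"; do
    if [ "$opt" = "$choice" ]; then
      return 0
    fi
  done
  return 1
}
```

With a list holding only vpc-example1, confirm_choice "" vpc-example1 still fails, so the single option is not selected on the user's behalf.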
⚠️ Write API Execution Gate — MUST Check Before Every Write Operation
MANDATORY: Before calling any Write API (Create / Update / Delete / Bind / Unbind / Associate / Dissociate / Test), you MUST perform the following checks in order:
1. Scan the entire SKILL.md for a Security Restriction or Disabled Operations notice that mentions the target API or module.
2. If a restriction exists: BLOCK the operation immediately. Do NOT call the API. Respond to the user with:
   - What operation is blocked and why
   - The recommended alternative (e.g., use the DataWorks console, contact administrator)
3. If no restriction exists: Proceed normally with parameter confirmation and execution.
This check is NOT optional. It applies to every single write operation without exception. Never skip this step.
Quick Reference: Blocked APIs in this skill
| Module | Blocked APIs | Reason |
|---|---|---|
| Data Sources (Module 1) | UpdateDataSource, DeleteDataSource | Prevent accidental data loss, credential exposure, disruption of running tasks |
| Compute Resources (Module 2) | UpdateComputeResource, DeleteComputeResource | Prevent disruption of running development and scheduling tasks |
Allowed Write APIs: CreateDataSource, CreateComputeResource, CreateResourceGroup, AssociateProjectToResourceGroup, DissociateProjectFromResourceGroup, TestDataSourceConnectivity
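The blocked and allowed lists can be encoded as a lookup that runs before any write call. This is a sketch only: write_api_gate is a hypothetical helper, and a static list is a shortcut that does not replace scanning SKILL.md for restriction notices.

```shell
# Hypothetical helper: prints "blocked", "allowed", or "unknown" for an API name.
# "unknown" means the API is in neither list and must not be treated as cleared.
write_api_gate() {
  case "$1" in
    UpdateDataSource|DeleteDataSource|UpdateComputeResource|DeleteComputeResource)
      echo "blocked" ;;
    CreateDataSource|CreateComputeResource|CreateResourceGroup|AssociateProjectToResourceGroup|DissociateProjectFromResourceGroup|TestDataSourceConnectivity)
      echo "allowed" ;;
    *)
      echo "unknown" ;;
  esac
}
```

A caller would proceed to parameter confirmation only when the result is allowed, and would answer a blocked result with the reason and the recommended alternative.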
RAM Permissions
All operations require dataworks:<APIAction> permissions. Creating resource groups additionally requires AliyunBSSOrderAccess and vpc:DescribeVpcs, vpc:DescribeVSwitches.
Full permission matrix: references/ram-policies.md
Quick Start: New Workspace Infrastructure Initialization
When the user is unsure about specific operations or has vague requirements, guide them through the following process:
1. Environment check: Check CLI and credentials per Prerequisites
2. Confirm workspace: Use ListProjects to locate the workspace, GetProject to confirm the mode (Simple/Standard)
3. Create compute resources: Guide engine type selection; the system will automatically create corresponding data sources. Standard Mode requires Dev+Prod pairs. Only pure storage-type data sources (MySQL, Kafka, etc.) need separate data source creation
4. Create/bind resource groups: Query existing resource groups → let user select → bind. Guide creation when no resource groups are available
5. Test connectivity: Test with bound resource groups; when all pass, inform the user "Infrastructure configuration complete"
After each step, proactively suggest the next action.
Next Step Guidance
After each write operation is completed and verified, proactively suggest follow-up actions:
| Completed Operation | Recommended Next Step |
|---|---|
| Create compute resource | Standard Mode: "Create the corresponding Dev resource?"; "Test connectivity?" |
| Create data source separately | "Test connectivity?"; Standard Mode: "Create Dev/Prod environment data sources?" |
| Create resource group | "Bind to a workspace?" |
| Bind resource group | "Test data source connectivity?" |
| Connectivity test passed | "Infrastructure is ready." |
| Connectivity test failed | Analyze the error cause, guide the fix |
| Unbind resource group | "Bind to another workspace?" |
Trigger Rules
Trigger scenarios: Data source create/query, compute resource create/query, resource group management, infrastructure initialization, colloquial aliases (DW database connection failure, configure holo/mc resources, create rg)
Not triggered: Data development tasks, scheduling configuration, MaxCompute table management, data integration tasks, ECS/RDS/OSS, workspace member management, data quality/lineage/preview. Standalone workspace queries are handled by the alibabacloud-dataworks-workspace-manage skill.
Interaction Flow
All operations follow: Identify module → Environment check → Collect parameters → Execute command → Verify result → Guide next step
Common aliases: DW=DataWorks, holo=Hologres, mc/MC/odps=MaxCompute, pg=PostgreSQL, rg=Resource Group, ds=Data Source, RDS=InstanceMode MySQL/PG/SQLServer, ADB=AnalyticDB
Naming suggestions: Data source {type}_{business}_{purpose}, Compute resource {type}_{business}, Resource group dw_{purpose}_rg_{env}
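The aliases and naming patterns can be expressed as small helpers. This is a sketch with hypothetical function names; the RDS alias is left unchanged because it maps to several InstanceMode types and needs a follow-up question to the user.

```shell
# Hypothetical helper: expands a colloquial alias to its full product name.
normalize_alias() {
  case "$1" in
    DW)         echo "DataWorks" ;;
    holo)       echo "Hologres" ;;
    mc|MC|odps) echo "MaxCompute" ;;
    pg)         echo "PostgreSQL" ;;
    rg)         echo "Resource Group" ;;
    ds)         echo "Data Source" ;;
    ADB)        echo "AnalyticDB" ;;
    *)          echo "$1" ;;   # unknown or ambiguous (e.g. RDS): pass through
  esac
}

# Data source: {type}_{business}_{purpose}
data_source_name() { printf '%s_%s_%s\n' "$1" "$2" "$3"; }

# Compute resource: {type}_{business}
compute_resource_name() { printf '%s_%s\n' "$1" "$2"; }

# Resource group: dw_{purpose}_rg_{env}
resource_group_name() { printf 'dw_%s_rg_%s\n' "$1" "$2"; }
```

These produce suggestions only; the generated name is still shown to the user for confirmation before it is used.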
Module 0: Workspace Query
If the alibabacloud-dataworks-workspace-manage skill is available, prefer using it for workspace queries. The following is only a fallback.
aliyun dataworks-public ListProjects --user-agent AlibabaCloud-Agent-Skills --Status Available --PageSize 100
When searching by name, first get the full list then filter .PagingInfo.Projects[] by Name/DisplayName using jq.
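The name filter can be written as a jq step that runs after the list has been fetched. This is a minimal sketch: filter_projects is a hypothetical helper, it reads the ListProjects JSON on standard input, and it uses a case-sensitive substring match on the two fields.

```shell
# Hypothetical helper: prints Name<TAB>DisplayName for projects whose
# Name or DisplayName contains the keyword given as $1.
filter_projects() {
  jq -r --arg kw "$1" '.PagingInfo.Projects[] | select((.Name // "" | contains($kw)) or (.DisplayName // "" | contains($kw))) | "\(.Name)\t\(.DisplayName)"'
}
```

In line with the step-by-step rule, the list response would first be saved (for example to a file such as projects.json, a name chosen here for illustration), and filter_projects sales < projects.json would then run as a separate command.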
Module 1: Data Source Management
Supports 50 data source types. See references/data-sources/README.md for details.
When do you need to create a data source separately? Creating a compute resource (Module 2) will automatically create the corresponding data source. Only pure storage-type databases (MySQL, PostgreSQL, Kafka, MongoDB, etc.) need separate creation.
Note: The following types do not currently support OpenAPI: hdfs
Connection modes: UrlMode (self-hosted databases, requires host/port) or InstanceMode (Alibaba Cloud managed instances, requires instanceId). When unsure, proactively ask the user. InstanceMode is preferred.
Instance query APIs: references/data-sources/instance-apis.md
⚠️ Security Restriction — See Write API Execution Gate (Global Rules) for mandatory pre-check
IMPORTANT: The DeleteDataSource and UpdateDataSource APIs are supported by the DataWorks service, but this skill has disabled modifying or deleting data sources for security reasons. Before attempting any write operation, the agent MUST check the Write API Execution Gate section.
If you need to modify or delete a data source, please use the DataWorks console directly or contact your administrator.
Connection Mode Quick Reference
The connection mode (UrlMode or InstanceMode) determines the required fields. InstanceMode is preferred when both are available.
| Mode | Types | Count |
|---|---|---|
| Both | mysql, postgresql, sqlserver, polardb, polardbo, polardb-x-2-0, apsaradbforoceanbase, drds, starrocks, analyticdbformysql, analyticdbforpostgresql, milvus, mongodb, redis, elasticsearch, kafka | 16 |
| InstanceMode only | hologres, dlf, opensearch | 3 |
| UrlMode only | oracle, mariadb, dm, db2, tidb, vertica, gbase8a, kingbasees, saphana, snowflake, maxcompute, hive, clickhouse, doris, selectdb, redshift, hbase, lindorm, oss, s3, ftp, ssh, tablestore, memcache, graph_database, datahub, loghub, restapi, salesforce, httpfile, bigquery | 31 |
hdfs: not supported via OpenAPI.
Full details: references/data-sources/README.md
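The quick-reference table can be turned into a lookup that returns the modes a type supports and the preferred one. This is a sketch with hypothetical function names; the type lists are copied from the table, and any type not listed there is reported as unknown.

```shell
# Hypothetical helper: prints the connection modes available for a type.
connection_modes() {
  case "$1" in
    mysql|postgresql|sqlserver|polardb|polardbo|polardb-x-2-0|apsaradbforoceanbase|drds|starrocks|analyticdbformysql|analyticdbforpostgresql|milvus|mongodb|redis|elasticsearch|kafka)
      echo "UrlMode InstanceMode" ;;
    hologres|dlf|opensearch)
      echo "InstanceMode" ;;
    oracle|mariadb|dm|db2|tidb|vertica|gbase8a|kingbasees|saphana|snowflake|maxcompute|hive|clickhouse|doris|selectdb|redshift|hbase|lindorm|oss|s3|ftp|ssh|tablestore|memcache|graph_database|datahub|loghub|restapi|salesforce|httpfile|bigquery)
      echo "UrlMode" ;;
    *)
      echo "unknown" ;;
  esac
}

# Hypothetical helper: InstanceMode wins whenever the type offers it.
preferred_mode() {
  case "$(connection_modes "$1")" in
    *InstanceMode*) echo "InstanceMode" ;;
    UrlMode)        echo "UrlMode" ;;
    *)              return 1 ;;
  esac
}
```

The preferred mode is only a suggestion to present; when a type supports both modes and the user's setup is unclear, the user is still asked.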
Workspace Mode
Environment note: Prod (Production) is for production data processing; Dev (Development) is for development and debugging, physically isolated from production.
Use GetProject and check DevEnvironmentEnabled:
- false → Simple Mode (1 data source, envType=Prod)
- true → Standard Mode (2 data sources, Dev + Prod, physically isolated)
Full mode comparison: references/data-sources/README.md
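The mode rule can be reduced to a mapping from the DevEnvironmentEnabled value to the environments that each need a data source. This is a minimal sketch; required_env_types is a hypothetical helper, and it takes the already-extracted boolean rather than the full GetProject response.

```shell
# Hypothetical helper: prints the environments that need a data source.
required_env_types() {
  case "$1" in
    true)  echo "Dev Prod" ;;   # Standard Mode: one data source per environment
    false) echo "Prod" ;;       # Simple Mode: a single Prod data source
    *)     return 1 ;;          # anything else: mode not determined
  esac
}
```

A caller can loop over the printed values to remind the user that Standard Mode needs both environments configured.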
Task 1.1: Create Data Source (CreateDataSource)
CODEBLOCK2
ConnectionProperties common structure:
- UrlMode: INLINECODE45
- InstanceMode: INLINECODE46
Special type structures (Oracle, MaxCompute, HBase, etc.): see references/data-sources/ per-type docs
Cross-account data source configuration: references/cross-account-datasources.md
Task 1.2: Get Data Source (GetDataSource)
CODEBLOCK3
Task 1.3: List Data Sources (ListDataSources)
CODEBLOCK4
Returns nested structure DataSources[].DataSource[]; Name/Type are in the outer layer, Id/Description in the inner layer.
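The nested response can be flattened into one row per inner entry with jq. This is a sketch under assumptions: flatten_data_sources is a hypothetical helper, and the PagingInfo wrapper is assumed by analogy with ListProjects, with a fallback to a top-level DataSources array.

```shell
# Hypothetical helper: reads ListDataSources JSON on standard input and
# prints Name<TAB>Type<TAB>Id, one line per inner DataSource entry.
flatten_data_sources() {
  # Name and Type come from the outer element, Id from the inner one.
  jq -r '(.PagingInfo.DataSources // .DataSources)[] | . as $outer | $outer.DataSource[] | "\($outer.Name)\t\($outer.Type)\t\(.Id)"'
}
```

In Standard Mode a single outer entry typically expands to two rows, one per environment, which makes it easy to see whether a data source exists in both.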
Task 1.4: Test Connectivity (TestDataSourceConnectivity)
Process: Query resource group list → Let user select a resource group → Execute test.
CODEBLOCK5
If the test fails with the error "resourceGroupId is not in the project", the resource group must be bound to the workspace first (confirm with the user, then execute AssociateProjectToResourceGroup).
Module 2: Compute Resource Management
Supports Hologres, MaxCompute, Flink, Spark, and other types. The system will automatically create corresponding data sources upon creation.
⚠️ Security Restriction — See Write API Execution Gate (Global Rules) for mandatory pre-check
IMPORTANT: For security reasons, this skill does NOT support modifying or deleting compute resources. Before attempting any write operation, the agent MUST check the Write API Execution Gate section. These operations are disabled to prevent:
- Accidental data loss or service interruption
- Disruption of running data development and scheduling tasks
- Unintended changes to production compute resource configurations
If you need to modify or delete a compute resource, please use the DataWorks console directly or contact your administrator.
authType Rules
- Dev environment: authType is fixed as INLINECODE51
- Prod environment: Options are PrimaryAccount (recommended), TaskOwner, SubAccount, RamRole. Default recommendation is PrimaryAccount unless the user has special requirements
authType details and guidance: references/compute-resources/README.md
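The Prod options can be checked before a create call is assembled. This is a minimal sketch; valid_prod_auth_type is a hypothetical helper and covers only the Prod environment, because the fixed Dev value is not reproduced here.

```shell
# Hypothetical helper: succeeds only for an authType allowed in Prod.
valid_prod_auth_type() {
  case "$1" in
    PrimaryAccount|TaskOwner|SubAccount|RamRole) return 0 ;;
    *) return 1 ;;
  esac
}
```

The check is case-sensitive, so a value such as primaryaccount is rejected and the user is asked again rather than having it silently corrected.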
Type-Specific Notes
- Hologres: Only supports InstanceMode, requires instanceId, INLINECODE58
- MaxCompute: Only supports UrlMode, requires project, INLINECODE60
Full ConnectionProperties examples: references/compute-resources/README.md
Task 2.1: Create Compute Resource (CreateComputeResource)
CODEBLOCK6
After creation, use ListDataSources to verify the corresponding data source was auto-generated.
Task 2.2: Get Compute Resource (GetComputeResource)
CODEBLOCK7
Task 2.3: List Compute Resources (ListComputeResources)
CODEBLOCK8
Returns nested structure ComputeResources[].ComputeResource[]; Name/Type are in the outer layer, Id in the inner layer.
Module 3: Resource Group Management
Manages the full lifecycle of Serverless resource groups.
Task 3.1: Create Resource Group (CreateResourceGroup)
Requires AliyunBSSOrderAccess permission.
Interaction flow (let user choose at each step, DO NOT auto-select):
1. Query and select VPC: aliyun vpc DescribeVpcs --user-agent AlibabaCloud-Agent-Skills --RegionId "<REGION_ID>" --PageSize 50
   If the list is empty, guide the user to create a VPC; DO NOT auto-create.
2. Query and select VSwitch:
   CODEBLOCK10
3. Confirm name and specification → Execute creation:
   CODEBLOCK11
After creation, poll GetResourceGroup until status becomes Normal (every 10 seconds, up to 10 minutes).
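The polling rule (every 10 seconds, up to 10 minutes) works out to at most 60 attempts. The following is a sketch of such a loop; poll_until_normal is a hypothetical helper, and the command that fetches the status is supplied by the caller because the exact GetResourceGroup invocation is not shown here.

```shell
# Hypothetical helper: calls the command named in $1, which must print the
# current status, until it prints "Normal".
# $2 = seconds between attempts (default 10), $3 = maximum attempts (default 60).
poll_until_normal() {
  local get_status=$1 interval=${2:-10} max_tries=${3:-60}
  local try status
  for (( try = 1; try <= max_tries; try++ )); do
    status=$("$get_status") || status=""
    if [ "$status" = "Normal" ]; then
      return 0
    fi
    # Wait only when another attempt will follow.
    if (( try < max_tries )); then
      sleep "$interval"
    fi
  done
  return 1
}
```

A non-zero return means the resource group did not reach Normal within the time limit, at which point the last observed status is reported to the user instead of retrying indefinitely.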
Task 3.2: Get Resource Group (GetResourceGroup)
CODEBLOCK12
Task 3.3: List Resource Groups (ListResourceGroups)
CODEBLOCK13
Task 3.4: Bind Resource Group (AssociateProjectToResourceGroup)
Process: Query available resource groups → Display list for user to select → Bind after user confirms.
CODEBLOCK14
Task 3.5: Query Binding Relationships
CODEBLOCK15
Task 3.6: Unbind Resource Group (DissociateProjectFromResourceGroup)
CODEBLOCK16
Success Verification
After all write operations, use the corresponding Get/List command to verify the result.
Common Errors
| Error | Solution |
|---|---|
| | Check ConnectionProperties JSON and required parameters |
| EntityNotExists | Verify the ID and Region are correct |
| QuotaExceeded | Delete unused resources or request a quota increase |
| Duplicate* | Use a different name |
Region
Common: cn-hangzhou, cn-shanghai, cn-beijing, cn-shenzhen. Endpoint: dataworks.<region-id>.aliyuncs.com
Full list: references/related-apis.md
Best Practices
1. Query before action: Confirm the current state before create operations
2. Manage by environment: Manage Dev and Prod resources separately
3. Verify operations: Use Get/List to verify after each write operation
4. Proactive guidance: Suggest the next step after each step completes
5. Protect data sources and compute resources: Never modify or delete data sources or compute resources via this skill; use the DataWorks console for such operations
Reference Links
| File | Description |
|---|---|
| references/data-sources/ | Detailed configuration docs for each data source type (50 files) |
| references/cross-account-datasources.md | Cross-account data source configuration guide |
| references/compute-resources/README.md | Compute resource ConnectionProperties examples |
| references/cli-installation-guide.md | Aliyun CLI installation guide |
| references/ram-policies.md | RAM permission configuration and policy examples |
| references/related-apis.md | API parameter details and Region Endpoints |