RAM Permission Diagnosis and Repair
Execution constraint: When executing aliyun CLI commands directly (not generating commands for the user to copy), always append --user-agent AlibabaCloud-Agent-Skills. Do not add this flag to commands shown to users in Path B output or Path C console guidance.
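The execution constraint can be sketched as a small helper that decides whether the flag is appended. This is a minimal illustration, not part of the skill itself: the function name `build_aliyun_command` and the `executed_by_agent` parameter are hypothetical.

```python
# Flag and value taken from the execution constraint above.
USER_AGENT_FLAG = ["--user-agent", "AlibabaCloud-Agent-Skills"]


def build_aliyun_command(args, executed_by_agent):
    """Return the argv list for an aliyun CLI call.

    args: arguments after the `aliyun` binary, e.g. ["ram", "ListUsers"].
    executed_by_agent: True when the agent runs the command itself,
    False when the command is only shown to the user (Path B or Path C).
    """
    argv = ["aliyun", *args]
    if executed_by_agent:
        # Only directly executed commands carry the user-agent flag.
        argv += USER_AGENT_FLAG
    return argv
```

Keeping the decision in one place makes it harder to leak the flag into commands that are printed for the user to copy.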
Overview
When a RAM permission error is detected, run through these steps:
1. Quick Analysis: parse raw error fields (no tool calls), output a brief summary, and ask the user to choose the analysis depth
2. Deep Analysis: (only if the user selects path B) decode if needed, run gap analysis, and classify the root cause
3. Generate Recommendations: produce a least-privilege authorization plan
4. Execute Repair: present repair options and wait for the user to choose
Permission level (L0–L3) is the agent's internal routing state, inferred implicitly from API call results during the flow. It determines diagnostic depth and available repair paths. Never declare or describe the level to the user. See references/diagnose-flow.md for level definitions.
Step 1: Quick Analysis
Parse raw error fields without any tool calls, then let the user decide how deep to go.
1a. Extract from raw error
- error_code: e.g., NoPermission, Forbidden, InvalidSecurityToken
- missing_action: e.g., ecs:StopInstance
- principal_type: SubUser / AssumedRoleUser / RootUser (from AuthPrincipalType)
- principal_display_name: UserId or role:session (from AuthPrincipalDisplayName)
- no_permission_type: ImplicitDeny or ExplicitDeny (from NoPermissionType)
- policy_type: e.g., AccountLevelIdentityBasedPolicy, AssumeRolePolicy (from PolicyType)
- encoded_message: retain EncodedDiagnosticMessage if present, for use in Step 2 if needed
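The extraction can be sketched as a pure function over an already parsed error payload. The input shape (a dict with `Code` and a nested `AccessDeniedDetail`) is an assumption for illustration, as is reading the missing action from `AuthAction`; the function name `extract_error_fields` is hypothetical.

```python
def extract_error_fields(error):
    """Map a parsed error payload to the Step 1a fields.

    Assumed (hypothetical) input shape:
        {"Code": "...", "AccessDeniedDetail": {...}}
    Missing keys become None so later steps can detect gaps.
    """
    detail = error.get("AccessDeniedDetail") or {}
    return {
        "error_code": error.get("Code"),
        "missing_action": detail.get("AuthAction"),
        "principal_type": detail.get("AuthPrincipalType"),
        "principal_display_name": detail.get("AuthPrincipalDisplayName"),
        "no_permission_type": detail.get("NoPermissionType"),
        "policy_type": detail.get("PolicyType"),
        "encoded_message": detail.get("EncodedDiagnosticMessage"),
    }
```

No tool call is involved: the function only reads what the raw error already contains.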
1b. Output brief summary
Based on the extracted fields, output a concise summary: who is affected, what action is missing, initial root cause inference.
1c. Present depth choice and wait for selection
Present the following and wait for the user to select — do not proceed until a choice is made:
- A. Quick path (recommended when: ImplicitDeny + all key fields present + common service): skip Step 2 and generate recommendations directly from raw fields and built-in knowledge
- B. Deep path (recommended when: ExplicitDeny, missing fields, or unfamiliar service): run the full Step 2 analysis for a more precise result.
  > Requires two optional permissions: ram:DecodeDiagnosticMessage (decode encoded errors) and the system policy AliyunRAMReadOnlyAccess (gap analysis). Missing permissions limit specific capabilities, but the flow continues.
- Skip: stop here and output manual troubleshooting links
Mark the recommended option clearly and briefly explain why.
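The recommendation rule for options A and B can be sketched as follows. Two points are assumptions rather than part of the flow as written: which fields count as "key fields", and the set of "common" services, which is borrowed here from the popular-service list in Step 3. The function name `recommend_path` is hypothetical.

```python
# Assumption: "common service" means the popular services named in Step 3.
COMMON_SERVICES = {"ecs", "oss", "rds", "fc", "slb", "vpc", "sls", "sts"}

# Assumption: these are the "key fields" that must all be present for path A.
KEY_FIELDS = ("missing_action", "principal_type", "no_permission_type", "policy_type")


def recommend_path(fields):
    """Return "A" (quick) or "B" (deep) as the option to mark as recommended."""
    if fields.get("no_permission_type") != "ImplicitDeny":
        return "B"  # ExplicitDeny or unknown denial type
    if any(not fields.get(name) for name in KEY_FIELDS):
        return "B"  # at least one key field is missing
    # The service prefix of an action such as "ecs:StopInstance" is "ecs".
    service = fields["missing_action"].split(":", 1)[0].lower()
    return "A" if service in COMMON_SERVICES else "B"
```

The function only picks which option to highlight; the user still makes the choice.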
If user selects A: proceed to Step 3. Note in the recommendation that it is based on quick analysis; the user can request deep analysis at any time.
If user selects B: proceed to Step 2.
If user selects Skip: output error summary, links to RAM documentation (https://help.aliyun.com/document_detail/93733.html) and RAM console (https://ram.console.aliyun.com/policies), and a note on how to restart diagnosis.
Edge case — ExplicitDeny with path A forced: if NoPermissionType = ExplicitDeny and the user still selects A, explain that the specific Deny policy cannot be identified without deep analysis, and provide a limited recommendation with explicit uncertainty noted.
Step 2: Deep Analysis
Entered only when the user selects path B in Step 1.
First attempt classification using the raw fields from Step 1. DecodeDiagnosticMessage is a supplement — invoke it only when raw data is insufficient to classify with confidence.
Decode when raw data alone cannot resolve the root cause: e.g., ExplicitDeny is present (need MatchedPolicies), AccessDeniedDetail was absent, or PolicyType is missing. For cases where NoPermissionType, AuthAction, AuthPrincipalType, and PolicyType are all available and point to a clear root cause, skip decode and proceed directly.
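The decode decision can be sketched as a predicate over the Step 1 fields. The field names follow the Step 1a list; `needs_decode` and the `has_access_denied_detail` parameter are hypothetical. The check for whether the fields "point to a clear root cause" is a judgment call that this sketch does not model.

```python
# The four fields that must all be present to skip decoding.
REQUIRED_FOR_CLASSIFICATION = (
    "no_permission_type",
    "missing_action",   # assumed to come from AuthAction
    "principal_type",
    "policy_type",
)


def needs_decode(fields, has_access_denied_detail=True):
    """Return True when DecodeDiagnosticMessage should be attempted."""
    if not has_access_denied_detail:
        return True  # AccessDeniedDetail was absent from the raw error
    if fields.get("no_permission_type") == "ExplicitDeny":
        return True  # MatchedPolicies is needed to find the Deny statement
    # Any missing field (PolicyType included) means raw data is insufficient.
    return any(not fields.get(name) for name in REQUIRED_FOR_CLASSIFICATION)
```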
Transcribe EncodedDiagnosticMessage from the raw error and call:

    aliyun ram DecodeDiagnosticMessage --EncodedDiagnosticMessage <transcribed value>

If the call returns EntityNotExist, re-run the original failing command and save its output to a temp file (use the system temp dir; name the file after the command context, e.g. /tmp/aliyun_ecs_stopinstance.txt). Extract EncodedDiagnosticMessage from the file and retry the decode. If the field is not found in the file, mark as L0 and continue.
If SubUser identity needs UserName resolution before gap analysis, see references/diagnose-flow.md → Identity Resolution. If resolution fails, mark as L0 and continue.
Root cause categories:
- MissingAction: the identity policy lacks the required Action (most common)
- ExplicitDeny: a Deny statement blocks access (it may be in an identity policy or a Control Policy (CP))
- TrustPolicy: the role trust policy does not allow the caller to assume the role
- STSInsufficient: the STS temporary credential lacks permission; the root cause is on the originating Role
- TokenExpired: the STS token has expired
- SLRMissing: the service-linked role has not been created
- ResourcePolicy: a resource-side policy (e.g., OSS Bucket Policy) is restricting access
For gap analysis trigger rules and per-root-cause handling details, see references/diagnose-flow.md.
Gap analysis (when triggered): query the policies currently attached to the identity, then compare them against the required Action. Use ListPoliciesForUser (SubUser), ListPoliciesForRole (AssumedRoleUser), or ListControlPolicies (RootUser). For Custom policies, fetch the policy document with GetPolicyVersion. For System policies, rely on built-in knowledge and do not call GetPolicyVersion.
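The API selection for gap analysis can be sketched as a lookup plus a filter. The shape of the policy entries (dicts with `PolicyName` and `PolicyType` set to "Custom" or "System") is an assumption about the list response, and both function names are hypothetical.

```python
# Principal type -> RAM list API, as stated in the gap analysis rule.
LIST_API_BY_PRINCIPAL = {
    "SubUser": "ListPoliciesForUser",
    "AssumedRoleUser": "ListPoliciesForRole",
    "RootUser": "ListControlPolicies",
}


def list_api_for(principal_type):
    """Return the list API to call for the given principal type."""
    try:
        return LIST_API_BY_PRINCIPAL[principal_type]
    except KeyError:
        raise ValueError(f"unsupported principal type: {principal_type!r}")


def policies_to_fetch(attached_policies):
    """Return names of policies whose document needs GetPolicyVersion.

    Only Custom policies are fetched; System policies are covered by
    built-in knowledge and are skipped.
    """
    return [
        p["PolicyName"]
        for p in attached_policies
        if p.get("PolicyType") == "Custom"
    ]
```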
When permissions are insufficient: if DecodeDiagnosticMessage fails (L0) or policy queries fail (L1), inform the user of the limitation and provide ready-to-use permission request materials for a RAM admin — two independent options: ① decode permission (ram:DecodeDiagnosticMessage) as a custom policy; ② RAM read access via system policy AliyunRAMReadOnlyAccess (covers gap analysis). Either or both can be requested independently. Then continue to Step 3 without waiting.
Step 3: Generate Recommendations
Before generating, check for caller skill permission hints (see references/diagnose-flow.md → Coverage Check).
Knowledge source priority:
1. Built-in knowledge: for popular services (ECS, OSS, RDS, FC, SLB, VPC, SLS, STS, etc.), use known Action semantics directly. Reference references/hot-services-ram.md.
2. Caller skill hints: if ram-policies.md was found, use it as supplementary context
3. Web search: search {product} RAM authorization site:help.aliyun.com; prefer manually maintained docs with business examples over auto-generated Action tables
4. System policy fallback: recommend AliyunXxxReadOnlyAccess or AliyunXxxFullAccess with a note to tighten further
Custom policy naming: suggest a name based on service and task semantics (e.g., ai-agent-ecs-permissions), confirm once, reuse in the same session.
System policy: attach directly with a single command, no naming needed.
For the Trust Policy root cause path, recommendations differ — see references/diagnose-flow.md → Handling Each Root Cause.
After presenting the recommendation, add a brief note: the current plan is a starting point; the user can request further refinement at any time — for example, scoping down to specific resources, adding conditions, or using resource-level policies (such as OSS bucket policies) instead of identity-level grants.
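A least-privilege custom policy of the kind recommended here can be sketched as a small builder. The document layout (`Version` "1" with a `Statement` list) follows the RAM policy format; the function name `build_policy_document` is hypothetical, and any resource ARN passed in is only a placeholder to be replaced with a real one when scoping down.

```python
import json


def build_policy_document(actions, resources=("*",)):
    """Build a minimal Allow policy for the given actions.

    resources defaults to "*" as a starting point; pass specific
    resource ARNs to scope the grant down further.
    """
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": list(resources),
            }
        ],
    }


if __name__ == "__main__":
    # Example: grant only the single missing action from the error.
    doc = build_policy_document(["ecs:StopInstance"])
    print(json.dumps(doc, indent=2))
```

Starting from the single missing action keeps the first grant narrow; resources and conditions can be tightened in a later refinement.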
Step 4: Execute Repair
Before executing any write operation, present the change summary and all available paths to the user, then wait for the user to select a path — do not proceed or output any commands until the user has chosen:
- Target (user or role name)
- Change summary (policy name, action, undo method)
- Path options (always present all that are available for the current level; never skip any):
  - A. Direct CLI execution: the agent runs the commands now (only at L2)
  - B. Output CLI commands: the user copies and runs them in their own terminal (all levels)
  - C. Console guidance: step-by-step instructions in the RAM console (all levels)
  - Skip: do not execute
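The path options per level can be sketched as below. The sketch follows the list literally (A only at L2; B, C, and Skip at every level); the level names are treated as plain strings, their definitions live in references/diagnose-flow.md, and `available_paths` is a hypothetical name.

```python
def available_paths(level):
    """Return the repair paths to present for an internal level ("L0".."L3")."""
    paths = []
    if level == "L2":
        paths.append("A")  # direct CLI execution by the agent
    # Output CLI commands, console guidance, and Skip are always offered.
    paths += ["B", "C", "Skip"]
    return paths
```

The level is used only for routing and is never shown to the user; only the resulting options are presented.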
For pre-query requirements before write operations, and full CLI command examples, see references/ram-cli-commands.md and references/diagnose-flow.md.
Path A: agent executes via Bash. On success → L3 confirmed; report result and undo command. On NoPermission → switch to Path B automatically.
Path B at L0/L1: output incremental Statement JSON only, with a note that existing policies could not be read and the user must merge manually.
Path B at L2: offer two sub-options: ① incremental Statement only, ② complete merged policy JSON.
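The two Path B outputs can be sketched as one function: it always returns the incremental Statement and adds a merged policy only when the existing policy document could be read. The function name `path_b_outputs` and the returned dictionary keys are hypothetical.

```python
import copy


def path_b_outputs(statement, existing_policy=None):
    """Return the JSON payloads Path B can offer.

    statement: the new Statement to grant.
    existing_policy: the current policy document, or None when it could
    not be read (L0/L1), in which case only the increment is returned.
    """
    outputs = {"incremental": statement}
    if existing_policy is not None:
        # Copy first so the caller's policy document is left untouched.
        merged = copy.deepcopy(existing_policy)
        merged.setdefault("Statement", []).append(statement)
        outputs["merged"] = merged
    return outputs
```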
Path C: provide the RAM console entry (https://ram.console.aliyun.com/policies) and step-by-step instructions for completing the change in the console UI.
After repair, suggest the user retry the previously failed operation. Offer to retry on their behalf if requested.