Kubernetes Agent Swarm — Platform Operations

A multi-agent system for Kubernetes and OpenShift platform operations. Seven specialized agents work together as a coordinated swarm.

Runtime Requirements

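Requirement	Required	Description
kubectl	✅ Yes	Kubernetes CLI; must be in PATH
oc	Optional	OpenShift CLI; required for OCP/ROSA/ARO environments
helm	Optional	Helm operations for the GitOps agent
jq	Optional	JSON output parsing
KUBECONFIG	✅ Yes	Cluster access via the environment variable or ~/.kube/config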

Optional cloud CLIs (aws, az, gcloud, rosa) — only needed for managed cluster operations.
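
A preflight check can confirm these requirements before any agent runs. The sketch below assumes the tool list this document gives (kubectl required; oc, helm, and jq optional; a kubeconfig from KUBECONFIG or ~/.kube/config). The function names `missing_tools` and `preflight` are hypothetical and not part of the skill.

```shell
#!/bin/sh
# Print each tool from the argument list that is not found in PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical preflight: fail on missing required items, warn on optional ones.
preflight() {
    required_missing=$(missing_tools kubectl)
    optional_missing=$(missing_tools oc helm jq)

    # Cluster access comes from KUBECONFIG or the default ~/.kube/config.
    if [ -z "${KUBECONFIG:-}" ] && [ ! -f "$HOME/.kube/config" ]; then
        echo "error: KUBECONFIG is unset and ~/.kube/config was not found" >&2
        return 1
    fi
    if [ -n "$required_missing" ]; then
        echo "error: missing required tool: $required_missing" >&2
        return 1
    fi
    if [ -n "$optional_missing" ]; then
        printf 'note: optional tools not installed: %s\n' \
            "$(printf '%s' "$optional_missing" | tr '\n' ' ')" >&2
    fi
    return 0
}
```

Calling `preflight || exit 1` at the start of a session stops early when kubectl or a kubeconfig is missing, while a missing optional tool only produces a note.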

Installation

    clawhub install kubernetes

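Or install individual agents:

    clawhub install orchestrator
    clawhub install cluster-ops
    clawhub install gitops
    clawhub install security
    clawhub install observability
    clawhub install artifacts
    clawhub install developer-experience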

The Swarm — Agent Roster

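Agent	Code Name	Domain
Orchestrator	Jarvis	Task routing, coordination, standups
Cluster Ops	Atlas	Cluster operations
GitOps	Flow	GitOps
Security	Shield	Security
Observability	Pulse	Monitoring
Artifacts	Cache	Artifacts
Developer Experience	Desk	Developer experience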

How It Works

This is an instruction-only skill. Agents receive markdown instructions describing what commands to run and how to interpret output. No executable scripts are included — the agent translates instructions into actions using the host's installed CLI tools.

Session Setup

Before using the swarm, establish cluster context:

    # Verify access
    kubectl cluster-info
    kubectl get nodes

    # For OpenShift
    oc status

Agent Communication

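Agents communicate via @mentions in shared task comments:

    @Shield Please review the RBAC configuration for payment-service v3.2 before sync.
    @Pulse Is the CPU spike related to the deployment or to external traffic?
    @Atlas The staging cluster needs 2 more worker nodes.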
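
To route a comment, the @mentions have to be pulled out and mapped to an agent's skill. The following is a minimal sketch, not the skill's own mechanism: `extract_mentions` and `skill_for_agent` are hypothetical helpers, and the mapping from code name to skill directory is taken from the file structure listed in this document.

```shell
#!/bin/sh
# Print the agent names @mentioned in a comment, one per line, in order of appearance.
extract_mentions() {
    printf '%s\n' "$1" | grep -oE '@[A-Za-z][A-Za-z0-9_-]*' | cut -c2-
}

# Map an agent code name to its skill directory; return 1 for unknown names.
skill_for_agent() {
    case "$1" in
        Jarvis) echo orchestrator ;;
        Atlas)  echo cluster-ops ;;
        Flow)   echo gitops ;;
        Shield) echo security ;;
        Pulse)  echo observability ;;
        Cache)  echo artifacts ;;
        Desk)   echo developer-experience ;;
        *)      return 1 ;;
    esac
}
```

For example, `extract_mentions "@Shield check RBAC, then ask @Pulse"` prints `Shield` and `Pulse` on separate lines. The pattern is deliberately simple and would also match the domain part of an email address, so a real router would need stricter matching.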

Escalation Path

1. Agent detects issue
2. Agent attempts resolution within guardrails
3. If blocked → @mention another agent or escalate to human
4. P1 incidents → all relevant agents auto-notified
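
The escalation path can be read as a small decision function. The sketch below is one possible encoding under stated assumptions: the function name, the `resolved`/`blocked` outcome values, and the action labels it prints (including `log-and-close`) are all hypothetical; only the P1 rule and the blocked rule come from the steps above.

```shell
#!/bin/sh
# Print the actions that follow from an issue's severity and the agent's attempt.
#   $1 severity: P1, P2, ... (only P1 is treated specially)
#   $2 outcome of the agent's own attempt: resolved | blocked | anything else
escalation_actions() {
    severity=$1
    outcome=$2

    # P1 incidents notify every relevant agent regardless of the outcome.
    if [ "$severity" = "P1" ]; then
        echo "notify-all-relevant-agents"
    fi

    case "$outcome" in
        resolved) echo "log-and-close" ;;
        blocked)  echo "mention-agent-or-escalate-to-human" ;;
        *)        echo "attempt-resolution-within-guardrails" ;;
    esac
}
```

For instance, `escalation_actions P2 blocked` prints `mention-agent-or-escalate-to-human`.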

Heartbeat Schedule

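    */5 * * * *    Atlas, Pulse, Shield    (fast response: incidents, alerts, CVEs)
    */10 * * * *   Flow, Cache             (scheduled work: deployments, upgrades)
    */15 * * * *   Desk, Orchestrator      (batch work: onboarding, standups)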
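
As a rough illustration of how a scheduler could apply these intervals, the sketch below assumes the groupings this document gives (Atlas, Pulse, Shield every 5 minutes; Flow, Cache every 10; Desk, Orchestrator every 15) and that a heartbeat fires when the minute of the hour is a multiple of the interval. The function names are hypothetical.

```shell
#!/bin/sh
# Print the heartbeat interval in minutes for an agent; return 1 if unknown.
heartbeat_interval() {
    case "$1" in
        Atlas|Pulse|Shield) echo 5 ;;
        Flow|Cache)         echo 10 ;;
        Desk|Orchestrator)  echo 15 ;;
        *)                  return 1 ;;
    esac
}

# Succeed when the agent is due at the given minute of the hour (0-59).
#   $1 agent name, $2 minute (a leading zero such as "05" is accepted)
is_due() {
    interval=$(heartbeat_interval "$1") || return 2
    minute=${2#0}          # strip one leading zero so "08" is not read as octal
    minute=${minute:-0}    # "0" becomes empty after stripping; restore it
    [ $(( minute % interval )) -eq 0 ]
}
```

A wrapper could call `is_due Atlas "$(date +%M)"` once a minute and wake the agent only when it succeeds.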

Agent Capabilities

What Agents CAN Do

- Read cluster state (kubectl get, kubectl describe, oc get)
- Deploy via GitOps (argocd app sync, Flux reconciliation)
- Create documentation and reports
- Investigate and triage incidents
- Provision standard resources (namespaces, quotas, RBAC)
- Run health checks and audits
- Query metrics and logs

What Agents CANNOT Do (Human-in-the-Loop Required)

- Delete production resources
- Modify cluster-wide policies
- Make direct changes to secrets without rotation workflow
- Perform irreversible cluster upgrades
- Approve production deployments (can prepare, human approves)
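
One way to make the two lists above checkable is an allowlist that defaults to human approval. This is only a sketch under assumptions: `guardrail_decision` is a hypothetical helper, the verb and resource names are ordinary kubectl ones chosen for illustration, and the rule is stricter than the lists require because anything not explicitly allowed is sent to a human.

```shell
#!/bin/sh
# Classify a proposed kubectl action as "allow" or "needs-human-approval".
#   $1 verb (get, describe, apply, delete, ...)
#   $2 resource kind (pods, namespace, secret, ...)
guardrail_decision() {
    verb=$1
    resource=$2
    case "$verb" in
        # Reading cluster state is always permitted.
        get|describe|logs|top)
            echo allow ;;
        # Provisioning is limited to the standard resources named above.
        create|apply)
            case "$resource" in
                namespace|resourcequota|role|rolebinding) echo allow ;;
                *) echo needs-human-approval ;;
            esac ;;
        # Everything else (delete, patch, drain, ...) defaults to a human.
        *)
            echo needs-human-approval ;;
    esac
}
```

With this rule, `guardrail_decision delete deployment` and `guardrail_decision apply secret` both print `needs-human-approval`, while `guardrail_decision get pods` prints `allow`.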

Key Principles

- Roles over genericism: each agent has a defined domain
- Files over mental notes: only files persist between sessions
- Human-in-the-loop: critical actions require approval
- Guardrails over freedom: define what agents can and cannot do
- Audit everything: every action is logged

File Structure

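    kubernetes/
    ├── SKILL.md                              # This file: swarm composition
    ├── AGENTS.md                             # Swarm configuration and protocols
    ├── skills/
    │   ├── orchestrator/SKILL.md             # Jarvis: task routing
    │   ├── cluster-ops/SKILL.md              # Atlas: cluster operations
    │   ├── gitops/SKILL.md                   # Flow: GitOps
    │   ├── security/SKILL.md                 # Shield: security
    │   ├── observability/SKILL.md            # Pulse: monitoring
    │   ├── artifacts/SKILL.md                # Cache: artifacts
    │   └── developer-experience/SKILL.md     # Desk: developer experience
    ├── memory/MEMORY.md                      # Long-term agent memory
    ├── working/WORKING.md                    # Session progress
    └── logs/LOGS.md                          # Action audit trail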
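
Since every action is meant to be logged and logs/LOGS.md is the audit trail in this layout, an append-only helper is a natural fit. The sketch below is an assumption throughout: the entry format (timestamp, agent, action), the `audit_log` name, and the `AUDIT_LOG` override variable are hypothetical and not defined by the skill.

```shell
#!/bin/sh
# Append one audit entry to the log file (default: logs/LOGS.md).
#   $1 agent name, $2 description of the action taken
audit_log() {
    log_file=${AUDIT_LOG:-logs/LOGS.md}
    mkdir -p "$(dirname "$log_file")"
    # One markdown list item per action, stamped in UTC.
    printf '%s %s | %s | %s\n' "-" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" \
        >> "$log_file"
}
```

An agent would call, for example, `audit_log Atlas "kubectl get nodes"` right after running the command, so the file stays a chronological record that survives between sessions.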

Detailed Agent Documentation

See individual SKILL.md files for each agent's full capabilities, personality, and workflow instructions.

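Kubernetes Agent Swarm: Platform Operations

A multi-agent system for Kubernetes and OpenShift platform operations. Seven specialized agents work together as a coordinated swarm.

Runtime Requirements

Requirement	Required	Description
kubectl	✅ Yes	Kubernetes CLI; must be in PATH
oc	Optional	OpenShift CLI; required for OCP/ROSA/ARO environments
helm	Optional	Helm operations for the GitOps agent
jq	Optional	JSON output parsing
KUBECONFIG	✅ Yes	Cluster access via the environment variable or ~/.kube/config

Optional cloud CLIs (aws, az, gcloud, rosa) are only needed for managed cluster operations.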

Installation

    clawhub install kubernetes

Or install individual agents:

    clawhub install orchestrator
    clawhub install cluster-ops
    clawhub install gitops
    clawhub install security
    clawhub install observability
    clawhub install artifacts
    clawhub install developer-experience

The Swarm: Agent Roster

Agent	Code Name	Domain
Orchestrator	Jarvis	Task routing, coordination, standups
Cluster Ops

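How It Works

This is an instruction-only skill. Agents receive markdown instructions describing what commands to run and how to interpret output. No executable scripts are included; the agent translates instructions into actions using the host's installed CLI tools.

Session Setup

Before using the swarm, establish cluster context:

    # Verify access
    kubectl cluster-info
    kubectl get nodes

    # For OpenShift
    oc status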

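Agent Communication

Agents communicate via @mentions in shared task comments:

    @Shield Please review the RBAC configuration for payment-service v3.2 before sync.
    @Pulse Is the CPU spike related to the deployment or to external traffic?
    @Atlas The staging cluster needs 2 more worker nodes.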

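Escalation Path

1. Agent detects issue
2. Agent attempts resolution within guardrails
3. If blocked → @mention another agent or escalate to human
4. P1 incidents → all relevant agents auto-notified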

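Heartbeat Schedule

    */5 * * * *    Atlas, Pulse, Shield    (fast response: incidents, alerts, CVEs)
    */10 * * * *   Flow, Cache             (scheduled work: deployments, upgrades)
    */15 * * * *   Desk, Orchestrator      (batch work: onboarding, standups)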

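Agent Capabilities

What Agents CAN Do

- Read cluster state (kubectl get, kubectl describe, oc get)
- Deploy via GitOps (argocd app sync, Flux reconciliation)
- Create documentation and reports
- Investigate and triage incidents
- Provision standard resources (namespaces, quotas, RBAC)
- Run health checks and audits
- Query metrics and logs

What Agents CANNOT Do (Human-in-the-Loop Required)

- Delete production resources
- Modify cluster-wide policies
- Make direct changes to secrets without rotation workflow
- Perform irreversible cluster upgrades
- Approve production deployments (can prepare, human approves)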

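Key Principles

- Roles over genericism: each agent has a defined domain
- Files over mental notes: only files persist between sessions
- Human-in-the-loop: critical actions require approval
- Guardrails over freedom: define what agents can and cannot do
- Audit everything: every action is logged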

File Structure

    kubernetes/
    ├── SKILL.md                              # This file: swarm composition
    ├── AGENTS.md                             # Swarm configuration and protocols
    ├── skills/
    │   ├── orchestrator/SKILL.md             # Jarvis: task routing
    │   ├── cluster-ops/SKILL.md              # Atlas: cluster operations
    │   ├── gitops/SKILL.md                   # Flow: GitOps
    │   ├── security/SKILL.md                 # Shield: security
    │   ├── observability/SKILL.md            # Pulse: monitoring
    │   ├── artifacts/SKILL.md                # Cache: artifacts
    │   └── developer-experience/SKILL.md     # Desk: developer experience
    ├── memory/MEMORY.md                      # Long-term agent memory
    ├── working/WORKING.md                    # Session progress
    └── logs/LOGS.md                          # Action audit trail

Detailed Agent Documentation

See individual SKILL.md files for each agent's full capabilities, personality, and workflow instructions.
