Desktop Control Skill
Automate the desktop: screenshots, mouse, keyboard, window management, clipboard, and screen info.
All commands output JSON with "ok": true/false for reliable agent parsing.
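Because every command prints a JSON object with an `"ok"` field, a caller can parse stdout and branch on that one field. The following is a minimal sketch. It assumes each command prints exactly one JSON object on stdout. `parse_result` and `DesktopCommandError` are hypothetical names, and the shape of the error payload is not documented here, so the sketch reports the whole object.

```python
import json


class DesktopCommandError(RuntimeError):
    """Raised when a command reports "ok": false (hypothetical helper)."""


def parse_result(stdout: str) -> dict:
    """Parse one command's JSON output and fail loudly unless "ok" is true."""
    result = json.loads(stdout)
    if not isinstance(result, dict) or result.get("ok") is not True:
        # The error fields are undocumented, so include the full payload.
        raise DesktopCommandError(f"command failed: {result!r}")
    return result
```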
Setup
Run the setup script to create a Python venv and install dependencies.
The skill directory is wherever this SKILL.md lives — all paths below are relative to the skill root.
Windows (PowerShell):
CODEBLOCK0
Linux / macOS:
CODEBLOCK1
How to Run
The Python executable lives in the venv. Resolve it relative to the skill directory:
| OS | Python Path |
|---|---|
| Windows | INLINECODE1 |
| Linux/Mac | `.venv/bin/python` |
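The table can be turned into a small path resolver. This is a sketch. `venv_python` is a hypothetical helper that picks the layout from `os.name` (`"nt"` means Windows) unless you pass `windows` explicitly.

```python
import os
from pathlib import Path
from typing import Optional


def venv_python(skill_root: str, windows: Optional[bool] = None) -> Path:
    """Return the venv interpreter path listed in the table for this OS."""
    if windows is None:
        windows = os.name == "nt"  # "nt" on Windows, "posix" on Linux/macOS
    root = Path(skill_root)
    if windows:
        return root / ".venv" / "Scripts" / "python.exe"
    return root / ".venv" / "bin" / "python"
```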
All commands follow this pattern:
CODEBLOCK2
Agent shorthand — set the working directory to the skill root, then:
CODEBLOCK3
Where <skillPath> is the absolute path to this skill's directory (the folder containing this SKILL.md).
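If you call the skill from something other than the agent `exec` tool, the standard library's `subprocess.run` can reproduce the same pattern by using the skill root as `cwd`. This is a sketch. `build_argv` and `run_desktop` are hypothetical names, and the code assumes the command writes its JSON to stdout.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def build_argv(python_exe: str, *args: str) -> List[str]:
    """Assemble [<python>, scripts/desktop.py, <command>, ...] as an argument list."""
    return [python_exe, str(Path("scripts") / "desktop.py"), *args]


def run_desktop(skill_path: str, python_exe: str, *args: str) -> dict:
    """Run one desktop.py command from the skill root and return its parsed JSON."""
    # Joining onto skill_path leaves an absolute interpreter path unchanged and
    # anchors a relative one (e.g. ".venv/bin/python") at the skill root.
    exe = str(Path(skill_path, python_exe))
    completed = subprocess.run(
        build_argv(exe, *args),
        cwd=skill_path,  # so scripts/desktop.py and captures/ resolve as documented
        capture_output=True,
        text=True,
        check=False,  # rely on the JSON "ok" field rather than the exit code
    )
    return json.loads(completed.stdout)
```

Example call, with an illustrative path: `run_desktop("/path/to/skill", ".venv/bin/python", "screen", "size")`.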
Commands
Screenshot
CODEBLOCK4
Output: INLINECODE4
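This sketch assumes `--region` takes left, top, width, and height in that order. If you start from two corner points, such as a bounding box, the hypothetical helper `corners_to_region` converts them and clips the result to the screen. Clipping to the size reported by `screen size` is an assumption about what a caller wants. The skill itself may handle out-of-bounds regions differently.

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height)


def corners_to_region(x1: int, y1: int, x2: int, y2: int,
                      screen_w: int, screen_h: int) -> Region:
    """Convert two opposite corners into a clipped (left, top, width, height)."""
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    # Clip to the visible screen area.
    left, top = max(0, left), max(0, top)
    right, bottom = min(screen_w, right), min(screen_h, bottom)
    if right <= left or bottom <= top:
        raise ValueError("region is empty or lies entirely off-screen")
    return left, top, right - left, bottom - top


def region_args(region: Region) -> List[str]:
    """Format a region as CLI arguments for 'screenshot --region'."""
    return ["--region", *(str(v) for v in region)]
```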
Mouse
CODEBLOCK5
Keyboard
CODEBLOCK6
Note: For non-ASCII text (CJK, emoji, etc.), key type automatically uses clipboard paste via Ctrl+V.
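The note describes a fallback: ASCII text is typed key by key, and anything else goes through the clipboard. Below is a sketch of that decision using pyautogui and pyperclip, both named in this document. It is an assumption about how the behavior could be implemented, not the skill's actual code. Only `needs_clipboard` runs without a display, because `type_text` imports the GUI libraries lazily.

```python
def needs_clipboard(text: str) -> bool:
    """True when the text contains any non-ASCII character (CJK, emoji, ...)."""
    return not text.isascii()


def type_text(text: str, interval: float = 0.0) -> None:
    """Type ASCII directly; paste everything else via the clipboard."""
    import pyautogui  # imported lazily: requires a GUI session
    import pyperclip

    if needs_clipboard(text):
        pyperclip.copy(text)
        # Ctrl+V as described in the note; macOS uses Cmd+V ("command") instead.
        pyautogui.hotkey("ctrl", "v")
    else:
        pyautogui.write(text, interval=interval)
```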
Window
CODEBLOCK7
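According to the troubleshooting notes, title matching is a case-insensitive substring match, and `window activate` also accepts a numeric handle. Below is a sketch of that lookup over an in-memory list of (handle, title) pairs. `match_windows` is hypothetical, and real handles come from `window list`. The sketch treats a numeric query as a handle first and then falls back to a title match. The skill's actual precedence is not documented.

```python
from typing import List, Tuple

Window = Tuple[int, str]  # (handle, title)


def match_windows(windows: List[Window], query: str) -> List[Window]:
    """Match a numeric query against handles, otherwise by title substring."""
    if query.isdigit():
        handle = int(query)
        by_handle = [w for w in windows if w[0] == handle]
        if by_handle:
            return by_handle
    needle = query.casefold()  # case-insensitive comparison
    return [w for w in windows if needle in w[1].casefold()]
```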
Clipboard
CODEBLOCK8
Screen
CODEBLOCK9
Pixel output: INLINECODE6
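In the pixel output, the `hex` field encodes each 0-255 channel as two lowercase hex digits. For example, r=255, g=128, b=0 becomes `#ff8000`. The sketch below converts in both directions and adds a tolerance-based comparison for checking a sampled pixel against an expected color. The helper names and the default tolerance of 8 are hypothetical choices.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Format 0-255 channels as '#rrggbb' (lowercase)."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return f"#{r:02x}{g:02x}{b:02x}"


def hex_to_rgb(value: str) -> RGB:
    """Parse '#rrggbb' back into an (r, g, b) tuple."""
    value = value.lstrip("#")
    return int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16)


def colors_close(a: RGB, b: RGB, tol: int = 8) -> bool:
    """Compare two colors channel by channel within a tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))
```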
Wait
CODEBLOCK10
Version
CODEBLOCK11
Scenarios
"Take a full screenshot and show me the desktop"
CODEBLOCK12
"Open Notepad and type some text"
CODEBLOCK13
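Multi-step scenarios like this one are sequences of independent commands. The sketch below runs such a sequence and stops at the first result that is not `"ok": true`, so later keystrokes are not sent to the wrong window. The step list reproduces the Notepad scenario: Win+R, wait, type `notepad`, Enter, wait, type the text. `run_steps` is hypothetical, and `runner` stands for whatever executes one command and returns its parsed JSON, for example a `subprocess` wrapper.

```python
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], Dict]

NOTEPAD_STEPS: List[List[str]] = [
    ["key", "hotkey", "win", "r"],
    ["wait", "0.5"],
    ["key", "type", "notepad"],
    ["key", "press", "enter"],
    ["wait", "1"],
    ["key", "type", "Hello from desktop-control!"],
]


def run_steps(steps: List[List[str]], runner: Runner) -> List[Dict]:
    """Run steps in order; stop at the first result whose "ok" is not true."""
    results: List[Dict] = []
    for step in steps:
        result = runner(step)
        results.append(result)
        if result.get("ok") is not True:
            break
    return results


if __name__ == "__main__":
    # Dry run with a fake runner that only echoes the arguments.
    for r in run_steps(NOTEPAD_STEPS, lambda args: {"ok": True, "args": list(args)}):
        print(r["args"])
```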
"Maximize the Chrome window"
CODEBLOCK14
"Read clipboard content"
CODEBLOCK15
"Move and resize a window"
CODEBLOCK16
Safety
- Failsafe ON by default: move the mouse to the top-left corner (0,0) to abort any pyautogui operation.
- Use `--no-failsafe` to disable it (NOT recommended).
- All actions return structured JSON for an audit trail.
- Screenshots are saved locally only; no network requests are made.
- Captures directory: `captures/` (relative to the skill root).
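Because each action already returns structured JSON, an audit trail can be as simple as appending every result as one line of a JSON Lines file. This is a sketch. The log file, the wrapper record, and its `ts`/`command` fields are assumptions; the skill does not write such a log itself.

```python
import json
import time
from pathlib import Path
from typing import Dict, List, Sequence


def append_audit(log_path: Path, command: Sequence[str], result: Dict) -> None:
    """Append one command and its JSON result as a single JSON Lines record."""
    record = {"ts": time.time(), "command": list(command), "result": result}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_audit(log_path: Path) -> List[Dict]:
    """Load all records back, one per non-empty line."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```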
Optional Dependencies
For advanced workflows, you may also install:
| Package | Use Case |
|---|---|
| INLINECODE9 | Read/write Excel files |
| INLINECODE10 | Read/write Word documents |
| `pywin32` | Advanced Windows COM automation |
These are not installed by the default setup scripts. Install them manually if needed:
CODEBLOCK17
Troubleshooting
| Problem | Solution |
|---|---|
| INLINECODE12 | Run `scripts/setup.ps1` (Windows) or `scripts/setup.sh` (Linux/Mac) |
| INLINECODE15 | Use `window list` to see available windows; matching is a case-insensitive substring match |
| Failed to activate | The window may be minimized. The script tries `restore()` first, but some apps resist |
| Screenshot is black | Common with GPU-accelerated apps; try capturing a region instead |
| `key type` garbled for CJK | Should automatically use clipboard paste; verify `pyperclip` is installed |
| FAILSAFE triggered | The mouse hit the (0,0) corner. This is an intentional safety stop; reposition the mouse and retry |
| Permission denied on Linux | pyautogui needs X11/Wayland access; run from a GUI session, not over SSH |
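For the Linux permission row, a quick pre-flight check is whether the process can see a display at all. X11 clients locate the display through the `DISPLAY` environment variable, and Wayland clients use `WAYLAND_DISPLAY`. An SSH session without X forwarding typically sets neither. The check below is heuristic: `has_gui_session` is hypothetical, and a set variable does not guarantee that pyautogui will work, especially under Wayland.

```python
import os
import sys
from typing import Mapping, Optional


def has_gui_session(env: Optional[Mapping[str, str]] = None,
                    platform: Optional[str] = None) -> bool:
    """Heuristic: on Linux, require DISPLAY or WAYLAND_DISPLAY to be set."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        # The check only applies to Linux; other platforms are assumed to have a desktop.
        return True
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
```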
Limitations
- Windows-primary: The full feature set (window management, all shortcuts) works on Windows. Linux/macOS have partial support via pyautogui, but pygetwindow behavior may differ.
- Requires GUI session: Must run in a desktop session with a display. Headless servers or SSH sessions without X forwarding will fail.
- Single monitor: `screenshot` captures the primary monitor by default. Multi-monitor capture requires `--region`.
- Admin windows: Cannot interact with windows running as Administrator from a non-admin Python process (Windows UAC).
- Screen scaling: DPI scaling (125%, 150%) may cause coordinate mismatches. Use `screen size` to verify the actual resolution.
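To check for the scaling mismatch described in the last item, compare what `screen size` reports with the monitor's native resolution. The ratio between the two is the effective scale. For example, on Windows a 1920x1080 panel at 125% is typically reported as 1536x864 to a process that is not DPI-aware. Coordinates measured in one space then need to be multiplied or divided by that factor. This is a sketch. It assumes you know the native resolution from the OS display settings, and the helper names are hypothetical.

```python
from typing import Tuple


def scale_factor(reported: Tuple[int, int], native: Tuple[int, int]) -> float:
    """Ratio of native to reported resolution (1.0 means no scaling)."""
    sx = native[0] / reported[0]
    sy = native[1] / reported[1]
    if abs(sx - sy) > 0.01:
        raise ValueError(f"inconsistent horizontal/vertical scale: {sx} vs {sy}")
    return sx


def to_reported(x: int, y: int, factor: float) -> Tuple[int, int]:
    """Convert a native-pixel coordinate into the reported coordinate space."""
    return round(x / factor), round(y / factor)


def to_native(x: int, y: int, factor: float) -> Tuple[int, int]:
    """Convert a reported coordinate into native pixels."""
    return round(x * factor), round(y * factor)
```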
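Desktop Control Skill
Automate the desktop: screenshots, mouse, keyboard, window management, clipboard, and screen info.
All commands output JSON with "ok": true/false for reliable agent parsing.
Setup
Run the setup script to create a Python virtual environment and install dependencies.
The skill directory is the directory containing this SKILL.md; all paths below are relative to the skill root.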
Windows (PowerShell):
```powershell
powershell -ExecutionPolicy Bypass -File scripts\setup.ps1
```
Linux / macOS:
```bash
bash scripts/setup.sh
```
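How to Run
The Python executable lives in the virtual environment. Resolve its path relative to the skill directory:
| OS | Python Path |
|---|---|
| Windows | `.venv\Scripts\python.exe` |
| Linux/Mac | `.venv/bin/python` |
All commands follow this pattern:
```
scripts/desktop.py <command> [subcommand] [args]
```
Agent shorthand: set the working directory to the skill root, then:
```
exec({ command: ".venv\\Scripts\\python.exe scripts\\desktop.py <command> [args]", workdir: "<skillPath>" })
```
Where `<skillPath>` is the absolute path to this skill's directory (the folder containing this SKILL.md).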
Commands
Screenshot
```bash
# Full-screen screenshot (saved to captures/ with a timestamped filename)
python scripts/desktop.py screenshot
# Save to a specific path
python scripts/desktop.py screenshot -o "C:\tmp\shot.png"
# Region screenshot (left top width height)
python scripts/desktop.py screenshot --region 0 0 800 600
```
Output: `{"ok": true, "path": "...", "width": 1920, "height": 1080}`
Mouse
```bash
# Current position
python scripts/desktop.py mouse pos
# Move to coordinates
python scripts/desktop.py mouse move 500 300
python scripts/desktop.py mouse move 500 300 --duration 0.5
# Click (left/right/middle button; single/double/triple click)
python scripts/desktop.py mouse click 500 300
python scripts/desktop.py mouse click 500 300 --button right
python scripts/desktop.py mouse click --clicks 2
# Drag from (100,100) to (400,400)
python scripts/desktop.py mouse drag 100 100 400 400 --duration 1.0
# Scroll (positive = up, negative = down)
python scripts/desktop.py mouse scroll 3
python scripts/desktop.py mouse scroll -5
python scripts/desktop.py mouse scroll 3 --direction horizontal
```
Keyboard
```bash
# Type ASCII text
python scripts/desktop.py key type "hello world"
# Type with an interval between keystrokes
python scripts/desktop.py key type "slow typing" --interval 0.1
# Type Unicode / CJK text (automatically uses clipboard paste)
python scripts/desktop.py key type "你好世界"
# Press a single key (use --times to repeat)
python scripts/desktop.py key press enter
python scripts/desktop.py key press tab --times 3
# Hotkey combinations
python scripts/desktop.py key hotkey ctrl c
python scripts/desktop.py key hotkey ctrl shift s
python scripts/desktop.py key hotkey alt f4
```
Note: For non-ASCII text (CJK, emoji, etc.), `key type` automatically uses clipboard paste via Ctrl+V.
Window
```bash
# List all windows (title + handle)
python scripts/desktop.py window list
# Activate (bring to front): case-insensitive title substring match
python scripts/desktop.py window activate Chrome
python scripts/desktop.py window activate 1234567  # by handle
# Minimize / maximize
python scripts/desktop.py window minimize Notepad
python scripts/desktop.py window maximize Code
# Close a window
python scripts/desktop.py window close Notepad
# Get window info (position, size, state)
python scripts/desktop.py window info Chrome
# Resize a window (width height, in pixels)
python scripts/desktop.py window resize Notepad 800 600
# Move a window (x y position)
python scripts/desktop.py window move Notepad 100 100
```
Clipboard
```bash
# Read clipboard content
python scripts/desktop.py clipboard get
# Write to the clipboard
python scripts/desktop.py clipboard set "copied text"
```
Screen
```bash
# Get the screen resolution
python scripts/desktop.py screen size
# Get the pixel color at position (x, y)
python scripts/desktop.py screen pixel 100 200
```
Pixel output: `{"ok": true, "x": 100, "y": 200, "r": 255, "g": 128, "b": 0, "hex": "#ff8000"}`
Wait
```bash
# Wait N seconds (useful in automation sequences)
python scripts/desktop.py wait 2.5
```
Version
```bash
python scripts/desktop.py --version
```
Scenarios
"Take a full screenshot and show me the desktop"
```
exec: .venv\Scripts\python.exe scripts\desktop.py screenshot
```
This returns JSON containing the screenshot path; display it with an image tool.
"Open Notepad and type some text"
```
exec: .venv\Scripts\python.exe scripts\desktop.py key hotkey win r
exec: .venv\Scripts\python.exe scripts\desktop.py wait 0.5
exec: .venv\Scripts\python.exe scripts\desktop.py key type notepad
exec: .venv\Scripts\python.exe scripts\desktop.py key press enter
exec: .venv\Scripts\python.exe scripts\desktop.py wait 1
exec: .venv\Scripts\python.exe scripts\desktop.py key type "Hello from desktop-control!"
```
"Maximize the Chrome window"
```
exec: .venv\Scripts\python.exe scripts\desktop.py window maximize Chrome
```
"Read clipboard content"
```
exec: .venv\Scripts\python.exe scripts\desktop.py clipboard get
```
"Move and resize a window"
```
exec: .venv\Scripts\python.exe scripts\desktop.py window move Notepad 0 0
exec: .venv\Scripts\python.exe scripts\desktop.py window resize Notepad 1024 768
```
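Safety
- Failsafe ON by default: move the mouse to the top-left corner (0,0) to abort any pyautogui operation.
- Use `--no-failsafe` to disable it (NOT recommended).
- All actions return structured JSON for an audit trail.
- Screenshots are saved locally only; no network requests are made.
- Captures directory: `captures/` (relative to the skill root).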
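Optional Dependencies
For advanced workflows, you may also install:
| Package | Use Case |
|---|---|
| `openpyxl` | Read/write Excel files |
| `python-docx` | Read/write Word documents |
| `pywin32` | Advanced Windows COM automation |
These are not installed by the default setup scripts. Install them manually if needed:
```powershell
.venv\Scripts\pip install openpyxl python-docx pywin32
```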
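Troubleshooting
| Problem | Solution |
|---|---|
| `pyautogui not installed` | Run `scripts/setup.ps1` (Windows) or `scripts/setup.sh` (Linux/Mac) |
| `Window not found` | Use `window list` to see available windows; matching is a case-insensitive substring match |
| Failed to activate | The window may be minimized. The script tries `restore()` first, but some apps resist |
| Screenshot is black | Common with GPU-accelerated apps; try capturing a region instead |
| `key type` garbled for CJK | Should automatically use clipboard paste; verify `pyperclip` is installed |
| FAILSAFE triggered | The mouse hit the (0,0) corner. This is an intentional safety stop; reposition the mouse and retry |
| Permission denied on Linux | pyautogui needs X11/Wayland access; run from a GUI session, not over SSH |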
Limitations
- Windows-primary: The full feature set (window management, all shortcuts) works on Windows. Linux/macOS have partial support via pyautogui, but py