E2B Desktop Skill
Control a headless Linux desktop (Ubuntu + XFCE) via the e2b-desktop Python SDK.
All scripts live in scripts/ and wrap the SDK in bash for easy agent use.
Prerequisites
```bash
pip install e2b-desktop
export E2B_API_KEY=e2b_*   # placeholder; substitute your own API key
```
State Management
- `start_sandbox.sh` saves the sandbox ID to `~/.e2b_state`
- All other scripts auto-load it from there
- Override anytime with `export E2B_SANDBOX_ID=` followed by the sandbox ID
- Sandboxes survive script exit; reconnect with `Sandbox.connect(sandbox_id)`
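The lookup order described above can be sketched as a small bash helper. This is a sketch under assumptions, not the scripts' actual code: the function name `resolve_sandbox_id`, the `E2B_STATE_FILE` override variable, and the idea that the state file holds the bare sandbox ID on its first line are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve the sandbox ID in the order the notes describe.
# Assumption: the state file contains only the sandbox ID on its first line.
resolve_sandbox_id() {
  # E2B_STATE_FILE is an illustrative override, not a documented variable.
  local state_file="${E2B_STATE_FILE:-$HOME/.e2b_state}"
  if [ -n "${E2B_SANDBOX_ID:-}" ]; then
    # An explicitly set E2B_SANDBOX_ID wins over the saved state.
    printf '%s\n' "$E2B_SANDBOX_ID"
  elif [ -r "$state_file" ]; then
    # Otherwise fall back to the ID saved by start_sandbox.sh.
    head -n 1 "$state_file"
  else
    echo "no sandbox ID: run start_sandbox.sh or export E2B_SANDBOX_ID" >&2
    return 1
  fi
}
```

Checking the environment variable first is what makes the override work without editing or deleting the state file.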
Scripts
| Script | Usage | Description |
|---|---|---|
| `start_sandbox.sh` | `[--resolution 1280x800] [--timeout 300] [--stream]` | Create sandbox; optionally start VNC stream |
| `kill_sandbox.sh` | `[SANDBOX_ID]` | Kill sandbox and remove state |
| `screenshot.sh` | `[OUTPUT_FILE]` | Take screenshot → PNG (default: `/tmp/e2b_screenshot.png`) |
| `click.sh` | `X Y` | Left click at coordinates |
| `right_click.sh` | `X Y` | Right click |
| `double_click.sh` | `X Y` | Double click |
| `middle_click.sh` | `X Y` | Middle click |
| `move_mouse.sh` | `X Y` | Move cursor (no click) |
| `drag.sh` | `X1 Y1 X2 Y2` | Click-drag between two points |
| `scroll.sh` | `AMOUNT` | Scroll (positive=up, negative=down) |
| `type_text.sh` | `"text"` | Type text at current cursor |
| `press_key.sh` | `KEY [KEY2...]` | Press key or combo (e.g. `ctrl c`) |
| `run_command.sh` | `"cmd"` | Run shell command inside sandbox |
| `open_url.sh` | `URL_OR_PATH` | Open URL or file in default app |
| `launch_app.sh` | `APP_NAME` | Launch app (e.g. `firefox`, `vscode`) |
| `stream_start.sh` | `[--auth]` | Start VNC stream; `--auth` for password-protected |
| `stream_stop.sh` | (none) | Stop VNC stream |
| `get_cursor.sh` | (none) | Print `CURSOR_X` and `CURSOR_Y` |
| `get_screen_size.sh` | (none) | Print `SCREEN_WIDTH` and `SCREEN_HEIGHT` |
| `list_windows.sh` | `[APP_NAME]` | List app windows or show active window |
| `wait.sh` | `MILLISECONDS` | Wait N ms (sandbox-side) |
Computer-Use Agent Loop Pattern
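```bash
SCRIPTS=skills/e2b-desktop/scripts

# 1. Start sandbox
source <($SCRIPTS/start_sandbox.sh --resolution 1280x800 --stream)
echo "Sandbox: $SANDBOX_ID"
echo "View at: $STREAM_URL"

# 2. Agent loop
while true; do
  # Capture the screen
  $SCRIPTS/screenshot.sh /tmp/screen.png

  # Send to the LLM and parse the action... (your code)
  ACTION=$(llm_decide /tmp/screen.png)

  case "$ACTION" in
    click:*) IFS=: read -r _ x y <<< "$ACTION"; $SCRIPTS/click.sh "$x" "$y" ;;
    type:*)  $SCRIPTS/type_text.sh "${ACTION#type:}" ;;
    # Left unquoted on purpose so "ctrl c" splits into two key arguments
    key:*)   $SCRIPTS/press_key.sh ${ACTION#key:} ;;
    done)    break ;;
  esac
done

# 3. Cleanup
$SCRIPTS/kill_sandbox.sh
```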
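To make the action strings used in the loop concrete, here is a dry-run sketch that only prints the script invocation for each action instead of touching a sandbox. The `click:X:Y`, `type:TEXT`, and `key:KEYS` formats come from the loop pattern; the function name `action_to_command` and the integer check on coordinates are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the script call for one action string.
action_to_command() {
  local action="$1" x y
  case "$action" in
    click:*)
      IFS=: read -r _ x y <<< "$action"
      # Reject missing, extra, or non-integer fields before they reach click.sh.
      [[ "$x" =~ ^[0-9]+$ && "$y" =~ ^[0-9]+$ ]] || return 1
      printf 'click.sh %s %s\n' "$x" "$y" ;;
    type:*)
      # %q shell-quotes the text so it would survive as a single argument.
      printf 'type_text.sh %q\n' "${action#type:}" ;;
    key:*)
      # Keys stay space-separated so "ctrl c" reads as two arguments.
      printf 'press_key.sh %s\n' "${action#key:}" ;;
    *)
      # Unknown actions are reported as failures rather than ignored.
      return 1 ;;
  esac
}
```

Printing commands first is a cheap way to log or review what a model asked for before wiring the same `case` branches to the real scripts.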
Key Notes
- `scroll.sh AMOUNT`: positive = scroll up, negative = scroll down (matches the `desktop.scroll(amount)` API)
- `press_key.sh ctrl c`: multiple args become a key combo via `desktop.press(["ctrl", "c"])`
- `run_command.sh` exits with the sandbox command's exit code
- All mouse coordinate scripts accept integer pixel coordinates matching sandbox resolution
- VNC stream: only one active stream at a time; stop before switching windows
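Because the mouse scripts expect integer pixel coordinates that match the sandbox resolution, an agent may want to check model-proposed coordinates before clicking. A minimal sketch, assuming a hypothetical helper `in_bounds` and that valid pixels run from 0 to width-1 and 0 to height-1 (the exact valid range is not stated here):

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if (x, y) is an integer pixel inside
# a width x height screen. The 0..width-1 / 0..height-1 range is an assumption.
in_bounds() {
  local x="$1" y="$2" width="$3" height="$4" v
  for v in "$x" "$y" "$width" "$height"; do
    # Only plain non-negative integers are accepted (no signs, no decimals).
    [[ "$v" =~ ^[0-9]+$ ]] || return 1
  done
  # 10# forces base 10 so values with leading zeros are not read as octal.
  (( 10#$x < 10#$width && 10#$y < 10#$height ))
}
```

In practice the width and height would come from `get_screen_size.sh`, which prints `SCREEN_WIDTH` and `SCREEN_HEIGHT`.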