ADB Phone Control
Control Android devices through ADB with a structured observe-locate-act-verify loop.
Requirements
- `adb`: Android Debug Bridge, must be in PATH
- `python3`: required for `appexplorer.py`
- `ADB_OUTPUT_DIR` (optional env var): directory for saving screenshots and UI dumps; defaults to the current working directory
Permissions Used
This skill executes the following on the connected Android device:
- `adb shell input`: tap, swipe, text input
- `adb shell uiautomator dump`: UI hierarchy extraction
- `adb shell screencap`: screen capture
- `adb shell am broadcast`: ADBKeyboard IME input (for CJK text)
- `adb shell service call clipboard`: clipboard-based text input fallback
Prerequisites
Before any operation, verify device connection:
```bash
adb devices
```
If no device is found, instruct the user to:
1. Connect via USB and enable USB Debugging
2. Or connect wirelessly: `adb connect <device-ip>:5555`
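The connection check can be automated rather than read by eye. The following is a minimal sketch, assuming a hypothetical helper name `adb_require_device` that is not part of the helper script; it relies only on the standard `adb devices` output format (a header line, then one `serial<TAB>state` line per device).

```bash
# Hypothetical helper (not from adb-helpers.sh): succeed only if at least
# one attached device is in the usable "device" state.
adb_require_device() {
    # Skip the header line; states such as "unauthorized" or "offline"
    # do not count as a usable connection.
    if adb devices | awk 'NR > 1 && $2 == "device" { found = 1 } END { exit !found }'; then
        return 0
    fi
    echo "No usable device: connect via USB with USB Debugging enabled, or connect wirelessly" >&2
    return 1
}
```

Calling `adb_require_device || exit 1` at the top of a session stops early instead of failing on the first tap.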
Core Principle
NEVER guess coordinates from screenshots. ALWAYS use UI hierarchy as the primary locator.
Screenshots are for human-readable context and visual verification. UI dumps give exact pixel bounds.
Operation Loop
Every interaction follows this cycle:
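```
┌─────────────────────────────────────────┐
│ 1. OBSERVE: dump UI + screenshot        │
│ 2. LOCATE: find element by text/ID      │
│ 3. ACT: tap / swipe / type              │
│ 4. VERIFY: screenshot + dump again      │
│ 5. REPEAT: next step or done            │
└─────────────────────────────────────────┘
```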
Do NOT skip the VERIFY step. UI transitions may take time; always confirm before proceeding.
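One way to make the VERIFY step hard to skip is to wrap each action in a function that only succeeds when the expected text shows up afterwards. This is a sketch under assumptions: `adb_act_and_verify` and the `UI_DUMP_FILE` variable are hypothetical names, and the dump location is assumed to be the file that `adb_dump` writes.

```bash
# Assumed location of the latest UI dump; adjust to wherever adb_dump writes.
UI_DUMP_FILE="${UI_DUMP_FILE:-/tmp/uidump.xml}"

# Hypothetical wrapper: adb_act_and_verify <expected text> <action> [args...]
# Relies on adb_observe and adb_wait from the helper script.
adb_act_and_verify() {
    expected="$1"; shift
    adb_observe || return 1      # 1. OBSERVE the current state
    "$@" || return 1             # 2-3. LOCATE + ACT, e.g. adb_tap_text "Submit"
    adb_wait 1                   # give the transition time to finish
    adb_observe || return 1      # 4. VERIFY against a fresh dump
    grep -qF "text=\"$expected\"" "$UI_DUMP_FILE"
}
```

For example, `adb_act_and_verify "Order placed" adb_tap_text "Submit"` returns non-zero if the confirmation text never appears, so the caller can retry or stop.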
Helper Functions
Source the helper script before starting any operation session:
```bash
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh
```
Available Functions
| Function | Usage | Description |
|---|---|---|
| `adb_dump` | `adb_dump` | Dump UI hierarchy to `/tmp/uidump.xml` |
| `adb_screenshot` | `adb_screenshot` | Capture screen to `/tmp/adb_screen.png` |
| `adb_observe` | `adb_observe` | Dump UI + screenshot in one call |
| `adb_tap_text` | `adb_tap_text "Submit"` | Find element by text, tap center |
| `adb_tap_id` | `adb_tap_id "btn_send"` | Find element by resource-id, tap center |
| `adb_tap_xy` | `adb_tap_xy 540 1200` | Tap exact coordinates |
| `adb_swipe` | `adb_swipe x1 y1 x2 y2 [ms]` | Swipe between points (default 300 ms) |
| `adb_input_text` | `adb_input_text "hello"` | Type text (supports spaces and CJK) |
| `adb_key` | `adb_key <keycode>` | Send keyevent (BACK, HOME, ENTER, etc.) |
| `adb_hide_keyboard` | `adb_hide_keyboard` | Press BACK to dismiss keyboard |
| `adb_scroll_down` | `adb_scroll_down` | Swipe up to scroll content down |
| `adb_scroll_up` | `adb_scroll_up` | Swipe down to scroll content up |
| `adb_long_press` | `adb_long_press x y [ms]` | Long press at coordinates (default 1000 ms) |
| `adb_wait` | `adb_wait [seconds]` | Sleep before next action (default 1 s) |
| `adb_screen_size` | `adb_screen_size` | Get device screen resolution |
| `adb_launch_app` | `adb_launch_app <package>` | Launch app by package name |
| `adb_find_package` | `adb_find_package <keyword>` | Search installed packages by keyword |
| `adb_bounds_center` | `adb_bounds_center "[x1,y1][x2,y2]"` | Parse `[x1,y1][x2,y2]` into center x y |
Element Lookup Details
`adb_tap_text` and `adb_tap_id` work by:
1. Running `adb_dump` to get a fresh UI hierarchy
2. Parsing the XML for matching `text=` or `resource-id=` attributes
3. Extracting the `bounds="[x1,y1][x2,y2]"` attribute
4. Computing the center point: `((x1+x2)/2, (y1+y2)/2)`
5. Executing `adb shell input tap` with the center coordinates
If multiple matches are found, the function taps the first match and prints a warning.
If no match is found, the function prints an error. Fall back to `adb_screenshot` plus the Read tool for visual inspection.
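The lookup steps above can be sketched with standard text tools. This is an illustration under assumptions, not the helper script's actual code: `bounds_center`, `find_bounds_by_text`, and `tap_text_sketch` are hypothetical names, and the sketch assumes the dump has already been pulled to a local file. Because `uiautomator dump` usually writes the whole hierarchy on one line, the sketch first splits it into one tag per line.

```bash
# Parse "[x1,y1][x2,y2]" and print the integer center as "cx cy".
bounds_center() {
    # Replace every non-digit with a space, leaving four numbers.
    set -- $(printf '%s' "$1" | sed 's/[^0-9]/ /g')
    [ "$#" -eq 4 ] || return 1
    echo "$(( ($1 + $3) / 2 )) $(( ($2 + $4) / 2 ))"
}

# Print the bounds of every node in dump file $1 whose text equals $2.
find_bounds_by_text() {
    tr '>' '\n' < "$1" \
        | grep -F " text=\"$2\"" \
        | sed -n 's/.*bounds="\(\[[^"]*\]\)".*/\1/p'
}

# Tap the first match; warn on several matches, fail on none.
tap_text_sketch() {
    matches="$(find_bounds_by_text "$1" "$2")"
    [ -n "$matches" ] || { echo "ERROR: no element with text '$2'" >&2; return 1; }
    [ "$(printf '%s\n' "$matches" | wc -l)" -gt 1 ] \
        && echo "WARNING: multiple matches, tapping the first" >&2
    center="$(bounds_center "$(printf '%s\n' "$matches" | head -n 1)")" || return 1
    adb shell input tap $center   # intentionally unquoted: expands to "cx cy"
}
```

Integer division is deliberate here: `input tap` takes whole pixels, and being off by half a pixel does not matter for a tap target.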
Standard Operating Procedure
Phase 1: Setup
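```bash
# Load helper functions
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh

# Verify connection
adb devices

# Get screen resolution (important for swipe calculations)
adb_screen_size
```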
Phase 2: Navigate & Operate
For each interaction step:
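```bash
# 1. Observe the current state
adb_observe
# Then use the Read tool to view /tmp/adb_screen.png

# 2. Locate and act (prefer text/ID over raw coordinates)
adb_tap_text "Create"
# OR: adb_tap_id "iv_send"
# OR as a last resort: adb_tap_xy 540 2009

# 3. Wait for the transition
adb_wait 2

# 4. Verify the result
adb_screenshot
# Then read /tmp/adb_screen.png to confirm the action took effect
```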
Phase 3: Text Input
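```bash
# Tap the input field first
adb_tap_text "Search..."
adb_wait 1

# Type the text
adb_input_text "Hello World"

# Dismiss the keyboard before tapping other elements
adb_hide_keyboard
adb_wait 1

# Now it is safe to tap other buttons
adb_tap_text "Send"
```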
Critical Rules
1. UI Dump First, Screenshot Second
- `uiautomator dump` gives exact bounds, element states (enabled/focused/clickable), text content, and resource IDs
- Use screenshots only for visual verification, understanding layout context, or when the UI dump fails (e.g., animations, WebView content)
- When the UI dump returns elements with `NAF="true"`, the element is flagged Not Accessibility Friendly (an enabled, clickable control with no text or content description); use screenshot + coordinates as a fallback
2. Keyboard Awareness
- Always hide the keyboard before tapping non-input elements. The keyboard shifts the layout, making UI dump bounds stale.
- After typing, call `adb_hide_keyboard` then `adb_dump` before tapping anything else.
- If `uiautomator dump` returns `ERROR: could not get idle state`, the keyboard animation may still be running; wait 1s and retry.
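The wait-and-retry advice for the idle-state error can be expressed as a small loop. A minimal sketch, assuming a hypothetical function name `dump_with_retry` and a hypothetical `RETRY_DELAY` variable; `/sdcard/window_dump.xml` is the usual default output path of `uiautomator dump` and is written out here only for clarity.

```bash
# Hypothetical helper: retry the dump while the UI is still animating.
# Usage: dump_with_retry [max_attempts]   (default 3)
dump_with_retry() {
    attempts="${1:-3}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        out="$(adb shell uiautomator dump /sdcard/window_dump.xml 2>&1)"
        case "$out" in
            *"could not get idle state"*)
                # Keyboard or transition animation still running: wait, then retry.
                sleep "${RETRY_DELAY:-1}" ;;
            *)
                printf '%s\n' "$out"
                return 0 ;;
        esac
        i=$((i + 1))
    done
    echo "uiautomator dump still not idle after $attempts attempts" >&2
    return 1
}
```

If every attempt fails, the screen is probably animating continuously (a video or spinner), and the screenshot fallback is the better route.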
3. Wait Strategy
- After tap: wait 1s before next dump/screenshot
- After launching app: wait 2-3s
- After page navigation: wait 2s
- After typing: wait 0.5s
- If UI hasn't changed after action: wait longer, up to 5s, then re-check
- Never blindly chain actions without verification
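Instead of a single fixed sleep, the "wait longer, up to 5s, then re-check" rule can be written as a poll that stops as soon as the hierarchy changes. This is a sketch under assumptions: `wait_for_ui_change`, `UI_DUMP_FILE`, and `POLL_INTERVAL` are hypothetical names, and `adb_dump` is assumed to refresh the file at `UI_DUMP_FILE`.

```bash
# Assumed location of the latest UI dump; adjust to wherever adb_dump writes.
UI_DUMP_FILE="${UI_DUMP_FILE:-/tmp/uidump.xml}"

# Hypothetical helper: wait_for_ui_change <snapshot file> [max polls]
# Returns 0 once the fresh dump differs from the snapshot, 1 on timeout.
# With the default 1-second interval, max polls equals seconds (default 5).
wait_for_ui_change() {
    before="$1"; timeout="${2:-5}"; waited=0
    while [ "$waited" -lt "$timeout" ]; do
        sleep "${POLL_INTERVAL:-1}"
        waited=$((waited + 1))
        adb_dump >/dev/null 2>&1 || continue   # dump can fail mid-animation
        if ! cmp -s "$before" "$UI_DUMP_FILE"; then
            return 0
        fi
    done
    return 1
}
```

A typical use would be to copy the dump aside before acting (`cp "$UI_DUMP_FILE" /tmp/before.xml`), perform the tap, then call `wait_for_ui_change /tmp/before.xml 5` and only continue when it succeeds.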
4. Chinese / CJK Text Input
`adb shell input text` does not support CJK characters natively. The helper `adb_input_text` handles this by:
- Using `adb shell am broadcast` with ADBKeyboard if available
- Falling back to clipboard-based input: copy to the clipboard via `adb shell service call clipboard`, then paste
If the ADB Keyboard IME is installed (`com.android.adbkeyboard`), enable it and make it the active input method:
```bash
adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
```
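The choice between plain `input text` and the ADBKeyboard broadcast might look like the sketch below. It is not the helper's actual implementation: `input_text_sketch` is a hypothetical name, the `ADB_INPUT_TEXT` action with its `msg` extra is assumed from the ADBKeyBoard project's documentation, and the clipboard fallback is left out.

```bash
# Hypothetical sketch of routing text to the right input method.
input_text_sketch() {
    text="$1"
    if printf '%s' "$text" | LC_ALL=C grep -q '[^ -~]'; then
        # Contains non-ASCII bytes (for example CJK): needs ADBKeyboard,
        # which must be listed as enabled and selected via `ime set`.
        if adb shell ime list -s | grep -q 'com.android.adbkeyboard/.AdbIME'; then
            # Single quotes protect the text from the device-side shell;
            # text that itself contains a single quote is not handled here.
            adb shell am broadcast -a ADB_INPUT_TEXT --es msg "'$text'"
        else
            echo "ADBKeyboard is not enabled; use the clipboard fallback" >&2
            return 1
        fi
    else
        # `input text` interprets %s as a space, so encode spaces that way.
        adb shell input text "$(printf '%s' "$text" | sed 's/ /%s/g')"
    fi
}
```

Checking for non-ASCII bytes first keeps the common ASCII case on the built-in `input text` path, which works without any extra IME installed.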
5. Coordinate System
- All coordinates are in physical pixels matching the device resolution
- `adb shell wm size` returns the canonical resolution (e.g., 1080x2340)
- Screenshot pixel dimensions may differ from the device resolution, so never estimate coordinates from screenshot pixel positions
- Always derive coordinates from `uiautomator dump` bounds
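Swipe gestures are the one place where coordinates are computed rather than read from bounds, so they should be derived from the reported resolution instead of hard-coded. A minimal sketch, assuming hypothetical function names `screen_size_sketch` and `scroll_down_coords`; when `wm size` also prints an override line, the sketch keeps the last size reported, on the assumption that this is the one currently in effect.

```bash
# Print the screen size as "width height", parsed from lines such as
# "Physical size: 1080x2340".
screen_size_sketch() {
    adb shell wm size \
        | tr -d '\r' \
        | sed -n 's/.*: *\([0-9]*\)x\([0-9]*\)$/\1 \2/p' \
        | tail -n 1
}

# Print "x1 y1 x2 y2" for a swipe up the vertical center line,
# from 75% to 25% of the screen height (scrolls content down).
scroll_down_coords() {
    w="$1"; h="$2"
    echo "$((w / 2)) $((h * 3 / 4)) $((w / 2)) $((h / 4))"
}
```

On a 1080x2340 screen this gives a swipe from (540, 1755) to (540, 585), which could be passed to `adb_swipe`.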
6. Handling Failures
If an action doesn't produce the expected result:
1. Re-dump the UI hierarchy: the element may have moved or changed state
2. Take a screenshot: visual context may reveal popups, loading states, or errors
3. Check that the element is `enabled="true"` and `clickable="true"` before tapping
4. If the element is not found by text, try a partial match or search by resource-id
5. If the app is in a WebView, the UI dump may not capture web elements; use screenshot + coordinate estimation as a fallback
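Two of these checks, the enabled/clickable test and the partial text match, are easy to script against a dump file. The function names below (`node_is_tappable`, `find_nodes_by_partial_text`) are hypothetical and serve only as a sketch; the second argument of the partial match is treated as a basic regular expression.

```bash
# Succeed only if a node's tag text says it is both enabled and clickable.
# The leading space keeps ' clickable=' from matching 'long-clickable='.
node_is_tappable() {
    case "$1" in
        *' enabled="true"'*) ;;
        *) return 1 ;;
    esac
    case "$1" in
        *' clickable="true"'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print every node tag in dump file $1 whose text attribute contains $2.
find_nodes_by_partial_text() {
    tr '>' '\n' < "$1" | grep " text=\"[^\"]*$2"
}
```

When a matching label is not itself clickable, the tappable element is often an ancestor node, so it is worth inspecting the surrounding nodes before falling back to raw coordinates.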
7. App Launch
Prefer `adb_find_package` + `adb_launch_app` over the raw `monkey` command:
CODEBLOCK7
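For orientation, a package search can be built on `pm list packages`, which prints one `package:<name>` line per installed app. This is a sketch with a hypothetical name, `find_package_sketch`, and may differ from how `adb_find_package` is actually implemented.

```bash
# Hypothetical sketch: list installed package names containing a keyword
# (case-insensitive), one per line, without the "package:" prefix.
find_package_sketch() {
    adb shell pm list packages \
        | tr -d '\r' \
        | sed -n 's/^package://p' \
        | grep -i -- "$1"
}
```

If several packages match, review the list and pass the exact package name to `adb_launch_app` rather than taking the first hit blindly.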
Limitations
- `uiautomator dump` doesn't work during animations; wait for an idle state
- WebView/Flutter/game content may not appear in the UI hierarchy; use a screenshot-based approach
- Some custom views may have empty text and no resource-id; cross-reference bounds with a screenshot
- Cap each task at roughly 100 actions to avoid infinite loops
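The action cap can be enforced with a counter that every action passes through. A minimal sketch, assuming hypothetical names `MAX_ACTIONS`, `ACTION_COUNT`, and `count_action`; the default of 100 follows the limit suggested above.

```bash
# Hypothetical action budget guard.
MAX_ACTIONS="${MAX_ACTIONS:-100}"
ACTION_COUNT=0

# Call once before each action; fails when the budget is used up.
# Must run in the current shell (not in a subshell) so the count persists.
count_action() {
    ACTION_COUNT=$((ACTION_COUNT + 1))
    if [ "$ACTION_COUNT" -gt "$MAX_ACTIONS" ]; then
        echo "Action budget of $MAX_ACTIONS exceeded; stopping" >&2
        return 1
    fi
}
```

A loop would then use it as `count_action && adb_tap_text "Next"`, so a stuck screen ends the task instead of tapping forever.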
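ADB Phone Control
Control Android devices through ADB with a structured observe-locate-act-verify loop.
Requirements
- `adb`: Android Debug Bridge, must be in the PATH environment variable
- `python3`: required for `appexplorer.py`
- `ADB_OUTPUT_DIR` (optional environment variable): directory for saving screenshots and UI dump files; defaults to the current working directory
Permissions Used
This skill executes the following on the connected Android device:
- `adb shell input`: tap, swipe, text input
- `adb shell uiautomator dump`: UI hierarchy extraction
- `adb shell screencap`: screen capture
- `adb shell am broadcast`: ADBKeyboard IME input (for CJK text)
- `adb shell service call clipboard`: clipboard-based text input fallback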
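Prerequisites
Before any operation, verify the device connection:
```bash
adb devices
```
If no device is found, instruct the user to:
1. Connect via USB and enable USB Debugging
2. Or connect wirelessly: `adb connect <device-ip>:5555`
Core Principle
NEVER guess coordinates from screenshots. ALWAYS use the UI hierarchy as the primary locator.
Screenshots are only for human-readable context and visual verification. UI dumps give exact pixel bounds.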
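Operation Loop
Every interaction follows this cycle:
```
┌─────────────────────────────────────────┐
│ 1. OBSERVE: dump UI + screenshot        │
│ 2. LOCATE: find element by text/ID      │
│ 3. ACT: tap / swipe / type              │
│ 4. VERIFY: screenshot + dump again      │
│ 5. REPEAT: next step or done            │
└─────────────────────────────────────────┘
```
Do NOT skip the VERIFY step. UI transitions may take time; always confirm before proceeding.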
Helper Functions
Source the helper script before starting any operation session:
```bash
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh
```
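Available Functions
| Function | Usage | Description |
|---|---|---|
| `adb_dump` | `adb_dump` | Dump UI hierarchy to `/tmp/uidump.xml` |
| `adb_screenshot` | `adb_screenshot` | Capture screen to `/tmp/adb_screen.png` |
| `adb_observe` | `adb_observe` | Dump UI + screenshot in one call |
| `adb_tap_text` | `adb_tap_text "Submit"` | Find element by text, tap center |
| `adb_tap_id` | `adb_tap_id "btn_send"` | Find element by resource-id, tap center |
| `adb_tap_xy` | `adb_tap_xy 540 1200` | Tap exact coordinates |
| `adb_swipe` | `adb_swipe x1 y1 x2 y2 [ms]` | Swipe between points (default 300 ms) |
| `adb_input_text` | `adb_input_text "hello"` | Type text (supports spaces and CJK) |
| `adb_key` | `adb_key <keycode>` | Send keyevent (BACK, HOME, ENTER, etc.) |
| `adb_hide_keyboard` | `adb_hide_keyboard` | Press BACK to dismiss keyboard |
| `adb_scroll_down` | `adb_scroll_down` | Swipe up to scroll content down |
| `adb_scroll_up` | `adb_scroll_up` | Swipe down to scroll content up |
| `adb_long_press` | `adb_long_press x y [ms]` | Long press at coordinates (default 1000 ms) |
| `adb_wait` | `adb_wait [seconds]` | Sleep before next action (default 1 s) |
| `adb_screen_size` | `adb_screen_size` | Get device screen resolution |
| `adb_launch_app` | `adb_launch_app <package>` | Launch app by package name |
| `adb_find_package` | `adb_find_package <keyword>` | Search installed packages by keyword |
| `adb_bounds_center` | `adb_bounds_center "[x1,y1][x2,y2]"` | Parse `[x1,y1][x2,y2]` into center x y |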
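Element Lookup Details
`adb_tap_text` and `adb_tap_id` work by:
1. Running `adb_dump` to get a fresh UI hierarchy
2. Parsing the XML for matching `text=` or `resource-id=` attributes
3. Extracting the `bounds="[x1,y1][x2,y2]"` attribute
4. Computing the center point: `((x1+x2)/2, (y1+y2)/2)`
5. Executing `adb shell input tap` with the center coordinates
If multiple matches are found, the function taps the first match and prints a warning.
If no match is found, the function prints an error. Fall back to `adb_screenshot` plus the Read tool for visual inspection.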
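Standard Operating Procedure
Phase 1: Setup
```bash
# Load helper functions
source "$(dirname "${BASH_SOURCE[0]:-$0}")/adb-helpers.sh" 2>/dev/null || source ./adb-helpers.sh

# Verify connection
adb devices

# Get screen resolution (important for swipe calculations)
adb_screen_size
```
Phase 2: Navigate & Operate
For each interaction step:
```bash
# 1. Observe the current state
adb_observe
# Then use the Read tool to view /tmp/adb_screen.png

# 2. Locate and act (prefer text/ID over raw coordinates)
adb_tap_text "Create"
# OR: adb_tap_id "iv_send"
# OR as a last resort: adb_tap_xy 540 2009

# 3. Wait for the transition
adb_wait 2

# 4. Verify the result
adb_screenshot
# Then read /tmp/adb_screen.png to confirm the action took effect
```
Phase 3: Text Input
```bash
# Tap the input field first
adb_tap_text "Search..."
adb_wait 1

# Type the text
adb_input_text "Hello World"

# Dismiss the keyboard before tapping other elements
adb_hide_keyboard
adb_wait 1

# Now it is safe to tap other buttons
adb_tap_text "Send"
```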
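Critical Rules
1. UI Dump First, Screenshot Second
- `uiautomator dump` gives exact bounds, element states (enabled/focused/clickable), text content, and resource IDs
- Use screenshots only for visual verification, understanding layout context, or when the UI dump fails (e.g., animations, WebView content)
- When the UI dump returns elements with `NAF="true"`, the element is flagged Not Accessibility Friendly (an enabled, clickable control with no text or content description); use screenshot + coordinates as a fallback
2. Keyboard Awareness
- Always hide the keyboard before tapping non-input elements. The keyboard shifts the layout, making UI dump bounds stale.
- After typing, call `adb_hide_keyboard` then `adb_dump` before tapping anything else.
- If `uiautomator dump` returns `ERROR: could not get idle state`, the keyboard animation may still be running; wait 1s and retry.
3. Wait Strategy
- After tap: wait 1s before next dump/screenshot
- After launching app: wait 2-3s
- After page navigation: wait 2s
- After typing: wait 0.5s
- If UI hasn't changed after action: wait longer, up to 5s, then re-check
- Never blindly chain actions without verification
4. Chinese / CJK Text Input
`adb shell input text` does not support CJK characters natively. The helper `adb_input_text` handles this by:
- Using `adb shell am broadcast` with ADBKeyboard if available
- Falling back to clipboard-based input: copy to the clipboard via `adb shell service call clipboard`, then paste
If the ADB Keyboard IME is installed (`com.android.adbkeyboard`), enable it and make it the active input method:
```bash
adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
```
5. Coordinate System
- All coordinates are in physical pixels matching the device resolution
- `adb shell wm size` returns the canonical resolution (e.g., 1080x2340)
- Screenshot pixel dimensions may differ from the device resolution, so never estimate coordinates from screenshot pixel positions
- Always derive coordinates from `uiautomator dump` bounds
6. Handling Failures
If an action doesn't produce the expected result:
1. Re-dump the UI hierarchy: the element may have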