Ghosthand

Ghosthand is a loopback HTTP server running on the Android phone. All interaction happens over HTTP GET, POST, and occasionally DELETE requests to http://127.0.0.1:5583.

Always do this first:

| Step | Command | Purpose |
|------|---------|---------|
| 1 | GET /ping | Is Ghosthand alive? |
| 2 | | |

Use this skill to operate Ghosthand as an Android agent substrate.

Ghosthand is not generic Android advice. It is a local runtime with a route-based control surface. Use this skill only when the task is actually about Ghosthand routes, Ghosthand capability state, or acting through Ghosthand.

What Ghosthand is

Ghosthand exposes a local HTTP API for Android observation and control. The important categories are:

- runtime and health: /ping, /health, /state, /device, /foreground, /commands, /capabilities
- structured UI inspection: /screen, /tree, /focused, /find
- semantic or coordinate interaction: /click, /tap, /input, /type, /setText, /scroll, /swipe, /longpress, /gesture
- app and navigation control: /back, /home, /recents
- sensing and transport: /screenshot, /wait, /clipboard, /notify

Treat /commands as the current machine-readable capability catalog when route details matter.

When to use this skill

Use it when the task requires any of the following:

- checking whether Ghosthand is running or ready
- checking whether a capability is both authorized by Android and allowed by Ghosthand policy
- inspecting the current Android surface before acting
- finding or clicking UI targets by text, desc, or id
- recovering from Ghosthand misses or ambiguous action results
- using Ghosthand to type, scroll, swipe, wait, read clipboard, or read notifications
- debugging Ghosthand-specific behaviors such as partial output, stale assumptions about selectors, or snapshot-scoped node IDs

Do not use it for:

- generic Android usage advice unrelated to Ghosthand
- root-only methods that Ghosthand does not expose
- imaginary routes or undocumented behavior when /commands can answer directly

Operating model

1. Start from truth, not intent

Before acting, establish three things:

1. Is Ghosthand alive and usable?
2. What surface is actually visible now?
3. Which selector surface and route shape are most plausible for the target?

Typical order:

1. read /ping
2. read /state
3. read /commands if route shape, selector support, or response fields are uncertain
4. read /screen?source=accessibility for the current actionable surface
5. if the accessibility read is unavailable or clearly insufficient, retry with /screen?source=hybrid or /screen?source=ocr
6. only then choose /find, /click, or /tap
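
The read-before-act order above can be sketched as a small planner. This is a minimal sketch, not Ghosthand's own client: `plan_first_reads` and `get_json` are hypothetical helper names, and the sketch assumes that routes answer with JSON, which this document does not state.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5583"  # loopback address given in the introduction


def plan_first_reads(route_details_uncertain: bool) -> list[str]:
    """Return the read-only routes to call, in order, before any action."""
    routes = ["/ping", "/state"]
    if route_details_uncertain:
        # /commands is only needed when route shape or fields are in doubt.
        routes.append("/commands")
    routes.append("/screen?source=accessibility")
    return routes


def get_json(path: str, timeout: float = 5.0) -> dict:
    """GET a Ghosthand route. Assumes (hypothetically) a JSON response body."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

On a device, something like `[get_json(p) for p in plan_first_reads(True)]` would run the reads in order; the hybrid or OCR retry is left to the caller because it depends on what the accessibility read returns.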

2. Capability access has two layers

A capability is usable only when both are true:

- Android/system authorization exists
- Ghosthand policy allows the capability

Do not confuse “permission granted” with “usable now”. Read /state before diagnosing failures, especially for accessibility and screenshot capture.

/state is the best live summary. /capabilities is the fuller catalog-style view when an agent needs route-capability mapping and availability details.
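
The two-layer rule can be sketched as a tiny classifier. The field names `system_authorized` and `policy_allowed` are hypothetical; map them from whatever the live state summary actually reports for a capability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityStatus:
    # Both field names are hypothetical, not Ghosthand response fields.
    system_authorized: bool
    policy_allowed: bool


def diagnose(cap: CapabilityStatus) -> str:
    """Classify a capability by which layer, if any, is blocking it."""
    if cap.system_authorized and cap.policy_allowed:
        return "usable"
    if not cap.system_authorized and not cap.policy_allowed:
        return "blocked: system authorization and Ghosthand policy"
    if not cap.system_authorized:
        return "blocked: system authorization"
    return "blocked: Ghosthand policy"
```

Naming the blocking layer keeps a failure report narrow: a granted Android permission with a denying policy is a different fix from a missing permission.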

3. Node IDs are snapshot-scoped

Treat nodeId as ephemeral. Do not cache it across fresh observations unless the snapshot context is clearly the same. Prefer re-resolving via /screen, /find, or selector-based /click instead of assuming old node IDs remain valid.
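
One way to enforce this on the agent side is to store every node ID together with a marker for the observation it came from. This is a sketch under an assumption: the `snapshot` token is hypothetical bookkeeping (for example, a counter the agent increments on every fresh read), not a field Ghosthand is known to return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeRef:
    node_id: str
    snapshot: str  # hypothetical token identifying the observation it came from


def reusable_node_id(ref: NodeRef, current_snapshot: str) -> Optional[str]:
    """Return the node ID only if it belongs to the current snapshot."""
    return ref.node_id if ref.snapshot == current_snapshot else None
```

A `None` result is the signal to re-resolve the target through a fresh read or a selector rather than reuse the stale ID.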

Primitive selection

`/screen`

Use /screen first when you need a compact actionable view. The default mode is source=accessibility.

Use it to answer:

- what is visible now
- which elements are actionable, editable, or scrollable
- whether coordinates are trustworthy enough for /tap
- whether the current surface even contains the target

Important details:

- source=accessibility is the default and supports editable, scrollable, clickable, and package filters
- source=hybrid or source=ocr is useful when accessibility is temporarily unavailable or operationally insufficient
- summaryOnly=true is for compact orientation, not detailed targeting
- previewPath is a hint that a lightweight screenshot fetch is available; /screen does not embed image bytes

If /screen reports partialOutput=true, warnings, foreground drift, or fallback hints, do not assume you saw the whole surface. Escalate to /tree, /screenshot, or a non-accessibility screen mode before blaming the app.
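
The escalation check can be sketched as a single predicate over a parsed /screen response. `partialOutput` and `warnings` are named in the text above; `foregroundDrift` and `fallbackHints` are hypothetical key names standing in for the drift and fallback signals, so confirm the real field names against /commands.

```python
def should_escalate(screen: dict) -> bool:
    """True when a /screen response should not be trusted as the whole surface."""
    # partialOutput and warnings come from the text; the other two keys are
    # hypothetical stand-ins for "foreground drift" and "fallback hints".
    return bool(
        screen.get("partialOutput")
        or screen.get("warnings")
        or screen.get("foregroundDrift")
        or screen.get("fallbackHints")
    )
```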

`/tree`

Use /tree when you need fuller structure, raw hierarchy, or to inspect why /screen may have omitted or shaped output. Use it for diagnosis and structural truth, not as your default first read.

`/find`

Use /find when you already have a selector hypothesis and want a bounded lookup.

Prefer it when you need:

- selector testing before interaction
- disambiguation by index
- confirmation that a target exists before a coordinate fallback
- inspection of whether a visible label is discoverable on text, contentDesc, resourceId, or only as a focused node

A miss usually means one of four things:

- wrong screen
- wrong selector surface
- wrong match semantics
- target not exposed the way you assumed

Supported strategies are text, textContains, contentDesc, contentDescContains, resourceId, and focused. text, desc, and id are convenience aliases in the request body; Ghosthand normalizes them internally.
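
The alias handling can be pictured as follows. This is an illustration of the idea only, not Ghosthand's internal code: the alias-to-strategy mapping is inferred from the names above, and the `strategy`/`query` output shape is an assumption.

```python
# Mapping inferred from the alias and strategy names in the text.
ALIAS_TO_STRATEGY = {"text": "text", "desc": "contentDesc", "id": "resourceId"}


def normalize_selector(body: dict) -> dict:
    """Turn an alias-style body such as {"desc": "Back"} into strategy/query form."""
    present = [alias for alias in ALIAS_TO_STRATEGY if alias in body]
    if len(present) != 1:
        raise ValueError("expected exactly one of: text, desc, id")
    alias = present[0]
    # The strategy/query key names are hypothetical.
    return {"strategy": ALIAS_TO_STRATEGY[alias], "query": body[alias]}
```

Thinking of `desc` as `contentDesc` and `id` as `resourceId` makes it easier to reason about which selector surface a miss actually tested.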

`/click`

Prefer /click over /tap when you have a plausible semantic target. Ghosthand can resolve wrapper targets, bounded selector fallbacks, and clickable ancestors, then expose how it actually landed on an actionable node.

Use /click first for:

- text-labeled controls
- content-description labeled controls
- stable resource IDs
- cases where ancestor click resolution may help

For selector-based /click, Ghosthand treats clickable=true as the default unless you explicitly set clickable=false. That default is optimized for action, not inspection. Use /find or disable clickable resolution when you need to inspect the raw matched node.
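
A request builder makes the default explicit. This is a minimal sketch assuming a JSON request body that carries one selector alias plus an optional `clickable` flag; `build_click_body` and its `resolve_clickable` parameter are hypothetical names.

```python
import json


def build_click_body(alias: str, value: str, resolve_clickable: bool = True) -> bytes:
    """Build a selector-based /click body; omit clickable to accept the default."""
    if alias not in ("text", "desc", "id"):
        raise ValueError("alias must be one of: text, desc, id")
    body = {alias: value}
    if not resolve_clickable:
        # Only sent when the raw matched node is wanted instead of an ancestor.
        body["clickable"] = False
    return json.dumps(body).encode("utf-8")
```

Leaving the flag out in the common case keeps the request aligned with the action-optimized default described above.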

`/tap`

Use /tap only when coordinates come from the current trusted surface. Do not guess coordinates. Coordinate fallback is justified only after semantic targeting has narrowed the uncertainty.

`/input` and `/setText`

Use /input for the focused editable field. Prefer it over /type when you need explicit text mutation or Enter dispatch semantics.

Use /type only for simpler focused text entry when the current focus is already correct.

Use /setText only when you have a trusted same-snapshot editable nodeId and need to target that exact node.

When entering text, do not assume the Enter key will successfully submit or confirm the input. If Enter does not work or the field remains uncommitted, use the on-screen IME confirmation action instead, typically the confirm button in the bottom-right corner of the keyboard.

`/scroll` and `/swipe`

Use /scroll when the goal is container movement or list advancement.

Use /swipe when the task is truly geometric.

Do not interpret performed=true as proof that content changed. Check returned change fields, then verify with /screen, /tree, or /wait.
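
The distinction between "gesture dispatched" and "content changed" can be sketched by comparing what was visible before and after. The label lists are assumed to be extracted by the caller from two screen reads; the sketch deliberately ignores the returned change fields because their names are not given here.

```python
def scroll_changed_content(
    result: dict, labels_before: list[str], labels_after: list[str]
) -> bool:
    """performed=true only says the action ran; compare surfaces as well."""
    if not result.get("performed"):
        return False
    return labels_before != labels_after
```

An unchanged label list after a performed scroll usually means the container was already at its end or the wrong container moved.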

`/wait`

Use /wait after actions that may change UI state.

There are two different uses:

- GET /wait: wait for UI change and inspect final settled state
- POST /wait: wait for a selector condition

Do not confuse changed=false with action failure. It only means a transition was not observed during the wait window. Re-check the final surface before concluding the action failed.

For POST /wait, the supported strategies are bounded and query rules matter: focused takes no query, while text/content-description/resource-id waits require one.
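
The query rule can be checked before a request is sent. This is a sketch under assumptions: the `strategy`/`query` key names are hypothetical, and the strategy names are borrowed from the /find list, so confirm both against /commands.

```python
from typing import Optional

# Strategy names borrowed from the /find list; an assumption for /wait.
QUERY_STRATEGIES = ("text", "contentDesc", "resourceId")


def wait_body(strategy: str, query: Optional[str] = None) -> dict:
    """Validate the query rule for a selector wait and return the request body."""
    if strategy == "focused":
        if query is not None:
            raise ValueError("focused takes no query")
        return {"strategy": "focused"}
    if strategy in QUERY_STRATEGIES:
        if not query:
            raise ValueError(strategy + " requires a query")
        return {"strategy": strategy, "query": query}
    raise ValueError("unsupported strategy: " + strategy)
```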

`/clipboard`, `/notify`, `/screenshot`

Use /clipboard as a transport primitive for long text or repeated entry.

Use /notify to read or post local notifications only when the task is explicitly notification-related.

Use /screenshot when visual truth is needed and structured UI output is insufficient, ambiguous, or suspected stale.

Important details:

- /screenshot supports GET and POST
- width and height must be provided together or omitted together
- screenshot capability is separately policy-gated from accessibility
- if /screen publishes previewPath, use that exact path before inventing a new screenshot size
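
The width/height pairing rule can be enforced when the request path is built. This is a minimal sketch assuming that `width` and `height` travel as query parameters on a GET request; `screenshot_path` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode


def screenshot_path(width: Optional[int] = None, height: Optional[int] = None) -> str:
    """Build a /screenshot path, rejecting a lone width or a lone height."""
    if (width is None) != (height is None):
        raise ValueError("width and height must be provided together or omitted together")
    if width is None:
        return "/screenshot"
    return "/screenshot?" + urlencode({"width": width, "height": height})
```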

Selector judgment

Selectors are not interchangeable.

`text`

Use text when the visible label is likely the actual text field of the node.

`desc`

Use desc when the control is icon-like, accessibility-labeled, nav-like, or visibly sparse. Many controls that look label-based are actually better matched through content description.

`id`

Use id when a meaningful resourceId is present. This is often the strongest selector.

Exact vs contains

Do not over-read exact-match misses.

If the visible phrase may be part of a longer text block, retry with a contains-style strategy where the route supports it. A visible phrase on screen is not proof that exact text lookup should succeed.

/find supports explicit contains strategies. /click can use bounded contains fallback internally and tells you when it did so; do not mistake that for an exact selector hit.
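
The exact-to-contains retry can be sketched as a lookup over the strategy names listed for /find. `contains_retry` is a hypothetical helper; it returns `None` for strategies that have no contains variant in that list.

```python
from typing import Optional

# Pairs taken from the strategy names listed for /find.
CONTAINS_VARIANT = {
    "text": "textContains",
    "contentDesc": "contentDescContains",
}


def contains_retry(strategy: str) -> Optional[str]:
    """Return the contains-style strategy to retry with, if one exists."""
    return CONTAINS_VARIANT.get(strategy)
```

A `None` result, for example for `resourceId`, means the miss should be explained some other way, such as the wrong screen or the wrong selector surface.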

Recovery rules

When a Ghosthand action misses, do not branch into random retries. Make one bounded correction:

- re-read /screen
- if accessibility is unavailable or weak, re-read /screen?source=hybrid or /screen?source=ocr
- switch text to desc or id
- switch exact semantics to contains semantics when justified
- if text entry succeeded but submission did not, use the on-screen IME confirm action instead of retrying Enter
- move from /click to /tap only after trustworthy coordinates exist
- use /capabilities when the route exists but capability availability is ambiguous
- use /wait to settle state before the next action

Repeated misses should be classified, not brute-forced.
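
The "one bounded correction" rule can be sketched as an ordered decision that returns a single next step. All field names on `Miss` are hypothetical agent-side bookkeeping, not Ghosthand response fields, and the sketch covers only the first few corrections from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Miss:
    # Hypothetical bookkeeping kept by the agent between attempts.
    screen_reread: bool
    accessibility_weak: bool
    selector_surfaces_left: int
    used_exact_match: bool


def next_correction(miss: Miss) -> str:
    """Pick exactly one bounded correction instead of retrying at random."""
    if not miss.screen_reread:
        return "re-read the screen"
    if miss.accessibility_weak:
        return "re-read the screen with source=hybrid"
    if miss.selector_surfaces_left > 0:
        return "switch selector surface"
    if miss.used_exact_match:
        return "retry with contains semantics"
    return "classify the miss; stop retrying"
```

Because the function returns one step at a time, each retry changes exactly one variable, which keeps the eventual failure report narrow.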

Minimal workflows

Check whether Ghosthand is ready

1. read /ping
2. read /state
3. if needed, read INLINECODE155
4. if needed, read INLINECODE156

Operate a visible control safely

1. read /screen
2. choose text, desc, or id
3. call /click
4. call /wait or re-read /screen
5. if accessibility surface truth is weak, retry /screen?source=hybrid or /screen?source=ocr
6. only use /tap if semantic action remains weak but coordinates are trusted

Enter text and confirm it reliably

1. focus the intended editable field
2. use /input for the focused field, /type for simple focused typing, or /setText for a trusted same-snapshot editable nodeId
3. verify the text appears in the field or the focused surface reflects the update
4. if Enter does not submit or confirm the input, use the on-screen IME confirm action, typically the bottom-right keyboard button
5. call /wait or re-read /screen to confirm the post-input state

Diagnose a miss

1. confirm Ghosthand and capability state with /state
2. re-read /screen
3. inspect selector surface mismatch
4. escalate to /screen?source=hybrid, /tree, or /screenshot if accessibility output is partial, unavailable, or misleading
5. retry one bounded correction

Reporting standard

When summarizing a Ghosthand run, report only:

- what route you used
- what state changed
- whether the target was achieved
- the first narrow failing step if it was not
- the next best correction

Do not dump logs unless the task is explicitly diagnostic.

Reference files

Detailed route notes are in resources/references/ghosthand-api-quick-reference.md.
