UI Element Ops

Parse one or more screenshots into a machine-readable JSON schema. Each detected element includes:

- type (normalized UI element type)
- bbox_px and bbox_norm
- text (OCR/caption content when available)
- clickable flag

The skill also provides:

- an optional overlay image with labeled boxes
- desktop actions via scripts/operate_ui.py (click/type/key/hotkey/screenshot)
- element query and orchestration via scripts/operate_ui.py (find, wait)
- a coordinate calibration profile for multi-display/DPI/window offset (calibrate)

Quick Start

1. Prepare runtime once per machine:

    bash skills/ui-element-ops/scripts/bootstrap_omniparser_env.sh $PWD

2. Parse one screenshot:

    bash skills/ui-element-ops/scripts/run_parse_ui.sh /abs/path/to/1.jpeg

3. Read outputs:

- .elements.json (main JSON output)
- .overlay.png (overlay PNG output)

4. One-step capture + parse with randomized names:

    bash skills/ui-element-ops/scripts/capture_and_parse.sh

Workflow

1. Confirm screenshot path and desired output path.
2. Run scripts/bootstrap_omniparser_env.sh when .venv or OmniParser weights are missing.
3. Run scripts/run_parse_ui.sh for standard parsing.
4. Report absolute output paths and summary counts: total, clickable, by_type.
5. Call out obvious quality risks for tiny text or dense icon layouts.
6. Execute desktop actions when requested:

   - list elements: python3 skills/ui-element-ops/scripts/operate_ui.py list --elements <json>
   - find elements: python3 skills/ui-element-ops/scripts/operate_ui.py find --elements <json> --type button --text-contains login
   - wait for appear/disappear: python3 skills/ui-element-ops/scripts/operate_ui.py wait --elements <json> --state appear --text-contains continue
   - click by id: python3 skills/ui-element-ops/scripts/operate_ui.py click --elements <json> --id e_0001
   - screenshot: python3 skills/ui-element-ops/scripts/operate_ui.py screenshot (defaults to user tmp dir)
   - calibrate coordinates: python3 skills/ui-element-ops/scripts/operate_ui.py calibrate --parsed-size --actual-size
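
The commands listed above can also be assembled from a calling script. The following is a minimal sketch, assuming only the subcommands and flags that appear in this section; build_cmd and run_cmd are hypothetical helpers, not part of the skill, and out.elements.json is a placeholder file name.

```python
import subprocess
import sys

# Script path as it appears in the commands listed above.
OPERATE_UI = "skills/ui-element-ops/scripts/operate_ui.py"


def build_cmd(action, elements=None, **flags):
    """Build an argv list for operate_ui.py (hypothetical helper).

    Keyword names become flags: text_contains -> --text-contains.
    """
    cmd = [sys.executable, OPERATE_UI, action]
    if elements is not None:
        cmd += ["--elements", str(elements)]
    for name, value in flags.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd


def run_cmd(cmd):
    """Run the command and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholder elements file; nothing is executed here, only printed.
    find = build_cmd("find", "out.elements.json", type="button", text_contains="login")
    print(" ".join(find))
```

Building the argument list separately from running it keeps the command easy to log and to check before any desktop action is triggered.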

Tunables

- Edit type mapping keywords in references/type_rules.example.json.
- Use advanced parser args via scripts/parse_ui.py --help.
- Use --use-paddleocr only when paddleocr/paddlepaddle are installed.
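
The structure of references/type_rules.example.json is not shown here. Purely as an illustration of keyword-based type mapping, the idea might look like the sketch below; the rule layout (type name mapped to a list of keywords), the sample keywords, the "unknown" fallback, and the function normalize_type are all assumptions.

```python
# Hypothetical rule table: normalized type -> keywords that suggest it.
# The real type_rules.example.json may use a different structure.
RULES = {
    "button": ["button", "submit", "cancel"],
    "input": ["search", "enter", "password"],
    "checkbox": ["checkbox", "tick"],
}


def normalize_type(label, rules=RULES, default="unknown"):
    """Return the first type whose keyword occurs in the label (case-insensitive)."""
    text = label.lower()
    for ui_type, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return ui_type
    return default
```

Plain substring matching is crude: a short keyword can match inside an unrelated word, so rule order and keyword choice matter in a scheme like this.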

Outputs

- Main JSON output:
  - schema_version, pipeline, image, counts, elements
  - each element has id, type, bbox_px, bbox_norm, text, clickable
- Overlay PNG output:
  - same screenshot with labeled detection boxes
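
A minimal sketch of consuming the main JSON output, assuming the field names listed above. The layout of bbox_px as [x1, y1, x2, y2] is an assumption (it is not specified here), the sample elements are made up, and load_elements, summarize, and center_px are hypothetical helpers.

```python
import json
from collections import Counter


def load_elements(path):
    """Read the main JSON output and return its elements list."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["elements"]


def summarize(elements):
    """Recompute the summary counts: total, clickable, by_type."""
    return {
        "total": len(elements),
        "clickable": sum(1 for e in elements if e.get("clickable")),
        "by_type": dict(Counter(e["type"] for e in elements)),
    }


def center_px(element):
    """Center point of an element, assuming bbox_px is [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = element["bbox_px"]
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# Made-up sample data in the shape described above.
sample = [
    {"id": "e_0001", "type": "button", "bbox_px": [10, 20, 110, 60],
     "text": "Login", "clickable": True},
    {"id": "e_0002", "type": "text", "bbox_px": [10, 80, 210, 100],
     "text": "Welcome", "clickable": False},
]

print(summarize(sample))     # {'total': 2, 'clickable': 1, 'by_type': {'button': 1, 'text': 1}}
print(center_px(sample[0]))  # (60.0, 40.0)
```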

Failure Handling

- Missing dependencies or weights: run bootstrap script again.
- Permission/cache errors under $HOME: keep temporary caches under /tmp (handled by run script).
- CPU-only machine: expect slower inference.
- Performance note: parse/capture-and-parse commands are heavy; avoid very tight loops and reuse recent elements.json when possible.
- Headless environment limitation:
  - usable without GUI: parse/list/find/wait/calibrate on existing files.
  - requires GUI session: click/click-xy/type/key/hotkey/screenshot/screen-info.
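
The performance note and the headless limitation above can be sketched as guards in a calling script. The set of GUI-only commands is copied from the list above; the helper names, the 5-second default age, and the use of the DISPLAY / WAYLAND_DISPLAY environment variables as a GUI check (Linux only) are assumptions, not behavior of the skill itself.

```python
import os
import time

# Commands that require a GUI session, copied from the list above.
GUI_COMMANDS = {"click", "click-xy", "type", "key", "hotkey", "screenshot", "screen-info"}


def needs_gui(command):
    """True if the command requires a GUI session."""
    return command in GUI_COMMANDS


def has_display(env=None):
    """Rough Linux-only check: an X11 or Wayland display variable is set."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def is_fresh(path, max_age_s=5.0, now=None):
    """True if path exists and was modified within the last max_age_s seconds."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False
    now = time.time() if now is None else now
    return (now - mtime) <= max_age_s
```

A caller could skip a new parse when is_fresh reports a recent elements.json, and refuse GUI-only commands early when needs_gui is true but has_display is false.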
