screen-vision
macOS screen OCR & click automation CLI powered by Apple Vision + ScreenCaptureKit.
IMPORTANT: Auto-Setup (run first)
Before running any screen-vision command, check if the binary exists. If not, run the setup script:
```bash
command -v screen-vision &>/dev/null || bash "${CLAUDESKILLDIR}/setup.sh"
```
This installs screen-vision (via Homebrew or source build) and cliclick automatically.
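The one-liner only checks for `screen-vision`, although the setup script also installs `cliclick`. A slightly stricter preflight could check both tools and run setup when either is missing. This is a sketch: the function name `ensure_tools` is hypothetical, and the setup-script path is passed in rather than hard-coded.

```bash
# Hypothetical preflight: run the setup script only when screen-vision
# or cliclick (both installed by setup.sh) is missing from PATH.
ensure_tools() {
  local setup_script=$1 missing="" tool
  for tool in screen-vision cliclick; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -z "$missing" ]; then
    return 0  # everything present; nothing to install
  fi
  echo "missing:$missing; running $setup_script" >&2
  bash "$setup_script"
}
```

Usage: `ensure_tools "${CLAUDESKILLDIR}/setup.sh"`. The function returns the setup script's exit status when it has to run it, so a failed install still stops a caller that checks the result.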
Requirements
- macOS 14.0+ (Sonoma)
- Screen Recording permission (System Settings > Privacy & Security > Screen Recording)
Commands
| Command | Description | Output |
|---|---|---|
| `screen-vision ocr [--app NAME]` | Full OCR | JSON array `[{text, x, y, w, h, confidence}]` |
| `screen-vision list [--app NAME]` | OCR list | Human-readable text with coordinates |
| `screen-vision find "text" [--app NAME]` | Find text | JSON `{text, x, y, found}` |
| `screen-vision has "text" [--app NAME]` | Check whether text exists | Exit code 0 (found) / 1 (not found) |
| `screen-vision tap "text" [--app NAME] [--retry N]` | Find + click | JSON `{text, x, y, tapped}` |
| `screen-vision wait "text" [--timeout SEC]` | Poll until text appears | JSON `{text, x, y, found}` |
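Since `has` answers only through its exit status, a script that logs results may want that status as a word. A minimal sketch; `text_state` is a hypothetical name, and treating any status other than 0 or 1 as an error (for example a failed capture) is an assumption, because only those two codes are documented.

```bash
# Hypothetical wrapper: turn the exit status of `screen-vision has` into a word.
# 0 and 1 are documented; mapping every other status to "error" is an assumption.
text_state() {
  screen-vision has "$@" >/dev/null
  case $? in
    0) echo visible ;;
    1) echo absent ;;
    *) echo error; return 2 ;;
  esac
}
```

Usage: `state=$(text_state "Submit" --app MyApp)`. Keeping "absent" (status 1) separate from "error" (anything else) prevents a permission or capture problem from being mistaken for text that is simply not on screen.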
Capture Priority
```
--region x,y,w,h > --app AppName > full screen (default)
```
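In a script, the priority can be respected by forwarding only the option that would win anyway. A sketch, assuming `ocr` accepts `--region` in the same `x,y,w,h` form (the commands table lists only `--app` for it); the wrapper name `ocr_capture` is hypothetical.

```bash
# Hypothetical wrapper: pick one capture target following the documented
# priority --region > --app > full screen, and pass only that flag.
ocr_capture() {
  local region=$1 app=$2
  if [ -n "$region" ]; then
    screen-vision ocr --region "$region"   # assumes ocr accepts --region
  elif [ -n "$app" ]; then
    screen-vision ocr --app "$app"         # quoted: app names may contain spaces
  else
    screen-vision ocr                      # full screen (default)
  fi
}
```

Usage: `ocr_capture 0,0,800,600 Safari` captures only the region, while `ocr_capture "" "Google Chrome"` captures that app's window.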
Usage Patterns
OCR a specific app window
```bash
screen-vision list --app Safari
```
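To OCR several windows in one pass, the same call can run in a loop. A sketch; `list_apps` is a hypothetical name, and it assumes `list` exits non-zero when an app's window cannot be captured, which the commands table does not document.

```bash
# Hypothetical helper: print a labelled OCR listing for each app name given.
list_apps() {
  local app
  for app in "$@"; do
    printf '== %s ==\n' "$app"
    # Quote the name so apps with spaces ("Google Chrome") stay one argument.
    screen-vision list --app "$app" || printf '(capture failed for %s)\n' "$app"
  done
}
```

Usage: `list_apps Safari "Google Chrome" Notes`.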
Check if text is visible (for conditionals)
```bash
screen-vision has "Submit" --app MyApp && echo "found" || echo "not found"
```
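Because `has` reports through its exit status, it also fits directly into an `if`. For example, a guard that clicks a button only when it is on screen, so a script does not stall on an optional dialog. A sketch; `tap_if_visible` is a hypothetical name.

```bash
# Hypothetical guard: click a label only if it is currently visible.
# Returns 0 when the label is absent; otherwise returns the tap's own status.
tap_if_visible() {
  local label=$1 app=$2
  if screen-vision has "$label" --app "$app"; then
    screen-vision tap "$label" --app "$app"
  fi
}
```

Usage: `tap_if_visible "Later" MyApp` dismisses an optional prompt if one appeared and otherwise does nothing.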
Click on text with retry
```bash
screen-vision tap "OK" --app MyApp --retry 3
```
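`tap` presumably reports success in the `tapped` field of its JSON. To click through several buttons in order and stop at the first one that could not be clicked, a script might check that field. Assumptions in this sketch: a successful result contains a literal `"tapped": true` (whitespace may vary), and `tap_sequence` is a hypothetical name. The plain `grep` avoids a `jq` dependency; `jq -e '.tapped'` would be the more robust check.

```bash
# Hypothetical helper: tap each label in order inside one app, stopping at
# the first label whose JSON result does not report "tapped": true (assumed shape).
tap_sequence() {
  local app=$1 label out
  shift
  for label in "$@"; do
    out=$(screen-vision tap "$label" --app "$app" --retry 3)
    if ! printf '%s\n' "$out" | grep -Eq '"tapped"[[:space:]]*:[[:space:]]*true'; then
      echo "could not tap: $label" >&2
      return 1
    fi
  done
}
```

Usage: `tap_sequence MyApp "Next" "Next" "Finish"`.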
Wait for text to appear (e.g. loading complete)
```bash
screen-vision wait "Done" --timeout 30
```
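`wait` is often followed by an action on the element it waited for. A sketch that waits and then clicks, treating a result without `"found": true` as a timeout. That JSON shape and the name `wait_then_tap` are assumptions, since the table does not say what `wait` prints or returns when the timeout expires.

```bash
# Hypothetical helper: wait up to N seconds (default 30) for a label, then
# click it; report a timeout if the JSON never says "found": true (assumed shape).
wait_then_tap() {
  local label=$1 timeout=${2:-30} out
  out=$(screen-vision wait "$label" --timeout "$timeout")
  if printf '%s\n' "$out" | grep -Eq '"found"[[:space:]]*:[[:space:]]*true'; then
    screen-vision tap "$label"
  else
    echo "timed out after ${timeout}s waiting for: $label" >&2
    return 1
  fi
}
```

Usage: `wait_then_tap "Continue" 60` waits up to a minute for a button to appear before clicking it.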
Full-screen OCR as JSON (pipe to jq)
```bash
screen-vision ocr | jq '.[].text'
```
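Each `ocr` entry also carries `x`, `y`, and `confidence`, so `jq` can drop low-confidence results before anything else sees them. A sketch that prints tab-separated `x`, `y`, and `text` for entries at or above a threshold; the name `confident_text` is hypothetical, and it assumes `confidence` is a number between 0 and 1, as Apple Vision reports recognition confidence.

```bash
# Hypothetical filter: keep OCR entries whose confidence is at least the
# threshold (default 0.5, assuming a 0-1 scale) and print "x<TAB>y<TAB>text".
confident_text() {
  local min=${1:-0.5}
  screen-vision ocr | jq -r --argjson min "$min" \
    '.[] | select(.confidence >= $min) | "\(.x)\t\(.y)\t\(.text)"'
}
```

Usage: `confident_text 0.8`. `--argjson` passes the threshold as a number, so the comparison inside `select` is numeric rather than a string comparison.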
$ARGUMENTS Handling
Parse the user's request to determine which command to run:
- "화면에 뭐 있어?" / "what's on screen?" → `screen-vision list`
- "~찾아" / "find ~" → `screen-vision find "text"`
- "~클릭해" / "click ~" → `screen-vision tap "text"`
- "~보여?" / "is ~ visible?" → `screen-vision has "text"`
- "~뜰 때까지 기다려" / "wait for ~" → `screen-vision wait "text"`
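The mapping above is meant to be applied by interpreting the request. As a rough illustration only, a literal keyword dispatcher for the English phrasings might look like this. `dispatch` is a hypothetical name; the Korean phrasings would need their own patterns, and real requests will rarely match these patterns word for word.

```bash
# Hypothetical keyword dispatcher for the English phrasings above.
# Extracts the target text from the request and runs the matching command.
dispatch() {
  local req=$1 target
  case $req in
    "what's on screen?") set -- list ;;
    "find "*)            set -- find "${req#find }" ;;
    "click "*)           set -- tap "${req#click }" ;;
    "is "*" visible?")
      target=${req#is }
      target=${target%" visible?"}   # quoted so "?" is literal, not a glob
      set -- has "$target" ;;
    "wait for "*)        set -- wait "${req#wait for }" ;;
    *) echo "unrecognized request: $req" >&2; return 2 ;;
  esac
  screen-vision "$@"
}
```

Usage: `dispatch "click OK"` runs `screen-vision tap OK`; an unrecognized request returns status 2 without calling `screen-vision`.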