Airpoint — AI Computer Use for macOS
Airpoint gives you an AI agent that can see and control a Mac — open apps,
click UI elements, read on-screen text, type, scroll, drag, and manage windows.
You give it a natural-language instruction and it carries out the task
autonomously by perceiving the screen (accessibility tree + screenshots + visual
locator), planning actions, executing them, and verifying the result.
Everything runs through the airpoint CLI.
Requirements
- macOS (Apple Silicon or Intel)
- Airpoint app: must be running. Download from airpoint.app.
- Airpoint CLI: the airpoint command must be on PATH. Install it from the Airpoint app via Settings → Plugins → Install CLI.
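A quick preflight check can confirm the CLI requirement before any task is sent. The following is a minimal sketch; `require_cmd` is a hypothetical helper name and not part of Airpoint.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Airpoint): succeed only if the named
# command can be found on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH" >&2
  return 1
}

# Example use, with the install hint taken from the requirements above:
# require_cmd airpoint || echo "Install it via Settings → Plugins → Install CLI" >&2
```

Checking up front gives a clearer failure than a "command not found" error in the middle of a workflow.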
Setup
Before using Airpoint's AI agent, the user must configure it in the Airpoint
app (Settings → Assistant):
1. AI model API key (required). Set an API key for the chosen provider:
   - OpenAI (recommended): model gpt-5.1 with reasoning effort low gives the best balance of cost, speed, and quality.
   - Anthropic and Google Gemini are also supported.
2. Gemini API key (recommended). Even when using OpenAI or Anthropic as the primary model, a Google Gemini API key enables the visual locator, a secondary model (gemini-3-flash-preview) that finds UI targets on screen by analyzing screenshots. Without it, the agent relies on the accessibility tree only.
3. macOS permissions. The app prompts on first launch, but verify these are granted in System Settings → Privacy & Security:
   - Accessibility: required for mouse/keyboard control.
   - Screen Recording: required for screenshots and screen perception.
   - Camera is only needed for hand tracking (not for the AI agent).
4. Custom instructions (optional). In Settings → Assistant, add custom instructions to tailor the agent's behavior (e.g., preferred language, apps to avoid, workflows to follow).
If the user reports that airpoint ask fails or the agent can't see the
screen, ask them to verify steps 1–3 above.
How to use
1. Run airpoint ask "<your instruction>" to send a task to the on-device agent.
2. The command blocks until the agent finishes (up to 5 minutes) and returns:
   - A text summary of what the agent did and the result.
   - One or more screenshot file paths showing the screen state after the task.
3. Read the text output to confirm whether the task succeeded.
4. If screenshots were returned, show the last screenshot to the user as visual confirmation of the result.
5. If something went wrong or the task is stuck, run airpoint stop to cancel.
Example flow:
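    airpoint ask "open Safari and search for OpenClaw"

    Opened Safari, typed OpenClaw into the address bar, and pressed Return.
    The search results page is now displayed.
    1 screenshot saved to session abc123
    └ screenshots/step3.png (/Users/you/Library/Application Support/com.medhuelabs.airpoint/sessions/abc123/screenshots/step3.png)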
After receiving this, show the screenshot to the user so they can see what happened.
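Picking out the last screenshot path can be scripted. The sketch below assumes that each screenshot line in the output ends with the absolute path in parentheses and that screenshots are .png files; that format may differ between versions. `last_screenshot_path` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `airpoint ask` output on stdin and print the
# absolute path of the last screenshot listed. Assumes each screenshot line
# ends with "(<absolute path>.png)".
last_screenshot_path() {
  local line path last=""
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *"(/"*".png)")
        path="${line##*\(}"   # drop everything up to the last "("
        last="${path%\)}"     # drop the trailing ")"
        ;;
    esac
  done
  # Returns non-zero when no screenshot line was found.
  [ -n "$last" ] && printf '%s\n' "$last"
}

# Example use:
# output="$(airpoint ask "open Safari and search for OpenClaw")"
# shot="$(printf '%s\n' "$output" | last_screenshot_path)" && echo "show: $shot"
```

The parsing uses only shell pattern matching, so paths containing spaces (such as "Application Support") are preserved intact.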
Commands
Ask the AI agent to do something (primary command)
This is the most important command. It sends a natural-language task to
Airpoint's built-in computer-use agent which can see the screen, move the
mouse, click, type, scroll, open apps via Spotlight, manage windows, and verify
its own actions.
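    # Synchronous: wait for the agent to finish (up to 5 minutes) and return output
    airpoint ask "open Safari and go to github.com"
    airpoint ask "what's on my screen right now?"
    airpoint ask "find the Slack notification and read it"
    airpoint ask "open System Settings and enable Dark Mode"
    airpoint ask "open Mail, find the latest email from John, and summarize it"

    # Fire-and-forget: return immediately
    airpoint ask "open Spotify and play my liked songs" --no-wait

    # Show the assistant panel while running
    airpoint ask "open System Settings and enable Dark Mode" --show-panel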
Stop a running task
    airpoint stop
Cancels the currently running assistant task. Use this if a task is stuck or
taking too long.
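If a shorter limit than the built-in 5 minutes is wanted, the cancel step can be automated with a watchdog. This is only a sketch: `ask_with_deadline` is a hypothetical wrapper, and it assumes that `airpoint stop` issued from a second process cancels the task started by `airpoint ask`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run `airpoint ask` and cancel it with `airpoint stop`
# if it has not finished within DEADLINE seconds.
# Usage: ask_with_deadline DEADLINE_SECONDS "<instruction>" [flags...]
ask_with_deadline() {
  local deadline="$1" ask_pid watchdog_pid rc
  shift
  airpoint ask "$@" &
  ask_pid=$!
  (
    sleep_pid=""
    # If the task finishes first, the parent sends TERM: clean up and exit.
    trap 'kill "$sleep_pid" 2>/dev/null; exit 0' TERM
    sleep "$deadline" &
    sleep_pid=$!
    # Deadline reached without being cancelled: stop the running task.
    wait "$sleep_pid" && airpoint stop
  ) &
  watchdog_pid=$!
  wait "$ask_pid"
  rc=$?
  kill "$watchdog_pid" 2>/dev/null
  wait "$watchdog_pid" 2>/dev/null
  return "$rc"
}

# Example use: give the task at most 60 seconds.
# ask_with_deadline 60 "open System Settings and enable Dark Mode"
```

The wrapper returns whatever exit status `airpoint ask` produced, so callers can still inspect the result after a forced stop.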
Capture a screenshot
    airpoint see
Returns a screenshot of the current display. Useful for verifying state before
or after issuing an ask command.
Check status
    airpoint status
    airpoint status --json
Returns app version and current state (tracking active, etc.).
Hand tracking (secondary)
Airpoint also supports hands-free cursor control via camera-based hand tracking.
These commands start/stop that feature:
    airpoint tracking on
    airpoint tracking off
    airpoint tracking          # show current state
Read or change settings
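    airpoint settings list          # all current settings
    airpoint settings list --json   # machine-readable
    airpoint settings get cursor.sensitivity
    airpoint settings set cursor.sensitivity 1.5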
Common settings: cursor.sensitivity (default 1.0), cursor.acceleration
(default true), scroll.sensitivity (default 1.0), scroll.inertia
(default true).
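A setting can be changed for the duration of one command and then restored. The sketch below is one way to do that; `with_setting` is a hypothetical helper, and it assumes `airpoint settings get KEY` prints only the value, which may not hold for every version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command with one setting temporarily overridden,
# then restore the previous value.
# Assumes `airpoint settings get KEY` prints only the value.
# Usage: with_setting KEY VALUE command [args...]
with_setting() {
  local key="$1" value="$2" previous rc
  shift 2
  previous="$(airpoint settings get "$key")" || return 1
  airpoint settings set "$key" "$value" || return 1
  "$@"
  rc=$?
  # Restore the original value regardless of how the command exited.
  airpoint settings set "$key" "$previous"
  return "$rc"
}

# Example use: raise cursor sensitivity only while tracking is tested.
# with_setting cursor.sensitivity 1.5 airpoint tracking on
```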
System vitals
    airpoint vitals          # CPU, memory, temperature
    airpoint vitals --json
Launch the app
    airpoint open            # open/focus the Airpoint macOS app
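Because the app must be running before a task is sent, the status and launch commands can be combined into a startup check. This sketch rests on an unverified assumption: that `airpoint status` exits with a non-zero code when the app is not reachable. `ensure_airpoint_running` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make sure the Airpoint app is up before sending tasks.
# ASSUMPTION: `airpoint status` exits non-zero when the app is not running.
# Usage: ensure_airpoint_running [MAX_TRIES]   (default: 5, one second apart)
ensure_airpoint_running() {
  local tries="${1:-5}" i
  airpoint status >/dev/null 2>&1 && return 0
  airpoint open || return 1
  for ((i = 0; i < tries; i++)); do
    sleep 1
    airpoint status >/dev/null 2>&1 && return 0
  done
  return 1
}

# Example use:
# ensure_airpoint_running && airpoint ask "what's on my screen right now?"
```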
Tips
- Use airpoint ask for almost everything. The agent can read the screen, interact with any app, and chain multi-step workflows autonomously.
- Always use --json when you need to parse output programmatically.
- The agent can answer questions about what's on screen ("what app is in the foreground?", "read the error message in this dialog").
- Airpoint is a notarized, code-signed macOS app. Download it from airpoint.app.
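Airpoint: AI Computer Use for macOS
Airpoint gives you an AI agent that can see and control a Mac: it can open apps, click UI elements, read on-screen text, type, scroll, drag, and manage windows. You give it a natural-language instruction, and it completes the task autonomously by perceiving the screen (accessibility tree + screenshots + visual locator), planning actions, executing them, and verifying the result.
Everything runs through the airpoint command-line interface.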
Requirements
- macOS (Apple Silicon or Intel)
- Airpoint app: must be running. Download from airpoint.app.
- Airpoint CLI: the airpoint command must be on PATH. Install it from the Airpoint app via Settings → Plugins → Install CLI.
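Setup
Before using Airpoint's AI agent, the user must configure it in the Airpoint app (Settings → Assistant):
1. AI model API key (required). Set an API key for the chosen provider:
   - OpenAI (recommended): the gpt-5.1 model with reasoning effort set to low gives the best balance of cost, speed, and quality.
   - Anthropic and Google Gemini are also supported.
2. Gemini API key (recommended). Even when using OpenAI or Anthropic as the primary model, a Google Gemini API key enables the visual locator, a secondary model (gemini-3-flash-preview) that finds UI targets on screen by analyzing screenshots. Without it, the agent relies on the accessibility tree only.
3. macOS permissions. The app prompts for authorization on first launch, but confirm that the following are granted in System Settings → Privacy & Security:
   - Accessibility: required for mouse/keyboard control.
   - Screen Recording: required for screenshots and screen perception.
   - Camera is only needed for hand tracking (not for the AI agent).
4. Custom instructions (optional). In Settings → Assistant, add custom instructions to tailor the agent's behavior (e.g., preferred language, apps to avoid, workflows to follow).
If the user reports that airpoint ask fails or the agent can't see the screen, ask them to verify steps 1-3 above.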
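How to use
1. Run airpoint ask "<your instruction>" to send a task to the on-device agent.
2. The command blocks until the agent finishes (up to 5 minutes) and returns:
   - A text summary of what the agent did and the result.
   - One or more screenshot file paths showing the screen state after the task.
3. Read the text output to confirm whether the task succeeded.
4. If screenshots were returned, show the last screenshot to the user as visual confirmation of the result.
5. If something went wrong or the task is stuck, run airpoint stop to cancel.
Example flow:
    airpoint ask "open Safari and search for OpenClaw"

    Opened Safari, typed OpenClaw into the address bar, and pressed Return.
    The search results page is now displayed.
    1 screenshot saved to session abc123
    └ screenshots/step3.png (/Users/you/Library/Application Support/com.medhuelabs.airpoint/sessions/abc123/screenshots/step3.png)
After receiving this, show the screenshot to the user so they can see what happened.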
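Commands
Ask the AI agent to do something (primary command)
This is the most important command. It sends a natural-language task to Airpoint's built-in computer-use agent, which can see the screen, move the mouse, click, type, scroll, open apps via Spotlight, manage windows, and verify its own actions.
    # Synchronous: wait for the agent to finish (up to 5 minutes) and return output
    airpoint ask "open Safari and go to github.com"
    airpoint ask "what's on my screen right now?"
    airpoint ask "find the Slack notification and read it"
    airpoint ask "open System Settings and enable Dark Mode"
    airpoint ask "open Mail, find the latest email from John, and summarize it"

    # Fire-and-forget: return immediately
    airpoint ask "open Spotify and play my liked songs" --no-wait

    # Show the assistant panel while running
    airpoint ask "open System Settings and enable Dark Mode" --show-panel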
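Stop a running task
    airpoint stop
Cancels the currently running assistant task. Use this if a task is stuck or taking too long.
Capture a screenshot
    airpoint see
Returns a screenshot of the current display. Useful for verifying state before or after issuing an ask command.
Check status
    airpoint status
    airpoint status --json
Returns app version and current state (whether tracking is active, etc.).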
Hand tracking (secondary)
Airpoint also supports hands-free cursor control via camera-based hand tracking. These commands start/stop that feature:
    airpoint tracking on
    airpoint tracking off
    airpoint tracking          # show current state
Read or change settings
    airpoint settings list          # all current settings
    airpoint settings list --json   # machine-readable
    airpoint settings get cursor.sensitivity
    airpoint settings set cursor.sensitivity 1.5
Common settings: cursor.sensitivity (default 1.0), cursor.acceleration (default true), scroll.sensitivity (default 1.0), scroll.inertia (default true).
System vitals
    airpoint vitals          # CPU, memory, temperature
    airpoint vitals --json
Launch the app
    airpoint open            # open/focus the Airpoint macOS app
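Tips
- Use airpoint ask for almost everything. The agent can read the screen, interact with any app, and chain multi-step workflows autonomously.
- Always use --json when you need to parse output programmatically.
- The agent can answer questions about what's on screen ("what app is in the foreground?", "read the error message in this dialog").
- Airpoint is a notarized, code-signed macOS app. Download it from airpoint.app.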