Gemini Computer Use
Quick start
- 1. Source the env file and set your API key:
CODEBLOCK0
- 2. Create a virtual environment and install dependencies:
CODEBLOCK1
- 3. Run the agent script with a prompt:
CODEBLOCK2
Browser selection
- - Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with
COMPUTER_USE_BROWSER_CHANNEL. - Use a custom Chromium-based executable (e.g., Brave) with
COMPUTER_USE_BROWSER_EXECUTABLE.
If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
Core workflow (agent loop)
- 1. Capture a screenshot and send the user goal + screenshot to the model.
- Parse
function_call actions in the response. - Execute each action in Playwright.
- If a
safety_decision is require_confirmation, prompt the user before executing. - Send
function_response objects containing the latest URL + screenshot. - Repeat until the model returns only text (no actions) or you hit the turn limit.
Operational guidance
- - Run in a sandboxed browser profile or container.
- Use
--exclude to block risky actions you do not want the model to take. - Keep the viewport at 1440x900 unless you have a reason to change it.
Resources
- - Script: INLINECODE8
- Reference notes: INLINECODE9
- Env template: INLINECODE10
Gemini 计算机使用
快速开始
- 1. 加载环境变量文件并设置您的 API 密钥:
bash
cp env.example env.sh
$EDITOR env.sh
source env.sh
- 2. 创建虚拟环境并安装依赖:
bash
python -m venv .venv
source .venv/bin/activate
pip install google-genai playwright
playwright install chromium
- 3. 使用提示运行代理脚本:
bash
python scripts/computeruseagent.py \
--prompt 查找 example.com 上最新的博客文章标题 \
--start-url https://example.com \
--turn-limit 6
浏览器选择
- - 默认:Playwright 捆绑的 Chromium(无需环境变量)。
- 使用 COMPUTERUSEBROWSERCHANNEL 选择浏览器渠道(Chrome/Edge)。
- 使用 COMPUTERUSEBROWSEREXECUTABLE 指定自定义的基于 Chromium 的可执行文件(例如 Brave)。
如果两者都设置了,COMPUTERUSEBROWSER_EXECUTABLE 优先。
核心工作流程(代理循环)
- 1. 捕获屏幕截图,并将用户目标 + 屏幕截图发送给模型。
- 解析响应中的 functioncall 操作。
- 在 Playwright 中执行每个操作。
- 如果 safetydecision 为 requireconfirmation,则在执行前提示用户确认。
- 发送包含最新 URL + 屏幕截图的 functionresponse 对象。
- 重复此过程,直到模型仅返回文本(无操作)或达到轮次限制。
操作指南
- - 在沙盒化的浏览器配置文件或容器中运行。
- 使用 --exclude 阻止您不希望模型执行的危险操作。
- 保持视口为 1440x900,除非有理由更改。
资源
- - 脚本:scripts/computeruseagent.py
- 参考笔记:references/google-computer-use.md
- 环境模板:env.example