gemini-computer-use

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Gemini Computer Use

Quick start

快速开始

Source the env file and set your API key:

bash

cp env.example env.sh
$EDITOR env.sh
source env.sh

Create a virtual environment and install dependencies:

bash

python -m venv .venv
source .venv/bin/activate
pip install google-genai playwright
playwright install chromium

Run the agent script with a prompt:

bash

python scripts/computer_use_agent.py \
  --prompt "Find the latest blog post title on example.com" \
  --start-url "https://example.com" \
  --turn-limit 6

加载环境文件并设置你的API密钥：

bash

cp env.example env.sh
$EDITOR env.sh
source env.sh

创建虚拟环境并安装依赖：

bash

python -m venv .venv
source .venv/bin/activate
pip install google-genai playwright
playwright install chromium

使用提示词运行代理脚本：

bash

python scripts/computer_use_agent.py \
  --prompt "Find the latest blog post title on example.com" \
  --start-url "https://example.com" \
  --turn-limit 6

Browser selection

浏览器选择

Default: Playwright's bundled Chromium (no env vars required).
Choose a channel (Chrome/Edge) with
```
COMPUTER_USE_BROWSER_CHANNEL
```
.
Use a custom Chromium-based executable (e.g., Brave) with
```
COMPUTER_USE_BROWSER_EXECUTABLE
```
.

If both are set,

COMPUTER_USE_BROWSER_EXECUTABLE

takes precedence.

默认：Playwright自带的Chromium（无需环境变量）。
通过
```
COMPUTER_USE_BROWSER_CHANNEL
```
选择渠道（Chrome/Edge）。
通过
```
COMPUTER_USE_BROWSER_EXECUTABLE
```
使用自定义的基于Chromium的可执行文件（如Brave）。

如果两者都设置，

COMPUTER_USE_BROWSER_EXECUTABLE

优先级更高。

Core workflow (agent loop)

核心工作流（代理循环）

Capture a screenshot and send the user goal + screenshot to the model.
Parse
```
function_call
```
actions in the response.
Execute each action in Playwright.
If a
```
safety_decision
```
is
```
require_confirmation
```
, prompt the user before executing.
Send
```
function_response
```
objects containing the latest URL + screenshot.
Repeat until the model returns only text (no actions) or you hit the turn limit.

捕获截图并将用户目标+截图发送给模型。
解析响应中的
```
function_call
```
操作。
在Playwright中执行每个操作。
如果
```
safety_decision
```
为
```
require_confirmation
```
，执行前需提示用户确认。
发送包含最新URL+截图的
```
function_response
```
对象。
重复循环，直到模型仅返回文本（无操作）或达到循环次数限制。

Operational guidance

操作指南

Run in a sandboxed browser profile or container.
Use
```
--exclude
```
to block risky actions you do not want the model to take.
Keep the viewport at 1440x900 unless you have a reason to change it.

在沙箱浏览器配置文件或容器中运行。
使用
```
--exclude
```
参数阻止你不希望模型执行的风险操作。
除非有特殊原因，否则保持视口为1440x900。

Resources

资源

Script:
```
scripts/computer_use_agent.py
```
Reference notes:
```
references/google-computer-use.md
```
Env template:
```
env.example
```

脚本：
```
scripts/computer_use_agent.py
```
参考笔记：
```
references/google-computer-use.md
```
环境变量模板：
```
env.example
```