gemini-computer-use
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseGemini Computer Use
Gemini Computer Use
Quick start
快速开始
-
Source the env file and set your API key:bash
cp env.example env.sh $EDITOR env.sh source env.sh -
Create a virtual environment and install dependencies:bash
python -m venv .venv source .venv/bin/activate pip install google-genai playwright playwright install chromium -
Run the agent script with a prompt:bash
python scripts/computer_use_agent.py \ --prompt "Find the latest blog post title on example.com" \ --start-url "https://example.com" \ --turn-limit 6
-
加载环境文件并设置你的API密钥:bash
cp env.example env.sh $EDITOR env.sh source env.sh -
创建虚拟环境并安装依赖:bash
python -m venv .venv source .venv/bin/activate pip install google-genai playwright playwright install chromium -
使用提示词运行代理脚本:bash
python scripts/computer_use_agent.py \ --prompt "Find the latest blog post title on example.com" \ --start-url "https://example.com" \ --turn-limit 6
Browser selection
浏览器选择
- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with .
COMPUTER_USE_BROWSER_CHANNEL - Use a custom Chromium-based executable (e.g., Brave) with .
COMPUTER_USE_BROWSER_EXECUTABLE
If both are set, takes precedence.
COMPUTER_USE_BROWSER_EXECUTABLE- 默认:Playwright自带的Chromium(无需环境变量)。
- 通过选择渠道(Chrome/Edge)。
COMPUTER_USE_BROWSER_CHANNEL - 通过使用自定义的基于Chromium的可执行文件(如Brave)。
COMPUTER_USE_BROWSER_EXECUTABLE
如果两者都设置,优先级更高。
COMPUTER_USE_BROWSER_EXECUTABLECore workflow (agent loop)
核心工作流(代理循环)
- Capture a screenshot and send the user goal + screenshot to the model.
- Parse actions in the response.
function_call - Execute each action in Playwright.
- If a is
safety_decision, prompt the user before executing.require_confirmation - Send objects containing the latest URL + screenshot.
function_response - Repeat until the model returns only text (no actions) or you hit the turn limit.
- 捕获截图并将用户目标+截图发送给模型。
- 解析响应中的操作。
function_call - 在Playwright中执行每个操作。
- 如果为
safety_decision,执行前需提示用户确认。require_confirmation - 发送包含最新URL+截图的对象。
function_response - 重复循环,直到模型仅返回文本(无操作)或达到循环次数限制。
Operational guidance
操作指南
- Run in a sandboxed browser profile or container.
- Use to block risky actions you do not want the model to take.
--exclude - Keep the viewport at 1440x900 unless you have a reason to change it.
- 在沙箱浏览器配置文件或容器中运行。
- 使用参数阻止你不希望模型执行的风险操作。
--exclude - 除非有特殊原因,否则保持视口为1440x900。
Resources
资源
- Script:
scripts/computer_use_agent.py - Reference notes:
references/google-computer-use.md - Env template:
env.example
- 脚本:
scripts/computer_use_agent.py - 参考笔记:
references/google-computer-use.md - 环境变量模板:
env.example