gemini-computer-use

Gemini Computer Use

Quick start

  1. Copy the env template, set your API key in the copy, then source it:
    bash
    cp env.example env.sh
    $EDITOR env.sh
    source env.sh
  2. Create a virtual environment and install dependencies:
    bash
    python -m venv .venv
    source .venv/bin/activate
    pip install google-genai playwright
    playwright install chromium
  3. Run the agent script with a prompt:
    bash
    python scripts/computer_use_agent.py \
      --prompt "Find the latest blog post title on example.com" \
      --start-url "https://example.com" \
      --turn-limit 6

Browser selection

  • Default: Playwright's bundled Chromium (no env vars required).
  • Choose a channel (Chrome/Edge) with COMPUTER_USE_BROWSER_CHANNEL.
  • Use a custom Chromium-based executable (e.g., Brave) with COMPUTER_USE_BROWSER_EXECUTABLE.
If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
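
The precedence rule above can be sketched as a small helper that turns the two environment variables into keyword arguments for Playwright's `chromium.launch()`. This is a minimal sketch, not the script's actual code: the function name `resolve_launch_kwargs` is hypothetical, and it assumes that empty or whitespace-only values count as unset.

```python
import os
from typing import Mapping


def resolve_launch_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for Playwright's chromium.launch().

    Hypothetical helper: the real script may resolve these differently.
    """
    executable = env.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    channel = env.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()
    if executable:
        # A custom executable wins when both variables are set.
        return {"executable_path": executable}
    if channel:
        # Playwright channel names such as "chrome" or "msedge".
        return {"channel": channel}
    # Neither variable set: fall back to Playwright's bundled Chromium.
    return {}
```

With Playwright's sync API, the result could be passed straight through, for example `playwright.chromium.launch(**resolve_launch_kwargs())`.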

Core workflow (agent loop)

  1. Capture a screenshot and send the user goal + screenshot to the model.
  2. Parse function_call actions in the response.
  3. Execute each action in Playwright.
  4. If a safety_decision is require_confirmation, prompt the user before executing.
  5. Send function_response objects containing the latest URL + screenshot.
  6. Repeat until the model returns only text (no actions) or you hit the turn limit.
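
The six steps above can be sketched as a loop with the model and browser injected as plain callables, so the control flow is visible without depending on any SDK. Everything named here (`Action`, `ModelTurn`, `run_agent_loop`, and the `ask_model`, `execute`, `observe`, `confirm` parameters) is hypothetical and not taken from `scripts/computer_use_agent.py`. The sketch also assumes that a declined confirmation ends the run, which the steps above do not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """One function_call parsed from a model response (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)
    safety_decision: Optional[str] = None


@dataclass
class ModelTurn:
    """What the model returned for one turn: free text and/or actions."""
    text: str = ""
    actions: list = field(default_factory=list)


def run_agent_loop(
    goal: str,
    ask_model: Callable[[list], ModelTurn],
    execute: Callable[[Action], None],
    observe: Callable[[], dict],
    confirm: Callable[[Action], bool],
    turn_limit: int = 6,
) -> Optional[str]:
    # Step 1: start from the user goal plus the current URL and screenshot.
    history = [{"goal": goal, "observation": observe()}]
    for _ in range(turn_limit):
        turn = ask_model(history)  # Step 2: parsed function_call actions
        if not turn.actions:
            return turn.text  # Step 6: text only, so the task is finished
        responses = []
        for action in turn.actions:
            # Step 4: ask the user before running a flagged action.
            if action.safety_decision == "require_confirmation" and not confirm(action):
                return None  # Assumption: a declined action ends the run.
            execute(action)  # Step 3: perform the action in the browser
            # Step 5: report the latest URL and screenshot back.
            responses.append({"name": action.name, "function_response": observe()})
        history.append({"responses": responses})
    return None  # Turn limit reached without a final text answer.
```

In a real agent, `observe` would return the page URL and a screenshot from Playwright, and `ask_model` would wrap the model call and response parsing. Keeping them as parameters makes the loop easy to exercise with fakes.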

Operational guidance

  • Run in a sandboxed browser profile or container.
  • Use --exclude to block risky actions you do not want the model to take.
  • Keep the viewport at 1440x900 unless you have a reason to change it.
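
As a rough illustration of the last two points, the sketch below shows one way an `--exclude` flag could be parsed and applied, alongside the recommended viewport as a constant. How the real script parses `--exclude` is not documented here, so the repeatable-flag form, the helper names `build_parser` and `is_allowed`, and the action names in the usage note are all assumptions.

```python
import argparse

# Recommended viewport from the guidance above, in the shape Playwright's
# browser.new_context(viewport=...) accepts.
VIEWPORT = {"width": 1440, "height": 900}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: assumes --exclude may be given more than once."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--exclude",
        action="append",
        default=[],
        metavar="ACTION",
        help="action name the model must not use (repeatable)",
    )
    return parser


def is_allowed(action_name: str, excluded: list) -> bool:
    """Return True when the action is not on the exclusion list."""
    return action_name not in set(excluded)
```

For example, parsing `["--exclude", "drag_and_drop"]` would make `is_allowed("drag_and_drop", args.exclude)` false while leaving other actions permitted.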

Resources

  • Script: scripts/computer_use_agent.py
  • Reference notes: references/google-computer-use.md
  • Env template: env.example