
Gemini Computer Use

Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

Downloads: 2.9k
Stars: 2
Version: 1.0.0
Category: Automation
Security check: passed
Type: Script

Skill description


name: gemini-computer-use
description: Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.

Gemini Computer Use

Quick start

  1. Copy the env template, set your API key in it, and source it:

    cp env.example env.sh
    $EDITOR env.sh
    source env.sh
    
  2. Create a virtual environment and install dependencies:

    python -m venv .venv
    source .venv/bin/activate
    pip install google-genai playwright
    playwright install chromium
    
  3. Run the agent script with a prompt:

    python scripts/computer_use_agent.py \
      --prompt "Find the latest blog post title on example.com" \
      --start-url "https://example.com" \
      --turn-limit 6
    

Browser selection

  • Default: Playwright's bundled Chromium (no env vars required).
  • Choose a channel (Chrome/Edge) with COMPUTER_USE_BROWSER_CHANNEL.
  • Use a custom Chromium-based executable (e.g., Brave) with COMPUTER_USE_BROWSER_EXECUTABLE.

If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence.
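
The precedence rule above can be sketched as a small helper that turns the two environment variables into keyword arguments for Playwright's `chromium.launch()`. The function name `browser_launch_kwargs` is hypothetical and is not taken from scripts/computer_use_agent.py; only the environment variable names and the precedence come from this document.

```python
import os
from typing import Mapping, Optional


def browser_launch_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Map the browser-selection env vars to chromium.launch() kwargs.

    Hypothetical helper for illustration; the real script may differ.
    """
    env = os.environ if env is None else env
    executable = env.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    channel = env.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()

    if executable:
        # A custom executable takes precedence when both variables are set.
        return {"executable_path": executable}
    if channel:
        # A Playwright channel name such as "chrome" or "msedge".
        return {"channel": channel}
    # Neither variable set: use Playwright's bundled Chromium.
    return {}
```

With Playwright's sync API, the result could then be passed along as `p.chromium.launch(**browser_launch_kwargs())`.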

Core workflow (agent loop)

  1. Capture a screenshot and send the user goal + screenshot to the model.
  2. Parse function_call actions in the response.
  3. If an action carries a safety_decision of require_confirmation, prompt the user before executing it.
  4. Execute each approved action in Playwright.
  5. Send function_response objects containing the latest URL + screenshot.
  6. Repeat until the model returns only text (no actions) or you hit the turn limit.
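
The control flow of this loop can be sketched without any network calls by injecting the model, the browser, and the confirmation prompt as plain callables. Everything named here (`Action`, `ModelTurn`, `run_agent_loop`, `ask_model`, `observe`, `execute`, `confirm`) is hypothetical and stands in for the google-genai and Playwright calls the real script would make. Stopping the run when the user declines a confirmation is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One function_call requested by the model (illustrative shape)."""
    name: str
    args: dict = field(default_factory=dict)
    safety_decision: Optional[str] = None  # e.g. "require_confirmation"


@dataclass
class ModelTurn:
    """A model response: free text plus zero or more actions."""
    text: str = ""
    actions: List[Action] = field(default_factory=list)


def run_agent_loop(
    goal: str,
    ask_model: Callable[[str, list], ModelTurn],
    observe: Callable[[], dict],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    turn_limit: int = 6,
) -> str:
    """Run until the model answers with text only or the turn limit is hit."""
    # Start with the initial page state (URL + screenshot).
    observations = [observe()]
    for _ in range(turn_limit):
        turn = ask_model(goal, observations)
        if not turn.actions:
            # Text only, no actions: the model considers the task finished.
            return turn.text
        observations = []
        for action in turn.actions:
            needs_ok = action.safety_decision == "require_confirmation"
            if needs_ok and not confirm(action):
                # Assumption: a declined confirmation ends the run.
                return f"stopped: user declined {action.name}"
            execute(action)
            # One function_response per action, carrying the latest state.
            observations.append({"name": action.name, **observe()})
    return "stopped: turn limit reached"
```

Keeping the loop free of direct API calls makes it easy to exercise with scripted fake turns before connecting a real browser and model.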

Operational guidance

  • Run in a sandboxed browser profile or container.
  • Use --exclude to block risky actions you do not want the model to take.
  • Keep the viewport at 1440x900 unless you have a reason to change it.
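
One reason the viewport size matters is that model coordinates have to be mapped onto real pixels before Playwright can click. The sketch below assumes the model reports coordinates on a normalized 0-999 grid. That is an assumption not stated in this document, so verify it against references/google-computer-use.md before relying on it. The helper name `denormalize` is hypothetical.

```python
from typing import Tuple

VIEWPORT_WIDTH = 1440   # from the guidance above
VIEWPORT_HEIGHT = 900   # from the guidance above
NORMALIZED_RANGE = 1000  # assumption: model coordinates span 0-999


def denormalize(
    x: int,
    y: int,
    width: int = VIEWPORT_WIDTH,
    height: int = VIEWPORT_HEIGHT,
) -> Tuple[int, int]:
    """Scale normalized model coordinates to viewport pixels."""
    px = int(x / NORMALIZED_RANGE * width)
    py = int(y / NORMALIZED_RANGE * height)
    # Clamp so an out-of-range value never lands outside the viewport.
    px = min(max(px, 0), width - 1)
    py = min(max(py, 0), height - 1)
    return px, py
```

If the viewport is changed, passing the new `width` and `height` keeps clicks aligned with what the model saw in the screenshot.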

Resources

  • Script: scripts/computer_use_agent.py
  • Reference notes: references/google-computer-use.md
  • Env template: env.example

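How do I use "Gemini Computer Use"?

  1. Open 小龙虾AI (Web or iOS app).
  2. Click the "Use now" button above, or type a task description into the chat box.
  3. 小龙虾AI automatically matches and invokes the "Gemini Computer Use" skill to complete the task.
  4. Results appear immediately, and you can continue the conversation to refine them.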
