This new Google Gemini model scrolls the internet just like you do – how it works

Google DeepMind has launched a new AI model in public preview that can navigate websites and interact with them much like a human user — marking another step toward AI systems that operate across web environments with minimal human input.

Built on Gemini 2.5 Pro, the new Computer Use model can perform browser-based actions such as clicking buttons, typing text, and scrolling through pages in real time. It joins a growing list of web-interaction AI tools from competitors like OpenAI and Anthropic, though Google has openly acknowledged challenges, including the risk of hallucinated or incorrect outputs.

How Users Interact with It

Prompting the model is simple: write a natural-language request — for example, “Open Wikipedia, search for ‘Atlantis,’ and summarize its history in Western thought.” The AI will autonomously:

  • Locate and open the target website.
  • Capture and analyze screenshots of the interface.
  • Execute each step sequentially, from searching to reading to summarizing.

While carrying out tasks, the model displays its reasoning process openly in a text feed, so users can see exactly what it’s doing. Sensitive instructions, like purchases or account changes, trigger a confirmation request to ensure consent.
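
For readers who want a concrete picture, the flow described above can be sketched as a simple observe, decide, act loop. This is only an illustration under stated assumptions, not Google's implementation or the Gemini API: `Action`, `run_agent`, and the `propose`, `screenshot`, `execute`, and `confirm` callbacks are hypothetical names invented here, and the "model" is a scripted stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical types and names for illustration only; not the Gemini API.
@dataclass
class Action:
    kind: str                 # e.g. "click", "type", "scroll", or "done"
    target: str = ""          # what the action applies to
    sensitive: bool = False   # e.g. a purchase or an account change


def run_agent(
    propose: Callable[[str, bytes], Action],  # stand-in for the model
    screenshot: Callable[[], bytes],          # captures the current page
    execute: Callable[[Action], None],        # performs the action in the browser
    confirm: Callable[[Action], bool],        # asks the user for consent
    goal: str,
    max_steps: int = 20,
) -> List[str]:
    """Run the loop and return the text feed of what the agent did."""
    feed: List[str] = []
    for _ in range(max_steps):
        # Observe the page, then let the model pick the next step.
        action = propose(goal, screenshot())
        feed.append(f"{action.kind} {action.target}".strip())
        if action.kind == "done":
            break
        # Sensitive steps run only after explicit confirmation.
        if action.sensitive and not confirm(action):
            feed.append("stopped: user declined")
            break
        execute(action)
    return feed


# Scripted stand-in for the model: two harmless steps, then a purchase.
script = iter([
    Action("click", "search box"),
    Action("type", "Atlantis"),
    Action("click", "Buy now", sensitive=True),
    Action("done"),
])
executed: List[Action] = []
feed = run_agent(
    propose=lambda goal, shot: next(script),
    screenshot=lambda: b"",
    execute=executed.append,
    confirm=lambda action: False,  # the user says no
    goal="Search Wikipedia for Atlantis",
)
print(feed)
```

In this run the user declines the purchase, so the loop stops before the sensitive click is executed and only the first two actions reach the browser.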

Built for Continuous Context

Gemini 2.5 Computer Use works in an iterative loop that keeps a running log of its recent actions. This growing record of interactions helps the model adapt on the fly, improving speed and accuracy the longer it operates in a given website’s interface.
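
A running log of this kind could look something like the sketch below. The article does not say how the log is stored or how many actions it keeps, so the `ActionLog` class, its `max_entries` limit, and the text format are all assumptions made for illustration.

```python
from collections import deque


class ActionLog:
    """Hypothetical rolling log of recent actions (not part of any Google API)."""

    def __init__(self, max_entries: int = 5) -> None:
        # A bounded deque silently drops the oldest entry once it is full.
        # The bound itself is an assumption; the source gives no limit.
        self._entries = deque(maxlen=max_entries)

    def record(self, action: str, outcome: str) -> None:
        self._entries.append((action, outcome))

    def as_context(self) -> str:
        # Render the recent history as text that could accompany the next request.
        return "\n".join(
            f"{i}. {action} -> {outcome}"
            for i, (action, outcome) in enumerate(self._entries, start=1)
        )


log = ActionLog(max_entries=2)
log.record("click search box", "ok")
log.record("type Atlantis", "ok")
log.record("press Enter", "results page loaded")
print(log.as_context())
```

With a limit of two entries, the first action falls out of the log once the third is recorded, so each new step sees only the most recent history.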

Part of a Bigger Push

The release follows Google’s earlier experiments with Project Mariner, a Chrome extension designed to take limited automated actions inside web pages. With the new model, DeepMind is clearly positioning itself alongside — and in direct competition with — the newest web-browsing AI agents from OpenAI and Anthropic.

While still in preview, the technology offers a glimpse of what fully autonomous, browser-native AI could look like — a tool capable of running complex workflows, assisting with research, and even managing online accounts, all without the user manually clicking a single button.
