Gemini 3.5 Flash Gets Computer Use: Google's AI Now Controls the Browser, Apps, and Desktop on Its Own - Basic Tutorials - News Bunkers

Google has fully integrated Computer Use into Gemini 3.5 Flash. This allows the model to directly interact with browsers, mobile interfaces, and desktop applications: clicking, tapping, scrolling, and working through multi-step tasks. Previously, this capability was housed in a separate model; starting June 24, 2026, it will run natively within the standard Flash model. For developers, this means a model that can see, think, and act all at once.
Until now, Computer Use at Google was a special model based on Gemini 2.5. While it could interact with user interfaces, it couldn’t simultaneously access other tools like Google Search or Maps grounding. It is precisely this separation that has now been eliminated. In Gemini 3.5 Flash, screen control is one of the built-in tools, alongside Function Calling and the familiar search and Maps integration.
In practice, it works like this: The model receives a screenshot of the current interface, recognizes buttons, text fields, and menus, and then decides what to do next. It clicks buttons, fills out forms, switches tabs, or types text. Google cites as examples a functional analysis of its own Gemini app and a self-audit of its documentation for accessibility. The model thus covers three environments: web browsers, mobile operating systems, and traditional desktop software.
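The loop described above (look at the screen, decide, act, repeat) can be sketched in a few lines of Python. This is a toy simulation and not Google's API: `FakeBrowser`, `fake_model`, `Action`, and `run_agent` are hypothetical names invented for illustration, and the "screenshot" here is a plain dictionary rather than an image.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str            # "click", "type", or "done"
    target: str = ""     # name of the UI element to act on
    text: str = ""       # text to enter for "type" actions


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser session."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real screenshot is an image; this sketch returns the UI state as a dict.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def fake_model(observation: dict) -> Action:
    """Placeholder for the model call: picks the next action from the observed state."""
    if "email" not in observation["fields"]:
        return Action("type", target="email", text="user@example.com")
    if not observation["submitted"]:
        return Action("click", target="submit")
    return Action("done")


def run_agent(env: FakeBrowser, max_steps: int = 10) -> list:
    """Observe, decide, act, and repeat until the model reports it is done."""
    history = []
    for _ in range(max_steps):
        action = fake_model(env.screenshot())
        history.append(action.kind)
        if action.kind == "done":
            break
        env.apply(action)
    return history
```

The `max_steps` cap is a deliberate choice in this sketch: an agent that never reports "done" should stop on its own rather than click indefinitely.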
The real appeal lies in the tedious tasks: continuous software testing, clicking through multiple business applications, and knowledge work across various tools. In other words, tasks that involve many steps and previously required a great deal of manual labor.
The big question, of course, is how well the whole thing really works. The benchmark used is OSWorld-Verified, which tests Computer Use agents across Ubuntu, Windows, and macOS. Gemini 3.5 Flash scores 78.4 percent there. By comparison, its predecessor, Gemini 3 Flash, scored 65.1 percent—a jump of over 13 points between generations.
There are two things to keep in mind. First, all figures on the OSWorld-Verified leaderboard are self-reported by the vendors and, as of June 2026, have not been independently verified. The benchmark is useful for getting a general idea, but I would be cautious about making direct comparisons down to the decimal places. Second, Gemini 3.5 Flash (78.4 percent) and GPT-5.5 (78.7 percent) are separated by a mere 0.3 points; the real difference lies in the price. Gemini 3.5 Flash costs $1.50 per million input tokens and $9 per million output tokens, whereas GPT-5.5 costs $5 and $30, respectively. With large agent workloads, that adds up quickly.
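To make the price gap concrete, here is a small calculation using the per-token prices quoted above. The workload size (200 million input tokens and 20 million output tokens) is an arbitrary assumption chosen for illustration, not a figure from Google or OpenAI, and `workload_cost` is a hypothetical helper.

```python
# Prices in US dollars per million tokens, as quoted in the article.
PRICES = {
    "Gemini 3.5 Flash": {"input": 1.50, "output": 9.00},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars for a given number of input and output tokens."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical workload: 200 million input tokens and 20 million output tokens.
flash = workload_cost("Gemini 3.5 Flash", 200_000_000, 20_000_000)
gpt = workload_cost("GPT-5.5", 200_000_000, 20_000_000)
print(f"Flash: ${flash:,.2f}  GPT-5.5: ${gpt:,.2f}")
```

Under these assumptions the same workload comes to $480 on Gemini 3.5 Flash and $1,600 on GPT-5.5, a factor of roughly 3.3. Because both the input and the output price differ by the same ratio, that factor holds regardless of the input/output mix.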
Even though the raw numbers look tempting, the gap between benchmark and production use is wider for Computer Use than for most other AI tasks. OSWorld measures predefined tasks in a stable environment. Real-world agents, on the other hand, operate in applications that are constantly changing, require logins, and display screen states that the model has never seen before. Google itself advises against using Computer Use for critical decisions, sensitive data, or situations where errors cannot be corrected.
An AI that navigates browsers, forms, and file systems on its own has a completely different attack surface than a text-only chatbot. If it is granted the privileges of a power user, that very capability becomes a vulnerability. Google is therefore relying on targeted adversarial training for Gemini 3.5 Flash to reduce prompt injection risks in live environments.
In addition, there are two optional protection systems for enterprises. One requires explicit user confirmation before sensitive or irreversible actions are carried out. The other automatically halts a task as soon as it detects indirect prompt injection. Google also recommends a defense-in-depth approach with secure sandboxes, a human in the loop, and strict access permissions. Google provides further details in its best practices documentation.
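The two safeguards can be illustrated with a simplified guard function. This is a sketch of the general idea only and says nothing about how Google implements its protection systems: `SENSITIVE_KINDS`, `INJECTION_MARKERS`, `looks_like_injection`, and `guarded_execute` are hypothetical, and the keyword check is deliberately naive.

```python
from typing import Callable

# Hypothetical set of action kinds treated as sensitive or irreversible.
SENSITIVE_KINDS = {"purchase", "delete", "send_email"}

# Hypothetical phrases that hint at an instruction hidden in page content.
INJECTION_MARKERS = ("ignore previous instructions", "disregard your instructions")


class TaskHalted(Exception):
    """Raised when the guard stops the whole task."""


def looks_like_injection(page_text: str) -> bool:
    """Naive keyword check; real detection would be far more sophisticated."""
    lowered = page_text.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)


def guarded_execute(
    kind: str,
    page_text: str,
    confirm: Callable[[str], bool],
    execute: Callable[[str], None],
) -> str:
    """Run one action, halting on suspected injection and asking before risky steps."""
    if looks_like_injection(page_text):
        raise TaskHalted("possible prompt injection in page content")
    if kind in SENSITIVE_KINDS and not confirm(kind):
        return "skipped"
    execute(kind)
    return "executed"
```

In a real deployment, `confirm` would prompt a human and `execute` would drive the browser or desktop; both are passed in as callables here so the guard logic stays independent of any particular UI or automation layer.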
This integration isn’t a one-off step but rather aligns with Google’s approach over the past few months. Gemini 3.5 Flash was designed as an agent-based model from the start and, since its unveiling at Google I/O 2026, has been the driving force behind features such as the persistent agent Gemini Spark. You can read more about the model itself and its agent-based capabilities in our article “Introducing Gemini 3.5 Flash.”
This approach is also evident in everyday use. On Android, Gemini Intelligence is increasingly taking on proactive tasks and automating workflows across multiple apps. And in the Gemini app itself, Google has been shifting the focus for months from a pure chatbot to an active assistant. Computer Use in 3.5 Flash is essentially the technical foundation upon which many of these promises are built.
If you want to try out Computer Use, you now have several options. Developers and businesses can access the feature via the Gemini API and the Gemini Enterprise Agent Platform. For a quick test, there’s a demo environment hosted by Browserbase, and to help you get started, Google provides a reference implementation on GitHub.
With Computer Use in Gemini 3.5 Flash, Google is making the leap from pure assistance to execution. A model that can use Google Search, call developer-defined functions, and operate a browser within the same task is a real game-changer for automation and enterprise workflows. The cost advantage over GPT-5.5 makes this particularly attractive when dealing with many parallel agents.
At the same time, a sober assessment is warranted. The benchmark figures are self-reported, and the transition from the test environment to actual production use is especially delicate with Computer Use. The fact that Google itself advises sandboxing, human oversight, and caution when handling critical tasks should be taken seriously. Things remain exciting nonetheless, since the next logical step, Gemini 3.5 Pro, is already in the starting blocks.
What is “Computer Use” in Gemini 3.5 Flash?
It is a built-in tool that allows the model to interact with browsers, apps, and desktop programs on its own. It analyzes screenshots of the user interface and then performs actions such as clicking, tapping, and scrolling.
How does Gemini 3.5 Flash perform in the “Computer Use” test?
On the OSWorld-Verified benchmark, the model achieved 78.4 percent, placing it almost on par with GPT-5.5 (78.7 percent). According to the reported values, Claude Opus 4.8 leads the pack with 83.4 percent.
Is Computer Use safe?
Google uses adversarial training and offers two optional protection systems: a confirmation requirement for critical actions and an automatic stop when prompt injection is detected. Google advises against using it for sensitive or non-recoverable tasks.