Gemini 3.5 Flash computer use is useful only when it is wrapped in a disciplined harness. The model can look at screenshots and suggest UI actions, but your application owns the browser, the permissions, the logging, the allowed actions, and the approval gates.

For the broad overview, start with the Gemini computer-use guide for AI ad ops. This page is the implementation spoke: how an ad team should set up the first safe browser-agent loop without pretending it is ready to run spend unattended.

The target workflow

Start with a read-only QA task:

"Open this landing page, inspect the hero, CTA, form, mobile layout, UTM behavior, and tracking indicators. Return pass/fail findings with screenshot evidence. Do not submit forms or change settings."

That prompt is intentionally boring. It has bounded scope, visible success criteria, and low blast radius. If the agent fails, a human loses minutes, not budget.

The architecture

A production-quality loop has six parts:

Layer	Responsibility
Task router	Decides whether this is a browser-agent task or a structured API task
Browser sandbox	Runs Playwright, Browserbase, or an internal browser profile with limited permissions
Gemini request	Sends the user goal, current screenshot, URL, tool config, and constraints
Action gate	Allows safe actions, blocks dangerous actions, asks for confirmation when needed
Evidence logger	Stores screenshots, proposed actions, executed actions, URLs, and final findings
Human approval	Reviews any action that can submit, spend, delete, grant access, or alter account state

The model is one component. The harness is the product.
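The six layers can be read as one loop. Below is a minimal Python sketch with stand-ins for the browser and the model. Every name in it is hypothetical (`Action`, `Browser`, `run_loop`, `FakeBrowser`). In a real harness, `propose` would wrap the Gemini computer-use request and `Browser` would wrap Playwright or a hosted browser. The task router sits outside this function and decides whether to call it at all.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Action:
    kind: str         # e.g. "scroll", "click", "type", "navigate", or "done"
    target: str = ""  # selector, URL, or text, depending on kind


class Browser(Protocol):
    """Browser sandbox layer (hypothetical interface over Playwright or similar)."""
    def screenshot(self) -> bytes: ...
    def url(self) -> str: ...
    def execute(self, action: Action) -> None: ...


@dataclass
class Step:
    url: str
    proposed: Action
    decision: str    # "allow", "block", or "confirm"
    executed: bool


def run_loop(
    goal: str,
    browser: Browser,
    propose: Callable[[str, bytes, str], Action],  # Gemini request layer (stubbed here)
    gate: Callable[[Action, str], str],            # action gate
    approve: Callable[[Action], bool],             # human approval
    max_steps: int = 20,
) -> List[Step]:
    log: List[Step] = []  # evidence logger (in memory for this sketch)
    for _ in range(max_steps):
        shot, url = browser.screenshot(), browser.url()
        action = propose(goal, shot, url)
        if action.kind == "done":
            break
        decision = gate(action, url)
        # Only the harness executes; the model only proposes.
        executed = decision == "allow" or (decision == "confirm" and approve(action))
        if executed:
            browser.execute(action)
        log.append(Step(url, action, decision, executed))
    return log


# Demo with a fake browser and a scripted "model".
class FakeBrowser:
    def __init__(self) -> None:
        self.executed: List[Action] = []

    def screenshot(self) -> bytes:
        return b"png-bytes"

    def url(self) -> str:
        return "https://example.com/lp"

    def execute(self, action: Action) -> None:
        self.executed.append(action)


script = iter([Action("scroll"), Action("click", "#submit"), Action("done")])
demo_browser = FakeBrowser()
demo_log = run_loop(
    "Landing-page QA, read-only",
    demo_browser,
    propose=lambda goal, shot, url: next(script),
    gate=lambda action, url: "confirm" if action.target == "#submit" else "allow",
    approve=lambda action: False,  # the reviewer declines
)
```

In the demo, the scroll runs and the submit click is logged but never executed, because the reviewer declined it.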

Step 1: Define the environment

Google's docs describe browser, mobile, and desktop environments. For ad ops, start with the browser. It is easier to sandbox, easier to log, and closer to the surfaces marketing teams actually check: landing pages, platform UIs, ad previews, analytics dashboards, and competitor sites.

Run the browser in a separate profile. Do not use a human's everyday Chrome profile. The browser should have only the cookies and permissions needed for the task, and it should not carry broad admin access unless the workflow has explicit approval gates.
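One way to make "separate profile, limited permissions" concrete is a per-task sandbox config. The sketch below is built on assumptions. `SandboxConfig`, `new_task_sandbox`, and `host_allowed` are hypothetical names. The empty temporary directory stands in for a fresh profile, which could then be passed as the user data directory to something like Playwright's `launch_persistent_context`. The harness would call the host check before every navigation.

```python
import tempfile
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class SandboxConfig:
    profile_dir: str                # fresh, task-scoped profile, never a person's everyday one
    allowed_hosts: FrozenSet[str]   # hosts this task may visit; everything else is refused
    viewport: Tuple[int, int] = (1280, 800)
    admin_access: bool = False      # stays off unless the workflow has approval gates


def new_task_sandbox(hosts: Iterable[str], viewport: Tuple[int, int] = (1280, 800)) -> SandboxConfig:
    # An empty temporary directory means no inherited cookies, logins, or extensions.
    profile = tempfile.mkdtemp(prefix="agent-profile-")
    return SandboxConfig(profile, frozenset(h.lower() for h in hosts), viewport)


def host_allowed(cfg: SandboxConfig, url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    # Exact host or a true subdomain; "evil-example.com" does not match "example.com".
    return any(host == h or host.endswith("." + h) for h in cfg.allowed_hosts)


qa_sandbox = new_task_sandbox(["example.com"], viewport=(390, 844))
```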

Step 2: Start with screenshots, not trust

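Computer use works by observation: your application sends a screenshot and receives a proposed action. Treat the screenshot as the evidence boundary. The agent can only report what appeared in a screenshot, so every finding should point back to one.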

For a landing-page QA task, log:

starting URL
viewport size
screenshot before each action
action proposed by Gemini
action allowed or blocked by the gate
final screenshot
final checklist result

This makes the workflow reviewable. If the agent says the form failed, the team can see what it saw.
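The per-step items in that list fit naturally into one record per action, appended as JSON Lines. This is a sketch under assumptions. `EvidenceRecord` and its field names are hypothetical. The screenshot is stored elsewhere and referenced by its SHA-256 hash. The starting URL is simply the first record's `url`. The final screenshot and checklist result would go in a separate summary record.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class EvidenceRecord:
    run_id: str
    step: int
    url: str
    viewport: str            # "WIDTHxHEIGHT"
    screenshot_sha256: str   # hash of the pre-action screenshot; the PNG is stored elsewhere
    proposed_action: str     # what the model asked for
    gate_decision: str       # "allow", "block", or "confirm"
    executed: bool           # whether the harness actually performed it


def make_record(run_id: str, step: int, url: str, viewport: Tuple[int, int],
                screenshot: bytes, proposed_action: str, gate_decision: str,
                executed: bool) -> EvidenceRecord:
    return EvidenceRecord(
        run_id=run_id,
        step=step,
        url=url,
        viewport=f"{viewport[0]}x{viewport[1]}",
        screenshot_sha256=hashlib.sha256(screenshot).hexdigest(),
        proposed_action=proposed_action,
        gate_decision=gate_decision,
        executed=executed,
    )


def to_jsonl(records: List[EvidenceRecord]) -> str:
    # One JSON object per line: easy to append during a run and to search afterwards.
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


sample = make_record("run-001", 0, "https://example.com/lp", (390, 844),
                     b"fake-png-bytes", "click #primary-cta", "allow", True)
```

Recording the gate decision and the executed flag separately matters. A reviewer can then tell "the model never asked" apart from "the model asked and the harness refused."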

Step 3: Exclude dangerous actions

For the first ad-ops version, block:

billing pages
account-access pages
budget and bid edits
campaign activation
deletion
form submission with customer data
password and 2FA screens
file upload unless explicitly requested

The agent can still navigate, scroll, inspect, type into harmless test fields, and capture findings. That is enough to make the first workflows valuable.
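The block list can be enforced by a small gate that runs before any action executes. This is a minimal sketch, assuming a hypothetical `ProposedAction` shape (action kind, page URL, visible label). The URL and label patterns are illustrative, not tuned for any real ad platform. They are deliberately broad: over-blocking costs a retry, while under-blocking can cost budget.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; tune per platform.
BLOCKED_PAGES = re.compile(
    r"billing|payment|budget|/bids?\b|access|permission|password|2fa|two[-_]?step", re.I)
BLOCKED_LABELS = re.compile(r"\b(delete|remove|activate|publish|pay)\b", re.I)


@dataclass
class ProposedAction:
    kind: str        # "navigate", "scroll", "click", "type", "submit", "upload", ...
    url: str         # page the action runs on, or the destination for "navigate"
    label: str = ""  # visible text of the target element, if known


def gate(action: ProposedAction, *, test_field: bool = False,
         upload_requested: bool = False) -> str:
    """Return "allow", "block", or "confirm" for one proposed action."""
    if BLOCKED_PAGES.search(action.url):
        return "block"    # billing, access, budget/bid, password and 2FA screens
    if BLOCKED_LABELS.search(action.label):
        return "block"    # deletion, activation, payment buttons
    if action.kind == "submit":
        return "block"    # no form submission in the first version
    if action.kind == "upload":
        # Blocked unless explicitly requested; even then a human confirms.
        return "confirm" if upload_requested else "block"
    if action.kind == "type":
        return "allow" if test_field else "confirm"
    if action.kind in {"navigate", "scroll", "click", "wait"}:
        return "allow"
    return "confirm"      # unknown action kinds go to a human
```

One limitation is that a click on a submit button arrives as a plain "click." For that reason, the label and URL checks run before the action-kind checks.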

Step 4: Give it a task-specific rubric

Generic prompts create generic findings. Each browser-agent workflow needs a rubric.

For landing-page QA:

Check	Pass condition
Hero promise	The first viewport matches the ad offer
CTA	Primary CTA is visible and action-oriented
Form	Required fields are clear and error states are readable
Mobile layout	No text overlap, broken hero, or hidden CTA
UTM	Tracking parameters survive navigation where expected
Pixel/tracking	Debug indicators or network events are visible when available
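Some rubric rows do not need the model at all. The UTM row can be checked deterministically from the starting and final URLs that the evidence logger already records. This is a sketch that assumes the standard five `utm_*` keys. Deciding which navigations are "expected" to preserve them stays a per-workflow choice.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlparse

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term")


def utm_params(url: str) -> Dict[str, str]:
    query = parse_qs(urlparse(url).query)
    # parse_qs returns lists; keep the first value of each UTM key that is present.
    return {key: query[key][0] for key in UTM_KEYS if key in query}


def utm_survived(start_url: str, final_url: str) -> Tuple[bool, List[str]]:
    """Pass if every UTM value on the starting URL is still present, unchanged, on the final URL."""
    start, final = utm_params(start_url), utm_params(final_url)
    problems = sorted(key for key, value in start.items() if final.get(key) != value)
    return (not problems, problems)
```

For example, `utm_survived("https://example.com/lp?utm_source=meta&utm_campaign=spring", "https://example.com/thanks?utm_source=meta")` reports a failure on `utm_campaign`.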

For creative preview QA:

Check	Pass condition
First frame	Product or promise is visible immediately
Safe area	Captions and CTA are not clipped
Brand	Logo, colors, and claims match the brief
Format	Aspect ratio fits the placement
Risk	No unsupported claims or misleading before/after language
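The Format and Safe area rows reduce to geometry, so the harness can check them directly. Brand and risk judgments stay with the model and a human reviewer. This is a sketch under loud assumptions. The placement ratios and safe-area margins below are illustrative placeholders, not any platform's published spec. The element box would come from the page or preview, not from the model.

```python
from fractions import Fraction
from typing import Tuple

# Assumed targets for illustration; confirm against each platform's current placement specs.
PLACEMENT_RATIOS = {
    "vertical": Fraction(9, 16),
    "square": Fraction(1, 1),
    "portrait_feed": Fraction(4, 5),
    "landscape": Fraction(16, 9),
}


def aspect_ratio_fits(width: int, height: int, placement: str, tolerance: float = 0.01) -> bool:
    """True if width:height is within a relative tolerance of the placement's ratio."""
    target = PLACEMENT_RATIOS[placement]
    actual = Fraction(width, height)
    return abs(actual - target) / target <= tolerance


def inside_safe_area(box: Tuple[float, float, float, float], width: int, height: int,
                     top: float = 0.14, bottom: float = 0.20, sides: float = 0.06) -> bool:
    """box is (x, y, w, h) in pixels; the margins are assumed fractions of the frame."""
    x, y, w, h = box
    return (x >= width * sides and x + w <= width * (1 - sides)
            and y >= height * top and y + h <= height * (1 - bottom))
```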

Step 5: Pair it with structured data

Do not make the browser agent read numbers from dashboards if an API exists. Pull metrics through Google Ads MCP, Meta MCP, GA4, Shopify, or Soku's own connectors. Then use computer use to inspect the UI surface behind a finding.

Example:

Structured data says conversion rate fell after a landing page update.
Browser agent opens the landing page, tests the CTA path, and captures the broken mobile form.
Soku writes the recommendation: pause scaling, fix mobile form, retest before increasing budget.

The data tells you where to look. Computer use shows what a human would have seen.
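The hand-off from structured data to the browser agent can be a small routing step. Metrics arrive from a connector, and only a large enough move produces a read-only browser task. This is a sketch with hypothetical names (`MetricSignal`, `browser_task_for`) and an assumed 20% drop threshold. It does not reflect how Soku's connectors actually shape their data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class MetricSignal:
    """A finding pulled from structured data (API or connector), not read off a dashboard."""
    campaign: str
    metric: str                         # e.g. "conversion_rate"
    before: float
    after: float
    landing_page: str
    page_changed_on: Optional[date] = None


def browser_task_for(signal: MetricSignal, drop_threshold: float = 0.2) -> Optional[Dict[str, object]]:
    """Turn a large enough metric drop into a read-only browser QA task; otherwise return None."""
    if signal.before <= 0:
        return None
    drop = (signal.before - signal.after) / signal.before
    if drop < drop_threshold:
        return None  # not enough movement to justify an agent run
    context = f"{signal.metric} fell {drop:.0%} for {signal.campaign}"
    if signal.page_changed_on is not None:
        context += f" after a page change on {signal.page_changed_on.isoformat()}"
    return {
        "type": "landing_page_qa",
        "url": signal.landing_page,
        "read_only": True,
        "checks": ["hero", "cta", "form", "mobile_layout", "utm", "tracking"],
        "context": context,  # tells the agent where to look, not what to conclude
    }


example_task = browser_task_for(MetricSignal(
    "spring_sale", "conversion_rate", 0.040, 0.028,
    "https://example.com/lp", page_changed_on=date(2025, 3, 3)))
```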

Step 6: Keep the first deployment read-only

For the first 30 days, treat Gemini computer use as an inspector:

Week 1: landing-page QA only.
Week 2: creative-preview QA.
Week 3: reporting screenshot reconciliation.
Week 4: platform alert triage.

Only after the logs are boring should you allow low-risk actions such as downloading a report, filling a test-only form, or navigating to a specific settings screen. Spend-impacting actions should stay behind human approval.
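The rollout can be written down as policy instead of held in people's heads. This is a sketch with hypothetical task and action names. It assumes each week adds to the previous weeks' task types rather than replacing them. The `logs_are_boring` flag stands for a human reviewer's sign-off, not anything the code can measure.

```python
from typing import Dict, Set

# Each week adds a task type to the previous ones (an assumption about how the schedule stacks).
WEEKLY_TASKS: Dict[int, Set[str]] = {
    1: {"landing_page_qa"},
    2: {"landing_page_qa", "creative_preview_qa"},
    3: {"landing_page_qa", "creative_preview_qa", "report_reconciliation"},
    4: {"landing_page_qa", "creative_preview_qa", "report_reconciliation", "alert_triage"},
}
LOW_RISK_ACTIONS = {"download_report", "fill_test_form", "open_settings_screen"}
SPEND_ACTIONS = {"edit_budget", "edit_bid", "activate_campaign"}


def task_allowed(task_type: str, day: int) -> bool:
    """day counts from 0 at deployment; after week 4 the week-4 set stays in force."""
    week = min(day // 7 + 1, 4)
    return task_type in WEEKLY_TASKS[week]


def action_policy(action: str, logs_are_boring: bool) -> str:
    if action in SPEND_ACTIONS:
        return "human_approval"  # never unattended, however clean the logs are
    if action in LOW_RISK_ACTIONS and logs_are_boring:
        return "allow"
    return "block"               # everything else stays read-only
```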

How Soku fits

Soku already knows when a campaign needs attention because it reads performance data across channels. Gemini computer use gives Soku a way to inspect the screens behind that signal. The result is a stronger recommendation: not "CPA rose," but "CPA rose after the landing page changed; mobile CTA is below the fold; fix before scaling."

That is the right role for browser agents in ad ops. They do not replace the media buyer. They make the diagnosis faster and more verifiable.