Browser Agents

Browser Agents are AI-powered systems that can autonomously navigate websites, complete tasks, and extract information using natural language instructions.

What is a Browser Agent?

A Browser Agent combines:

Large Language Models (LLMs) for reasoning and decision-making
Browser Sessions for executing actions
Vision capabilities to understand web pages
Autonomous planning to complete multi-step tasks

Unlike scripted automation, agents can adapt to changes, handle unexpected scenarios, and complete tasks without predefined workflows.

Quick Start

Create and run an agent in a few lines:

agent_quickstart.py

from notte_sdk import NotteClient

client = NotteClient()

with client.Session() as session:
    agent = client.Agent(
        session=session,
        reasoning_model="gemini/gemini-2.0-flash",
        max_steps=10
    )

    result = agent.run(
        task="Go to example.com and find the contact email"
    )

    print(result.answer)

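Agents run within browser sessions. Open the session with a context manager (`with client.Session() as session:`) so it is stopped automatically when the block exits, including when an exception is raised. This prevents orphaned sessions that keep running and accruing charges.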
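The automatic cleanup comes from Python's `with` statement: the exit logic runs whether the block finishes normally or raises. The sketch below shows that guarantee with a hypothetical `FakeSession` class and a `managed_session` helper built on `contextlib.contextmanager`. Neither is part of `notte_sdk`; they only illustrate the pattern.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in for a cloud browser session (not the notte_sdk class)."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


@contextmanager
def managed_session(session: FakeSession):
    """Start the session on entry and always stop it on exit, even if the body raises."""
    session.start()
    try:
        yield session
    finally:
        session.stop()


session = FakeSession()
try:
    with managed_session(session) as s:
        assert s.running
        raise RuntimeError("agent crashed mid-task")
except RuntimeError:
    pass

print(session.running)  # False: the session was stopped despite the error
```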

How Agents Work

1. Observation

The agent observes the current page state:

Visible elements and their properties
Interactive components (buttons, forms, links)
Text content and structure
Current URL and page metadata
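One way to picture the observation step is as turning page state into compact text an LLM can reason over. The sketch below is a minimal illustration under that assumption. The `Observation` and `Element` classes, their fields, and the prompt layout are hypothetical and do not describe Notte's actual observation format.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    # Hypothetical element record: an id the agent can refer to, a role, and visible text.
    id: str
    role: str
    text: str


@dataclass
class Observation:
    url: str
    title: str
    elements: list[Element] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the page state as compact text for the reasoning step."""
        lines = [f"URL: {self.url}", f"Title: {self.title}", "Interactive elements:"]
        for el in self.elements:
            lines.append(f"  [{el.id}] {el.role}: {el.text}")
        return "\n".join(lines)


obs = Observation(
    url="https://example.com",
    title="Example Domain",
    elements=[Element("L1", "link", "More information...")],
)
print(obs.to_prompt())
```

Giving each interactive element a short id lets the model answer with "click L1" instead of reproducing a selector.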

2. Reasoning

Using the LLM, the agent:

Understands the current page
Plans the next action to complete the task
Decides which element to interact with
Determines when the task is complete

3. Action

The agent executes browser actions:

Navigate to URLs
Click buttons and links
Fill forms
Extract data
Scroll and interact with dynamic content
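Agents typically choose from a fixed set of action types rather than emitting arbitrary code, which makes the model's output easy to validate. The sketch below assumes the model replies with JSON such as `{"type": "click", "element_id": "B3"}`. The action classes, `ACTION_TYPES`, `parse_action`, and `describe` are all hypothetical names, and only a subset of the actions listed above is covered.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Goto:
    url: str


@dataclass
class Click:
    element_id: str


@dataclass
class Fill:
    element_id: str
    value: str


@dataclass
class Scroll:
    pixels: int


Action = Union[Goto, Click, Fill, Scroll]

# Maps the "type" field of a model's JSON reply to the matching action class.
ACTION_TYPES = {"goto": Goto, "click": Click, "fill": Fill, "scroll": Scroll}


def parse_action(raw: dict[str, Any]) -> Action:
    """Turn a reply like {"type": "click", "element_id": "B3"} into a typed action."""
    fields = dict(raw)  # copy so the caller's dict is not mutated
    kind = fields.pop("type", None)
    if kind not in ACTION_TYPES:
        raise ValueError(f"unsupported action type: {kind!r}")
    return ACTION_TYPES[kind](**fields)


def describe(action: Action) -> str:
    """Name the browser operation a typed action would trigger."""
    if isinstance(action, Goto):
        return f"navigate to {action.url}"
    if isinstance(action, Click):
        return f"click {action.element_id}"
    if isinstance(action, Fill):
        return f"type {action.value!r} into {action.element_id}"
    if isinstance(action, Scroll):
        return f"scroll by {action.pixels}px"
    raise ValueError(f"unknown action: {action!r}")


print(describe(parse_action({"type": "fill", "element_id": "search", "value": "laptop"})))
# type 'laptop' into search
```

An unknown action type is rejected with `ValueError`, and unexpected fields fail in the dataclass constructor, so malformed model output never reaches the browser.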

4. Iteration

This cycle repeats until:

The task is successfully completed
Maximum steps are reached
An error occurs that can’t be resolved
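The observe, reason, act cycle and its three stopping conditions can be sketched as a plain loop. In this minimal sketch the `observe`, `decide`, and `act` callables are hypothetical stand-ins for the page snapshot, the LLM call, and the browser action. It treats any exception as unrecoverable, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    action: str                   # next browser action, e.g. "click B3"
    done: bool = False            # the model believes the task is complete
    answer: Optional[str] = None


@dataclass
class RunResult:
    success: bool
    answer: str
    steps: int


def run_loop(
    observe: Callable[[], str],
    decide: Callable[[str], Decision],
    act: Callable[[str], None],
    max_steps: int,
) -> RunResult:
    """Observe, reason, act until done, out of steps, or an unrecoverable error."""
    for step in range(1, max_steps + 1):
        try:
            page_state = observe()
            decision = decide(page_state)
            if decision.done:
                return RunResult(True, decision.answer or "", step)
            act(decision.action)
        except Exception as exc:  # this sketch treats every error as unrecoverable
            return RunResult(False, f"error: {exc}", step)
    return RunResult(False, "max steps reached", max_steps)


# Scripted stand-in for the LLM: click once, then report the answer.
plan = iter([Decision("click contact-link"), Decision("", done=True, answer="hello@example.com")])
performed = []
result = run_loop(
    observe=lambda: "page text",
    decide=lambda state: next(plan),
    act=performed.append,
    max_steps=10,
)
print(result)  # RunResult(success=True, answer='hello@example.com', steps=2)
```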

Agents vs Scripted Automation

Both agents and scripted automation run on browser sessions, the cloud browser infrastructure. The difference is who controls what happens in that session: your code or the agent.

Aspect	Scripted Automation	Agent
Control	You write the code	AI decides each step
Flexibility	Fixed workflow	Adapts to changes
Speed	Fast (direct execution)	Slower (LLM reasoning per step)
Cost	Browser minutes only	Browser minutes + LLM calls
Reliability	Deterministic	Can vary based on page state
Use Case	Known, stable workflows	Unknown or dynamic workflows

Use scripted automation when:

You know the exact steps to take
Speed and cost are critical
The target pages rarely change

Use agents when:

You don’t know the exact steps
Pages change frequently
You need intelligent decision-making

You can combine both approaches: use an agent to figure out a workflow, then convert it to a function for faster, cheaper repeated execution.
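A common pattern for combining the two is to record the concrete steps from a successful agent run, replay them without an LLM, and hand control back to an agent only if a step fails. This is a generic sketch of that idea. The trace format, `replay`, and the `fallback` callable are hypothetical and are not how Notte's function conversion or Agent Fallback are implemented.

```python
from typing import Callable

Step = tuple[str, str]  # (action, target), e.g. ("click", "contact-link")

# Hypothetical trace captured from a successful agent run.
recorded_trace: list[Step] = [
    ("goto", "https://example.com"),
    ("click", "contact-link"),
    ("extract", "email"),
]


def replay(
    trace: list[Step],
    execute: Callable[[str, str], None],
    fallback: Callable[[list[Step], Exception], str],
) -> str:
    """Replay recorded steps without an LLM; hand the remaining steps to an agent on failure."""
    for i, (action, target) in enumerate(trace):
        try:
            execute(action, target)
        except Exception as exc:
            return fallback(trace[i:], exc)
    return "replayed without agent"


def flaky_execute(action: str, target: str) -> None:
    # Simulates the page changing so the recorded target no longer matches.
    if target == "contact-link":
        raise LookupError("selector not found")


print(replay(recorded_trace, lambda a, t: None, lambda rest, exc: "agent"))
# replayed without agent
print(replay(recorded_trace, flaky_execute, lambda rest, exc: f"agent resumes at {rest[0]}"))
# agent resumes at ('click', 'contact-link')
```

The fast path costs only browser time; the LLM is consulted only when the recorded workflow stops matching the page.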

Agent Capabilities

Agents include these built-in capabilities:

Structured Output

Get type-safe responses using Pydantic models

Vaults & Personas

Use credentials and identities in automations

Visual Understanding

Analyze images and visual page elements

Replay & Debugging

Debug with MP4 replays of agent execution

Agent Fallback

Automatic recovery from script failures

Batch Execution

Run multiple agents in parallel
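For a sense of what running agents in parallel involves, the sketch below fans tasks out over a thread pool from the standard library. Threads suit this workload because each agent spends most of its time waiting on the remote browser and the LLM. `run_agent` is a hypothetical placeholder. This is a generic client-side approach, not the SDK's batch execution feature.

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent(task: str) -> str:
    # Hypothetical placeholder: a real version would open a session,
    # create an agent, call run(task=...), and return the answer.
    return f"done: {task}"


tasks = ["Find pricing on site A", "Find pricing on site B", "Find pricing on site C"]

# map() runs tasks concurrently but returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_agent, tasks))

print(results)
```

Give each task its own session so agents do not interfere with each other's pages, and size `max_workers` to the number of sessions you are willing to pay for at once.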

Key Concepts

Natural Language Tasks

Give instructions in plain English:

from notte_sdk import NotteClient

client = NotteClient()

with client.Session() as session:
    agent = client.Agent(session=session)
    agent.run(task="Find the cheapest laptop under $1000 and add it to cart")

Structured Output

Get responses in a specific format:

from pydantic import BaseModel
from notte_sdk import NotteClient

client = NotteClient()

class ContactInfo(BaseModel):
    email: str
    phone: str | None


with client.Session() as session:
    agent = client.Agent(session=session)
    result = agent.run(
        task="Extract contact information",
        response_format=ContactInfo
    )
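To show what validating an answer against a schema involves, here is a stdlib-only sketch that checks a JSON reply against the `ContactInfo` shape. It uses a dataclass and `json` in place of Pydantic so it runs without extra dependencies. `parse_contact` is a hypothetical helper, not part of `notte_sdk`. With Pydantic v2, `ContactInfo.model_validate_json(...)` performs this kind of check for you.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactInfo:
    email: str
    phone: Optional[str] = None


def parse_contact(raw_json: str) -> ContactInfo:
    """Validate a JSON answer against the ContactInfo shape, rejecting wrong types."""
    data = json.loads(raw_json)
    email = data.get("email")
    phone = data.get("phone")
    if not isinstance(email, str):
        raise ValueError("email must be a string")
    if phone is not None and not isinstance(phone, str):
        raise ValueError("phone must be a string or null")
    return ContactInfo(email=email, phone=phone)


print(parse_contact('{"email": "hello@example.com", "phone": null}'))
# ContactInfo(email='hello@example.com', phone=None)
```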

Starting URL

Begin at a specific page:

from notte_sdk import NotteClient

client = NotteClient()

with client.Session() as session:
    agent = client.Agent(session=session)
    agent.run(
        task="Find pricing information",
        url="https://example.com/products"  # the agent starts on this page
    )

Step Limits

Cap the number of actions an agent can take:

from notte_sdk import NotteClient

client = NotteClient()

with client.Session() as session:
    agent = client.Agent(
        session=session,
        max_steps=20  # Stop after at most 20 actions
    )
    result = agent.run(task="Find pricing information")

Error Handling

Agents can fail, for example when they reach the step limit or hit an error they cannot recover from. Always check the result before using it:

from notte_sdk import NotteClient

client = NotteClient()

with client.Session() as session:
    agent = client.Agent(session=session)
    result = agent.run(task="Complete task")

    if result.success:
        print(result.answer)
    else:
        print(f"Agent failed: {result.answer}")
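When failures are transient (a slow page, a popup), one option is to retry a bounded number of times. The sketch below wraps any zero-argument run function in a retry loop. `Result` is a hypothetical stand-in that mirrors the `success` and `answer` fields checked in the example above, and `run_with_retries` is an illustrative helper, not an SDK feature. Each retry costs more browser minutes and LLM calls, so keep the budget small.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    # Hypothetical stand-in exposing the two fields checked above: success and answer.
    success: bool
    answer: str


def run_with_retries(run: Callable[[], Result], attempts: int = 3) -> Result:
    """Call run() until it reports success or the attempt budget is used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for _ in range(attempts):
        result = run()
        if result.success:
            break
    return result  # the last result, successful or not


outcomes = iter([Result(False, "step limit reached"), Result(True, "hello@example.com")])
final = run_with_retries(lambda: next(outcomes))
print(final)  # Result(success=True, answer='hello@example.com')
```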

Next Steps

Agent Lifecycle

Create, manage, and stop agents

Agent Configuration

All configuration options

Structured Output

Get typed responses from agents

Convert to Functions

Turn agent runs into reusable code
