The large language model used for agent reasoning and decision-making. Supported models include gemini/gemini-2.0-flash, anthropic/claude-3.5-sonnet, anthropic/claude-3.5-haiku, openai/gpt-4o, and openai/gpt-4o-mini.
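For example, a model can be selected when creating the agent (this sketch assumes reasoning_model is passed to client.Agent alongside the other options shown in this section):

agent = client.Agent(
    session=session,
    reasoning_model="gemini/gemini-2.0-flash"  # Fast model for simple tasks
)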
Whether to enable vision capabilities for the agent. Vision allows the agent to analyze images, screenshots, and visual page elements. Not all models support vision.
agent = client.Agent(
    session=session,
    use_vision=True  # Agent can understand images
)
Maximum number of actions the agent can take before stopping. Must be between 1 and 50. Higher values allow more complex tasks but increase cost and execution time.
agent = client.Agent(
    session=session,
    max_steps=20  # Allow up to 20 actions
)
Optional Pydantic model defining the structure of the agent’s response. Use this to get type-safe, structured output. See Structured Output for details.
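A minimal sketch of passing a Pydantic model; the response_format parameter name here is an assumption for illustration, so see Structured Output for the exact API:

from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float

# response_format is an assumed parameter name; see Structured Output
agent = client.Agent(
    session=session,
    response_format=ProductInfo
)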
Experimental - The step number from which the agent should read the session history. If not provided, the agent starts with no memory of prior session actions. Use this to make the agent aware of actions executed earlier in the session.
# Execute some actions first
session.execute(type="goto", url="https://example.com")
session.execute(type="click", selector="button.search")

# Agent remembers actions from step 0
result = agent.run(
    task="Continue from where we left off",
    session_offset=0
)
Match the reasoning model to the task's complexity:

# Simple navigation and extraction
reasoning_model="gemini/gemini-2.0-flash"

# Complex reasoning and decision-making
reasoning_model="anthropic/claude-3.5-sonnet"
Start the agent as close to the target page as possible:

# Good - start where needed
agent.run(
    task="Extract product details",
    url="https://example.com/product/123"
)

# Less efficient - agent must navigate first
agent.run(
    task="Go to product page and extract details",
    url="https://example.com"
)