How to Use browser-use to Automate Your Browser with AI Agents

2 min read · Feb 14, 2025

Advances in artificial intelligence (AI) have made browser automation far more capable. With browser-use, you can hand tasks such as web scraping, form filling, and data extraction to an AI agent and describe the goal in plain language. In this post, we’ll set up browser-use and run a first agent.

What is browser-use?

browser-use is an open-source Python library that lets AI agents control a web browser and interact with pages much as a human would. It can automate repetitive tasks such as:

Navigating websites
Clicking buttons and filling forms
Extracting data from web pages
Managing cookies and authentication

Getting Started with browser-use

Step 1: Prepare the environment

First, create a Python virtual environment. We recommend using uv:

uv venv --python 3.11

and activate it with:

# For Mac/Linux:
source .venv/bin/activate

# For Windows:
.venv\Scripts\activate

Install the dependencies:

uv pip install browser-use

Then install the browsers that Playwright needs:

playwright install
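Before moving on, it can help to confirm that the environment is ready. The following is a minimal sketch, not part of browser-use itself: the helper names (`missing_modules`, `python_is_supported`) and the list of modules are illustrative assumptions based on the imports used in Step 2. It relies only on the standard library.

```python
import importlib.util
import sys

# Top-level modules the example in this post imports. Including
# langchain_google_genai is an assumption based on the Gemini example in Step 2.
REQUIRED_MODULES = ["browser_use", "playwright", "langchain_google_genai"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 11)):
    """Return True if the interpreter is at least `minimum` (3.11 matches the uv command above)."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

If a module is reported missing, install it into the active environment with `uv pip install` (for example, the Gemini integration is published as `langchain-google-genai`).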

Step 2: Create an agent.py file in the project root

This example uses Google’s Gemini model.

import asyncio

from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import SecretStr

from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig

# Use the locally installed Chrome at the path below.
browser = Browser(
    config=BrowserConfig(
        # NOTE: Close any running Chrome windows first so that Chrome can be reopened in debug mode.
        # This is the macOS path; change it if you are on Windows or Linux.
        chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    )
)

# Replace this placeholder with your actual Gemini API key.
api_key = 'GEMINI_API_KEY'
llm = ChatGoogleGenerativeAI(model='gemini-2.0-flash-exp', api_key=SecretStr(api_key))


async def main():
    # Describe the task in plain language; the agent decides which browser steps to take.
    agent = Agent(
        task='Open Google Docs and write a blog post about the latest tech trends',
        llm=llm,
        browser=browser,
    )

    await agent.run()
    await browser.close()

    input('Press Enter to close...')


if __name__ == '__main__':
    asyncio.run(main())
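The script hardcodes a macOS Chrome path and an API key placeholder. As a hedged sketch of one way to make both configurable, the helpers below pick a Chrome path per operating system and read the key from an environment variable. The helper names (`default_chrome_path`, `load_api_key`) are hypothetical, and the Windows and Linux paths are assumed typical install locations that may differ on your machine.

```python
import os
import platform

# The macOS path is the one used in agent.py. The Windows and Linux paths are
# assumptions about typical install locations; adjust them for your machine.
CHROME_PATHS = {
    "Darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "Windows": r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "Linux": "/usr/bin/google-chrome",
}


def default_chrome_path(system=None):
    """Return the assumed Chrome executable path for `system` (defaults to the current OS)."""
    system = system or platform.system()
    try:
        return CHROME_PATHS[system]
    except KeyError:
        raise ValueError(f"No default Chrome path known for {system!r}") from None


def load_api_key(env_var="GEMINI_API_KEY", environ=None):
    """Read the API key from an environment variable, failing early if it is unset or empty."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running agent.py")
    return key
```

In agent.py you could then pass `chrome_instance_path=default_chrome_path()` and set `api_key = load_api_key()`, which keeps the key out of the source file. With the virtual environment active, start the agent with `python agent.py`.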


Written by Hardik Desai


Responses (1)


Michael

3 days ago

Thanks for sharing, Hardik. Could you share a few more lines please? Such as:
- How to prepare a scenario
- How to trigger a run (i.e execute scenario)
