How to Use browser-use to Automate Your Browser with AI Agents

Hardik Desai
2 min read · Feb 14, 2025

AI agents can now drive a web browser directly. With a tool like browser-use, you can hand an agent a task such as web scraping, form filling, or data extraction and let it carry out the steps, which takes repetitive work out of your workflow. In this post, we’ll set up browser-use and run a first AI-driven browser automation.

What is browser-use?

browser-use is a tool that allows AI agents to interact with web browsers, mimicking human-like browsing behavior. It enables automation of repetitive tasks such as:

  • Navigating websites
  • Clicking buttons and filling forms
  • Extracting data from web pages
  • Managing cookies and authentication
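
In the script later in this post, the work is described to the agent as a plain-language task string. As a rough illustration, the tasks listed above might be phrased as follows. The wording and the example.com URLs are made up for this sketch and are not part of browser-use:

```python
# Illustrative task strings, one per capability listed above.
# The URLs and phrasing are hypothetical; adapt them to your own sites.
EXAMPLE_TASKS = {
    "navigation": "Go to https://example.com and open the 'More information' link.",
    "forms": "Open https://example.com/contact, fill in the name field with 'Test User', and submit.",
    "extraction": "Open https://example.com/pricing and list every plan name with its price.",
    "authentication": "Open https://example.com/account and report whether the session is logged in.",
}


def describe(tasks):
    """Format each task as 'kind: text' so the set can be reviewed before a run."""
    return [f"{kind}: {text}" for kind, text in tasks.items()]


for line in describe(EXAMPLE_TASKS):
    print(line)
```

Any one of these strings could be passed as the task argument of the agent shown in Step 2.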

Getting Started with browser-use

Step 1: Prepare the environment

First, we recommend using uv to set up the Python environment. The command below creates a virtual environment with Python 3.11:

uv venv --python 3.11

and activate it with:

# For Mac/Linux:
source .venv/bin/activate

# For Windows:
.venv\Scripts\activate

Install the dependencies:

uv pip install browser-use

Then install the browsers that Playwright needs:

playwright install
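
Before moving on, it can help to confirm that the installation landed in the active environment. The following is a minimal sketch using only the standard library. The function names check_environment and in_virtualenv are ours, and the check only tests whether the Python packages can be found, not whether the browser binaries were downloaded:

```python
import importlib.util
import sys


def check_environment(modules=("browser_use", "playwright")):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def in_virtualenv():
    """True when Python is running inside a virtual environment such as .venv."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

If either package is reported as missing, the usual cause is that the virtual environment was not activated before running the install commands.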

Step 2: Create an agent.py file in the project root

We have used Google’s Gemini model for this example.

import asyncio

from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import SecretStr

from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig

# Use the locally installed Chrome. The path below is the macOS default;
# change it to match your operating system.
browser = Browser(
    config=BrowserConfig(
        # NOTE: close any running Chrome window first so that this can open Chrome in debug mode
        chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    )
)

# Replace the placeholder with your actual Gemini API key.
api_key = 'GEMINI_API_KEY'
llm = ChatGoogleGenerativeAI(model='gemini-2.0-flash-exp', api_key=SecretStr(api_key))


async def main():
    # The task is a plain-language description of what the agent should do.
    agent = Agent(
        task='Open Google Docs and write a blog post about the latest tech trends',
        llm=llm,
        browser=browser,
    )

    await agent.run()
    await browser.close()

    input('Press Enter to close...')


if __name__ == '__main__':
    asyncio.run(main())
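
To run the agent, execute the file with python agent.py inside the activated environment. If you want to change the task without editing the file, or keep the API key out of the source, one possible variation is sketched below. The helper names (get_api_key, parse_task, run_task) and the choice of the GEMINI_API_KEY environment variable are assumptions for this sketch, not something browser-use prescribes. It also leaves out the custom Browser object and relies on the agent's default browser:

```python
import argparse
import asyncio
import os

DEFAULT_TASK = "Open Google Docs and write a blog post about the latest tech trends"


def get_api_key(env_var="GEMINI_API_KEY"):
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


def parse_task(argv=None):
    """Take the task from the command line, falling back to DEFAULT_TASK."""
    parser = argparse.ArgumentParser(description="Run a browser-use agent on one task.")
    parser.add_argument("task", nargs="?", default=DEFAULT_TASK,
                        help="plain-language task for the agent")
    return parser.parse_args(argv).task


async def run_task(task):
    """Build the LLM and the agent, then run the task once."""
    # Imported here so the helpers above can be used without these packages.
    from browser_use import Agent
    from langchain_google_genai import ChatGoogleGenerativeAI
    from pydantic import SecretStr

    llm = ChatGoogleGenerativeAI(
        model="gemini-2.0-flash-exp",
        api_key=SecretStr(get_api_key()),
    )
    agent = Agent(task=task, llm=llm)  # no browser argument: the default browser is used
    return await agent.run()


def main(argv=None):
    asyncio.run(run_task(parse_task(argv)))

# Call main() from an "if __name__ == '__main__':" guard to use this file as a script.
```

With that guard added, a run would look like python agent.py "Find the top story on Hacker News" after exporting GEMINI_API_KEY in the shell.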
