How to Use browser-use to Automate Your Browser with AI Agents

Advances in artificial intelligence (AI) have made browser automation more flexible than ever. With tools like browser-use, an AI agent can carry out tasks such as web scraping, form filling, and data extraction from a plain-language instruction, taking repetitive work off your plate. In this post, we’ll walk through how to use browser-use for AI-driven browser automation.
What is browser-use?
browser-use is an open-source Python library that lets AI agents control a web browser, interacting with pages much as a human would. It can automate repetitive tasks such as:
- Navigating websites
- Clicking buttons and filling forms
- Extracting data from web pages
- Managing cookies and authentication
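To make the idea concrete, here is a deliberately simplified, hypothetical model of the loop such an agent runs: look at the current page, ask the language model for the next action, perform it, and repeat until the model says the task is done. FakePage, fake_llm, and run_agent are invented for illustration only; the real browser-use Agent drives an actual browser and queries a real LLM, and its internals are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser tab: a URL plus filled-in form fields."""
    url: str = "about:blank"
    fields: dict = field(default_factory=dict)


def fake_llm(task: str, page: FakePage) -> dict:
    """Hypothetical policy standing in for the LLM: chooses the next action
    from the task and the current page state."""
    if page.url == "about:blank":
        return {"action": "navigate", "url": "https://example.com/signup"}
    if "email" not in page.fields:
        return {"action": "fill", "field": "email", "value": "me@example.com"}
    return {"action": "done"}


def run_agent(task: str, max_steps: int = 10) -> list:
    """Observe -> decide -> act loop; returns the names of the actions taken."""
    page = FakePage()
    history = []
    for _ in range(max_steps):
        action = fake_llm(task, page)        # decide, based on the observed page
        history.append(action["action"])
        if action["action"] == "done":
            break
        if action["action"] == "navigate":   # act on the page
            page.url = action["url"]
        elif action["action"] == "fill":
            page.fields[action["field"]] = action["value"]
    return history
```

Calling run_agent("Sign up with my email") returns ['navigate', 'fill', 'done']. The max_steps cap matters in the real setting as well: an agent that never decides it is finished should not loop forever.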
Getting Started with browser-use
Step 1: Prepare the environment
First, we recommend using uv to set up the Python environment. Create a virtual environment with Python 3.11:
```shell
uv venv --python 3.11
```
Then activate it:
```shell
# macOS/Linux:
source .venv/bin/activate

# Windows:
.venv\Scripts\activate
```
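Install browser-use:
```shell
uv pip install browser-use
```
The example in Step 2 also imports langchain_google_genai. If that package is not already installed as a dependency, add it with:
```shell
uv pip install langchain-google-genai
```
Then install the Playwright browsers:
```shell
playwright install
```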
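Before writing any agent code, it can help to confirm that the activated environment is the one you expect. The script below is a small sketch using only the standard library (check_environment and the file name check_env.py are hypothetical). It reports whether the interpreter is at least 3.11, mirroring the uv venv --python 3.11 command above, whether a virtual environment is active, and which of the packages used in this post cannot be imported.

```python
import importlib.util
import sys


def check_environment(packages=("browser_use", "playwright", "langchain_google_genai")):
    """Report the Python version check, virtualenv status, and missing packages."""
    return {
        "python_ok": sys.version_info >= (3, 11),
        # Inside a virtual environment, sys.prefix points at the venv
        # while sys.base_prefix still points at the base interpreter.
        "in_venv": sys.prefix != sys.base_prefix,
        # find_spec returns None when a top-level module cannot be found.
        "missing": [name for name in packages if importlib.util.find_spec(name) is None],
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it with python check_env.py from the activated environment; an empty missing list means the imports in agent.py should resolve.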
Step 2: Create an agent.py file in the project root
We use Google’s Gemini model in this example, via LangChain’s ChatGoogleGenerativeAI class:
```python
import asyncio
import os

from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import SecretStr

from browser_use import Agent
from browser_use.browser.browser import Browser, BrowserConfig

# Reuse your local Chrome installation.
# NOTE: close any running Chrome windows first, so that browser-use can
# relaunch Chrome in remote-debugging mode.
# chrome_instance_path is the option name in the browser-use release this
# post was written against; newer releases may rename it.
browser = Browser(
    config=BrowserConfig(
        # macOS path; adjust it for Windows or Linux.
        chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    )
)

# Read the key from the environment instead of hard-coding it
# (e.g. `export GEMINI_API_KEY=...` before running the script).
api_key = os.environ['GEMINI_API_KEY']
llm = ChatGoogleGenerativeAI(model='gemini-2.0-flash-exp', api_key=SecretStr(api_key))


async def main():
    agent = Agent(
        task='Open Google Docs and write a blog post about the latest tech trends',
        llm=llm,
        browser=browser,
    )
    try:
        await agent.run()
        # Keep the browser open so you can inspect the result.
        input('Press Enter to close...')
    finally:
        # Close the browser even if the agent raises an error.
        await browser.close()


if __name__ == '__main__':
    asyncio.run(main())
```
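An agent can take many steps on an open-ended task like this one. If you want a hard wall-clock limit, one option is to wrap agent.run() in the standard library’s asyncio.wait_for. The helper below is a sketch under that assumption; run_with_timeout and fake_agent_run are hypothetical names, and fake_agent_run only stands in for agent.run() so the snippet runs without a browser or an API key.

```python
import asyncio


async def run_with_timeout(coro, seconds: float):
    """Await `coro` for at most `seconds`.

    Returns (True, result) if it finished in time, or (False, None) if it
    timed out. On timeout, asyncio.wait_for cancels the wrapped coroutine.
    """
    try:
        result = await asyncio.wait_for(coro, timeout=seconds)
        return True, result
    except asyncio.TimeoutError:
        return False, None


async def fake_agent_run(duration: float) -> str:
    """Hypothetical stand-in for agent.run(): waits, then returns a result."""
    await asyncio.sleep(duration)
    return "done"


if __name__ == "__main__":
    print(asyncio.run(run_with_timeout(fake_agent_run(0.1), seconds=1)))    # (True, 'done')
    print(asyncio.run(run_with_timeout(fake_agent_run(5), seconds=0.5)))    # (False, None)
```

Inside main(), you would call finished, result = await run_with_timeout(agent.run(), 300) and keep await browser.close() in a finally block, so Chrome is shut down even when the timeout fires.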