Browser Use 浏览器自动化 Agent
霍格沃兹测试开发学社
我们给大家推荐一款支持结构化识别的智能体。
Browser Use
以纯文本形式自动执行浏览器任务。 Browser Use 是一个非常好用的 AI 自动化工具, 可以实现用人类语言自动化操作浏览器。
\
这也是学社测试和 review 代码后认为最好的 web agent 之一。

Agent 架构
- 客户端:命令行 代码风格 云服务
- 智能体:大模型与工具清单
- 大模型:ollama openai 等
- 浏览器:playwright cdp
- 工具:内置工具 自定义工具
安装
uv venv --python 3.12
source .venv/bin/activate
uv pip install browser-use
uvx playwright install chromium --with-deps
快速开始
from browser_use import Agent, ChatOpenAI
from dotenv import load_dotenv
import asyncio
load_dotenv()
async def main():
llm = ChatOpenAI(model="gpt-4.1-mini")
task = "打开https://ceshiren.com 进入搜索 进入高级搜索 搜索python 打开第一条搜索结果的链接,返回界面标题,断言标题中有python"
agent = Agent(task=task, llm=llm)
await agent.run()
if __name__ == "__main__":
asyncio.run(main())
这是一份 browser use 框架的使用示例。 它提供了 Agent 类,进行初始化。 第一个参数是你的任务 task, 第二个参数是你使用的大模型。 直接执行即可,用起来还是非常简单的。
可控配置
- 配置文件 OPENAI_API_KEY
- 环境变量 OPENAI_BASE_URL
- 参数 ChatOpenAI(model="gpt-4.1-mini", base_url=...)
\
# .env
OPENAI_API_KEY=...
ANONYMIZED_TELEMETRY=false
使用案例
import asyncio
import os
import sys
import pytest
from browser_use import Agent, Browser
from browser_use.llm.openai.chat import ChatOpenAI
from browser_use.tools.service import Tools
# 某些版本可能会下载额外的工具,偶尔需要代理
# os.environ['https_proxy'] = 'http://127.0.0.1:3129'
async def main(task):
llm = ChatOpenAI(model="gpt-4.1-mini", base_url=os.getenv('OPENAI_BASE_URL'))
tools = Tools(exclude_actions=['search'])
browser = Browser(headless=False)
agent = Agent(
task=task,
llm=llm,
browser=browser,
tools=tools,
use_vision=False,
)
result = await agent.run()
print(result.model_dump_json(indent=2))
@pytest.mark.parametrize(
'case',
[
"打开ceshiren.com 进入搜索 进入高级搜索 搜索python",
"打开ceshiren.com 进入搜索 进入高级搜索 搜索python",
"打开ceshiren.com 进入搜索 进入高级搜索 搜索python",
"打开ceshiren.com 进入搜索 进入高级搜索 搜索python",
"打开ceshiren.com 进入搜索 进入高级搜索 搜索python",
]
)
def test_hogwarts(case):
asyncio.run(main(case))
if __name__ == '__main__':
asyncio.run(main(sys.argv[1]))
Browser Use Cli
$ pip install browser-use[cli]
$ browser-use --help
Usage: browser-use [OPTIONS] COMMAND [ARGS]...
Browser Use - AI Agent for Web Automation
Run without arguments to start the interactive TUI.
Options:
--version Print version and exit
--model TEXT Model to use (e.g., gpt-5-mini, claude-4-sonnet,
gemini-2.5-flash)
--debug Enable verbose startup logging
--headless Run browser in headless mode
--window-width INTEGER Browser window width
--window-height INTEGER Browser window height
--user-data-dir TEXT Path to Chrome user data directory (e.g.
~/Library/Application Support/Google/Chrome)
--profile-directory TEXT Chrome profile directory name (e.g. "Default",
"Profile 1")
--cdp-url TEXT Connect to existing Chrome via CDP URL (e.g.
http://localhost:9222)
--proxy-url TEXT Proxy server for Chromium traffic (e.g.
http://host:8080 or socks5://host:1080)
--no-proxy TEXT Comma-separated hosts to bypass proxy (e.g.
localhost,127.0.0.1,*.internal)
--proxy-username TEXT Proxy auth username
--proxy-password TEXT Proxy auth password
-p, --prompt TEXT Run a single task without the TUI (headless mode)
--mcp Run as MCP server (exposes JSON RPC via
stdin/stdout)
--help Show this message and exit.
Commands:
auth Authenticate with Browser Use Cloud to sync your runs
browser-use --model gpt-4.1-mini -p '打开ceshiren.com 进入搜索 进入高级搜索 搜索ai 测试开发'
Browser Use Web-UI


除了比较成熟的框架外,官方提供了一个比较简单的 UI 界面,可以辅助操作,适合新人入手。可以通过 UI 界面配置 Agent 与大模型。不过这个项目可用度和定制性并不高,仅供参考。
这是 browser use webui 的基本界面。 你可以通过这个界面配置浏览器的配置,配置大模型,并执行任务。 也可以查看执行结果与每次结果的录制数据。
源代码安装
# Clone the repository
git clone https://github.com/browser-use/web-ui.git
cd web-ui
# Copy and configure environment variables
cp .env.example .env
# Edit .env with your preferred text editor and add your API keys
python webui.py --ip 127.0.0.1 --port 7788
这是使用源代码启动的方式,git clone 项目,进入目录后 copy 对应的配置文件,然后直接启动。
docker compose 方式启动
# Clone the repository
git clone https://github.com/browser-use/web-ui.git
cd web-ui
# Copy and configure environment variables
cp .env.example .env
# Edit .env with your preferred text editor and add your API keys
## docker方式启动
# Build and start the container with default settings (browser closes after AI tasks)
docker compose up --build
# Or run with persistent browser (browser stays open between AI tasks)
CHROME_PERSISTENT_SESSION=true docker compose up --build
这是使用 docker 启动的方式,在项目的根目录下有对应的 docker compose 的配置文件。 使用 docker compose up 启动即可
Run Agent

在运行界面可以输入自己的任务并执行,执行后还可以在结果里查看运行记录。底层使用的是 gradio 框架实现的。感兴趣的同学可以自行探索。
hogwarts-browser-use
- 增加命令行启动支持
- 去掉 google 搜索
- 支持命令行参数配置大模型

因为 browser use 是一个代码框架,没有提供一些便捷的工具封装, 再加上 google 搜索的问题,导致用起来会比较麻烦。 为了让霍格沃兹测试开发学社的小伙伴们更方便的使用。 我们做了一个封装版,可以支持纯命令行调用,从而让大家可以轻松的使用。 它还支持通过命令行参数进行大模型的配置。 相关代码可以从学员论坛节点里找到。
命令行用法
# 依赖python 3.11以上版本
hogwarts-browser-use 打开ceshiren.com 进入搜索 点击高级搜索 搜索python
hogwarts-browser-use -m gpt-4o-mini 打开ceshiren.com 进入搜索 点击高级搜索 搜索python
hogwarts-browser-use -m mistral 打开ceshiren.com 进入搜索 点击高级搜索 搜索python
hogwarts-browser-use -m qwen2.5 打开ceshiren.com 进入搜索 点击高级搜索 搜索python
这是这个工具的基本用法,详情可参考官网文档。
Agent
Agent 是核心 Api 入口
from browser_use import Agent, ChatOpenAI
agent = Agent(
task="Search for latest news about AI",
llm=ChatOpenAI(model="gpt-4.1-mini"),
)
async def main():
history = await agent.run(max_steps=100)
参数配置
class Agent(Generic[Context, AgentStructuredOutput]):
@time_execution_sync('--init')
def __init__(
self,
task: str,
llm: BaseChatModel | None = None,
# Optional parameters
browser_profile: BrowserProfile | None = None,
browser_session: BrowserSession | None = None,
browser: Browser | None = None, # Alias for browser_session
tools: Tools[Context] | None = None,
controller: Tools[Context] | None = None, # Alias for tools
# Initial agent run parameters
sensitive_data: dict[str, str | dict[str, str]] | None = None,
initial_actions: list[dict[str, dict[str, Any]]] | None = None,
# Cloud Callbacks
register_new_step_callback: (
Callable[['BrowserStateSummary', 'AgentOutput', int], None] # Sync callback
| Callable[['BrowserStateSummary', 'AgentOutput', int], Awaitable[None]] # Async callback
| None
) = None,
register_done_callback: (
Callable[['AgentHistoryList'], Awaitable[None]] # Async Callback
| Callable[['AgentHistoryList'], None] # Sync Callback
| None
) = None,
register_external_agent_status_raise_error_callback: Callable[[], Awaitable[bool]] | None = None,
register_should_stop_callback: Callable[[], Awaitable[bool]] | None = None,
# Agent settings
output_model_schema: type[AgentStructuredOutput] | None = None,
use_vision: bool | Literal['auto'] = 'auto',
save_conversation_path: str | Path | None = None,
save_conversation_path_encoding: str | None = 'utf-8',
max_failures: int = 3,
override_system_message: str | None = None,
extend_system_message: str | None = None,
generate_gif: bool | str = False,
available_file_paths: list[str] | None = None,
include_attributes: list[str] | None = None,
max_actions_per_step: int = 10,
use_thinking: bool = True,
flash_mode: bool = False,
max_history_items: int | None = None,
page_extraction_llm: BaseChatModel | None = None,
injected_agent_state: AgentState | None = None,
source: str | None = None,
file_system_path: str | None = None,
task_id: str | None = None,
calculate_cost: bool = False,
display_files_in_done_text: bool = True,
include_tool_call_examples: bool = False,
vision_detail_level: Literal['auto', 'low', 'high'] = 'auto',
llm_timeout: int | None = None,
step_timeout: int = 120,
directly_open_url: bool = True,
include_recent_events: bool = False,
sample_images: list[ContentPartTextParam | ContentPartImageParam] | None = None,
final_response_after_failure: bool = True,
_url_shortening_limit: int = 25,
**kwargs,
): ...
支持模型
llm = ChatOpenAI(
model="o3",
)
llm = ChatOllama(model="llama3.1:8b")
api_key = os.getenv('MODELSCOPE_API_KEY')
base_url = 'https://api-inference.modelscope.cn/v1/'
llm = ChatOpenAI(model='Qwen/Qwen2.5-VL-72B-Instruct', api_key=api_key, base_url=base_url)
与 LangChain 集成
from langchain_openai import ChatOpenAI
from browser_use import Agent
from .chat import ChatLangchain
async def main():
"""Basic example using ChatLangchain with OpenAI through LangChain."""
# Create a LangChain model (OpenAI)
langchain_model = ChatOpenAI(
model='gpt-4.1-mini',
temperature=0.1,
)
# Wrap it with ChatLangchain to make it compatible with browser-use
llm = ChatLangchain(chat=langchain_model)
agent = Agent(
task="Go to google.com and search for 'browser automation with Python'",
llm=llm,
)
history = await agent.run()
print(history.history)
Browser
浏览器应用
from browser_use import Agent, Browser, ChatOpenAI
browser = Browser(
headless=False, # Show browser window
window_size={'width': 1000, 'height': 700}, # Set window size
)
agent = Agent(
task='Search for Browser Use',
browser=browser,
llm=ChatOpenAI(model='gpt-4.1-mini'),
)
async def main():
await agent.run()
浏览器配置参数
def __init__(
self,
# Core configuration
id: str | None = None,
cdp_url: str | None = None,
is_local: bool = False,
browser_profile: BrowserProfile | None = None,
# BrowserProfile fields that can be passed directly
# From BrowserConnectArgs
headers: dict[str, str] | None = None,
# From BrowserLaunchArgs
env: dict[str, str | float | bool] | None = None,
executable_path: str | Path | None = None,
headless: bool | None = None,
args: list[str] | None = None,
ignore_default_args: list[str] | Literal[True] | None = None,
channel: str | None = None,
chromium_sandbox: bool | None = None,
devtools: bool | None = None,
downloads_path: str | Path | None = None,
traces_dir: str | Path | None = None,
# From BrowserContextArgs
accept_downloads: bool | None = None,
permissions: list[str] | None = None,
user_agent: str | None = None,
screen: dict | None = None,
viewport: dict | None = None,
no_viewport: bool | None = None,
device_scale_factor: float | None = None,
record_har_content: str | None = None,
record_har_mode: str | None = None,
record_har_path: str | Path | None = None,
record_video_dir: str | Path | None = None,
record_video_framerate: int | None = None,
record_video_size: dict | None = None,
# From BrowserLaunchPersistentContextArgs
user_data_dir: str | Path | None = None,
# From BrowserNewContextArgs
storage_state: str | Path | dict[str, Any] | None = None,
# BrowserProfile specific fields
use_cloud: bool | None = None,
cloud_browser: bool | None = None, # Backward compatibility alias
disable_security: bool | None = None,
deterministic_rendering: bool | None = None,
allowed_domains: list[str] | None = None,
keep_alive: bool | None = None,
proxy: ProxySettings | None = None,
enable_default_extensions: bool | None = None,
window_size: dict | None = None,
window_position: dict | None = None,
minimum_wait_page_load_time: float | None = None,
wait_for_network_idle_page_load_time: float | None = None,
wait_between_actions: float | None = None,
filter_highlight_ids: bool | None = None,
auto_download_pdfs: bool | None = None,
profile_directory: str | None = None,
cookie_whitelist_domains: list[str] | None = None,
# DOM extraction layer configuration
cross_origin_iframes: bool | None = None,
highlight_elements: bool | None = None,
dom_highlight_elements: bool | None = None,
paint_order_filtering: bool | None = None,
# Iframe processing limits
max_iframes: int | None = None,
max_iframe_depth: int | None = None,
): ...
Tools
工具调用体系
- function calling
- tool call
- tool call result
- final answer

工具自定义
- 内置工具
- 自定义工具
from browser_use import Tools, ActionResult, Browser
tools = Tools()
@tools.action('Ask human for help with a question')
def ask_human(question: str, browser: Browser) -> ActionResult:
answer = input(f'{question} > ')
return f'The human responded with: {answer}'
agent = Agent(
task='Ask human for help',
llm=llm,
tools=tools,
)
工具响应
@tools.action('My tool')
def my_tool() -> str:
return "Task completed successfully"
@tools.action('Advanced tool')
def advanced_tool() -> ActionResult:
return ActionResult(
extracted_content="Main result",
long_term_memory="Remember this info",
error="Something went wrong",
is_done=True,
success=True,
attachments=["file.pdf"],
)
基于 CDP 的浏览器自动化框架 Actor
Actor 架构
因为 Playwright 的稳定性和性能问题原因,Browser Use 开发了一个新的自动化框架。 基于 CDP 协议,具有直接和完整的 CDP 控制和精确的元素交互。
graph TD
A[Browser] --> B[Page]
B --> C[Element]
B --> D[Mouse]
B --> E[AI Features]
C --> F[DOM Interactions]
D --> G[Coordinate Operations]
E --> H[LLM Integration]
{.bg-white}
基本自动化
from browser_use import Browser
browser = Browser()
await browser.start()
# Create pages
page = await browser.new_page() # Blank tab
page = await browser.new_page("https://example.com") # With URL
# Get all pages
pages = await browser.get_pages()
current = await browser.get_current_page()
# Close page
await browser.close_page(page)
await browser.stop()
元素操作
page = await browser.new_page('https://github.com')
# CSS selectors (immediate return)
elements = await page.get_elements_by_css_selector("input[type='text']")
buttons = await page.get_elements_by_css_selector("button.submit")
# Element actions
await elements[0].click()
await elements[0].fill("Hello World")
await elements[0].hover()
# Page actions
await page.press("Enter")
screenshot = await page.screenshot()
与 LLM 结合
from browser_use.llm.openai import ChatOpenAI
from pydantic import BaseModel
llm = ChatOpenAI(api_key="your-api-key")
# Find elements using natural language
button = await page.get_element_by_prompt("login button", llm=llm)
await button.click()
# Extract structured data
class ProductInfo(BaseModel):
name: str
price: float
product = await page.extract_content(
"Extract product name and price",
ProductInfo,
llm=llm
)