Let your Agent operate your real Chrome browser directly through an MCP service.
This is not a sandboxed browser or a simple web scraper. It connects to the Chrome instance already open on your machine and preserves:
- Login state
- Cookies
- Opened tabs
- Real page context
Typical use cases:
- Let Hermes read your Xiaohongshu feed, backend systems, knowledge bases, or admin pages directly
- Automate websites you are already logged in to, instead of logging in again from a stateless browser
- Fall back to CDP or real mouse and keyboard input when ordinary browser automation is unreliable
- Get page scanning, JS execution, CDP control, screenshots, and physical input from a single MCP service
In one sentence:
> This project packages real-browser automation as a standard MCP service, so Agents can work in your everyday browser, not just in a sandboxed one.
## Core Capabilities
- Real Chrome tab discovery and switching
- Page scanning and simplified content extraction
- JavaScript execution within the page
- Native CDP single command/batch invocation
- Page screenshot/desktop screenshot
- Cookie reading
- Mouse movement, click, and drag
- Keyboard input and hotkeys
If you want Hermes, Claude Desktop, Cursor, or another MCP client to operate your real local browser directly, this project is for you.
## What This MCP Can Do
This project packages real browser automation capabilities into a standard MCP tool, focusing on:
### 1. Browser Tabs and Navigation
- View currently connected real tabs
- Switch to a specified tab
- Open a URL in the current tab
- Open a new tab
### 2. Page Reading
- Scan current page content
- Extract simplified HTML/text
- Well suited to reading feeds, post lists, and search result pages
### 3. Page Execution and CDP Control
- Execute arbitrary JavaScript in the page
- Directly invoke Chrome DevTools Protocol (CDP)
- Support single and batch commands
- Can be used for screenshot, DOM query, click, file upload, and more complex operations
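
As a rough illustration of what CDP calls look like, the sketch below builds two command payloads and decodes a screenshot result. `Runtime.evaluate` and `Page.captureScreenshot` are standard CDP methods, and `Page.captureScreenshot` returns its image as a base64 string in `data`. The `{"method": ..., "params": ...}` shape and the helper names `cdp` and `decode_screenshot` are assumptions for illustration; the exact argument schema of `cdp_command` and `cdp_batch` is not documented here.

```python
import base64


def cdp(method: str, **params) -> dict:
    """Build one CDP command as a plain dict (assumed payload shape)."""
    return {"method": method, "params": params}


# A batch: read the page title, then capture a PNG screenshot.
batch = [
    cdp("Runtime.evaluate", expression="document.title", returnByValue=True),
    cdp("Page.captureScreenshot", format="png"),
]


def decode_screenshot(result: dict) -> bytes:
    """Decode the base64 `data` field of a Page.captureScreenshot result."""
    return base64.b64decode(result["data"])
```

The decoded bytes can be written straight to a `.png` file.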
### 4. Screenshot Capability
- Page screenshot (via CDP)
- Desktop screenshot (for auxiliary real desktop operation)
### 5. Real Physical Input
- Mouse movement
- Mouse click
- Mouse drag
- Keyboard input
- Hotkey sending
These capabilities are suitable for handling:
- Websites that must retain login state
- Websites where ordinary browser automation tools easily trigger risk-control (anti-bot) checks
- Scenarios requiring real clicks/real keyboard input
- Scenarios needing to read complex page structures
## Suitable Scenarios
For example:
- Use Hermes to read your current Xiaohongshu homepage recommendations
- Open a backend page in a real browser and scrape information
- Call CDP to screenshot a page
- When page JS is insufficient, fall back to real mouse/keyboard operations
- Let Agents directly operate on your already logged-in website instead of logging into a stateless browser
## Working Principle
The project consists of three layers:
1. Chrome Extension
   - Injected into real web pages
   - Accesses tabs/cookies/debugger/management via Chrome APIs
   - Communicates with the local bridge service
2. TMWebDriver Local Bridge
   - Listens by default on:
     - WebSocket: `127.0.0.1:18765`
     - HTTP: `127.0.0.1:18766`
   - Connects to the extension, maintains sessions, and forwards execution results
3. MCP Service
   - Exposes browser capabilities as MCP tools
   - Called directly by Hermes, Claude Desktop, Cursor, and other clients
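
To check by hand whether the local bridge is up, a minimal sketch like the one below probes the two default ports listed above. It only tests that something accepts TCP connections there; it does not verify that the listener is actually the bridge. The names `port_is_open` and `bridge_status` are illustrative and not part of this project.

```python
import socket

# Default bridge ports as listed in this README.
BRIDGE_PORTS = {"websocket": 18765, "http": 18766}


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def bridge_status(host: str = "127.0.0.1") -> dict:
    """Map each bridge endpoint name to whether its port is reachable."""
    return {name: port_is_open(host, port) for name, port in BRIDGE_PORTS.items()}
```

Calling `bridge_status()` while the MCP service is running should report both endpoints as reachable if the defaults have not been changed.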
## Main Tools
Currently exposed main MCP tools include:
### Browser/Tabs
- `get_setup_status`
- `list_tabs`
- `switch_tab`
- `open_url`
- `open_new_tab`
- `extension_path`
- `list_extensions`
### Page Read/Execute
- `scan_page`
- `execute_js`
### CDP and Screenshot
- `cdp_command`
- `cdp_batch`
- `get_cookies`
- `capture_page_screenshot`
- `capture_desktop_screenshot`
### Physical Input
- `mouse_move`
- `mouse_click`
- `mouse_drag`
- `type_text`
- `hotkey`
- `pointer_info`
## Installation Requirements
Recommended environment:
- macOS or Windows
- Python 3.10+
- Google Chrome
- Any client that supports MCP, such as:
  - Hermes Agent
  - Claude Desktop
  - Cursor
## Installation
Clone the repository locally, then run:
```bash
cd agent-browser-mcp
pip install -e .
```
If you want to build a wheel and install it:
```bash
python -m pip install --upgrade build
python -m build
pip install dist/agent_browser_mcp-0.1.0-py3-none-any.whl
```
## Command Line Tool
After installation, a CLI is provided:
```bash
agent-browser-mcp
```
It has several commonly used subcommands:
### Output Chrome Extension Directory
```bash
agent-browser-mcp extension-path
```
### Output Hermes Configuration Fragment
```bash
agent-browser-mcp print-hermes-config
```
### Environment Diagnosis
```bash
agent-browser-mcp doctor
```
This command outputs JSON to help you check:
- Extension directory location
- Whether `config.js` is generated
- Port status
- Number of currently connected tabs
- Next steps and suggestions
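
Because `doctor` prints JSON, its output can be consumed from a script. The sketch below runs the command and reads the tab count. The key name `connected_tabs` is an assumption based on the FAQ in this README, and the function names are illustrative; inspect the actual output before relying on any field.

```python
import json
import subprocess


def run_doctor() -> dict:
    """Run `agent-browser-mcp doctor` and parse its JSON output."""
    proc = subprocess.run(
        ["agent-browser-mcp", "doctor"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def connected_tabs(report: dict) -> int:
    """Read the tab count; the `connected_tabs` key name is an assumption."""
    return int(report.get("connected_tabs", 0))
```

With the CLI installed, `connected_tabs(run_doctor())` would give the number of connected tabs, provided the key exists under that name.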
## Chrome Extension Installation
This project contains an unpacked Chrome extension that needs to be loaded manually.
### Step 1: Get Extension Directory
```bash
agent-browser-mcp extension-path
```
### Step 2: Load in Chrome
Open:
```text
chrome://extensions
```
Then:
- Enable "Developer mode"
- Click "Load unpacked"
- Select the directory output in the previous step
### Step 3: Open a Normal Webpage
Do not stay on `about:blank`.
Open a normal webpage in Chrome, such as:
- `https://www.baidu.com`
- `https://www.xiaohongshu.com`
Otherwise, a valid session will not be established.
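
If a script needs to decide whether the current tab is likely to qualify, a simple heuristic is to accept only `http` and `https` URLs. This is an assumption about what "normal webpage" means here, not the extension's actual rule; the name `is_normal_page` is illustrative.

```python
from urllib.parse import urlparse


def is_normal_page(url: str) -> bool:
    """Heuristic: treat only http(s) URLs with a host as normal pages."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Under this heuristic, `about:blank` and `chrome://extensions` are both rejected.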
## Hermes Configuration
Add the following to `~/.hermes/config.yaml`:
```yaml
mcp_servers:
  agent_browser:
    command: agent-browser-mcp
    timeout: 120
    connect_timeout: 60
```
The project also includes example files:
- `examples/hermes-config.yaml`
After configuration, restart Hermes or reload MCP.
You can verify with the following commands:
```bash
hermes mcp list
hermes mcp test agent_browser
```
If the test is successful, Hermes can discover and call these browser tools.
## Claude Desktop/Cursor Configuration
The repository also includes examples:
- `examples/claude-desktop-config.json`
- `examples/cursor-mcp.json`
The configuration is simple. The core of it is:
```json
{
  "mcpServers": {
    "agent_browser": {
      "command": "agent-browser-mcp",
      "args": []
    }
  }
}
```
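
If your client already has a config file with other servers in it, the entry can be merged in, leaving the rest of the file as it was. The sketch below shows one way to do that. The helper names are illustrative, and the location of the config file depends on your client, so the path is left as a parameter.

```python
import json
from pathlib import Path

ENTRY = {"command": "agent-browser-mcp", "args": []}


def add_server(config: dict, name: str = "agent_browser") -> dict:
    """Return a copy of config with the server entry added under mcpServers."""
    servers = {**config.get("mcpServers", {}), name: ENTRY}
    return {**config, "mcpServers": servers}


def update_file(path: Path) -> None:
    """Merge the entry into a JSON config file, creating it if missing."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_server(config), indent=2) + "\n")
```

Existing servers and unrelated settings are kept as they were. Back up the file before running `update_file` on it.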
## Typical Usage Process
1. Install the Python package
2. Load the extension in Chrome
3. Open a real webpage
4. Connect to this service from your MCP client
5. Start calling browser tools
For example, an Agent can:
- Open Xiaohongshu homepage
- Read recommendations
- Scan post lists
- Screenshot pages with CDP
- Perform real mouse/keyboard operations when necessary
## Security Reminder
This project operates on your real browser and desktop.
This means:
- Mouse movement is real
- Clicks are real
- Input is real
- Hotkeys are real
- The login state in the browser is real too
Please use it only in trusted MCP clients and Agent environments.
## Frequently Asked Questions
### 1. Hermes can see the MCP service, but no tabs are connected
Please check:
- Whether the extension is loaded in `chrome://extensions`
- Whether a normal webpage is open in Chrome
- Whether the tab is still on `about:blank`
You can also run:
```bash
agent-browser-mcp doctor
```
### 2. `connected_tabs` is 0
This usually has one of the following causes:
- The extension did not load successfully
- No normal webpage is currently open
- The extension was just reloaded and the page has not been refreshed
Suggestions:
- Refresh the current webpage
- Open a new normal URL
- Run `doctor` again
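
After refreshing the page, the connection can take a moment to appear. A minimal polling sketch is shown below. `read_count` is a hypothetical callable that you supply, for example one that parses the output of `agent-browser-mcp doctor` and returns the tab count.

```python
import time
from typing import Callable


def wait_for_tabs(read_count: Callable[[], int],
                  timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll read_count() until it reports at least one tab or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if read_count() > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```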
### 3. Physical input does not work on macOS
Please grant system permissions to the terminal/MCP client:
- Accessibility
- Screen recording (if you need desktop screenshots)
### 4. `hermes mcp test agent_browser` fails
Please check:
- Whether the package is installed successfully
- Whether `agent-browser-mcp` is in PATH
- Whether Hermes configuration is correct
- Run `agent-browser-mcp doctor` to see diagnostic output
## Acknowledgements
This project's browser automation capability is extracted from GenericAgent's browser stack and repackaged as an MCP service.
Special thanks to the GenericAgent project and its author for providing the original implementation ideas and core capability sources.
Original project address:
- https://github.com/lsdefine/GenericAgent
The following parts of this project come from or are adapted from GenericAgent:
- `TMWebDriver.py`
- `simphtml.py`
- `tmwd_cdp_bridge` Chrome extension resources
If you build on or republish this project, please retain the acknowledgment and source attribution for GenericAgent.
## License
MIT