```html
<!-- markdownlint-disable MD033 MD041 MD024 -->
<p align="center">
<img alt="LOGO" src="https://cdn.jsdelivr.net/gh/MaaAssistantArknights/design@main/logo/maa-logo_512x512.png" width="256" height="256" />
</p>
<div align="center">
# MaaMCP



MCP Server based on [MaaFramework](https://github.com/MaaXYZ/MaaFramework)
Provides Android device and Windows desktop automation capabilities for AI assistants
[English](README_EN.md) | 中文
</div>
---
## Introduction
MaaMCP is an MCP (Model Context Protocol) server that exposes the automation capabilities of MaaFramework to AI assistants (such as Claude) through a standardized MCP interface. This server gives AI assistants the following capabilities:
- 🤖 **Android Automation** - Connect to and control Android devices/emulators via ADB
- 🖥️ **Windows Automation** - Control Windows desktop applications
- 🎯 **Background Operation** - Screenshots and controls on Windows run in the background without occupying the mouse and keyboard, allowing you to continue using your computer for other tasks
- 🔗 **Multi-Device Collaboration** - Control multiple devices/windows simultaneously to achieve cross-device automation
- 👁️ **Intelligent Recognition** - Use OCR to recognize text content on the screen
- 🎯 **Precise Operation** - Perform actions such as clicking, swiping, text input, and key presses
- 📸 **Screenshot** - Obtain real-time screenshots for visual analysis
Talk is cheap; see it in action: **[🎞️ Bilibili Video Demonstration](https://www.bilibili.com/video/BV1eGmhBaEZz/)**
## Features
### 🔍 Device Discovery and Connection
- `find_adb_device_list` - Scan for available ADB devices
- `find_window_list` - Scan for available Windows windows
- `connect_adb_device` - Connect to an Android device
- `connect_window` - Connect to a Windows window
### 👀 Screen Recognition
- `screencap_and_ocr` - Take a screenshot and run Optical Character Recognition on it (efficient; prefer this by default)
- `screencap_only` - Take a screenshot and let the large model's vision capabilities process it (use only when needed; high token cost)
### 🎮 Device Control
- `click` - Click the specified coordinates (supports multi-touch/mouse button selection, long press)
  - Supports specifying mouse buttons on Windows: left, right, middle
- `double_click` - Double-click the specified coordinates
- `swipe` - Swipe gesture (preferred for scrolling/paging on Android devices)
- `input_text` - Input text
- `click_key` - Key operation (supports long press)
  - Android can simulate system keys: Back (4), Home (3), Menu (82), Volume keys, etc.
  - Windows supports virtual key codes: Enter (13), ESC (27), arrow keys, etc.
- `keyboard_shortcut` - Keyboard shortcut
  - Supports combination keys: Ctrl+C, Ctrl+V, Alt+Tab, etc.
- `scroll` - Mouse wheel (Windows only)
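
As a quick reference for `click_key`, the key codes quoted above can be kept in a small lookup table. This is only a sketch: the platform labels and the `key_code` helper are hypothetical and not part of MaaMCP, and only the codes listed above are included.

```python
# Key codes quoted in the list above. The dictionary layout and the helper
# are illustrative assumptions, not part of the MaaMCP API.
KEY_CODES = {
    "android": {"back": 4, "home": 3, "menu": 82},
    "windows": {"enter": 13, "esc": 27},
}


def key_code(platform: str, name: str) -> int:
    """Look up a numeric key code by platform and key name (case-insensitive)."""
    try:
        return KEY_CODES[platform.lower()][name.lower()]
    except KeyError:
        raise ValueError(f"unknown key {name!r} for platform {platform!r}") from None


print(key_code("android", "Back"))  # 4
print(key_code("windows", "ESC"))   # 27
```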
### 📝 Pipeline Generation and Execution
- `get_pipeline_protocol` - Get the Pipeline protocol documentation
- `save_pipeline` - Save Pipeline JSON to a file (supports creating and updating)
- `load_pipeline` - Read an existing Pipeline file
- `run_pipeline` - Run the Pipeline and return the execution result
- `open_pipeline_in_browser` - Open the Pipeline visualization interface in a browser
## Quick Start
### Installation
#### Method 1: Install via uv (Recommended)
Install [uv](https://docs.astral.sh/uv/#installation) first, then run:
```bash
uvx maa-mcp
```
#### Method 2: Install via pip
```bash
pip install maa-mcp
```
#### Method 3: Install from source
1. **Clone the repository**
```bash
git clone https://github.com/MistEO/MaaMCP.git
cd MaaMCP
```
2. **Install Python dependencies**
```bash
pip install -e .
```
### Configure Client
In clients such as Cursor, add the MCP server:
```json
{
  "mcpServers": {
    "MaaMCP": {
      "command": "maa-mcp"
    }
  }
}
```
Or, in clients such as Cherry Studio, add the MCP command:
```shell
uvx maa-mcp
```
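
If you prefer to generate the client configuration rather than write it by hand, a small script can emit both variants. This is a sketch: the `mcp_server_entry` helper is hypothetical, and the `args` field for the `uvx` variant is an assumption based on the layout commonly used by MCP clients, so check your client's documentation.

```python
import json
from typing import List, Optional


def mcp_server_entry(command: str, args: Optional[List[str]] = None) -> dict:
    """Build one server entry; `args` is only included when provided."""
    entry = {"command": command}
    if args:
        entry["args"] = args
    return entry


# Variant 1: the `maa-mcp` executable is on PATH (pip or source install).
config = {"mcpServers": {"MaaMCP": mcp_server_entry("maa-mcp")}}

# Variant 2 (assumed layout): let uvx fetch and run the package on demand.
uvx_config = {"mcpServers": {"MaaMCP": mcp_server_entry("uvx", ["maa-mcp"])}}

print(json.dumps(config, indent=2))
print(json.dumps(uvx_config, indent=2))
```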
## Usage Examples
Once configured, you can prompt the assistant like this:
**Android Automation Example:**
```text
Please use the MaaMCP tools to connect to my Android device, open Meituan, and order takeout for me: Chinese food, for one person, around 20 yuan.
```
**Windows Automation Example:**
```text
Please use the MaaMCP tools to figure out how to add a rotation effect to this PowerPoint slide, and show me how to do it.
```
**Pipeline Generation Example:**
```text
Please use the MaaMCP tools to connect to my device, open Settings, go to the display settings, and set the brightness to 50%.
When the operation is complete, generate the Pipeline JSON for this process so that it can be run directly later.
```
MaaMCP will automatically:
1. Scan available devices/windows
2. Establish a connection
3. Download and load OCR resources
4. Perform recognition and operation tasks
## Large Model Prompt
If you want the AI to complete automation tasks quickly and efficiently, without detailed explanations of each step, add the following to your prompt:
```
# Role: UI Automation Agent
## Workflow Optimization Rules
1. **Minimize Round-Trips**: Your goal is to complete the task with the fewest number of interactions.
2. **Critical Pattern**: For form/chat input, you must follow the atomic operation sequence **[Click Focus -> Input Text -> Send Key]**.
   - 🚫 Wrong way: Click and wait for the result; then input and wait for the result; then press Enter.
   - ✅ Correct way: After `click`, do not wait for its return value; infer the next steps and append `input_text` and `click_key` to the same `tool_calls` list.
## Communication Style
- **NO YAPPING**: Do not repeat the user's instructions, do not explain your steps.
- **Direct Execution**: Receive instructions -> (internal thinking) -> directly output JSON tool calls.
```
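
To make the "Critical Pattern" concrete, the sketch below builds the three calls as a single batch, in the order click, input, send key. The tool names come from the feature list; the parameter names (`controller_id`, `x`, `y`, `text`, `key`) and their values are hypothetical placeholders, and the exact `tool_calls` wire format depends on the model provider.

```python
import json


def tool_call(name: str, **arguments) -> dict:
    """Describe one tool invocation as a plain dictionary."""
    return {"name": name, "arguments": arguments}


# One batch, emitted in a single model turn instead of three round-trips.
# All argument names and values are hypothetical placeholders.
batch = [
    tool_call("click", controller_id="<controller>", x=540, y=1800),  # focus the input box
    tool_call("input_text", controller_id="<controller>", text="hello"),
    tool_call("click_key", controller_id="<controller>", key=13),     # Enter (13) on Windows
]

print(json.dumps(batch, indent=2))
```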
### Performance Suggestions
For the fastest response times, use the **Flash versions** of large language models (such as Gemini Flash), which respond significantly faster while maintaining a high level of intelligence.
## Workflow
MaaMCP follows a simple operation process and supports multi-device/multi-window collaboration:
```mermaid
graph LR
A[Scan Devices] --> B[Establish Connection]
B --> C[Perform Automation Operations]
```
1. **Scan** - Use `find_adb_device_list` or `find_window_list`
2. **Connect** - Use `connect_adb_device` or `connect_window` (can connect multiple devices/windows to get multiple controller IDs)
3. **Operate** - Perform OCR, click, swipe, and other automation operations on multiple devices/windows by specifying different controller IDs
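
On the wire, each of the three steps above is an MCP `tools/call` request (JSON-RPC 2.0). The sketch below only builds the request payloads; the argument names and values are hypothetical placeholders, and a real client library would also handle the session handshake and responses.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tools_call(name: str, arguments: dict = None) -> dict:
    """Build one JSON-RPC 2.0 `tools/call` request as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Argument names and values below are hypothetical placeholders.
workflow = [
    tools_call("find_adb_device_list"),                                  # 1. Scan
    tools_call("connect_adb_device", {"device_name": "<device>"}),       # 2. Connect
    tools_call("screencap_and_ocr", {"controller_id": "<controller>"}),  # 3. Operate
]

for request in workflow:
    print(json.dumps(request))
```

To drive several devices, repeat step 2 once per device and pass the matching controller ID in each step 3 call.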
## Pipeline Generation Function
MaaMCP lets the AI convert the operations it has executed into [MaaFramework Pipeline](https://maafw.xyz/docs/3.1-PipelineProtocol) JSON, so you can **operate once and reuse indefinitely**.
### Working Principle
```mermaid
graph LR
A[AI Executes Operation] --> B[Operation Completed]
B --> C[AI Reads Pipeline Documentation]
C --> D[AI Intelligently Generates Pipeline]
D --> E[Save JSON File]
E --> F[Run Verification]
F --> G{Successful?}
G -->|Yes| H[Complete]
G -->|No| I[Analyze Failure Reason]
I --> J[Modify Pipeline]
J --> F
```
1. **Execute Operation** - The AI performs OCR, click, swipe, and other automation operations as usual
2. **Get Documentation** - Call `get_pipeline_protocol` to get the Pipeline protocol specification
3. **Intelligent Generation** - AI converts **valid operations** into Pipeline JSON according to the documentation specification
4. **Save File** - Call `save_pipeline` to save the generated Pipeline
5. **Run Verification** - Call `run_pipeline` to verify that the Pipeline runs correctly
6. **Iterative Optimization** - Analyze the cause of the failure based on the running results and modify the Pipeline until it succeeds
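
The run, analyze, and modify loop in steps 5 and 6 can be sketched as a bounded retry. Everything here is illustrative: `run` and `repair` stand in for the AI calling `run_pipeline` and editing the JSON, and the stub behavior (success once `post_delay` reaches 500 ms) is invented for the demonstration.

```python
def verify_with_retries(pipeline, run, repair, max_attempts=3):
    """Run the pipeline; on failure, repair the failing node and try again.

    `run` returns the name of the failed node, or None on success.
    `repair` returns a modified copy of the pipeline.
    """
    for _ in range(max_attempts):
        failed_node = run(pipeline)
        if failed_node is None:
            return pipeline, True
        pipeline = repair(pipeline, failed_node)
    return pipeline, False


# Hypothetical stand-ins for the real run and for the AI's repair step.
def fake_run(pipeline):
    node = pipeline["Click Settings"]
    return None if node.get("post_delay", 0) >= 500 else "Click Settings"


def add_delay(pipeline, failed_node):
    fixed = {name: dict(node) for name, node in pipeline.items()}
    fixed[failed_node]["post_delay"] = fixed[failed_node].get("post_delay", 0) + 250
    return fixed


draft = {"Click Settings": {"recognition": "OCR", "expected": "Settings", "action": "Click"}}
result, ok = verify_with_retries(draft, fake_run, add_delay)
print(ok, result["Click Settings"]["post_delay"])  # True 500
```

Bounding the number of attempts keeps a Pipeline that cannot be fixed from looping forever.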
### Advantages of Intelligent Generation
Compared with mechanically recording every action, AI-driven generation has the following advantages:
- **Only Keep Successful Paths**: If multiple paths are tried during the operation process (such as entering menu A first and not finding it, and then returning and entering menu B to find it), AI will only keep the final successful path and remove the failed attempts
- **Understand Operation Intent**: AI can understand the purpose of each operation and generate semantically clear node names
- **Optimize Recognition Conditions**: Intelligently set the recognition area and matching conditions based on OCR results
- **Verification and Iteration**: Find problems by running the Pipeline, then fix them automatically to improve robustness
### Verification and Iterative Optimization
After the Pipeline is generated, AI will automatically verify and optimize it:
1. **Run Verification** - Execute the Pipeline to check whether it is successful
2. **Failure Analysis** - If it fails, analyze which node failed and the reason
3. **Intelligent Repair** - Common optimization methods:
   - Add alternative recognition nodes (add multiple candidates to the `next` list)
   - Relax OCR matching conditions (use regular expressions or partial matching)
   - Adjust the `roi` recognition area
   - Increase the waiting time (`post_delay`)
   - Add intermediate state detection nodes
4. **Re-verification** - Run again after modification until stable success
If the Pipeline logic itself turns out to be flawed, the AI can also re-execute the automation operations and combine the old and new experience to generate a more complete Pipeline.
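
Several of the repairs listed above are small, mechanical edits to the Pipeline JSON. The helpers below sketch three of them on a plain dictionary; the function names are hypothetical, and the `[x, y, w, h]` layout assumed for `roi` should be checked against the Pipeline protocol documentation.

```python
import copy
import re


def add_next_candidate(pipeline: dict, node: str, candidate: str) -> dict:
    """Return a copy where `candidate` is appended to the node's `next` list."""
    patched = copy.deepcopy(pipeline)
    next_list = patched[node].setdefault("next", [])
    if candidate not in next_list:
        next_list.append(candidate)
    return patched


def relax_expected(pipeline: dict, node: str, pattern: str) -> dict:
    """Return a copy where the node's OCR `expected` is a looser regex."""
    re.compile(pattern)  # fail fast if the pattern is not a valid regex
    patched = copy.deepcopy(pipeline)
    patched[node]["expected"] = pattern
    return patched


def expand_roi(roi: list, margin: int) -> list:
    """Grow an [x, y, w, h] region by `margin` pixels, clamped at the origin."""
    x, y, w, h = roi
    new_x, new_y = max(0, x - margin), max(0, y - margin)
    return [new_x, new_y, x + w + margin - new_x, y + h + margin - new_y]


pipeline = {"Click Settings": {"recognition": "OCR", "expected": "Settings", "next": ["Enter Display"]}}
pipeline = add_next_candidate(pipeline, "Click Settings", "Close Popup")
pipeline = relax_expected(pipeline, "Click Settings", "Settings?")
print(pipeline["Click Settings"])
print(expand_roi([100, 200, 50, 40], 10))  # [90, 190, 70, 60]
```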
### Example Output
```json
{
  "Start Task": {
    "recognition": "DirectHit",
    "action": "DoNothing",
    "next": ["Click Settings"]
  },
  "Click Settings": {
    "recognition": "OCR",
    "expected": "Settings",
    "action": "Click",
    "next": ["Enter Display"]
  },
  "Enter Display": {
    "recognition": "OCR",
    "expected": "Display",
    "action": "Click",
    "next": ["Adjust Brightness"]
  },
  "Adjust Brightness": {
    "recognition": "OCR",
    "expected": "Brightness",
    "action": "Swipe",
    "begin": [200, 500],
    "end": [400, 500],
    "duration": 200
  }
}
```
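
Before running a generated Pipeline, it can help to check that every name in a `next` list refers to a node that actually exists. The checker below is a sketch and not part of MaaMCP; it uses the node names from the example above, reduced to their `next` fields.

```python
def dangling_next_targets(pipeline: dict) -> list:
    """Return (node, target) pairs where `next` points at a missing node."""
    missing = []
    for name, node in pipeline.items():
        targets = node.get("next", [])
        if isinstance(targets, str):  # tolerate a single name as well as a list
            targets = [targets]
        for target in targets:
            if target not in pipeline:
                missing.append((name, target))
    return missing


example = {
    "Start Task": {"next": ["Click Settings"]},
    "Click Settings": {"next": ["Enter Display"]},
    "Enter Display": {"next": ["Adjust Brightness"]},
    "Adjust Brightness": {},
}
print(dangling_next_targets(example))  # []

# A typo in a node name is reported together with the node that references it.
broken = dict(example, **{"Click Settings": {"next": ["Enter Dispaly"]}})
print(dangling_next_targets(broken))  # [('Click Settings', 'Enter Dispaly')]
```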
## Precautions
📌 **Windows Automation Limitations**:
- Anti-cheat mechanisms in some games or applications may intercept background control operations
- If the target application is running with administrator privileges, MaaMCP also needs to be started with administrator privileges
- Minimized windows cannot be operated; keep the target window in a non-minimized state
- If the default background screenshot/input method is unavailable (for example, the screenshot is empty or the operation has no effect), the AI assistant may switch to the foreground method, which will occupy the mouse and keyboard
## Common Issues
### OCR recognition fails, reporting "Failed to load det or rec" or prompting that the resource does not exist
The OCR model files are downloaded automatically on first use, but the download can fail. Check the data directory:
- Windows: `C:\Users\<Username>\AppData\Local\MaaXYZ\MaaMCP\resource\model\ocr\`
- macOS: `~/Library/Application Support/MaaXYZ/MaaMCP/resource/model/ocr/`
- Linux: `~/.local/share/MaaXYZ/MaaMCP/resource/model/ocr/`
1. Check whether there are model files in the above directory (`det.onnx`, `rec.onnx`, `keys.txt`)
2. Check whether there are resource download exceptions in `model/download.log`
3. Manually execute `python -c "from maa_mcp.download import download_and_extract_ocr; download_and_extract_ocr()"` to try downloading again
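
Step 1 can be automated with a short script. This is a sketch that simply mirrors the directories and file names listed above; the helper names are hypothetical, and the package itself may resolve its data directory differently.

```python
import sys
from pathlib import Path

REQUIRED_FILES = ("det.onnx", "rec.onnx", "keys.txt")


def ocr_model_dir(platform: str = sys.platform, home: Path = None) -> Path:
    """Mirror the per-platform data directories listed above."""
    home = home or Path.home()
    if platform.startswith("win"):
        base = home / "AppData" / "Local"
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        base = home / ".local" / "share"
    return base / "MaaXYZ" / "MaaMCP" / "resource" / "model" / "ocr"


def missing_model_files(model_dir: Path) -> list:
    """Return the required model files that are absent from `model_dir`."""
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


model_dir = ocr_model_dir()
missing = missing_model_files(model_dir)
if missing:
    print(f"Missing in {model_dir}: {', '.join(missing)}")
else:
    print(f"All OCR model files found in {model_dir}")
```

If files are missing, retry the download with the command in step 3.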
### Reporting Issues
When submitting an issue, please attach the log file. The log file is located at:
- Windows: `C:\Users\<Username>\AppData\Local\MaaXYZ\MaaMCP\debug\maa.log`
- macOS: `~/Library/Application Support/MaaXYZ/MaaMCP/debug/maa.log`
- Linux: `~/.local/share/MaaXYZ/MaaMCP/debug/maa.log`
## License
This project is licensed under the [GNU AGPL v3](LICENSE).
## Acknowledgments
- **[MaaFramework](https://github.com/MaaXYZ/MaaFramework)** - Provides a powerful automation framework
- **[FastMCP](https://github.com/jlowin/fastmcp)** - Simplifies MCP server development
- **[Model Context Protocol](https://modelcontextprotocol.io/)** - Defines AI tool integration standards
```