# Android MCP (Model Context Protocol) Framework
This is an Android device control and application self-learning framework based on the Model Context Protocol (MCP). It controls Android devices through natural language instructions and automatically learns how to operate the applications installed on them.
## Project Overview
The framework consists of the following modules:
1. **MCP core protocol**: Defines the basic data structures and operation types for device control
2. **Device communication interface**: Provides TCP communication with Android devices
3. **Application self-learning engine**: Automatically learns and remembers the UI structure and operation methods of Android applications
4. **Deep application exploration**: Provides deeper exploration of application UIs and detection of UI elements
5. **Natural language understanding**: Parses user instructions with an AI model and converts them into device operation sequences
6. **HTTP API**: Provides RESTful API endpoints for integration with other systems
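The core protocol and the natural language module meet at one point: the model's output has to become a list of typed device operations. The real classes in `mcp/mcp_protocol.py` are not shown in this README, so the following is a minimal sketch assuming the model returns a JSON array such as `[{"type": "click", "params": {...}}]`. `OperationType`, `Operation`, and `parse_operations` are hypothetical names, and the operation types simply reuse the capability names from `/register_device`.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class OperationType(str, Enum):
    """Hypothetical operation types, named after the capabilities sent to /register_device."""
    CLICK = "click"
    SWIPE = "swipe"
    TYPE_TEXT = "type_text"


@dataclass
class Operation:
    """One device action, such as tapping an element or typing text."""
    kind: OperationType
    params: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return {"type": self.kind.value, "params": self.params}


def parse_operations(model_output: str) -> List[Operation]:
    """Turn the model's JSON array into validated operations.

    Raises ValueError (json.JSONDecodeError is a subclass) for malformed
    JSON, a non-array payload, a missing "type", or an unknown type.
    """
    raw = json.loads(model_output)
    if not isinstance(raw, list):
        raise ValueError("expected a JSON array of operations")
    operations = []
    for item in raw:
        if not isinstance(item, dict) or "type" not in item:
            raise ValueError(f"operation without a type: {item!r}")
        # OperationType(...) raises ValueError for names it does not know.
        operations.append(Operation(OperationType(item["type"]), dict(item.get("params", {}))))
    return operations
```

Validating the model's output before anything is sent to the device means a hallucinated action name fails on the server instead of on the phone.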
## System Architecture
```
                  +-----------------+
                  |    HTTP APIs    |
                  +--------+--------+
                           |
         +-----------------v-----------------+
         |          Model Interface          |
         |  (natural language understanding  |
         |     and operation generation)     |
         +-----------------+-----------------+
                           |
+-------------+   +--------v--------+   +--------------+
| App Learner |<--|   MCP Context   |-->| MCP Protocol |
+------^------+   +--------+--------+   +--------------+
       |                   |
       |        +----------v----------+
       +------->|  App Deep Explorer  |
                +----------+----------+
                           |
                +----------v----------+
                |     MCP Server      |
                +----------+----------+
                           |
                +----------v----------+
                |   Android Device    |
                +---------------------+
```
## Technology Stack
- **Backend**: Python, Flask
- **AI model**: Any model accessible through the OpenAI SDK (OpenAI-compatible API)
- **Communication protocols**: Custom TCP protocol, HTTP RESTful API
- **Storage**: JSON files (application knowledge base)
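The wire format of the custom TCP protocol is not documented here. A common way to carry JSON messages over a TCP stream is length-prefixed framing, because TCP delivers bytes, not message boundaries. The sketch below assumes a 4-byte big-endian length header followed by a UTF-8 JSON body; the framing and all function names are hypothetical and not necessarily what the MCP server on port 8080 uses.

```python
import json
import socket
import struct
from typing import Any, Dict

# Assumed framing: 4-byte big-endian unsigned length, then a UTF-8 JSON body.
HEADER = struct.Struct(">I")


def encode_message(message: Dict[str, Any]) -> bytes:
    body = json.dumps(message, ensure_ascii=False).encode("utf-8")
    return HEADER.pack(len(body)) + body


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; one message may arrive in several recv() chunks."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock: socket.socket, message: Dict[str, Any]) -> None:
    sock.sendall(encode_message(message))


def recv_message(sock: socket.socket) -> Dict[str, Any]:
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

`ensure_ascii=False` keeps non-ASCII UI text (for example Chinese element labels) as raw UTF-8 instead of `\u` escapes; the length header counts bytes, so this is safe.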
## Installation Requirements
- Python 3.8+
- Flask
- OpenAI Python SDK (works with any model served through an OpenAI-compatible API)
- Android device or emulator (requires a matching client application to be installed)
## Installation Steps
1. Clone the repository to your local machine:
```bash
git clone https://github.com/lmee/mcp_for_android.git
cd mcp_for_android
```
2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Configure the API key:
Either update the API key for your model in `main.py`, or set the environment variable:
```bash
export DEEPSEEK_API_KEY="your-api-key-here"
```
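A minimal sketch of the key handling described in step 4: read `DEEPSEEK_API_KEY` first and fall back to a key hard-coded in `main.py`. `load_api_key`, `make_client`, and `DEEPSEEK_BASE_URL` are hypothetical names. DeepSeek exposes an OpenAI-compatible endpoint at `https://api.deepseek.com`, so the OpenAI SDK's `OpenAI(api_key=..., base_url=...)` constructor can point at it; the SDK is imported lazily so the key loader works even where the SDK is not installed.

```python
import os
from typing import Optional

# Assumption: DeepSeek's OpenAI-compatible endpoint; use another base URL for other providers.
DEEPSEEK_BASE_URL = "https://api.deepseek.com"


def load_api_key(env_var: str = "DEEPSEEK_API_KEY", fallback: Optional[str] = None) -> str:
    """Prefer the environment variable; otherwise use a key passed in from main.py."""
    key = os.environ.get(env_var) or fallback
    if not key:
        raise RuntimeError(f"set {env_var} or pass a fallback key")
    return key


def make_client(api_key: str, base_url: str = DEEPSEEK_BASE_URL):
    """Create an OpenAI SDK client pointed at an OpenAI-compatible endpoint."""
    from openai import OpenAI  # lazy import: only needed when a client is built
    return OpenAI(api_key=api_key, base_url=base_url)
```

With a client in hand, `client.chat.completions.create(model="deepseek-chat", messages=[...])` works the same way as against OpenAI's own API; substitute another provider's base URL and model name to use a different model.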
## Running the Server
Start the MCP server and Flask application:
```bash
python main.py
```
The server will run on the following ports:
- MCP TCP server: 8080
- HTTP API server: 5000
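To confirm that both servers came up after `python main.py`, a TCP connect test against the two ports is enough. This is a small standard-library sketch; `is_port_open` and `check_services` are hypothetical helper names, and the ports are the defaults listed above.

```python
import socket
from typing import Dict


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Default ports from this README.
SERVICES = {"MCP TCP server": 8080, "HTTP API server": 5000}


def check_services(host: str = "127.0.0.1") -> Dict[str, bool]:
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Running `print(check_services())` right after startup shows which of the two listeners is reachable.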
## API Interface Description
### Device Registration
```
POST /register_device
{
"device_id": "your-device-id",
"capabilities": ["click", "swipe", "type_text", ...]
}
```
### Execute Command
```
POST /execute
{
"device_id": "your-device-id",
"command": "Open WeChat and send a message to Zhang San",
"session_id": "optional-session-id"
}
```
### Learn Application
```
POST /learn_app
{
"device_id": "your-device-id",
"package_name": "com.example.app" // Optional, omit to learn all applications
}
```
### Text Analysis
```
POST /analyze
{
"text": "Open WeChat and send a message",
"device_id": "your-device-id" // Optional
}
```
### Get System Status
```
GET /status
```
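The endpoints above can also be called with only the standard library. This is a minimal sketch: `call_api`, `build_execute_payload`, and `build_learn_payload` are hypothetical helpers, and it assumes every endpoint replies with a JSON body, which this README does not specify. Optional fields are left out of the payload rather than sent as `null`, matching the "optional" and "omit to learn all applications" notes.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:5000"  # default HTTP API port from this README


def build_execute_payload(device_id: str, command: str,
                          session_id: Optional[str] = None) -> Dict[str, Any]:
    """Body for POST /execute; session_id is omitted when not given."""
    payload: Dict[str, Any] = {"device_id": device_id, "command": command}
    if session_id is not None:
        payload["session_id"] = session_id
    return payload


def build_learn_payload(device_id: str, package_name: Optional[str] = None) -> Dict[str, Any]:
    """Body for POST /learn_app; no package_name means learn all applications."""
    payload: Dict[str, Any] = {"device_id": device_id}
    if package_name is not None:
        payload["package_name"] = package_name
    return payload


def call_api(path: str, payload: Optional[Dict[str, Any]] = None,
             base_url: str = BASE_URL, timeout: float = 30.0) -> Any:
    """POST JSON when a payload is given, otherwise GET (e.g. /status)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if data is None else "POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `call_api("/execute", build_execute_payload("my-android-phone", "Open WeChat"))` posts a command, and `call_api("/status")` issues the GET request.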
## Client Settings
To use this framework, you need to install the matching MCP client application on your Android device. The client is responsible for:
1. Connecting to the MCP server
2. Receiving and executing operation instructions
3. Providing device UI status information
4. Supporting application exploration and learning
Client installation steps will be provided in a separate document.
## Application Self-Learning Function
One of the core features of this framework is the ability to automatically learn how to operate applications on the device. The learning process includes:
1. **App discovery**: Scanning the applications installed on the device
2. **UI exploration**: Launching each application and exploring its UI structure
3. **Element recognition**: Identifying key UI elements in the application (buttons, input boxes, etc.)
4. **Operation learning**: Learning common operations (search, play, navigation, etc.)
5. **Knowledge storage**: Saving the learned knowledge to the application knowledge base
Once an application has been learned, the system can map a user's natural language instructions to the corresponding operations in that application and execute them automatically.
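Step 5 and the technology stack say learned knowledge is stored as JSON files. The knowledge base's real schema is not documented, so this sketch assumes a single JSON object keyed by package name, holding whatever per-app structure the learner produces; the `elements`/`operations` keys in the test data are hypothetical. `save_app_knowledge` and `load_app_knowledge` are illustrative names. Writing to a temporary file and then calling `os.replace` keeps the knowledge base intact if the process dies mid-write.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_app_knowledge(kb_path: Path, package_name: str, knowledge: Dict[str, Any]) -> None:
    """Merge one application's learned knowledge into the JSON knowledge base."""
    kb = json.loads(kb_path.read_text(encoding="utf-8")) if kb_path.exists() else {}
    kb[package_name] = knowledge
    # Write to a temp file in the same directory, then atomically swap it in.
    fd, tmp = tempfile.mkstemp(dir=kb_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(kb, f, ensure_ascii=False, indent=2)
    os.replace(tmp, kb_path)


def load_app_knowledge(kb_path: Path, package_name: str) -> Dict[str, Any]:
    """Return the stored knowledge for one package, or {} if none is stored."""
    if not kb_path.exists():
        return {}
    return json.loads(kb_path.read_text(encoding="utf-8")).get(package_name, {})
```

Keeping everything in one file is the simplest layout; as the knowledge base grows, one file per package avoids rewriting every application's data on each save.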
## Example Usage
### 1. Control the device using natural language
```python
import requests
api_url = "http://localhost:5000/execute"
data = {
"device_id": "my-android-phone",
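"command": "Open WeChat and send 'Hello' to Zhang San"
}
response = requests.post(api_url, json=data, timeout=60)  # avoid hanging forever
response.raise_for_status()  # raise on HTTP errors instead of parsing an error page
print(response.json())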
```
### 2. Learn a specific application
```python
import requests
api_url = "http://localhost:5000/learn_app"
data = {
"device_id": "my-android-phone",
"package_name": "com.tencent.mm" # WeChat package name
}
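response = requests.post(api_url, json=data, timeout=600)  # app learning may be slow; adjust as needed
response.raise_for_status()  # raise on HTTP errors instead of parsing an error page
print(response.json())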
```
## Deep Application Exploration
Beyond basic application learning, the system provides deep application exploration, which can:
1. Wait for the application to finish loading
2. Detect more types of UI elements
3. Explore hierarchically, reaching additional screens by tapping key elements
4. Build a more complete application knowledge graph
Deep exploration gives the system more comprehensive knowledge of each application, which improves control accuracy.
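Items 1 and 3 above can be sketched as two small routines: a poll that treats the app as loaded once consecutive UI snapshots stop changing, and a breadth-first walk over screens with a depth limit. This is a sketch of the general technique, not the code in `app_deep_explorer.py`. The device is hidden behind hypothetical callbacks: `snapshot()` returns some hashable summary of the current UI, `clickable(screen)` lists a screen's key elements, and `tap(screen, element)` is assumed to navigate back to `screen`, tap the element, and return the screen it lands on.

```python
import time
from collections import deque
from typing import Callable, Dict, Hashable, List, Tuple


def wait_until_stable(snapshot: Callable[[], Hashable], stable_reads: int = 2,
                      interval: float = 0.5, timeout: float = 10.0) -> bool:
    """Return True once `stable_reads` consecutive snapshots are equal, False on timeout."""
    deadline = time.monotonic() + timeout
    last, same = snapshot(), 1
    while time.monotonic() < deadline:
        if same >= stable_reads:
            return True
        time.sleep(interval)
        current = snapshot()
        same = same + 1 if current == last else 1
        last = current
    return same >= stable_reads


def explore(start: str,
            clickable: Callable[[str], List[str]],
            tap: Callable[[str, str], str],
            max_depth: int = 2) -> Dict[str, List[Tuple[str, str]]]:
    """Breadth-first exploration; returns screen -> [(element, resulting screen)]."""
    graph: Dict[str, List[Tuple[str, str]]] = {start: []}
    queue = deque([(start, 0)])
    while queue:
        screen, depth = queue.popleft()
        if depth >= max_depth:
            continue  # record the screen but do not expand it further
        for element in clickable(screen):
            nxt = tap(screen, element)
            graph[screen].append((element, nxt))
            if nxt not in graph:  # visit each screen only once
                graph[nxt] = []
                queue.append((nxt, depth + 1))
    return graph
```

Breadth-first order covers all shallow screens before deeper ones, so even a small `max_depth` captures an app's main navigation, and the visited check keeps back buttons from causing loops.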
## Project Structure
```
android-mcp-framework/
├── mcp/
│ ├── mcp_protocol.py # Protocol definition
│ ├── mcp_interface.py # MCP server implementation
│ ├── model_interface.py # Model interface
│ └── route_handler.py # API route handling
├── app_learn/
│ ├── app_learner.py # Application learning engine
│ └── app_deep_explorer.py # Deep exploration module
├── main.py # Main program
├── requirements.txt # List of dependencies
└── README.md # Project description
```
## Precautions
1. This project demonstrates an approach rather than a finished product. The code is sample code and is not ready for out-of-the-box use.
2. This framework requires a matching Android client to work.
3. Some applications have anti-automation or security mechanisms that may block automated operations.
4. A valid DeepSeek API key is required (or a key for whichever OpenAI-compatible model is configured).
5. The application knowledge base grows as more applications are learned, so make sure there is enough storage space.
## Open Problems
1. The accuracy of application learning needs improvement.
2. With long prompts, model inference becomes very slow.
## Contribution Guide
Contributions and issue reports are welcome! Please follow these steps:
1. Fork the project
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Create a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details