# Windows MCP.Net
[English](README.en.md)
A .NET-based Windows desktop automation MCP (Model Context Protocol) server that provides AI assistants with the ability to interact with the Windows desktop environment.
## Table of Contents
- [Features](#features)
- [Use Cases](#use-cases)
- [Demo Screenshots](#demo-screenshots)
- [Tech Stack](#tech-stack)
- [Project Structure](#project-structure)
- [Feature Extension Suggestions](#feature-extension-suggestions)
- [Configuration](#configuration)
- [Contribution Guide](#contribution-guide)
- [Changelog](#changelog)
- [Support](#support)
## Quick Start
### Prerequisites
- Windows Operating System
- .NET 10.0 Runtime or higher
**Important Note**: This project requires .NET 10 to run. If it is not installed on your machine, download and install it from the [.NET 10 Download Page](https://dotnet.microsoft.com/en-us/download/dotnet/10.0).
### 1. MCP Client Configuration
Add the following to your MCP client's configuration file:
#### Using Globally Installed Tools (Recommended)
```json
{
"mcpServers": {
"WindowsMCP.Net": {
"type": "stdio",
"command": "dnx",
"args": ["WindowsMCP.Net@", "--yes"],
"env": {}
}
}
}
```
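If your client already has an MCP configuration file, the entry above should be merged into it rather than pasted over it. The following Python sketch shows one way to do that. The function name `add_server` and the sample `existing` configuration are illustrative assumptions; only the `WindowsMCP.Net` entry itself comes from this README.

```python
import json

# The server entry from the configuration above, expressed as a Python dict.
WINDOWS_MCP_ENTRY = {
    "type": "stdio",
    "command": "dnx",
    "args": ["WindowsMCP.Net@", "--yes"],
    "env": {},
}


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of `config` with `entry` registered under mcpServers[name].

    Existing servers are preserved; an entry with the same name is replaced.
    The input `config` is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = dict(entry)
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing client configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"type": "stdio", "command": "other"}}}
updated = add_server(existing, "WindowsMCP.Net", WINDOWS_MCP_ENTRY)
print(json.dumps(updated, indent=2))
```

Working on a copy keeps the original configuration intact, so a failed write does not leave the client with a half-edited file.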
#### Using Project Source Code Directly (Development Mode)
**Method 1: Workspace Configuration**
Create a `.vscode/mcp.json` file in the project root directory:
```json
{
"mcpServers": {
"Windows-MCP.Net-Dev": {
"type": "stdio",
"command": "dotnet",
"args": ["run", "--project", "src/Windows-MCP.Net.csproj"],
"cwd": "${workspaceFolder}",
"env": {}
}
}
}
```
**Method 2: User Configuration**
Run `MCP: Open User Configuration` from the VS Code Command Palette and add:
```json
{
"mcpServers": {
"Windows-MCP.Net-Local": {
"type": "stdio",
"command": "dotnet",
"args": ["run", "--project", "src/Windows-MCP.Net.csproj"],
"env": {}
}
}
}
```
> **Note**: Running from source is convenient for development and debugging: code changes take effect on the next run without reinstalling the tool. VS Code version 1.102+ supports automatic discovery and management of MCP servers.
### 2. Installation and Running
#### Method 1: Global Installation (Recommended)
```bash
dotnet tool install --global WindowsMCP.Net
```
#### Method 2: Running from Source Code
```bash
# Clone repository
git clone https://github.com/AIDotNet/Windows-MCP.Net.git
cd Windows-MCP.Net
# Build project
dotnet build
# Run project
dotnet run --project src/Windows-MCP.Net.csproj
```
### 3. Getting Started
After completing the configuration, restart your MCP client to start using the Windows desktop automation features.
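Under the hood, an MCP client talks to a stdio server by exchanging JSON-RPC 2.0 messages, one JSON object per line. The sketch below builds a `tools/call` request of the kind a client would write to the server's standard input. `WaitTool` is listed in this README, but the exact registered tool name and the `duration` argument are assumptions; ask the server for its real names and schemas via `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids should be unique within a session


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Assumed tool name and argument, for illustration only.
line = build_tool_call("WaitTool", {"duration": 2})
print(line, end="")
```

A real session begins with an `initialize` handshake before any tool call. MCP client applications and SDKs handle that for you, so this is only meant to show what travels over the pipe.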
## Features
### Core Features
- **Application Launch**: Launch applications from the start menu by name
- **PowerShell Integration**: Execute PowerShell commands and return results
- **Desktop State Capture**: Get the current desktop state, including active applications, UI elements, etc.
- **Clipboard Operations**: Copy and paste text content
- **Mouse Operations**: Click, drag, and move the mouse cursor
- **Keyboard Operations**: Text input, key operations, and shortcut combinations
- **Window Management**: Adjust window size, position, and switch applications
- **Scrolling Operations**: Perform scrolling operations at specified coordinates
- **Web Scraping**: Get web content and convert it to Markdown format
- **Browser Operations**: Open a specified URL in the default browser
- **Screenshot Function**: Capture the screen and save it to a temporary directory
- **File System Operations**: Create, read, write, copy, move, and delete files and directories
- **OCR Text Recognition**: Extract text from the screen or a specified region, and find text positions
- **System Control**: Adjust screen brightness, system volume, screen resolution, and other system settings
- **Wait Control**: Add delays between operations
### Supported Tools
#### Desktop Operation Tools
| Tool Name | Function Description |
|---------|----------|
| **LaunchTool** | Launch applications from the start menu |
| **PowershellTool** | Execute PowerShell commands and return status codes |
| **StateTool** | Capture desktop state information, including applications and UI elements |
| **ClipboardTool** | Clipboard copy and paste operations |
| **ClickTool** | Mouse click operations (supports left, right, and middle buttons; single, double, and triple clicks) |
| **TypeTool** | Input text at specified coordinates; supports clearing existing text and pressing Enter |
| **ResizeTool** | Adjust window size and position |
| **SwitchTool** | Switch to a specified application window |
| **ScrollTool** | Scroll at specified coordinates or current mouse position |
| **DragTool** | Drag from source coordinates to target coordinates |
| **MoveTool** | Move the mouse cursor to specified coordinates |
| **ShortcutTool** | Execute keyboard shortcut combinations |
| **KeyTool** | Press a single keyboard key |
| **WaitTool** | Pause execution for a specified number of seconds |
| **ScrapeTool** | Scrape web content and convert it to Markdown format |
| **ScreenshotTool** | Capture the screen, save it to a temporary directory, and return the image path |
| **OpenBrowserTool** | Open a specified URL in the default browser |
#### File System Tools
| Tool Name | Function Description |
|---------|----------|
| **ReadFileTool** | Read the content of a specified file |
| **WriteFileTool** | Write content to a file |
| **CreateFileTool** | Create a new file and write specified content |
| **CopyFileTool** | Copy a file to a specified location |
| **MoveFileTool** | Move or rename a file |
| **DeleteFileTool** | Delete a specified file |
| **GetFileInfoTool** | Get file information (size, creation time, etc.) |
| **ListDirectoryTool** | List files and subdirectories in a directory |
| **CreateDirectoryTool** | Create a new directory |
| **DeleteDirectoryTool** | Delete a directory and its content |
| **SearchFilesTool** | Search for files in a specified directory |
#### OCR Image Recognition Tools
| Tool Name | Function Description |
|---------|----------|
| **ExtractTextFromScreenTool** | Extract text from the entire screen using OCR |
| **ExtractTextFromRegionTool** | Extract text from a specified region of the screen using OCR |
| **FindTextOnScreenTool** | Find specified text on the screen using OCR |
| **GetTextCoordinatesTool** | Get the coordinate position of text on the screen |
| **ExtractTextFromFileTool** | Extract text from an image file using OCR |
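A common pattern is to locate text with an OCR tool and then click it with `ClickTool`. The sketch below shows the coordinate arithmetic involved, assuming the OCR result is a list of words with pixel bounding boxes. The `OcrWord` structure and its field names are hypothetical; the actual response format of `GetTextCoordinatesTool` may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OcrWord:
    """Hypothetical OCR hit: recognized text plus its bounding box in screen pixels."""
    text: str
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def find_click_point(words: list[OcrWord], target: str) -> Optional[tuple[int, int]]:
    """Return the center of the first word matching `target`, ignoring case."""
    wanted = target.casefold()
    for word in words:
        if word.text.casefold() == wanted:
            return (word.x + word.width // 2, word.y + word.height // 2)
    return None  # text not found on screen


words = [OcrWord("File", 10, 5, 40, 20), OcrWord("Save", 100, 200, 60, 24)]
print(find_click_point(words, "save"))  # (130, 212)
```

Clicking the center of the box rather than its top-left corner makes the click less sensitive to small OCR bounding-box errors.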
#### UI Element Recognition Tools
| Tool Name | Function Description |
|---------|----------|
| **FindElementByTextTool** | Find UI elements by text content |
| **FindElementByClassNameTool** | Find UI elements by class name |
| **FindElementByAutomationIdTool** | Find UI elements by automation ID |
| **GetElementPropertiesTool** | Get the properties of the element at specified coordinates |
| **WaitForElementTool** | Wait for a specified element to appear on the interface |
#### System Control Tools
| Tool Name | Function Description |
|---------|----------|
| **BrightnessTool** | Adjust screen brightness: increase, decrease, or set a specific percentage |
| **VolumeTool** | Adjust system volume: increase, decrease, or set a specific percentage |
| **ResolutionTool** | Set screen resolution (three presets: high, medium, low) |
## Use Cases
### AI Assistant Desktop Automation
- **Intelligent Customer Service Robot**: AI assistants can automatically operate Windows applications to help users complete complex desktop tasks
- **Voice Assistant Integration**: Use speech recognition to control desktop applications with voice commands
- **Intelligent Office Assistant**: AI assistants automatically handle daily office tasks, such as document sorting, email sending, etc.
### Office Automation
- **Data Entry Automation**: Automatically extract data from web pages or documents and enter it into Excel or other applications
- **Report Generation**: Automatically collect system information, screenshots, and generate formatted report documents
- **Batch File Processing**: Automatically organize, rename, and classify a large number of files and documents
- **Email Automation**: Automatically send regular reports and notification emails
### Software Testing and Quality Assurance
- **UI Automation Testing**: Simulate user operations and automatically test the functions of desktop applications
- **Regression Testing**: Automatically execute repetitive test cases to ensure software quality
- **Performance Monitoring**: Automatically collect application performance data and generate monitoring reports
- **Bug Reproduction**: Automatically reproduce user-reported issues and assist developers in debugging
### Business Process Automation
- **Customer Service**: Automatically process customer requests and update CRM systems
- **Order Processing**: Automatically collect order information from multiple channels and enter it into the system
- **Inventory Management**: Automatically update inventory data and generate replenishment reminders
- **Financial Reconciliation**: Automatically compare financial data from different systems and mark differences
### Data Collection and Analysis
- **Web Data Scraping**: Automatically collect product prices, news, and other information from multiple websites
- **Competitor Analysis**: Regularly collect competitor product information and price data
- **Market Research**: Automatically collect and organize market data and generate analysis reports
- **Social Media Monitoring**: Monitor brand mentions and automatically collect user feedback
### Game and Entertainment
- **Game Assistant**: Automatically execute repetitive game tasks (please follow game rules)
- **Live Streaming Assistant**: Automatically manage live streaming software, switch scenes, and send messages
- **Media Management**: Automatically organize music and video files and update media libraries
### Medical and Health
- **Medical Record Entry**: Automatically convert paper medical records to electronic format
- **Medical Image Analysis**: Use OCR to automatically extract key information from medical reports
- **Appointment Management**: Automatically process patient appointment requests and update hospital management systems
### Education and Training
- **Online Examination**: Automatically grade multiple-choice questions and generate score reports
- **Course Management**: Automatically update course information and send notifications to students
- **Learning Progress Tracking**: Automatically record student learning activities and generate progress reports
### Manufacturing and Logistics
- **Production Data Collection**: Automatically collect data from production equipment and update ERP systems
- **Quality Inspection**: Use image recognition to automatically inspect product quality
- **Logistics Tracking**: Automatically update cargo status and send tracking information to customers
### System Operation and Maintenance
- **Server Monitoring**: Automatically check server status and generate monitoring reports
- **Log Analysis**: Automatically analyze system logs and identify abnormal patterns
- **Backup Management**: Automatically execute data backup and verify backup integrity
- **Software Deployment**: Automate software installation and configuration processes
## Demo Screenshots
### Text Input Demo
Automatically input text in Notepad using TypeTool:

### Web Search Demo
Open and search web content using ScrapeTool:

### Demo Video
Complete desktop automation operation demo:
[Web Search Demo](assets/video.mp4)
## Tech Stack
- **.NET 10.0**: Based on the latest .NET framework
- **Model Context Protocol**: Uses MCP for communication
- **Microsoft.Extensions.Hosting**: Application hosting framework
- **Serilog**: Structured logging
- **HtmlAgilityPack**: HTML parsing and web scraping
- **ReverseMarkdown**: HTML to Markdown conversion
## Project Structure
```
src/
├── Windows-MCP.Net/  # Main project
│   ├── .mcp/  # MCP server configuration
│   │   └── server.json  # Server configuration file
│   ├── Exceptions/  # Custom exception classes (to be extended)
│   ├── Interface/  # Service interface definitions
│   │   ├── IDesktopService.cs  # Desktop service interface
│   │   ├── IFileSystemService.cs  # File system service interface
│   │   └── IOcrService.cs  # OCR service interface
│   ├── Models/  # Data models (to be extended)
│   ├── Prompts/  # Prompt templates (to be extended)
│   ├── Services/  # Core service implementations
│   │   ├── DesktopService.cs  # Desktop operation service
│   │   ├── FileSystemService.cs  # File system service
│   │   └── OcrService.cs  # OCR service
│   ├── Tools/  # MCP tool implementations
│   │   ├── Desktop/  # Desktop operation tools
│   │   │   ├── ClickTool.cs  # Click tool
│   │   │   ├── ClipboardTool.cs  # Clipboard tool
│   │   │   ├── DragTool.cs  # Drag tool
│   │   │   ├── GetWindowInfoTool.cs  # Window information tool
│   │   │   ├── KeyTool.cs  # Key tool
│   │   │   ├── LaunchTool.cs  # Launch tool
│   │   │   ├── MoveTool.cs  # Mouse move tool
│   │   │   ├── OpenBrowserTool.cs  # Browser open tool
│   │   │   ├── PowershellTool.cs  # PowerShell execution tool
│   │   │   ├── ResizeTool.cs  # Window resize tool
│   │   │   ├── ScrapeTool.cs  # Web scraping tool
│   │   │   ├── ScreenshotTool.cs  # Screenshot tool
│   │   │   ├── ScrollTool.cs  # Scroll tool
│   │   │   ├── ShortcutTool.cs  # Shortcut tool
│   │   │   ├── StateTool.cs  # Desktop state tool
│   │   │   ├── SwitchTool.cs  # Application switch tool
│   │   │   ├── TypeTool.cs  # Text input tool
│   │   │   ├── UIElementTool.cs  # UI element operation tool
│   │   │   └── WaitTool.cs  # Wait tool
│   │   ├── FileSystem/  # File system tools
│   │   │   ├── CopyFileTool.cs  # File copy tool
│   │   │   ├── CreateDirectoryTool.cs  # Directory creation tool
│   │   │   ├── CreateFileTool.cs  # File creation tool
│   │   │   ├── DeleteDirectoryTool.cs  # Directory deletion tool
│   │   │   ├── DeleteFileTool.cs  # File deletion tool
│   │   │   ├── GetFileInfoTool.cs  # File information tool
│   │   │   ├── ListDirectoryTool.cs  # Directory list tool
│   │   │   ├── MoveFileTool.cs  # File move tool
│   │   │   ├── ReadFileTool.cs  # File read tool
│   │   │   ├── SearchFilesTool.cs  # File search tool
│   │   │   └── WriteFileTool.cs  # File write tool
│   │   └── OCR/  # OCR recognition tools
│   │       ├── ExtractTextFromRegionTool.cs  # Region text extraction tool
│   │       ├── ExtractTextFromScreenTool.cs  # Screen text extraction tool
│   │       ├── FindTextOnScreenTool.cs  # Screen text search tool
│   │       └── GetTextCoordinatesTool.cs  # Text coordinate acquisition tool
│   ├── Program.cs  # Program entry point
│   └── Windows-MCP.Net.csproj  # Project file
└── Windows-MCP.Net.Test/  # Test project
    ├── DesktopToolsExtendedTest.cs  # Desktop tool extended test
    ├── FileSystemToolsExtendedTest.cs  # File system tool extended test
    ├── OCRToolsExtendedTest.cs  # OCR tool extended test
    ├── ToolTest.cs  # Tool basic test
    ├── UIElementToolTest.cs  # UI element tool test
    └── Windows-MCP.Net.Test.csproj  # Test project file
```
## Feature Extension Suggestions
### Planned Features
#### Advanced UI Recognition and Interaction
- **Enhanced UI Element Recognition**: Support more UI frameworks (WPF, WinForms, UWP)
- **OCR Text Recognition Optimization**: Multi-language support and improved recognition accuracy
- **Intelligent Waiting Mechanism**: Dynamically wait for elements to load
#### File System Operation Enhancement
- **Advanced File Search**: Support content search, regular expression matching
- **Batch File Operation**: Support batch copy, move, rename
- **File Monitoring**: Real-time monitoring of file system changes
#### System Monitoring and Performance Analysis
- **System Resource Monitoring**: CPU, memory, disk, and network usage
- **Process Management**: Process list retrieval, performance monitoring, and process control
- **Performance Analysis Report**: Generate detailed system performance reports
#### Multimedia Processing Capabilities
- **Audio Control**: System volume control, audio device management
- **Image Processing**: Image scaling, cropping, and format conversion
- **Screen Recording**: Support for screen recording and playback
#### Network and Communication Functions
- **Network Diagnosis**: Ping, port scanning, and connectivity testing
- **HTTP Client**: Support for RESTful API calls
- **WiFi Management**: WiFi network scanning and connection management
#### Security and Permission Management
- **Permission Check**: User permission verification and management
- **Data Encryption**: Sensitive data encryption storage
- **Operation Audit**: Complete operation logs and audit tracking
### Development Roadmap
#### Phase 1 (High Priority) - Core Function Enhancement
- ✅ UI Element Recognition Tool (Windows API implementation completed)
- 🔄 File Management Tool Enhancement
- 📋 System Monitoring Tool
- 🔒 Basic Security Tool
#### Phase 2 (Medium Priority) - Function Extension
- 📋 OCR Text Recognition Optimization
- 📋 Advanced File Search
- 📋 Audio Control Tool
- 📋 Network Diagnosis Tool
- 📋 Excel Operation Support
#### Phase 3 (Low Priority) - Advanced Functions
- 📋 Image Processing Tool
- 📋 Task Scheduling System
- 📋 Database Operation Support
- 📋 Macro Recording and Playback
## 🔧 Configuration
### Log Configuration
The project uses Serilog for logging, and log files are saved in the `logs/` directory:
- Console output: Real-time log display
- File output: Rolls over daily; files are retained for 31 days
- Log level: Debug and above
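To make the retention rule concrete: with one log file per day, keeping 31 files means anything older than roughly a month is pruned. The Python sketch below illustrates that rule only. It is not how Serilog is configured in this project, and the function name `files_to_prune` and the sample dates are assumptions for illustration.

```python
from datetime import date, timedelta


def files_to_prune(log_dates: list[date], keep: int = 31) -> list[date]:
    """Given the dates of existing daily log files, return those beyond the newest `keep`."""
    newest_first = sorted(set(log_dates), reverse=True)
    # Everything after the first `keep` entries is older than the retention window.
    return sorted(newest_first[keep:])


# Hypothetical directory state: 35 consecutive daily log files ending on 2025-03-01.
today = date(2025, 3, 1)
dates = [today - timedelta(days=n) for n in range(35)]
print([d.isoformat() for d in files_to_prune(dates)])
```

With 35 daily files and a limit of 31, the four oldest files are selected for removal.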
### Environment Variables
| Variable Name | Description | Default Value |
|--------|------|--------|
| `ASPNETCORE_ENVIRONMENT` | Runtime environment | `Production` |
## 📝 License
This project is open-sourced under the MIT License. See the [LICENSE](LICENSE) file for details.
## 🔗 Related Links
- [Model Context Protocol](https://modelcontextprotocol.io/)
- [.NET Documentation](https://docs.microsoft.com/dotnet/)
- [Windows API Documentation](https://docs.microsoft.com/windows/win32/)
## 🤝 Contribution Guide
We welcome community contributions! If you want to contribute to the project, please follow these steps:
### Development Environment Setup
1. **Clone the Repository**
```bash
git clone https://github.com/AIDotNet/Windows-MCP.Net.git
cd Windows-MCP.Net
```
2. **Install Dependencies**
```bash
dotnet restore
```
3. **Run Tests**
```bash
dotnet test
```
4. **Build the Project**
```bash
dotnet build
```
### Contribution Process
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Create a Pull Request
### Code Standards
- Follow C# coding standards
- Add unit tests for new features
- Update relevant documentation
- Ensure all tests pass
### Issue Reporting
When reporting issues, please provide:
- Operating system version
- .NET version
- Detailed error information
- Steps to reproduce
## 📞 Support
If you encounter issues or have suggestions, please:
1. Check [Issues](https://github.com/xuzeyu91/Windows-MCP.Net/issues)
2. Create a new Issue
3. Participate in discussions
4. Check the [Wiki](https://github.com/xuzeyu91/Windows-MCP.Net/wiki) for more help
**Note**: This tool requires appropriate Windows permissions to perform desktop automation operations. Use it only in a trusted environment.
**Disclaimer**: When using this tool for automation, comply with applicable laws and software usage agreements. The developer assumes no responsibility for misuse of the tool.