# MCP Framework Architecture Documentation
## 1. Overview
The MCP Framework (Model Context Protocol Framework) is a middleware system designed to connect large language model (LLM) services and tools. It provides an interface compatible with the OpenAI API, allowing client applications to interact with various LLM services through a standard API while integrating multiple tool services (MCP servers) to enhance the capabilities of AI assistants.
Core Features:
- Compatible with OpenAI's Chat Completion API (OpenAI Python SDK)
- Supports multiple LLM service providers
- Plugin-based tool integration architecture (ModelContextProtocol Python SDK)
- Session management
- Interactive tool invocation
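Because the framework mirrors OpenAI's Chat Completion API, any OpenAI-compatible client can talk to it. A minimal sketch of the request shape it accepts (the model name here is a placeholder, not something the framework mandates):

```python
import json

# A minimal chat-completion request body in the OpenAI-compatible format.
# "gpt-4o-mini" is a placeholder; any model exposed by the configured
# LLM service can be named here.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What time is it in Tokyo?"},
    ],
    "stream": True,  # the framework streams results back to the client
}

# The framework would receive this JSON at POST /chat/completions.
body = json.dumps(payload)
```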
## 2. System Architecture
The MCP Framework adopts a modular architecture design, primarily consisting of the following core components:
### System Architecture Diagram
```
                     Client Application
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                  MCP Framework (FastAPI)                   │
├───────────┬──────────────┬─────────────────┬───────────────┤
│ API Layer │ Chat Handler │ Session Manager │ Configuration │
├───────────┴──────────────┼─────────────────┴───────────────┤
│    LLM Service Client    │           MCP Manager           │
└────────────┬─────────────┴────────────────┬────────────────┘
             │                              │
             ▼                              ▼
    ┌─────────────────┐      ┌───────────────────────────┐
    │   LLM Service   │      │     MCP Tool Services     │
    │ (OpenAI, etc.)  │      │ (File System, Time, etc.) │
    └─────────────────┘      └───────────────────────────┘
```
Architecture Explanation:
- The client application interacts with the MCP Framework via HTTP API.
- The MCP Framework processes requests and coordinates LLM services and tool services.
- The LLM Service Client is responsible for communicating with different LLM providers.
- The MCP Manager is responsible for starting, managing, and invoking various tool services.
- The Session Manager maintains the client session state.
- The Configuration Manager handles system settings and parameters.
### 2.1 Core Components
#### LLM Service Client (LLMServiceClient)
Responsible for communicating with various LLM services (such as OpenAI, Anthropic, Ollama, LM Studio, etc.). Features include:
- Automatic detection of service types
- Unified API call interface
- Streaming response handling
- Health checks and model list queries
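The "automatic detection of service types" can be illustrated with a small heuristic. This is a hypothetical sketch based on well-known defaults (Ollama on port 11434, LM Studio on 1234); the real `LLMServiceClient` may also use health-check responses to decide:

```python
from urllib.parse import urlparse

def detect_service_type(base_url: str) -> str:
    """Guess the LLM service type from its base URL.

    Hypothetical sketch of the auto-detection feature; the actual
    LLMServiceClient may probe the service's health endpoint instead.
    """
    parsed = urlparse(base_url)
    host = parsed.hostname or ""
    if "openai.com" in host:
        return "openai"
    if "anthropic.com" in host:
        return "anthropic"
    if parsed.port == 11434:        # Ollama's default port
        return "ollama"
    if parsed.port == 1234:         # LM Studio's default port
        return "lmstudio"
    return "openai-compatible"      # fall back to the generic protocol
```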
#### MCP Manager (MCPManager)
Manages connections and tool invocations for multiple MCP tool servers. Main functions:
- Start and manage multiple tool servers
- Parse tool names and route tool calls
- Handle tool execution results
- Provide a list of all available tools in the system
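Routing a tool call requires mapping a tool name back to the server that owns it. A minimal sketch, assuming the manager namespaces tools as `<server>.<tool>` (the actual separator used by `MCPManager` may differ):

```python
def parse_tool_name(qualified_name: str) -> tuple[str, str]:
    """Split a qualified tool name into (server, tool).

    Assumes a "<server>.<tool>" naming convention for routing;
    this convention is an illustration, not the framework's spec.
    """
    server, sep, tool = qualified_name.partition(".")
    if not sep or not tool:
        raise ValueError(f"not a qualified tool name: {qualified_name!r}")
    return server, tool
```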
#### Session Manager (SessionManager)
Maintains the client session state. Functions include:
- Create and manage sessions
- Store session message history
- Handle session timeouts and automatic cleanup
- Limit the number of sessions
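The combination of timeout-based expiry and a session cap can be sketched as follows. This is a hypothetical implementation, not the framework's actual `SessionManager`; in particular, the oldest-session eviction policy is an assumption:

```python
import time
from collections import OrderedDict

class SessionManager:
    """Illustrative session store: timeout-based expiry plus a hard
    cap on the number of live sessions (oldest evicted first)."""

    def __init__(self, timeout: float = 3600.0, max_sessions: int = 100):
        self.timeout = timeout
        self.max_sessions = max_sessions
        self._sessions: OrderedDict[str, dict] = OrderedDict()

    def get_or_create(self, session_id: str) -> dict:
        self._evict_expired()
        session = self._sessions.get(session_id)
        if session is None:
            if len(self._sessions) >= self.max_sessions:
                self._sessions.popitem(last=False)   # drop the oldest
            session = {"messages": []}
            self._sessions[session_id] = session
        session["last_seen"] = time.monotonic()
        self._sessions.move_to_end(session_id)       # mark most recent
        return session

    def _evict_expired(self) -> None:
        now = time.monotonic()
        for sid in [sid for sid, s in self._sessions.items()
                    if now - s["last_seen"] > self.timeout]:
            del self._sessions[sid]
```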
#### Chat Handler (ChatHandler)
Handles chat requests from clients, coordinating LLM services and tool invocations. Main functions:
- Process chat completion requests
- Manage tool interactions
- Handle streaming responses
- Construct system prompts and context
#### Web API (FastAPI Application)
Provides a REST API interface compatible with OpenAI for client-system interaction. Main endpoints:
- `/chat/completions` - Main endpoint for chat completions
- `/models` - Retrieve the list of available models
- `/health` - System health check
### 2.2 Data Models
The system uses Pydantic models to define various data structures, primarily including:
- `ChatMessage` - Format for chat messages
- `ChatCompletionRequest` - Client request
- `ChatCompletionResponse` - API response
- `ModelObject` - Model information
- `ModelListResponse` - Model list response
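The framework defines these models with Pydantic; the sketch below uses standard-library dataclasses purely to stay dependency-free, and the field names follow the OpenAI schema rather than the framework's exact definitions:

```python
from dataclasses import dataclass

# Shapes of the two central models. The real definitions are Pydantic
# models with validation; field names here mirror the OpenAI API.

@dataclass
class ChatMessage:
    role: str          # "system" | "user" | "assistant" | "tool"
    content: str

@dataclass
class ChatCompletionRequest:
    model: str
    messages: list[ChatMessage]
    stream: bool = False

req = ChatCompletionRequest(
    model="gpt-4o-mini",
    messages=[ChatMessage(role="user", content="hello")],
)
```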
## 3. Workflow
### 3.1 Startup Process
1. The program entry point is `run.py`, which calls the `main()` function in `main.py`.
2. Set up the logging system.
3. Load the configuration file (`config.json`).
4. Create the FastAPI application.
5. Initialize the main components:
- LLM Service Client
- MCP Manager
- Session Manager
- Chat Handler
6. Start the configured MCP servers.
7. Start the web server.
### 3.2 Request Handling Process
1. The client sends a chat completion request to the `/chat/completions` endpoint.
2. The API layer receives the request and creates a session (if it does not exist).
3. The Chat Handler enhances the request (adding tool information and system prompts).
4. The request is sent to the LLM service.
5. Process the LLM response and check for tool invocations.
6. If there are tool invocations, execute the tools and collect results.
7. Add the tool results to the context.
8. Continue interacting with the LLM until the task is completed.
9. Return the final result to the client in a streaming manner.
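Steps 5-8 form a loop: call the LLM, run any requested tools, feed the results back, and repeat until the model produces a final answer. A minimal sketch of that loop, where `llm` and `execute_tool` are assumed interfaces (not the framework's actual signatures):

```python
def run_chat(llm, execute_tool, messages, max_rounds=5):
    """Sketch of the tool-interaction loop in steps 5-8 above.

    `llm(messages)` returns a dict with either "content" (final answer)
    or "tool_calls" (a list of {"name", "arguments"} dicts);
    `execute_tool(name, arguments)` runs one call and returns its result.
    Both interfaces are illustrative assumptions.
    """
    for _ in range(max_rounds):
        reply = llm(messages)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:                       # task completed
            return reply["content"]
        for call in tool_calls:                  # step 6: run the tools
            result = execute_tool(call["name"], call["arguments"])
            messages.append({"role": "tool",     # step 7: add to context
                             "name": call["name"],
                             "content": result})
    raise RuntimeError("tool interaction did not converge")
```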
## 4. MCP Tool Services
The MCP Framework supports various tool servers, each providing different types of tools. The servers interact with the framework via the stdio communication protocol.
Currently configured tool services include:
- Time Service (`time`) - Provides time-related functionalities
- File System Service (`filesystem`) - File and directory operations
- Browser Automation Service (`firecrawl`) - Web scraping and automation
- SQLite Database Service (`sqlite`) - Database operations
- Sequential Thinking Service (`sequential-thinking`) - Enhances model thinking capabilities
Tool services communicate with the framework through the MCP client library (`mcp`), providing initialization, tool lists, and execution functionalities.
## 5. Configuration Management
System configuration is managed using the `config.json` file, primarily including:
- Server Configuration - Host, port, CORS settings
- LLM Service Configuration - Service URL, default model, timeout, etc.
- MCP Server Configuration - Commands, parameters, environment variables, etc.
- Session Configuration - Timeout, maximum number of sessions
- Logging Configuration - Log level and output
Configuration is loaded and managed by the `config.py` module, providing global access to configuration settings.
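To make the categories concrete, here is an illustrative `config.json` fragment. The key names are assumptions inferred from the categories above, not the framework's actual schema:

```python
import json

# Illustrative config.json content covering the five categories above.
# Key names and server commands are examples, not the framework's schema.
example_config = """
{
  "server": {"host": "0.0.0.0", "port": 8000, "cors_origins": ["*"]},
  "llm_service": {"base_url": "http://localhost:11434",
                  "default_model": "llama3", "timeout": 60},
  "mcp_servers": {
    "time": {"command": "uvx", "args": ["mcp-server-time"]},
    "filesystem": {"command": "npx",
                   "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"]}
  },
  "session": {"timeout": 3600, "max_sessions": 100},
  "logging": {"level": "INFO"}
}
"""

config = json.loads(example_config)
```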
## 6. Special Features
### 6.1 Interactive Tool Invocation
The system supports multi-turn interactions between models and tools, implemented in `tool_interactive.py`:
- Store tool execution results
- Parse result references in parameters
- Provide context information to the model
This allows the model to:
1. Invoke tools to obtain information
2. View execution results
3. Decide the next action based on results
4. Reference previous results in subsequent tool invocations
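Step 4 — referencing earlier results — can be sketched as placeholder substitution inside tool arguments. The `{{result:<id>}}` syntax below is hypothetical; the actual convention in `tool_interactive.py` may differ:

```python
import re

def resolve_references(arguments: dict, results: dict) -> dict:
    """Replace "{{result:<id>}}" placeholders in string arguments
    with previously stored tool results.

    The placeholder syntax is an illustration, not necessarily
    the convention used by tool_interactive.py.
    """
    pattern = re.compile(r"\{\{result:(\w+)\}\}")

    def substitute(value):
        if isinstance(value, str):
            return pattern.sub(lambda m: str(results[m.group(1)]), value)
        return value

    return {key: substitute(value) for key, value in arguments.items()}
```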
### 6.2 LLM Service Adaptation
The system can automatically adapt to various LLM services:
- OpenAI API
- Azure OpenAI
- Anthropic
- Ollama
- LM Studio
- Qwen (Tongyi Qianwen)
Each service has specific authentication methods and parameter formats, which the system handles automatically.
## 7. Extensibility
The MCP Framework is designed to be easily extensible:
1. **Add New Tool Services**: Add new MCP server definitions in the configuration.
2. **Support New LLM Services**: Extend LLMServiceClient to add new service types.
3. **Increase API Functionality**: Add new endpoints in the FastAPI application.
4. **Customize System Prompts**: Modify prompt templates in ChatHandler.
## 8. Deployment
The system supports various deployment methods:
- **Local Run**: Directly execute `run.py`.
- **Docker Container**: Use the provided Dockerfile and docker-compose.yml.
- **Cloud Services**: Can be deployed on any cloud platform that supports Python.
Dependencies are defined in `requirements.txt`.
## 9. Security Considerations
- All requests to LLM services are proxied through the framework.
- MCP tool services run as subprocesses, limited by the framework's permissions.
- Sessions have a timeout mechanism to prevent resource exhaustion.
- CORS settings are supported to control access origins.
## 10. Conclusion
The MCP Framework provides a flexible and extensible architecture for integrating large language models and tool services. It enables developers to create powerful AI assistant applications while maintaining modularity and maintainability in the system.
Core Advantages:
- Standard interface compatible with OpenAI API
- Flexible tool integration mechanism
- Support for multiple LLM service providers
- Interactive tool invocation capabilities
- Modular and extensible architecture