# McpDocServer
[English Documentation](README_EN.md)
A documentation server based on the MCP protocol, designed for the documentation of various development frameworks. It provides multi-threaded document crawling, local document loading, keyword search, and document detail retrieval.
## Core Function Demonstration
### 1. Document Crawling Demonstration

*The complete document crawling process from configuration to executing `npm run crawl`*
### 2. MCP Server Call Demonstration

*The process of querying APIs in Cursor and obtaining accurate document results*
## Solving the Cursor Hallucination Problem
When using Cursor to develop with various frameworks, it is common to encounter "hallucination" problems caused by the AI's imprecise understanding of framework APIs:
- **Accuracy Issues**: AI may recommend non-existent or outdated framework APIs and components.
- **Version Confusion**: Mixing API documentation from different versions, leading to code that cannot run properly.
- **Parameter Errors**: Misunderstanding of method parameters, especially for framework-specific features.
- **Compatibility Misjudgment**: Inability to accurately determine the compatibility of an API across different environments or platforms.
This MCP server effectively addresses the above issues by providing precise document retrieval capabilities:
- **Real-time Accurate Queries**: Directly obtain the latest accurate API information from official documentation sources.
- **Contextual Association**: Display related API and component documentation together, providing complete references.
- **Precise Parameter Matching**: Provide complete method signatures and parameter lists to eliminate parameter errors.
- **Cross-platform Compatibility Marking**: Clearly indicate the compatibility of APIs across different platforms.
- **Example Code**: Provide official example code to ensure correct usage.
By integrating this MCP server, the accuracy and efficiency of Cursor when developing with these frameworks can be significantly improved, avoiding the development obstacles caused by hallucinations.
## Features
- Supports loading framework documentation data from local JSON files.
- Provides powerful document search functionality.
- Offers document detail queries.
- Automatically identifies available document sources.
- Supports targeted queries for specific document sources.
- Supports crawling external documents and automatically converting them into locally usable formats.
- Supports reloading documents (triggered by searching for "reload").
## Directory Structure
```
/
├── server.js              # Server entry file
├── docs/                  # Documentation data directory
│   ├── taro-docs.json     # Taro framework documentation
│   └── taroify-docs.json  # Taroify component library documentation
├── scripts/               # Scripts directory
│   └── crawl.js           # Document crawling script
├── tests/                 # Tests directory
│   └── mcp.test.js        # MCP test script
├── config/                # Configuration file directory
│   └── doc-sources.js     # Document source configuration
└── package.json           # Project configuration
```
### Installation and Running
1. Install Dependencies
Install the project dependencies with `npm install`. If Chrome is already installed locally and you want Puppeteer to use it instead of downloading its own Chromium, set the `PUPPETEER_SKIP_DOWNLOAD` environment variable before installing:
macOS/Linux:
```bash
export PUPPETEER_SKIP_DOWNLOAD=true
npm install
```
Windows (Command Prompt):
```bat
set PUPPETEER_SKIP_DOWNLOAD=true
npm install
```
Windows (PowerShell):
```powershell
$env:PUPPETEER_SKIP_DOWNLOAD = "true"
npm install
```
2. Crawling Document Data
The crawler is used to obtain framework documentation and is an important step before using the server. You need to create a crawler configuration file first, then run the crawler script.
### Creating Crawler Configuration
Create a `doc-sources.js` file in the `config` directory, following the format below:
```javascript
// config/doc-sources.js
// Document source configuration
export const docSources = [
  {
    // Document source name - used as the source parameter during search
    name: "taro",
    // Base URL of the documentation site
    url: "https://docs.taro.zone/docs",
    // Include patterns - specify the URL paths to crawl (an empty array means all pages)
    includePatterns: [],
    // Exclude patterns - specify the URL paths not to crawl (supports regular expressions)
    excludePatterns: [
      /\d\.x/, // Exclude version number pages
      /apis/   // Exclude API pages
    ]
  },
  {
    name: "taroify",
    url: "https://taroify.github.io/taroify.com/introduce/",
    includePatterns: [
      "/components/",    // All component pages
      "/components/*/",  // Component subpages
      "/components/*/*/" // Component sub-subpages
    ],
    excludePatterns: []
  },
  {
    name: "jquery",
    url: "https://www.jquery123.com/",
    includePatterns: [], // An empty array means crawl all pages
    excludePatterns: [
      /version/ // Exclude version-related pages
    ]
  }
];

// Global crawler configuration
export const crawlerConfig = {
  // Number of threads for concurrent crawling
  maxConcurrency: 40,
  // Page load timeout (milliseconds)
  pageLoadTimeout: 30000,
  // Content load timeout (milliseconds)
  contentLoadTimeout: 5000,
  // Whether to show the browser window (false for headless mode)
  headless: false,
  // Number of retries
  maxRetries: 3,
  // Retry delay (milliseconds)
  retryDelay: 2000,
  // Request delay (milliseconds)
  requestDelay: 1000
};
```
### Running the Crawler
After configuration is complete, execute the following command to start the crawler:
```bash
npm run crawl
```
The crawler will automatically crawl the specified documentation site according to the configuration and save the results in a JSON format that meets the MCP server requirements.
### Crawler Output Example
After the crawler completes, a JSON file in the following format will be generated in the `docs` directory:
```javascript
{
  "source": {
    "name": "taro",
    "url": "https://docs.taro.zone/docs"
  },
  "lastUpdated": "2024-05-20T12:00:00.000Z",
  "pages": {
    "https://docs.taro.zone/docs/components-desc": {
      "title": "Component Library Description | Taro Documentation",
      "content": "Page content...",
      "lastCrawled": "2024-05-20T12:00:00.000Z"
    },
    "https://docs.taro.zone/docs/components/viewcontainer/view": {
      "title": "View | Taro Documentation",
      "content": "The View component is a container component...",
      "lastCrawled": "2024-05-20T12:00:00.000Z"
    }
    // ... More pages
  }
}
```
### Customizing the Crawler
If you need to customize the crawler behavior, you can modify the `scripts/crawl.js` file. You can add parsing logic for specific websites, customize content handling, or enhance crawling capabilities.
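As an illustration, a site-specific extraction rule might look like the following. This is only a sketch: the `extractContent` helper and the selectors are hypothetical and do not reflect the actual structure of `scripts/crawl.js`; it simply shows how a Puppeteer-based extraction step could be specialized per documentation site.
```javascript
// Hypothetical helper - not the actual code in scripts/crawl.js.
// Shows how per-site extraction logic could be plugged into a Puppeteer-based crawler.
export async function extractContent(page, sourceName) {
  if (sourceName === "taroify") {
    // Assumed selectors for the Taroify docs theme; adjust to the real DOM.
    return page.evaluate(() => {
      const title = document.querySelector("h1")?.innerText ?? document.title;
      const body = document.querySelector("main")?.innerText ?? document.body.innerText;
      return { title, content: body };
    });
  }
  // Default: fall back to the page title and the full body text.
  return page.evaluate(() => ({
    title: document.title,
    content: document.body.innerText
  }));
}
```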
3. Start the MCP Server
```bash
npm start
```
After starting, the server detects and loads the document files in the `docs` directory and exposes its tools through the MCP protocol. It prints the loaded document sources and the number of pages for each.
4. Run Tests
```bash
npm test
```
Execute the test script to verify that the MCP server's basic functions and interfaces are working correctly.
## Document Format
Document files should be in JSON format, containing the following structure:
```javascript
{
  "source": {
    "name": "taro",
    "url": "https://docs.taro.zone/docs"
  },
  "lastUpdated": "2024-03-27T12:00:00.000Z",
  "pages": {
    "https://docs.taro.zone/docs/components-desc": {
      "title": "Component Library Description | Taro Documentation",
      "content": "Page content..."
    },
    // More pages...
  }
}
```
Document loading process (a minimal sketch follows the list):
1. When the server starts, it automatically detects and loads the JSON files in the `docs` directory.
2. If no documents are found in the project directory, it attempts to load them from the current working directory.
3. Each page is keyed by its URL, which also serves as the page ID, so no separate `url` field is needed.
4. All source names are automatically converted to lowercase to ensure consistency.
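A minimal sketch of what this loading step could look like (assuming Node's built-in `fs` and `path` modules; the real logic lives in `server.js` and may differ in detail):
```javascript
import fs from "node:fs";
import path from "node:path";

// Sketch of the document loading step: read every *.json file in docs/
// and index its pages by lowercase source name. Not the actual server.js code.
function loadDocs(docsDir = path.join(process.cwd(), "docs")) {
  const sources = new Map();
  for (const file of fs.readdirSync(docsDir)) {
    if (!file.endsWith(".json")) continue;
    const doc = JSON.parse(fs.readFileSync(path.join(docsDir, file), "utf8"));
    // Source names are normalized to lowercase; each page is keyed by its URL.
    sources.set(doc.source.name.toLowerCase(), doc.pages);
  }
  return sources;
}
```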
## Crawler Functionality
The system's built-in crawler supports fetching content from various framework official documentation sites and converting it into a locally usable document format. Crawler features include:
1. **Multi-site Support**: Supports documentation sites for any framework and library, fully configurable.
2. **Selective Crawling**: Can configure include and exclude patterns to precisely control the content to be crawled.
3. **Intelligent Content Extraction**: Automatically identifies the title, body content, and structure of documentation pages.
4. **Multi-threaded Crawling**: Supports high-concurrency crawling to improve efficiency (see the sketch after this list).
5. **Automatic Conversion**: Converts crawled content into a standard document JSON format.
6. **Fault Tolerance Mechanism**: Provides timeout handling and retry mechanisms to enhance stability.
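To illustrate the concurrency model in point 4, a simple worker-pool pattern is enough to cap how many pages are fetched in parallel. This is a sketch only, using a placeholder `crawlPage` function; the real crawler in `scripts/crawl.js` also handles retries, request delays, and Puppeteer page management.
```javascript
// Worker-pool sketch: crawl `urls` with at most `maxConcurrency` pages in flight.
// `crawlPage` is a placeholder for the real per-page crawling logic.
async function crawlAll(urls, crawlPage, maxConcurrency = 40) {
  const queue = [...urls];
  const results = [];
  const workers = Array.from({ length: maxConcurrency }, async () => {
    while (queue.length > 0) {
      const url = queue.shift();
      results.push(await crawlPage(url));
    }
  });
  await Promise.all(workers);
  return results;
}
```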
## MCP Tools
The server provides the following MCP tools:
1. `search_docs` - Search documents
   - Parameters:
     - `query`: Search keyword (string, required)
     - `source`: Document source name (string, optional)
     - `limit`: Maximum number of results (number, optional, default 10)
   - Special Function:
     - When the query is "reload", it will trigger the reloading of documents.
2. `get_doc_detail` - Get document details
   - Parameters:
     - `id`: Document ID (string, required)
     - `source`: Document source name (string, optional)
## Usage Example
```javascript
// Search documents
const searchRequest = {
  jsonrpc: "2.0",
  id: "search1",
  method: "tools/call",
  params: {
    name: "search_docs",
    arguments: {
      query: "组件",
      source: "taro",
      limit: 5
    }
  }
};

// Get document details
const detailRequest = {
  jsonrpc: "2.0",
  id: "detail1",
  method: "tools/call",
  params: {
    name: "get_doc_detail",
    arguments: {
      id: "https://docs.taro.zone/docs/components-desc",
      source: "taro"
    }
  }
};

// Reload documents
const reloadRequest = {
  jsonrpc: "2.0",
  id: "reload1",
  method: "tools/call",
  params: {
    name: "search_docs",
    arguments: {
      query: "reload"
    }
  }
};
```
## Configuring Cursor
To use this server in Cursor, you need to add the following configuration to `mcp.json`:
```json
{
  "mcpServers": {
    "Document MCP Server": {
      "command": "node",
      "args": ["/absolute/path/server.js"],
      "env": { "NODE_ENV": "development" }
    }
  }
}
```
> Note: Please ensure to use the complete absolute path of the server file, not a relative path. The server will automatically output a configuration example suitable for Cursor upon startup.
## Testing
The project includes automated tests, which can be run with the following command:
```bash
npm test
```
The tests will check the basic functionality of the server:
- Initialize the MCP server
- Call the search tool
- Call the document detail tool
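For reference, a test can drive the server by spawning it as a child process and exchanging JSON-RPC messages over stdio. The snippet below is only a sketch that assumes newline-delimited JSON over the standard MCP stdio transport; the actual handshake and assertions live in `tests/mcp.test.js`.
```javascript
import { spawn } from "node:child_process";

// Sketch only: spawn the server and send JSON-RPC requests over stdio.
// A full MCP client also sends the "notifications/initialized" notification
// after the initialize response; that is omitted here for brevity.
const server = spawn("node", ["server.js"], { stdio: ["pipe", "pipe", "inherit"] });
server.stdout.on("data", (chunk) => console.log("response:", chunk.toString()));

const send = (msg) => server.stdin.write(JSON.stringify(msg) + "\n");

send({ jsonrpc: "2.0", id: 1, method: "initialize",
       params: { protocolVersion: "2024-11-05", capabilities: {}, clientInfo: { name: "test-client", version: "0.0.1" } } });
send({ jsonrpc: "2.0", id: 2, method: "tools/call",
       params: { name: "search_docs", arguments: { query: "View", limit: 3 } } });
```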
## Future Plans
The project is under continuous development, and here are the features we plan to add:
1. **Local Document Loading** - Add support for directly loading and parsing local document files without relying on network resources.
2. **Internationalization Support** - Add support for multi-language documentation.
If you have feature suggestions or encounter issues, feel free to submit an Issue or Pull Request.