# NCBI Model Context Protocol (MCP)
A Python implementation of the Model Context Protocol for interacting with NCBI databases.
## Setup
1. Clone this repository
2. Install dependencies:
```
pip install -r requirements.txt
```
3. Create a `.env` file with your NCBI API key and contact email:
```
NCBI_API_KEY=your_api_key_here
NCBI_EMAIL=your_email@example.com
```
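How `ncbi_mcp.py` reads these two values is not shown here. As a minimal sketch, assuming the file uses plain `KEY=VALUE` lines, the following standard-library code loads them and fails early if either is missing. The helper names `load_env_file` and `ncbi_credentials` are hypothetical and are not part of this repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into a dict.

    Hypothetical helper: blank lines and lines starting with '#' are skipped.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def ncbi_credentials(path=".env"):
    """Return (api_key, email), preferring real environment variables."""
    file_values = load_env_file(path)
    api_key = os.environ.get("NCBI_API_KEY") or file_values.get("NCBI_API_KEY")
    email = os.environ.get("NCBI_EMAIL") or file_values.get("NCBI_EMAIL")
    missing = [
        name
        for name, value in (("NCBI_API_KEY", api_key), ("NCBI_EMAIL", email))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return api_key, email


# Example (requires a .env file in the working directory):
# api_key, email = ncbi_credentials()
```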
## Running the MCP Server
```
python ncbi_mcp.py
```
## Using with Cursor/Claude
Once the MCP server is running, you can interact with it using natural language in Cursor/Claude.
### Using Natural Language Queries
To run a search or retrieve information, send a `tools/call` request to the `nlp-query` tool:
```
tools/call
{
  "name": "nlp-query",
  "arguments": {
    "query": "Find research articles about BRCA1"
  }
}
```
Or, more simply, address the server directly in chat:
```
@ncbi-mcp Find research articles about BRCA1
```
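In the Model Context Protocol, a `tools/call` invocation travels as a JSON-RPC 2.0 request whose `params` carry the tool `name` and `arguments`. The sketch below builds such a request for `nlp-query`. The helper `make_tool_call` is hypothetical, and whether `ncbi_mcp.py` expects an `initialize` handshake before the first call is not stated in this README.

```python
import json


def make_tool_call(name, arguments, request_id=1):
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = make_tool_call(
    "nlp-query", {"query": "Find research articles about BRCA1"}
)

# One request per line is the usual framing for a stdio-based server.
line = json.dumps(request)
print(line)
```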
### Example Natural Language Queries
Here are some example natural language queries you can try:
1. Gene function information:
```
@ncbi-mcp Please summarize the function of TNF-alpha
```
2. Genome size and statistics:
```
@ncbi-mcp How big is the genome for Saccharomyces cerevisiae?
```
3. Assembly statistics:
```
@ncbi-mcp What are the reported L50 and N50 statistics for the most recent E. coli genome?
```
4. Dataset counts:
```
@ncbi-mcp How many datasets are available in the biosample database for b16f10 mouse melanoma cells?
```
5. Search for scientific articles:
```
@ncbi-mcp Find the latest research on COVID-19 vaccines
```
6. Get gene information:
```
@ncbi-mcp Tell me about the BRCA1 gene
```
7. Fetch genome information:
```
@ncbi-mcp Get genome information for Homo sapiens
```
## Testing
To test the MCP server with various queries, use the included `run_test.bat` script (Windows) with the bundled test files:
```
# Test natural language query functionality (default)
.\run_test.bat
# Test all tools
.\run_test.bat all
# Test specific test file
.\run_test.bat test_all_tools.jsonl
# Test high-level tools
.\run_test.bat test_high_level_tools.jsonl
```
The test script will:
1. Start the MCP server in the background
2. Send the test requests from the specified file
3. Wait a few seconds to allow processing
4. Terminate the server and display the output
This approach is used because the MCP server is designed to run continuously as a service. For manual testing without automatic termination, you can use:
```
# Run manually with any test file
type test_nlp_query.jsonl | python ncbi_mcp.py
```
The test files contain example JSON-RPC requests that simulate how Cursor/Claude would interact with the MCP server.
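The start, send, wait, and terminate sequence described above can also be reproduced without the batch file. The following is a rough cross-platform sketch, not the logic of `run_test.bat` itself. The function `run_requests` is hypothetical, and it assumes the server reads one request per line from standard input.

```python
import subprocess


def run_requests(command, request_lines, timeout=10.0):
    """Start `command`, send each request as one stdin line, return its output.

    If the process is still running after `timeout` seconds (a long-lived
    server may not exit when stdin closes), it is killed and whatever it
    printed so far is returned.
    """
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    # Normalise line endings so every request ends with exactly one newline.
    payload = "".join(line.rstrip("\n") + "\n" for line in request_lines)
    try:
        output, _ = proc.communicate(payload, timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        output, _ = proc.communicate()
    return output


# Example (requires the server script and a test file in the working directory):
# with open("test_nlp_query.jsonl", encoding="utf-8") as fh:
#     print(run_requests(["python", "ncbi_mcp.py"], fh))
```

Using `communicate` with a timeout avoids a fixed sleep: the call returns as soon as the process exits and only falls back to killing it when the time limit is reached.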
## Available Tools
The NCBI MCP provides both high-level tools that understand natural language and low-level tools for direct database interaction.
## Tool Usage Guidelines for LLMs
### Recommended Workflow Patterns
**For most biological queries, start with `nlp-query`.** It handles complex questions and automatically routes them to the appropriate specialized tool.
**Common Research Workflows:**
1. **Gene Analysis Workflow:**
   - Start with `nlp-query` for general gene questions
   - Use `summarize-gene` for comprehensive gene information
   - Use `get_gene_info` for detailed structured data
   - Use `ncbi-search` + `ncbi-fetch` for specific database queries
2. **Genome Analysis Workflow:**
   - Use `genome-stats` for organism genome statistics
   - Use `get_genome_info` for detailed genome metadata
   - Use `count-datasets` to explore available genome assemblies
3. **Literature Research Workflow:**
   - Use `nlp-query` for natural language literature searches
   - Use `ncbi-search` with database="pubmed" for precise searches
   - Use `ncbi-fetch` to get full publication details
4. **Dataset Discovery Workflow:**
   - Use `count-datasets` to assess data availability
   - Use `nlp-query` to explore datasets with natural language
   - Use `ncbi-search` for systematic database exploration
5. **E-utilities Workflow (Advanced):**
   - Use `ncbi-info` to discover available databases
   - Use `ncbi-global-query` to see which databases contain your search term
   - Use `ncbi-search` to find specific UIDs in target databases
   - Use `ncbi-summary` to get overview information about records
   - Use `ncbi-fetch` to retrieve complete records
   - Use `ncbi-link` to find related records across databases
6. **Cross-Database Analysis Workflow:**
   - Use `ncbi-search` to find genes of interest
   - Use `ncbi-link` to find related proteins, structures, or literature
   - Use `ncbi-summary` to get metadata about related records
   - Use `ncbi-fetch` to retrieve detailed information
### Tool Selection Guide
**High-Level Tools (Recommended for most users):**
- **`nlp-query`**: Use for general biological questions, complex queries, and when you're unsure which tool to use
- **`summarize-gene`**: Use for comprehensive gene analysis and understanding gene function
- **`genome-stats`**: Use for genome size, assembly quality, and organism comparison
- **`count-datasets`**: Use for research planning and data availability assessment
- **`get_gene_info`**: Use for detailed, structured gene information
- **`get_genome_info`**: Use for detailed, structured genome information
**Low-Level E-utilities Tools (For advanced users):**
- **`ncbi-search` (ESearch)**: Use for precise database searches with specific filters, Boolean operators, and field qualifiers
- **`ncbi-fetch` (EFetch)**: Use to retrieve complete records after searching, supports multiple formats (GenBank, FASTA, XML)
- **`ncbi-summary` (ESummary)**: Use to get document summaries without fetching complete records
- **`ncbi-link` (ELink)**: Use to find related records across databases (e.g., gene to protein, protein to structure)
- **`ncbi-info` (EInfo)**: Use to discover available databases and their capabilities
- **`ncbi-global-query` (EGQuery)**: Use to search across all databases simultaneously
- **`ncbi-spell` (ESpell)**: Use to get spelling suggestions for search terms
- **`ncbi-citation-match` (ECitMatch)**: Use to find PMIDs from citation information
### Biological Context and Terminology
**Understanding NCBI Databases:**
- **Gene**: Contains gene records with symbols, names, functions, and genomic locations
- **Protein**: Contains protein sequences and annotations
- **Nucleotide**: Contains DNA/RNA sequences (genes, transcripts, genomic regions)
- **PubMed**: Contains scientific literature and publications
- **BioSample**: Contains biological sample metadata (tissues, cell lines, etc.)
- **BioProject**: Contains research project information
- **SRA**: Contains raw sequencing data
- **Assembly**: Contains genome assembly information
**Common Biological Terms:**
- **Gene Symbol**: Short abbreviation (e.g., BRCA1, TP53, TNF)
- **Gene ID**: Unique NCBI identifier (e.g., 672 for BRCA1)
- **Accession**: Unique sequence identifier (e.g., NM_001126114.3)
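- **N50/L50**: Assembly contiguity metrics. N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly; L50 is the number of contigs in that set (a larger N50 and a smaller L50 generally indicate a more contiguous assembly)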
- **Reference Genome**: High-quality representative genome for a species
- **Organism**: Use scientific names (Homo sapiens) or common names (human)
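To make the N50/L50 entry concrete, here is a small worked example using the standard definitions: sort contigs from longest to shortest and accumulate lengths until at least half of the total assembly length is covered. The function name `n50_l50` and the contig lengths are illustrative only and are not taken from this repository or from NCBI data.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total, taken from the
         longest contig downward, first reaches half of the assembly size.
    L50: number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count


# Total length is 290, so the threshold is 145.
# 80 + 70 = 150 reaches it after two contigs.
print(n50_l50([80, 70, 50, 40, 30, 20]))  # (70, 2)
```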
**Search Strategies:**
- Use specific gene symbols for precise results
- Include organism names to avoid ambiguity
- Use Boolean operators (AND, OR, NOT) for complex searches
- Use field qualifiers like [Gene], [Organism], [Protein Name] for targeted searches
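The strategies above can be combined into a single search term. As a minimal sketch, the hypothetical helpers below attach a field qualifier to each value, quote multi-word values, and join the parts with a Boolean operator. The qualifier names are the ones listed in this README; the string produced would go in the `term` argument of `ncbi-search`.

```python
def field_term(value, field):
    """Attach an Entrez-style field qualifier, quoting multi-word values."""
    value = value.strip()
    if " " in value:
        value = f'"{value}"'
    return f"{value}[{field}]"


def combine(terms, operator="AND"):
    """Join already-qualified terms with a Boolean operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(terms)


term = combine(
    [field_term("BRCA1", "Gene"), field_term("Homo sapiens", "Organism")]
)
print(term)  # BRCA1[Gene] AND "Homo sapiens"[Organism]
```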
### High-Level Tools
#### Natural Language Query Processor
```
tools/call
{
  "name": "nlp-query",
  "arguments": {
    "query": "Please summarize the function of TNF-alpha"
  }
}
```
#### Gene Summarizer
```
tools/call
{
  "name": "summarize-gene",
  "arguments": {
    "gene_name": "BRCA1"
  }
}
```
#### Genome Statistics
```
tools/call
{
  "name": "genome-stats",
  "arguments": {
    "organism": "Escherichia coli"
  }
}
```
#### Dataset Counter
```
tools/call
{
  "name": "count-datasets",
  "arguments": {
    "database": "biosample",
    "query": "mouse melanoma b16f10"
  }
}
```
### Low-Level Tools
#### Search NCBI Databases
```
tools/call
{
  "name": "ncbi-search",
  "arguments": {
    "database": "pubmed",
    "term": "BRCA1",
    "filters": {
      "organism": "Homo sapiens",
      "date_range": {
        "start": "2020"
      }
    }
  }
}
```
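For orientation, `ncbi-search` corresponds to the public ESearch endpoint of the NCBI E-utilities. The sketch below builds an ESearch URL directly, using only the documented `db`, `term`, `retmode`, `retmax`, `api_key`, and `email` parameters. How the server translates the `filters` object into a query is not documented in this README, so that step is deliberately left out, and `esearch_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(database, term, api_key=None, email=None, retmax=20):
    """Build an ESearch request URL; no network request is made here."""
    params = {"db": database, "term": term, "retmode": "json", "retmax": retmax}
    # Credentials are optional for ESearch but raise the allowed request rate.
    if api_key:
        params["api_key"] = api_key
    if email:
        params["email"] = email
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(esearch_url("pubmed", 'BRCA1 AND "Homo sapiens"[Organism]'))
```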
#### Fetch NCBI Records
```
tools/call
{
  "name": "ncbi-fetch",
  "arguments": {
    "database": "gene",
    "ids": ["70"],
    "rettype": "gb"
  }
}
```
#### Get Gene Information
```
tools/call
{
  "name": "get_gene_info",
  "arguments": {
    "gene_id": "672"
  }
}
```
#### Get Genome Information
```
tools/call
{
  "name": "get_genome_info",
  "arguments": {
    "organism": "Homo sapiens",
    "reference": true
  }
}
```
## License
Apache-2.0