# Epstein Files RAG MCP Server
A Model Context Protocol (MCP) server that provides Retrieval-Augmented Generation (RAG) capabilities over the Epstein Files dataset from HuggingFace.
## Features
- 🔍 Semantic search over 20K+ Epstein Files documents
- 🚀 Runs entirely on CPU and RAM
- 💾 Vector storage on NVME via Qdrant Docker
- 🎯 Uses `all-MiniLM-L6-v2` embedding model
- 📦 Zero local files needed - run directly via `uv`
## Prerequisites
1. **Docker** - for running Qdrant
2. **UV** - Python package manager
Install UV:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
## Quick Start
### 1. Start Qdrant Docker Container
```bash
# Create storage directory on your NVME
mkdir -p /path/to/nvme/qdrant_storage

# Run Qdrant
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v /path/to/nvme/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
```
### 2. Configure Your LLM Client
Add this to your MCP servers configuration (e.g., Claude Desktop config):
```json
{
  "mcpServers": {
    "epstein-rag": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/justinlime/epstein-rag-mcp.git",
        "epstein-rag-mcp"
      ],
      "env": {
        "QDRANT_HOST": "localhost",
        "QDRANT_PORT": "6333"
      }
    }
  }
}
```
### 3. Restart Your LLM Client
That's it! The server will automatically:
- Download and install dependencies
- Load the embedding model
- Fetch the dataset from HuggingFace (first run only)
- Create embeddings and index them in Qdrant
- Be ready to answer queries
## Usage
Once configured, your LLM can use the `search_epstein_files` tool:
**Example queries:**
- "Find documents mentioning flight logs"
- "Search for references to specific individuals"
- "What documents discuss financial transactions?"
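Under the hood, the client issues a standard MCP `tools/call` JSON-RPC request. The envelope below follows the MCP specification; the `arguments` payload is illustrative (the tool may accept additional parameters, so check the tool's advertised schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_epstein_files",
    "arguments": {
      "query": "flight logs"
    }
  }
}
```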
## Configuration
### Environment Variables
- `QDRANT_HOST` - Qdrant server host (default: `localhost`)
- `QDRANT_PORT` - Qdrant server port (default: `6333`)
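When running the server outside an MCP client (e.g. for debugging), the same variables can be set in the shell before launching. A minimal sketch, assuming you remapped Qdrant to host port `6335`:

```shell
# Point the server at a Qdrant instance on a non-default port.
export QDRANT_HOST=localhost
export QDRANT_PORT=6335
```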
### First Run
The first time you run the server, it will:
1. Download the `all-MiniLM-L6-v2` model (~80MB)
2. Fetch the Epstein Files dataset from HuggingFace (~100MB)
3. Generate embeddings for all documents (10-30 minutes on CPU)
4. Index them in Qdrant
Subsequent runs start much faster: the embeddings are persisted in Qdrant, so the indexing step is skipped.
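Conceptually, the semantic search step boils down to comparing embedding vectors by cosine similarity and returning the closest documents. A stdlib-only sketch of that ranking (the toy 3-dimensional vectors stand in for the 384-dimensional `all-MiniLM-L6-v2` embeddings; the real server delegates this to Qdrant):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Toy data: doc 0 points the same way as the query, doc 2 is orthogonal.
query = [1.0, 0.0, 0.0]
docs = [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(top_k(query, docs))  # doc 0 ranks first (similarity 1.0), then doc 1
```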
## Development
### Local Installation
```bash
git clone https://github.com/justinlime/epstein-rag-mcp.git
cd epstein-rag-mcp
uv pip install -e .
```
### Running Locally
```bash
uv run epstein-rag-mcp
```
## Docker Management
```bash
# View logs
docker logs qdrant
# Stop Qdrant
docker stop qdrant
# Start Qdrant
docker start qdrant
# Access web UI
# Open http://localhost:6333/dashboard
```
## Troubleshooting
### "Failed to connect to Qdrant"
Make sure the Docker container is running:
```bash
docker ps | grep qdrant
```
### Port Already in Use
Remove the existing container if the `qdrant` name is already taken, then change the host port mapping:
```bash
docker rm -f qdrant
docker run -d --name qdrant -p 6335:6333 ...
```
Then update `QDRANT_PORT` to `6335` in your config.
### Slow Indexing
This is normal on CPU. The first run can take 10-30 minutes depending on your hardware. Reduce `BATCH_SIZE` in the code if you run out of memory.
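The indexing loop presumably encodes documents in fixed-size batches (the `BATCH_SIZE` mentioned above); smaller batches trade throughput for lower peak memory. A stdlib-only sketch of that batching pattern, with `encode` as a hypothetical stand-in for the real sentence-transformers call:

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks; the last chunk may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def encode(batch):
    # Stand-in for model.encode(batch); returns one dummy vector per document.
    return [[float(len(doc))] for doc in batch]

docs = [f"document {i}" for i in range(10)]
embeddings = []
for batch in batched(docs, batch_size=4):  # smaller batch_size => less peak RAM
    embeddings.extend(encode(batch))
print(len(embeddings))  # one embedding per document
```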
## Architecture
```
┌─────────────┐
│ LLM Client  │
└──────┬──────┘
       │ MCP Protocol
┌──────▼──────────────┐
│  epstein-rag-mcp    │
│ ┌─────────────────┐ │
│ │ Sentence        │ │
│ │ Transformer     │ │
│ │ (all-MiniLM-L6) │ │
│ └────────┬────────┘ │
│ ┌────────▼────────┐ │
│ │ Dataset Loader  │ │
│ │ (HuggingFace)   │ │
│ └─────────────────┘ │
└──────────┬──────────┘
     ┌─────▼─────┐
     │  Qdrant   │
     │  Docker   │
     └─────┬─────┘
     ┌─────▼─────┐
     │   NVME    │
     └───────────┘
```
## License
MIT
## Contributing
Contributions welcome! Please open an issue or PR.
## Credits
- Dataset: [tensonaut/EPSTEIN_FILES_20K](https://huggingface.co/datasets/tensonaut/EPSTEIN_FILES_20K)
- Embedding Model: [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- Vector Store: [Qdrant](https://qdrant.tech/)