## 🇬🇧 English Description
### 🌟 Design Philosophy & Motivation
**"Your memory belongs to you, not the cloud."**
I built this project with a simple yet powerful goal: **Total Data Sovereignty**.
In an era of subscription-based AI services and cloud dependencies, I wanted a solution that is:
1. **100% Local & Private:** No data ever leaves your machine. No API fees, no privacy risks.
2. **Permanent:** As long as your hard drive exists, your AI's memory exists. No fear of service shutdowns.
3. **Infinite Capacity:** The only limit is your local disk space.
4. **High Performance:** Utilizing local GPU acceleration (TensorRT/CUDA) for lightning-fast embedding and retrieval.
This is a **Memory Context Protocol (MCP)** server that gives your AI (like Gemini CLI, Claude Desktop) a persistent, searchable, and evolving long-term memory.
### ✨ Key Features
* **Hybrid Search Architecture:** Combines **LanceDB** (Vector Search for semantic understanding) and **SQLite FTS5** (Full-Text Search for exact keyword matching) for high-precision recall.
* **Hardware Acceleration:** Powered by ONNX Runtime with TensorRT/CUDA execution providers for millisecond-level embedding generation.
* **Standard MCP Tools:**
* `save_memory`: Store snippets, code, docs, or personal facts (with automatic duplicate detection).
* `search_memory`: Semantic & keyword retrieval.
* `list_memories`: View recent entries.
* `delete_memory`: Manage and clean up data.
* `update_memory`: Update existing memory by ID.
* **Lazy Loading:** Optimized startup time with on-demand resource initialization.
* **Zero Cost:** Runs entirely on your existing hardware.
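The hybrid search described above can be sketched with a small Reciprocal Rank Fusion routine (the function name and document ids here are illustrative, not the project's actual API): each backend returns a best-first list of ids, and a document's fused score is the sum of `1 / (k + rank)` across lists.

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: merge several ranked result lists.

    Each list holds document ids ordered best-first; a document's fused
    score is the sum of 1 / (k + rank) over every list it appears in.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Vector search (LanceDB) and keyword search (FTS5) each return ids:
vector_hits = ["m3", "m1", "m7"]
keyword_hits = ["m1", "m9", "m3"]
print(rrf_fuse([vector_hits, keyword_hits]))  # → ['m1', 'm3', 'm9', 'm7']
```

Documents that appear high in both lists (like `m1`) win out over documents that rank first in only one list, which is why RRF is a robust default for fusing heterogeneous rankers.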
### 🛠️ Prerequisites
* **OS:** Windows (tested)
* **Python:** 3.10 or higher.
* **Hardware:** NVIDIA GPU recommended (for TensorRT/CUDA acceleration), but works on CPU.
* **MCP Client:** [Gemini CLI](https://github.com/google-gemini/gemini-cli), [Claude Desktop](https://claude.ai/download), or any IDE that supports MCP configuration.
### 🚀 Installation & Setup
#### 1. Clone the Repository
```bash
git clone https://github.com/YanZiBin/Local-memory-mcp.git
cd Local-memory-mcp
```
#### 2. Create a Python Environment (Conda Recommended)
To ensure GPU libraries work correctly, Conda is highly recommended.
```bash
conda create -n Local-memory-mcp python=3.10
conda activate Local-memory-mcp
```
#### 3. Install Dependencies
```bash
pip install fastmcp lancedb onnxruntime-gpu transformers numpy uvicorn
```
*(Note: If you don't have a GPU, install `onnxruntime` instead of `onnxruntime-gpu`)*
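To confirm which accelerators your install actually picked up, you can query ONNX Runtime's available execution providers (the `describe_providers` helper is illustrative; `get_available_providers()` is the real API):

```python
def describe_providers(providers):
    """Map ONNX Runtime provider names to short human-readable labels."""
    known = {
        "TensorrtExecutionProvider": "TensorRT (fastest, NVIDIA)",
        "CUDAExecutionProvider": "CUDA (NVIDIA GPU)",
        "CPUExecutionProvider": "CPU (always available)",
    }
    return [known.get(p, p) for p in providers]

try:
    import onnxruntime as ort
    print(describe_providers(ort.get_available_providers()))
except ImportError:
    print("onnxruntime is not installed yet")
```

If you only see `CPU (always available)` on a GPU machine, you likely installed `onnxruntime` instead of `onnxruntime-gpu`, or the CUDA/TensorRT libraries are missing from your environment.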
#### 4. Download the Embedding Model
This project uses `BAAI/bge-m3` converted to ONNX format, available [here](https://huggingface.co/Xenova/bge-m3/tree/main). You need to download the model files into the `bge-m3-onnx` directory.
You can use `huggingface-cli` or manually download these files:
* `config.json`
* `model.onnx`
* `model.onnx_data`
* `vocab.txt`
* `sentence_transformers.onnx`
* `sentence_transformers.onnx_data`
* `tokenizer_config.json`
* `tokenizer.json`
Place them inside a folder named `bge-m3-onnx` in the project root.
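Before starting the server, a small stdlib-only sanity check (not part of the project) can confirm every required file landed in the right place:

```python
from pathlib import Path

REQUIRED = [
    "config.json", "model.onnx", "model.onnx_data", "vocab.txt",
    "sentence_transformers.onnx", "sentence_transformers.onnx_data",
    "tokenizer_config.json", "tokenizer.json",
]

def missing_model_files(model_dir="bge-m3-onnx", required=REQUIRED):
    """Return the required model files not yet present in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_model_files()
    print("All model files present." if not missing else f"Missing: {missing}")
```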
### 🏃‍♂️ Running the Server
Since this server uses heavy local models, we recommend the **Manual Start (SSE Mode)** for stability.
1. **Start the Server:**
Open a terminal and run:
```bash
python server.py
```
Wait until you see: `INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)`
2. **Connect your Client (e.g., Gemini CLI):**
Edit your Gemini CLI configuration file (usually at `~/.gemini/settings.json` or `%USERPROFILE%\.gemini\settings.json` on Windows):
```json
{
"mcpServers": {
"local-memory": {
"url": "http://localhost:8000/sse",
"type": "sse"
}
}
}
```
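If you prefer to script the edit, a small stdlib-only helper (illustrative, not part of the project) can merge the entry into an existing `settings.json` without clobbering other configuration:

```python
import json
from pathlib import Path

def add_mcp_server(settings_path, name, url):
    """Merge an SSE MCP server entry into a settings.json file,
    preserving any configuration already present."""
    path = Path(settings_path)
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings.setdefault("mcpServers", {})[name] = {"url": url, "type": "sse"}
    path.write_text(json.dumps(settings, indent=2))

# Example (Gemini CLI on Linux/macOS):
# add_mcp_server(Path.home() / ".gemini" / "settings.json",
#                "local-memory", "http://localhost:8000/sse")
```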
3. **Start using it!**
Open Gemini CLI and try:
> "Save this memory: My project uses Python 3.10."
> "Search my memories for 'project'."
### 🗺️ Roadmap
We are currently transitioning to **Phase 4 (Memory Management)**.
- [x] **Phase 1: Prototype**
- Initialize MCP server with `fastmcp`.
- Basic in-memory storage (dict).
- Implement basic `save_memory` & `search_memory` (keyword matching).
- Manual connection testing with Gemini CLI.
- [x] **Phase 2: Persistence**
- Integrate **SQLite FTS5** and **LanceDB**.
- Integrate local **ONNX embedding model**.
- Dual-storage architecture (Full-text + Vector).
- `list_memories` and `delete_memory` tools.
- [x] **Phase 3: Intelligent Retrieval (Completed)**
- [x] Implement **RRF (Reciprocal Rank Fusion)** algorithm for high-quality hybrid search.
- [x] Add similarity thresholds & Top-K limits to reduce noise.
- [ ] Implement **Contextual Retrieval** (Enhanced context storage).
- [ ] (Optional) Add Reranker for higher precision.
- [ ] **Phase 4: Memory Management**
- [x] Duplicate detection on save (similarity-based).
- [ ] Lifecycle management (Time decay, Conflict tagging).
- [ ] **Phase 5: Advanced Optimization**
- [ ] Expose **Resources**: Project summaries, ADR (Architecture Decision Records) guardrails.
- [ ] Architectural guardrails (Recall ADRs on violation) & Task chain tracking.
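The similarity-based duplicate detection shipped in Phase 4 can be sketched as a cosine-similarity threshold over embeddings (the threshold value and function names below are illustrative, not the project's actual implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec, stored_vecs, threshold=0.95):
    """Flag a new memory as a duplicate if its embedding is nearly
    identical to any stored embedding."""
    return any(cosine(new_vec, v) >= threshold for v in stored_vecs)
```

In practice the nearest-neighbor lookup would go through the vector index rather than a linear scan, but the accept/reject decision is the same thresholded comparison.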
**License:** MIT
**Author:** YanZiBin