## 🇬🇧 English Description
### 🌟 Design Philosophy & Motivation
**"Your memory belongs to you, not the cloud."**
I built this project with a simple yet powerful goal: **Total Data Sovereignty**.
In an era of subscription-based AI services and cloud dependencies, I wanted a solution that is:
1. **100% Local & Private:** No data ever leaves your machine. No API fees, no privacy risks.
2. **Permanent:** As long as your hard drive exists, your AI's memory exists. No fear of service shutdowns.
3. **Infinite Capacity:** The only limit is your local disk space.
4. **High Performance:** Utilizing local GPU acceleration (TensorRT/CUDA) for lightning-fast embedding and retrieval.
This is a **Memory Context Protocol (MCP)** server that gives your AI (like Gemini CLI, Claude Desktop) a persistent, searchable, and evolving long-term memory.
### ✨ Key Features
* **Hybrid Search Architecture:** Combines **LanceDB** (Vector Search for semantic understanding) and **SQLite FTS5** (Full-Text Search for exact keyword matching) for high-precision recall.
* **Hardware Acceleration:** Powered by ONNX Runtime with TensorRT/CUDA execution providers for millisecond-level embedding generation.
* **Standard MCP Tools:**
* `save_memory`: Store snippets, code, docs, or personal facts (with automatic duplicate detection).
* `search_memory`: Semantic & keyword retrieval.
* `list_memories`: View recent entries.
* `delete_memory`: Manage and clean up data.
* `update_memory`: Update existing memory by ID.
* **Lazy Loading:** Optimized startup time with on-demand resource initialization.
* **Zero Cost:** Runs entirely on your existing hardware.
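The hybrid search described above can be sketched with a small Reciprocal Rank Fusion routine (the function name and document ids here are illustrative, not the project's actual API): each backend returns a best-first list of ids, and a document's fused score is the sum of `1 / (k + rank)` across lists.

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: merge several ranked result lists.

    Each list holds document ids ordered best-first; a document's fused
    score is the sum of 1 / (k + rank) over every list it appears in.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Vector search (LanceDB) and keyword search (FTS5) each return ids:
vector_hits = ["m3", "m1", "m7"]
keyword_hits = ["m1", "m9", "m3"]
print(rrf_fuse([vector_hits, keyword_hits]))  # → ['m1', 'm3', 'm9', 'm7']
```

Documents that appear high in both lists (like `m1`) win out over documents that rank first in only one list, which is why RRF is a robust default for fusing heterogeneous rankers.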
### 🛠️ Prerequisites
* **OS:** Windows (tested)
* **Python:** 3.10 or higher.
* **Hardware:** NVIDIA GPU recommended (for TensorRT/CUDA acceleration), but works on CPU.
* **MCP Client:** [Gemini CLI](https://github.com/google-gemini/gemini-cli), [Claude Desktop](https://claude.ai/download), or any IDE that supports MCP configuration.
### 🚀 Installation & Setup
#### 1. Clone the Repository
```bash
git clone https://github.com/YanZiBin/Local-memory-mcp.git
cd Local-memory-mcp
```
#### 2. Create a Python Environment (Conda Recommended)
To ensure GPU libraries work correctly, Conda is highly recommended.
```bash
conda create -n Local-memory-mcp python=3.10
conda activate Local-memory-mcp
```
#### 3. Install Dependencies
```bash
pip install fastmcp lancedb onnxruntime-gpu transformers numpy uvicorn
```
*(Note: If you don't have a GPU, install `onnxruntime` instead of `onnxruntime-gpu`)*
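To confirm which accelerators your install actually picked up, you can query ONNX Runtime's available execution providers (the `describe_providers` helper is illustrative; `get_available_providers()` is the real API):

```python
def describe_providers(providers):
    """Map ONNX Runtime provider names to short human-readable labels."""
    known = {
        "TensorrtExecutionProvider": "TensorRT (fastest, NVIDIA)",
        "CUDAExecutionProvider": "CUDA (NVIDIA GPU)",
        "CPUExecutionProvider": "CPU (always available)",
    }
    return [known.get(p, p) for p in providers]

try:
    import onnxruntime as ort
    print(describe_providers(ort.get_available_providers()))
except ImportError:
    print("onnxruntime is not installed yet")
```

If you only see `CPU (always available)` on a GPU machine, you likely installed `onnxruntime` instead of `onnxruntime-gpu`, or the CUDA/TensorRT libraries are missing from your environment.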
#### 4. Download the Embedding Model
This project uses `BAAI/bge-m3` converted to ONNX format, available [here](https://huggingface.co/Xenova/bge-m3/tree/main). You need to download the model files into the `bge-m3-onnx` directory.
You can use `huggingface-cli` or manually download these files:
* `config.json`
* `model.onnx`
* `model.onnx_data`
* `vocab.txt`
* `sentence_transformers.onnx`
* `sentence_transformers.onnx_data`
* `tokenizer_config.json`
* `tokenizer.json`
Place them inside a folder named `bge-m3-onnx` in the project root.
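Before starting the server, a small stdlib-only sanity check (not part of the project) can confirm every required file landed in the right place:

```python
from pathlib import Path

REQUIRED = [
    "config.json", "model.onnx", "model.onnx_data", "vocab.txt",
    "sentence_transformers.onnx", "sentence_transformers.onnx_data",
    "tokenizer_config.json", "tokenizer.json",
]

def missing_model_files(model_dir="bge-m3-onnx", required=REQUIRED):
    """Return the required model files not yet present in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_model_files()
    print("All model files present." if not missing else f"Missing: {missing}")
```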
### 🏃‍♂️ Running the Server
Since this server uses heavy local models, we recommend the **Manual Start (SSE Mode)** for stability.
1. **Start the Server:**
Open a terminal and run:
```bash
python server.py
```
Wait until you see: `INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)`
2. **Connect your Client (e.g., Gemini CLI):**
Edit your Gemini CLI configuration file (usually at `~/.gemini/settings.json` or `%USERPROFILE%\.gemini\settings.json` on Windows):
```json
{
"mcpServers": {
"local-memory": {
"url": "http://localhost:8000/sse",
"type": "sse"
}
}
}
```
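If you prefer to script the edit, a small stdlib-only helper (illustrative, not part of the project) can merge the entry into an existing `settings.json` without clobbering other configuration:

```python
import json
from pathlib import Path

def add_mcp_server(settings_path, name, url):
    """Merge an SSE MCP server entry into a settings.json file,
    preserving any configuration already present."""
    path = Path(settings_path)
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings.setdefault("mcpServers", {})[name] = {"url": url, "type": "sse"}
    path.write_text(json.dumps(settings, indent=2))

# Example (Gemini CLI on Linux/macOS):
# add_mcp_server(Path.home() / ".gemini" / "settings.json",
#                "local-memory", "http://localhost:8000/sse")
```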
3. **Start using it!**
Open Gemini CLI and try:
> "Save this memory: My project uses Python 3.10."
> "Search my memories for 'project'."
### 🗺️ Roadmap
We are currently transitioning to **Phase 4 (Memory Management)**.
- [x] **Phase 1: Prototype**
- Initialize MCP server with `fastmcp`.
- Basic in-memory storage (dict).
- Implement basic `save_memory` & `search_memory` (keyword matching).
- Manual connection testing with Gemini CLI.
- [x] **Phase 2: Persistence**
- Integrate **SQLite FTS5** and **LanceDB**.
- Integrate local **ONNX embedding model**.
- Dual-storage architecture (Full-text + Vector).
- `list_memories` and `delete_memory` tools.
- [x] **Phase 3: Intelligent Retrieval (Completed)**
- [x] Implement **RRF (Reciprocal Rank Fusion)** algorithm for high-quality hybrid search.
- [x] Add similarity thresholds & Top-K limits to reduce noise.
- [ ] Implement **Contextual Retrieval** (Enhanced context storage).
- [ ] (Optional) Add Reranker for higher precision.
- [ ] **Phase 4: Memory Management**
- [x] Duplicate detection on save (similarity-based).
- [ ] Lifecycle management (Time decay, Conflict tagging).
- [ ] **Phase 5: Advanced Optimization**
- [ ] Expose **Resources**: Project summaries, ADR (Architecture Decision Records) guardrails.
- [ ] Architectural guardrails (Recall ADRs on violation) & Task chain tracking.
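The similarity-based duplicate detection shipped in Phase 4 can be sketched as a cosine-similarity threshold over embeddings (the threshold value and function names below are illustrative, not the project's actual implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec, stored_vecs, threshold=0.95):
    """Flag a new memory as a duplicate if its embedding is nearly
    identical to any stored embedding."""
    return any(cosine(new_vec, v) >= threshold for v in stored_vecs)
```

In practice the nearest-neighbor lookup would go through the vector index rather than a linear scan, but the accept/reject decision is the same thresholded comparison.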
**License:** MIT
**Author:** YanZiBin