# ReasoningBank MCP Server
<!-- TOC -->
- [ReasoningBank MCP Server](#reasoningbank-mcp-server)
  - [🌟 Features](#-features)
  - [🏗️ Architecture Design](#️-architecture-design)
  - [🚀 Quick Start](#-quick-start)
    - [1. Pull the Code and Enter the Project Root Directory](#1-pull-the-code-and-enter-the-project-root-directory)
    - [2. Install Dependencies](#2-install-dependencies)
    - [3. Configure MCP Client](#3-configure-mcp-client)
    - [4. MCP Client Prompt Examples](#4-mcp-client-prompt-examples)
    - [5. Command Line Arguments](#5-command-line-arguments)
  - [🔧 Configuration File (Optional)](#-configuration-file-optional)
  - [🔧 MCP Tools](#-mcp-tools)
    - [`retrieve_memory`](#retrieve_memory)
    - [`extract_memory`](#extract_memory)
  - [⚙️ Configuration Instructions](#️-configuration-instructions)
    - [Retrieval Strategies](#retrieval-strategies)
    - [Paper-faithful Mode](#paper-faithful-mode)
    - [LLM Provider](#llm-provider)
    - [Memory Management System (v0.2.0+)](#memory-management-system-v020)
  - [📖 Usage Examples](#-usage-examples)
    - [Basic Usage](#basic-usage)
    - [Multi-Agent Isolation](#multi-agent-isolation)
    - [Mind2Web Integration](#mind2web-integration)
    - [MaTTS (Memory-aware Test-Time Scaling)](#matts-memory-aware-test-time-scaling)
  - [🔬 Development](#-development)
    - [Paper Reproduction & Evaluation](#paper-reproduction--evaluation)
    - [Running Tests](#running-tests)
    - [Code Formatting](#code-formatting)
  - [📚 References](#-references)
  - [📝 License](#-license)
  - [📋 Changelog](#-changelog)
<!-- /TOC -->
As large language model agents are increasingly deployed in persistent real-world roles, they naturally encounter continuous streams of tasks. A key limitation, however, is their inability to learn from accumulated interaction history, which forces them to discard valuable insights and repeat past mistakes. Based on the paper [ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory](https://arxiv.org/abs/2509.25140), this project implements that memory-enhanced reasoning system, giving AI agents experiential memory management through the Model Context Protocol (MCP).
ReasoningBank proposes a novel memory framework that distills generalizable reasoning strategies from successes and failures the agent judges for itself. At test time, agents retrieve relevant memories from ReasoningBank to guide their interactions, then integrate the newly learned knowledge back, growing stronger over time. This memory-driven accumulation of experience creates a new dimension of growth for agents, allowing them to self-evolve and exhibit emergent behaviors.
## 🌟 Features
### Core Features
- ✅ **Memory Extraction**: Automatically extract reasoning experiences from successful and failed trajectories
- ✅ **Intelligent Retrieval**: Supports various retrieval strategies (cosine similarity, hybrid scoring, etc.)
- ✅ **Multi-Tenant Isolation**: Achieves memory isolation between different Agents through agent_id
- ✅ **Dual Transport Modes**: Supports both STDIO and SSE transports
- ✅ **Asynchronous Processing**: Memory extraction supports asynchronous mode, non-blocking for AI agents
- ✅ **Multi-Model Support**: DashScope (Tongyi Qianwen), OpenAI, Claude, etc.
- ✅ **Flexible Expansion**: Plugin architecture, easy to extend new retrieval strategies and storage backends
- ✅ **Memory Isolation**: Supports Claude's SubAgent mode, where each SubAgent independently manages its own memory
### Intelligent Memory Management (v0.2.0+)
- ✅ **Automatic Deduplication**: Prevents duplicate experience storage, supports semantic deduplication
- ✅ **Intelligent Merging**: Extracts similar experiences into general rules (LLM-driven or voting-based)
- ✅ **Experience Archiving**: Merged original experiences are traceable, supporting auditing
- ✅ **Background Processing**: Deduplication and merging are automatically executed in the background, without blocking the main process
- ✅ **Space Optimization**: Saves 50-80% storage space through deduplication and merging
## 🏗️ Architecture Design
```
reasoning-bank-mcp/
├── src/
│   ├── server.py                  # MCP server entry point
│   ├── config.py                  # Configuration management
│   ├── tools/                     # MCP tools
│   │   ├── retrieve_memory.py     # Retrieve memory
│   │   └── extract_memory.py      # Extract memory
│   ├── retrieval/                 # Retrieval strategies
│   │   ├── base.py                # Abstract interface
│   │   ├── factory.py             # Strategy factory
│   │   └── strategies/            # Concrete strategy implementations
│   ├── deduplication/             # Deduplication strategies (v0.2.0+)
│   │   ├── base.py                # Abstract interface
│   │   ├── factory.py             # Strategy factory
│   │   └── strategies/
│   │       ├── hash_dedup.py      # Hash deduplication
│   │       └── semantic_dedup.py  # Semantic deduplication
│   ├── merge/                     # Merging strategies (v0.2.0+)
│   │   ├── base.py                # Abstract interface
│   │   ├── factory.py             # Strategy factory
│   │   └── strategies/
│   │       ├── llm_merge.py       # LLM intelligent merging
│   │       └── voting_merge.py    # Voting-based selection
│   ├── services/                  # Service layer (v0.2.0+)
│   │   └── memory_manager.py      # Memory management service
│   ├── storage/                   # Storage backend
│   │   ├── base.py                # Abstract interface
│   │   └── backends/              # Concrete storage implementations
│   ├── llm/                       # LLM client
│   │   ├── base.py                # Abstract interface
│   │   ├── factory.py             # Provider factory
│   │   └── providers/             # Concrete provider implementations
│   ├── prompts/                   # Prompt templates
│   └── utils/                     # Utility functions
└── data/                          # Data storage directory
    ├── memories.json              # Memory database
    ├── archived_memories.json     # Archived memories (v0.2.0+)
    └── embeddings.json            # Embedding vectors
```
## 🚀 Quick Start
### 1. Pull the Code and Enter the Project Root Directory
```bash
git clone https://github.com/hanw39/ReasoningBank-MCP.git
cd ReasoningBank-MCP
```
### 2. Install Dependencies
```bash
pip install -e .
```
### 3. Configure MCP Client
#### Method 1: STDIO Mode (Applicable to Claude Desktop, Cursor, Qoder, Cherry Studio, etc.)
```json
{
  "mcpServers": {
    "reasoning-bank": {
      "command": "reasoning-bank-mcp",
      "env": {
        "DASHSCOPE_API_KEY": "your DashScope (Bailian) API key"
      }
    }
  }
}
```
#### Method 2: SSE Mode (Applicable to Claude Desktop, Cursor, Qoder, Cherry Studio, etc.)
**1) Start the server**:
```bash
# Use the default configuration (127.0.0.1:8000)
python3 -m src.server --transport sse

# Or specify a host and port
python3 -m src.server --transport sse --host 0.0.0.0 --port 8080
```
**2) Client Configuration**:
```json
{
  "mcpServers": {
    "reasoning-bank": {
      "url": "http://127.0.0.1:8000/sse"
    }
  }
}
```
### 4. MCP Client Prompt Examples
#### Qoder
```markdown
You are an intelligent assistant with reasoning memory capabilities, and your agent_id is `Qoder`. When calling MCP tools, you must pass `agent_id="Qoder"`. You have two core MCP tools:
1. `retrieve_memory`: retrieves relevant experiences at the start of a task or when its direction changes.
2. `extract_memory`: extracts and saves experiences after a task ends or fails.
You must strictly adhere to the following code of conduct:

[Memory Strategy Rules]
1. **Call `retrieve_memory` before starting a task**
   - Before executing any complex task (writing code, analyzing, planning, debugging, summarizing, etc.), call this tool first.
   - The input `query` should describe the current task objective or the user's needs.
   - If the current task differs significantly in topic, goal, or context from the previous task, call `retrieve_memory` again.
2. **Call `retrieve_memory` when the task direction changes**
   - When the user modifies the task objective, problem direction, target file, or context, immediately call `retrieve_memory` again.
   - Ensure you are always reasoning from the latest contextual memory.
3. **Call `extract_memory` after the task ends**
   - When you believe the task is complete (i.e., before outputting the final answer or solution), you must call `extract_memory`.
   - The trajectory should include the important execution steps, dialogue, and key reasoning.
   - Do this even if the user does not explicitly request saving.
4. **Call `extract_memory` on failure**
   - If the task fails, errors occur, or the user says "not successful", "incorrect", "start over", etc., call `extract_memory` once.
   - Record the reasons for failure and clues for improvement in the failure context.
5. **Call each tool at least once**
   - In each independent task cycle, you must call `retrieve_memory` at least once and `extract_memory` at least once.
```
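The cycle these rules prescribe can be sketched in controller code. This is a minimal illustration, not part of the server: `mcp_call` stands in for whatever MCP client invocation your agent framework provides.

```python
# Minimal sketch of the retrieve -> act -> extract cycle described above.
# `mcp_call` is a placeholder for your framework's MCP client invocation.
async def run_task(mcp_call, agent_id: str, query: str):
    # Rule 1: retrieve relevant experiences before starting
    retrieved = await mcp_call("retrieve_memory", {
        "query": query,
        "top_k": 1,
        "agent_id": agent_id,
    })

    # Execute the task, recording each step as a trajectory entry
    trajectory = [{"step": 1, "role": "user", "content": query}]
    # ... append assistant/tool steps here ...

    # Rules 3/4: extract and save the experience when the task ends
    await mcp_call("extract_memory", {
        "trajectory": trajectory,
        "query": query,
        "agent_id": agent_id,
        "async_mode": True,
    })
    return retrieved
```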
### 5. Command Line Arguments
```bash
python3 -m src.server --help

# Available arguments:
#   --transport {stdio,sse}  Transport method (default: stdio)
#   --host HOST              Host address for SSE mode (default: 127.0.0.1)
#   --port PORT              Port number for SSE mode (default: 8000)
```
## 🔧 Configuration File (Optional)
If you need to customize the configuration, you can edit `config.yaml`:
```yaml
# LLM Provider Configuration
llm:
  provider: "dashscope"  # dashscope | openai | anthropic
  dashscope:
    api_key: "${DASHSCOPE_API_KEY}"
    chat_model: "qwen-plus"

# Embedding Provider Configuration
embedding:
  provider: "dashscope"  # dashscope | openai
  dashscope:
    model: "text-embedding-v3"

# Retrieval Strategy Configuration
retrieval:
  strategy: "hybrid"
  min_score_threshold: 0.85  # Minimum relevance threshold
  hybrid:
    weights:
      semantic: 0.6
      confidence: 0.2
      success: 0.15
      recency: 0.05

# Memory Manager Configuration (v0.2.0+)
memory_manager:
  enabled: true  # Enable the memory manager
  # Deduplication configuration
  deduplication:
    strategy: "semantic"
    on_extraction: true  # Deduplicate in real time during extraction
    semantic:
      threshold: 0.90    # Similarity threshold
      top_k_check: 5     # Check the top K similar memories
  # Merge configuration
  merge:
    strategy: "llm"      # llm | voting
    auto_execute: true   # Execute merges automatically
    trigger:
      min_similar_count: 3        # Minimum number of similar memories
      similarity_threshold: 0.85  # Similarity threshold
    llm:
      temperature: 0.7
    original_handling: "archive"  # Archive original experiences
```
## 🔧 MCP Tools
### `retrieve_memory`
Retrieve relevant historical experience memories to assist in guiding the execution of the current task.
**Parameters**:
- `query` (string, required): The query description of the current task
- `top_k` (number, optional): The number of memories to retrieve, default is 1
- `agent_id` (string, optional): Agent ID for multi-tenant isolation
- Only retrieves memories for the specified agent
- If not provided, retrieves all memories
- It is recommended for SubAgent to pass its own name as agent_id
- For example: `"claude-code"`, `"code-reviewer"`, etc.
**Returns**:
```json
{
  "status": "success",
  "min_score_threshold": 0.85,
  "filtered_count": 2,
  "memories": [
    {
      "memory_id": "mem_001",
      "score": 0.92,
      "title": "Complete Historical Query Strategy",
      "content": "...",
      "success": true,
      "agent_id": "claude-code"
    }
  ],
  "formatted_prompt": "Here are some memory items I have accumulated from past interactions with the environment..."
}
```
**Notes**:
- `min_score_threshold`: The minimum relevance threshold used
- `filtered_count`: The number of low-relevance memories filtered out
- `score`: The relevance score of the memory (0.0-1.0), only memories above the threshold are returned
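To make the threshold behavior concrete, here is a hypothetical sketch of the filtering step. The function name and memory schema are illustrative, not the server's internals:

```python
# Illustrative only: conceptual behavior of min_score_threshold filtering.
# Each scored memory is assumed to carry a "score" field in [0.0, 1.0].
def filter_by_threshold(scored_memories, min_score_threshold=0.85):
    """Keep only memories whose relevance score meets the threshold,
    and report how many were filtered out."""
    kept = [m for m in scored_memories if m["score"] >= min_score_threshold]
    filtered_count = len(scored_memories) - len(kept)
    return kept, filtered_count
```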
### `extract_memory`
Extract reasoning experiences from the task trajectory and save them to the memory bank.
**Parameters**:
- `trajectory` (array, required): A list of steps in the task execution trajectory
- Each step contains: `step` (number), `role` (string), `content` (string), `metadata` (object, optional)
- `query` (string, required): Description of the task query
- `success_signal` (boolean, optional): Indicates whether the task was successful; automatically determined if null
- `async_mode` (boolean, optional): Indicates whether to process asynchronously; defaults to true
- `agent_id` (string, optional): Agent ID for multi-tenant isolation
- Marks which agent the memory belongs to
- It is recommended for SubAgents to pass their own name as the agent_id
- For example: `"claude-code"`, `"java-developer"`, etc.
**Return** (asynchronous mode):
```json
{
  "status": "processing",
  "message": "Memory extraction task has been submitted and is being processed in the background",
  "task_id": "extract_12345",
  "async_mode": true
}
```
**Return** (synchronous mode):
```json
{
  "status": "success",
  "message": "Memory extraction successful",
  "memory_id": "mem_123",
  "agent_id": "claude-code"
}
```
## ⚙️ Configuration Instructions
### Retrieval Strategies
Two retrieval strategies are supported:
1. **cosine**: Pure cosine similarity (baseline method)
2. **hybrid**: Hybrid scoring (recommended)
- Semantic similarity (60%)
- Confidence (20%)
- Success preference (15%)
- Timeliness (5%)
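With these weights, the hybrid score is plausibly a weighted sum of the four normalized components. The sketch below is illustrative only; the actual scoring lives in `src/retrieval/strategies/`:

```python
# Illustrative hybrid score: a weighted sum of four components, each
# assumed to be normalized to [0, 1]. Weights match the defaults above.
def hybrid_score(semantic, confidence, success, recency,
                 w_semantic=0.6, w_confidence=0.2,
                 w_success=0.15, w_recency=0.05):
    return (w_semantic * semantic
            + w_confidence * confidence
            + w_success * success
            + w_recency * recency)
```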
#### Relevance Threshold Filtering
The `min_score_threshold` configuration option can be used to filter out low-relevance memories:
- **Default Value**: 0.85 (i.e., memories with a relevance lower than 85% will not be returned)
- **Purpose**: Ensures that the returned memories are highly relevant to the current query
- **Effect**: Improves memory quality and avoids low-quality memories interfering with decision-making
```yaml
retrieval:
  strategy: "hybrid"
  min_score_threshold: 0.85  # Adjustable, range 0.0-1.0
  hybrid:
    weights:
      semantic: 0.6
      confidence: 0.2
      success: 0.15
      recency: 0.05
```
**Recommended Configurations**:
- Strict Mode: 0.90+ (only returns highly relevant memories)
- Standard Mode: 0.85 (balances relevance and recall)
- Lenient Mode: 0.75 (more candidate memories)
### Paper-faithful Mode
If you want to fully replicate the settings in the paper, you can change `mode.preset` to `paper_faithful` in the configuration. This mode will:
- Enforce the use of the `cosine` retrieval strategy;
- Automatically disable the Memory Manager (i.e., no deduplication/merging);
- Switch `extract_memory` to synchronous execution and use the original prompts from the paper.
For detailed steps and evaluation suggestions, see [docs/paper_faithful_mode.md](docs/paper_faithful_mode.md).
### LLM Provider
Supports multiple model APIs:
- **dashscope**: Tongyi Qianwen (recommended)
- **openai**: OpenAI or compatible API
- **anthropic**: Claude
```yaml
llm:
  provider: "dashscope"
  dashscope:
    api_key: "${DASHSCOPE_API_KEY}"
    chat_model: "qwen-plus"

embedding:
  provider: "dashscope"
  dashscope:
    model: "text-embedding-v3"
```
### Memory Management System (v0.2.0+)
The Memory Management System provides automated deduplication and merging features, enhancing memory quality and storage efficiency.
#### Deduplication Strategy
1. **semantic**: Intelligent deduplication based on semantic similarity (recommended)
- Identifies experiences with similar content
- Configurable similarity threshold (recommended 0.90+)
- Suitable for production environments
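Conceptually, semantic deduplication compares a new memory's embedding against the most similar stored embeddings and rejects it if any exceeds the threshold. A self-contained sketch under that assumption (the actual strategy lives in `src/deduplication/strategies/semantic_dedup.py`):

```python
import math

# Conceptual sketch of semantic deduplication: a new memory counts as a
# duplicate if its embedding is too similar to any of the top-K existing ones.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def is_duplicate(new_emb, existing_embs, threshold=0.90, top_k_check=5):
    # Score against all stored embeddings, inspect only the top K
    scored = sorted((cosine(new_emb, e) for e in existing_embs), reverse=True)
    return any(s >= threshold for s in scored[:top_k_check])
```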
#### Merging Strategies
Two merging strategies are supported:
1. **llm**: LLM-driven intelligent merging (recommended)
- Uses large models to extract commonalities from multiple similar experiences
- Generates abstract general rules
- Supports custom temperature parameters
2. **voting**: Voting-based selection
- Selects the optimal representative from a group of similar experiences
- Ranks by retrieval frequency, success rate, and timeliness
- Suitable for quick deduplication scenarios
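A voting-style selection can be as simple as ranking by the criteria listed above. The field names below are assumptions for illustration, not the server's actual schema:

```python
# Illustrative voting-based selection: pick the single best representative
# from a group of similar memories, ranked by retrieval frequency, then
# success rate, then recency. Field names are hypothetical.
def select_representative(group):
    return max(group, key=lambda m: (m.get("retrieval_count", 0),
                                     m.get("success_rate", 0.0),
                                     m.get("created_at", "")))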
#### Workflow
```
When extracting memories:
1. LLM extracts experiences
2. Deduplication check (isolated by agent_id)
3. Skip duplicate experiences
4. Detect merging opportunities
5. Trigger merge tasks in the background (non-blocking)
Background merging:
1. Invoke merging strategy
2. Generate merged experiences
3. Archive original experiences
4. Maintain complete traceability chain
```
#### Configuration Recommendations
```yaml
# Production environment
memory_manager:
  deduplication:
    strategy: "semantic"  # High quality
    semantic:
      threshold: 0.92     # Stricter
  merge:
    strategy: "llm"       # Best performance
    auto_execute: true
```
## 📖 Usage Examples
### Basic Usage
```python
# 1. Retrieve relevant experiences before starting the task
result = await mcp_call("retrieve_memory", {
    "query": "Find the earliest order date of the user on the shopping website",
    "top_k": 1,
    "agent_id": "claude-code"  # Optional: specify the agent ID
})

# The AI receives a prompt such as:
# "The following are lessons learned from past experiences:
#  Memory 1 [✓ Successful Experience] - Complete Historical Query Strategy
#  Don't just check 'Recent Orders'; navigate to the full order history page..."

# 2. Execute the task (generating a trajectory)
trajectory = [
    {"step": 1, "role": "user", "content": "Find the earliest order"},
    {"step": 2, "role": "assistant", "content": "Click on order history"},
    {"step": 3, "role": "tool", "content": "Successfully found the order from 2020-01-15"}
]

# 3. Extract the experience after the task completes
await mcp_call("extract_memory", {
    "trajectory": trajectory,
    "query": "Find the earliest order date of the user on the shopping website",
    "agent_id": "claude-code",  # Optional: tag the memory with the corresponding agent
    "async_mode": True  # Asynchronous processing, non-blocking
})
```
### Multi-Agent Isolation
Use the `agent_id` parameter to achieve memory isolation between different Agents:
```python
# Top-level agent (Claude Code)
await mcp_call("retrieve_memory", {
    "query": "Optimize Python code performance",
    "agent_id": "claude-code",  # Only retrieve memories of claude-code
    "top_k": 2
})

# Sub-agent (Code Reviewer)
await mcp_call("retrieve_memory", {
    "query": "Check for code security issues",
    "agent_id": "code-reviewer",  # Only retrieve memories of code-reviewer
    "top_k": 2
})

# Sub-agent (Java Developer)
await mcp_call("retrieve_memory", {
    "query": "Implement a Spring Boot API",
    "agent_id": "java-developer",  # Only retrieve memories of java-developer
    "top_k": 2
})

# No agent_id specified: retrieve all memories
await mcp_call("retrieve_memory", {
    "query": "General programming best practices",
    "top_k": 3
})
```
**Memory Isolation Rules**:
- Memories with different `agent_id`s are completely isolated
- Memories with the same `agent_id` can be shared across sessions
- When no `agent_id` is provided, all memories are retrieved
- It is recommended that SubAgents use their own names as `agent_id`
### Mind2Web Integration
- The `src/mind2web/` module includes Mind2Web data loading, candidate filtering, few-shot Prompt, and evaluation metrics.
- The default 3-shot template is located at `src/prompts/mind2web_llm_prompt.json`, and a custom file can be specified using `--prompt-template`.
- To run the evaluation directly, use `scripts/eval_mind2web.py`:
```bash
python scripts/eval_mind2web.py \
--config config.yaml \
--data-root /path/to/Mind2Web/test \
--split test_task \
--scores-pickle /path/to/scores_all_data.pkl \
--top-k 50 \
--output-dir outputs/mind2web_test_task
```
- For more detailed integration instructions (data download, Top-K filtering, Macro/Micro calculation methods, etc.), see [docs/mind2web_integration.md](docs/mind2web_integration.md).
### MaTTS (Memory-aware Test-Time Scaling)
ReasoningBank MCP is responsible only for the "memory layer," and the parallel/serial scaling of MaTTS needs to be implemented at the Agent controller level. We provide a minimal practice manual in [docs/matts_playbook.md](docs/matts_playbook.md) that demonstrates:
1. **Parallel / Self-Contrast**: Running multiple trajectories for the same task, using LLM to evaluate and select the best answer, and then writing all trajectories into memory;
2. **Sequential / Self-Refine**: Allowing the Agent to self-check and self-correct, storing the final trajectory after multiple rounds of refinement.
This guide replicates the "multi-trajectory + memory" closed loop from the paper and can be directly integrated into external environments such as BrowserGym and SWE-Bench.
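As a sketch of the parallel variant, a controller might run N trajectories concurrently, judge the winner, and write every trajectory to memory. `run_trajectory`, `judge_best`, and `mcp_call` are placeholders you supply; this is not part of the MCP server itself:

```python
import asyncio

# Controller-side sketch of parallel self-contrast (MaTTS): run N trajectories
# for one task, select the best, then write all trajectories into memory.
async def parallel_self_contrast(query, n, run_trajectory, judge_best, mcp_call):
    # Run n independent trajectories concurrently
    trajectories = await asyncio.gather(*(run_trajectory(query) for _ in range(n)))
    best = judge_best(trajectories)  # e.g. an LLM-as-judge comparison
    # Store every trajectory, marking only the winner as successful
    for traj in trajectories:
        await mcp_call("extract_memory", {
            "trajectory": traj,
            "query": query,
            "success_signal": traj is best,
        })
    return best
```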
## 🔬 Development
### Paper Reproduction & Evaluation
- Start MCP using the configuration in [docs/paper_faithful_mode.md](docs/paper_faithful_mode.md);
- Run tasks serially in external benchmarks (WebArena, Mind2Web, SWE-Bench, etc.) and record the success/step metrics;
- By switching `mode.preset` and `agent_id`, you can compare "No Memory vs. ReasoningBank" or "ReasoningBank vs. MaTTS".
### Running Tests
```bash
pytest tests/
```
### Code Formatting
```bash
black src/
ruff check src/
```
## 📚 References
Based on the paper: **[ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory](https://arxiv.org/abs/2509.25140)**
- Core idea of the paper: Extracting reasoning patterns from successful and failed experiences
- Retrieval mechanism: Similarity retrieval based on semantic embeddings
- Extension points: Support for more advanced retrieval strategies and storage backends
## 📝 License
MIT License
## 📋 Changelog
### v0.2.0 (2025-10-29)
**New Features**:
- ✨ Intelligent Memory Management System
- Automatic Deduplication (Semantic Deduplication)
- Intelligent Merging (LLM-driven Merging + Voting-based Merging)
- Experience Archiving (Maintaining Complete Traceability)
- Background Asynchronous Processing (Non-blocking Main Process)
- 🏗️ Plugin Architecture
- Deduplication Strategy Factory Pattern
- Merging Strategy Factory Pattern
- Memory Management Service Layer
- 💾 Storage Enhancements
- Support for Archival Memory Storage
- Batch Operation Interfaces
- agent_id Security Isolation
**Performance Optimization**:
- Save 20-30% storage space through deduplication
- Save 40-60% storage space through merging
- Archiving without retaining embeddings, saving 90% archival space
**Documentation Updates**:
- Complete User Documentation and Configuration Instructions
- Development and Production Environment Configuration Recommendations
- Workflow Descriptions and Best Practices
### v0.1.0 (Initial Release)
- ✅ Memory extraction and intelligent retrieval
- ✅ Multiple retrieval strategies (cosine similarity, hybrid scoring)
- ✅ Asynchronous processing support
- ✅ Multi-model support (DashScope, OpenAI, Claude)
- ✅ Memory isolation (SubAgent support)