# Knowledge Base System - Semantic Search with Python, PostgreSQL & pgvector



Open-source AI knowledge base with semantic search, vector embeddings, and Claude MCP integration. Built with Python and PostgreSQL (pgvector) for LLM-powered document retrieval.
**Features:**
- **Semantic Search** - Vector embeddings with OpenAI (text-embedding-3-small)
- **PostgreSQL + pgvector** - Vector similarity operations and full-text search
- **Claude MCP Integration** - Model Context Protocol server for Claude Code/Desktop
- **RAG Agent CLI** - Interactive terminal agent with query improvement (Google Gemini)
- **Python Toolkit** - Clean, modular API with type hints
- **Async Operations** - Database-first writes with background file sync
---
## System Architecture
```
Claude Code / Claude Desktop          Terminal (kbagent)
            │                                  │
            ▼                                  ▼
MCP Server (mcp_server.py)            RAG Agent (rag_agent.py)
├─ search_summaries()                 ├─ Query improvement (Gemini)
├─ fetch_document()                   ├─ Document-only responses
├─ save_knowledge()                   └─ Interactive CLI
├─ update_document()                           │
├─ delete_document()                           │
└─ list_categories()                           │
            │                                  │
            └──────────────┬───────────────────┘
                           ▼
              Knowledge Toolkit (toolkit.py)
              ├─ search_summaries_tool()
              ├─ fetch_document_tool()
              ├─ knowledge_store_tool()
              ├─ update_document_tool()
              ├─ delete_document_tool()
              └─ knowledge_list_categories()
                           │
                           ▼
              PostgreSQL + pgvector
              ├─ documents (full content)
              ├─ summaries (embeddings)
              └─ vector indexes
```
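As a rough illustration of what the bottom layer does at query time: the query is embedded, compared against the stored summary embeddings, and the closest matches are returned. In the real system pgvector performs this comparison inside PostgreSQL. The pure-Python sketch below uses toy 3-dimensional vectors, assumes cosine similarity as the metric, and the names `cosine_similarity` and `rank_summaries` are hypothetical (they are not part of this project).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_summaries(query_vec, summaries, limit=5):
    """Return (title, score) pairs sorted by similarity, best first."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in summaries.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy "embeddings" (assumption): real embedding vectors have far more dimensions.
toy_summaries = {
    "Python Style Guide": [0.9, 0.1, 0.0],
    "Clean Code Principles": [0.6, 0.4, 0.1],
    "Sourdough Baking Notes": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]
ranking = rank_summaries(query, toy_summaries, limit=2)
print([title for title, _ in ranking])
```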
---
## Quick Start Guide
### 1. Python Setup
```bash
git clone <repo>
cd knowledge-base
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
### 2. Database Setup
**Supabase (Cloud):**
```bash
# 1. Create PostgreSQL instance on Supabase
# 2. Run schema.sql in SQL Editor
# 3. Copy connection string from settings
```
**Local Docker:**
```bash
docker-compose up -d
psql -h localhost -U db_user -d knowledge -f schema.sql
```
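Before putting the connection string into `DATABASE_URL`, a quick shape check can catch copy-paste mistakes. This is only a sketch using the standard library: `check_database_url` is a hypothetical helper (not part of this project), and the example URL reuses `db_user` and `knowledge` from the Docker command above with an assumed password and the default port 5432.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> list[str]:
    """Return a list of problems found in a PostgreSQL connection URL."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.username:
        problems.append("missing user")
    if not parts.path.lstrip("/"):
        problems.append("missing database name")
    return problems


# Assumed example values; replace with your own connection string.
local_url = "postgresql://db_user:secret@localhost:5432/knowledge"
print(check_database_url(local_url) or "looks OK")
```

An empty list means the URL has the expected parts. It does not prove the database is reachable.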
### 3. Configure MCP Server
Add to your MCP config:
```json
{
  "mcpServers": {
    "knowledge-base": {
      "type": "stdio",
      "command": "python",
      "args": ["src/knowledge_base/mcp_server.py"],
      "env": {
        "DATABASE_URL": "${DATABASE_URL}",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "ENABLE_FILE_OPERATIONS": "${ENABLE_FILE_OPERATIONS:-true}",
        "KNOWLEDGE_DIR": "${KNOWLEDGE_DIR:-./knowledge}"
      }
    }
  }
}
```
Or use the Claude Code CLI:
```bash
claude mcp add --transport stdio knowledge-base \
  --env DATABASE_URL="postgresql://user:password@host:5432/database" \
  --env OPENAI_API_KEY="sk-..." \
  -- python src/knowledge_base/mcp_server.py
```
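Use absolute paths in the config for both the Python interpreter and `mcp_server.py`, since the MCP client may not start the server from the project directory. See [.env.example](.env.example) for details.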
Now ask Claude: "Search my knowledge base for X"
### 4. RAG Agent CLI (kbagent)
Interactive terminal agent with query improvement and document-based responses.
**Install:**
```bash
# In project directory with venv activated
pip install -e .
# For global access, add to ~/.zshrc or ~/.bashrc:
alias kbagent="/path/to/knowledge-base/.venv/bin/kbagent"
```
**Usage:**
```bash
# Interactive mode
kbagent
# Single query
kbagent "What is semantic search?"
```
**Features:**
- Query improvement: Clarifies unclear questions, enhances queries for better search
- Document-only responses: Answers strictly from knowledge base content
- Source attribution with relevance scores
- Commands: `/help`, `/categories`, `/quit`
**Example session:**
```
$ kbagent
Knowledge Base Agent
Type your question or /help for commands
You: python best practices
Analyzing query...
Based on the knowledge base documents, here are the key Python best practices...
Sources:
- Python Style Guide (knowledgebase) [85%]
- Clean Code Principles (knowledgebase) [72%]
Confidence: 78%
```
Requires `GOOGLE_API_KEY` in environment for query improvement (Gemini).
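
The "document-only" rule and the source attribution shown in the example session can be pictured as a small gate in front of the reply. The sketch below is an assumption about the general shape, not the agent's actual code: `Source` and `format_reply` are hypothetical names, no LLM is called, and the scores are taken from the example session.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    category: str
    relevance_score: float  # assumed to be in the range 0.0-1.0


def format_reply(answer: str, sources: list[Source]) -> str:
    """Attach source attribution, or refuse when nothing was retrieved."""
    if not sources:
        # Document-only rule: without retrieved documents there is no answer.
        return "No matching documents found in the knowledge base."
    ordered = sorted(sources, key=lambda s: s.relevance_score, reverse=True)
    lines = [answer, "Sources:"]
    lines += [f"- {s.title} ({s.category}) [{s.relevance_score:.0%}]" for s in ordered]
    return "\n".join(lines)


reply = format_reply(
    "Based on the knowledge base documents, ...",
    [
        Source("Clean Code Principles", "knowledgebase", 0.72),
        Source("Python Style Guide", "knowledgebase", 0.85),
    ],
)
print(reply)
```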
### Direct Python Usage (Optional)
```python
from knowledge_base import search_summaries_tool, knowledge_store_tool

# Search summaries by meaning; returns up to `limit` results
results = search_summaries_tool("python best practices", limit=5)

# Save a new document; file write and summary generation run in the background
response = knowledge_store_tool(
    title="New Knowledge",
    content="# Markdown content",
    category="knowledgebase",
)
```
---
## Database Schema
Two tables with no data duplication:
- **documents**: Full content, metadata, category (BIGSERIAL primary key)
- **summaries**: Auto-generated summaries with vector embeddings (BIGSERIAL primary key, references documents with CASCADE delete)
See [schema.sql](schema.sql) for complete schema.
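
To show what the CASCADE delete means in practice, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for PostgreSQL. The column names are assumptions for illustration (the real ones are in `schema.sql`), and the embedding column is left out because SQLite has no vector type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        content TEXT NOT NULL,
        category TEXT NOT NULL
    );
    CREATE TABLE summaries (
        id INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
        summary TEXT NOT NULL
    );
    """
)

conn.execute(
    "INSERT INTO documents (id, title, content, category) VALUES (?, ?, ?, ?)",
    (1, "New Knowledge", "# Markdown content", "knowledgebase"),
)
conn.execute("INSERT INTO summaries (document_id, summary) VALUES (?, ?)", (1, "Short summary"))

# Deleting the document also removes its summary row via the cascade.
conn.execute("DELETE FROM documents WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM summaries").fetchone()[0]
print(remaining)  # 0
```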
---
## Python API Reference
**search_summaries_tool(query, category=None, limit=5, min_relevance=None)**
- Returns: SummarySearchResponse with results list
- Each result: document_id, title, summary, relevance_score
- Optional min_relevance filter (0.0-1.0)
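
The result shape and the `limit` / `min_relevance` parameters can be sketched as follows. The field names come from the list above. The dataclasses are simplified stand-ins (the project's real classes may carry more fields), and `SummaryResult` and `apply_filters` are hypothetical names. In the real system the filtering may well happen in SQL rather than in Python.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryResult:
    document_id: int
    title: str
    summary: str
    relevance_score: float  # 0.0-1.0, higher is more relevant


@dataclass
class SummarySearchResponse:
    results: list[SummaryResult] = field(default_factory=list)


def apply_filters(results, limit=5, min_relevance=None):
    """Drop results below min_relevance, then keep the top `limit` by score."""
    if min_relevance is not None:
        results = [r for r in results if r.relevance_score >= min_relevance]
    ordered = sorted(results, key=lambda r: r.relevance_score, reverse=True)
    return SummarySearchResponse(results=ordered[:limit])


candidates = [
    SummaryResult(2, "Clean Code Principles", "Naming, functions, tests.", 0.72),
    SummaryResult(1, "Python Style Guide", "Formatting and idioms.", 0.85),
    SummaryResult(3, "Sourdough Baking Notes", "Hydration ratios.", 0.12),
]
response = apply_filters(candidates, limit=5, min_relevance=0.5)
print([r.title for r in response.results])
```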
**fetch_document_tool(document_id)**
- Returns: DocumentResponse with full content, metadata
**knowledge_store_tool(title, content, category, tags=None, description=None)**
- Returns: OperationResponse with document_id
- Async: File write and summary generation happen in background
**update_document_tool(document_id, content)**
- Returns: OperationResponse with updated document metadata
- Updates content only (title/category unchanged)
- Async: File update and summary regeneration in background
**delete_document_tool(document_id)**
- Returns: OperationResponse with deleted document info
- Permanent operation (cannot be undone)
- Async: File cleanup in background
**knowledge_list_categories()**
- Returns: CategoriesResponse with all categories and counts