# Knowledge Base System - Semantic Search with Python, PostgreSQL & pgvector

![Python](https://img.shields.io/badge/python-3.10+-blue.svg) ![PostgreSQL](https://img.shields.io/badge/postgresql-14+-blue.svg) ![License](https://img.shields.io/badge/license-MIT-green.svg)

Open-source AI knowledge base with semantic search, vector embeddings, and Claude MCP integration. Built with Python and PostgreSQL pgvector for LLM-powered document retrieval.

**Features:**

- **Semantic Search** - Vector embeddings with OpenAI (text-embedding-3-small)
- **PostgreSQL + pgvector** - Vector similarity operations and full-text search
- **Claude MCP Integration** - Model Context Protocol server for Claude Code/Desktop
- **RAG Agent CLI** - Interactive terminal agent with query improvement (Google Gemini)
- **Python Toolkit** - Clean, modular API with type hints
- **Async Operations** - Database-first writes with background file sync

---

## System Architecture

```
Claude Code / Claude Desktop        Terminal (kbagent)
        │                                  │
        ▼                                  ▼
MCP Server (mcp_server.py)          RAG Agent (rag_agent.py)
├─ search_summaries()               ├─ Query improvement (Gemini)
├─ fetch_document()                 ├─ Document-only responses
├─ save_knowledge()                 └─ Interactive CLI
├─ update_document()                       │
├─ delete_document()                       │
└─ list_categories()                       │
        │                                  │
        └──────────────┬───────────────────┘
                       ▼
        Knowledge Toolkit (toolkit.py)
        ├─ search_summaries_tool()
        ├─ fetch_document_tool()
        ├─ knowledge_store_tool()
        ├─ update_document_tool()
        ├─ delete_document_tool()
        └─ knowledge_list_categories()
                       │
                       ▼
        PostgreSQL + pgvector
        ├─ documents (full content)
        ├─ summaries (embeddings)
        └─ vector indexes
```

---

## Quick Start Guide

### 1. Python Setup

```bash
git clone <repo>
cd knowledge-base
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### 2. Database Setup

**Supabase (Cloud):**

```bash
# 1. Create PostgreSQL instance on Supabase
# 2. Run schema.sql in SQL Editor
# 3. Copy connection string from settings
```

**Local Docker:**

```bash
docker-compose up -d
psql -h localhost -U db_user -d knowledge -f schema.sql
```

### 3. Configure MCP Server

Add to your MCP config:

```json
{
  "mcpServers": {
    "knowledge-base": {
      "type": "stdio",
      "command": "python",
      "args": ["src/knowledge_base/mcp_server.py"],
      "env": {
        "DATABASE_URL": "${DATABASE_URL}",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "ENABLE_FILE_OPERATIONS": "${ENABLE_FILE_OPERATIONS:-true}",
        "KNOWLEDGE_DIR": "${KNOWLEDGE_DIR:-./knowledge}"
      }
    }
  }
}
```

Or use CLI:

```bash
claude mcp add --transport stdio knowledge-base \
  --env DATABASE_URL="postgresql://user:password@host:5432/database" \
  --env OPENAI_API_KEY="sk-..." \
  -- python src/knowledge_base/mcp_server.py
```

Use absolute paths in config. See [.env.example](.env.example) for details.

Now ask Claude: "Search my knowledge base for X"

### 4. RAG Agent CLI (kbagent)

Interactive terminal agent with query improvement and document-based responses.

**Install:**

```bash
# In project directory with venv activated
pip install -e .

# For global access, add to ~/.zshrc or ~/.bashrc:
alias kbagent="/path/to/knowledge-base/.venv/bin/kbagent"
```

**Usage:**

```bash
# Interactive mode
kbagent

# Single query
kbagent "What is semantic search?"
```

**Features:**

- Query improvement: Clarifies unclear questions, enhances queries for better search
- Document-only responses: Answers strictly from knowledge base content
- Source attribution with relevance scores
- Commands: `/help`, `/categories`, `/quit`

**Example session:**

```
$ kbagent

Knowledge Base Agent
Type your question or /help for commands

You: python best practices

Analyzing query...

Based on the knowledge base documents, here are the key Python best practices...

Sources:
- Python Style Guide (knowledgebase) [85%]
- Clean Code Principles (knowledgebase) [72%]

Confidence: 78%
```

Requires `GOOGLE_API_KEY` in environment for query improvement (Gemini).

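
Since the MCP server and `kbagent` both read their credentials from the environment, a small pre-flight check can catch a missing key before either one starts. The following is a minimal sketch, not part of the project: the helper `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical names, and treating all three variables as mandatory at once is an assumption (the MCP config above only lists `DATABASE_URL` and `OPENAI_API_KEY`, while `GOOGLE_API_KEY` is needed for the agent's query improvement).

```python
import os
from typing import Mapping, Sequence

# Variable names come from the configuration shown above. Requiring all
# three together is an ASSUMPTION made for this sketch.
REQUIRED_VARS = ("DATABASE_URL", "OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_env_vars(required: Sequence[str], env: Mapping[str, str]) -> list[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Report which variables still need to be exported in the current shell.
missing = missing_env_vars(REQUIRED_VARS, os.environ)
print("Missing:", ", ".join(missing) if missing else "none")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the helper, keeps the check easy to exercise with a plain dictionary.
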
### Direct Python Usage (Optional)

```python
from knowledge_base import search_summaries_tool, knowledge_store_tool

# Search
results = search_summaries_tool("python best practices", limit=5)

# Save
response = knowledge_store_tool(
    title="New Knowledge",
    content="# Markdown content",
    category="knowledgebase"
)
```

---

## Database Schema

Two tables with no data duplication:

- **documents**: Full content, metadata, category (BIGSERIAL primary key)
- **summaries**: Auto-generated summaries with vector embeddings (BIGSERIAL primary key, references documents with CASCADE delete)

See [schema.sql](schema.sql) for complete schema.

---

## Python API Reference

**search_summaries_tool(query, category=None, limit=5, min_relevance=None)**

- Returns: SummarySearchResponse with results list
- Each result: document_id, title, summary, relevance_score
- Optional min_relevance filter (0.0-1.0)

**fetch_document_tool(document_id)**

- Returns: DocumentResponse with full content, metadata

**knowledge_store_tool(title, content, category, tags=None, description=None)**

- Returns: OperationResponse with document_id
- Async: File write and summary generation happen in background

**update_document_tool(document_id, content)**

- Returns: OperationResponse with updated document metadata
- Updates content only (title/category unchanged)
- Async: File update and summary regeneration in background

**delete_document_tool(document_id)**

- Returns: OperationResponse with deleted document info
- Permanent operation (cannot be undone)
- Async: File cleanup in background

**knowledge_list_categories()**

- Returns: CategoriesResponse with all categories and counts
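
The reference above suggests a two-step retrieval pattern: search the summaries first, then fetch full documents only for the hits that pass the relevance filter. A minimal sketch of that flow follows. The helper `search_then_fetch` is hypothetical, and the attribute names `.results` and `.content` are assumptions inferred from the descriptions above rather than confirmed field names. The `fake_search` and `fake_fetch` stand-ins exist only so the sketch runs without a database.

```python
from types import SimpleNamespace
from typing import Any, Callable


def search_then_fetch(
    search: Callable[..., Any],
    fetch: Callable[[Any], Any],
    query: str,
    min_relevance: float = 0.5,
    limit: int = 5,
) -> list[Any]:
    """Search summaries first, then fetch the full document for each hit."""
    response = search(query, limit=limit, min_relevance=min_relevance)
    # ASSUMPTION: the search response exposes its hits as `.results`.
    return [fetch(hit.document_id) for hit in response.results]


# Stand-in tools (hypothetical). In real use, pass search_summaries_tool and
# fetch_document_tool from knowledge_base instead.
_DOCS = {1: "# Style guide", 2: "# Clean code"}
_SCORES = {1: 0.85, 2: 0.40}


def fake_search(query, category=None, limit=5, min_relevance=None):
    hits = [
        SimpleNamespace(document_id=doc_id, relevance_score=score)
        for doc_id, score in _SCORES.items()
        if min_relevance is None or score >= min_relevance
    ]
    return SimpleNamespace(results=hits[:limit])


def fake_fetch(document_id):
    # ASSUMPTION: the document response exposes its body as `.content`.
    return SimpleNamespace(document_id=document_id, content=_DOCS[document_id])


docs = search_then_fetch(fake_search, fake_fetch, "python best practices")
print([doc.document_id for doc in docs])
```

Taking the two tools as parameters keeps the helper independent of the database, so the same function works with the real toolkit functions or with test doubles.
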
