# Vectorworks RAG + MCP
- Fetches Vectorworks' Python / VectorScript documentation locally and performs cross-sectional searches using FAISS.
- Provides `/search` `/answer` `/get` via FastAPI, allowing search and source confirmation from a simple Web UI.
- Simultaneously starts an MCP server with WebSocket (JSON-RPC 2.0) and provides `vw.search` / `vw.answer` / `vw.get` tools.
- Simultaneously starts `app` (API/MCP) and `db` (Postgres 16) with docker compose.
---
## Requirements
- Docker / Docker Compose (v2)
## Quick Start
1. Build
- `docker compose build`
2. Fetch Documents
- `docker compose run --rm app bash scripts/fetch_docs_minimal.sh`
- `docker compose run --rm -e GITHUB_TOKEN="$GITHUB_TOKEN" app bash scripts/fetch_github_vectorworks.sh`
3. Generate Vectors (Embedding + FAISS)
- `docker compose run --rm app python -m app.indexer`
- If you add documents after an index already exists, rebuild it:
- `docker compose run --rm app python -m app.indexer --rebuild`
4. Start (UI + MCP + Postgres)
- `docker compose up`
5. Access
- UI: `http://localhost:8000`
- MCP: `ws://localhost:8765`
Note
- Steps 2 and 3 can also be run as a one-liner:
- `docker compose run --rm app bash -lc 'scripts/fetch_docs_minimal.sh && python -m app.indexer'`
## Fetching Documents
The following command fetches the minimum required documents (md/html only) to `data/`.
- Dependencies: `git`, `curl`
- Execution (inside container):
- `docker compose run --rm app bash scripts/fetch_docs_minimal.sh`
- If a page returns a 403 error (for example, because of network restrictions), the script skips that page and continues.
- You can override the User-Agent (UA) if necessary: `docker compose run --rm -e UA="Mozilla/5.0 ..." app bash scripts/fetch_docs_minimal.sh`
Target (key points only)
- GitHub: Vectorworks/developer-scripting (Introduction/Guidance Markdown)
- App Help: Basic Scripting Guidance (Key pages for 2022/2023/2024)
- Japanese site: VectorScript function index page + example page
- Developer Wiki: VS Function Reference category index (HTML)
Note
- The indexer only supports `.md` `.markdown` `.html` `.htm` `.txt`. PDF etc. are not supported.
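The extension filter described in the note above can be sketched roughly as follows. This is a minimal illustration, not the code in `app/indexer.py`: the function name `find_indexable_files` is hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path
from typing import List

# Extensions the note above lists as supported by the indexer.
SUPPORTED_EXTENSIONS = {".md", ".markdown", ".html", ".htm", ".txt"}


def find_indexable_files(data_dir: str) -> List[Path]:
    """Return files under data_dir whose extension is in the supported set.

    Hypothetical helper; case-insensitive matching is an assumption.
    """
    root = Path(data_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Running something like this against `data/` before indexing is a quick way to see which fetched files would be picked up and which (for example PDFs) would be ignored.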
## GitHub (Fetch Only the Specified Vectorworks Repositories)
- Purpose: Clones (or updates) only the following three repositories into `data/github/vectorworks/`.
- `Vectorworks/developer-scripting`
- `Vectorworks/developer-sdk`
- `Vectorworks/developer-worksheets`
- Execution (inside container):
- `docker compose run --rm app bash scripts/fetch_github_vectorworks.sh`
- Destination: `data/github/vectorworks/<repo>`
- To add or change repositories, set the environment variable `REPOS` (space or comma separated).
- Example: `docker compose run --rm -e REPOS="Vectorworks/developer-scripting,Vectorworks/developer-sdk" app bash scripts/fetch_github_vectorworks.sh`
- Update method (optional): `UPDATE_MODE=pull` (default) or `UPDATE_MODE=reset`
- `pull`: Fast update via `git pull --ff-only --depth=1 --prune`. This fails if there are local changes, in which case the script falls back to fetch + reset.
- `reset`: Full synchronization via `fetch --depth=1` followed by `reset --hard` (discards local changes).
Note
- The indexer only targets text-based extensions (.md/.html/.txt etc.).
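The fetch script itself is a shell script, but the "space or comma separated" rule for `REPOS` can be illustrated with a short Python sketch. The function name `parse_repos` is hypothetical, and falling back to the three default repositories when `REPOS` is empty is an assumption about the script's behavior.

```python
import re
from typing import List, Optional

# The three repositories listed above as the default targets.
DEFAULT_REPOS = [
    "Vectorworks/developer-scripting",
    "Vectorworks/developer-sdk",
    "Vectorworks/developer-worksheets",
]


def parse_repos(value: Optional[str]) -> List[str]:
    """Split a REPOS value on commas and/or whitespace.

    Assumption: an unset or blank value means "use the defaults".
    """
    if not value or not value.strip():
        return list(DEFAULT_REPOS)
    return [item for item in re.split(r"[,\s]+", value.strip()) if item]
```

For example, `parse_repos(os.environ.get("REPOS"))` would accept both `"a/b,c/d"` and `"a/b c/d"`.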
## Vector Generation (Embedding + FAISS Index)
- Execution (inside container):
- `docker compose run --rm app python -m app.indexer`
After execution, `vw.faiss` and `meta.jsonl` are created in `index/`.
Make sure the documents already exist in `data/` beforehand, for example by running `bash scripts/fetch_docs_minimal.sh`.
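A small pre-flight check for the two index files named above might look like the sketch below. The helper name `missing_index_files` is hypothetical; only the file names `vw.faiss` and `meta.jsonl` and the `index/` directory come from this README.

```python
from pathlib import Path
from typing import List

# Files the indexer is described as writing into index/.
INDEX_FILES = ("vw.faiss", "meta.jsonl")


def missing_index_files(index_dir: str = "index") -> List[str]:
    """Return the names of expected index files that are not present."""
    root = Path(index_dir)
    return [name for name in INDEX_FILES if not (root / name).is_file()]
```

If the returned list is non-empty, the indexer has probably not been run yet (or wrote to a different `INDEX_DIR`).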
## Start (UI + MCP + Postgres)
- `docker compose up --build`
- UI: `http://localhost:8000`
- MCP: `ws://localhost:8765`
- Postgres is started on the internal container network (not published to the host)
## API Examples
- Search: `GET /search?q=PushAttrs&k=6`
- Example: `curl -s "http://localhost:8000/search?q=VectorScript&k=6" | jq .`
- Answer (draft): `GET /answer?q=...&k=6`
- Example: `curl -s "http://localhost:8000/answer?q=record+format" | jq .`
- Get Chunk: `GET /get?doc_id=...&chunk_id=...`
The UI is served at `GET /`; its search form returns the same results as the API.
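The same endpoints can be called from Python using only the standard library. The sketch below is illustrative: `build_url` and `get_json` are hypothetical helper names, the base URL assumes the default port, and the shape of the JSON responses is not documented here, so the result is returned as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumes the API is running locally on the default port.
BASE_URL = "http://localhost:8000"


def build_url(path: str, **params) -> str:
    """Build a request URL, skipping parameters whose value is None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def get_json(path: str, **params):
    """Send a GET request and decode the JSON body (response shape unspecified)."""
    with urlopen(build_url(path, **params), timeout=30) as resp:
        return json.load(resp)
```

With the server running, `get_json("/search", q="PushAttrs", k=6)` corresponds to the first `curl` example above.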
## MCP (Model Context Protocol)
- Connect from VS Code: add the MCP server with the following command
- `code --add-mcp '{"name":"vw_docs_local","url":"ws://localhost:8765"}'`
- Supported tools
- `vw.search({ query, k? })`
- `vw.answer({ query, k? })`
- `vw.get({ doc_id, chunk_id })`
The implementation is a JSON-RPC 2.0 based WebSocket server (`app/mcp_server.py`).
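For clients other than VS Code, a tool call is a JSON-RPC 2.0 message sent over the WebSocket. The sketch below only builds the message text. It assumes the server accepts the standard MCP `tools/call` method with `name` and `arguments` parameters; whether `app/mcp_server.py` follows that shape exactly is not confirmed by this README. `make_tool_call` is a hypothetical helper name.

```python
import itertools
import json
from typing import Any, Dict

# Monotonically increasing request ids, as JSON-RPC requires a unique id per request.
_ids = itertools.count(1)


def make_tool_call(tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes an MCP tool.

    Assumption: the server uses the MCP "tools/call" method name.
    """
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)
```

The resulting string, for example `make_tool_call("vw.search", {"query": "PushAttrs", "k": 6})`, would be sent as a text frame to `ws://localhost:8765` with any WebSocket client.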
## Directory Structure
- `app/` App body
- `api.py` FastAPI app
- `mcp_server.py` MCP(WebSocket) server
- `indexer.py` Import/Vectorization (Embedding + FAISS creation)
- `search.py` Core logic for search/answer
- `chunking.py` Chunk splitting (targets a character length roughly equivalent to 700 tokens)
- `templates/` Web UI template
- `data/` Original md/html (`doc_id` is the relative path)
- `index/` FAISS and metadata (`vw.faiss`, `meta.jsonl`)
## Environment Variables (Optional)
- `DATA_DIR` Data directory (default: `data`)
- `INDEX_DIR` Index output directory (default: `index`)
- `EMBED_MODEL` Embedding model (default: `sentence-transformers/all-MiniLM-L6-v2`)
- `CHUNK_CHARS` Guideline for chunk character length (default: `2800` ≒ 700 tokens)
- `CHUNK_OVERLAP` Chunk character overlap (default: `480`)
- `API_HOST` / `API_PORT` FastAPI binding (default: `0.0.0.0:8000`)
- `MCP_HOST` / `MCP_PORT` MCP binding (default: `0.0.0.0:8765`)
Postgres (for future use)
- `PGHOST` / `PGPORT` / `PGDATABASE` / `PGUSER` / `PGPASSWORD`
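To make the `CHUNK_CHARS` and `CHUNK_OVERLAP` settings above concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is not the code in `app/chunking.py`, which may split on headings or sentences; `chunk_text` is a hypothetical name and only the default values come from this README.

```python
from typing import List


def chunk_text(text: str, chunk_chars: int = 2800, overlap: int = 480) -> List[str]:
    """Split text into windows of chunk_chars that overlap by `overlap` characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= overlap < chunk_chars:
        raise ValueError("overlap must be in [0, chunk_chars)")
    step = chunk_chars - overlap  # distance between consecutive chunk starts
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

With the defaults, consecutive chunks start 2320 characters apart, so the last 480 characters of one chunk reappear at the start of the next. This keeps sentences that straddle a boundary retrievable from at least one chunk.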
## Operational Notes
- If you update the documents, regenerate the vectors: `python -m app.indexer`
- The first run may take some time while the embedding model is downloaded.
- Because execution on the CPU is assumed, FAISS uses `IndexFlatIP` with normalized vectors (equivalent to cosine similarity).
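The claim that an inner-product index over normalized vectors is equivalent to cosine similarity can be checked with a few lines of plain Python. This sketch does not use FAISS; it only demonstrates the arithmetic, and the helper names are illustrative.

```python
import math
from typing import List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, the score an inner-product index computes."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(inner_product(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b, computed from the raw vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )
```

For any non-zero `a` and `b`, `inner_product(l2_normalize(a), l2_normalize(b))` matches `cosine_similarity(a, b)` up to floating-point error, which is why normalizing embeddings before adding them to the index gives cosine ranking without a dedicated cosine index.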
## License / Notes
- This repository does not include the documents. Place them in `data/` in accordance with the applicable terms of use and copyright.
- `answer` is a simple draft based on excerpts. Check the original text before making a final decision.