# Modular RAG MCP Server
> A pluggable, observable, modular RAG (Retrieval-Augmented Generation) service framework that exposes tool interfaces via MCP (Model Context Protocol), supporting direct calls from AI assistants such as Copilot / Claude. It doubles as a practical project and accompanying teaching resource for **learning and interviewing for large-model-related positions**.
---
## 📖 Table of Contents
- [Project Overview](#-project-overview)
- [Branch Instructions](#-branch-instructions)
- [Quick Start](#-quick-start)
- [Who is this project for & How to use it](#-who-is-this-project-for--how-to-use-it)
- [Resume Reference](#-resume-reference)
- [FAQ](#-faq)
- [Future Plans](#-future-plans)
---
## 🏗️ Project Overview
### What is this project
This project connects the most common core components in RAG interviews - **Retrieval (Hybrid Search + Rerank)**, **Multimodal Visual Processing (Image Captioning)**, **RAG Evaluation (Ragas + Custom)**, **Generation (LLM Response)** - and the current popular application protocol **MCP (Model Context Protocol)** into a complete, runnable engineering project.
**A major highlight of the project is how easily it adapts to your own business.** Thanks to the fully pluggable architecture, you can quickly integrate it into your existing projects and find a way of using it that suits you, whatever your background and needs. Specific usage strategies are detailed later in [Who is this project for & How to use it](#-who-is-this-project-for--how-to-use-it).
### More than just a project, a whole set of ideas
**More valuable than the project itself is the whole set of engineering ideas behind it**:
- How to write **DEV_SPEC** (Development Specification Document) to drive development
- How to use **Skill** to automatically complete code writing based on Spec
- How to use **Skill** for automated testing, packaging, and environment configuration
- How to extend based on a pluggable architecture (e.g. extend to Agent)
**Once you learn the ideas, you can build completely new projects and extensions yourself.** The concrete practices and design ideas behind each of the steps above are explained in the notes and their accompanying videos, which are best watched together.
### Core Capabilities at a Glance
| Module | Capability | Description |
|------|------|------|
| **Ingestion Pipeline** | PDF → Markdown → Chunk → Transform → Embedding → Upsert | End-to-end data ingestion, with support for multimodal image descriptions (Image Captioning) |
| **Hybrid Search** | Dense (Vector) + Sparse (BM25) + RRF Fusion + Rerank | Two-stage retrieval architecture of coarse-grained recall + fine-grained re-ranking |
| **MCP Server** | Standard MCP protocol exposes Tools | `query_knowledge_hub`, `list_collections`, `get_document_summary` |
| **Dashboard** | Streamlit six-page management platform | System Overview / Data Browsing / Ingestion Management / Ingestion Tracking / Query Tracking / Evaluation Panel |
| **Evaluation** | Ragas + Custom evaluation system | Supports golden test set regression testing, rejects "feeling-based" tuning |
| **Observability** | End-to-end white-box tracing | Every intermediate state of the Ingestion and Query pipelines is transparently visible |
| **Skill-driven full process** | One-click completion from writing to testing, packaging, and configuration | auto-coder / qa-tester / package / setup and other Skills cover the complete development lifecycle (the use and design ideas of each Skill are explained in the notes, please refer to the accompanying videos) |
### Technical Highlights
**🔌 End-to-End Pluggable Architecture**: Each core component (LLM / Embedding / Reranker / Splitter / VectorStore / Evaluator) defines an abstract interface, supporting "Lego-style" replacement: switch the backend with one click through the configuration file, with zero code modification.
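As a minimal sketch of this pattern (with illustrative names, not the project's actual classes), each component pairs an abstract interface with a small registry, and a factory resolves the concrete backend from configuration:

```python
# Sketch of the pluggable pattern: abstract interface + config-driven
# factory. Class names and config keys here are illustrative assumptions.
from abc import ABC, abstractmethod

class BaseEmbedder(ABC):
    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class DummyEmbedder(BaseEmbedder):
    """Stand-in backend; a real one would call an embedding API."""
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]

_REGISTRY: dict[str, type[BaseEmbedder]] = {"dummy": DummyEmbedder}

def create_embedder(config: dict) -> BaseEmbedder:
    # Swapping providers is a one-line config change, zero code edits.
    return _REGISTRY[config["embedding"]["provider"]]()

embedder = create_embedder({"embedding": {"provider": "dummy"}})
print(embedder.embed(["hello"]))  # [[5.0]]
```

Adding a new backend means registering one new class; callers never change.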
**🔍 Hybrid Search + Re-ranking**: BM25 sparse retrieval handles exact matching of proper nouns, while Dense Embedding handles semantic matching of synonyms; after RRF fusion, an optional Cross-Encoder / LLM rerank stage balances recall and precision.
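The RRF fusion step itself is small enough to show in full. Each document's fused score is the sum of 1 / (k + rank) over every ranked list it appears in; k = 60 is the constant commonly used in the literature:

```python
# Reciprocal Rank Fusion: merge BM25 and dense ranked lists into one.
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]   # sparse (keyword) ranking
dense_hits = ["d3", "d1", "d4"]  # dense (vector) ranking
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)  # documents ranked highly by both lists float to the top
```

The fused list then feeds the optional Cross-Encoder / LLM rerank stage.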
**🖼️ Multimodal Image Processing**: An Image-to-Text strategy uses a Vision LLM to automatically generate image descriptions and stitch them into chunks, so the plain-text RAG pipeline can be reused to "search text, find images".
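A toy sketch of the stitching idea (the function names are hypothetical and the Vision LLM call is stubbed out):

```python
# Image-to-Text stitching sketch: caption each image and append the
# caption to the chunk text so plain-text retrieval can match it.
def caption_image(image_path: str) -> str:
    # Stub; the real pipeline would send the image to a Vision LLM.
    return f"[Image: {image_path}] generated description goes here"

def stitch_chunk(text: str, image_paths: list[str]) -> str:
    captions = [caption_image(p) for p in image_paths]
    # Keeping the path inside the caption lets an answer link back to
    # the original image: "search text, find images".
    return "\n".join([text, *captions])

chunk = stitch_chunk("Deployment overview.", ["figs/arch.png"])
print(chunk)
```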
**📡 MCP Ecosystem Integration**: Following the Model Context Protocol standard, the server connects directly to MCP Clients such as GitHub Copilot and Claude Desktop with zero front-end development: build once, use everywhere.
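Under the MCP standard, a client invokes a server tool with a JSON-RPC `tools/call` request. The sketch below shows the approximate shape such a request would take for this server's `query_knowledge_hub` tool; the argument names (`query`, `top_k`) are assumptions for illustration, not the server's documented schema:

```python
import json

# Approximate shape of an MCP "tools/call" JSON-RPC request, as a
# client like Claude Desktop would send it over the MCP transport.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_knowledge_hub",
        # Hypothetical arguments for illustration only.
        "arguments": {"query": "hybrid search vs dense-only", "top_k": 5},
    },
}
print(json.dumps(request, indent=2))
```

Because every MCP client speaks this same shape, one server implementation serves Copilot, Claude Desktop, and any other conforming client.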
**📊 Visual Management + Automated Evaluation**: The Streamlit Dashboard provides complete data management and pipeline tracing, and integrates evaluation frameworks such as Ragas to establish a data-driven iteration loop.
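To make "data-driven, not feeling-driven" concrete, here is a minimal golden-test-set regression sketch (a hypothetical shape, not the project's evaluator API): each golden item pins a query to the chunk IDs that must appear in the top-k results, so any retrieval change is judged by numbers:

```python
# Golden-set regression sketch; the data and retriever are illustrative.
GOLDEN_SET = [
    {"query": "What is RRF fusion?", "expected_ids": {"doc_rrf"}},
    {"query": "BM25 scoring details", "expected_ids": {"doc_bm25"}},
]

def hit_rate(retrieve, k: int = 10) -> float:
    """Fraction of golden queries whose expected chunk appears in top-k."""
    hits = 0
    for item in GOLDEN_SET:
        top_k = set(retrieve(item["query"])[:k])
        if item["expected_ids"] & top_k:
            hits += 1
    return hits / len(GOLDEN_SET)

# Deliberately weak stub retriever: hits the first query, misses the second.
fake_retrieve = lambda q: ["doc_rrf"] if "RRF" in q else ["doc_other"]
print(hit_rate(fake_retrieve))  # 0.5
```

A CI gate can then assert that the hit rate never drops below the previous release's score.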
**🧪 Three-layer Testing System**: Unit / Integration / E2E layered testing, covering independent module logic, inter-module interaction, and complete links (MCP Client / Dashboard).
**🤖 Skill-driven Full Process**: Built-in Agent Skills - auto-coder (automatic coding), qa-tester (automatic testing), package (cleanup and packaging), setup (one-click configuration) - cover the complete development lifecycle from coding to testing, packaging, and deployment. The usage and design ideas of each Skill are explained in the project part of the notes.
> 📖 For detailed architecture design, module description and task schedule, please refer to [DEV_SPEC.md](DEV_SPEC.md)
---
## 📂 Branch Instructions
This project provides three branches for different usage scenarios; choose according to your needs:
### `main` — The cleanest complete code
- Always has only **1 commit**, containing the latest complete code of the project
- **Suitable for**:
- Students who want to quickly experience the complete functions of the project
- Students who are short on time and want to quickly get a project for an interview, skipping the intermediate development process
- Students who want to directly extend the project on the basis of this project
- **How to use**: Clone and run Setup Skill to experience
### `dev` — Keep complete development records
- The code is exactly the same as `main`, but the complete commit history is retained
- Records every step of building from scratch, including a large number of intermediate nodes
- **Suitable for**: Students who want to know how the project was built step by step from scratch can trace the development ideas through the commit history
### `clean-start` — Clean start, from scratch
- Only contains the engineering skeleton (Agent Skills + DEV_SPEC), all task progress is cleared
- Retains the complete Skill configuration, and can use Agent to assist in development
- **Suitable for**:
- Students who have enough time and want to develop from scratch (**strongly recommended**)
- Students who want to experience the complete workflow: write Spec → break down tasks → write code → write tests → iterate and optimize
- You can even redesign the architecture based on your own understanding, implement it with your own ideas, and deeply understand each module
- Use all the corresponding ideas we talked about (Spec-driven development, test-first, pluggable architecture, etc.) to complete the entire project
- **Core concept**: The code writing of the entire project is **automatically completed by AI based on DEV_SPEC**, you don't need to write code by hand. AI reads the task definitions, architecture design, and interface specifications in the Spec through Skill, and automatically generates code that meets the specifications. Please refer to the corresponding video explanation in the notes for this idea: **5.1 Project Skills Usage: How to let AI use Skill to follow DEV_SPEC to complete the code**.
---
## 🚀 Quick Start
### 1. Clone the project
```bash
git clone <repo-url>
cd Modular-RAG-MCP-Server
```
### 2. One-click configuration (Setup Skill)
This project provides **Setup Skill** to complete all environment configurations with one click, including: Provider selection → API Key configuration → Dependency installation → Configuration file generation → Dashboard startup.
Open the project in VS Code and enter through the Copilot / Claude dialog box:
```
setup
```
Agent will automatically guide you through the entire configuration process.
> 💡 If you are not familiar with how to use Skill, please watch the **Setup Skill usage explanation video** in the accompanying notes.
---
## 🎯 Who is this project for & How to use it
Everyone's background differs - some are campus recruits, some social recruits - and so do fundamentals - some have AI project experience, others are switching fields. Your usage strategy for this project should differ accordingly: **use it flexibly, and avoid blindly copying**.
But one thing is common: **the ideas behind the whole project** - how to write Spec to quickly launch a project, how to use Skill to drive AI to automatically code and test - these engineering methodologies are applicable to any project and are worth learning for everyone.
For the usage strategy of the project itself in different scenarios, I will provide some specific examples and expand on my own personal experience - **if it were me, how would I use this project in different situations** - for everyone's reference.
### 1. Purely learning RAG —— Treat the project as a learning material for the entire RAG process
This project itself is a complete RAG system, which can be used as a practical project to learn RAG.
When I first started learning RAG, I read the book ***Large Model RAG Practice: RAG Principles, Applications and System Construction*** (edited by Wang Peng, Gu Qingshui, Bian Longpeng, and other experts in the field). You can absolutely pair this book with the project to learn RAG. The typical stages it covers - retrieval, generation, vector databases, chunking strategies, reranking, etc. - are the core content of virtually any RAG book.
**This project connects these stages together**, so it serves as a general end-to-end RAG project for learning the whole process. You can pair it with this book, or with any other RAG book, because the process is the same. RAG interviews essentially come down to these stages, the principles behind them, and the difficulties and optimizations encountered in practice.
### 2. Time is tight —— Lack a project to take to the interview
If you don't have any AI-related projects now and urgently need a project to go to the interview, then you can:
1. **Use this project directly**, clone the `main` branch, and run it with Setup Skill
2. **Combine Resume Writer Skill** to write your resume (Skill will generate project descriptions customized according to your background)
3. **Try to understand the project**, run through the core process, and combine the interview questions of this project that I will summarize later, and go to the interview first
4. **Understand the project and expand the project with in-depth interviews** - the interview itself is the best learning driving force
For example, now in March, students who need to find summer internships are short on time - write it first, learn while interviewing, and expand when you have time. Solve your urgent need for a project for the interview. The idea is: **write it first → go to the interview → improve the project based on the interview feedback**.
Summer internship opportunities usually run from March to July. Once you land an internship and have large-model project experience, use it as a springboard to keep learning - autumn recruitment runs from July to October, and spring recruitment comes around again next March, so you have plenty of time to keep accumulating. Starting now may feel a bit late, but it isn't: if you can sustain the learning rhythm for a full year from now until next March, landing a large-model role in campus recruitment is absolutely achievable. **The key is whether you can sustain learning that long.**
### 3. Relatively sufficient time —— Take this project as a starting point for expansion
You can take this project as a starting point and expand it in a targeted manner according to your development direction. The DEV_SPEC also writes the expansion direction, here are a few common ones:
- **Want to supplement Agent knowledge**: Implement the Agent side yourself - context handling, Tool Calling, ReAct logic - and turn this project into one module and capability of the Agent, making it an **Agent + RAG** project
- **Want to show back-end engineering capabilities**: Add back-end deployment capabilities, write Dockerfile, do CI/CD pipeline, add monitoring and log collection
- **Want to do in-depth RAG**: Expand to advanced forms such as Agentic RAG, Graph RAG, etc., or do more optimization experiments on retrieval strategies
Everyone's direction differs - just as the project's companion Resume Writer Skill first asks about your background when writing a resume. Whether you position yourself as a large-model application engineer, RAG engineer, or full-stack engineer, and whether you face campus or social recruitment, the requirements differ (see the **large model job introduction** part of the notes for details on the different positions and tech stacks), so expand in a targeted manner.
> **Strongly recommended**: No matter what your background or how you expand, you will most likely need to combine your own business to write a resume. So **at least try it** - throw your own field's documents (finance, law, medical, or your business documents) into it and see the retrieval effect. If the effect is not good, then adjust and improve it. This process itself is the best learning and the most convincing practical experience in the interview.
### 4. Plenty of time —— Experience the complete workflow from scratch
If you have enough time, I suggest you start from the `clean-start` branch, and even **delete DEV_SPEC** on the basis of `clean-start`, and experience it bit by bit from document design:
**Document design → AI writes code → Improve and iterate → Test → Deploy**
The methodology of the entire process. How to write DEV_SPEC and how to design Skill are explained in the corresponding videos in the project part of the notes. You can redesign the document, improve the document, or even directly do Agent-related things to complete the entire process.
By doing this, you will learn **the complete idea of developing a project**. The biggest advantage of this approach is that **the barrier to entry is extremely low** - almost anyone can design and complete the entire project this way. You learn the ideas and the process, and the project can be highly customized. Many friends in the group have already done this.
### 5. Integrate into existing projects —— Integrate RAG capabilities into your existing projects
This is actually a good strategy, and I myself may use this method. Let me tell you about my personal experience:
When I was looking for a job before, I had 2 Agent projects, but their RAG pipelines were very rough. On my resume I wrote something like "what the Agent project did, which involved some RAG knowledge". In interviews, the interviewer would ask about the RAG parts and I would answer, but because the RAG side of those projects was shallow - really just basic embedding vector matching, with no hybrid recall, reranking, or other strategies - the questions and discussion stayed shallow too.
After doing this project, one way to deal with it is to **integrate the RAG capabilities of this project into the previous Agent project**, and describe it as part of the Agent project on the resume instead of as an independent project. For example:
> *"……The project uses a self-developed modular RAG system for knowledge retrieval, adopting BM25 + Dense Embedding hybrid recall with RRF fusion ranking, combined with Cross-Encoder reranking to improve Top-K accuracy; it supports multimodal document processing (PDF parsing + Image Captioning) and exposes standardized tool interfaces for Agent calls through the MCP protocol. It integrates the Ragas evaluation framework and establishes a Golden Test Set regression mechanism to continuously optimize retrieval quality……"*
In this way, your original Agent project will have RAG depth, and you will have something to talk about when the interviewer asks you again.
### 6. Product Manager —— Yes, you read that right, PM can also use this project
More and more interviews for large model product managers will test RAG-related knowledge, and some companies even require product managers to write a POC (Proof of Concept) and then hand it over to development. **This project and the methodology behind it can completely help you do this.**
**Why PM can use it:**
1. **Interview needs**: Large-model product positions test the basic principles and processes of RAG. Through this project you can intuitively experience the entire RAG pipeline - from document ingestion, chunking, vectorization, retrieval, and reranking to final generation - and build a product-level understanding
2. **POC capability**: You can completely use this method to build the entire project - write documents (DEV_SPEC), or directly use existing documents, and then use Skill to let AI help you generate code. What you say during the interview is your ideas and product design, and the code is written by AI for you, which is completely reasonable at the moment
3. **No need to care about technical details**: Products don't need to care about how each line of code is written, but by running through this process, you can think about pain points from a product perspective - for example, how to define indicators if the retrieval is inaccurate, how to design a feedback mechanism in terms of user experience, how data quality affects RAG effects, etc.
**How to do it specifically:**
- Clone the `main` branch, run it with Setup Skill, and experience the complete process
- Throw your own business field documents into it, see the retrieval effect, and think about the optimization direction at the product level
- When interviewing, talk about your product ideas and design thinking, and explain that the technical implementation part is completed with the help of AI
> 💡 The notes also provide **Vibe Coding** related tutorials (such as Tina Huang's explanation), which are very suitable for students with non-technical backgrounds to refer to and use AI to quickly build prototypes.
### About the "shallow project" thing
Finally, I want to mention one point independently (this point applies to all the above situations):
**The in-depth optimization of all projects is not achieved in one step.**
If you are changing careers and the projects are all done by yourself, you will encounter the problem that the interviewer thinks your project is shallow. I have mentioned this before, but don't be afraid:
1. **Project depth is not a necessary condition for entering the industry.** I got 6 offers last year, including offers from large companies. Even so, some interviewers still thought my project was shallow. The interview will also consider many other aspects - theoretical foundation, algorithm ability, background matching, knowledge breadth, etc. Don't think that you can't change careers just because you think your project is shallow.
2. **The project is constantly being optimized and deepened.** If an interviewer says your project is shallow, listen to the feedback - you will hear exactly why they think so. If they think your data is not complex enough, create some complex data; if they think your image processing is too simple, expand the multimodal strategy. I kept adding things to my own projects during the interview process: for my earlier Agent project, as interviews progressed I added deployment, training, reflection data, and evaluation modules - the whole process ran in parallel with interviewing.
**Reserve more interview time for yourself, and improve and deepen while interviewing.** So here again, the idea of the whole project is mentioned - if you learn these ideas, you can continue to expand, and the threshold for expansion is very low, just think about the idea and let AI write it, so don't be afraid.
Here is a real data point: **it took me about 2 months of after-work time to take this project from kickoff to completion**. During that period I was also working full-time and producing other self-media content. So please don't treat this project, unexpanded, as a particularly deep project - especially if you are in social recruitment. But on the other hand: all of this was built in two months of spare time. Once you learn this method, how fast will your own expansion be?
The methods are all there, and all the plans, processes, and records are documented and video-explained. **In the end, you must rely on yourself to expand and iterate to make it the most suitable project for you.**
---
## 📝 Resume Reference
> ⚠️ **Strongly recommended**: Please use the built-in **Resume Writer Skill** of the project to generate your resume project experience, instead of directly copying the following examples.
>
> The resume project experience **must be targeted** - it needs to be customized and generated in combination with your own business background, target position, and technical focus. The following examples are only used to show the output effect of Skill and the writing reference for different scenarios. **It is meaningless to copy directly**.
>
> **How to use Resume Writer Skill**: In VS Code, enter `写简历` ("write resume") or `resume` in the Copilot / Claude dialog box. The Skill will guide you through portrait collection and automatically generate a four-paragraph resume entry. For usage details and design ideas, see the **video explanation of the project part** in the notes.
### Resume Writer Skill Working Method
Skill adopts the triangular model of **"Writing Principles + Project Highlights + User Portrait = Customized Resume"**, and the process is as follows:
1. **Portrait Collection**: Skill will ask about your target position (RAG Engineer / Backend / Agent, etc.), business background, technical focus, and special requirements
2. **Highlight Matching**: Based on your target direction, it selects the 3-5 best-matching of the project's 10 technical highlights to write bullet points
3. **Four-paragraph Generation**: Output strictly follows the **Background → Goal → Process → Result** structure; each bullet follows the "action verb + technical detail + quantified impact" pattern
4. **Interview Follow-up Prediction**: Automatically generate 3-5 possible follow-up questions from the interviewer to help you prepare in advance
### Example 1: Campus Recruitment · RAG Engineer
> The following is an example output generated by Skill based on "Campus Recruitment, RAG direction, general framework mode":
**Intelligent Knowledge Retrieval and Question Answering System** | 2024.09 - 2025.02 | Independent Design and Development
**Background**: Aiming at the common pain points of enterprise-level knowledge base scenarios such as scattered documents, insufficient retrieval accuracy, and difficulty in accessing private knowledge for AI applications, a modular RAG retrieval framework was designed and implemented.
**Objective**: Build an intelligent knowledge question answering system based on hybrid retrieval + MCP protocol to achieve accurate semantic retrieval and the ability of AI Agent to directly call private knowledge base, and improve the accuracy of document question answering to more than 90%.
**Process**:
- Designed a BM25 + Dense Embedding hybrid recall architecture, balancing recall and precision through RRF fusion ranking, and combined Cross-Encoder reranking to increase the Top-10 hit rate by about 25%
- Built an end-to-end Ingestion Pipeline (PDF parsing → Markdown → semantic chunking → Metadata enhancement → Embedding → Upsert), integrating a Vision LLM to automatically caption images and stitch them into chunks, reusing the plain-text pipeline to "search text, find images"
- Implemented a pluggable architecture for LLM / Embedding / Reranker / VectorStore with unified abstract interfaces, switching the backend Provider with one click through the configuration file and supporting zero-code switching across 4+ LLM Providers
- Integrated a Ragas + Custom dual evaluation system and established a Golden Test Set regression mechanism covering Faithfulness / Relevancy / Recall and other dimensions, rejecting "feeling-based" optimization
- Adopted Skill-driven development: 5 Agent Skills (auto-coder / qa-tester / setup / package, etc.) covered the complete lifecycle of coding, testing, configuration, and packaging, delivering all 68 sub-tasks in 2 months of spare time
**Results**: The system supports real-time semantic retrieval of 5000+ documents, the retrieval accuracy (Hit Rate@10) reaches 92%, the end-to-end query delay is controlled within 800ms, and the three-layer test system (Unit / Integration / E2E) covers 1200+ test cases.
**Technology Stack**: Python / LangChain / ChromaDB / BM25 / Cross-Encoder / MCP Protocol / Streamlit / Ragas / Azure OpenAI
### Example 2: Social Recruitment · Existing Agent Project, Integrating RAG Depth
> The following is an example output generated by Skill based on "Social Recruitment, Agent direction, Windows platform development business background" (integrating RAG capabilities into existing Agent projects):
**Windows Platform Intelligent Knowledge Assistant** | 2024.06 - 2025.02 | Core Development
**Background**: In the Windows platform development team, version release-related information (Release Notes, change logs, patch announcements, compatibility descriptions, etc.) is scattered in multiple Wikis, document repositories, and internal systems. Engineers need to search across systems when troubleshooting version differences or answering customer questions. The existing keyword search cannot understand semantics, resulting in low retrieval efficiency and frequent information omissions.
**Objective**: Build an intelligent knowledge assistant based on Agent + RAG architecture for the team to realize semantic retrieval and automatic question answering of cross-system documents, integrate it into the daily tool chain of engineers (VS Code / Claude Desktop) through MCP protocol, and shorten the document search time by more than 60%.
**Process**:
- Designed an Agent + RAG layered architecture: the Agent side handles intent recognition and Tool Calling, while the RAG side provides two-stage retrieval (BM25 + Dense Embedding hybrid recall + Cross-Encoder fine ranking), exposing standardized tool interfaces for Agent calls through the MCP protocol
- Implemented an end-to-end Ingestion Pipeline supporting PDF / Markdown multi-format document parsing, integrating a Vision LLM to automatically caption images (architecture diagrams, screenshots, etc.) and meet the multimodal "search text, find images" retrieval need
- Built a pluggable backend architecture with abstract interfaces for LLM / Embedding / Reranker / VectorStore, supporting one-click switching among Azure OpenAI ↔ DeepSeek ↔ Ollama to adapt to the team's different network environments
- Built a Streamlit Dashboard management platform providing six functional pages (data browsing, Ingestion tracking, query tracking, evaluation panel, etc.) for end-to-end white-box observability
- Integrated the Ragas evaluation framework with Golden Test Set regression to continuously monitor retrieval quality across version iterations, keeping the Faithfulness score stably above 0.85
- Adopted a Skill-driven development mode: DEV_SPEC specification documents drove auto-coder automatic coding, qa-tester automatic testing and repair, and setup one-click environment configuration; 5 Agent Skills covered the complete development lifecycle, delivering 68 sub-tasks in 2 months of spare time
**Results**: The system covers 8000+ technical documents of the team, the average daily document query time of engineers is shortened from 15 minutes to 3 minutes, the retrieval accuracy Hit Rate@10 reaches 90%, and it has been connected to 3 internal AI tools through MCP protocol, with a cumulative processing of 20,000+ queries.
**Technology Stack**: Python / Agent / Tool Calling / RAG / BM25 / Dense Retrieval / Cross-Encoder / MCP Protocol / ChromaDB / Streamlit / Ragas / Skill-Driven Development / Azure OpenAI
### Example 3: Social Recruitment · Backend Engineer Turns to AI Direction
> The following is an example output generated by Skill based on "Social Recruitment turns to AI, backend/architecture direction, financial compliance business background":
**Compliance Intelligent Document Retrieval System** | 2024.10 - 2025.02 | Design and Lead Development
**Background**: In the compliance department of a financial institution, the number of regulatory documents and internal policy documents continues to grow to tens of thousands. The compliance team needs to quickly locate specific clauses in review and consultation scenarios, but the existing full-text search system can only accurately match keywords and cannot understand semantic synonymous expressions such as "anti-money laundering" and "AML", resulting in low efficiency in locating clauses.
**Objective**: Design and implement a modular RAG retrieval system to introduce semantic retrieval capabilities into the compliance document management process, support synonym and cross-language clause matching, and aim to improve the accuracy of compliance clause location to more than 90%.
**Process**:
- Led the system architecture design: a fully pluggable architecture in which LLM / Embedding / Reranker / Splitter / VectorStore all define abstract interfaces and factory patterns, with one-click backend switching via YAML configuration to adapt to different deployment environments without code changes
- Implemented a hybrid recall strategy of BM25 sparse retrieval + Dense Embedding semantic retrieval with RRF fusion ranking, covering both exact matching of proper nouns and semantic synonym matching, improving retrieval accuracy by 22% over the pure-vector scheme
- Built a complete data ingestion pipeline (PDF parsing → semantic chunking → Chunk Refinement → Metadata Enrichment → vectorized storage), implementing idempotent DocumentManager management to ensure data consistency on document updates
- Built a three-layer test system (Unit / Integration / E2E) covering 1200+ test cases, and integrated the Ragas evaluation framework into an automated regression mechanism so retrieval quality does not degrade during iteration
- Exposed standardized tool interfaces based on the MCP protocol, supporting direct calls from AI assistants such as GitHub Copilot / Claude Desktop: one development, multi-terminal calling
- Practiced Skill-driven engineering: DEV_SPEC specification documents drove an AI Agent to automatically complete coding (auto-coder), testing (qa-tester), environment configuration (setup), and cleanup/packaging (package); all 68 sub-tasks were delivered by the Agent, compressing the development cycle to 2 months of spare time
**Results**: After the system is launched, it supports real-time semantic retrieval of 12000+ compliance documents, the accuracy of clause location is improved from 68% to 91%, the single query delay is controlled within 700ms, and the document review efficiency of the compliance team is improved by about 50%.
**Technology Stack**: Python / Pluggable Architecture / Factory Pattern / BM25 / Dense Retrieval / RRF / Cross-Encoder / ChromaDB / MCP Protocol / Streamlit / Ragas / Skill-Driven Development / Azure OpenAI
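The RRF fusion step mentioned in the bullets above is simple enough to sketch. The following is a minimal illustration, not the project's actual code; `k=60` is the constant commonly used in the RRF literature, and the document IDs are made up:

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in, so items ranked highly by either retriever float to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 favors exact term matches; the dense retriever favors synonyms.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]
print(rrf_fuse([bm25_hits, dense_hits]))  # doc_b, ranked high in both, wins
```

The appeal of RRF is that it needs no score normalization: it only consumes ranks, so BM25 scores and cosine similarities never have to live on the same scale.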
---
> 💡 **Reminder and Important Notes**:
>
> **1. About the amplification strategy**: The Resume Writer Skill has a built-in **amplification strategy** that I designed: the AI will embellish your project experience within reasonable bounds (quantitative metrics, business scale, and so on). This is intentional and is normal resume-writing practice. It also means that **after generating a resume, you must think through what an interviewer might ask about each item and how you would answer**. The Skill automatically produces 3-5 predicted interview follow-ups alongside the resume; prepare those questions carefully.
>
> **2. Treat your resume as a practice checklist**: You should **actually try out** every technical point written on your resume. For example, if the resume says "retrieval accuracy improved by XX%", run the system on your own data to see the real effect, note the problems you hit along the way, and work out how to solve them. These hands-on experiences are the truly convincing content in an interview, and working through them is how you actually learn. You can also use the opportunity to experiment with parts the resume does not cover (for example, multimodality or evaluation, if you have not tried them yet).
>
> **3. The generated resume is a draft, so adapt it to your own situation**: The resume the Skill produces is a **draft**, not a final version. You need to tailor it to your actual experience: which technologies you have used in depth, which you only know about, and which numbers should be replaced with your own. There is an iron rule in resume writing: **you must be able to talk about everything on your resume**. Even if a point is amplified, think through how the interviewer will probe it and how you will defend it. It is better to leave out anything you cannot explain clearly; whatever you do write must withstand follow-up questions.
>
> **4. Method matters more than template**: The entire resume-writing approach is mine, including the amplification strategy, the section structure (background → objective → process → results → technology stack), and the highlight-matching logic, all of which are captured in the Resume Writer Skill. If you have a resume template you trust more, or you have extended the project, feel free to modify the Skill itself to match. **Learning this pattern of "encoding a methodology in a Skill and letting AI execute it by the rules" is more valuable than the resume itself**: you can reuse it when writing up any future project.
>
> **5. It is strongly recommended to put the Skill-driven workflow on your resume**: My personal view is that **the closed loop of Skill-driven end-to-end development belongs on anyone's resume**. Skills are a very hot direction right now and already a must-ask topic in interviews, and many companies are studying how to use Skills to accelerate project delivery. Being able to explain clearly how you used Skills to take an entire project from coding → testing → fixing → configuration → packaging is itself a novel, cutting-edge highlight that will impress interviewers. I will also provide examples of how to talk about Skill-related content in interviews and how to answer follow-up questions.
---
## ❓ FAQ
### 1. How to switch Provider (such as changing to Qwen / DeepSeek / Ollama)?
**Very simple: just ask AI to do it for you.**
The project's architecture uses the **Factory Pattern**, so extending or switching Providers is straightforward. Once you look at the internals you will find that different Provider APIs are essentially similar HTTP requests, and most even follow the OpenAI request format, which makes switching particularly easy.
**There are two ways to do it:**
1. **Use the Setup Skill (recommended)**: Run the one-click Setup Skill; the AI will ask which Provider you want to use, guide you through filling in the API Key, then automatically complete the code adaptation and configuration generation.
2. **Ask AI to change it directly**: Tell the AI which Provider you want (e.g. "switch me to Qwen" or "configure DeepSeek for me"); following the factory-pattern architecture, it can write the code automatically.
> **How it works**: The LLM, Embedding, Reranker, and other modules under the project's `src/libs/` all use the factory pattern. Adding a Provider only requires: ① adding a Provider class; ② registering it in the factory; ③ updating the `settings.yaml` configuration. AI can fully automate these steps.
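The three steps in the note above can be sketched as a registry-based factory. This is an illustrative stand-in only; the project's actual class names, method signatures, and registration mechanism under `src/libs/` may differ:

```python
from abc import ABC, abstractmethod

class BaseLLM(ABC):
    """Abstract interface every LLM provider implements."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

_REGISTRY: dict[str, type[BaseLLM]] = {}

def register(name: str):
    """Class decorator: step ② - register the provider in the factory."""
    def wrap(cls: type[BaseLLM]) -> type[BaseLLM]:
        _REGISTRY[name] = cls
        return cls
    return wrap

@register("echo")  # step ① - add a provider class (a stub here)
class EchoLLM(BaseLLM):
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

def create_llm(config: dict) -> BaseLLM:
    """Step ③ - the 'provider' key would come from settings.yaml."""
    return _REGISTRY[config["provider"]]()

llm = create_llm({"provider": "echo"})
print(llm.generate("hello"))  # -> echo: hello
```

With this shape, swapping Qwen for Azure OpenAI really is just one new subclass plus one changed YAML key, which is why AI can automate it reliably.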
### 2. Project Evaluation (Custom Evaluator) and Cross-Encoder Reranker Section
The **framework code for both modules exists but has not been fully tested**. Interested readers can complete it themselves:
| Module | Status | What needs to be done |
|------|------|-----------|
| **Custom Evaluator** | Framework exists, not tested | Define evaluation methods, prepare corresponding test datasets |
| **Cross-Encoder Reranker** | Framework exists, not tested | Need to download the local re-ranking model (such as `cross-encoder/ms-marco-MiniLM-L-6-v2`) |
**AI can write these for you.** Describe the requirements clearly and AI can help you implement the evaluation methods, prepare data, download models, and complete integration testing. Finishing these extensions is also an interview plus, since it demonstrates your ability to extend a system independently.
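The reranking logic itself is small. The sketch below keeps the scorer injectable so it runs without downloading a model; in real use, `score_fn` would be the `predict` method of `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")` from the sentence-transformers library, and the toy word-overlap scorer here is purely a stand-in:

```python
from typing import Callable

def rerank(query: str, docs: list[str],
           score_fn: Callable[[list[tuple[str, str]]], list[float]],
           top_k: int = 3) -> list[str]:
    """Score every (query, doc) pair with a cross-encoder-style
    scorer and return the top_k docs by descending relevance."""
    scores = score_fn([(query, d) for d in docs])
    ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# Toy scorer standing in for the real model: counts shared words.
def toy_score(pairs):
    return [len(set(q.lower().split()) & set(d.lower().split()))
            for q, d in pairs]

docs = ["data retention policy", "annual leave policy",
        "GDPR data retention rules"]
print(rerank("data retention", docs, toy_score, top_k=2))
```

The point of the cross-encoder stage is that it sees query and document together in one forward pass, so it catches relevance signals the bi-encoder retrieval stage misses; it is run only on the small candidate set because it is too slow to score the whole corpus.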
### 3. What if there is a project error / Bug?
**This is not a battle-tested production project but an interview-oriented practice project.** Running into errors is normal.
- **Impact on interviews**: Bugs in the project have almost no impact on interviews. Interviewers will not actually run your project; they care about your understanding of the architecture, principles, and design decisions.
- **How to fix**: The easiest way is to **paste the error message directly to AI**, which can fix most problems for you.
- **Reference**: The Tina Huang video recommended in the notes also covers this approach of quickly fixing errors with AI.
### 4. What if you want to ingest document formats other than PDF (Word / Markdown / HTML, etc.)?
**Just ask AI to extend it for you.**
The project's Loader layer uses a pluggable abstract design (`BaseLoader`) and currently ships a PDF Loader by default. If you need other formats such as Word, Markdown, or HTML, the architecture already provides the extension points; let AI add the corresponding Loader implementation.
For example, tell AI: "Add a Loader for Word documents, following the existing PDF Loader implementation." AI can handle this entirely.
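To make the extension point concrete, here is what such a subclass might look like. The `BaseLoader` below is an illustrative stand-in, not the project's actual interface (the real signature in `src/libs/` may differ), and the Markdown handling is deliberately simplistic:

```python
from abc import ABC, abstractmethod
from pathlib import Path

class BaseLoader(ABC):
    """Illustrative stand-in for the project's BaseLoader."""
    @abstractmethod
    def load(self, path: str) -> str:
        """Return the document's plain text."""

class MarkdownLoader(BaseLoader):
    """Supporting a new format is just one more small subclass."""
    def load(self, path: str) -> str:
        text = Path(path).read_text(encoding="utf-8")
        # Strip leading '#' heading markers, keep everything else.
        return "\n".join(line.lstrip("# ") for line in text.splitlines())
```

Because the ingestion pipeline only depends on the abstract `load` contract, a new format never touches chunking, enrichment, or vector storage.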
### 5. How to integrate into AI tools (Copilot / Cursor / Claude Code, etc.)?
This project is an **MCP Server**, so it can be integrated into any AI tool or agent that supports the MCP protocol. In my demonstrations I have already integrated it into **GitHub Copilot** and **Cursor**, and you can also integrate it into **Claude Code** or any other MCP-capable tool.
**How? Very simple: ask AI.**
Essentially, integration means writing an MCP configuration file for each tool:
- **Copilot (VS Code)**: let AI generate the MCP configuration file for you
- **Cursor**: import the project directly; Cursor will recognize it automatically
- **Claude Code / other frameworks**: ask AI how to configure it; each tool's configuration differs slightly, but the principle is the same
It is still worth understanding how the MCP protocol works: how the Server and Client communicate, and how Tools are registered and called. These are also plus points in interviews.
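As a concrete illustration, Claude Desktop reads a `claude_desktop_config.json` whose `mcpServers` map registers each server. The server name, module path, and environment variable below are placeholders you must replace with your own values:

```json
{
  "mcpServers": {
    "modular-rag": {
      "command": "python",
      "args": ["-m", "your_server_module"],
      "env": { "OPENAI_API_KEY": "<your-key>" }
    }
  }
}
```

Other clients accept a similar shape under slightly different keys and file locations; ask your AI assistant for the exact schema of the tool you use.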
### 6. General advice: Make good use of AI
AI can solve most of the problems above (Provider switching, module extension, bug fixing, architecture understanding):
- 🔧 **Code level**: let AI switch Providers, implement evaluation methods, and fix bugs for you
- 📖 **Knowledge level**: ask AI to explain the project architecture and the design patterns it uses
- 🚀 **Extension level**: to add features or adapt to new scenarios, describe the requirements clearly and let AI implement them
> Ask AI more and let it guide you. This is also one of the core concepts that this project wants to convey - **learn to collaborate with AI in development**.
---
## 📌 Future Plans
### ✅ Will do
- Summary and FAQ of project-related questions
- Summary of high-frequency interview questions and reference answers
- Explanation of technical points (RAG core knowledge, architecture design, etc.)
- Resume packaging suggestions and demonstrations
- **Personal interview practice**: I will interview with this project myself and document the questions I encounter and how to answer them
- **Contributions welcome**: if you interview with this project, you can send me the recording; I will analyze the project-related questions, write them into the documentation, and review the overall interview to suggest improvements. That way everyone progresses together, and we can jointly build up this project's interview questions and answers
### ❌ Will not do
- No new features will be added
- No bug fixes or design optimizations
  - If you encounter bugs or see design improvements, fix and optimize them in your own copy
  - Subsequent extensions and fixes are up to you, and **with AI, they are very easy to do**
  - Doing so is itself an excellent learning exercise and interview plus
  - Extending the project independently, on the basis of understanding it, is the real demonstration of ability
### 📝 Personal Planning Statement
I will be moving on to study **large-model algorithms and training**, and will continue summarizing notes and ideas along the way. So for this project, **I will not endlessly add features or fix bugs**, but I will gladly keep doing the following:
- Summarize the questions I encounter while interviewing with this project
- Work out how to answer them and how to iterate on and improve the answers
- Write the interview questions and answers into the documentation for everyone's reference
---
## 📚 Supporting Resources
This project is equipped with complete supporting learning resources, including:
- 🎬 **Video Explanation**: Project architecture design, Skill usage, DEV_SPEC writing, full development process demonstration
- 📝 **Interview Notes**: Large model direction interview preparation, RAG core knowledge point sorting
- ❓ **Interview Question Reference**: Real questions and reference answers encountered in the interview of this project
- 📖 **Rote-question ("bagu") compilation**: high-frequency interview questions on large models / RAG / NLP
> 👉 **Please follow Xiaohongshu: [Do not change my name until I switch to a large model](https://www.xiaohongshu.com) to get all the above resources.**