# Modular RAG MCP Server
> A plug-and-play, observable, modular RAG (Retrieval-Augmented Generation) service framework that exposes its tools through MCP (Model Context Protocol), so AI assistants such as Copilot and Claude can call it directly. It also doubles as a hands-on project with accompanying teaching material for **people learning about, or interviewing for, LLM-related roles**.
## Table of Contents
- [Project Overview](#project-overview)
- [Branch Description](#branch-description)
- [Quick Start](#quick-start)
- [Who is suitable for this project & How to use](#who-is-suitable-for-this-project--how-to-use)
- [Resume Reference](#resume-reference)
- [FAQ](#-frequently-asked-questions)
- [Follow-up Arrangements](#follow-up-arrangements)
## Project Overview
### What is this project
This project combines the topics that come up most often in RAG interviews, namely **retrieval (Hybrid Search + Rerank)**, **multi-modal visual processing (Image Captioning)**, **RAG evaluation (Ragas + Custom)**, and **generation (LLM Response)**, together with the widely adopted **MCP (Model Context Protocol)**, into one complete, runnable engineering project.
**A major highlight of this project is that it is extremely easy to adapt to your own business**. Thanks to the fully pluggable architecture, you can quickly integrate it into your existing project, regardless of your background and needs. The specific usage strategies will be detailed in [Who is suitable for this project & How to use](#who-is-suitable-for-this-project--how-to-use).
### Not just a project, but a whole set of ideas
**What's more valuable than this project itself is the whole set of engineering ideas behind it**:
- How to write **DEV_SPEC** (development specification document) to drive development
- How to use **Skill** to automatically complete code writing based on Spec
- How to use **Skill** for automated testing, packaging, and environment configuration
- How to extend based on a pluggable architecture (e.g., extending to Agent)
**Once you understand the ideas, you can create new projects and extensions yourself**. The practices and design reasoning behind each step are explained in the notes, and it's recommended to go through them alongside the code.
### Core capabilities
| Module | Capability | Description |
|------|------|------|
| **Ingestion Pipeline** | PDF → Markdown → Chunk → Transform → Embedding → Upsert | Full-chain data ingestion, supporting multi-modal image description (Image Captioning) |
| **Hybrid Search** | Dense (vector) + Sparse (BM25) + RRF Fusion + Rerank | Two-stage retrieval architecture with coarse recall and fine ranking |
| **MCP Server** | Exposes tools over the standard MCP protocol | `query_knowledge_hub`, `list_collections`, `get_document_summary` |
| **Dashboard** | Streamlit six-page management platform | System overview / data browsing / ingestion management / ingestion tracking / query tracking / evaluation panel |
| **Evaluation** | Ragas + Custom evaluation system | Supports regression testing against a golden test set, so optimization is driven by metrics rather than gut feeling |
| **Observability** | Full-chain white-box tracing | Every intermediate state of the ingestion and query chains is visible |
| **Skill-driven process** | Coding, testing, packaging, and configuration, each completed with one click | Skills such as auto-coder / qa-tester / package / setup cover the complete development lifecycle |
### Technical highlights
**🔌 Fully pluggable architecture**: Each core component (LLM / Embedding / Reranker / Splitter / VectorStore / Evaluator) defines an abstract interface, so implementations can be swapped "Lego-style": you switch backends in a configuration file without modifying code.
**🔍 Hybrid retrieval + reranking**: BM25 sparse retrieval handles exact matching of proper nouns, dense embeddings handle synonyms and semantic matching, and RRF fuses the two result lists. An optional Cross-Encoder or LLM rerank stage then refines the ordering, balancing recall and precision.
**🖼️ Multi-modal image processing**: Uses an image-to-text strategy: a Vision LLM generates a description for each image, the description is merged into the chunk, and the existing text-only RAG chain is reused so that a text query can return images.
**📡 MCP ecosystem integration**: Follows the Model Context Protocol standard and connects directly to GitHub Copilot, Claude Desktop, and other MCP clients. No front-end development is needed: build once, use from any MCP client.
**📊 Visual management + automated evaluation**: The Streamlit Dashboard provides data management and chain tracing, and integrates Ragas and other evaluation frameworks to establish a data-driven feedback loop for iteration.
**🧪 Three-layer testing system**: Unit / Integration / E2E tests cover individual module logic, module interaction, and the complete chain (MCP Client / Dashboard).
**🤖 Skill-driven process**: Built-in Agent Skills, including auto-coder (automatic coding), qa-tester (automatic testing), package (cleanup and packaging), and setup (one-click configuration), cover the complete development lifecycle from writing code to testing, packaging, and deployment.
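
To make the RRF fusion step from the hybrid retrieval highlight concrete, here is a minimal Python sketch. It is not the project's actual implementation: the function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default for Reciprocal Rank Fusion) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every ID it contains, so chunks
    ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


# Hypothetical results from the dense (vector) and sparse (BM25) retrievers.
dense_hits = ["c3", "c1", "c7"]
sparse_hits = ["c1", "c9", "c3"]

fused = rrf_fuse([dense_hits, sparse_hits], top_n=3)
for chunk_id, score in fused:
    print(chunk_id, round(score, 5))
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is one reason it is a convenient default for fusing the two lists before the optional rerank stage.
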
> 📖 For detailed architecture design, module description, and task schedule, please refer to [DEV_SPEC.md](DEV_SPEC.md)
## Branch Description
This project provides three branches, targeting different usage scenarios. Please choose according to your needs:
### `main` — The cleanest complete code
- Always contains exactly **1 commit** with the latest complete code of the project
- **Suitable for**:
  - Those who want to quickly try out the project's full functionality
  - Those who are short on time and need an interview-ready project quickly, skipping the intermediate development process
  - Those who want to extend the project directly
- **Usage**: Clone the branch and run the Setup Skill
### `dev` — Retain complete development records
- Code is consistent with `main`, but retains complete commit history
- Records every step of building from scratch, including many intermediate nodes
- **Suitable for**: Those who want to understand how the project was built step by step, and can trace back development ideas through commit history
### `clean-start` — Clean starting point, from scratch
- Contains only the project skeleton (Agent Skills + DEV_SPEC); all task progress is reset to zero
- Keeps the complete Skill configuration, so you can use an Agent to assist development
- **Suitable for**:
  - Those who have plenty of time and want to develop from scratch (**strongly recommended**)
  - Those who want to experience the complete workflow: write Spec → split tasks → write code → write tests → iterate and optimize
  - Those who want to design and implement the modules based on their own understanding, to understand each one in depth
  - Those who want to apply all the ideas mentioned above (Spec-driven development, test-first, pluggable architecture, etc.) to complete the project
- **Core concept**: All of the project's code is **written by AI based on DEV_SPEC**; you don't need to write code by hand. The AI reads the task definitions, architecture design, and interface specifications in the Spec and generates code that conforms to them. See the corresponding video in the notes: **5.1 Project Skills Usage: How to Let AI Use Skill Follow DEV_SPEC to Complete Code**.
## Quick Start
### 1. Clone the project
```bash
git clone <repo-url>
cd Modular-RAG-MCP-Server
```
### 2. One-click configuration (Setup Skill)
This project provides **Setup Skill** to complete all environment configurations with one click, including: provider selection → API key configuration → dependency installation → configuration file generation → Dashboard startup.
Open the project in VS Code and type the following into the Copilot/Claude chat box:
```text
setup
```
The Agent will guide you through the entire configuration process.
> 💡 If you are unfamiliar with the usage of Skill, please watch the video explaining the usage of Setup Skill in the supporting notes.
## Who is suitable for this project & How to use
Everyone's background is different: some are new graduates going through campus recruitment, others are experienced hires. Foundations differ too: some already have AI project experience, while others are switching fields. The way you use this project should therefore differ as well, so **be flexible and don't apply any one strategy rigidly**.
However, one thing applies to everyone: **the ideas behind the project**. How to write a Spec to start a project quickly, and how to use Skills to drive AI to code and test automatically, are engineering methods that apply to any project and are worth learning.
Below I give concrete examples of how to use the project in different scenarios, drawing on my own experience (**how I would use this project if I were in each situation**), for your reference.
### 1. Learning RAG - use the project as study material
This project is a complete RAG system in its own right and can serve as a hands-on project for learning RAG.
When I first learned RAG, I read the book **"Large Model RAG Practice: RAG Principle, Application and System Construction"** (by Wang Peng, Gu Qingshui, Bian Longpeng, and other AI practitioners). You can study RAG alongside this book. It covers the typical stages (retrieval, generation, vector databases, chunking strategies, reranking, etc.), and **this project connects all of those stages**, so it works as a general end-to-end RAG project for learning the whole process. It pairs well with this book, and I believe it pairs with other RAG books too, because the process is universal. RAG interview questions are largely combinations of these stages, their principles, and the difficulties and optimizations encountered in practice.
### 2. Short on time - you need a project for interviews
If you currently have no AI-related project and urgently need one for interviews, you can:
1. **Use this project directly**: clone the `main` branch and run it with the Setup Skill
2. **Use the Resume Writer Skill** to write your resume (the Skill generates project descriptions based on your background)
3. **Understand the project as well as you can**: run through the core process, review the interview questions I will summarize for this project later, and start interviewing
4. **Deepen and extend the project as the interviews progress**: interviewing is itself the best driver for learning
For example, suppose it is March and you need to find a summer internship. If you're short on time:
- Write it up first → go to interviews → improve the project based on interview feedback
Summer internship openings usually run from March to July. Once you have an internship and LLM project experience, you can use them as a stepping stone to keep learning: autumn recruitment runs from July to October, and spring recruitment follows the next March, so you have plenty of time to keep building experience. If you can keep a steady rhythm, a full year of learning from now until next March is not too late. **The key is whether you can sustain the effort over that long a period.**
### 3. Reasonably enough time - extend this project
You can take this project as a starting point and extend it in the direction you want to develop. The DEV_SPEC also describes extension directions. A few common ones:
- **Want to add Agent knowledge**: implement the Agent side (context handling, Tool Calling, ReAct logic) and use this project as one of the Agent's modules and capabilities, turning it into an **Agent + RAG** project
- **Want to show backend engineering skills**: add deployment capabilities, write a Dockerfile, build a CI/CD pipeline, and add monitoring and log collection
- **Want to go deeper into RAG**: extend to advanced forms such as Agentic RAG and Graph RAG, or run more optimization experiments on retrieval strategies
Everyone's direction is different. This is why the Resume Writer Skill that ships with the project asks about your background and situation when writing a resume. Whether you position yourself as an LLM application engineer, a RAG engineer, or a full-stack engineer, and whether you are applying through campus recruitment or as an experienced hire, the requirements differ (for an introduction to the different LLM roles and their tech stacks, see the **large model position introduction** part of the notes), so extend the project accordingly.
> **Strongly recommended**: whatever your background and however you extend the project, you will probably need to tie it to your own business domain on your resume. So **at least try this**: put documents from your own field (finance, legal, medical, or your company's documents) into the system, look at the retrieval quality, then adjust and improve. This process is the best way to learn and gives you the most convincing hands-on experience for interviews.
### 4. Plenty of time - experience the complete workflow from scratch
If you have plenty of time, I recommend starting from the `clean-start` branch, or even deleting **DEV_SPEC** from `clean-start`, and experiencing the complete methodology from scratch:
**document design → AI writes code → iterate and improve → test → deploy**
How to write DEV_SPEC and how to design Skills are both covered in the corresponding videos in the project part of the notes. You can redesign or improve the documents, or add Agent-related work, and carry the whole process through.
This way you learn **the complete approach to developing a project**. The biggest advantage of this method is its **extremely low barrier to entry**: almost anyone can design and complete the entire project. You learn both the ideas and the process, and the resulting project can be highly customized. Many members of the group have already done this.
### 5. Integrate into an existing project - add RAG capabilities to a project you already have
This is also a good strategy, and one I may use myself. From my own experience:
When I was job hunting earlier, I had 2 Agent projects, but their RAG parts were very rough. My resume said only something like "... used some RAG knowledge in the Agent project". Interviewers would ask about RAG and I could answer, but because the RAG in those projects was very shallow (just basic embedding vector matching, with no coarse recall, reranking, or similar strategies), I could not go into depth.
With this project done, one option is to **integrate its RAG capabilities into the earlier Agent project** and describe them on the resume as part of that project. For example:
> *"…… used a self-developed modular RAG system for knowledge retrieval, with BM25 + Dense Embedding hybrid recall and RRF fusion ranking, combined with Cross-Encoder reranking to improve Top-K accuracy; supported multi-modal document processing (PDF parsing + Image Captioning); exposed standardized tool interfaces over the MCP protocol for the Agent to call. Integrated the Ragas evaluation framework and established a Golden Test Set regression testing mechanism to continuously optimize retrieval quality..."*
This gives your original Agent project real RAG depth, so you have something substantial to say when the interviewer asks.
### 6. Product manager - yes, PMs can use this project too
LLM product manager interviews increasingly test RAG knowledge, and some companies even require product managers to build a POC (Proof of Concept) before handing it to development. **This project and the methodology behind it can help you do that.**
**Why PMs can use it**:
1. **Interview needs**: LLM product roles test the basic principles and process of RAG. This project lets you experience the whole RAG process first-hand, from document ingestion, chunking, vectorization, retrieval, and reranking to final generation, and build a product-level understanding
2. **POC capability**: you can build the whole project yourself: write the documents (DEV_SPEC) or use the existing ones, then use Skills to have AI generate the code. In the interview, you present your ideas and design thinking, and explain that AI helped write the technical implementation, which is entirely reasonable
3. **No need to sweat technical details**: product people don't need to care how every line of code is written, but by running through the process you can think about product-level pain points, such as how to define metrics for retrieval accuracy, how to design user feedback mechanisms, and how data quality affects RAG results
**How to do it**:
- Clone the `main` branch, run it with the Setup Skill, and experience the complete process
- Put documents from your business domain in, look at the retrieval quality, and think about product-level optimization directions
- In the interview, talk about your product ideas and design thinking, and explain that AI helped complete the technical implementation
> 💡 The notes also include **Vibe Coding** tutorials (such as Tina Huang's explanation), which suit readers without a technical background who want to use AI to build prototypes quickly.
### About the "shallow project" concern
Finally, one point that applies to every situation:
**No project reaches its full depth in one go.**
If you are switching fields and built all your projects yourself, you will meet interviewers who find your project shallow. I have mentioned this before, but don't be afraid of it:
1. **Project depth is not a hard requirement for getting in**. I received 6 offers, including offers from large companies, and even so some interviewers found my project shallow. Interviews weigh many other things: theoretical foundation, algorithm skills, background fit, breadth of knowledge, and so on. Don't conclude that you can't switch fields just because your project is shallow.
2. **Projects are continuously optimized and improved**. When an interviewer says your project is shallow, listen to the feedback and work out why. If they find the data too simple, you can build more complex data; if they find the image processing too simple, you can extend the multi-modal strategy. I added deployment, training, reflection data, and evaluation modules to my own project while I was interviewing; the whole process ran in step with the interviews.
**Give yourself time across the interview period, and you can improve and deepen the project.** That is why I keep stressing the ideas behind the project: once you learn them you can keep extending it, and the barrier to extending is very low. Think the idea through, then let AI write the code.
One real data point: **developing this project from scratch to completion took me about 2 months of spare time, while I was also working a job, running my own media channel, and producing other content.** So I hope you don't rely on it as a particularly deep project, especially if you are an experienced hire. On the other hand, once you have learned this set of methods, how fast could you extend it?
The methodology is here, and all the code, processes, records, and videos are available. **Ultimately, you have to extend and iterate on it yourself to make it the project that suits you best.**
## Resume Reference
> ⚠️ **Strongly recommended**: Use the project's built-in **Resume Writer Skill** to generate the project experience for your resume, rather than copying the examples below.
>
> Project experience on a resume **must be tailored**: it needs to be generated for your own business background, target position, and technical focus.
> The examples below are for demonstration and reference only; **do not copy them directly**.
### How the Resume Writer Skill Works
The Skill uses a three-part model, **"writing principles + project highlights + user profile = customized resume"**, and works as follows:
1. **Profile collection**: The Skill asks about your target position (RAG Engineer / Backend / Agent, etc.), business background, technical focus, and any special requirements.
2. **Highlight matching**: Based on your target position, it selects the 3-5 best-matching highlights from the project's 10 major technical highlights and writes them as bullet points.
3. **Four-part generation**: It strictly follows the **background → goal → process → result** structure, and each bullet point follows "verb + technical details + quantified effect".
4. **Interview follow-up prediction**: It automatically generates 3-5 questions the interviewer may ask, to help you prepare in advance.
### Example 1: Campus Recruitment · RAG Engineer
> The following is an example output generated by the Skill for the profile "campus recruitment, RAG direction, general framework mode":
**Intelligent Knowledge Retrieval and Question Answering System** | 2024.09 - 2025.02 | Independent Design and Development
**Background**: To address common pain points in enterprise knowledge base scenarios (scattered documents, insufficient retrieval accuracy, and difficulty connecting AI applications to private knowledge), designed and implemented a modular RAG retrieval framework.
**Goal**: Build an intelligent knowledge Q&A system based on hybrid retrieval + the MCP protocol that provides accurate semantic retrieval, lets AI Agents call the private knowledge base directly, and raises document Q&A accuracy to over 90%.
**Process**:
- Design a BM25 + Dense Embedding hybrid recall architecture, balance recall and precision through RRF fusion ranking, and improve Top-10 hit rate by about 25% with Cross-Encoder reranking
- Build an end-to-end Ingestion Pipeline (PDF parsing → Markdown → semantic chunking → metadata enrichment → Embedding → Upsert), integrate a Vision LLM to describe images automatically and merge the descriptions into chunks, and reuse the text-only chain so that text queries can return images
- Implement a fully pluggable architecture across LLM / Embedding / Reranker / VectorStore with unified abstract interfaces; switch backend Providers through a configuration file, supporting zero-code switching among 4+ LLM Providers
- Integrate a dual Ragas + Custom evaluation system and establish a Golden Test Set regression testing mechanism covering Faithfulness / Relevancy / Recall and other dimensions, replacing gut-feel optimization with metrics
- Drive the full development process with Skills: 5 major Agent Skills cover the complete lifecycle of coding, testing, configuration, and packaging, completing 68 sub-tasks in 2 months of spare time
**Result**: The system supports real-time semantic retrieval over 5000+ documents, retrieval accuracy (Hit Rate@10) reaches 92%, end-to-end query latency stays within 800ms, and the three-layer test system (Unit / Integration / E2E) covers 1200+ test cases.
**Tech Stack**: Python / LangChain / ChromaDB / BM25 / Cross-Encoder / MCP Protocol / Streamlit / Ragas / Azure OpenAI
### Example 2: Experienced Hire · Existing Agent Project, Adding RAG Depth
> The following is an example output generated by the Skill for the profile "experienced hire, Agent direction, Windows platform development business background":
**Windows Platform Intelligent Knowledge Assistant** | 2024.06 - 2025.02 | Core Development
**Background**: In the Windows platform development team, version release-related information (Release Notes, change logs, patch announcements, compatibility instructions, etc.) is scattered across multiple Wiki, document repositories, and internal systems. Engineers have to search across systems to troubleshoot version differences or answer customer questions, and existing keyword search cannot understand semantics, leading to low retrieval efficiency and frequent information omission.
**Goal**: Construct an intelligent knowledge assistant based on Agent + RAG architecture for the team, achieve semantic retrieval and automatic question and answer across system documents, integrate into engineers' daily toolchain (VS Code / Claude Desktop) through MCP protocol, and shorten document search time by over 60%.
**Process**:
- Design a layered Agent + RAG architecture: the Agent side handles intent recognition and Tool Calling, the RAG side provides two-stage retrieval (BM25 + Dense Embedding hybrid recall, then Cross-Encoder fine ranking), and standardized tool interfaces are exposed over the MCP protocol for the Agent to call
- Implement a full-chain Ingestion Pipeline that parses multiple document formats such as PDF / Markdown, and integrate a Vision LLM to generate descriptions of images (architecture diagrams, screenshots, etc.), meeting the multi-modal need for text queries to return images
- Build a pluggable backend architecture with abstract interfaces for LLM / Embedding / Reranker / VectorStore, supporting one-click switching among Azure OpenAI ↔ DeepSeek ↔ Ollama to fit the team's different network environments
- Build a Streamlit Dashboard management platform with six functional pages (system overview, data browsing, ingestion management, ingestion tracking, query tracking, and evaluation panel), achieving full-chain white-box observability
- Integrate the Ragas evaluation framework + Golden Test Set regression testing to monitor retrieval quality continuously across version iterations, keeping the Faithfulness score stable at 0.85+
- Adopt a Skill-driven full-process development mode: write the DEV_SPEC specification document to drive auto-coder (automatic coding), qa-tester (automatic testing and repair), and setup (one-click environment configuration); 5 major Agent Skills cover the complete development lifecycle, delivering 68 sub-tasks in 2 months of spare time
**Result**: The system covers 8000+ of the team's technical documents, engineers' daily document lookup time dropped from 15 minutes to 3 minutes, retrieval accuracy (Hit Rate@10) reaches 90%, and the system is connected to 3 internal AI tools through the MCP protocol, with over 20,000 queries served in total.
**Tech Stack**: Python / Agent / Tool Calling / RAG / BM25 / Dense Retrieval / Cross-Encoder / MCP Protocol / ChromaDB / Streamlit / Ragas / Skill-Driven Development / Azure OpenAI
### Example 3: Experienced Hire · Backend Engineer Moving into AI
> The following is an example output generated by the Skill for the profile "experienced hire switching to AI, backend/architecture direction, financial compliance business background":
**Compliance Intelligent Document Retrieval System** | 2024.10 - 2025.02 | Design and Lead Development
**Background**: In a financial institution's compliance department, regulatory documents and internal policy documents have grown to tens of thousands. The compliance team needs to locate specific clauses quickly in review and consultation scenarios, but the existing full-text search system only matches keywords exactly and cannot understand semantically equivalent expressions such as "anti-money laundering" and "AML", so locating clauses is inefficient.
**Goal**: Design and implement a modular RAG retrieval system that brings semantic retrieval into the compliance document management process, supports matching of near-synonyms and cross-language clauses, and raises clause-location accuracy to over 90%.
**Process**:
- Lead the system architecture design: adopt a fully pluggable architecture, define abstract interfaces and the factory pattern for LLM / Embedding / Reranker / Splitter / VectorStore, and switch backends through YAML configuration to fit different deployment environments with zero code changes
- Implement a hybrid recall strategy combining BM25 sparse retrieval and Dense Embedding semantic retrieval, balancing exact matching of proper nouns against near-synonym matching through RRF fusion ranking; retrieval accuracy improved by 22% over a pure vector approach
- Build a complete data ingestion pipeline (PDF parsing → semantic chunking → Chunk Refinement → Metadata Enrichment → vectorized storage) and implement idempotent management in DocumentManager to keep data consistent when documents are updated
- Build a three-layer test system (Unit / Integration / E2E) covering 1200+ test cases, and integrate the Ragas evaluation framework to establish an automated regression mechanism so retrieval quality does not degrade across iterations
- Expose standardized tool interfaces over the MCP protocol so that GitHub Copilot, Claude Desktop, and other AI assistants can call them directly, achieving "develop once, call from many clients" service-style deployment
- Practice a Skill-driven full-process engineering method: based on the DEV_SPEC specification document, drive the AI Agent to complete coding (auto-coder), testing (qa-tester), environment configuration (setup), and packaging (package) automatically; all 68 sub-tasks were delivered by the Agent, compressing the development cycle to 2 months of spare time
**Result**: The system supports real-time semantic retrieval over 12000+ compliance documents, clause-location accuracy improved from 68% to 91%, single-query latency stays within 700ms, and the compliance team's document review efficiency improved by about 50%.
**Tech Stack**: Python / pluggable architecture / factory pattern / BM25 / Dense Retrieval / RRF / Cross-Encoder / ChromaDB / MCP Protocol / Streamlit / Ragas / Skill-Driven Development / Azure OpenAI
## ❓ Frequently Asked Questions
### 1. How to switch Provider (e.g., to Qwen / DeepSeek / Ollama)?
**Very simple: just ask AI to do it for you.**
The project's architecture uses the **factory pattern**, which makes adding and switching Providers convenient. Once you understand the internals, you will see that the different APIs are essentially similar HTTP requests, most of which follow the OpenAI request format, so switching is easy.
**There are two ways to do it:**
1. **Use Setup Skill (recommended)**: Run the one-click Setup Skill. AI will ask which Provider you want to use, guide you through filling in the API Key, and complete the code adaptation and configuration generation for you.
2. **Let AI help you directly**: Tell AI which Provider you want to switch to (e.g., "help me switch to Qwen" or "help me configure DeepSeek"), and AI can write the code based on the factory pattern architecture.
> **Principle explanation**: The project's `src/libs/` directory uses the factory pattern for the LLM, Embedding, Reranker, and other modules. Adding a new Provider only requires: ① adding a Provider class; ② registering it in the factory; ③ updating the `settings.yaml` configuration. AI can complete these steps automatically.
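
The three steps in the principle explanation can be sketched as follows. This is a minimal illustration of the factory pattern, not the code in `src/libs/`: every name here (`BaseLLM`, `register_llm`, `create_llm`, `EchoLLM`) and the shape of the settings dictionary are assumptions, and the real interfaces may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class BaseLLM(ABC):
    """Hypothetical abstract interface that every LLM Provider implements."""

    def __init__(self, **options):
        self.options = options

    @abstractmethod
    def chat(self, prompt: str) -> str:
        """Return the model's reply to a prompt."""


# Step 2 of the recipe: the factory keeps a name -> class registry.
_LLM_REGISTRY: Dict[str, Type[BaseLLM]] = {}


def register_llm(name: str):
    """Class decorator that registers a Provider under a config name."""
    def decorator(cls: Type[BaseLLM]) -> Type[BaseLLM]:
        _LLM_REGISTRY[name] = cls
        return cls
    return decorator


# Step 1: add a Provider class. EchoLLM is a toy stand-in for a real API client.
@register_llm("echo")
class EchoLLM(BaseLLM):
    def chat(self, prompt: str) -> str:
        return f"{self.options.get('prefix', '')}{prompt}"


def create_llm(settings: dict) -> BaseLLM:
    """Build the Provider selected in the (already parsed) settings."""
    options = dict(settings["llm"])
    provider = options.pop("provider")
    if provider not in _LLM_REGISTRY:
        raise ValueError(f"Unknown LLM provider: {provider!r}")
    return _LLM_REGISTRY[provider](**options)


# Step 3: this dict stands in for the parsed contents of settings.yaml.
settings = {"llm": {"provider": "echo", "prefix": "[echo] "}}
llm = create_llm(settings)
reply = llm.chat("hello")
print(reply)
```

With this shape, switching Providers means changing only the `provider` value in the configuration; callers keep using `create_llm` and the `chat` interface unchanged.
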
### 2. Project Evaluation (Custom Evaluator) and Cross-Encoder Reranker
These two modules have **framework code in place but have not been fully tested**. Interested readers can complete them on their own:
| Module | Status | What to do |
|------|------|-----------|
| **Custom Evaluator** | Framework available, not tested | Define evaluation method, prepare corresponding test dataset |
| **Cross-Encoder Reranker** | Framework available, not tested | Need to download local re-ranking model (e.g., `cross-encoder/ms-marco-MiniLM-L-6-v2`) |
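
For the Cross-Encoder Reranker row, the reranking step might look like the sketch below. It assumes the `sentence-transformers` package, whose `CrossEncoder` class can load the model named in the table and score (query, passage) pairs with `predict`. The function names and the toy word-overlap scorer are hypothetical and exist only so the example runs without downloading a model; this is not the project's framework code.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, passage) pairs to one relevance score per pair.
ScoreFn = Callable[[List[Tuple[str, str]]], Sequence[float]]


def load_cross_encoder_scorer(
    model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2",
) -> ScoreFn:
    """Load a local Cross-Encoder; the model is downloaded on first use."""
    # Imported lazily so the rest of the module works without the dependency.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    return lambda pairs: model.predict(pairs)


def rerank(query, candidates, score_fn, top_k=3):
    """Return the top_k candidates with their scores, best first."""
    if not candidates:
        return []
    pairs = [(query, text) for text in candidates]
    scores = score_fn(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(text, float(score)) for text, score in ranked[:top_k]]


def overlap_scorer(pairs):
    """Toy scorer (shared-word count) so the example runs offline."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


candidates = [
    "Release notes for version 2",
    "Password policy overview",
    "How to reset the admin password",
]
results = rerank("reset admin password", candidates, overlap_scorer, top_k=2)
print(results)
```

To use the real model, pass `load_cross_encoder_scorer()` instead of `overlap_scorer`. Keeping the scorer injectable also makes the reranker easy to unit-test without the model.
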
**AI can help you write these**. Describe the requirements clearly, and AI can help you implement evaluation methods, prepare data, download models, and complete integration testing. Completing these extensions is also a plus for interviews, reflecting your independent extension capabilities.
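
As one possible starting point for a custom evaluation method, the sketch below computes Hit Rate@k and MRR over a small golden test set. The field names (`query`, `relevant_ids`), the function name, and the fake retriever are assumptions for illustration; the project's Custom Evaluator interface may look different.

```python
from typing import Callable, Dict, List


def evaluate_retrieval(
    golden_set: List[dict],
    retrieve: Callable[[str], List[str]],
    k: int = 10,
) -> Dict[str, float]:
    """Compute Hit Rate@k and MRR for a retriever over a golden test set."""
    hits = 0
    reciprocal_rank_sum = 0.0
    for case in golden_set:
        retrieved = retrieve(case["query"])[:k]
        relevant = set(case["relevant_ids"])
        # Rank (1-based) of the first relevant chunk, or None on a miss.
        first_hit = next(
            (rank for rank, cid in enumerate(retrieved, start=1) if cid in relevant),
            None,
        )
        if first_hit is not None:
            hits += 1
            reciprocal_rank_sum += 1.0 / first_hit
    total = len(golden_set)
    if total == 0:
        return {"hit_rate": 0.0, "mrr": 0.0}
    return {"hit_rate": hits / total, "mrr": reciprocal_rank_sum / total}


# Hypothetical golden set: each query lists the chunk IDs that should be found.
golden_set = [
    {"query": "How do I rotate API keys?", "relevant_ids": ["c1"]},
    {"query": "What changed in v2?", "relevant_ids": ["c9"]},
]

# Fake retriever standing in for the real hybrid search.
fake_index = {
    "How do I rotate API keys?": ["c2", "c1", "c5"],
    "What changed in v2?": ["c4", "c5", "c6"],
}

metrics = evaluate_retrieval(golden_set, lambda q: fake_index[q], k=3)
print(metrics)
```

Running the same golden set after every change to chunking, fusion, or reranking gives a regression signal, so retrieval changes are judged by numbers rather than impressions.
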
### 3. Errors / Bugs in the project
**This is an interview-oriented practice project, not a production-grade project that has been widely tested**. Running into errors is normal.
- **Impact on interviews**: Project bugs have almost no impact on interviews. Interviewers will not actually run your project; they focus on your understanding of the architecture, principles, and design decisions.
- **How to fix**: The simplest way is to **paste the error message directly to AI**, which can fix most problems.
- **Reference resources**: The notes recommend Tina Huang's video, which also introduces this way of fixing errors quickly with AI.
### 4. Want to ingest document formats other than PDF (Word / Markdown / HTML, etc.)?
**Just ask AI to help you extend it**.
The project's Loader layer uses a pluggable abstract design (`BaseLoader`). Currently only a PDF Loader is implemented by default. If you need to support Word, Markdown, HTML, or other formats, the architecture already provides the extension point, and AI can help you add the corresponding Loader implementation.
For example, tell AI: "Help me add a Word document Loader, refer to the existing PDF Loader implementation," and AI can handle it.
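
To give a sense of what such an extension involves, here is a minimal sketch of a Markdown Loader. Only the name `BaseLoader` comes from the project; its `load` signature, the `Document` type, and `MarkdownLoader` are assumptions for this example, so check the existing PDF Loader for the real interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical document type: text plus metadata."""

    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class BaseLoader(ABC):
    """Assumed shape of the project's loader abstraction."""

    @abstractmethod
    def load(self, path: str) -> List[Document]:
        ...


class MarkdownLoader(BaseLoader):
    """Reads a Markdown file and emits one Document per heading section."""

    def load(self, path: str) -> List[Document]:
        raw = Path(path).read_text(encoding="utf-8")
        return self.split_by_heading(raw, source=path)

    @staticmethod
    def split_by_heading(raw: str, source: str) -> List[Document]:
        # Simplification: any line starting with "#" is treated as a heading,
        # including "#" comment lines inside fenced code blocks.
        sections = []
        title, lines = "", []
        for line in raw.splitlines():
            if line.startswith("#"):
                sections.append((title, lines))
                title, lines = line.lstrip("#").strip(), []
            else:
                lines.append(line)
        sections.append((title, lines))

        docs = []
        for section_title, section_lines in sections:
            body = "\n".join(section_lines).strip()
            if body:  # skip empty sections
                docs.append(
                    Document(text=body, metadata={"source": source, "section": section_title})
                )
        return docs


sample = "# Intro\nHello\n\n## Setup\nRun it\n"
for doc in MarkdownLoader.split_by_heading(sample, source="sample.md"):
    print(doc.metadata["section"], "->", doc.text)
```

Keeping the section title in the metadata lets later pipeline stages (chunking, citation display) reuse it without re-parsing the file.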
### 5. How to integrate with AI tools (Copilot / Cursor / Claude Code, etc.)?
This project is an **MCP Server**, so it can be integrated into any AI tool or Agent that supports MCP. My demo already integrates with **GitHub Copilot** and **Cursor**, and you can also integrate it with **Claude Code** or other tools that support MCP.
**How to integrate? Very simple - ask AI**.
In essence, integration means writing an MCP configuration file for each tool:
- **Copilot (VS Code)**: Let AI help you generate the MCP configuration file
- **Cursor**: Import the project directly, and Cursor will recognize it automatically
- **Claude Code / other frameworks**: Ask AI how to configure them; each tool's configuration differs slightly, but the principle is the same
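
To make "an MCP configuration file" concrete, the sketch below prints one candidate config per tool. The file locations and top-level keys (`servers` for VS Code, `mcpServers` for Cursor and Claude Code) reflect these tools' conventions as I understand them and may change between versions, so verify them against each tool's documentation. The server name and launch command are hypothetical placeholders, not the project's real entry point.

```python
import json
from typing import Dict, List

# Hypothetical values: replace with the project's actual server entry point.
SERVER_NAME = "modular-rag"
COMMAND = "python"
ARGS = ["-m", "src.mcp_server"]


def build_configs(name: str, command: str, args: List[str]) -> Dict[str, dict]:
    """Return a mapping of config file path -> config content for a stdio server."""
    entry = {"command": command, "args": args}
    return {
        # VS Code (Copilot): workspace-level file, top-level key "servers".
        ".vscode/mcp.json": {"servers": {name: {"type": "stdio", **entry}}},
        # Cursor: project-level file, top-level key "mcpServers".
        ".cursor/mcp.json": {"mcpServers": {name: dict(entry)}},
        # Claude Code: project-level file at the repository root.
        ".mcp.json": {"mcpServers": {name: dict(entry)}},
    }


configs = build_configs(SERVER_NAME, COMMAND, ARGS)
for path, config in configs.items():
    print(f"# {path}")
    print(json.dumps(config, indent=2))
```

All three files describe the same thing: how the client should launch the server process and talk to it over stdio. Only the wrapper keys differ.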
It is still worth understanding how the MCP protocol works: how the Server and Client communicate, and how Tools are registered and called. This understanding is also a plus in interviews.
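
As a study aid, here is a toy, in-process sketch of tool registration and invocation. MCP messages are JSON-RPC 2.0, and `tools/list` and `tools/call` are the protocol's method names. Everything else is simplified: the `tool` decorator, the `handle` dispatcher, and the `query_knowledge_base` tool are hypothetical, the `inputSchema` field of a real tool listing is omitted, and a real server would use an MCP SDK and perform the `initialize` handshake first.

```python
import json
from typing import Any, Callable, Dict

# Registry of tools exposed by this toy server.
TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(name: str, description: str) -> Callable:
    """Register a plain function as a callable tool."""

    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = {"fn": fn, "description": description}
        return fn

    return decorator


@tool("query_knowledge_base", "Answer a question from the indexed documents.")
def query_knowledge_base(query: str) -> str:
    # Stub: a real tool would run retrieval, reranking, and generation.
    return f"stub answer for: {query}"


def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC request the way a server's tool layer would."""
    method = request["method"]
    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": meta["description"]}
                for name, meta in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        params = request["params"]
        text = TOOLS[params["name"]]["fn"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return {"jsonrpc": "2.0", "id": request.get("id"), "error": error}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


# The client first discovers the tools, then calls one by name with arguments.
listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle(
    {
        "jsonrpc": "2.0",
        "id": 2,
        "method": "tools/call",
        "params": {"name": "query_knowledge_base", "arguments": {"query": "What is RAG?"}},
    }
)
print(json.dumps(listing, indent=2))
print(json.dumps(call, indent=2))
```

The two-step flow (discover with `tools/list`, then invoke with `tools/call`) is what lets an AI assistant use a server's tools without knowing them in advance.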
### 6. General suggestions: make good use of AI
Most of the problems above (Provider switching, module extension, bug fixing, architecture understanding) **can be solved with AI**:
- 🔧 **Code level**: Let AI help you switch Providers, implement evaluation methods, and fix bugs
- 📖 **Knowledge level**: Ask AI to explain questions about the project architecture and design patterns
- 🚀 **Extension level**: Describe your requirements clearly and let AI help you implement them
> Make good use of AI and let it guide you. This is also one of the core ideas of this project - **learn to develop in collaboration with AI**.
## 📌 Follow-up arrangements
### ✅ Will do
- A summary of project-related problems and an organized FAQ
- A collection of high-frequency interview questions with reference answers
- Explanations of key technical points (RAG core knowledge, architecture design, etc.)
- Resume writing suggestions and examples
- **Personal interview practice**: I will take this project to interviews myself, then write up the questions I was asked and how to answer them in the document
- **Contributions welcome**: If you use this project in an interview, you can send me the interview recording. I will help you analyze the project-related questions and add them to the document, and you will also get overall suggestions for improving your interview performance. This way, everyone improves together while we build up this project's set of interview questions
### ❌ Will not do
- Will not add new features
- Will not handle bug fixes, design optimization, etc.
- If you run into bugs or see room for design improvements, please fix and optimize them in your own copy of the project
- Later extensions and fixes are up to you, and **with AI, these are very easy to do**
- Doing this yourself is also good practice and a plus in interviews
- Extending the project independently, once you understand it, is what really demonstrates your ability
### 📝 Personal plan description
Next, I plan to study **large model algorithms, training**, and related directions, and I will summarize my notes and ideas along the way. For this project, I therefore **will not add new features or fix bugs**, but I am very willing to:
- Summarize the questions I encounter when interviewing with this project
- Organize how to answer them and how to iterate on and improve the answers
- Write the interview questions and answers into the document for everyone's reference
## 📚 Supporting resources
This project has complete supporting learning resources, including:
- 🎬 **Video explanations**: Project architecture design, Skill usage, DEV_SPEC writing, and a full-process development demonstration
- 📝 **Interview notes**: Interview preparation for large model positions and a summary of core RAG knowledge points
- ❓ **Interview question reference**: Real questions encountered when interviewing with this project, with reference answers
- 📖 **Standard interview question collection**: High-frequency interview questions on large models, RAG, and NLP
> 👉 **Follow [不转到大模型不改名](https://www.xiaohongshu.com) on Xiaohongshu to get all of the above resources.**