# AuditLuma - Advanced Code Audit AI System 🔍
AuditLuma is an AI-assisted code audit system built on a **hierarchical RAG architecture**. It combines multiple AI agents with a Haystack-AI orchestrator, txtai knowledge retrieval, R2R context enhancement, and Self-RAG validation to provide comprehensive security analysis for codebases.
## 🌟 Architecture Highlights
- 🏗️ **Hierarchical RAG Architecture** - Four-layer intelligent architecture: Haystack Orchestration + txtai Retrieval + R2R Enhancement + Self-RAG Validation
- 🚀 **Haystack-AI Orchestrator** - Intelligent task decomposition and result integration, with fallback to the traditional orchestrator
- 🔍 **Intelligent Knowledge Retrieval** - txtai-driven semantic retrieval and context understanding
- 🎯 **Precise Validation** - Self-RAG multi-model cross-validation that reduces false positives
- 🔄 **Adaptive Architecture** - Automatically selects the optimal architecture mode based on project size
## ✨ Core Features
### 🏗️ Hierarchical RAG Architecture
- **Haystack Orchestration Layer** - Intelligent task decomposition, parallel execution, and result integration
- **txtai Knowledge Retrieval Layer** - Semantic retrieval and context understanding
- **R2R Context Enhancement Layer** - Dynamic context expansion and correlation analysis
- **Self-RAG Validation Layer** - Multi-model cross-validation and false positive filtering
### 🚀 Intelligent Orchestration System
- **Haystack-AI Orchestrator** - AI-based intelligent task orchestration (recommended)
- **Traditional Orchestrator** - Rule-driven stable orchestration scheme
- **Automatic Fallback Mechanism** - Automatic switching when the AI orchestrator is unavailable
- **Dynamic Architecture Selection** - Automatically selects the optimal architecture based on project size
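The automatic fallback can be pictured as a small try/except wrapper. This is a minimal sketch, not AuditLuma's actual code: `OrchestratorUnavailable`, `run_ai_orchestrator`, and `run_traditional_orchestrator` are hypothetical names used only for illustration.

```python
from typing import Callable, List


class OrchestratorUnavailable(Exception):
    """Hypothetical error raised when the AI orchestrator cannot be used."""


def orchestrate_with_fallback(
    files: List[str],
    primary: Callable[[List[str]], dict],
    fallback: Callable[[List[str]], dict],
) -> dict:
    """Try the primary orchestrator; switch to the fallback if it is unavailable."""
    try:
        return primary(files)
    except OrchestratorUnavailable:
        return fallback(files)


# Hypothetical stand-ins for the two orchestrators.
def run_ai_orchestrator(files: List[str]) -> dict:
    raise OrchestratorUnavailable("no model endpoint reachable")


def run_traditional_orchestrator(files: List[str]) -> dict:
    return {"orchestrator": "traditional", "files": len(files)}


result = orchestrate_with_fallback(
    ["app.py", "db.py"], run_ai_orchestrator, run_traditional_orchestrator
)
print(result["orchestrator"])
```

Catching only a dedicated exception type keeps genuine bugs in the primary orchestrator visible instead of silently masking them behind the fallback.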
### 🔍 Advanced Analysis Capabilities
- 🛡️ **Comprehensive Security Analysis** - Detects vulnerabilities across the codebase and provides remediation suggestions
- 🌐 **Cross-File Security Analysis** - Detects cross-file vulnerabilities that traditional single-file analysis cannot find
- 📊 **Global Context Construction** - Constructs code call graphs, data flow graphs, and dependencies
- 🎯 **Taint Analysis** - Tracks the propagation path of user input in the code
- 🔄 **MCP (Multi-Agent Collaboration Protocol)** - Enhances coordination and collaboration between agents
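To make the taint analysis idea concrete, here is a toy illustration of tracking user input across files. It is a sketch under stated assumptions, not AuditLuma's implementation: the data flow graph is a plain dictionary, and the node names (`views.py:request.args` and so on) are invented for the example.

```python
from collections import deque
from typing import Dict, List, Optional


def find_taint_path(
    flows: Dict[str, List[str]], source: str, sink: str
) -> Optional[List[str]]:
    """Breadth-first search for a path from a tainted source to a sink."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for nxt in flows.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical data flow edges spanning two files.
flows = {
    "views.py:request.args": ["views.py:user_id"],
    "views.py:user_id": ["db.py:build_query"],
    "db.py:build_query": ["db.py:cursor.execute"],
}
path = find_taint_path(flows, "views.py:request.args", "db.py:cursor.execute")
print(" -> ".join(path))
```

Because the edges cross from `views.py` into `db.py`, a single-file scan would see neither the source nor the sink in the same place, which is the gap cross-file analysis is meant to close.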
### 🌐 Enterprise-Level Support
- **Multi-LLM Vendor Support** - Supports multiple vendors including OpenAI, DeepSeek, MoonShot, Tongyi Qianwen, and more
- **Automatic Vendor Detection** - Automatically identifies and configures the correct vendor API based on the model name
- **Asynchronous Parallel Processing** - Runs analysis tasks concurrently with asynchronous I/O to shorten analysis time
- **Visualization Features** - Generates dependency graphs and detailed security reports
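Asynchronous parallel processing with a worker limit might look roughly like the following. This is a hedged sketch using only the standard library: `analyze_file` is a placeholder for a real per-file analysis call, and the `workers` parameter is assumed to play the role of the `-w, --workers` option.

```python
import asyncio
from typing import List


async def analyze_file(path: str) -> dict:
    """Placeholder for a per-file analysis call (for example, an LLM request)."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return {"file": path, "findings": []}


async def analyze_all(paths: List[str], workers: int = 4) -> List[dict]:
    """Analyze files concurrently, with at most `workers` in flight."""
    semaphore = asyncio.Semaphore(workers)

    async def bounded(path: str) -> dict:
        async with semaphore:
            return await analyze_file(path)

    # gather preserves the input order in its result list.
    return await asyncio.gather(*(bounded(p) for p in paths))


results = asyncio.run(analyze_all(["a.py", "b.py", "c.py"], workers=2))
print([r["file"] for r in results])
```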
## 📋 Table of Contents
- [Quick Start](#-quick-start)
- [Hierarchical RAG Architecture](#-hierarchical-rag-architecture)
- [Documentation](#-documentation)
- [Installation](#-installation)
- [Usage](#-usage)
- [Configuration](#-configuration)
- [Supported Languages](#-supported-languages)
- [Architecture](#-architecture)
- [Report Formats](#-report-formats)
- [Contributing](#-contributing)
- [License](#-license)
## 🚀 Quick Start
```bash
# 1. Clone the project
git clone https://github.com/Vistaminc/AuditLuma.git
cd AuditLuma
# 2. Install dependencies
pip install -r requirements.txt
# 3. Analyze using the hierarchical RAG architecture (recommended)
python main.py --architecture hierarchical --haystack-orchestrator ai -d ./your-project
# 4. View architecture information
python main.py --show-architecture-info
```
## 🏗️ Hierarchical RAG Architecture
AuditLuma 2.0 introduces a four-layer RAG architecture designed to improve analysis accuracy and efficiency:
```
┌─────────────────────────────────────────────────────────────┐
│ Hierarchical RAG Architecture │
├─────────────────────────────────────────────────────────────┤
│ First Layer: Haystack Orchestration Layer │
│ ├─ Haystack-AI Orchestrator (Recommended) - Intelligent Task Decomposition and Result Integration │
│ └─ Traditional Orchestrator - Rule-Driven Stable Solution │
├─────────────────────────────────────────────────────────────┤
│ Second Layer: txtai Knowledge Retrieval Layer │
│ ├─ Semantic Retrieval and Similarity Matching │
│ └─ Context Understanding and Knowledge Graph Construction │
├─────────────────────────────────────────────────────────────┤
│ Third Layer: R2R Context Enhancement Layer │
│ ├─ Dynamic Context Expansion │
│ └─ Correlation Analysis and Dependency Tracking │
├─────────────────────────────────────────────────────────────┤
│ Fourth Layer: Self-RAG Validation Layer │
│ ├─ Multi-Model Cross-Validation │
│ └─ False Positive Filtering and Confidence Assessment │
└─────────────────────────────────────────────────────────────┘
```
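The fourth layer's cross-validation and confidence assessment can be sketched as a simple vote. This is an illustrative assumption rather than the project's actual algorithm: each validator stands in for one model, confidence is the share of validators that confirm a finding, and `min_confidence` is a hypothetical cutoff.

```python
from typing import Callable, Dict, List

Finding = Dict[str, str]
Validator = Callable[[Finding], bool]


def cross_validate(
    findings: List[Finding], validators: List[Validator], min_confidence: float = 0.5
) -> List[dict]:
    """Keep findings whose share of confirming validators reaches min_confidence."""
    kept = []
    for finding in findings:
        votes = sum(1 for validate in validators if validate(finding))
        confidence = votes / len(validators)
        if confidence >= min_confidence:
            kept.append({**finding, "confidence": confidence})
    return kept


# Hypothetical validators standing in for three different models.
validators = [
    lambda f: "execute(" in f["code"],
    lambda f: "+" in f["code"] or "%" in f["code"],
    lambda f: f["type"] == "sql_injection",
]
findings = [
    {"type": "sql_injection", "code": "cursor.execute('SELECT ' + user_input)"},
    {"type": "xss", "code": "print('hello')"},
]
validated = cross_validate(findings, validators, min_confidence=0.67)
print(validated)
```

Here the second finding is confirmed by no validator, so it is dropped as a likely false positive, while the first is kept with its confidence attached.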
### Architectural Advantages
- **🎯 Improved Accuracy** - A four-layer validation mechanism reduces false positives
- **⚡ Performance Optimization** - Intelligent caching and parallel processing improve analysis speed
- **🔄 Adaptive** - Automatically selects the optimal configuration based on project size
- **🛡️ Reliability** - Multiple fallback mechanisms keep the system running stably
## 📚 Documentation
### 🚀 Getting Started
- [Installation Guide](./docs/installation-guide.md) - Detailed installation steps and environment configuration
- [User Guide](./docs/user-guide.md) - Complete usage tutorial from beginner to expert
- [Quick Reference](./docs/quick-reference.md) - Quick reference manual for common commands and configurations
### 🏗️ Core Documentation
- [Hierarchical RAG Architecture Guide](./docs/hierarchical-rag-guide.md) - Detailed explanation and usage guide for the hierarchical RAG architecture
- [Configuration Reference](./docs/configuration-reference.md) - Complete configuration options and parameter descriptions
- [Best Practices](./docs/best-practices.md) - Usage suggestions, performance optimization, and security configuration
### 🔧 Technical Documentation
- [Architecture Design](./docs/architecture-design.md) - System architecture and design philosophy
- [Troubleshooting Guide](./docs/troubleshooting.md) - Common issues, error diagnosis, and solutions
- [Project Structure](./项目结构.md) - Detailed project directory structure and module descriptions
### 📖 Online Resources
- [AuditLuma Related Documentation](https://iwt6omodfh0.feishu.cn/drive/folder/OwWqf7EYblaqTNdaDbtcnQcHnTt) - Complete online documentation and tutorials
## 🚀 Installation
Clone the repository and install dependencies:
```bash
git clone https://github.com/Vistaminc/AuditLuma.git
cd AuditLuma
pip install -r requirements.txt
```
### Optional Dependencies
**FAISS Vector Retrieval Library**
By default, AuditLuma uses a simple built-in vector storage implementation. For large codebases, installing FAISS is recommended to improve performance:
```bash
# CPU version
pip install faiss-cpu
# GPU version (supports CUDA)
pip install faiss-gpu
```
After installing FAISS, the system will automatically detect and use it for vector storage and retrieval, significantly improving performance when analyzing large projects.
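One common way to implement this kind of automatic detection is to check whether the `faiss` module is importable. The sketch below assumes that approach; the function name `pick_vector_backend` and the backend labels are hypothetical and not taken from the AuditLuma source.

```python
import importlib.util


def pick_vector_backend() -> str:
    """Return "faiss" when the faiss package is importable, else "simple"."""
    if importlib.util.find_spec("faiss") is not None:
        return "faiss"
    return "simple"


backend = pick_vector_backend()
print(f"Vector store backend: {backend}")
```

Both `faiss-cpu` and `faiss-gpu` install a module named `faiss`, so a single check covers either package.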
## 🛠 Usage
### Basic Usage
```bash
# Use hierarchical RAG architecture (recommended)
python main.py --architecture hierarchical -d ./your-project -o ./reports
# Use Haystack-AI orchestrator (default, recommended)
python main.py --architecture hierarchical --haystack-orchestrator ai -d ./your-project
# Use traditional orchestrator
python main.py --architecture hierarchical --haystack-orchestrator traditional -d ./your-project
# Automatically select architecture (based on project size)
python main.py --architecture auto -d ./your-project
# Traditional RAG architecture (backward compatible)
python main.py --architecture traditional -d ./your-project
```
### Advanced Usage
```bash
# Enable performance comparison mode
python main.py --architecture hierarchical --enable-performance-comparison -d ./your-project
# View architecture information and configuration
python main.py --show-architecture-info
# Configuration migration (upgrade from traditional configuration to hierarchical RAG)
python main.py --config-migrate
# AI-enhanced cross-file analysis
python main.py --architecture hierarchical --enhanced-analysis -d ./your-project
```
### Command Line Arguments
#### Basic Parameters
| Parameter | Description | Default Value |
|------|------|--------|
| `-d, --directory` | Target project directory | `./goalfile` |
| `-o, --output` | Report output directory | `./reports` |
| `-w, --workers` | Number of parallel worker threads | `max_batch_size` in configuration |
| `-f, --format` | Report format (html/pdf/json) | `report_format` in configuration |
#### Architecture Selection Parameters
| Parameter | Description | Default Value |
|------|------|--------|
| `--architecture` | RAG architecture mode (traditional/hierarchical/auto) | `auto` |
| `--haystack-orchestrator` | Haystack orchestrator type (traditional/ai) | `ai` |
| `--force-traditional` | Force use of traditional RAG architecture | - |
| `--force-hierarchical` | Force use of hierarchical RAG architecture | - |
| `--enable-performance-comparison` | Enable performance comparison mode | - |
| `--auto-switch-threshold` | File count threshold for automatic architecture switching | `100` |
#### Hierarchical RAG Specific Parameters
| Parameter | Description | Default Value |
|------|------|--------|
| `--enable-txtai` | Enable txtai knowledge retrieval layer | - |
| `--enable-r2r` | Enable R2R context enhancement layer | - |
| `--enable-self-rag-validation` | Enable Self-RAG validation layer | - |
| `--disable-caching` | Disable hierarchical caching system | - |
| `--disable-monitoring` | Disable performance monitoring | - |
#### Traditional Feature Parameters
| Parameter | Description | Default Value |
|------|------|--------|
| `--no-mcp` | Disable multi-agent collaboration protocol | Enabled by default |
| `--no-self-rag` | Disable Self-RAG retrieval | Enabled by default |
| `--no-deps` | Skip dependency analysis | Not skipped by default |
| `--no-remediation` | Skip generating remediation suggestions | Not skipped by default |
| `--no-cross-file` | Disable cross-file vulnerability detection | Enabled by default |
| `--enhanced-analysis` | Enable AI-enhanced cross-file analysis | Disabled by default |
#### Other Parameters
| Parameter | Description | Default Value |
|------|------|--------|
| `--verbose` | Enable detailed logging | Disabled by default |
| `--dry-run` | Dry run mode (does not perform actual analysis) | - |
| `--config-migrate` | Migrate configuration to hierarchical RAG format | - |
| `--show-architecture-info` | Show current architecture information and exit | - |
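When auditing several projects from a script, it can help to assemble the command line programmatically. The helper below is a sketch that uses only flags listed in the tables above; the function name `build_audit_command` and its keyword arguments are hypothetical.

```python
import shlex
from typing import List


def build_audit_command(
    project_dir: str,
    output_dir: str = "./reports",
    architecture: str = "auto",
    report_format: str = "html",
    verbose: bool = False,
) -> List[str]:
    """Assemble a main.py invocation from flags listed in the tables above."""
    if architecture not in {"traditional", "hierarchical", "auto"}:
        raise ValueError(f"unknown architecture: {architecture}")
    if report_format not in {"html", "pdf", "json"}:
        raise ValueError(f"unknown report format: {report_format}")
    cmd = [
        "python", "main.py",
        "--architecture", architecture,
        "-d", project_dir,
        "-o", output_dir,
        "-f", report_format,
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_audit_command(
    "./your-project", architecture="hierarchical", report_format="json"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the AuditLuma checkout to run the audit.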
## ⚙️ Configuration
Configure the system by editing the `config/config.yaml` file. AuditLuma 2.0 supports hierarchical RAG architecture configuration.
### Hierarchical RAG Configuration
```yaml
# Hierarchical RAG architecture model configuration
hierarchical_rag_models:
  # Whether to enable the hierarchical RAG architecture
  enabled: true
  # Haystack orchestration layer configuration
  haystack:
    # Orchestrator type selection: traditional or ai (Haystack-AI, recommended)
    orchestrator_type: "ai"  # Use Haystack-AI orchestrator by default
    # Default model (supports model@provider format)
    default_model: "qwen3:32b@ollama"
    # Task-specific model configuration
    task_models:
      security_scan: "gpt-4@openai"  # Security scanning uses a stronger model
      syntax_check: "deepseek-chat@deepseek"  # Syntax check
      logic_analysis: "qwen-turbo@qwen"  # Logic analysis
      dependency_analysis: "gpt-3.5-turbo@openai"  # Dependency analysis
  # txtai knowledge retrieval layer model configuration
  txtai:
    retrieval_model: "gpt-3.5-turbo@openai"  # Knowledge retrieval model
    embedding_model: "text-embedding-ada-002@openai"  # Embedding model
  # R2R context enhancement layer model configuration
  r2r:
    context_model: "gpt-3.5-turbo@openai"  # Context analysis model
    enhancement_model: "gpt-3.5-turbo@openai"  # Enhancement model
  # Self-RAG validation layer model configuration
  self_rag_validation:
    validation_model: "gpt-3.5-turbo@openai"  # Main validation model
    cross_validation_models:  # Multiple models used for cross-validation
      - "gpt-4@openai"
      - "deepseek-chat@deepseek"
      - "gpt-3.5-turbo@openai"
```
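The relationship between `default_model` and `task_models` can be read as a lookup with a fallback. The sketch below assumes that tasks without an entry in `task_models` use `default_model`; that behavior and the helper name `resolve_task_model` are assumptions, and the dictionary mirrors only part of the `haystack` section above.

```python
from typing import Any, Dict

# Mirrors part of the `haystack` section from the YAML above.
haystack_config: Dict[str, Any] = {
    "default_model": "qwen3:32b@ollama",
    "task_models": {
        "security_scan": "gpt-4@openai",
        "syntax_check": "deepseek-chat@deepseek",
    },
}


def resolve_task_model(config: Dict[str, Any], task: str) -> str:
    """Return the task-specific model, or default_model when none is set."""
    return config.get("task_models", {}).get(task, config["default_model"])


print(resolve_task_model(haystack_config, "security_scan"))
print(resolve_task_model(haystack_config, "code_review"))  # hypothetical task name
```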
### Model Specification Format
AuditLuma uses a unified `model@provider` format to specify both the model and its provider:
```
deepseek-chat@deepseek # Specifies the use of the deepseek-chat model from the DeepSeek provider
gpt-4-turbo@openai # Specifies the use of the gpt-4-turbo model from the OpenAI provider
qwen-turbo@qwen # Specifies the use of the qwen-turbo model from the Tongyi Qianwen provider
```
If the provider is not specified (no @ symbol is used), the system will automatically infer the provider based on the model name.
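Parsing this format is straightforward. The following is a minimal sketch, not the project's own parser: `ModelSpec` and `parse_model_spec` are hypothetical names, and splitting on the last `@` is an assumption chosen so that model names containing other punctuation (such as `qwen3:32b`) pass through unchanged.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    model: str
    provider: Optional[str]  # None means "infer from the model name"


def parse_model_spec(spec: str) -> ModelSpec:
    """Split "model@provider" on the last "@"; the provider is optional."""
    model, sep, provider = spec.strip().rpartition("@")
    if not sep:
        # No "@" present: rpartition puts the whole string in the last slot.
        return ModelSpec(model=provider, provider=None)
    if not model or not provider:
        raise ValueError(f"malformed model specification: {spec!r}")
    return ModelSpec(model=model, provider=provider)


print(parse_model_spec("deepseek-chat@deepseek"))
print(parse_model_spec("qwen3:32b@ollama"))
print(parse_model_spec("gpt-4-turbo"))
```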
### Architecture Selection Configuration
```yaml
# Global settings
global:
  # Default architecture mode: traditional, hierarchical, auto
  default_architecture: "hierarchical"
  # Automatic switching threshold (number of files)
  auto_switch_threshold: 100
  # Whether to enable performance comparison
  enable_performance_comparison: false
```
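How `auto` mode might resolve to a concrete architecture can be sketched as a threshold check. Note the assumption: this README does not state which side of the threshold maps to which architecture, so the sketch assumes projects at or above `auto_switch_threshold` files use the hierarchical architecture. The function name `select_architecture` is hypothetical.

```python
def select_architecture(file_count: int, mode: str = "auto", threshold: int = 100) -> str:
    """Resolve "auto" to a concrete architecture based on the file count."""
    if mode in ("traditional", "hierarchical"):
        return mode  # an explicit choice wins over the threshold
    if mode != "auto":
        raise ValueError(f"unknown architecture mode: {mode}")
    # Assumption: projects at or above the threshold use the hierarchical architecture.
    return "hierarchical" if file_count >= threshold else "traditional"


print(select_architecture(250))
print(select_architecture(12))
print(select_architecture(12, mode="hierarchical"))
```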
### Multi-Vendor Support
AuditLuma supports multiple LLM vendors and can automatically detect the vendor based on the model name:
| Model Prefix | Vendor |
|---------|------|
| `gpt-` | OpenAI |
| `deepseek-` | DeepSeek |
| `qwen-` | Tongyi Qianwen |
| `glm-` or `chatglm` | Zhipu AI |
| `baichuan` | Baichuan |
| `ollama-` | Ollama |
- Note: The `openai` vendor can also be used with any relay platform that exposes an OpenAI-compatible API.
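Prefix-based detection like this can be expressed as an ordered lookup. The sketch below copies the prefixes from the table above; the function name `infer_vendor`, the case-insensitive matching, and the `None` return for unknown names are assumptions rather than documented behavior.

```python
from typing import Optional

# Prefixes taken from the table above, checked in order.
VENDOR_PREFIXES = [
    ("gpt-", "OpenAI"),
    ("deepseek-", "DeepSeek"),
    ("qwen-", "Tongyi Qianwen"),
    ("glm-", "Zhipu AI"),
    ("chatglm", "Zhipu AI"),
    ("baichuan", "Baichuan"),
    ("ollama-", "Ollama"),
]


def infer_vendor(model_name: str) -> Optional[str]:
    """Return the vendor whose prefix matches the model name, if any."""
    name = model_name.lower()
    for prefix, vendor in VENDOR_PREFIXES:
        if name.startswith(prefix):
            return vendor
    return None


for model in ("gpt-4", "deepseek-chat", "chatglm3-6b", "text-embedding-ada-002"):
    print(model, "->", infer_vendor(model))
```

A name such as `text-embedding-ada-002` matches none of the listed prefixes, which is one reason to prefer the explicit `model@provider` form when the vendor is not obvious from the name.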
## 💻 Supported Languages
AuditLuma supports analyzing the following programming languages:
### Major Languages (Including Top 10)
- Python (.py)
- JavaScript (.js, .jsx)
- TypeScript (.ts, .tsx)
- Java (.java)
- C# (.cs)
- C++ (.cpp, .cc, .hpp)
- C (.c, .h)
- Go (.go)
- Ruby (.rb)
- PHP (.php)
- Lua (.lua)
### Other Supported Languages
- Rust (.rs)
- Swift (.swift)
- Kotlin (.kt)
- Scala (.scala)
- Dart (.dart)
- Bash (.sh, .bash)
- PowerShell (.ps1, .psm1)
### Markup and Configuration Languages
- HTML (.html, .htm)
- CSS (.css)
- JSON (.json)
- XML (.xml)
- YAML (.yml, .yaml)
- SQL (.sql)
## 🏛 Architecture
AuditLuma uses a multi-agent architecture, including the following components:

1. **Agent Orchestrator** - Coordinates all agents in the workflow
2. **Code Analysis Agent** - Analyzes code structure and extracts dependencies
3. **Security Analysis Agent** - Identifies security vulnerabilities
4. **Remediation Suggestion Agent** - Generates targeted vulnerability remediation solutions
5. **Visualization Component** - Generates intuitive reports and dependency graphs
## 📊 Report Formats
AuditLuma supports the following report formats:
- 📋 **HTML Report** - Includes vulnerability details, statistical charts, and interactive visualizations
- 📄 **PDF Report** - Suitable for printing and sharing
- 🔄 **JSON Report** - Machine-readable format suitable for further processing and integration
## 💬 Contributing
Contributions and suggestions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Create a Pull Request
## 📞 Contact
- QQ: 1047736593
## 🤝 Partners
- [棉花糖网络安全圈](https://vip.bdziyi.com/?ref=711)
## Support and Appreciation
If you find AuditLuma helpful, you are welcome to support the project through sponsorship:
- Sponsorship goes toward the continued improvement of AuditLuma.
<div style="display: flex; justify-content: space-between; max-width: 600px; margin: 0 auto;">
<div style="flex: 1; margin-right: 20px;">
<img src="https://github.com/Vistaminc/Miniluma/blob/main/ui/web/static/img/zanshang/wechat.jpg"/>
</div>
<div style="flex: 1;">
<img src="https://github.com/Vistaminc/Miniluma/blob/main/ui/web/static/img/zanshang/zfb.jpg"/>
</div>
</div>
## 📜 License
MIT
---
<div align="center">
<sub>Built with ❤️ by AuditLuma Team</sub>
</div>