<div align="center">
# NagaAgent
**AI Desktop Assistant with Four-Service Collaboration — Streaming Tool Calling · Knowledge Graph Memory · Live2D · Voice Interaction**
[简体中文](README.md) | [English](README_en.md)




**[QQ Robot Integration: Undefined QQbot](https://github.com/69gg/Undefined/)**
</div>
---
**Dual License** · Open source under [AGPL-3.0](LICENSE), closed source under [Proprietary License](CLOSED-SOURCE.LICENSE) (requires written authorization). Business cooperation: contact@nagaagent.com / bilibili【柏斯阔落】
---
## Overview
NagaAgent consists of four independent microservices:
| Service | Port | Responsibilities |
|------|------|------|
| **API Server** | 8000 | Dialogue, streaming tool calling, document upload, authentication proxy, memory API, configuration management |
| **Agent Server** | 8001 | Background intent analysis, OpenClaw integration, task scheduling, compressed memory |
| **MCP Server** | 8003 | MCP tool registration/discovery/parallel scheduling |
| **Voice Service** | 5048 | TTS (Edge-TTS) + ASR (FunASR) + real-time voice (Qwen Omni) |
`main.py` orchestrates startup, with all services running as daemon threads. The frontend is either the Electron + Vue 3 desktop client or the native PyQt5 GUI.
---
## Changelog
| Date | Content |
|------|------|
| **2026-02-14** | 5.0.0 Release: Remote Memory Microservice (NagaMemory Cloud + Local GRAG Fallback), Consciousness Sea 3D Rewrite, Startup Title Animation and Particle Effects, Progress Bar Stagnation Detection and Health Polling, Version Update Check Pop-up, User Agreement |
| **2026-02-14** | Captcha verification integration, registration process (username + email + verification code), CAS session invalidation pop-up, voice input button, file parsing button, IME Chinese input method Enter key misfire fix |
| **2026-02-14** | Removed ChromaDB local dependency (-1119 lines), game guide fully cloud-based, guide function added login state gate |
| **2026-02-13** | Floating ball mode (4 state animations: classic / ball / compact / full), screenshot multimodal visual model automatic switching |
| **2026-02-13** | Skill workshop refactoring + Live2D expression channel independent + naga-config skill |
| **2026-02-12** | NagaCAS authentication + NagaModel gateway routing + login pop-up + user menu |
| **2026-02-12** | Live2D 4-channel orthogonal animation architecture (body/action/expression/tracking), window-level visual tracking and calibration |
| **2026-02-12** | Agentic Tool Loop: streaming tool extraction + multi-turn automatic execution + parallel MCP/OpenClaw/Live2D scheduling |
| **2026-02-12** | Arknights-style startup interface + progress tracking + view preloading + mouse parallax floating effect |
| **2026-02-12** | Game guide MCP connected (automatic screenshot + visual model + Neo4j import + 6 game RAG processors) |
| **2026-02-11** | Embedded OpenClaw packaging, automatic generation of configuration files from templates at startup |
| **2026-02-10** | Backend packaging optimization, skill workshop MCP status fix, frontend bug fix |
| **2026-02-09** | Frontend refactoring, Live2D disabled eye tracking, OpenClaw renamed to AgentServer |
---
## Core Modules
### Streaming Tool Calling Loop
NagaAgent's tool calling does not rely on the OpenAI Function Calling API. Instead, the LLM embeds JSON descriptions of tool calls within ` ```tool``` ` code blocks in its text output. This means **any OpenAI-compatible LLM provider works out of the box**, even if the model itself does not support function calling.
**Single-round process**:
```
LLM streaming output ──SSE──▶ frontend real-time text display
        │                               │
        ▼                               ▼
full text concatenation          TTS sentence playback
        │
        ▼
parse_tool_calls_from_text()
  ├─ Phase 1: extract JSON inside ```tool``` code blocks
  └─ Phase 2: fallback raw-JSON extraction (backward compatibility)
        │
        ▼
categorize by agentType
  ├─ "mcp"      → MCPManager.unified_call()  (in-process)
  ├─ "openclaw" → HTTP POST → Agent Server /openclaw/send
  └─ "live2d"   → asyncio.create_task() → UI notification
        │
        ▼
asyncio.gather() parallel execution
        │
        ▼
tool results injected into messages → next round of LLM calls
```
**Implementation Details**:
- **Text Parsing**: Regular expression `r"```tool\s*\n([\s\S]*?)(?:```|\Z)"` extracts code blocks, `json5` fault-tolerant parsing (fallback `json`), automatic normalization of full-width characters (`{}:`)
- **Loop Control**: At most 5 rounds (`max_loop_stream`, configurable); the loop terminates in any round where the LLM emits no JSON containing `agentType`
- **SSE Encoding**: Each chunk is `data: base64(json({"type":"content"|"reasoning","text":"..."}))\n\n`, frontend `ReadableStream` + `TextDecoder` real-time splitting
- **Tool Result Reinjection**: Formatted as `[Tool Result 1/N - service: tool (status)]` appended to messages
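The two-phase extraction described above can be sketched in a few lines. This is a simplified stand-in for the real implementation: it uses stdlib `json` where the project prefers `json5`, and normalizes only a few full-width characters.

```python
import json
import re

# Phase-1 pattern from the text above: a ```tool fenced block,
# tolerant of a missing closing fence (the (?:```|\Z) alternative).
TOOL_BLOCK_RE = re.compile(r"```tool\s*\n([\s\S]*?)(?:```|\Z)")

# Full-width punctuation the model sometimes emits, normalized before parsing.
FULLWIDTH = str.maketrans({"｛": "{", "｝": "}", "：": ":", "，": ","})

def parse_tool_calls_from_text(text: str) -> list[dict]:
    """Simplified sketch: extract tool-call JSON objects from LLM output."""
    calls = []
    for raw in TOOL_BLOCK_RE.findall(text):
        raw = raw.translate(FULLWIDTH)
        try:
            obj = json.loads(raw)  # the project uses json5 for extra fault tolerance
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "agentType" in obj:
            calls.append(obj)
    return calls

reply = (
    "Let me check the weather.\n"
    "```tool\n"
    '{"agentType": "mcp", "service_name": "weather_time", "tool_name": "query"}\n'
    "```\n"
)
print(parse_tool_calls_from_text(reply))
```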
Source code: [`apiserver/agentic_tool_loop.py`](apiserver/agentic_tool_loop.py), [`apiserver/streaming_tool_extractor.py`](apiserver/streaming_tool_extractor.py)
---
### GRAG Knowledge Graph Memory
GRAG (Graph-RAG) automatically extracts quintuples `(subject, subject type, predicate, object, object type)` from conversations and stores them in Neo4j, automatically retrieving relevant memories as LLM context during conversations.
**Extraction Process**:
1. **Structured Extraction** (Priority): Call `beta.chat.completions.parse()` + Pydantic model `QuintupleResponse`, `temperature=0.3`, retry up to 3 times
2. **JSON Fallback**: The prompt asks the LLM to return a JSON array; if parsing fails, the content between the first `[` and the last `]` is extracted
3. **Filtering Rules**: Only extract factual information (behavior, entity relationships, states, preferences), filter metaphors, assumptions, pure emotions, and chitchat
4. **Entity Types**: person / location / organization / item / concept / time / event / activity
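The JSON-fallback step (item 2 above) can be sketched as follows. This is a simplified illustration, not the project's exact code:

```python
import json

def parse_quintuple_json(raw: str) -> list:
    """Fallback parser sketch: if the reply is not clean JSON,
    take the content between the first '[' and the last ']'."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        start, end = raw.find("["), raw.rfind("]")
        if start == -1 or end <= start:
            return []
        try:
            return json.loads(raw[start : end + 1])
        except json.JSONDecodeError:
            return []

# Typical noisy reply: prose wrapped around the JSON array.
reply = 'Here are the facts:\n[["Alice", "person", "lives_in", "Hangzhou", "location"]]'
print(parse_quintuple_json(reply))
```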
**Task Manager**:
- 3 asyncio worker coroutines consume `asyncio.Queue(maxsize=100)`
- SHA-256 deduplication: PENDING/RUNNING tasks with the same text are automatically skipped
- Automatically cleans up completed tasks older than 24h every hour
- Configurable timeout (default 12 seconds) and retry count (default 2 times)
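The worker/queue/deduplication pattern above can be sketched as follows. Names and the in-memory `seen` set are illustrative; the real manager also tracks task status, timeouts, and timestamps.

```python
import asyncio
import hashlib

async def run_demo() -> list[str]:
    # Bounded queue as described: asyncio.Queue(maxsize=100)
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    seen: set[str] = set()      # SHA-256 digests of pending/running texts
    processed: list[str] = []

    def submit(text: str) -> bool:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:      # duplicate task: skip, as in the dedup rule above
            return False
        seen.add(digest)
        queue.put_nowait(text)
        return True

    async def worker() -> None:
        while True:
            text = await queue.get()
            processed.append(text)  # real code would call the LLM extractor here
            queue.task_done()

    # 3 worker coroutines, matching the description
    workers = [asyncio.create_task(worker()) for _ in range(3)]
    submit("User likes green tea.")
    submit("User likes green tea.")   # deduplicated, never enqueued
    submit("User lives in Hangzhou.")
    await queue.join()
    for w in workers:
        w.cancel()
    return processed

print(asyncio.run(run_demo()))
```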
**Dual Storage**:
- Local file `logs/knowledge_graph/quintuples.json` (JSON array, set deduplication)
- Neo4j graph database: `Entity` node + typed `Relationship` relationship edge, `graph.merge()` upsert
**RAG Retrieval**:
1. Extract keywords from user questions (LLM generated)
2. Cypher query: `MATCH (e1:Entity)-[r]->(e2:Entity) WHERE e1.name CONTAINS '{kw}' ... LIMIT 5`
3. Format as `subject(type) —[predicate]→ object(type)` injected into LLM context
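The retrieval and formatting steps can be mimicked in pure Python over the local quintuple store. The real service runs the Cypher query above against Neo4j; this sketch only reproduces the keyword `CONTAINS` filter and the context line format, with made-up sample data.

```python
# Hypothetical sample quintuples in the (subject, subject type,
# predicate, object, object type) shape described above.
QUINTUPLES = [
    ("Alice", "person", "lives_in", "Hangzhou", "location"),
    ("Alice", "person", "likes", "green tea", "item"),
    ("Bob", "person", "works_at", "Acme", "organization"),
]

def retrieve(keywords: list[str], limit: int = 5) -> list[str]:
    """Keyword substring match on subject/object, formatted for LLM context."""
    hits = []
    for s, st, p, o, ot in QUINTUPLES:
        if any(kw in s or kw in o for kw in keywords):
            hits.append(f"{s}({st}) —[{p}]→ {o}({ot})")  # context line format
    return hits[:limit]

print(retrieve(["Alice"]))
```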
**Remote Memory** (5.0.0 New):
- `summer_memory/memory_client.py` connects to NagaMemory cloud service
- Logged-in users automatically use cloud storage, with automatic fallback to local GRAG when logged out or offline
- API Server adds `/api/memory/*` proxy endpoints, frontend relays through API Server
Source code: [`summer_memory/`](summer_memory/)
---
### MCP Tool System
A pluggable tool architecture based on [Model Context Protocol](https://modelcontextprotocol.io/), with each tool running as an independent Agent.
**Built-in Agents**:
| Agent | Directory | Function |
|-------|------|------|
| `weather_time` | `mcpserver/agent_weather_time/` | Weather query/forecast, system time, automatic city/IP detection |
| `open_launcher` | `mcpserver/agent_open_launcher/` | Scan installed applications on the system, natural language program launch |
| `game_guide` | `mcpserver/agent_game_guide/` | Game strategy Q&A, damage calculation, team recommendation, automatic screenshot injection |
| `online_search` | `mcpserver/agent_online_search/` | Web search based on SearXNG |
| `crawl4ai` | `mcpserver/agent_crawl4ai/` | Web page content extraction based on Crawl4AI |
| `playwright_master` | `mcpserver/agent_playwright_master/` | Browser automation based on Playwright |
| `vision` | `mcpserver/agent_vision/` | Screenshot analysis and visual Q&A |
| `mqtt_tool` | `mcpserver/agent_mqtt_tool/` | MQTT protocol IoT device control |
| `office_doc` | `mcpserver/agent_office_doc/` | docx/xlsx content extraction |
**Registration and Discovery**:
```
mcpserver/
├── agent_weather_time/
│   ├── agent-manifest.json    ← declares name, entryPoint.module/class, capabilities
│   └── weather_time_agent.py
├── agent_online_search/
│   ├── agent-manifest.json
│   └── ...
└── mcp_registry.py            ← scan_and_register_mcp_agents() glob-scans **/agent-manifest.json;
                                 importlib.import_module(module).ClassName() instantiates each agent
```
- `MCPManager.unified_call(service_name, tool_call)` routes to the corresponding Agent's `handle_handoff()`
- MCP Server `POST /schedule` supports batch calls, `asyncio.gather()` parallel execution
- **Skill Market**: The frontend skill workshop supports one-click installation of community Skills (Agent Browser, Brainstorming, Context7, Firecrawl Search, etc.), backend `GET /openclaw/market/items` + `POST /openclaw/market/items/{id}/install`
Source code: [`mcpserver/`](mcpserver/)
---
### Electron Desktop
Desktop client based on Electron + Vue 3 + Vite + UnoCSS + PrimeVue.
#### Live2D Rendering and Animation
Renders Cubism Live2D models with **pixi-live2d-display** + **PixiJS WebGL**. SSAA supersampling anti-aliasing: the canvas is rendered at `width * ssaa` and scaled back down with CSS `transform: scale(1/ssaa)`.
**4-Channel Orthogonal Animation System** (`live2dController.ts`):
| Channel | Description | Parameters |
|------|------|------|
| **State** | Keyframe loop animation (idle/thinking/talking), hermite smooth interpolation | Loaded from `naga-actions.json` |
| **Action** | Queued head actions (nodding/shaking), FIFO single execution | AngleX/Y, EyeBallX/Y |
| **Emotion** | `.exp3.json` expression files, three blending modes (Add/Multiply/Overwrite) | Exponential decay transition |
| **Tracking** | Mouse pointer follows gaze, configurable delay start (`tracking_hold_delay_ms`) | Angle ±30, EyeBall ±1, BodyAngle ±10 |
Merge order: State → Lip Sync → Action → Manual Override → Expression Blend → Tracking Blend.
#### Consciousness Sea Visualization (MindView)
Canvas 2D + handwritten 3D projection (non-WebGL/SVG), spherical coordinate camera `(theta, phi, distance)`, perspective division `700 / depth`.
**7-layer rendering**: background gradient → ground grid → water surface → volumetric light (3 beams) → particle system (3 layers, 125 particles) → bioluminescent plankton (10, with trails) → knowledge graph nodes and edges (depth-sorted painter's algorithm).
Quintuple to graph mapping: `subject`/`object` → node, `predicate` → directed edge, degree centrality → node height weight (high-weight nodes float up), 100 node limit.
Interaction: Click and drag to rotate, middle click/Shift+drag to pan, scroll to zoom, node drag/click, keyword search filter, touch gestures.
#### Floating Ball Mode
4-state animated window system: `classic` (normal) → `ball` (100×100 sphere) → `compact` (420×100 collapsed) → `full` (420×N expanded).
easeOutCubic easing (`1 - (1 - t)^3`), 160ms / 60FPS transition. Smart positioning: expands to the right from the ball position, automatically fits the screen boundaries.
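The easing math above is small enough to show directly. A sketch only; the real animation runs in the Electron main process, in TypeScript.

```python
def ease_out_cubic(t: float) -> float:
    """easeOutCubic, as described: 1 - (1 - t)^3."""
    return 1 - (1 - t) ** 3

def interpolate_size(start: float, end: float, elapsed_ms: float,
                     duration_ms: float = 160) -> float:
    """Window size at a given point in the 160 ms transition."""
    t = min(elapsed_ms / duration_ms, 1.0)
    return start + (end - start) * ease_out_cubic(t)

# Expanding from the 100 px ball to the 420 px compact window:
print(interpolate_size(100, 420, 0))    # start of transition
print(interpolate_size(100, 420, 160))  # end of transition
```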
#### Startup Animation
1. **Title Stage**: Black mask + 40 golden rising particles + title image 2.4s CSS keyframe (fade in → stay → fade out)
2. **Progress Stage**: Neural Network particle background + Live2D cutout frame + golden progress bar (`requestAnimationFrame` interpolation, minimum speed 0.5 fallback)
3. **Stagnation Detection**: Display restart prompt if no progress change for 3 seconds, poll backend `/health` every second after 25% to prevent signal loss
4. **Wake Up**: Display "Click to Wake Up" pulse prompt after 100% progress
Source code: [`frontend/`](frontend/)
---
### Voice Interaction
**TTS (Text-to-Speech)**:
- Edge-TTS engine, OpenAI compatible interface `/v1/audio/speech`
- 3-thread pipeline: sentence queue → TTS API call (Semaphore(2) concurrency control) → pygame playback
- Live2D lip sync: `AdvancedLipSyncEngineV2` 60FPS extracts 5 parameters (mouth_open / mouth_form / mouth_smile / eye_brow_up / eye_wide)
- Supports mp3 / aac / wav / opus / flac formats, FFmpeg optional transcoding
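The pipeline shape above can be sketched with stdlib threading. A simplified illustration: the synthesis step is stubbed out, and playback ordering, which the real service preserves, is ignored here.

```python
import queue
import threading

sentence_q = queue.Queue()                 # sentence queue feeding the workers
tts_limit = threading.Semaphore(2)         # Semaphore(2) concurrency cap on TTS calls
played: list[str] = []

def synthesize(sentence: str) -> str:
    with tts_limit:                        # at most 2 concurrent "API calls"
        return f"<audio:{sentence}>"       # real code calls Edge-TTS here

def worker() -> None:
    while (sentence := sentence_q.get()) is not None:
        played.append(synthesize(sentence))  # real code hands this to pygame playback
        sentence_q.task_done()
    sentence_q.task_done()                 # account for the poison pill

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for s in ["Hello.", "Nice to meet you.", "How can I help?"]:
    sentence_q.put(s)
for _ in threads:
    sentence_q.put(None)                   # poison pills stop the workers
for t in threads:
    t.join()
print(sorted(played))
```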
**ASR (Automatic Speech Recognition)**:
- FunASR local server, supports VAD endpoint detection and WebSocket real-time stream
- Three-mode automatic switching: LOCAL (FunASR) → END_TO_END (Qwen Omni) → HYBRID (Qwen ASR + API Server)
**Real-time Voice Conversation** (requires DashScope API Key):
- Full-duplex WebSocket voice interaction based on Qwen Omni
- Echo cancellation, VAD detection, audio chunking (200ms), session cooldown, maximum voice duration control
Source code: [`voice/`](voice/)
---
### Agent Server and Task Scheduling
**Background Intent Analyzer** (`BackgroundAnalyzer`):
- Based on LangChain `ChatOpenAI`, `temperature=0`, extracts executable tool calls from conversations
- Deduplicated per session (preventing concurrent analysis of the same session), with a 60-second timeout
- Extracted tool calls are distributed to MCP / OpenClaw / Live2D based on `agentType`
**OpenClaw Integration**:
- Connects to OpenClaw Gateway (port 18789), schedules AI programming assistant to perform computer tasks via natural language
- Three-level fallback: packaged embedded → global `openclaw` command → automatic `npm install -g openclaw`
- `POST /openclaw/send` sends instructions, waits up to 120 seconds
**Task Scheduler** (`TaskScheduler`):
- Records task steps (purpose/content/output/analysis/success)
- Automatically extracts key facts and "key findings"/"important" markers
- Memory compression: when the number of steps exceeds the threshold, LLM is called to generate `CompressedMemory` (key_findings / failed_attempts / current_status / next_steps), only the most recent N steps are retained
- `schedule_parallel_execution()` executes a list of tasks in parallel via `asyncio.gather()`
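A minimal sketch of `schedule_parallel_execution()`; the `TaskResult` shape and `run_task` body are illustrative, not the project's actual types.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class TaskResult:
    """Hypothetical per-task record (purpose/output/success, simplified)."""
    name: str
    output: str
    success: bool = True

async def run_task(name: str) -> TaskResult:
    await asyncio.sleep(0)  # real tasks would await MCP tools / OpenClaw here
    return TaskResult(name=name, output=f"done:{name}")

async def schedule_parallel_execution(names: list[str]) -> list[TaskResult]:
    # asyncio.gather runs all tasks concurrently and preserves input order
    return await asyncio.gather(*(run_task(n) for n in names))

results = asyncio.run(schedule_parallel_execution(["search", "summarize"]))
print([r.output for r in results])
```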
Source code: [`agentserver/`](agentserver/)
---
## Architecture
```
┌──────────────────────────────────────────────────────────┐
│                Electron / PyQt5 Frontend                 │
│  Vue 3 + Vite + UnoCSS + PrimeVue + pixi-live2d-display  │
└───────┬──────────────────┬──────────────────┬────────────┘
        │                  │                  │
┌───────▼───────┐    ┌─────▼───────┐    ┌─────▼───────┐
│  API Server   │    │Agent Server │    │Voice Service│
│    :8000      │    │    :8001    │    │    :5048    │
│ - Dialogue/SSE│    │ - Intent    │    │ - TTS       │
│ - Tool calls  │    │   analysis  │    │ - ASR       │
│ - File upload │    │ - Task      │    │ - Realtime  │
│ - Auth proxy  │    │   scheduling│    │   voice     │
│ - Memory API  │    │ - OpenClaw  │    │             │
│ - Skill store │    │             │    │             │
│ - Config      │    │             │    │             │
└───────┬───────┘    └─────┬───────┘    └─────────────┘
        │                  │
┌───────▼───────┐    ┌─────▼────────┐
│  MCP Server   │    │   OpenClaw   │
│    :8003      │    │   Gateway    │
│ - Tool reg.   │    │    :18789    │
│ - Agent disc. │    └──────────────┘
│ - Parallel    │
└───────┬───────┘
        │
┌───────▼──────────────────────────┐
│  MCP Agents (pluggable tools)    │
│  Weather | Search | Crawl        │
│  Vision | Launcher | Guide       │
│  Docs | MQTT                     │
└───────┬──────────────────────────┘
        │
┌───────▼───────┐
│     Neo4j     │
│     :7687     │
│  Knowledge    │
│     Graph     │
└───────────────┘
```
### Directory Structure
```
NagaAgent/
├── apiserver/                       # API Server — dialogue, streaming tool calls, auth, config
│   ├── api_server.py                # FastAPI main application
│   ├── agentic_tool_loop.py         # Multi-turn tool-call loop
│   ├── llm_service.py               # LiteLLM unified LLM calls
│   └── streaming_tool_extractor.py  # Streaming sentence splitting + TTS dispatch
├── agentserver/                     # Agent Server — intent analysis, task scheduling, OpenClaw
│   ├── agent_server.py              # FastAPI main application
│   └── task_scheduler.py            # Task orchestration + compressed memory
├── mcpserver/                       # MCP Server — tool registration and scheduling
│   ├── mcp_server.py                # FastAPI main application
│   ├── mcp_registry.py              # Manifest scanning + dynamic registration
│   ├── mcp_manager.py               # unified_call() routing
│   ├── agent_weather_time/
│   ├── agent_open_launcher/
│   ├── agent_game_guide/
│   ├── agent_online_search/
│   ├── agent_crawl4ai/
│   ├── agent_playwright_master/
│   ├── agent_vision/
│   ├── agent_mqtt_tool/
│   └── agent_office_doc/
├── summer_memory/                   # GRAG knowledge graph
│   ├── quintuple_extractor.py       # Quintuple extraction (structured output + JSON fallback)
│   ├── quintuple_graph.py           # Neo4j + file dual storage
│   ├── quintuple_rag_query.py       # Cypher keyword RAG retrieval
│   ├── task_manager.py              # 3-worker asynchronous task manager
│   ├── memory_manager.py            # GRAG top-level manager
│   └── memory_client.py             # NagaMemory remote client
├── voice/                           # Voice service
│   ├── output/                      # TTS (Edge-TTS) + lip sync
│   └── input/                       # ASR (FunASR) + real-time voice (Qwen Omni)
├── guide_engine/                    # Game guide engine — cloud RAG service
├── frontend/                        # Electron + Vue 3 frontend
│   ├── electron/                    # Main process (windows, floating ball, backend management, hotkeys)
│   └── src/                         # Vue 3 application
│       ├── views/                   # MessageView / MindView / SkillView / ModelView / MemoryView / ConfigView
│       ├── components/              # Live2dModel / SplashScreen / LoginDialog / ...
│       ├── composables/             # useAuth / useStartupProgress / useVersionCheck / useToolStatus
│       └── utils/                   # live2dController (4-channel animation) / encoding / session
├── ui/                              # PyQt5 GUI (MVC)
├── system/                          # Config loading, env detection, system prompts, background analyzer
├── main.py                          # Unified entry point, orchestrates all services
├── config.json                      # Runtime configuration (copy from config.json.example)
└── pyproject.toml                   # Project metadata and dependencies
```
---
## Quick Start
### Environment Requirements
- Python 3.11 (`>=3.11, <3.12`)
- Optional: [uv](https://github.com/astral-sh/uv) (accelerates dependency installation)
- Optional: Neo4j (knowledge graph memory)
### Installation
```bash
git clone https://github.com/Xxiii8322766509/NagaAgent.git
cd NagaAgent
# Method 1: setup script (automatically detects environment, creates virtual environment, installs dependencies)
python setup.py
# Method 2: uv
uv sync
# Method 3: Manual
python -m venv .venv
source .venv/bin/activate # Windows: .\.venv\Scripts\activate
pip install -r requirements.txt
```
### Configuration
Copy `config.json.example` to `config.json`, fill in LLM API information:
```json
{
  "api": {
    "api_key": "your-api-key",
    "base_url": "https://api.deepseek.com/v1",
    "model": "deepseek-v3.2"
  }
}
```
Supports all OpenAI compatible APIs (DeepSeek, Tongyi Qianwen, OpenAI, Ollama, etc.).
### Startup
```bash
python main.py # Full startup (API + Agent + MCP + Voice + GUI)
uv run main.py # Using uv
python main.py --headless # GUI-less mode (with Electron frontend)
```
All services are orchestrated by `main.py`, or can be started individually:
```bash
uvicorn apiserver.api_server:app --host 127.0.0.1 --port 8000 --reload
uvicorn agentserver.agent_server:app --host 0.0.0.0 --port 8001
```
### Electron Frontend Development
```bash
cd frontend
npm install
npm run dev # Development mode (Vite + Electron)
npm run build # Build production package
```
---
## Optional Configuration
<details>
<summary><b>Knowledge Graph Memory (Neo4j)</b></summary>
Install Neo4j ([Docker](https://hub.docker.com/_/neo4j) or [Neo4j Desktop](https://neo4j.com/download/)), then configure:
```json
{
  "grag": {
    "enabled": true,
    "neo4j_uri": "neo4j://127.0.0.1:7687",
    "neo4j_user": "neo4j",
    "neo4j_password": "your-password"
  }
}
```
</details>
<details>
<summary><b>Voice Interaction</b></summary>
```json
{
  "system": { "voice_enabled": true },
  "tts": { "port": 5048, "default_voice": "zh-CN-XiaoxiaoNeural" }
}
```
Real-time voice conversation (requires Tongyi Qianwen DashScope API Key):
```json
{
  "voice_realtime": {
    "enabled": true,
    "provider": "qwen",
    "api_key": "your-dashscope-key",
    "model": "qwen3-omni-flash-realtime"
  }
}
```
</details>
<details>
<summary><b>Live2D Virtual Avatar</b></summary>
```json
{
  "live2d": {
    "enabled": true,
    "model_path": "path/to/your/model.model3.json"
  }
}
```
Electron Frontend Live2D Configuration:
```json
{
  "web_live2d": {
    "ssaa": 2,
    "model": {
      "source": "./models/your-model/model.model3.json",
      "x": 0.5,
      "y": 1.3,
      "size": 6800
    },
    "face_y_ratio": 0.13,
    "tracking_hold_delay_ms": 100
  }
}
```
</details>
<details>
<summary><b>MQTT IoT</b></summary>
```json
{
  "mqtt": {
    "enabled": true,
    "broker": "mqtt-broker-address",
    "port": 1883,
    "topic": "naga/agent/topic"
  }
}
```
</details>
---
## Port Overview
| Service | Port | Description |
|------|------|------|
| API Server | 8000 | Main Interface: Dialogue, Configuration, Authentication, Skill Store |
| Agent Server | 8001 | Intent Analysis, Task Scheduling, OpenClaw |
| MCP Server | 8003 | MCP Tool Registration and Scheduling |
| Voice Service | 5048 | TTS / ASR |
| Neo4j | 7687 | Knowledge Graph (Optional) |
| OpenClaw Gateway | 18789 | AI Programming Assistant (Optional) |
---
## Update
```bash
python update.py # Automatic git pull + dependency synchronization
```
---
## Troubleshooting
| Problem | Solution |
|------|----------|
| Python version incompatibility | Use Python 3.11; or use uv (automatically manages Python versions) |
| Port is occupied | Check if 8000, 8001, 8003, 5048 are available |
| Neo4j connection failed | Confirm that the Neo4j service is started, check the connection parameters in config.json |
| Startup stuck on progress bar | Check if the API Key is configured correctly; a restart prompt appears after 3 seconds; Electron automatically polls the backend health status |
```bash
python main.py --check-env --force-check # Environment diagnosis
python main.py --quick-check # Quick check
```
---
## Build
```bash
python build.py # Build Windows one-click run integration package, output to dist/
```
---
## Contribution
Issues and pull requests are welcome. For questions, join QQ channel nagaagent1.
---
## Star History
[Star History Chart](https://www.star-history.com/#RTGS2017/NagaAgent&type=date&legend=top-left)