<h1 align="center">Xiaozhi ESP32 Server Java</h1>
<p align="center">
A Java server for the <a href="https://github.com/78/xiaozhi-esp32">Xiaozhi ESP32</a> project, with a complete front-end and back-end management platform<br/>
Provides robust back-end support and an intuitive management interface for smart hardware devices
</p>
<p align="center">
<a href="https://github.com/joey-zhou/xiaozhi-esp32-server-java/issues">Feedback Issues</a>
· <a href="#deployment">Deployment Documentation</a>
· <a href="https://github.com/joey-zhou/xiaozhi-esp32-server-java/blob/main/CHANGELOG.md">Changelog</a>
</p>
<p align="center">
<a href="https://github.com/joey-zhou/xiaozhi-esp32-server-java/graphs/contributors">
<img alt="GitHub Contributors" src="https://img.shields.io/github/contributors/joey-zhou/xiaozhi-esp32-server-java?logo=github" />
</a>
<a href="https://github.com/joey-zhou/xiaozhi-esp32-server-java/issues">
<img alt="Issues" src="https://img.shields.io/github/issues/joey-zhou/xiaozhi-esp32-server-java?color=0088ff" />
</a>
<a href="https://github.com/joey-zhou/xiaozhi-esp32-server-java/pulls">
<img alt="GitHub pull requests" src="https://img.shields.io/github/issues-pr/joey-zhou/xiaozhi-esp32-server-java?color=0088ff" />
</a>
<a href="https://github.com/joey-zhou/xiaozhi-esp32-server-java/blob/main/LICENSE">
<img alt="License" src="https://img.shields.io/badge/license-MIT-white?labelColor=black" />
</a>
<a href="https://github.com/joey-zhou/xiaozhi-esp32-server-java">
<img alt="stars" src="https://img.shields.io/github/stars/joey-zhou/xiaozhi-esp32-server-java?color=ffcb47&labelColor=black" />
</a>
</p>
<p align="center">
<b>If this project helps you, consider giving it a ⭐ Star!</b><br/>
Your support is what keeps us improving!
</p>
---
## Project Introduction
Xiaozhi ESP32 Server Java is a Java server for the [Xiaozhi ESP32](https://github.com/78/xiaozhi-esp32) project, with a complete front-end and back-end management platform. It aims to give users a feature-rich, easy-to-use interface for managing devices, configurations, and related settings.
We chose Java with enterprise application scenarios in mind: it is a mature enterprise language with a well-developed ecosystem and strong concurrency support, which gives the project more room to grow and extend.
- **Back-end Framework**: Spring Boot + Spring MVC
- **Front-end Framework**: Vue.js + Ant Design
- **Data Storage**: MySQL + Redis
- **Responsive Layout**: Adapts to various devices and screen resolutions
---
## Who Is This For
If you have purchased ESP32-based hardware and want a capable, user-friendly platform to control and manage your devices, this project is a good fit. It is especially suitable for:
- Users who need enterprise-level stability
- Individual developers who want to get up and running quickly
- Users who want a complete front-end management interface
- Users who need stronger data management and analysis capabilities
- Users with high requirements for system scalability
- Scenarios where a large number of devices connect concurrently
- Applications with demanding real-time data processing requirements
---
## Feature Modules (some features are not open source; contact us if you need them)
### Open-source Version
| Feature | Status | Description |
|---------|------|------|
| **First-Sentence Response** | ✅ | Wake-word response time > 4 seconds |
| **Average Response Speed** | ✅ | Average dialogue response time > 3 seconds |
| **WebSocket Protocol** | ✅ | High-performance WebSocket communication with real-time device status updates and control |
| **Device Management** | ✅ | View all connected devices, monitor device status in real time, and add, edit, or delete device information |
| **Voice Selection** | ✅ | Multiple voice (timbre) templates, voice preview, and per-device voice configuration |
| **User Management** | ✅ | Multi-user configuration for households with several members |
| **Chat Records** | ✅ | View chat history, search by date or keyword, delete messages, and clear memory |
| **Agents** | ✅ | Integrates with agent platforms such as Coze and Dify for complex dialogue scenarios |
| **Role Switching** | ✅ | Preset roles (AI teacher, boyfriend/girlfriend, smart home assistant, etc.) that can be switched by voice |
| **Persistent Conversation** | ✅ | Conversation records are persisted so historical dialogue can be reviewed |
| **Multi-platform LLM Support** | ✅ | Supports OpenAI, Zhiyuan AI, Xunfei Spark, Ollama, and other large language models |
| **IoT Device Control** | ✅ | Manage IoT devices with voice commands for smart home control |
| **Multiple Speech Recognition Services** | ✅ | Supports FunASR, Alibaba, Tencent, Vosk, and other speech recognition services |
| **Function Calling** | ✅ | LLM function calling for complex task handling and decision-making |
| **Photo Recognition** | ✅ | Image recognition and processing for richer interaction |
| **Real-time Interruption** | ✅ | Server-side echo cancellation enables real-time interruption for more fluent dialogue |
| **Memory Management** | ✅ | Configurable number of remembered dialogue turns, dialogue history summarization, and manual editing of dialogue records |
| **Multi-language Support** | ✅ | Multi-language interface for users in different regions |
### Commercial Version
| Feature | Status | Description |
|---------|------|------|
| **First-Sentence Response** | ✅ | Wake-word response time of 1 second for a fast response experience |
| **Average Response Speed** | ✅ | Average dialogue response time < 2 seconds for smooth conversation |
| **MQTT Protocol** | ✅ | MQTT communication with long-lived connections and server-initiated wake-up |
| **Voice Cloning** | ✅ | Volcano Engine and Alibaba Cloud voice cloning for personalized voices |
| **Bidirectional Streaming** | ✅ | Streaming playback via Volcano Engine, Alibaba, and Xunfei, with real-time voice input and reply output |
| **User Client** | ✅ | Friendly user-facing interface with a native card-style device management page |
| **MCP Access Point** | ✅ | Role-based MCP tool access points for extending functionality |
| **MCP Service** | ✅ | MCP access over SSE, enabling more third-party service integrations |
| **Function Call Filler Phrases** | ✅ | Preset filler phrases played before a tool call to improve the user experience |
| **Long-term Memory** | ✅ | Extracts and records key information from user dialogue for intelligent memory management |
| **Knowledge Base** | ✅ | RAG retrieval knowledge base (knowledge-graph support planned), document upload, intelligent retrieval |
| **Memory Summary** | ✅ | Long-term memory summaries built on the knowledge base, with intelligent dialogue analysis |
| **Voice Reminders and Alarms** | ✅ | The server actively wakes the device and sends audio content for intelligent reminders |
| **Multi-device Collaboration** | ✅ | Coordinated playback across devices A and B for whole-home collaboration |
| **Monitoring Panel** | ✅ | Monitors token usage, dialogue duration, device activity, and other data by day, week, and month |
| **OTA Firmware Upgrade** | ✅ | Firmware upload, automatic upgrades, remote device management |
| **Voiceprint Recognition** | ✅ | Voiceprint recognition for a personalized voice assistant |
| **Chat Data Visualization** | ✅ | Chat frequency charts and other visualizations for tracking dialogue trends |
| **Hybrid Mode Role** | ✅ | Multi-role hybrid mode: different wake words wake different roles (automatic switching) |
### Under Development
| Feature | Status | Description |
|---------|------|------|
| **Home Assistant** | 🚧 | Smart home device control: manage Home Assistant devices with voice commands |
| **Emotion Analysis** | 🚧 | Analyzes emotion in the user's voice to give more natural, empathetic replies |
| **Custom Plugin System** | 🚧 | Custom plugin development for extending system functionality |
| **Remote Control** | 🚧 | Remote device control so devices can be managed while away from home |
---
## UI Showcase
### Core Features
<table>
<tr>
<td width="50%">
<img src="docs/images/device.jpg" alt="Device Management" />
<p align="center"><strong>Device Management</strong> - Comprehensive management and monitoring of all connected devices</p>
</td>
<td width="50%">
<img src="docs/images/message.jpg" alt="Message Records" />
<p align="center"><strong>Message Records</strong> - View and search historical dialogue content</p>
</td>
</tr>
<tr>
<td width="50%">
<img src="docs/images/voiceClone.jpg" alt="Voice Cloning" />
<p align="center"><strong>Voice Cloning</strong> - Clone your own voice for a personalized voice assistant</p>
</td>
<td width="50%">
<img src="docs/images/mcpServer.jpg" alt="MCP Server Management" />
<p align="center"><strong>MCP Server</strong> - SSE MCP server management, manage MCP tools</p>
</td>
</tr>
</table>
### More Screens
<div align="center">
<table>
<tr>
<td align="center" width="16.66%">
<a href="docs/images/login.jpg">
<img src="docs/images/login.jpg" width="130" /><br/>
<sub>Login Interface</sub>
</a>
</td>
<td align="center" width="16.66%">
<a href="docs/images/dashboard.jpg">
<img src="docs/images/dashboard.jpg" width="130" /><br/>
<sub>Dashboard</sub>
</a>
</td>
<td align="center" width="16.66%">
<a href="docs/images/agent.jpg">
<img src="docs/images/agent.jpg" width="130" /><br/>
<sub>Agents</sub>
</a>
</td>
<td align="center" width="16.66%">
<a href="docs/images/llm.jpg">
<img src="docs/images/llm.jpg" width="130" /><br/>
<sub>Model Configuration</sub>
</a>
</td>
<td align="center" width="16.66%">
<a href="docs/images/ota.jpg">
<img src="docs/images/ota.jpg" width="130" /><br/>
<sub>Firmware Upgrade</sub>
</a>
</td>
<td align="center" width="16.66%">
<a href="docs/images/role.jpg">
<img src="docs/images/role.jpg" width="130" /><br/>
<sub>Role Management</sub>
</a>
</td>
</tr>
<tr>
<td align="center" width="16.66%">
<a href="docs/images/mcpTools.jpg">
<img src="docs/images/mcpTools.jpg" width="130" /><br/>
<sub>MCP Tools</sub>
</a>
</td>
<td align="center" width="16.66%">
<a href="docs/images/membership.jpg">
<img src="docs/images/membership.jpg" width="130" /><br/>
<sub>Member Management</sub>
</a>
</td>
<td align="center" width="16.66%">
<a href="docs/images/privilege.jpg">
<img src="docs/images/privilege.jpg" width="130" /><br/>
<sub>Privilege Management</sub>
</a>
</td>
<td align="center" width="16.66%">
<a href="docs/images/stt.jpg">
<img src="docs/images/stt.jpg" width="130" /><br/>
<sub>Speech Recognition</sub>
</a>
</td>
<td align="center" width="16.66%">
<a href="docs/images/tts.jpg">
<img src="docs/images/tts.jpg" width="130" /><br/>
<sub>Text-to-Speech</sub>
</a>
</td>
<td align="center" width="16.66%"></td>
</tr>
</table>
<sub>💡 Click a thumbnail to view the full-size image</sub>
</div>
---
<a id="deployment"></a>
## Deployment Documentation
We provide several deployment methods to suit different needs:
### 1. Running from Source Locally
- [Windows Deployment Documentation](./docs/WINDOWS_DEVELOPMENT.md) - Suitable for development and testing on Windows
- [CentOS Deployment Documentation](./docs/CENTOS_DEVELOPMENT.md) - Suitable for deployment on Linux servers
After the server starts successfully, the console prints the OTA and WebSocket connection addresses. Follow the firmware compilation documentation to connect the device to the service.
### 2. Docker Deployment
- [Docker Deployment Documentation](./docs/DOCKER.md) - Quick containerized deployment
### 3. Firmware Compilation
- [Firmware Compilation Documentation](./docs/FIRMWARE-BUILD.md) - Detailed firmware compilation and flashing process
After flashing succeeds and the device is connected to the network, wake Xiaozhi with the wake word and watch the output in the back-end console.
---
## Performance Testing
We developed a dedicated WebSocket concurrency testing tool, [Xiaozhi Concurrent](https://github.com/joey-zhou/xiaozhi-concurrent), to evaluate the performance and stability of the system. The tool simulates a large number of devices connecting at the same time, exercises the complete WebSocket communication flow, and generates detailed performance reports and charts.
> 📖 For usage instructions, installation steps, and parameter configuration, see the [Xiaozhi Concurrent Repository](https://github.com/joey-zhou/xiaozhi-concurrent)
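The idea of simulating many devices connecting at once can be sketched in a few lines of Java. This is not the Xiaozhi Concurrent tool and does not reproduce its options; `ConcurrentDeviceSketch`, `Result`, and the `session` callback are hypothetical names, and the session here is a stand-in that would need to be replaced with a real WebSocket connect, handshake, and dialogue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentDeviceSketch {

    /** Outcome of one simulated device session (hypothetical type). */
    public static final class Result {
        public final boolean success;
        public final double seconds;

        Result(boolean success, double seconds) {
            this.success = success;
            this.seconds = seconds;
        }
    }

    /**
     * Starts `devices` sessions at the same moment and waits for all of them.
     * `session` stands in for the real work (connect, handshake, dialogue).
     */
    public static List<Result> run(int devices, Callable<Boolean> session) {
        ExecutorService pool = Executors.newFixedThreadPool(devices);
        CountDownLatch startGate = new CountDownLatch(1);
        List<Future<Result>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < devices; i++) {
                futures.add(pool.submit(() -> {
                    startGate.await();              // hold each device until the gate opens
                    long begin = System.nanoTime();
                    boolean ok;
                    try {
                        ok = session.call();
                    } catch (Exception e) {
                        ok = false;                 // a failed session lowers the success rate
                    }
                    return new Result(ok, (System.nanoTime() - begin) / 1e9);
                }));
            }
            startGate.countDown();                  // release all devices together
            List<Result> results = new ArrayList<>();
            for (Future<Result> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("test run aborted", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Percentage of sessions that reported success. */
    public static double successRate(List<Result> results) {
        long ok = results.stream().filter(r -> r.success).count();
        return results.isEmpty() ? 0.0 : 100.0 * ok / results.size();
    }

    public static void main(String[] args) {
        // Stand-in session: sleep 10 ms instead of talking to a real server.
        List<Result> results = run(100, () -> { Thread.sleep(10); return true; });
        System.out.printf("sessions=%d successRate=%.1f%%%n", results.size(), successRate(results));
    }
}
```

The start gate matters because without it the first threads would finish before the last ones are even submitted, so the server would never see the full concurrent load.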
### Benchmark Test Results
The following data was measured on a **Tencent Cloud server (8 cores, 8 GB RAM, 100 Mbps pay-as-you-go bandwidth)** with **100 devices, 100 concurrent connections, and 5 consecutive rounds** of dialogue:
#### Performance Indicators
| Test Item | Success Rate | Average Latency | Minimum | Maximum | Remarks |
|---------|-------|---------|-------|-------|------|
| WebSocket Connection | 100% (500/500) | 0.090s | - | - | Connection establishment time |
| Hello Handshake | 100% (500/500) | 0.073s | - | - | Handshake response time |
| Wake-word Response | 100% (500/500) | 0.333s | - | - | From wake word to audio reply |
| Speech Recognition Accuracy | 100% (500/500) | - | - | - | Recognition of real audio |
| Speech Recognition Latency | - | 0.988s | 0.949s | 1.255s | ASR recognition time (including 800ms of silence) |
| Server Processing Latency | - | 0.849s | 0.454s | 3.759s | Server-side processing time (LLM+TTS) |
| User-perceived Latency | - | 1.837s | 1.433s | 4.723s | From end of speech to receiving the reply |
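The figures in the table are consistent with two simple relations: 100 devices × 5 rounds gives the 500 samples in each success-rate cell, and the average perceived delay (1.837s) equals the average speech recognition delay plus the average server processing delay (0.988s + 0.849s). The minimum and maximum columns do not add up the same way, presumably because the extremes of each stage occur in different dialogues. The sketch below illustrates this; the class name and the per-dialogue sample values are made up for illustration and are not measurements.

```java
import java.util.Arrays;

public class LatencyBreakdown {

    public static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    public static double min(double[] xs) {
        return Arrays.stream(xs).min().orElse(Double.NaN);
    }

    /** Per-dialogue perceived delay: ASR delay plus server processing delay. */
    public static double[] perceived(double[] asr, double[] server) {
        if (asr.length != server.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double[] out = new double[asr.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = asr[i] + server[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Reported figures from the table above.
        System.out.println("samples = " + (100 * 5));
        System.out.printf("avg perceived = %.3f s%n", 0.988 + 0.849);

        // Hypothetical per-dialogue samples, only to show why minimums do not add up.
        double[] asr = {0.95, 1.00, 1.25};
        double[] server = {3.70, 0.50, 0.45};
        double[] total = perceived(asr, server);
        System.out.printf("mean: %.3f vs %.3f%n", mean(total), mean(asr) + mean(server));
        System.out.printf("min:  %.3f vs %.3f%n", min(total), min(asr) + min(server));
    }
}
```

With these sample values the two means agree, while the minimum of the summed delays is larger than the sum of the per-stage minimums.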
#### Server Resource Usage
| Resource Type | Idle | Peak | Remarks |
|---------|-------|------|------|
| CPU Usage | 0% | 80% | Utilization of the 8-core CPU |
| Memory Usage | 1.8G | 1.96G | JVM heap memory stays stable |
| Network Bandwidth (Uplink) | 0 | 2200KB/s | Audio uploaded by clients |
| Network Bandwidth (Downlink) | 0 | 3300KB/s | Audio sent by the server |
| WebSocket Connections | 0 | 100 | Concurrent active connections |
#### Audio Transmission Quality
| Indicator | Value | Remarks |
|-----|------|------|
| Average Audio Frame Interval | 58.07ms | Interval between sent audio frames |
| Frame Delay Rate | 8.47% (4226/49918) | Frames with an interval > 65ms |
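A frame delay rate like the one above can be derived from per-frame send times: take the gap between consecutive frames, count the gaps above the 65ms threshold, and divide by the total. The sketch below assumes timestamps in milliseconds and uses hypothetical names (`FrameIntervalStats`, `countLate`); how the testing tool actually records frame times is not described here.

```java
public class FrameIntervalStats {

    /** Gaps between consecutive send times, in milliseconds. */
    public static double[] intervals(double[] sendTimesMs) {
        if (sendTimesMs.length < 2) {
            return new double[0];
        }
        double[] gaps = new double[sendTimesMs.length - 1];
        for (int i = 1; i < sendTimesMs.length; i++) {
            gaps[i - 1] = sendTimesMs[i] - sendTimesMs[i - 1];
        }
        return gaps;
    }

    /** Arithmetic mean, or 0 for an empty array. */
    public static double average(double[] xs) {
        if (xs.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Number of gaps strictly above the threshold. */
    public static int countLate(double[] gaps, double thresholdMs) {
        int late = 0;
        for (double g : gaps) {
            if (g > thresholdMs) {
                late++;
            }
        }
        return late;
    }

    public static double percent(long part, long whole) {
        return whole == 0 ? 0.0 : 100.0 * part / whole;
    }

    public static void main(String[] args) {
        // Hypothetical send times for five frames; one gap (70 ms) exceeds the threshold.
        double[] gaps = intervals(new double[] {0, 60, 120, 190, 250});
        int late = countLate(gaps, 65.0);
        System.out.printf("avg interval = %.2f ms, late = %d/%d (%.2f%%)%n",
                average(gaps), late, gaps.length, percent(late, gaps.length));

        // The reported ratio from the table: 4226 late frames out of 49918.
        System.out.printf("reported rate = %.2f%%%n", percent(4226, 49918));
    }
}
```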
### Test Result Visualization
<div align="center">
<img src="docs/images/xiaozhi_test.png" alt="Performance Test Results" width="800" style="margin: 10px;" />
<p><strong>Concurrent Test Data Visualization</strong> - Latency Distribution and Performance Metrics Statistics</p>
</div>
---
### Business Cooperation
We take on custom development projects. If you have specific requirements or are interested in the commercial version, please contact us on WeChat to discuss.
<img src="./web/public/static/img/wechat.jpg" alt="WeChat" width="200" />
## Contribution Guide
Contributions of any kind are welcome! If you have ideas or find issues, reach us through the following channels:
### WeChat
The WeChat group has more than 200 members, so new members can no longer join by scanning the QR code. Add my WeChat account and mention **Xiao Zhi**, and I will invite you to the group.
<img src="./web/public/static/img/wechat.jpg" alt="WeChat" width="200" />
### QQ
You are welcome to join our QQ group for discussion. QQ group number: 790820705
<img src="./web/public/static/img/qq.jpg" alt="QQ Group" width="200" />
---
## Disclaimer
This project provides technical implementation code only and does not provide any media content. When using related functions, users must ensure they have legitimate rights or copyright licenses and comply with copyright laws and regulations in their region.
Sample content and resources used in the project come from the Internet or from user submissions and are used only for functional demonstration and technical testing. If any content infringes your rights, please contact us immediately; we will remove it or take other appropriate measures after verification.
The project developers are not responsible for any content obtained or played by users using the project code. Using this project means you agree to bear all legal risks and responsibilities during use.
---
## Star History
<a href="https://www.star-history.com/#joey-zhou/xiaozhi-esp32-server-java&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=joey-zhou/xiaozhi-esp32-server-java&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=joey-zhou/xiaozhi-esp32-server-java&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=joey-zhou/xiaozhi-esp32-server-java&type=Date" />
</picture>
</a>