```markdown
<p align="center" class="trendshift">
<a href="https://trendshift.io/repositories/14130" target="_blank">
<img src="https://trendshift.io/api/badge/repositories/14130" alt="Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/>
</a>
</p>
<p align="center">
<a href="https://github.com/huangjunsen0406/py-xiaozhi/releases/latest">
<img src="https://img.shields.io/github/v/release/huangjunsen0406/py-xiaozhi?style=flat-square&logo=github&color=blue" alt="Release"/>
</a>
<a href="https://opensource.org/licenses/MIT">
<img src="https://img.shields.io/badge/License-MIT-green.svg?style=flat-square" alt="License: MIT"/>
</a>
<a href="https://github.com/huangjunsen0406/py-xiaozhi/stargazers">
<img src="https://img.shields.io/github/stars/huangjunsen0406/py-xiaozhi?style=flat-square&logo=github" alt="Stars"/>
</a>
<a href="https://github.com/huangjunsen0406/py-xiaozhi/releases/latest">
<img src="https://img.shields.io/github/downloads/huangjunsen0406/py-xiaozhi/total?style=flat-square&logo=github&color=52c41a1&maxAge=86400" alt="Download"/>
</a>
<a href="https://gitee.com/huang-jun-sen/py-xiaozhi">
<img src="https://img.shields.io/badge/Gitee-FF5722?style=flat-square&logo=gitee" alt="Gitee"/>
</a>
<a href="https://huangjunsen0406.github.io/py-xiaozhi/guide/00_%E6%96%87%E6%A1%A3%E7%9B%AE%E5%BD%95.html">
<img alt="Documentation" src="https://img.shields.io/badge/Documentation-Click%20to%20View-blue?labelColor=2d2d2d" />
</a>
</p>
简体中文 | [English](README.en.md)
## Project Introduction
py-xiaozhi is a lightweight voice client for the Xiaozhi AI assistant, written in Python. It is meant for learning from the code and for trying Xiaozhi's voice features without dedicated hardware. This repository is a port of [xiaozhi-esp32](https://github.com/78/xiaozhi-esp32).
## Demo
- [Bilibili Demo Video](https://www.bilibili.com/video/BV1HmPjeSED2/#reply255921347937)

## Features
### 🎯 Core AI Features
- **AI Voice Interaction**: Supports voice input and speech recognition for natural, fluid human-computer conversation.
- **Visual Multimodal**: Supports image recognition and processing, so the assistant can understand image content.
- **Smart Wake-up**: Supports multiple configurable wake words, so interaction can start without manual operation.
- **Automatic Conversation Mode**: Provides continuous conversation for smoother interaction.
### 🔧 MCP Tool Ecosystem
- **System Control Tools**: System status monitoring, application management, volume control, device management, etc.
- **Schedule Management Tools**: Full-featured schedule management, supporting event creation, querying, updating, and deletion, with intelligent categorization and reminders.
- **Scheduled Task Tools**: Countdown timer functionality, supporting delayed execution of MCP tools, and parallel management of multiple tasks.
- **Music Playback Tools**: Online music search and playback, supporting playback control, lyrics display, and local cache management.
- **12306 Query Tools**: Queries against the 12306 railway ticketing service, supporting ticket, transfer, and train route lookups.
- **Search Tools**: Web search and webpage content retrieval, supporting Bing search and intelligent content parsing.
- **Recipe Tools**: A rich recipe library, supporting recipe search, categorized queries, and intelligent recommendations.
- **Map Tools**: Amap services, supporting geocoding, route planning, nearby searches, and weather queries.
- **Bazi Astrology Tools**: Traditional Bazi astrology analysis, supporting Bazi calculation, marriage analysis, and Huangli (Chinese almanac) queries.
- **Camera Tools**: Image capture and AI analysis, supporting photo recognition and intelligent Q&A.
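
The scheduled task tools above describe delayed execution with several timers running in parallel. A minimal sketch of that idea with `asyncio` follows. The `run_after` helper and the `fake_tool` coroutine are hypothetical names for illustration, not the project's actual API.

```python
import asyncio


async def run_after(delay: float, tool, *args):
    """Wait `delay` seconds, then await the (hypothetical) tool coroutine."""
    await asyncio.sleep(delay)
    return await tool(*args)


async def fake_tool(name: str) -> str:
    # Stand-in for an MCP tool call; illustrative only.
    return f"{name} done"


async def demo() -> list:
    # Both countdowns run concurrently, so the total wait is roughly
    # the longest delay rather than the sum of the delays.
    tasks = [
        asyncio.create_task(run_after(0.02, fake_tool, "lights_off")),
        asyncio.create_task(run_after(0.01, fake_tool, "play_music")),
    ]
    # gather() returns results in the order the tasks were passed in.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```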
### 🏠 IoT Device Integration
- **Device Management Architecture**: Unified device management based on the Thing model, supporting asynchronous property and method calls.
- **Smart Home Control**: Supports control of devices such as lights, volume, and temperature sensors.
- **State Synchronization Mechanism**: Real-time state monitoring, supporting incremental updates and concurrent state retrieval.
- **Extensible Design**: Modular device drivers make it easy to add new device types.
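
To make the Thing model more concrete, here is a rough sketch of a device that exposes properties and asynchronous methods, with all properties read concurrently. `SketchThing`, `Lamp`, and every method name are assumptions made for this example; the real `Thing` base class in `src/iot/thing.py` may look quite different.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SketchThing:
    """Illustrative device: named properties plus async methods."""

    def __init__(self, name: str):
        self.name = name
        self._getters: Dict[str, Callable[[], Awaitable[Any]]] = {}
        self._methods: Dict[str, Callable[..., Awaitable[Any]]] = {}

    def add_property(self, key: str, getter) -> None:
        self._getters[key] = getter

    def add_method(self, key: str, handler) -> None:
        self._methods[key] = handler

    async def get_state(self) -> Dict[str, Any]:
        # Read every property concurrently instead of one by one.
        keys = list(self._getters)
        values = await asyncio.gather(*(self._getters[k]() for k in keys))
        return dict(zip(keys, values))

    async def invoke(self, key: str, **params) -> Any:
        return await self._methods[key](**params)


class Lamp(SketchThing):
    """Hypothetical device with one property and one method."""

    def __init__(self):
        super().__init__("Lamp")
        self.power = False
        self.add_property("power", self._get_power)
        self.add_method("turn_on", self._turn_on)

    async def _get_power(self) -> bool:
        return self.power

    async def _turn_on(self) -> dict:
        self.power = True
        return {"status": "ok"}
```

A caller would use it as `await Lamp().invoke("turn_on")` followed by `await lamp.get_state()`.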
### 🎵 Advanced Audio Processing
- **Multilevel Audio Processing**: Supports Opus encoding/decoding and real-time resampling.
- **Voice Activity Detection**: A VAD detector monitors voice activity in real time, so the user can interrupt playback by speaking.
- **Wake Word Detection**: Offline voice recognition based on Sherpa-ONNX, supporting multiple wake words and pinyin matching.
- **Audio Stream Management**: Independent input and output streams, supporting stream reconstruction and error recovery.
- **Audio Echo Cancellation**: Integrated WebRTC audio processing module, providing high-quality echo cancellation functionality.
- **System Audio Recording**: Supports recording system audio for loopback processing.
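
As a rough illustration of what voice activity detection does, the sketch below flags a frame as speech when its RMS energy exceeds a threshold. This is a deliberately simple energy-based stand-in, not the project's VAD detector; the function names and the threshold of 500 are arbitrary assumptions.

```python
import math
import struct


def frame_rms(pcm: bytes) -> float:
    """RMS amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(pcm: bytes, threshold: float = 500.0) -> bool:
    # The threshold is an illustrative value; real detectors adapt to noise.
    return frame_rms(pcm) > threshold


# A 20 ms frame at 16 kHz holds 320 samples.
silence = struct.pack("<320h", *([0] * 320))
tone = struct.pack(
    "<320h",
    *(int(8000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(320)),
)

if __name__ == "__main__":
    print(is_speech(silence), is_speech(tone))
```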
### 🖥️ User Interface
- **Graphical Interface**: Modern GUI based on PyQt5, supporting Xiaozhi expressions and text display, enhancing visual experience.
- **Command Line Mode**: Supports CLI operation, suitable for embedded devices or environments without GUI.
- **System Tray**: Background operation support, integrating system tray functionality.
- **Global Hotkeys**: Supports global hotkey operations, enhancing usability.
- **Settings Interface**: Complete settings management interface, supporting custom configuration.
### 🔒 Security and Stability
- **Encrypted Audio Transmission**: Supports the WSS protocol (WebSocket over TLS), so audio data is encrypted in transit.
- **Device Activation System**: Supports v1/v2 dual protocol activation, automatically handling verification codes and device fingerprints.
- **Error Recovery**: Complete error handling and recovery mechanism, supporting reconnection after disconnection.
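
Reconnection after a dropped connection is commonly done with exponential backoff. The following is a sketch under that assumption; `connect_with_retry` and `FlakyServer` are hypothetical, and the project's own retry policy is not documented here.

```python
import asyncio


async def connect_with_retry(connect, max_attempts=5, base_delay=0.01, max_delay=1.0):
    """Await `connect()` until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)


class FlakyServer:
    """Test double that rejects a fixed number of attempts before accepting."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    async def connect(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("server unavailable")
        return "connected"


if __name__ == "__main__":
    server = FlakyServer(failures=2)
    print(asyncio.run(connect_with_retry(server.connect)), server.calls)
```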
### 🌐 Cross-Platform Support
- **System Compatibility**: Compatible with Windows 10+, macOS 10.15+, and Linux systems.
- **Protocol Support**: Supports dual protocol communication via WebSocket and MQTT.
- **Multi-Environment Deployment**: Supports both GUI and CLI modes, adapting to different deployment environments.
- **Platform Optimization**: Audio and system control optimizations tailored for different platforms.
### 🔧 Developer Friendly
- **Modular Architecture**: Clear code structure and separation of responsibilities, making the project easy to extend and customize.
- **Asynchronous First**: Event-driven architecture based on asyncio, enabling high-performance concurrent processing.
- **Configuration Management**: Layered configuration system, supporting dot notation access and dynamic updates.
- **Logging System**: Complete logging and debugging support.
- **API Documentation**: Detailed code documentation and usage guides.
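
Dot notation access to a layered configuration can be pictured as walking a nested dictionary one key at a time. The helpers and key names below (`get_by_path`, `set_by_path`, `network.protocol`) are made up for illustration and are not the project's `ConfigManager` API.

```python
from typing import Any


def get_by_path(config: dict, path: str, default: Any = None) -> Any:
    """Look up 'a.b.c' in nested dicts, returning `default` if any key is missing."""
    node = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def set_by_path(config: dict, path: str, value: Any) -> None:
    """Set 'a.b.c' in nested dicts, creating intermediate levels as needed."""
    *parents, last = path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


# Hypothetical configuration keys, for demonstration only.
config = {"network": {"protocol": "websocket"}}
set_by_path(config, "audio.sample_rate", 16000)

if __name__ == "__main__":
    print(get_by_path(config, "network.protocol"))
    print(get_by_path(config, "audio.sample_rate"))
```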
## System Requirements
### Basic Requirements
- **Python Version**: 3.9 - 3.12
- **Operating System**: Windows 10+, macOS 10.15+, Linux
- **Audio Devices**: Microphone and speaker devices
- **Network Connection**: Stable internet connection (for AI services and online features)
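
Before installing, it can help to confirm that the interpreter falls in the 3.9 - 3.12 range listed above. A small check using only the standard library (the `python_supported` helper is a name chosen for this example):

```python
import sys

SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 12)


def python_supported(version=None) -> bool:
    """Return True if major.minor lies within the documented 3.9 - 3.12 range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if __name__ == "__main__":
    status = "supported" if python_supported() else "outside the documented range"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```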
### Recommended Configuration
- **Memory**: At least 4GB RAM (recommended 8GB+)
- **Processor**: Modern CPU supporting AVX instruction set
- **Storage**: At least 2GB of available disk space (for model files and cache)
- **Audio**: Audio devices supporting 16kHz sampling rate
### Optional Feature Requirements
- **Voice Wake-up**: Requires downloading the Sherpa-ONNX voice recognition model
- **Camera Functionality**: Requires camera devices and OpenCV support
## Please Read This First
- Read the [Project Documentation](https://huangjunsen0406.github.io/py-xiaozhi/) carefully for startup tutorials and file descriptions.
- The main branch always carries the latest code. After every update, reinstall the pip dependencies manually so that newly added dependencies are not missed.
- Video tutorial: [Getting Started with Xiaozhi Client](https://www.bilibili.com/video/BV1dWQhYEEmq/?vd_source=2065ec11f7577e7107a55bbdc3d12fce)
## Technical Architecture
### Core Architecture Design
- **Event-Driven Architecture**: Asynchronous event loop based on asyncio, supporting high concurrency processing.
- **Layered Design**: Clear separation of application layer, protocol layer, device layer, and UI layer.
- **Singleton Pattern**: Core components use the singleton pattern to ensure unified resource management.
- **Plugin System**: MCP tool system and IoT devices support plugin-based extensions.
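
One common way to implement the singleton pattern in Python is a metaclass that caches the first instance of each class. The sketch below shows that approach; whether the project uses a metaclass, a `get_instance()` class method, or something else is not stated here, and `AudioManager` is a hypothetical component.

```python
import threading


class SingletonMeta(type):
    """Metaclass that returns the same instance on every construction."""

    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        # The lock keeps two threads from creating the instance at once.
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class AudioManager(metaclass=SingletonMeta):
    """Hypothetical core component sharing one set of audio streams."""

    def __init__(self):
        self.streams = []


if __name__ == "__main__":
    print(AudioManager() is AudioManager())
```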
### Key Technical Components
- **Audio Processing**: Opus encoding/decoding, WebRTC echo cancellation, real-time resampling, system audio recording.
- **Voice Recognition**: Sherpa-ONNX offline model, voice activity detection, wake word recognition.
- **Protocol Communication**: Dual protocol support for WebSocket/MQTT, encrypted transmission, automatic reconnection.
- **Configuration System**: Layered configuration, dot notation access, dynamic updates, JSON/YAML support.
### Performance Optimization
- **Asynchronous First**: Entire system is asynchronous, avoiding blocking operations.
- **Memory Management**: Intelligent caching, garbage collection.
- **Audio Optimization**: 5ms low-latency processing, queue management, streaming.
- **Concurrency Control**: Task pool management, semaphore control, thread safety.
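
Semaphore control can be sketched with `asyncio.Semaphore`, which caps how many coroutines run at the same time. The `run_limited` and `make_job` helpers are assumptions for this example; it also records the peak concurrency to show that the limit holds.

```python
import asyncio


async def run_limited(jobs, limit: int):
    """Run coroutine functions concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def guarded(job):
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            try:
                return await job()
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


def make_job(value: int):
    async def job() -> int:
        await asyncio.sleep(0.01)  # simulate I/O
        return value * 2

    return job


if __name__ == "__main__":
    print(asyncio.run(run_limited([make_job(i) for i in range(6)], limit=2)))
```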
### Security Mechanisms
- **Encrypted Communication**: WSS/TLS encryption, certificate verification.
- **Device Authentication**: Dual protocol activation, device fingerprint recognition.
- **Permission Control**: Tool permission management, API access control.
- **Error Isolation**: Exception isolation, fault recovery, graceful degradation.
## Development Guide
### Project Structure
```
py-xiaozhi/
├── main.py                           # Main entry point of the application (CLI parameter handling)
├── src/
│   ├── application.py                # Core logic of the application
│   ├── audio_codecs/                 # Audio codecs
│   │   ├── aec_processor.py          # Audio echo cancellation processor
│   │   ├── audio_codec.py            # Base class for audio codecs
│   │   └── system_audio_recorder.py  # System audio recorder
│   ├── audio_processing/             # Audio processing modules
│   │   ├── vad_detector.py           # Voice activity detection
│   │   └── wake_word_detect.py       # Wake word detection
│   ├── core/                         # Core components
│   │   ├── ota.py                    # Online update module
│   │   └── system_initializer.py     # System initializer
│   ├── display/                      # Display interface abstraction layer
│   ├── iot/                          # IoT device management
│   │   ├── thing.py                  # Base class for devices
│   │   ├── thing_manager.py          # Device manager
│   │   └── things/                   # Specific device implementations
│   ├── mcp/                          # MCP tool system
│   │   ├── mcp_server.py             # MCP server
│   │   └── tools/                    # Various tool modules
│   ├── protocols/                    # Communication protocols
│   ├── utils/                        # Utility functions
│   └── views/                        # UI view components
├── libs/                             # Third-party native libraries
│   ├── libopus/                      # Opus audio codec library
│   ├── webrtc_apm/                   # WebRTC audio processing module
│   └── SystemAudioRecorder/          # System audio recording tool
├── config/                           # Configuration file directory
├── models/                           # Voice model files
├── assets/                           # Static resource files
├── scripts/                          # Auxiliary scripts
├── requirements.txt                  # Python dependency package list
└── build.json                        # Build configuration file
```
### Development Environment Setup
```bash
# Clone the project
git clone https://github.com/huangjunsen0406/py-xiaozhi.git
cd py-xiaozhi
# Install dependencies
pip install -r requirements.txt
# Code formatting
./format_code.sh
# Run the program - GUI mode (default)
python main.py
# Run the program - CLI mode
python main.py --mode cli
# Specify communication protocol
python main.py --protocol websocket # WebSocket (default)
python main.py --protocol mqtt # MQTT protocol
```
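For readers curious how such flags are typically handled, here is a minimal `argparse` sketch that accepts the `--mode` and `--protocol` options shown above. It is not a copy of `main.py`; in particular, the `gui` value for `--mode` is an assumption inferred from "GUI mode (default)".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="py-xiaozhi launcher (sketch)")
    # "gui" is an assumed value; the commands above only show "--mode cli".
    parser.add_argument("--mode", choices=["gui", "cli"], default="gui")
    parser.add_argument(
        "--protocol", choices=["websocket", "mqtt"], default="websocket"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"mode={args.mode} protocol={args.protocol}")
```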
### Core Development Model
- **Asynchronous First**: Use `async/await` syntax to avoid blocking operations.
- **Error Handling**: Complete exception handling and logging.
- **Configuration Management**: Use `ConfigManager` for unified configuration access.
- **Test Driven**: Write unit tests to ensure code quality.
### Extension Development
- **Add MCP Tools**: Create new tool modules in the `src/mcp/tools/` directory.
- **Add IoT Devices**: Inherit from the `Thing` base class to implement new devices.
- **Add Protocols**: Implement the `Protocol` abstract base class.
- **Add Interfaces**: Extend `BaseDisplay` to implement new UI components.
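
Implementing an abstract base class generally means subclassing it and filling in every abstract method. The sketch below shows the shape of that work with a made-up `SketchProtocol` interface and an in-memory `LoopbackProtocol`; the method names are assumptions, so check the real `Protocol` class in `src/protocols/` for the actual contract.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchProtocol(ABC):
    """Hypothetical transport interface; not the project's Protocol class."""

    @abstractmethod
    async def connect(self) -> bool: ...

    @abstractmethod
    async def send_text(self, message: str) -> None: ...

    @abstractmethod
    async def close(self) -> None: ...


class LoopbackProtocol(SketchProtocol):
    """In-memory transport used only to exercise the interface."""

    def __init__(self):
        self.connected = False
        self.sent = []

    async def connect(self) -> bool:
        self.connected = True
        return True

    async def send_text(self, message: str) -> None:
        if not self.connected:
            raise ConnectionError("not connected")
        self.sent.append(message)

    async def close(self) -> None:
        self.connected = False


async def demo():
    protocol = LoopbackProtocol()
    await protocol.connect()
    await protocol.send_text("hello")
    await protocol.close()
    return protocol.sent, protocol.connected


if __name__ == "__main__":
    print(asyncio.run(demo()))
```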
### State Transition Diagram
```
                              +-----------------+
                              |                 |
                              v                 |
+------+  Wake Word/Button  +------------+      |      +------------+
| IDLE | -----------------> | CONNECTING | -----+----> | LISTENING  |
+------+                    +------------+             +------------+
   ^                                                         |
   |                                                         | Voice recognition complete
   |                        +------------+                   v
   +----------------------- |  SPEAKING  | <-----------------+
      Playback complete     +------------+
```
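The labeled transitions in the diagram can be written down as a small lookup table. This is a sketch only: the `DeviceState` enum and the event names are hypothetical, and the unlabeled loop at the top of the diagram is left out.

```python
from enum import Enum


class DeviceState(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    LISTENING = "listening"
    SPEAKING = "speaking"


# (current state, event) -> next state; event names are illustrative.
TRANSITIONS = {
    (DeviceState.IDLE, "wake"): DeviceState.CONNECTING,
    (DeviceState.CONNECTING, "connected"): DeviceState.LISTENING,
    (DeviceState.LISTENING, "recognized"): DeviceState.SPEAKING,
    (DeviceState.SPEAKING, "playback_done"): DeviceState.IDLE,
}


def next_state(state: DeviceState, event: str) -> DeviceState:
    """Return the state that follows `event`, or raise if the edge does not exist."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event!r}") from None


if __name__ == "__main__":
    state = DeviceState.IDLE
    for event in ["wake", "connected", "recognized", "playback_done"]:
        state = next_state(state, event)
        print(event, "->", state.name)
```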
## Contribution Guidelines
Issue reports and code contributions are welcome. Please follow these guidelines:
1. Code style follows PEP 8.
2. Pull requests include appropriate tests.
3. Relevant documentation is updated.
## Community and Support
### Thanks to the following open-source contributors
> In no particular order
- [Xiaoxia](https://github.com/78)
- [zhh827](https://github.com/zhh827)
- [Si Bo Zhi Lian - Li Honggang](https://github.com/SmartArduino)
- [HonestQiao](https://github.com/HonestQiao)
- [vonweller](https://github.com/vonweller)
- [Sun Weigong](https://space.bilibili.com/416954647)
- [isamu2025](https://github.com/isamu2025)
- [Rain120](https://github.com/Rain120)
- [kejily](https://github.com/kejily)
- [Electric Wave Bilibili Jun](https://space.bilibili.com/119751)
- [Saibo Intelligent](https://shop115087494.m.taobao.com/?refer=https%3A%2F%2Fm.tb.cn%2F&ut_sk=1.WMelxbgDQWkDAJ1Rq9Pn7DCD_21380790_1757337352472.Copy.shop&suid=0E25E948-651D-46E0-8E89-5C8CB03B4F56&shop_navi=shopindex&sourceType=shop&shareUniqueId=33038752403&un=d22c5ceda82844ab8bd7bab98ffeb263&share_crt_v=1&un_site=0&spm=a2159r.13376460.0.0&sp_tk=dkRKUjRKUWo2ZHY%3D&bc_fl_src=share-1041250486811064-2-1&cpp=1&shareurl=true&short_name=h.SaBKVHytsCKIPNS&bxsign=scdGtSe264e_qkFQBh0rXCkF-Mrb_s6t35EnpVBBU5dsrd-J24c-_rn_PhJiXRk0hg2hjGoAm0L7j2UQg27OIH_6gZkbhKDyLziD2cy4pDf8sC3KmqrF55TXP3USZaPTw_-&app=weixin)
### Sponsorship Support
<div align="center">
<h3>Thanks to all sponsors for their support ❤️</h3>
<p>Whether it's interface resources, device compatibility testing, or financial support, every bit of help makes the project better.</p>
<a href="https://huangjunsen0406.github.io/py-xiaozhi/sponsors/" target="_blank">
<img src="https://img.shields.io/badge/View-Sponsor%20List-brightgreen?style=for-the-badge&logo=github" alt="Sponsor List">
</a>
<a href="https://huangjunsen0406.github.io/py-xiaozhi/sponsors/" target="_blank">
<img src="https://img.shields.io/badge/Become-a-Sponsor-orange?style=for-the-badge&logo=heart" alt="Become a Sponsor">
</a>
</div>
## Project Statistics
[Star History](https://www.star-history.com/#huangjunsen0406/py-xiaozhi&Date)
## License
[MIT License](LICENSE)
```