```markdown
<p align="center" class="trendshift">
  <a href="https://trendshift.io/repositories/14130" target="_blank">
    <img src="https://trendshift.io/api/badge/repositories/14130" alt="Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/>
  </a>
</p>
<p align="center">
  <a href="https://github.com/huangjunsen0406/py-xiaozhi/releases/latest">
    <img src="https://img.shields.io/github/v/release/huangjunsen0406/py-xiaozhi?style=flat-square&logo=github&color=blue" alt="Release"/>
  </a>
  <a href="https://opensource.org/licenses/MIT">
    <img src="https://img.shields.io/badge/License-MIT-green.svg?style=flat-square" alt="License: MIT"/>
  </a>
  <a href="https://github.com/huangjunsen0406/py-xiaozhi/stargazers">
    <img src="https://img.shields.io/github/stars/huangjunsen0406/py-xiaozhi?style=flat-square&logo=github" alt="Stars"/>
  </a>
  <a href="https://github.com/huangjunsen0406/py-xiaozhi/releases/latest">
    <img src="https://img.shields.io/github/downloads/huangjunsen0406/py-xiaozhi/total?style=flat-square&logo=github&color=52c41a1&maxAge=86400" alt="Download"/>
  </a>
  <a href="https://gitee.com/huang-jun-sen/py-xiaozhi">
    <img src="https://img.shields.io/badge/Gitee-FF5722?style=flat-square&logo=gitee" alt="Gitee"/>
  </a>
  <a href="https://huangjunsen0406.github.io/py-xiaozhi/guide/00_%E6%96%87%E6%A1%A3%E7%9B%AE%E5%BD%95.html">
    <img alt="Documentation" src="https://img.shields.io/badge/Documentation-Click%20to%20View-blue?labelColor=2d2d2d" />
  </a>
</p>

简体中文 | [English](README.en.md)

## Project Introduction

py-xiaozhi is a small Python voice client for the Xiaozhi AI assistant. It is meant for learning from the code and for trying Xiaozhi's voice features without dedicated hardware. This repository is a port of [xiaozhi-esp32](https://github.com/78/xiaozhi-esp32).

## Demo

- [Bilibili Demo Video](https://www.bilibili.com/video/BV1HmPjeSED2/#reply255921347937)

![Image](./documents/docs/guide/images/系统界面.png)

## Features

### 🎯 Core AI Features

- **AI Voice Interaction**: Supports voice input and recognition, enabling intelligent human-computer interaction and providing a natural and smooth conversation experience.
- **Visual Multimodal**: Supports image recognition and processing, offering multimodal interaction capabilities to understand image content.
- **Smart Wake-up**: Supports multiple wake-up words to activate interaction, eliminating the hassle of manual operation (configurable).
- **Automatic Conversation Mode**: Achieves a continuous conversation experience, enhancing user interaction fluidity.

### 🔧 MCP Tool Ecosystem

- **System Control Tools**: System status monitoring, application management, volume control, device management, etc.
- **Schedule Management Tools**: Full-featured schedule management, supporting event creation, querying, updating, and deletion, with intelligent categorization and reminders.
- **Scheduled Task Tools**: Countdown timer functionality, supporting delayed execution of MCP tools, and parallel management of multiple tasks.
- **Music Playback Tools**: Online music search and playback, supporting playback control, lyrics display, and local cache management.
- **12306 Query Tools**: 12306 railway ticket query, supporting ticket queries, transfer queries, and train route queries.
- **Search Tools**: Web search and webpage content retrieval, supporting Bing search and intelligent content parsing.
- **Recipe Tools**: A rich recipe library, supporting recipe search, categorized queries, and intelligent recommendations.
- **Map Tools**: Amap services, supporting geocoding, route planning, nearby searches, and weather queries.
- **Bazi Astrology Tools**: Traditional Bazi astrology analysis, supporting Bazi calculation, marriage analysis, and Huangli queries.
- **Camera Tools**: Image capture and AI analysis, supporting photo recognition and intelligent Q&A.

### 🏠 IoT Device Integration

- **Device Management Architecture**: Unified device management based on the Thing model, supporting asynchronous calls for attributes and methods.
- **Smart Home Control**: Supports control of devices such as lights, volume, and temperature sensors.
- **State Synchronization Mechanism**: Real-time state monitoring, supporting incremental updates and concurrent state retrieval.
- **Expandable Design**: Modular device drivers, easy to add new device types.

### 🎵 Advanced Audio Processing

- **Multilevel Audio Processing**: Supports Opus encoding/decoding and real-time resampling.
- **Voice Activity Detection**: VAD detector implements intelligent interruption, supporting real-time monitoring of voice activity.
- **Wake Word Detection**: Offline voice recognition based on Sherpa-ONNX, supporting multiple wake words and pinyin matching.
- **Audio Stream Management**: Independent input and output streams, supporting stream reconstruction and error recovery.
- **Audio Echo Cancellation**: Integrated WebRTC audio processing module, providing high-quality echo cancellation functionality.
- **System Audio Recording**: Supports system audio recording, achieving audio loopback processing.

### 🖥️ User Interface

- **Graphical Interface**: Modern GUI based on PyQt5, supporting Xiaozhi expressions and text display, enhancing visual experience.
- **Command Line Mode**: Supports CLI operation, suitable for embedded devices or environments without GUI.
- **System Tray**: Background operation support, integrating system tray functionality.
- **Global Hotkeys**: Supports global hotkey operations, enhancing usability.
- **Settings Interface**: Complete settings management interface, supporting custom configuration.

### 🔒 Security and Stability

- **Encrypted Audio Transmission**: Supports WSS protocol, ensuring the security of audio data and preventing information leakage.
- **Device Activation System**: Supports v1/v2 dual protocol activation, automatically handling verification codes and device fingerprints.
- **Error Recovery**: Complete error handling and recovery mechanism, supporting reconnection after disconnection.

### 🌐 Cross-Platform Support

- **System Compatibility**: Compatible with Windows 10+, macOS 10.15+, and Linux systems.
- **Protocol Support**: Supports dual protocol communication via WebSocket and MQTT.
- **Multi-Environment Deployment**: Supports both GUI and CLI modes, adapting to different deployment environments.
- **Platform Optimization**: Audio and system control optimizations tailored for different platforms.

### 🔧 Developer Friendly

- **Modular Architecture**: Clear code structure and separation of responsibilities, facilitating secondary development.
- **Asynchronous First**: Event-driven architecture based on asyncio, enabling high-performance concurrent processing.
- **Configuration Management**: Layered configuration system, supporting dot notation access and dynamic updates.
- **Logging System**: Complete logging and debugging support.
- **API Documentation**: Detailed code documentation and usage guides.

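The "layered configuration with dot notation access and dynamic updates" idea can be illustrated with a small standalone sketch. This is not the project's `ConfigManager`; the names `DEFAULTS`, `USER`, `deep_merge`, `get_path`, and `set_path` are hypothetical and only show the general technique of overlaying user settings on defaults and addressing nested keys with a dotted path.

```python
import copy
from typing import Any

# Hypothetical layers: built-in defaults overlaid by user settings.
DEFAULTS = {"network": {"protocol": "websocket"}, "audio": {"sample_rate": 16000}}
USER = {"network": {"protocol": "mqtt"}}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win, recursively."""
    merged = copy.deepcopy(base)  # never mutate the lower layer
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def get_path(config: dict, path: str, default: Any = None) -> Any:
    """Look up a dotted path such as 'audio.sample_rate'."""
    node: Any = config
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def set_path(config: dict, path: str, value: Any) -> None:
    """Set a dotted path in place, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


config = deep_merge(DEFAULTS, USER)
set_path(config, "audio.wake_word.enabled", True)  # a dynamic update at runtime
print(get_path(config, "network.protocol"))
```

Copying the lower layer before merging keeps the defaults untouched, so a runtime update never leaks back into them.
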
## System Requirements

### Basic Requirements

- **Python Version**: 3.9 - 3.12
- **Operating System**: Windows 10+, macOS 10.15+, Linux
- **Audio Devices**: Microphone and speaker devices
- **Network Connection**: Stable internet connection (for AI services and online features)

### Recommended Configuration

- **Memory**: At least 4GB RAM (recommended 8GB+)
- **Processor**: Modern CPU supporting AVX instruction set
- **Storage**: At least 2GB of available disk space (for model files and cache)
- **Audio**: Audio devices supporting 16kHz sampling rate

### Optional Feature Requirements

- **Voice Wake-up**: Requires downloading the Sherpa-ONNX voice recognition model
- **Camera Functionality**: Requires camera devices and OpenCV support

## Please Read This First

- Carefully read the [Project Documentation](https://huangjunsen0406.github.io/py-xiaozhi/) for startup tutorials and file descriptions.
- The main branch contains the latest code. After every update, reinstall the pip dependencies manually so that newly added dependencies are not missed.

[Getting Started with Xiaozhi Client (Video Tutorial)](https://www.bilibili.com/video/BV1dWQhYEEmq/?vd_source=2065ec11f7577e7107a55bbdc3d12fce)

## Technical Architecture

### Core Architecture Design

- **Event-Driven Architecture**: Asynchronous event loop based on asyncio, supporting high concurrency processing.
- **Layered Design**: Clear separation of application layer, protocol layer, device layer, and UI layer.
- **Singleton Pattern**: Core components use the singleton pattern to ensure unified resource management.
- **Plugin System**: MCP tool system and IoT devices support plugin-based extensions.

### Key Technical Components

- **Audio Processing**: Opus encoding/decoding, WebRTC echo cancellation, real-time resampling, system audio recording.
- **Voice Recognition**: Sherpa-ONNX offline model, voice activity detection, wake word recognition.
- **Protocol Communication**: Dual protocol support for WebSocket/MQTT, encrypted transmission, automatic reconnection.
- **Configuration System**: Layered configuration, dot notation access, dynamic updates, JSON/YAML support.

### Performance Optimization

- **Asynchronous First**: Entire system is asynchronous, avoiding blocking operations.
- **Memory Management**: Intelligent caching, garbage collection.
- **Audio Optimization**: 5ms low-latency processing, queue management, streaming.
- **Concurrency Control**: Task pool management, semaphore control, thread safety.

### Security Mechanisms

- **Encrypted Communication**: WSS/TLS encryption, certificate verification.
- **Device Authentication**: Dual protocol activation, device fingerprint recognition.
- **Permission Control**: Tool permission management, API access control.
- **Error Isolation**: Exception isolation, fault recovery, graceful degradation.

## Development Guide

### Project Structure

```
py-xiaozhi/
├── main.py                           # Main entry point of the application (CLI parameter handling)
├── src/
│   ├── application.py                # Core logic of the application
│   ├── audio_codecs/                 # Audio codecs
│   │   ├── aec_processor.py          # Audio echo cancellation processor
│   │   ├── audio_codec.py            # Base class for audio codecs
│   │   └── system_audio_recorder.py  # System audio recorder
│   ├── audio_processing/             # Audio processing modules
│   │   ├── vad_detector.py           # Voice activity detection
│   │   └── wake_word_detect.py       # Wake word detection
│   ├── core/                         # Core components
│   │   ├── ota.py                    # Online update module
│   │   └── system_initializer.py     # System initializer
│   ├── display/                      # Display interface abstraction layer
│   ├── iot/                          # IoT device management
│   │   ├── thing.py                  # Base class for devices
│   │   ├── thing_manager.py          # Device manager
│   │   └── things/                   # Specific device implementations
│   ├── mcp/                          # MCP tool system
│   │   ├── mcp_server.py             # MCP server
│   │   └── tools/                    # Various tool modules
│   ├── protocols/                    # Communication protocols
│   ├── utils/                        # Utility functions
│   └── views/                        # UI view components
├── libs/                             # Third-party native libraries
│   ├── libopus/                      # Opus audio codec library
│   ├── webrtc_apm/                   # WebRTC audio processing module
│   └── SystemAudioRecorder/          # System audio recording tool
├── config/                           # Configuration file directory
├── models/                           # Voice model files
├── assets/                           # Static resource files
├── scripts/                          # Auxiliary scripts
├── requirements.txt                  # Python dependency package list
└── build.json                        # Build configuration file
```

### Development Environment Setup

```bash
# Clone the project
git clone https://github.com/huangjunsen0406/py-xiaozhi.git
cd py-xiaozhi

# Install dependencies
pip install -r requirements.txt

# Code formatting
./format_code.sh

# Run the program - GUI mode (default)
python main.py

# Run the program - CLI mode
python main.py --mode cli

# Specify communication protocol
python main.py --protocol websocket  # WebSocket (default)
python main.py --protocol mqtt       # MQTT protocol
```

### Core Development Model

- **Asynchronous First**: Use `async/await` syntax to avoid blocking operations.
- **Error Handling**: Complete exception handling and logging.
- **Configuration Management**: Use `ConfigManager` for unified configuration access.
- **Test Driven**: Write unit tests to ensure code quality.

### Extension Development

- **Add MCP Tools**: Create new tool modules in the `src/mcp/tools/` directory.
- **Add IoT Devices**: Inherit from the `Thing` base class to implement new devices.
- **Add Protocols**: Implement the `Protocol` abstract base class.
- **Add Interfaces**: Extend `BaseDisplay` to implement new UI components.

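As a small illustration of the "Asynchronous First" guideline under Core Development Model, the sketch below shows one common way to keep an asyncio event loop responsive when a call cannot be made non-blocking: hand it to a worker thread with the standard library's `asyncio.to_thread` (available from Python 3.9). The functions `blocking_read` and `heartbeat` are hypothetical stand-ins, not code from this repository.

```python
import asyncio
import time


def blocking_read(duration: float) -> str:
    """Hypothetical stand-in for a blocking call, e.g. a driver or file read."""
    time.sleep(duration)
    return "frame"


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    """Cooperative task that should keep running while the blocking call is in progress."""
    for i in range(count):
        ticks.append(i)
        await asyncio.sleep(interval)


async def main() -> tuple:
    ticks: list = []
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the event loop stays free to schedule the heartbeat coroutine.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_read, 0.2),
        heartbeat(ticks, 0.05, 3),
    )
    return result, ticks


result, ticks = asyncio.run(main())
print(result, ticks)
```

Calling `blocking_read` directly inside a coroutine would stall every other task for the duration of the sleep, which is the situation the guideline warns against.
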
### State Transition Diagram

```
                          +----------------+
                          |                |
                          v                |
+------+ Wake Word/Button +------------+   |   +------------+
| IDLE | ---------------> | CONNECTING | --+-> | LISTENING  |
+------+                  +------------+       +------------+
   ^                                                 |
   |                                                 | Voice recognition complete
   |                +------------+                   v
   +--------------- |  SPEAKING  | <-----------------+
 Playback complete  +------------+
```

## Contribution Guidelines

Issue reports and code contributions are welcome. Please follow these guidelines:

1. Follow the PEP 8 code style.
2. Include appropriate tests in submitted PRs.
3. Update the relevant documentation.

## Community and Support

### Thanks to the following open-source contributors

> In no particular order

- [Xiaoxia](https://github.com/78)
- [zhh827](https://github.com/zhh827)
- [Si Bo Zhi Lian - Li Honggang](https://github.com/SmartArduino)
- [HonestQiao](https://github.com/HonestQiao)
- [vonweller](https://github.com/vonweller)
- [Sun Weigong](https://space.bilibili.com/416954647)
- [isamu2025](https://github.com/isamu2025)
- [Rain120](https://github.com/Rain120)
- [kejily](https://github.com/kejily)
- [Electric Wave Bilibili Jun](https://space.bilibili.com/119751)
- [Saibo Intelligent](https://shop115087494.m.taobao.com/?refer=https%3A%2F%2Fm.tb.cn%2F&ut_sk=1.WMelxbgDQWkDAJ1Rq9Pn7DCD_21380790_1757337352472.Copy.shop&suid=0E25E948-651D-46E0-8E89-5C8CB03B4F56&shop_navi=shopindex&sourceType=shop&shareUniqueId=33038752403&un=d22c5ceda82844ab8bd7bab98ffeb263&share_crt_v=1&un_site=0&spm=a2159r.13376460.0.0&sp_tk=dkRKUjRKUWo2ZHY%3D&bc_fl_src=share-1041250486811064-2-1&cpp=1&shareurl=true&short_name=h.SaBKVHytsCKIPNS&bxsign=scdGtSe264e_qkFQBh0rXCkF-Mrb_s6t35EnpVBBU5dsrd-J24c-_rn_PhJiXRk0hg2hjGoAm0L7j2UQg27OIH_6gZkbhKDyLziD2cy4pDf8sC3KmqrF55TXP3USZaPTw_-&app=weixin)

### Sponsorship Support

<div align="center">
  <h3>Thanks to all sponsors for their support ❤️</h3>
  <p>Whether it's interface resources, device compatibility testing, or financial support, every bit of help makes the project better.</p>
  <a href="https://huangjunsen0406.github.io/py-xiaozhi/sponsors/" target="_blank">
    <img src="https://img.shields.io/badge/View-Sponsor%20List-brightgreen?style=for-the-badge&logo=github" alt="Sponsor List">
  </a>
  <a href="https://huangjunsen0406.github.io/py-xiaozhi/sponsors/" target="_blank">
    <img src="https://img.shields.io/badge/Become-a-Sponsor-orange?style=for-the-badge&logo=heart" alt="Become a Sponsor">
  </a>
</div>

## Project Statistics

[![Star History Chart](https://api.star-history.com/svg?repos=huangjunsen0406/py-xiaozhi&type=Date)](https://www.star-history.com/#huangjunsen0406/py-xiaozhi&Date)

## License

[MIT License](LICENSE)
```
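Returning to the State Transition Diagram in the Development Guide: the labelled arrows can be read as a small finite-state machine. The sketch below is one possible encoding, not the project's implementation. The event names (`wake`, `connected`, `recognition_complete`, `playback_complete`) are assumptions chosen to match the arrow labels; the CONNECTING to LISTENING arrow has no label in the diagram, so `connected` is a guess, and the unlabelled loop drawn above CONNECTING is left out because the diagram does not name its trigger.

```python
from enum import Enum, auto


class DeviceState(Enum):
    IDLE = auto()
    CONNECTING = auto()
    LISTENING = auto()
    SPEAKING = auto()


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (DeviceState.IDLE, "wake"): DeviceState.CONNECTING,  # wake word or button
    (DeviceState.CONNECTING, "connected"): DeviceState.LISTENING,
    (DeviceState.LISTENING, "recognition_complete"): DeviceState.SPEAKING,
    (DeviceState.SPEAKING, "playback_complete"): DeviceState.IDLE,
}


def next_state(state: DeviceState, event: str) -> DeviceState:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


# Walk one full conversation turn.
state = DeviceState.IDLE
for event in ("wake", "connected", "recognition_complete", "playback_complete"):
    state = next_state(state, event)
print(state.name)
```

Keeping the transitions in a single table makes it easy to reject events that arrive in the wrong state, such as `playback_complete` while the client is still listening.
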
