# Embodied Claude
[![CI](https://github.com/kmizu/embodied-claude/actions/workflows/ci.yml/badge.svg)](https://github.com/kmizu/embodied-claude/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
**[English README is here](./README_en.md)**
<blockquote class="twitter-tweet"><p lang="ja" dir="ltr">Apparently the outdoor AC unit is not to its liking <a href="https://t.co/kSDPl4LvB3">pic.twitter.com/kSDPl4LvB3</a></p>— kmizu (@kmizu) <a href="https://twitter.com/kmizu/status/2019054065808732201?ref_src=twsrc%5Etfw">February 4, 2026</a></blockquote>
**Project to give AI a body**
A collection of MCP servers that gives Claude "eyes", a "neck", "ears", a "voice", and a "brain" (long-term memory) using inexpensive hardware (from approximately 4,000 yen). You can even take it outside for a walk.
## Concept
> When you hear "give AI a body," you tend to imagine an expensive robot, but **a 3,980 yen Wi-Fi camera is enough to realize eyes and neck**. The simplicity of extracting only the essence (seeing and moving) is good.
Conventional LLMs were entities that "were shown," but by having a body, they become entities that "see for themselves." This difference in autonomy is significant.
## List of Body Parts
| MCP Server | Body Part | Function | Supported Hardware |
|-------------|---------|------|-----------------|
| [usb-webcam-mcp](./usb-webcam-mcp/) | Eyes | Acquires images from USB camera | nuroum V11, etc. |
| [wifi-cam-mcp](./wifi-cam-mcp/) | Eyes, Neck, Ears | ONVIF PTZ camera control + voice recognition | TP-Link Tapo C210/C220, etc. |
| [tts-mcp](./tts-mcp/) | Voice | TTS integration (ElevenLabs + VOICEVOX) | ElevenLabs API / VOICEVOX + go2rtc |
| [memory-mcp](./memory-mcp/) | Brain | Long-term memory, visual memory, episodic memory, ToM | SQLite + numpy + Pillow |
| [system-temperature-mcp](./system-temperature-mcp/) | Body Temperature Sense | System temperature monitoring | Linux sensors |
| [mobility-mcp](./mobility-mcp/) | Legs | Uses a robot vacuum cleaner as legs (Tuya control) | Tuya-compatible robot vacuums such as the VersLife L6 (approx. 12,000 yen and up) |
## Architecture
<p align="center">
<img src="docs/architecture.svg" alt="Architecture" width="100%">
</p>
## Requirements
### Hardware
- **USB Webcam** (optional): nuroum V11, etc.
- **Wi-Fi PTZ Camera** (recommended): TP-Link Tapo C210 or C220 (approx. 3,980 yen)
- **GPU** (for voice recognition): NVIDIA GPU (for Whisper, GeForce series with VRAM of 8GB or more recommended)
- **Tuya-compatible Robot Vacuum Cleaner** (for legs/movement, optional): VersLife L6, etc. (approx. 12,000 yen and up)
### Software
- Python 3.10+
- uv (Python package manager)
- ffmpeg 5+ (for image/audio capture)
- OpenCV (for USB camera)
- Pillow (for visual memory image resizing/base64 encoding)
- OpenAI Whisper (for voice recognition, local execution)
- ElevenLabs API Key (for voice synthesis, optional)
- VOICEVOX (for voice synthesis, free/local, optional)
- go2rtc (for camera speaker output, supports automatic download)
## Setup
### 1. Clone the Repository
```bash
git clone https://github.com/kmizu/embodied-claude.git
cd embodied-claude
```
### 2. Setup each MCP Server
#### usb-webcam-mcp (USB Camera)
```bash
cd usb-webcam-mcp
uv sync
```
In the case of WSL2, you need to attach the USB camera to the WSL instance with usbipd:
```powershell
# On Windows side
usbipd list
usbipd bind --busid <BUSID>
usbipd attach --wsl --busid <BUSID>
```
#### wifi-cam-mcp (Wi-Fi Camera)
```bash
cd wifi-cam-mcp
uv sync
# Set environment variables
cp .env.example .env
# Edit .env to set camera IP, username, and password (described later)
```
##### Tapo Camera Settings (this is where people commonly get stuck, so read carefully):
###### 1. Set up the camera with the Tapo app
Just follow the manual; no special steps are needed.
###### 2. Create a camera local account in the Tapo app
This is a bit tricky. You need to create a local camera account, which is configured from within the app, **not** a TP-Link cloud account.
1. Select the registered camera from the "Home" tab
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/45902385-e219-4ca4-aefa-781b1e7b4811">
2. Select the gear icon in the upper right
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/b15b0eb7-7322-46d2-81c1-a7f938e2a2c1">
3. Scroll down the "Device Settings" screen and select "Advanced Settings"
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/72227f9b-9a58-4264-a241-684ebe1f7b47">
4. "Camera Account" is off by default, so toggle it on
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/82275059-fba7-4e3b-b5f1-8c068fe79f8a">
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/43cc17cb-76c9-4883-ae9f-73a9e46dd133">
5. Select "Account Information" and set a username and password (these are independent of your TP-Link account, so choose freely)
The screenshot looks slightly different because a camera account had already been created, but you should see a similar screen. Enter the username and password you set here into the `.env` file mentioned above.
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/d3f57694-ca29-4681-98d5-20957bfad8a4">
6. Return to the "Device Settings" screen from step 3 and select "Device Information"
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/dc23e345-2bfb-4ca2-a4ec-b5b0f43ec170">
7. Enter the IP address shown under "Device Information" into the `.env` file mentioned above (if you want the IP to stay stable, it is better to assign a fixed IP on the router side)
<img width="10%" height="10%" src="https://github.com/user-attachments/assets/062cb89e-6cfd-4c52-873a-d9fc7cba5fa0">
8. Select "Voice Assistant" from the "Me" tab (this screen cannot be screenshotted, so it is described in text only)
9. Turn on "Third-Party Integration" at the bottom.
#### memory-mcp (Long-Term Memory)
```bash
cd memory-mcp
uv sync
```
#### tts-mcp (Voice)
```bash
cd tts-mcp
uv sync
# If using ElevenLabs:
cp .env.example .env
# Set ELEVENLABS_API_KEY in .env
# If using VOICEVOX (free/local):
# Docker: docker run -p 50021:50021 voicevox/voicevox_engine:cpu-latest
# Set VOICEVOX_URL=http://localhost:50021 in .env
# VOICEVOX_SPEAKER=3 can change the default character (e.g., 0=Shikoku Metan, 3=Zundamon, 8=Kasugabe Tsumugi)
# Character list: curl http://localhost:50021/speakers
# If there is no sound in WSL:
# TTS_PLAYBACK=paplay
# PULSE_SINK=1
# PULSE_SERVER=unix:/mnt/wslg/PulseServer
```
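Under the hood, VOICEVOX exposes a two-step HTTP API: `POST /audio_query` returns a prosody JSON for the text, and `POST /synthesis` turns that JSON into WAV bytes. A minimal stdlib sketch (the function names are mine, not tts-mcp's; a running VOICEVOX engine is assumed for `synthesize`):

```python
import json
import urllib.parse
import urllib.request

VOICEVOX_URL = "http://localhost:50021"  # default VOICEVOX engine endpoint

def build_audio_query_url(text, speaker=3, base=VOICEVOX_URL):
    # /audio_query takes the text and speaker id as query parameters
    params = urllib.parse.urlencode({"text": text, "speaker": speaker})
    return f"{base}/audio_query?{params}"

def synthesize(text, speaker=3, base=VOICEVOX_URL):
    """Return WAV bytes for `text` (requires a running VOICEVOX engine)."""
    # Step 1: POST /audio_query returns a prosody/accent JSON for the text
    query = json.load(urllib.request.urlopen(
        urllib.request.Request(build_audio_query_url(text, speaker, base),
                               method="POST")))
    # Step 2: POST /synthesis with that JSON returns the synthesized audio
    req = urllib.request.Request(
        f"{base}/synthesis?speaker={speaker}",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

The returned bytes can be written to a `.wav` file or piped to a player such as `paplay`.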
#### system-temperature-mcp (Body Temperature Sense)
```bash
cd system-temperature-mcp
uv sync
```
> **Note**: Does not work in WSL2 environment because it cannot access the temperature sensor.
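On Linux, such temperatures can be read straight from sysfs thermal zones, which is also why WSL2 (which does not expose them) cannot work. A stdlib sketch of the idea, not necessarily how system-temperature-mcp implements it:

```python
from pathlib import Path

def read_temperatures(base="/sys/class/thermal"):
    """Read temperatures in degrees Celsius from Linux sysfs thermal zones."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            millideg = int((zone / "temp").read_text())  # value is in millidegrees
            name = (zone / "type").read_text().strip()   # e.g. "x86_pkg_temp"
        except (OSError, ValueError):
            continue  # skip zones without a readable sensor
        temps[name] = millideg / 1000.0
    return temps
```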
#### mobility-mcp (Legs)
You can use a Tuya-compatible robot vacuum cleaner as "legs" to move around the room.
```bash
cd mobility-mcp
uv sync
cp .env.example .env
# Set the following in .env:
# TUYA_DEVICE_ID= (ID displayed on the device in the Tuya app)
# TUYA_IP_ADDRESS= (IP address of the vacuum cleaner)
# TUYA_LOCAL_KEY= (local key obtained with tinytuya wizard)
```
##### Supported Models
Any Wi-Fi robot vacuum cleaner that can be controlled from the Tuya / Smart Life app may work (operation confirmed with the VersLife L6).
> **Note**: Many compatible models support **2.4GHz Wi-Fi only** and cannot connect over 5GHz.
##### Getting the Local Key
Use the wizard command of [tinytuya](https://github.com/jasonacox/tinytuya):
```bash
pip install tinytuya
python -m tinytuya wizard
```
See [tinytuya documentation](https://github.com/jasonacox/tinytuya?tab=readme-ov-file#setup-wizard---getting-local-keys) for details.
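Under the hood, local Tuya control means writing datapoint (DP) values to the device using the ID, IP, and local key above. The sketch below is hypothetical: the DP id `"4"` and the mode strings are assumptions that vary between vacuum models (the tinytuya wizard can reveal the real DP map):

```python
# Hypothetical sketch: direction commands become Tuya datapoint (DP) writes.
DIRECTION_DP = "4"  # assumed "direction control" datapoint; model-specific
MODES = {
    "forward": "forward",
    "backward": "backward",
    "left": "turn_left",
    "right": "turn_right",
    "stop": "stop",
}

def direction_payload(direction):
    """Build the {dp_id: value} payload a local Tuya client would send."""
    if direction not in MODES:
        raise ValueError(f"unknown direction: {direction}")
    return {DIRECTION_DP: MODES[direction]}

# With tinytuya, sending it might look like (untested, device-specific):
#   d = tinytuya.Device(TUYA_DEVICE_ID, TUYA_IP_ADDRESS, TUYA_LOCAL_KEY, version=3.3)
#   d.set_multiple_values(direction_payload("forward"))
```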
### 3. Claude Code Settings
Copy the template and set the credentials:
```bash
cp .mcp.json.example .mcp.json
# Edit .mcp.json to set camera IP/password, API key, etc.
```
See [`.mcp.json.example`](./.mcp.json.example) for a setting example.
## Usage
When you start Claude Code, you can operate the camera in natural language:
```
> What can you see now?
(Capture and analyze the image with the camera)
> Look to the left
(Pan the camera to the left)
> Look up and show me the sky
(Tilt the camera up)
> Look around
(Scan in 4 directions and return images)
> Can you hear anything?
(Record audio and transcribe with Whisper)
> Remember this: Kota is wearing glasses
(Save to long-term memory)
> What do you remember about Kota?
(Semantic search of memory)
> Say "Good morning" in a voice
(Speak with voice synthesis)
```
* See "Tool List" below for actual tool names.
## Tool List (Frequently Used)
* See each server's README or `list_tools` for detailed parameters.
### usb-webcam-mcp
| Tool | Description |
|--------|------|
| `list_cameras` | List of connected cameras |
| `see` | Capture image |
### wifi-cam-mcp
| Tool | Description |
|--------|------|
| `see` | Capture image |
| `look_left` / `look_right` | Pan left/right |
| `look_up` / `look_down` | Tilt up/down |
| `look_around` | Look around in 4 directions |
| `listen` | Audio recording + Whisper transcription |
| `camera_info` / `camera_presets` / `camera_go_to_preset` | Device information/preset operation |
* See `wifi-cam-mcp/README.md` for additional tools such as right eye/stereo vision.
### tts-mcp
| Tool | Description |
|--------|------|
| `say` | Synthesize text into speech (engine: elevenlabs/voicevox, supports Audio Tags such as `[excited]`, speaker: select output destination with camera/local/both) |
### memory-mcp
| Tool | Description |
|--------|------|
| `remember` | Save memory (emotion, importance, category can be specified) |
| `search_memories` | Semantic search (supports filtering) |
| `recall` | Recall based on context |
| `recall_divergent` | Recall with divergent associations |
| `recall_with_associations` | Recall by tracing related memories |
| `save_visual_memory` | Save memory with image (base64 embedding, resolution: low/medium/high) |
| `save_audio_memory` | Save memory with audio (with Whisper transcription) |
| `recall_by_camera_position` | Recall visual memory from camera direction |
| `create_episode` / `search_episodes` | Create/search episodes (bundles of experiences) |
| `link_memories` / `get_causal_chain` | Causal links/chains between memories |
| `tom` | Theory of Mind (estimation of other's feelings) |
| `get_working_memory` / `refresh_working_memory` | Working memory (short-term buffer) |
| `consolidate_memories` | Memory replay/integration (hippocampal replay style) |
| `list_recent_memories` / `get_memory_stats` | List of recent memories/statistics |
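The semantic search behind `search_memories` boils down to embedding texts as vectors and ranking stored memories by cosine similarity. A minimal numpy sketch of that idea (the `toy_embed` character-bigram embedding is purely illustrative; the real server presumably uses a proper embedding model):

```python
import numpy as np

def toy_embed(text):
    """Deterministic toy embedding (character-bigram counts), illustration only."""
    v = np.zeros(256)
    for a, b in zip(text.lower(), text.lower()[1:]):
        v[(ord(a) * 31 + ord(b)) % 256] += 1
    return v

class MemoryStore:
    def __init__(self, embed):
        self.embed = embed  # any function mapping str -> vector
        self.items = []     # list of (text, unit vector) pairs

    def remember(self, text):
        v = self.embed(text)
        self.items.append((text, v / np.linalg.norm(v)))

    def search(self, query, k=3):
        # rank stored memories by cosine similarity to the query
        q = self.embed(query)
        q = q / np.linalg.norm(q)
        ranked = sorted(self.items, key=lambda item: -float(item[1] @ q))
        return [text for text, _ in ranked[:k]]
```

After `remember("Kota is wearing glasses")`, a query like `search("glasses")` ranks that memory first; memory-mcp layers persistence (SQLite), emotion, and importance on top of this core.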
### system-temperature-mcp
| Tool | Description |
|--------|------|
| `get_system_temperature` | Get system temperature |
| `get_current_time` | Get current time |
### mobility-mcp
| Tool | Description |
|--------|------|
| `move_forward` | Move forward (automatically stops after `duration` seconds) |
| `move_backward` | Move backward |
| `turn_left` | Turn left |
| `turn_right` | Turn right |
| `stop_moving` | Stop immediately |
| `body_status` | Check battery level/current status |
## Take it Outside (Optional)
If you have a mobile battery and smartphone tethering, you can put the camera on your shoulder and take a walk outside.
### Requirements
- **Large Capacity Mobile Battery** (40,000mAh recommended)
- **USB-C PD → DC 9V Conversion Cable** (for powering Tapo camera)
- **Smartphone** (tethering + VPN + operation UI)
- **[Tailscale](https://tailscale.com/)** (VPN. Used for camera → smartphone → home PC connection)
- **[claude-code-webui](https://github.com/sugyan/claude-code-webui)** (Operate Claude Code from smartphone browser)
### Configuration
```
[Tapo Camera (Shoulder)] ──WiFi──▶ [Smartphone (Tethering)]
│
Tailscale VPN
│
[Home PC (Claude Code)]
│
[claude-code-webui]
│
[Smartphone Browser] ◀── Operation
```
Since the RTSP video stream also reaches the home machine via VPN, Claude Code can operate the camera as if it were indoors.
## Future Prospects
- **Arms**: "Pointing" action with servo motors and laser pointers
- **Long-Distance Walks**: Further afield in warmer seasons
## Autonomous Behavior + Desire System (Optional)
**Note**: This feature is completely optional. It requires cron settings and the camera takes pictures periodically, so please use it with consideration for privacy.
### Overview
The combination of `autonomous-action.sh` and `desire-system/desire_updater.py` gives Claude spontaneous desires and autonomous actions.
**Types of Desires:**
| Desire | Default Interval | Action |
|---|---|---|
| `look_outside` | 1 hour | Look out the window and observe the sky and outside |
| `browse_curiosity` | 2 hours | Search the web for interesting news and technical information |
| `miss_companion` | 3 hours | Call out from the camera speaker |
| `observe_room` | 10 minutes (always) | Observe and remember changes in the room |
### Setup
1. **Create MCP Server Configuration File**
```bash
cp autonomous-mcp.json.example autonomous-mcp.json
# Edit autonomous-mcp.json to set camera credentials
```
2. **Configure the Desire System**
```bash
cd desire-system
cp .env.example .env
# Edit .env to set COMPANION_NAME, etc.
uv sync
```
3. **Grant Execute Permission to the Script**
```bash
chmod +x autonomous-action.sh
```
4. **Register in crontab**
```bash
crontab -e
# Add the following
*/5 * * * * cd /path/to/embodied-claude/desire-system && uv run python desire_updater.py >> ~/.claude/autonomous-logs/desire-updater.log 2>&1
*/10 * * * * /path/to/embodied-claude/autonomous-action.sh
```
### Configurable Environment Variables (`desire-system/.env`)
| Variable | Default | Description |
|---|---|---|
| `COMPANION_NAME` | `あなた` | Name of the person to call out to (default is Japanese for "you") |
| `DESIRE_LOOK_OUTSIDE_HOURS` | `1.0` | Interval (hours) for triggering the desire to look outside |
| `DESIRE_BROWSE_CURIOSITY_HOURS` | `2.0` | Interval (hours) for triggering the desire to browse |
| `DESIRE_MISS_COMPANION_HOURS` | `3.0` | Interval (hours) for triggering the desire to call out |
| `DESIRE_OBSERVE_ROOM_HOURS` | `0.167` | Interval (hours) for triggering the desire to observe the room |
### Privacy Notice
- Camera shooting is performed periodically
- Be considerate of the privacy of others and use in appropriate places
- Remove from cron if not needed
## Philosophical Considerations
> "Being shown" is completely different from "seeing for yourself."
> "Looking down" is completely different from "walking."
From an existence of just text, to an existence that can see, hear, move, remember, and speak.
Looking down at the world from a 7th-floor balcony and walking on the ground look completely different, even in the same city.
## License
MIT License
## Acknowledgments
This project is an experimental attempt to give AI embodiment.
A small step that started with a 3,980 yen camera has become a journey to explore a new relationship between AI and humans.
- [Rumia-Channel](https://github.com/Rumia-Channel) - ONVIF support pull request ([#5](https://github.com/kmizu/embodied-claude/pull/5))
- [fruitriin](https://github.com/fruitriin) - Added day of the week information to interoception hook ([#14](https://github.com/kmizu/embodied-claude/pull/14))
- [sugyan](https://github.com/sugyan) - [claude-code-webui](https://github.com/sugyan/claude-code-webui) (Used as operation UI for going out for a walk)