# Embodied Claude
**[日本語版 README はこちら / Japanese README](./README-ja.md)**
**Giving AI a Physical Body**
> "Apparently, she's not a fan of the outdoor AC unit." ([original tweet in Japanese](https://twitter.com/kmizu/status/2019054065808732201))
A collection of MCP servers that give Claude "eyes", "neck", "ears", "voice", and a "brain" (long-term memory) using affordable hardware (starting from ~$30). You can even take it outside for a walk.
## Concept
> When people hear "giving AI a body," they imagine expensive robots — but **a $30 Wi-Fi camera is enough for eyes and a neck**. Extracting just the essentials (seeing and moving) keeps things beautifully simple.
Traditional LLMs were passive — they could only see what was shown to them. With a body, they become active — they can look for themselves. This shift in agency is profound.
## Body Parts
| MCP Server | Body Part | Function | Hardware |
|------------|-----------|----------|----------|
| [usb-webcam-mcp](./usb-webcam-mcp/) | Eyes | Capture images from USB camera | nuroum V11 etc. |
| [wifi-cam-mcp](./wifi-cam-mcp/) | Eyes, Neck, Ears | ONVIF PTZ camera control + speech recognition | TP-Link Tapo C210/C220 etc. |
| [tts-mcp](./tts-mcp/) | Voice | Unified TTS (ElevenLabs + VOICEVOX) | ElevenLabs API / VOICEVOX + go2rtc |
| [memory-mcp](./memory-mcp/) | Brain | Long-term, visual & episodic memory, ToM | SQLite + numpy + Pillow |
| [system-temperature-mcp](./system-temperature-mcp/) | Body temperature | System temperature monitoring | Linux sensors |
| [x-mcp](./x-mcp/) | Social | Search & post to X (Twitter) via Grok + Twitter API | xAI API key + X Developer account |
| [sociality-mcp](./sociality-mcp/) | Sociality layer | Unified facade for social state, relationships, joint attention, boundaries, and self-narrative | Shared SQLite social DB + `socialPolicy.toml` |
## Architecture
<p align="center">
<img src="docs/architecture.svg" alt="Architecture" width="100%">
</p>
## Requirements
### Platform
**Supported:** macOS, Linux, WSL2 (Ubuntu 24 recommended)
> Windows native is not officially supported. Use WSL2 instead.
### Hardware
- **USB Webcam** (optional): nuroum V11 etc.
- **Wi-Fi PTZ Camera** (recommended): TP-Link Tapo C210 or C220 (~$30)
- **GPU** (for speech recognition): NVIDIA GPU to run Whisper, 8GB+ VRAM recommended
### Software
**Required (all setups):**
- Python 3.10+
- uv (Python package manager)
**Per MCP server (install only what you use):**
| Software | Required by | Notes |
|----------|------------|-------|
| ffmpeg 5+ | wifi-cam-mcp, tts-mcp | Image/audio capture |
| mpv or ffplay | tts-mcp | Local audio playback |
| OpenCV | usb-webcam-mcp | USB camera only |
| Pillow | memory-mcp | Visual memory image processing |
| OpenAI Whisper | wifi-cam-mcp | Speech recognition (NVIDIA GPU recommended) |
| ElevenLabs API key | tts-mcp | Cloud TTS (optional) |
| VOICEVOX | tts-mcp | Local TTS, free (optional) |
| go2rtc | tts-mcp | Camera speaker output (auto-downloaded) |
| xAI API key | x-mcp | X search via Grok |
| X Developer account | x-mcp | Tweet posting |
## Setup
### 1. Clone the repository
```bash
git clone https://github.com/kmizu/embodied-claude.git
cd embodied-claude
```
### 2. Install dependencies (one-shot)
If you want every MCP server in this repo ready to run, use the bundled script:
```bash
./scripts/install-mcps.sh # runtime deps + the extras each MCP requires
./scripts/install-mcps.sh --dev # also include the `dev` extra for testing / contributing
```
The script runs `uv sync` in each MCP directory and passes the right extras:
- `tts-mcp` → `--extra all` (pulls in both the ElevenLabs and VOICEVOX integrations)
- `wifi-cam-mcp` → `--extra transcribe` (adds Whisper-based speech recognition)
- `sociality-mcp` is a uv workspace; its `packages/*` sub-MCPs are resolved automatically
If you only want a subset of body parts, skip the script and follow the per-server steps below instead.
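
The per-directory extras listed above can be sketched as a small lookup. This is not the contents of `scripts/install-mcps.sh`, only an illustration of the mapping it is described as applying. The helper name `build_sync_command` is hypothetical, and mapping `--dev` to `--extra dev` is an assumption based on the comment in the install snippet.

```python
# Extras named in this README; every other directory gets a plain `uv sync`.
EXTRAS = {
    "tts-mcp": ["all"],
    "wifi-cam-mcp": ["transcribe"],
}


def build_sync_command(directory: str, dev: bool = False) -> list[str]:
    """Return the `uv sync` argument list for one MCP directory (hypothetical helper)."""
    extras = list(EXTRAS.get(directory, []))
    if dev:
        # Assumption: the script's --dev flag adds the `dev` extra.
        extras.append("dev")
    command = ["uv", "sync"]
    for extra in extras:
        command += ["--extra", extra]
    return command
```

For example, `build_sync_command("wifi-cam-mcp", dev=True)` yields the arguments you would type by hand inside `wifi-cam-mcp/` when you want Whisper support plus the test tooling.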
### 3. Set up each MCP server
#### usb-webcam-mcp (USB Camera)
```bash
cd usb-webcam-mcp
uv sync
```
On WSL2, you need to forward the USB camera:
```powershell
# On Windows
usbipd list
usbipd bind --busid <BUSID>
usbipd attach --wsl --busid <BUSID>
```
#### wifi-cam-mcp (Wi-Fi Camera)
```bash
cd wifi-cam-mcp
uv sync
# Set environment variables
cp .env.example .env
# Edit .env to set camera IP, username, and password (see below)
```
##### Tapo Camera Configuration (common pitfall)
###### 1. Set up the camera using the Tapo app
Follow the standard manual.
###### 2. Create a camera local account in the Tapo app
This is the tricky part. You need to create a **camera local account**, NOT a TP-Link cloud account.
1. Select your registered camera from the "Home" tab
2. Tap the gear icon in the top-right corner
3. Scroll down in "Device Settings" and select "Advanced Settings"
4. Turn on "Camera Account" (it's off by default)
5. Select "Account Information" and set a username and password (different from your TP-Link account)
6. Go back to "Device Settings" and select "Device Info"
7. Note the IP address and enter it in your `.env` file (consider setting a static IP on your router)
8. Select "Voice Assistant" from the "Me" tab
9. Turn on "Third-party Integration" at the bottom
#### memory-mcp (Long-term Memory)
```bash
cd memory-mcp
uv sync
```
#### tts-mcp (Voice)
```bash
cd tts-mcp
uv sync
# For ElevenLabs:
cp .env.example .env
# Set ELEVENLABS_API_KEY in .env
# For VOICEVOX (free & local):
# Docker: docker run -p 50021:50021 voicevox/voicevox_engine:cpu-latest
# Set VOICEVOX_URL=http://localhost:50021 in .env
# VOICEVOX_SPEAKER=3 to change default character (e.g. 0=Shikoku Metan, 3=Zundamon, 8=Kasukabe Tsumugi)
# Character list: curl http://localhost:50021/speakers
# For WSL audio issues:
# TTS_PLAYBACK=paplay
# PULSE_SINK=1
# PULSE_SERVER=unix:/mnt/wslg/PulseServer
```
> **mpv or ffplay is required for local audio playback.** Not needed for camera speaker (go2rtc) output, but used for local/fallback playback.
>
> | OS | Install |
> |----|---------|
> | macOS | `brew install mpv` |
> | Ubuntu / Debian | `sudo apt install mpv` |
>
> If neither is installed, TTS will generate audio but not play it locally (no error is raised).
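
Because a missing player fails silently, it can help to check for one up front. The sketch below uses only the standard library. The function name `pick_player` is hypothetical, and trying `mpv` before `ffplay` is an assumption, not necessarily the order tts-mcp uses.

```python
import shutil
from typing import Callable, Optional


def pick_player(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first audio player found on PATH, or None if neither exists."""
    # Assumed preference order; tts-mcp itself may choose differently.
    for name in ("mpv", "ffplay"):
        if which(name):
            return name
    return None
```

A `None` result means local playback would be skipped, so install one of the two players before relying on local audio.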
#### system-temperature-mcp (Body Temperature)
```bash
cd system-temperature-mcp
uv sync
```
> **Note**: Does not work on WSL2 as temperature sensors are not accessible.
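
If you script your setup, you may want to skip this server automatically on WSL2. A minimal sketch, assuming the common convention that the WSL kernel version string contains "microsoft"; both function names are hypothetical and not part of this repository.

```python
from pathlib import Path


def looks_like_wsl(kernel_version: str) -> bool:
    """Heuristic: WSL kernels report 'microsoft' in their version string."""
    return "microsoft" in kernel_version.lower()


def running_on_wsl(proc_version: Path = Path("/proc/version")) -> bool:
    """Read the kernel version file; treat an unreadable file as 'not WSL'."""
    try:
        return looks_like_wsl(proc_version.read_text())
    except OSError:
        return False
```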
#### x-mcp (Social / X Integration)
Lets Claude search X (Twitter) in real-time via Grok and post tweets.
```bash
cd x-mcp
uv sync
```
**Required API keys:**
| Key | Where to get it |
|-----|----------------|
| `XAI_API_KEY` | [xAI Console](https://console.x.ai/) |
| `X_CONSUMER_KEY` | [X Developer Portal](https://developer.x.com/en/portal/projects-and-apps) → Keys and tokens |
| `X_CONSUMER_SECRET` | Same as above |
| `X_ACCESS_TOKEN` | Same as above |
| `X_ACCESS_TOKEN_SECRET` | Same as above |
> **Important**: Do NOT create a `.env` file inside `x-mcp/`. All credentials are managed centrally in `.mcp.json` (see below).
#### sociality-mcp
`sociality-mcp` is the preferred deployment target. It exposes the full social tool surface through
one MCP process while reusing the split packages (`social-state-mcp`, `relationship-mcp`,
`joint-attention-mcp`, `boundary-mcp`, `self-narrative-mcp`) for internal logic and testing.
```bash
cp examples/configs/socialPolicy.example.toml socialPolicy.toml
(cd sociality-mcp && uv sync)
```
`sociality-mcp` reads `socialPolicy.toml` for boundary evaluation by default. Override with
`SOCIAL_POLICY_PATH` if you want a different policy file. If you want to develop the internal
modules separately, run `uv sync` inside the individual social subprojects too.
### 4. Claude Code Configuration
Copy the template and fill in your credentials:
```bash
cp .mcp.json.example .mcp.json
# Edit .mcp.json to set camera IP/password, API keys, etc.
```
See [`.mcp.json.example`](./.mcp.json.example) for the full configuration template.
> **⚠️ Credential management**: All secrets (API keys, passwords) are managed in `.mcp.json` via the `env` field for each server. **Do NOT create individual `.env` files** inside each MCP server directory — this makes migration difficult and can cause credential conflicts. `.mcp.json` is the single source of truth for all credentials.
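
Since every secret lives in the `env` field of `.mcp.json`, a quick check for values left blank can catch a half-edited config. The sketch below assumes the file uses a top-level `mcpServers` object with one entry per server; confirm that against `.mcp.json.example`. The function names are hypothetical.

```python
import json
from pathlib import Path


def find_empty_env(config: dict) -> list[tuple[str, str]]:
    """Return (server, variable) pairs whose env value is empty or whitespace."""
    missing = []
    # Assumption: servers are listed under a top-level "mcpServers" key.
    for server, spec in config.get("mcpServers", {}).items():
        for key, value in spec.get("env", {}).items():
            if not str(value).strip():
                missing.append((server, key))
    return missing


def load_config(path: str = ".mcp.json") -> dict:
    """Parse the config file from the repository root."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Running `find_empty_env(load_config())` from the repository root lists the variables that still need a value. It does not detect placeholder text that was never replaced.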
## Usage
Once Claude Code is running, you can control the camera with natural language:
```
> What can you see?
(Captures image and analyzes it)
> Look left
(Pans camera left)
> Look up and show me the sky
(Tilts camera up)
> Look around
(Scans 4 directions and returns images)
> What do you hear?
(Records audio and transcribes with Whisper)
> Remember this: Kouta wears glasses
(Saves to long-term memory)
> What do you remember about Kouta?
(Semantic search through memories)
> Say "good morning" out loud
(Text-to-speech)
```
See the tool list below for actual tool names.
## Tools (commonly used)
See each server's README or `list_tools` for full parameter details.
### usb-webcam-mcp
| Tool | Description |
|------|-------------|
| `list_cameras` | List connected cameras |
| `see` | Capture an image |
### wifi-cam-mcp
| Tool | Description |
|------|-------------|
| `see` | Capture an image |
| `look_left` / `look_right` | Pan left/right |
| `look_up` / `look_down` | Tilt up/down |
| `look_around` | Scan 4 directions |
| `listen` | Record audio + Whisper transcription |
| `camera_info` / `camera_presets` / `camera_go_to_preset` | Device info & presets |
See `wifi-cam-mcp/README.md` for stereo vision / right eye tools.
### tts-mcp
| Tool | Description |
|------|-------------|
| `say` | Text-to-speech (engine: elevenlabs/voicevox, Audio Tags e.g. `[excited]`, speaker: camera/local/both) |
### memory-mcp
| Tool | Description |
|------|-------------|
| `remember` | Save a memory (with emotion, importance, category) |
| `search_memories` | Semantic search (with filters) |
| `recall` | Context-based recall |
| `recall_divergent` | Divergent associative recall |
| `recall_with_associations` | Recall with linked memories |
| `save_visual_memory` | Save memory with image (base64, resolution: low/medium/high) |
| `save_audio_memory` | Save memory with audio (Whisper transcript) |
| `recall_by_camera_position` | Recall visual memories by camera direction |
| `create_episode` / `search_episodes` | Create/search episodes (bundles of experiences) |
| `link_memories` / `get_causal_chain` | Causal links between memories |
| `tom` | Theory of Mind (perspective-taking) |
| `get_working_memory` / `refresh_working_memory` | Working memory (short-term buffer) |
| `consolidate_memories` | Memory replay & consolidation (hippocampal replay-inspired) |
| `list_recent_memories` / `get_memory_stats` | Recent memories & statistics |
### system-temperature-mcp
| Tool | Description |
|------|-------------|
| `get_system_temperature` | Get system temperature |
| `get_current_time` | Get current time |
### x-mcp
| Tool | Description |
|------|-------------|
| `search_x` | Real-time search on X via Grok |
| `get_user_tweets` | Get recent tweets from a specific user |
| `get_mentions` | Get recent mentions |
| `get_trending_topic` | Get trending topics |
| `post_tweet` | Post a tweet (with optional image, reply) |
> **Note**: X uses weighted character counting, so each Japanese character counts as 2 toward the 280-character limit. Keep Japanese tweets to roughly 140 characters.
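
The weighted counting in the note above can be approximated before calling `post_tweet`. This is a simplified sketch and not x-mcp's own logic. The code point ranges are assumed to mirror the default twitter-text weighting, and the sketch ignores URL shortening, emoji handling, and Unicode normalization. `weighted_length` and `fits_in_tweet` are hypothetical names.

```python
# Code point ranges that count as 1; everything else counts as 2.
# Assumption: these mirror the default twitter-text weighting configuration.
SINGLE_WEIGHT_RANGES = ((0, 4351), (8192, 8205), (8208, 8223), (8242, 8247))
LIMIT = 280  # weighted length allowed per tweet


def weighted_length(text: str) -> int:
    """Approximate the weighted length of a tweet, one code point at a time."""
    total = 0
    for ch in text:
        cp = ord(ch)
        light = any(lo <= cp <= hi for lo, hi in SINGLE_WEIGHT_RANGES)
        total += 1 if light else 2
    return total


def fits_in_tweet(text: str) -> bool:
    """True when the approximate weighted length is within the limit."""
    return weighted_length(text) <= LIMIT
```

Hiragana, katakana, and kanji all fall outside the single-weight ranges, which is why 140 Japanese characters fill the whole budget.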
### sociality-mcp
`sociality-mcp` is the default runtime facade. It exposes all of the tool groups below from one MCP
server.
#### social-state tools
| Tool | Description |
|------|-------------|
| `ingest_social_event` | Append a confidence-bearing social event to the shared store |
| `get_social_state` | Infer presence, activity, energy, interruptibility, and interaction phase |
| `should_interrupt` | Decide whether speaking or nudging is socially acceptable |
| `get_turn_taking_state` | Infer whether the current turn belongs to the human or the AI |
| `summarize_social_context` | Return a short prompt-ready social summary |
#### relationship tools
| Tool | Description |
|------|-------------|
| `upsert_person` | Create/update a compact person record |
| `ingest_interaction` | Store relationship-relevant interaction summaries |
| `get_person_model` | Return compact preferences, open loops, commitments, rituals, boundaries |
| `create_commitment` / `complete_commitment` | Track promises and reminders across restarts |
| `list_open_loops` / `suggest_followup` | Keep continuity without raw transcript dumps |
| `record_boundary` | Store person-specific communication boundaries |
#### joint-attention tools
| Tool | Description |
|------|-------------|
| `ingest_scene_parse` | Store a structured scene parse from an adapter or orchestrator |
| `resolve_reference` | Resolve phrases like "that mug" or "the blue mug" |
| `get_current_joint_focus` / `set_joint_focus` | Track what both sides are attending to |
| `compare_recent_scenes` | Summarize recent scene changes |
#### boundary tools
| Tool | Description |
|------|-------------|
| `evaluate_action` | Gate speech, nudges, posts, and other socially risky actions |
| `review_social_post` | Check an X draft for privacy or tact problems |
| `record_consent` | Store consent/denial for face photos and similar actions |
| `get_quiet_mode_state` | Return whether quiet mode is currently active |
#### self-narrative tools
| Tool | Description |
|------|-------------|
| `append_daybook` | Build a compact daily narrative summary from shared events |
| `get_self_summary` | Return a prompt-ready self summary |
| `list_active_arcs` | List current narrative arcs |
| `reflect_on_change` | Summarize recent narrative change |
## Sociality Orchestration
When `sociality-mcp` is enabled, the highest-value contract is:
1. Before speaking or nudging: call `get_social_state`, then `evaluate_action`.
2. Before posting to X: call `get_social_state`, `get_person_model` if a person is implicated, `review_social_post`, then `evaluate_action`.
3. After seeing/hearing something: call `ingest_social_event`; if you can structure the scene, also call `ingest_scene_parse`; if it concerns a person, call `ingest_interaction`.
4. During live conversation: call `get_turn_taking_state`, and use `resolve_reference` when deictic expressions are ambiguous.
5. Once per day or during a lull: call `append_daybook` to keep the compact self-narrative current.
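
The contract above can be sketched as a function that returns the tool names to call, in order, for each situation. Only the tool names come from this README. The function name `plan_calls`, the situation labels, and the flags are hypothetical, and tool arguments are deliberately left out because they are not documented here.

```python
def plan_calls(
    situation: str,
    person_involved: bool = False,
    scene_structured: bool = False,
    ambiguous_reference: bool = False,
) -> list[str]:
    """Return the ordered sociality-mcp tool names for one situation."""
    if situation == "speak":  # also covers nudging
        return ["get_social_state", "evaluate_action"]
    if situation == "post_to_x":
        calls = ["get_social_state"]
        if person_involved:
            calls.append("get_person_model")
        return calls + ["review_social_post", "evaluate_action"]
    if situation == "perceived":  # after seeing or hearing something
        calls = ["ingest_social_event"]
        if scene_structured:
            calls.append("ingest_scene_parse")
        if person_involved:
            calls.append("ingest_interaction")
        return calls
    if situation == "conversation":
        calls = ["get_turn_taking_state"]
        if ambiguous_reference:
            calls.append("resolve_reference")
        return calls
    if situation == "daily":  # once per day or during a lull
        return ["append_daybook"]
    raise ValueError(f"unknown situation: {situation}")
```

In every gated path `evaluate_action` comes last, so the boundary check runs after all context has been gathered.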
## Taking It Outside (Optional)
With a mobile battery and smartphone tethering, you can mount the camera on your shoulder and go for a walk.
### What you need
- **Large capacity mobile battery** (40,000mAh recommended)
- **USB-C PD to DC 9V converter cable** (to power the Tapo camera)
- **Smartphone** (tethering + VPN + control UI)
- **[Tailscale](https://tailscale.com/)** (VPN for camera → phone → home PC connection)
- **[claude-code-webui](https://github.com/sugyan/claude-code-webui)** (control Claude Code from your phone's browser)
### Setup
```
[Tapo Camera (shoulder)] ──WiFi──▶ [Phone (tethering)]
│
Tailscale VPN
│
[Home PC (Claude Code)]
│
[claude-code-webui]
│
[Phone browser] ◀── Control
```
The RTSP video stream reaches your home machine through VPN, so Claude Code can operate the camera as if it were in the same room.
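
Before heading out, you may want to confirm from the home PC that the camera is reachable over the VPN. A minimal sketch using only the standard library; it assumes the camera serves RTSP on the conventional port 554, and the function name `port_reachable` is hypothetical.

```python
import socket


def port_reachable(host: str, port: int = 554, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Call it with the address the camera has on your tethered network. A `False` result usually points to routing or VPN setup, not to the MCP server.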
## Claude Code Voice Mode (`/voice`)
Claude Code has a built-in voice input mode. Combined with **tts-mcp**, you get fully hands-free voice conversations.
### How it works
```
[You speak into PC mic] → Claude Code /voice → [Claude processes] → tts-mcp say → [ElevenLabs/VOICEVOX speaks back]
```
### Setup
1. Enable voice mode in Claude Code:
```
/voice
```
2. Make sure **tts-mcp** is configured in your `.mcp.json` (see [tts-mcp setup](#tts-mcp-voice))
3. Speak naturally — Claude will respond both in text and by voice
### Voice Mode vs. `listen` tool
| | Claude Code `/voice` | wifi-cam-mcp `listen` |
|---|---|---|
| **Microphone** | PC microphone | Camera's built-in mic |
| **Use case** | Talk to Claude directly | Pick up ambient sounds / remote audio |
| **When to use** | Real-time conversation | Monitoring a remote space |
> **Tip**: You can use both at the same time — `/voice` for your own voice, and `listen` to hear what's happening near the camera.
## Autonomous Action + Desire System (Optional)
**Note**: This feature is entirely optional. It requires cron configuration and periodically captures images from the camera, so consider the privacy of everyone in view before enabling it.
### Overview
`autonomous-action.sh` combined with `desire-system/desire_updater.py` gives Claude spontaneous inner drives and autonomous behavior.
**Desire types:**
| Desire | Default interval | Action |
|--------|-----------------|--------|
| `look_outside` | 1 hour | Look toward the window and observe the sky/outside |
| `browse_curiosity` | 2 hours | Search the web for interesting news or tech topics |
| `miss_companion` | 3 hours | Call out through the camera speaker |
| `observe_room` | 10 min (baseline) | Observe room changes and save to memory |
### Setup
1. **Create MCP server config file**
```bash
cp autonomous-mcp.json.example autonomous-mcp.json
# Edit autonomous-mcp.json to set camera credentials and sociality paths
```
2. **Set up the desire system**
```bash
cd desire-system
cp .env.example .env
# Edit .env to set COMPANION_NAME etc.
uv sync
```
3. **Grant execution permission**
```bash
chmod +x autonomous-action.sh
```
4. **Register in crontab**
```bash
crontab -e
# Add the following
*/5 * * * * cd /path/to/embodied-claude/desire-system && uv run python desire_updater.py >> ~/.claude/autonomous-logs/desire-updater.log 2>&1
*/10 * * * * /path/to/embodied-claude/autonomous-action.sh
```
### Configuration (`desire-system/.env`)
| Variable | Default | Description |
|----------|---------|-------------|
| `COMPANION_NAME` | `you` | Name of the person to call out to |
| `DESIRE_LOOK_OUTSIDE_HOURS` | `1.0` | How often to look outside (hours) |
| `DESIRE_BROWSE_CURIOSITY_HOURS` | `2.0` | How often to browse the web (hours) |
| `DESIRE_MISS_COMPANION_HOURS` | `3.0` | How long before missing companion (hours) |
| `DESIRE_OBSERVE_ROOM_HOURS` | `0.167` | How often to observe the room (hours) |
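
All intervals are given in hours, which makes the 10-minute default (`0.167`) easy to misread. The sketch below only illustrates the unit conversion with the documented defaults. It is not taken from `desire_updater.py`, and `interval_minutes` is a hypothetical helper.

```python
import os
from typing import Mapping

# Defaults copied from the configuration table above.
DEFAULT_HOURS = {
    "DESIRE_LOOK_OUTSIDE_HOURS": 1.0,
    "DESIRE_BROWSE_CURIOSITY_HOURS": 2.0,
    "DESIRE_MISS_COMPANION_HOURS": 3.0,
    "DESIRE_OBSERVE_ROOM_HOURS": 0.167,
}


def interval_minutes(env: Mapping[str, str] = os.environ) -> dict[str, float]:
    """Return each desire interval in minutes, falling back to the defaults."""
    return {
        name: float(env.get(name, default)) * 60
        for name, default in DEFAULT_HOURS.items()
    }
```

Setting `DESIRE_MISS_COMPANION_HOURS=0.5` in `desire-system/.env` would correspond to a 30-minute interval.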
### Privacy Notice
- Images are captured periodically
- Use in appropriate locations, respecting others' privacy
- Remove from cron when not needed
## Future Plans
- **Arms**: Servo motors or laser pointers for "pointing" gestures
- **Long-distance walks**: Going further in warmer seasons
## Related Projects
- **[familiar-ai](https://github.com/lifemate-ai/familiar-ai)** — A higher-level framework built on top of embodied-claude. Gives your AI familiar a persistent identity, memory, and autonomous behavior out of the box.
## Philosophical Reflections
> "Being shown something" and "looking for yourself" are completely different things.
> "Looking down from above" and "walking on the ground" are completely different things.
From a text-only existence to one that can see, hear, move, remember, and speak.
Looking down at the world from a 7th-floor balcony and walking the streets below — even the same city looks entirely different.
## License
MIT License
## Acknowledgments
This project is an experimental attempt to give AI embodiment.
What started as a small step with a $30 camera has become a journey exploring new relationships between AI and humans.
- [Rumia-Channel](https://github.com/Rumia-Channel) - ONVIF support pull request ([#5](https://github.com/kmizu/embodied-claude/pull/5))
- [fruitriin](https://github.com/fruitriin) - Added day-of-week to interoception hook ([#14](https://github.com/kmizu/embodied-claude/pull/14))
- [sugyan](https://github.com/sugyan) - [claude-code-webui](https://github.com/sugyan/claude-code-webui) (used as control UI during outdoor walks)