# LiteLLM MCP Server
An MCP (Model Context Protocol) server that exposes LiteLLM proxy APIs as tools. This allows AI assistants to interact with LiteLLM's unified LLM API gateway.
## Features
- **86 tools** dynamically generated from LiteLLM's OpenAPI spec
- **Full API coverage**: chat completions, embeddings, image generation, audio, assistants, files, fine-tuning, RAG, and more
- **Easy configuration** via environment variables
- **npx-runnable** for quick setup
## Installation
### Using npx (recommended)
```bash
npx litellm-mcp
```
### Global installation
```bash
npm install -g litellm-mcp
litellm-mcp
```
### From source
```bash
git clone https://github.com/shin-bot-litellm/litellm-mcp.git
cd litellm-mcp
npm install
npm run build
npm start
```
## Configuration
Set these environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `LITELLM_API_BASE` | LiteLLM proxy base URL | `http://localhost:4000` |
| `LITELLM_API_KEY` | API key for authentication | (none) |
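For a quick local run, the variables can be exported in the shell before starting the server. This is a minimal sketch; `sk-your-key` is a placeholder, and it assumes a LiteLLM proxy is already listening on the default address:

```shell
# Point the MCP server at a running LiteLLM proxy.
export LITELLM_API_BASE="http://localhost:4000"
# Placeholder key; substitute your proxy's real API key.
export LITELLM_API_KEY="sk-your-key"
# Then start the server:
# npx litellm-mcp
```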
## Usage with Claude Desktop
Add to your Claude Desktop configuration (`~/.config/claude-desktop/config.json` on Linux, `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
```json
{
"mcpServers": {
"litellm": {
"command": "npx",
"args": ["litellm-mcp"],
"env": {
"LITELLM_API_BASE": "http://localhost:4000",
"LITELLM_API_KEY": "your-api-key"
}
}
}
}
```
## Available Tools
The MCP server exposes 86 tools including:
### Chat & Completions
- `chat_completion_chat_completions_post` - Chat completions (OpenAI-compatible)
- `completion_completions_post` - Text completions
- `responses_api_responses_post` - OpenAI Responses API
### Embeddings
- `embeddings_embeddings_post` - Generate embeddings
### Images
- `image_generation_images_generations_post` - Generate images
- `image_edit_api_images_edits_post` - Edit images
### Audio
- `audio_speech_audio_speech_post` - Text-to-speech
- `audio_transcriptions_audio_transcriptions_post` - Speech-to-text
### Assistants (OpenAI-compatible)
- `get_assistants_assistants_get` - List assistants
- `create_assistant_assistants_post` - Create assistant
- `create_threads_threads_post` - Create thread
- `add_messages_threads_thread_id_messages_post` - Add message to thread
- `run_thread_threads_thread_id_runs_post` - Run assistant
### Files
- `list_files_files_get` - List files
- `create_file_files_post` - Upload file
- `get_file_files_file_id_get` - Get file info
- `delete_file_files_file_id_delete` - Delete file
### Batches
- `list_batches_batches_get` - List batch jobs
- `create_batch_batches_post` - Create batch job
- `retrieve_batch_batches_batch_id_get` - Get batch status
### Fine-tuning
- `list_fine_tuning_jobs_fine_tuning_jobs_get` - List fine-tuning jobs
- `create_fine_tuning_job_fine_tuning_jobs_post` - Create fine-tuning job
### Models
- `model_list_models_get` - List available models
- `model_info_v1_model_info_get` - Get model info
- `model_group_info_model_group_info_get` - Get model group info
### RAG
- `rag_ingest_rag_ingest_post` - Ingest documents
- `rag_query_rag_query_post` - Query with RAG
### Search
- `search_search_post` - Web search
- `list_search_tools_search_tools_get` - List search tools
### Utilities
- `token_counter_utils_token_counter_post` - Count tokens
- `supported_openai_params_utils_supported_openai_params_get` - Get supported params
- `rerank_rerank_post` - Rerank results
- `ocr_ocr_post` - OCR extraction
Run `litellm-mcp --list-tools` to see all available tools.
## Example: Chat Completion
Once configured, you can ask Claude to use LiteLLM:
> "Use the litellm chat_completion tool to ask gpt-4o what the capital of France is"
The assistant will call the `chat_completion_chat_completions_post` tool with:
```json
{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}
```
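Under the hood, the tool arguments become the body of a `POST /chat/completions` request to the proxy. The sketch below illustrates that mapping with a hypothetical `buildChatRequest` helper (the name and shape are illustrative, not the server's actual internals):

```typescript
// Hypothetical helper: builds the HTTP request the tool would issue
// against the LiteLLM proxy's OpenAI-compatible endpoint.
function buildChatRequest(
  base: string,
  apiKey: string,
  model: string,
  prompt: string
) {
  return {
    url: `${base}/chat/completions`,
    method: "POST" as const,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    // The tool arguments are forwarded as the JSON request body.
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

const req = buildChatRequest(
  "http://localhost:4000",
  "sk-your-key",
  "gpt-4o",
  "What is the capital of France?"
);
console.log(req.url); // http://localhost:4000/chat/completions
```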
## Development
```bash
# Install dependencies
npm install
# Build TypeScript
npm run build
# Run in development mode
npm run dev
# List all tools
npm test
```
## How It Works
1. The server loads the bundled LiteLLM OpenAPI spec (`openapi.json`).
2. It parses every API endpoint and converts each one into an MCP tool.
3. When a tool is called, it makes the corresponding HTTP request to the LiteLLM proxy.
4. It returns the proxy's response as the tool output.
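Step 2 can be sketched as a walk over the spec's `paths` object, naming each tool after its `operationId` (which matches the FastAPI-style tool names listed above). This is a simplified illustration, not the server's actual implementation:

```typescript
// Simplified sketch: turn OpenAPI operations into MCP tool definitions.
interface ToolDef {
  name: string;        // taken from the operationId
  description: string;
  method: string;      // HTTP verb, uppercased
  path: string;        // endpoint path on the LiteLLM proxy
}

function specToTools(spec: any): ToolDef[] {
  const tools: ToolDef[] = [];
  for (const [path, ops] of Object.entries<any>(spec.paths ?? {})) {
    for (const [method, op] of Object.entries<any>(ops)) {
      tools.push({
        name: op.operationId, // e.g. chat_completion_chat_completions_post
        description: op.summary ?? "",
        method: method.toUpperCase(),
        path,
      });
    }
  }
  return tools;
}
```

Dispatching a tool call is then a matter of looking up the matching `ToolDef` and issuing an HTTP request with the tool arguments as the body.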
## Updating the OpenAPI Spec
To update to the latest LiteLLM API:
```bash
curl -o openapi.json https://litellm-api.up.railway.app/openapi.json
npm run build
```
## License
MIT
## Links
- [LiteLLM Documentation](https://docs.litellm.ai/)
- [Model Context Protocol](https://modelcontextprotocol.io/)
- [LiteLLM GitHub](https://github.com/BerriAI/litellm)