# Databricks MCP Server
A production-ready **Model Context Protocol (MCP)** server that exposes Databricks REST capabilities to MCP-compatible agents and tooling. Version **0.4.4** introduces structured responses, resource caching, retry-aware networking, and end-to-end resilience improvements.
---
## Table of Contents
1. [Key Capabilities](#key-capabilities)
2. [Architecture Highlights](#architecture-highlights)
3. [Installation](#installation)
4. [Configuration](#configuration)
5. [Running the Server](#running-the-server)
6. [Integrating with MCP Clients](#integrating-with-mcp-clients)
7. [Working with Tool Responses](#working-with-tool-responses)
8. [Available Tools](#available-tools)
9. [Development Workflow](#development-workflow)
10. [Testing](#testing)
11. [Publishing Builds](#publishing-builds)
12. [Support & Contact](#support--contact)
13. [License](#license)
---
## Key Capabilities
- **Structured MCP Responses** - Each tool returns a `CallToolResult` with a human-readable summary in `content` and machine-readable payloads in `structuredContent` that conform to the tool’s `outputSchema`.
- **Resource Caching** - Large notebook/workspace exports are cached once and returned as `resource_link` content blocks with URIs such as `resource://databricks/exports/{id}` (also reflected in metadata for convenience).
- **Progress & Metrics** - Long-running actions stream MCP progress notifications and track per-tool success/error/timeout/cancel metrics.
- **Resilient Networking** - A shared HTTP client injects request IDs, enforces timeouts, and retries transient Databricks failures (HTTP 408, 429, and 5xx) with exponential backoff.
- **Async Runtime** - Built on `mcp.server.FastMCP` with centralized JSON logging and concurrency guards for predictable stdio behaviour.
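
The retry behaviour described under Resilient Networking can be pictured with a small standard-library sketch. This is not the server's actual implementation: `send`, `call_with_retries`, and the doubling schedule are assumptions made for illustration, with defaults borrowed from the `API_MAX_RETRIES` and `API_RETRY_BACKOFF_SECONDS` example values.

```python
import time

RETRYABLE_STATUS = {408, 429}  # 5xx is handled by the range check below


def is_retryable(status: int) -> bool:
    """Return True for the status codes listed above: 408, 429, and 5xx."""
    return status in RETRYABLE_STATUS or 500 <= status <= 599


def backoff_delays(max_retries: int, base_seconds: float) -> list:
    """Delay before each retry, doubling every time: base, 2*base, 4*base, ..."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(send, max_retries=3, base_seconds=0.5, sleep=time.sleep):
    """Call send() until the status is not retryable or the retries run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    status, body = send()
    for delay in backoff_delays(max_retries, base_seconds):
        if not is_retryable(status):
            break
        sleep(delay)  # wait before the next attempt
        status, body = send()
    return status, body
```

With the example settings (3 retries, 0.5 s base), a request that keeps failing would wait 0.5 s, 1 s, and 2 s before giving up. Real clients often add random jitter to these delays; that is left out here to keep the schedule easy to read.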
## Architecture Highlights
- `databricks_mcp/server/databricks_mcp_server.py` - FastMCP server with tool registration, progress handling, metrics, and resource caching.
- `databricks_mcp/core/utils.py` - HTTP utilities with correlation IDs, retries, and error mapping to `DatabricksAPIError`.
- `databricks_mcp/core/logging_utils.py` - JSON logging configuration for stderr/file outputs.
- `databricks_mcp/core/models.py` - Pydantic models (e.g., `ClusterConfig`) used by tool schemas.
- Tests under `tests/` mock Databricks APIs to validate orchestration, structured responses, and schema metadata without shell scripts.
For an in-depth tour of data flow and design decisions, see [ARCHITECTURE.md](ARCHITECTURE.md).
## Installation
### Prerequisites
- Python 3.10+
- [`uv`](https://github.com/astral-sh/uv) for dependency management and publishing
### Quick Install (recommended)
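Register the server with Cursor using the deeplink below. Its embedded configuration runs `uvx databricks-mcp-server` with `${DATABRICKS_HOST}`, `${DATABRICKS_TOKEN}`, and `${DATABRICKS_WAREHOUSE_ID}` placeholders for the credentials, so no separate install step is needed.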
```text
cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ==
```
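
To check what a deeplink will register before opening it, the `config` query parameter can be base64-decoded. The sketch below uses only the standard library; `decode_deeplink_config` is a hypothetical helper name, not something shipped with this package.

```python
import base64
import json
from urllib.parse import parse_qs, urlparse

DEEPLINK = (
    "cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config="
    "eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ=="
)


def decode_deeplink_config(deeplink: str) -> dict:
    """Return the JSON configuration embedded in an MCP install deeplink."""
    query = parse_qs(urlparse(deeplink).query)
    payload = query["config"][0]
    payload += "=" * (-len(payload) % 4)  # restore padding if it was stripped
    return json.loads(base64.b64decode(payload))


config = decode_deeplink_config(DEEPLINK)
print(config["command"], config["args"])  # the command Cursor will launch
print(sorted(config["env"]))              # environment variables it expects
```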
### Manual Installation
```bash
# Clone and enter the repository
git clone https://github.com/markov-kernel/databricks-mcp.git
cd databricks-mcp
# Create an isolated environment (optional but recommended)
uv venv
source .venv/bin/activate # Linux/Mac
# .\.venv\Scripts\activate # Windows PowerShell
# Install package and development dependencies
uv pip install -e .
uv pip install -e ".[dev]"
```
## Configuration
Set the following environment variables (or populate `.env` from `.env.example`).
```bash
export DATABRICKS_HOST="https://your-workspace.databricks.com"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX"
export DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" # optional default
export TOOL_TIMEOUT_SECONDS=300
export MAX_CONCURRENT_REQUESTS=8
export HTTP_TIMEOUT_SECONDS=60
export API_MAX_RETRIES=3
export API_RETRY_BACKOFF_SECONDS=0.5
```
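
As an illustration of how these variables might be read and validated in client-side tooling, here is a minimal sketch. `Settings` and `load_settings` are hypothetical names. It assumes that only `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are mandatory, and its fallback values simply mirror the example above rather than confirmed server defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; not the server's own configuration class."""
    host: str
    token: str
    warehouse_id: Optional[str]
    tool_timeout_seconds: float
    max_concurrent_requests: int
    http_timeout_seconds: float
    api_max_retries: int
    api_retry_backoff_seconds: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables."""
    # Assumption: host and token are mandatory; everything else is optional.
    missing = [name for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN") if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return Settings(
        host=env["DATABRICKS_HOST"].rstrip("/"),
        token=env["DATABRICKS_TOKEN"],
        warehouse_id=env.get("DATABRICKS_WAREHOUSE_ID") or None,
        # Fallbacks mirror the example values above, not confirmed defaults.
        tool_timeout_seconds=float(env.get("TOOL_TIMEOUT_SECONDS", "300")),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "8")),
        http_timeout_seconds=float(env.get("HTTP_TIMEOUT_SECONDS", "60")),
        api_max_retries=int(env.get("API_MAX_RETRIES", "3")),
        api_retry_backoff_seconds=float(env.get("API_RETRY_BACKOFF_SECONDS", "0.5")),
    )
```

Failing fast on a missing host or token gives a clearer error than a rejected API call later on.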
## Running the Server
```bash
uvx databricks-mcp-server@latest
```
> Tip: pass `--refresh` to `uvx` (e.g., `uvx --refresh databricks-mcp-server@latest`) to force `uv` to re-resolve the latest PyPI release after publishing. `uvx` options go before the package name; anything after it is passed to the server.

Logs are emitted as JSON lines to stderr and persisted to `databricks_mcp.log` in the working directory.
To adjust logging:
```bash
uvx databricks-mcp-server@latest -- --log-level DEBUG
```
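
Because the log output is JSON lines, it can be filtered with a few lines of Python. The sketch below is an assumption-heavy illustration: `iter_log_records` is a hypothetical helper, and the `level` key is a guess at the record layout, so adjust it to whatever fields the log actually contains.

```python
import json


def iter_log_records(lines, level=None):
    """Yield parsed JSON log records, optionally keeping only one level.

    Assumption: each record is a JSON object with a `level` key.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON line
        if not isinstance(record, dict):
            continue
        if level is None or str(record.get("level", "")).upper() == level.upper():
            yield record


# Usage, assuming the log file exists in the working directory:
# with open("databricks_mcp.log", encoding="utf-8") as handle:
#     for record in iter_log_records(handle, level="ERROR"):
#         print(record)
```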
## Integrating with MCP Clients
### Codex CLI (STDIO)
Register the server and inject credentials via the CLI:
```bash
codex mcp add databricks \
  --env DATABRICKS_HOST="https://your-workspace.databricks.com" \
  --env DATABRICKS_TOKEN="dapi_XXXXXXXXXXXXXXXX" \
  --env DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
  -- uvx databricks-mcp-server@latest
# Right after a publish, use `uvx --refresh databricks-mcp-server@latest` to bypass the uv cache
```
Or edit `~/.codex/config.toml`:
```toml
[mcp_servers.databricks]
command = "uvx"
args = ["databricks-mcp-server@latest"]
startup_timeout_sec = 15
tool_timeout_sec = 300

# env as a sub-table: multi-line inline tables are not valid TOML 1.0
[mcp_servers.databricks.env]
DATABRICKS_HOST = "https://your-workspace.databricks.com"
DATABRICKS_TOKEN = "dapi_XXXXXXXXXXXXXXXX"
DATABRICKS_WAREHOUSE_ID = "sql_warehouse_12345"
```
> Planning an HTTP deployment? Codex also supports `url = "https://…"` plus
> `bearer_token_env_var = "DATABRICKS_TOKEN"` or `codex mcp login` (with
> `experimental_use_rmcp_client = true`).
### Cursor
```jsonc
{
  "mcpServers": {
    "databricks-mcp-local": {
      "command": "uvx",
      "args": ["databricks-mcp-server@latest"],
      "env": {
        "DATABRICKS_HOST": "https://your-workspace.databricks.com",
        "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXX",
        "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345",
        "RUNNING_VIA_CURSOR_MCP": "true"
      }
    }
  }
}
```
Restart Cursor after saving and invoke tools as `databricks-mcp-local:<tool>`.
### Claude CLI
```bash
claude mcp add databricks-mcp-local -s user \
  -e DATABRICKS_HOST="https://your-workspace.databricks.com" \
  -e DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX" \
  -e DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
  -- uvx databricks-mcp-server@latest
```
## Working with Tool Responses
`structuredContent` carries machine-readable payloads. Large artifacts are returned as `resource_link` content blocks using URIs like `resource://databricks/exports/{id}` and can be fetched via the MCP resources API.
```python
result = await session.call_tool("list_clusters", {})
# With the MCP Python SDK, content blocks are objects, so match on their `type` attribute.
summary = next((block.text for block in result.content if getattr(block, "type", "") == "text"), "")
clusters = (result.structuredContent or {}).get("clusters", [])
resource_links = [block for block in result.content if getattr(block, "type", "") == "resource_link"]
```
Progress notifications follow MCP’s progress token mechanism; Codex surfaces these messages in the UI while a tool runs.
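
The unpacking pattern shown earlier in this section can be wrapped in a small helper so every call site handles summaries, payloads, and resource links the same way. This is a sketch, not part of the package: `split_result` and `_field` are hypothetical names, and the helper accepts both attribute-style objects and plain dicts because client libraries differ in how they represent content blocks.

```python
from types import SimpleNamespace


def _field(obj, name, default=None):
    """Read a field from either a dict or an attribute-style object."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def split_result(result):
    """Return (summary, payload, resource_uris) for a tool result."""
    content = _field(result, "content", []) or []
    summary = next(
        (_field(block, "text", "") for block in content if _field(block, "type") == "text"),
        "",
    )
    if _field(result, "isError", False):
        raise RuntimeError(summary or "tool call failed")
    payload = _field(result, "structuredContent") or {}
    uris = [str(_field(block, "uri")) for block in content if _field(block, "type") == "resource_link"]
    return summary, payload, uris


# Stand-in result for demonstration; a real one comes from session.call_tool(...).
demo = SimpleNamespace(
    isError=False,
    content=[
        SimpleNamespace(type="text", text="2 clusters"),
        SimpleNamespace(type="resource_link", uri="resource://databricks/exports/abc"),
    ],
    structuredContent={"clusters": [{"cluster_id": "a"}, {"cluster_id": "b"}]},
)
summary, payload, uris = split_result(demo)
```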
### Example - SQL Query
```python
result = await session.call_tool("execute_sql", {"statement": "SELECT * FROM samples LIMIT 10"})
print(result.content[0].text)
rows = (result.structuredContent or {}).get("result", [])
```
### Example - Workspace File Export
```python
result = await session.call_tool("get_workspace_file_content", {
    "path": "/Users/user@domain.com/report.ipynb",
    "format": "SOURCE"
})
# Large exports come back as a resource_link block; pick the first one, if any.
resource_link = next((block for block in result.content if getattr(block, "type", "") == "resource_link"), None)
if resource_link:
    contents = await session.read_resource(resource_link.uri)
```
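
Once a resource has been read, its contents still need to be unpacked. In the MCP resources API each returned item carries either `text` or a base64-encoded `blob`; the sketch below joins both kinds into one string. `resource_to_text` is a hypothetical helper, and whether this server returns text or blob items for a given export is an assumption to verify against real output.

```python
import base64
from types import SimpleNamespace


def resource_to_text(read_result, encoding="utf-8"):
    """Concatenate the text of every item in a read_resource result."""
    parts = []
    for item in read_result.contents:
        text = getattr(item, "text", None)
        if text is not None:
            parts.append(text)
            continue
        blob = getattr(item, "blob", None)
        if blob is not None:
            # Blob items are base64-encoded bytes; decode them to text.
            parts.append(base64.b64decode(blob).decode(encoding))
    return "".join(parts)


# Stand-in result for demonstration; a real one comes from session.read_resource(...).
demo_read = SimpleNamespace(contents=[
    SimpleNamespace(text="print(1)\n"),
    SimpleNamespace(blob=base64.b64encode(b"print(2)\n").decode("ascii")),
])
notebook_source = resource_to_text(demo_read)
```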
## Available Tools
| Category | Tool | Description |
| --- | --- | --- |
| Clusters | `list_clusters`, `create_cluster`, `terminate_cluster`, `get_cluster`, `start_cluster`, `resize_cluster`, `restart_cluster` | Manage interactive clusters |
| Jobs | `list_jobs`, `create_job`, `delete_job`, `run_job`, `run_notebook`, `sync_repo_and_run_notebook`, `get_run_status`, `list_job_runs`, `cancel_run` | Manage scheduled and ad-hoc jobs |
| Workspace | `list_notebooks`, `export_notebook`, `import_notebook`, `delete_workspace_object`, `get_workspace_file_content`, `get_workspace_file_info` | Inspect and manage workspace assets |
| DBFS | `list_files`, `dbfs_put`, `dbfs_delete` | Explore DBFS and manage files |
| SQL | `execute_sql` | Submit SQL statements with optional `warehouse_id`, `catalog`, `schema_name` |
| Libraries | `install_library`, `uninstall_library`, `list_cluster_libraries` | Manage cluster libraries |
| Repos | `create_repo`, `update_repo`, `list_repos`, `pull_repo` | Manage Databricks repos |
| Unity Catalog | `list_catalogs`, `create_catalog`, `list_schemas`, `create_schema`, `list_tables`, `create_table`, `get_table_lineage` | Unity Catalog operations |
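
Tools with optional parameters, such as `execute_sql`, are easier to call through a small argument builder that leaves out anything unset. The sketch below is illustrative only: `build_execute_sql_args` is a hypothetical helper, and the parameter names are taken from the table and examples in this README rather than from the tool's published schema.

```python
def build_execute_sql_args(statement, warehouse_id=None, catalog=None, schema_name=None):
    """Build the arguments dict for execute_sql, omitting unset options."""
    if not statement or not statement.strip():
        raise ValueError("statement must not be empty")
    optional = {
        "warehouse_id": warehouse_id,
        "catalog": catalog,
        "schema_name": schema_name,
    }
    args = {"statement": statement}
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


# Usage inside an async client session:
# result = await session.call_tool(
#     "execute_sql",
#     build_execute_sql_args("SELECT 1", catalog="main"),
# )
```

Omitting unset keys, rather than sending `None`, leaves the choice of default (for example the warehouse from `DATABRICKS_WAREHOUSE_ID`) to the server.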
## Development Workflow
```bash
# Format, lint, and test
uv run black databricks_mcp tests
uv run pylint databricks_mcp tests
uv run pytest
# Build distributions and publish them to PyPI
uv build
uv publish --token "$PYPI_TOKEN"
```
## Testing
```bash
uv run pytest
```
The pytest suites mock the Databricks APIs, so structured outputs are deterministic; the suites also include transcript tests.
## Publishing Builds
Ensure `PYPI_TOKEN` is available (via `.env` or environment) before publishing:
```bash
uv build
uv publish --token "$PYPI_TOKEN"
```
## Support & Contact
- Maintainer: Olivier Debeuf De Rijcker (olivier@markov.bot)
- Issues: [GitHub Issues](https://github.com/markov-kernel/databricks-mcp/issues)
- Architecture deep dive: [ARCHITECTURE.md](ARCHITECTURE.md)
## License
Released under the MIT License. See [LICENSE](LICENSE).