# Alibaba Cloud Container Service MCP Server (ack-mcp-server)
ack-mcp-server is the Alibaba Cloud Container Service MCP Server toolset.
It unifies ACK cluster and resource management, native Kubernetes operations, container observability, security auditing, diagnostic inspection, and other O&M capabilities into an AI-native, standardized toolset.
The capabilities of this toolset are integrated into the [Alibaba Cloud Container Service Intelligent Assistant](https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/use-container-ai-assistant-for-troubleshooting-and-intelligent-q-a). Through the [MCP (Model Context Protocol)](https://modelcontextprotocol.io/docs/getting-started/intro), the toolset can also be integrated with third-party AI agents ([kubectl-ai](https://github.com/GoogleCloudPlatform/kubectl-ai/blob/main/pkg/mcp/README.md#local-stdio-based-server-configuration), [Qwen Code](https://qwenlm.github.io/qwen-code-docs/zh/tools/mcp-server/#%E4%BD%BF%E7%94%A8-qwen-mcp-%E7%AE%A1%E7%90%86-mcp-%E6%9C%8D%E5%8A%A1%E5%99%A8), [Claude Code](https://docs.claude.com/zh-CN/docs/claude-code/mcp), [Cursor](https://cursor.com/cn/docs/context/mcp/directory), [Gemini CLI](https://github.com/google-gemini/gemini-cli/blob/main/docs/tools/mcp-server.md#configure-the-mcp-server-in-settingsjson), [VS Code](https://code.visualstudio.com/docs/copilot/chat/mcp-servers#_add-an-mcp-server), etc.) or called by automated systems.
It lets users complete complex container O&M tasks through natural-language interaction with AI assistants, and helps them build their own AIOps system for container scenarios.
* [1. Overview & Function Introduction](#-1-overview--function-introduction)
* [2. How to Use & Deploy](#-2-how-to-use--deploy)
* [3. How to Develop and Run Locally](#-3-how-to-develop-and-run-locally)
* [4. How to Participate in Community Contributions](#-4-how-to-participate-in-community-contributions)
* [5. Effects & Benchmark](#-5-effects--benchmark-under-continuous-construction)
* [6. Evolution Plan & Roadmap](#-6-evolution-plan--roadmap)
* [7. Common Issues](#7-common-issues)
## 🌟 1. Overview & Function Introduction
### 🎬 1.1 Demonstration Effect
https://github.com/user-attachments/assets/9e48cac3-0af1-424c-9f16-3862d047cc68
### 🎯 1.2 Core Functions
**Alibaba Cloud ACK Full Lifecycle Resource Management**
- Cluster Query (`list_clusters`)
- Node Resource Management, Node Pool Scaling (planned)
- Component Addon Management (planned)
- Cluster Creation and Deletion (planned)
- Cluster Upgrade (planned)
- Cluster Resource O&M Task Query (planned)
**Native Kubernetes Operations** (`ack_kubectl`)
- Execute `kubectl`-like operations (read and write permissions are configurable)
- Get logs, events, resource CRUD
- Supports all standard Kubernetes APIs
**AI-Native Container Scenario Observability**
- **Prometheus**: Queries metrics from the Alibaba Cloud Prometheus or self-managed Prometheus instance associated with an ACK cluster, with natural-language-to-PromQL translation (`query_prometheus` / `query_prometheus_metric_guidance`)
- **Cluster Control Plane Log Query**: Queries the control plane SLS logs of ACK clusters, including SLS SQL queries and natural-language-to-SLS-SQL translation (`query_controlplane_logs`)
- **Audit Logs**: Kubernetes operation audit tracking (`query_audit_log`)
- …… (more [container observability capabilities](https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/observability-best-practices) in progress)
**Alibaba Cloud ACK Diagnosis and Inspection**
- Cluster Resource Diagnosis (`diagnose_resource`)
- Cluster Health Inspection (`query_inspect_report`)
**Enterprise-Level Engineering Capabilities**
- 🏗️ Layered Architecture: Tool layer, service layer, and authentication layer are completely decoupled
- 🔐 Dynamic Credential Injection: Supports request-level AK injection or environment credentials
- 📊 Robust Error Handling: Structured error output and typed responses
- 📦 Modular Design: Each sub-service can run independently
### 🏆 1.3 Core Advantages
- **🤖 AI Native**: Standardized interface designed for AI agents
- **🔧 Unified Toolset**: One-stop container O&M capability integration
- **⚡ Knowledge Accumulation**: Built-in best practices for ACK, Kubernetes, and container observability
- **🛡️ Enterprise-Level**: Complete authentication, authorization, and logging mechanisms
- **📈 Extensible**: Plug-in architecture, easy to extend new functions
### 📈 1.4 Benchmark Effect Verification (Continuously Updating)
AI capabilities are evaluated on real-world scenarios, so results can be compared across AI agents and large models:
| Task Scenario | AI Agent | Large Model | Success Rate | Average Processing Time |
|------|------------|------|-------|--------|
| Pod OOM Repair | qwen_code | qwen3-coder-plus | ✅ 100% | 2.3min |
| Cluster Health Check | qwen_code | qwen3-coder-plus | ✅ 95% | 6.4min |
| Resource Anomaly Diagnosis | kubectl-ai | qwen3-32b | ✅ 90% | 4.1min |
| Historical Resource Analysis | qwen_code | qwen3-coder-plus | ✅ 85% | 3.8min |
See the latest Benchmark report in the [`benchmarks/results/`](benchmarks/results/) directory.
---
## 🚀 2. How to Use & Deploy
### 💻 2.1 Prepare Alibaba Cloud Authentication and Permissions
We recommend using a RAM user (sub-account) of the main Alibaba Cloud account as the credential for ack-mcp-server, and granting that RAM user only the permission policies below, following the principle of least privilege.
**Required RAM Permission Policy Set**
To add the required permissions to a RAM user, see [RAM Permission Policy](https://help.aliyun.com/zh/ram/user-guide/policy-overview).
ack-mcp-server currently requires the following read-only permissions:
- Container Service (`cs`): all read-only permissions
- Log Service (`log`): all read-only permissions
- Alibaba Cloud Prometheus (`arms`): instance read-only permissions
- …… Resource change permissions will be added later to support full lifecycle resource management
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"cs:Check*",
"cs:Describe*",
"cs:Get*",
"cs:List*",
"cs:Query*",
"cs:RunClusterCheck",
"cs:RunClusterInspect"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "arms:GetPrometheusInstance",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"log:Describe*",
"log:Get*",
"log:List*"
],
"Resource": "*"
}
]
}
```
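To check whether a given API action is covered by the read-only policy above, the wildcard patterns can be matched locally. The following is a minimal sketch, not how RAM itself evaluates policies: it only models the `*` wildcard in `Action`, ignores `Resource`, conditions, and explicit denies, and the sample action names in the demo are illustrative assumptions.

```python
from fnmatch import fnmatchcase

# Action patterns copied from the policy document above.
ALLOWED_ACTION_PATTERNS = [
    "cs:Check*", "cs:Describe*", "cs:Get*", "cs:List*", "cs:Query*",
    "cs:RunClusterCheck", "cs:RunClusterInspect",
    "arms:GetPrometheusInstance",
    "log:Describe*", "log:Get*", "log:List*",
]


def is_action_allowed(action: str) -> bool:
    """Return True if the action matches any allowed pattern.

    Simplification: matching is case-sensitive and only `*` is modeled;
    real RAM evaluation may differ and also considers Resource,
    Condition, and explicit Deny statements.
    """
    return any(fnmatchcase(action, pattern) for pattern in ALLOWED_ACTION_PATTERNS)


if __name__ == "__main__":
    # Illustrative action names; they are assumptions, not a list from the README.
    for action in ("cs:DescribeClusters", "cs:DeleteCluster", "log:GetLogs"):
        status = "allowed" if is_action_allowed(action) else "not covered"
        print(f"{action} -> {status}")
```

A check like this can be useful before enabling `--allow-write`, since write operations are not covered by the read-only policy.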
### 💻 2.2 (Optional) Create ACK Cluster
- An ACK cluster has been created in the Alibaba Cloud account.
- If the cluster network is reachable from ack-mcp-server, configure the corresponding Kubernetes cluster access credentials; see the Kubernetes cluster access policy section of [`DESIGN.md`](./DESIGN.md). In production, we recommend connecting ack-mcp-server to the cluster network and accessing the cluster over the internal network by setting `KUBECONFIG_MODE=ACK_PRIVATE`.
### 📍 2.3 Deploy and Run ack-mcp-server
#### 2.3.1 Deployment Method 1 - Deploy in k8s Cluster Using Helm
Deploy in a Kubernetes cluster:
```bash
# Clone the code repository
git clone https://github.com/aliyun/alibabacloud-ack-mcp-server
cd alibabacloud-ack-mcp-server
# Deploy using Helm
helm install \
--set accessKeyId=<your-access-key-id> \
--set accessKeySecret=<your-access-key-secret> \
--set transport=sse \
ack-mcp-server \
./deploy/helm \
-n kube-system
```
After deployment, expose the ack-mcp-server Service externally by configuring a load balancer for it, so that AI agents can connect to it.
**Parameter Description**
- `accessKeyId`: AccessKeyId of the Alibaba Cloud account
- `accessKeySecret`: AccessKeySecret of the Alibaba Cloud account
#### 2.3.2 Deployment Method 2 - 📦 Deploy ack-mcp-server Using Docker Image
```bash
# Pull the image
docker pull registry-cn-beijing.ack.aliyuncs.com/acs/ack-mcp-server:latest
# Run the container
docker run \
-d \
--name ack-mcp-server \
-e ACCESS_KEY_ID="your-access-key-id" \
-e ACCESS_KEY_SECRET="your-access-key-secret" \
-p 8000:8000 \
registry-cn-beijing.ack.aliyuncs.com/acs/ack-mcp-server:latest \
python -m main_server --transport sse --host 0.0.0.0 --port 8000 --allow-write
```
#### 2.3.3 Deployment Method 3 - 💻 Run from a Binary
Download a pre-compiled binary, or build one locally, then run it:
```bash
# Build the binary file (local build)
make build-binary
# Run
./dist/ack-mcp-server --help
```
#### 2.3.4 Deployment Method 4 - Integrate with Gemini CLI Extensions
```bash
# Gemini CLI
gemini extensions install https://github.com/aliyun/alibabacloud-ack-mcp-server --ref master --auto-update
# Verify:
gemini extensions list
```
Configure parameters through environment variables or the `<home>/.gemini/extensions/ack-mcp-server/.env` file, then run `gemini`.
### 2.4 Use ack-mcp-server through Agent
#### 2.4.1 [Qwen Code](https://github.com/QwenLM/qwen-code)
```bash
qwen mcp add --transport http --scope user ack-mcp-server <endpoint>
# Verify:
qwen mcp list
```
#### 2.4.2 [Qoder CLI](https://docs.qoder.com/cli/quick-start)
```bash
qodercli mcp add --transport http --scope user ack-mcp-server <endpoint>
# Verify:
qodercli mcp list
```
#### 2.4.3 [Gemini CLI](https://github.com/google-gemini/gemini-cli)
```bash
gemini mcp add --transport http --scope user ack-mcp-server <endpoint>
# Verify:
gemini mcp list
```
#### 2.4.4 [Claude Code](https://github.com/anthropics/claude-code)
```bash
claude mcp add --transport http --scope user ack-mcp-server <endpoint>
# Verify:
claude mcp list
```
#### 2.4.5 [Codex CLI](https://github.com/openai/codex)
```bash
codex mcp add ack-mcp-server --url <endpoint>
# Verify:
codex mcp list
```
## 🎯 3. How to Develop and Run Locally
### 💻 3.1 Environment Preparation
**Build Environment Requirements**
- Python 3.12+
- An Alibaba Cloud account with an AccessKey ID, an AccessKey Secret, and the required permission set
- An ACK cluster already created in the Alibaba Cloud account
- A kubeconfig that lets ack-mcp-server reach the ACK cluster from the local network; see the Kubernetes cluster access policy section of [`DESIGN.md`](./DESIGN.md).
- Note: In production, we recommend connecting to the cluster network and accessing the cluster over the internal network by setting `KUBECONFIG_MODE=ACK_PRIVATE`. For local testing with a public-network kubeconfig, [public access to the API server must be enabled on the corresponding ACK cluster](https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/control-public-access-to-the-api-server-of-a-cluster).
### 📋 3.2 Development Environment Setup
```bash
# Clone the project
git clone https://github.com/aliyun/alibabacloud-ack-mcp-server
cd alibabacloud-ack-mcp-server
# Install dependencies
uv sync
# Activate the virtual environment (Bash)
source .venv/bin/activate
# Configure the environment
cp .env.example .env
vim .env
# Run the development service
make run
```
**Install Dependencies**
Use `uv` (recommended):
```bash
uv sync
source .venv/bin/activate
```
Or use `pip`:
```bash
pip install -r requirements.txt
```
### ⚙️ 3.3 Configuration Settings
Create a `.env` file (refer to `.env.example`):
```env
# Alibaba Cloud Credentials and Region
ACCESS_KEY_ID=your-access-key-id
ACCESS_KEY_SECRET=your-access-key-secret
# Cache Configuration
CACHE_TTL=300
CACHE_MAX_SIZE=1000
# Log Configuration
FASTMCP_LOG_LEVEL=INFO
DEVELOPMENT=false
```
> ⚠️ **Note**: If `ACCESS_KEY_ID` / `ACCESS_KEY_SECRET` are not set, the functions that rely on cloud APIs are unavailable.
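Because missing credentials only surface later as unavailable functions, it can help to validate the `.env` file before starting the server. The following is a minimal sketch under stated assumptions: it parses plain `KEY=VALUE` lines only (no variable expansion or multi-line values), and the helper names `parse_env_text` and `missing_credentials` are hypothetical, not part of ack-mcp-server.

```python
from pathlib import Path

# Keys that the note above says are needed for cloud API functions.
REQUIRED_KEYS = ("ACCESS_KEY_ID", "ACCESS_KEY_SECRET")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    missing = missing_credentials(values)
    print("missing:", ", ".join(missing) if missing else "none")
```

Note that this check treats the placeholder values from `.env.example` as set; it only detects absent or empty keys.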
#### 3.4.1 Run Mode 1: Interactive Interface Based on [MCP Inspector](https://github.com/modelcontextprotocol/inspector) (Suitable for Local Debugging)
```bash
npx @modelcontextprotocol/inspector --config ./mcp.json
```
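The command above reads server definitions from `./mcp.json`. As a hedged illustration of what such a file might contain, the sketch below builds a config in the common `mcpServers` layout with a stdio entry that launches `python -m src.main_server`. The repository ships its own `mcp.json`, which may differ; the function name `build_inspector_config` and the exact entry are assumptions, and the sketch only prints the JSON rather than overwriting any file.

```python
import json


def build_inspector_config(access_key_id: str, access_key_secret: str) -> dict:
    """Build an `mcpServers` config with one stdio entry for ack-mcp-server."""
    return {
        "mcpServers": {
            "ack-mcp-server": {
                # Launch command taken from the stdio run instructions in this README.
                "command": "python",
                "args": ["-m", "src.main_server"],
                # Environment variable names taken from .env.example.
                "env": {
                    "ACCESS_KEY_ID": access_key_id,
                    "ACCESS_KEY_SECRET": access_key_secret,
                },
            }
        }
    }


if __name__ == "__main__":
    config = build_inspector_config("your-access-key-id", "your-access-key-secret")
    print(json.dumps(config, indent=2))
```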
#### 3.4.2 Run Mode 2: Run ack-mcp-server Locally with the Python Command
**Run ack-mcp-server locally in stdio mode (suitable for local development)**
```bash
make run
# Or
python -m src.main_server
```
**Run ack-mcp-server locally in Streamable HTTP mode (recommended for online system integration)**
```bash
make run-http
# Or
python -m src.main_server --transport http --host 0.0.0.0 --port 8000
```
**Run ack-mcp-server locally in SSE mode**
```bash
make run-sse
# Or
python -m src.main_server --transport sse --host 0.0.0.0 --port 8000
```
**Common Parameters**
| Parameter | Description | Default Value / Notes |
|-----|------------------|--------------------|
| `--access-key-id` | AccessKey ID | Alibaba Cloud account credential (AK) |
| `--access-key-secret` | AccessKey Secret | Alibaba Cloud account credential (SK) |
| `--allow-write` | Enable write operations | Not enabled by default |
| `--transport` | Transport mode | One of `stdio` / `sse` / `http` |
| `--host` | Bind host | `localhost` |
| `--port` | Port number | `8000` |
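The parameters in the table can be pictured as a small command-line interface. The following `argparse` sketch mirrors the table for illustration only and is not the project's actual parser: the `stdio` default for `--transport` is an assumption inferred from `python -m src.main_server` running in stdio mode, and the `None` defaults for the credentials are likewise assumed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the Common Parameters table."""
    parser = argparse.ArgumentParser(prog="ack-mcp-server")
    parser.add_argument("--access-key-id", default=None, help="AccessKey ID (AK)")
    parser.add_argument("--access-key-secret", default=None, help="AccessKey Secret (SK)")
    # Write operations stay disabled unless the flag is passed explicitly.
    parser.add_argument("--allow-write", action="store_true", help="Enable write operations")
    # Assumption: stdio is the default transport.
    parser.add_argument("--transport", choices=("stdio", "sse", "http"), default="stdio")
    parser.add_argument("--host", default="localhost", help="Bind host")
    parser.add_argument("--port", type=int, default=8000, help="Port number")
    return parser


if __name__ == "__main__":
    # Parse a fixed example command line instead of sys.argv.
    example = ["--transport", "http", "--host", "0.0.0.0", "--port", "8000"]
    print(build_parser().parse_args(example))
```

Modeling `--allow-write` as an opt-in flag matches the table's "not enabled by default" behavior, which keeps a freshly started server read-only.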
### 3.5 Unit Tests
```bash
# Run all unit tests
make test
```
## 🛠️ 4. How to Participate in Community Contributions
### 🏗️ 4.1 Engineering Architecture Design
**Technology Stack**: Python 3.12+ + FastMCP 2.12.2+ + Alibaba Cloud SDK + Kubernetes Client
See [`DESIGN.md`](DESIGN.md) for detailed architectural design.
### 👥 4.2 Project Maintenance Mechanism
#### 🤝 How to Contribute
1. **Issue Feedback**: Via [GitHub Issues](https://github.com/aliyun/alibabacloud-ack-mcp-server/issues)
2. **Feature Requests**: Discuss in the DingTalk group: 70080006301
3. **Code Contributions**: Fork → Feature Branch → Pull Request
4. **Documentation Improvements**: API documentation, tutorials
### 💬 Community Communication
- GitHub Issues: technical discussions, Q&A
- DingTalk Group: daily communication, Q&A support, community co-building. Search for DingTalk group number 70080006301.
---
## 📊 5. Effects & Benchmark (Under Continuous Construction)
### 🔍 Test Scenarios
| Scenario | Description | Modules Involved |
|------|------|----------|
| Pod OOM Fix | Diagnosis and repair of out-of-memory (OOM) issues | kubectl, Diagnosis |
| Cluster Health Check | Comprehensive cluster status inspection | Diagnosis, Inspection |
| Resource Anomaly Diagnosis | Root cause analysis of abnormal resources | kubectl, Diagnosis |
| Historical Resource Analysis | Resource usage trend analysis | prometheus, sls |
### 📊 Effect Data
Based on the latest Benchmark results:
- Success Rate: 92%
- Average Processing Time: 4.2 minutes
- Supported AI agents: qwen_code, kubectl-ai
- Supported LLMs: qwen3-coder-plus, qwen3-32b
### How to Run Benchmark
See [`Benchmark README.md`](./benchmarks/README.md) for details.
```bash
# Run Benchmark
cd benchmarks
./run_benchmark.sh --openai-api-key your-key --agent qwen_code --model qwen3-coder-plus
```
---
## 🗺️ 6. Evolution Plan & Roadmap
### 🎯 Recent Plans
- Support full lifecycle resource O&M for ACK clusters, nodes, and functional components (addons)
- Use benchmark results as a baseline to continuously improve the success rate of core O&M scenarios across general third-party agents and LLMs
- Keep adding core O&M scenario cases to the benchmark until it covers most ACK O&M scenarios. If you have specific needs, please feel free to open an issue
- Performance optimization and cache improvements
### 🚀 Medium and Long-Term Goals
- Cover the [Five Pillars of Well-Architected Framework](https://help.aliyun.com/product/2362200.html) in container scenarios: security, stability, cost, efficiency, and performance (high reliability, etc.), and provide a better AIOps experience for multi-step, complex container O&M scenarios.
  - Insight into and governance of cluster costs
  - Best practices for cluster elastic scaling
  - Discovery and governance of cluster security vulnerabilities
  - ……
- Enterprise-level features (RBAC, security scanning)
- AI automated O&M capabilities
## 7. Common Issues
- **AK not configured**: Check that the `ACCESS_KEY_ID` / `ACCESS_KEY_SECRET` environment variables are set.
- **ACK cluster network is not accessible**: When ack-mcp-server uses the public network mode (`KUBECONFIG_MODE=ACK_PUBLIC`) to obtain the cluster kubeconfig, public network access must be enabled on the ACK cluster. In production, we recommend connecting to the cluster network and using the private network mode (`ACK_PRIVATE`) instead, in line with production security best practices.
## 8. Security
- Please send an email to **kubernetes-security@service.aliyun.com** to report security vulnerabilities. See the [SECURITY.md](./SECURITY.md) file for details.
## License
Apache-2.0. See [`LICENSE`](LICENSE).