# MCP Demo - Intelligent Video Processing AI-Agent Based on NVIDIA NIM
<div align="center">




An intelligent video processing AI-Agent based on MCP (Model Context Protocol), integrating NVIDIA NIM, FFmpeg, and web search capabilities to provide a natural language video editing experience.
[🚀 Quick Start](#-quick-start) • [📖 User Guide](#-user-guide) • [🛠️ API Documentation](#️-api-documentation)
</div>
## ✨ Project Features
### 🎯 Core Highlights
- **🤖 Natural Language Interaction**: Describe what you want in Chinese, and the AI selects and runs the appropriate tools automatically
- **🎬 Professional Video Processing**: A complete video editing toolchain built on FFmpeg
- **🌐 Modern Web Interface**: Responsive design with drag-and-drop uploads and real-time previews
- **⚡ Streamed Responses**: Real-time display of processing progress and the AI's intermediate reasoning
### 🛠️ Supported Video Operations
| Feature | Description | Example Command |
|---------|-------------|-----------------|
| 📹 **Video Information** | Get details such as duration, resolution, and codec | "Get details of video.mp4" |
| ✂️ **Smart Cutting** | Cut a clip by start time and duration | "Cut 1 minute starting from the 30th second" |
| 🔗 **Seamless Merging** | Concatenate multiple video files into one | "Merge these three videos into one" |
| 📐 **Resolution Adjustment** | Scale a video to a new resolution | "Adjust the video to 1080p" |
| 🎭 **Picture-in-Picture Effect** | Overlay one video on another, e.g. as a small inset window | "Add a small window in the top right corner of the main video" |
| 🎵 **Audio Extraction** | Extract the audio track from a video | "Extract background music from the video" |
| 🖼️ **Frame Extraction** | Extract still images at a given frame rate | "Extract one image per second" |
| ▶️ **Preview Playback** | Built-in video player for preview | "Play the processed video" |
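To make the table concrete, here is a rough sketch of the kind of FFmpeg invocation several of these operations boil down to. The helper names are hypothetical and not part of this repository; the flags are standard FFmpeg options, but the ffmpeg-mcp server may assemble its commands differently (for example, re-encoding where this sketch uses stream copy).

```python
import shlex

# Hypothetical helpers (not part of ffmpeg-mcp): each returns an argv list
# that could be passed to subprocess.run() for one operation in the table.


def clip_cmd(src: str, start: float, duration: float, out: str) -> list[str]:
    # -ss before -i seeks in the input; -t caps the output length.
    # -c copy skips re-encoding, so the cut may begin at a keyframe slightly before `start`.
    return ["ffmpeg", "-y", "-ss", str(start), "-i", src,
            "-t", str(duration), "-c", "copy", out]


def scale_cmd(src: str, width: int, height: int, out: str) -> list[str]:
    # The scale filter re-encodes the video; -1 for one side keeps the aspect ratio.
    return ["ffmpeg", "-y", "-i", src, "-vf", f"scale={width}:{height}", out]


def extract_audio_cmd(src: str, out: str) -> list[str]:
    # -vn drops the video stream; FFmpeg picks a default audio encoder
    # for the output extension (e.g. libmp3lame for .mp3, if available).
    return ["ffmpeg", "-y", "-i", src, "-vn", out]


def frames_cmd(src: str, fps: float, pattern: str) -> list[str]:
    # fps=1 keeps one frame per second; a pattern like "frames/%04d.png"
    # numbers the images (the target directory must already exist).
    return ["ffmpeg", "-y", "-i", src, "-vf", f"fps={fps}", pattern]


def pip_cmd(main: str, small: str, out: str, margin: int = 10) -> list[str]:
    # Shrink the second input to a quarter of its width (even height),
    # then place it `margin` pixels from the top-right corner of the first.
    graph = f"[1:v]scale=iw/4:-2[pip];[0:v][pip]overlay=W-w-{margin}:{margin}"
    return ["ffmpeg", "-y", "-i", main, "-i", small, "-filter_complex", graph, out]


def concat_cmd(list_file: str, out: str) -> list[str]:
    # Concat demuxer: list_file holds lines like  file 'part1.mp4'.
    # Stream copy only works if all parts share codecs and parameters.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", out]


if __name__ == "__main__":
    print(shlex.join(clip_cmd("uploads/video.mp4", 30, 60, "outputs/clip.mp4")))
    # → ffmpeg -y -ss 30 -i uploads/video.mp4 -t 60 -c copy outputs/clip.mp4
```

Returning argument lists rather than shell strings avoids quoting problems with file names that contain spaces when the command is run through `subprocess.run`.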
## 📁 Project Structure
```text
mcp_demo/
├── 🌐 Web Frontend Layer
│   ├── static/
│   │   ├── index.html           # Main interface - Modern responsive design
│   │   ├── demo_separated.html  # AI conversation demonstration page
│   │   ├── test_stream.html     # Streaming response test page
│   │   ├── style.css            # Style file - CSS Grid + Flexbox
│   │   └── script.js            # Frontend logic - Native ES6+
│   └── app.py                   # FastAPI Web server
│
├── 🤖 AI Processing Layer
│   ├── ffmpeg_mcp_demo.py       # MCP client core
│   ├── ffmpeg_mcp_config.py     # Configuration management
│   └── demo_web.py              # Web demonstration script
│
├── 🎬 Video Processing Layer (Submodule)
│   └── ffmpeg-mcp/              # FFmpeg MCP server
│       └── src/ffmpeg_mcp/
│           ├── server.py        # MCP protocol server
│           ├── cut_video.py     # Core algorithm for video processing
│           ├── ffmpeg.py        # FFmpeg command encapsulation
│           ├── typedef.py       # Type definitions and data structures
│           └── utils.py         # Utility function library
│
├── 📁 Data Storage Layer
│   ├── uploads/                 # User uploaded files
│   └── outputs/                 # Processed results output
│
└── ⚙️ Configuration Files
    ├── pyproject.toml           # Project dependencies and configuration
    ├── uv.lock                  # Dependency version lock
    ├── .gitmodules              # Git submodule configuration
    └── env.example              # Environment variable template
```
## 🚀 Quick Start
### 📋 Environment Requirements
- **Python**: 3.12+ (recommended 3.12.7)
- **Package Manager**: [uv](https://docs.astral.sh/uv/) (modern Python package management)
- **System Tools**: Git, FFmpeg
- **API Key**: NVIDIA API Key
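Before installing, a quick preflight script can confirm these prerequisites. This is a hypothetical helper, not part of the repository; it only checks the Python version and that `git`, `ffmpeg`, and `uv` are on `PATH`, and it does not validate the API key.

```python
import shutil
import sys
from typing import Callable, Optional

# Command-line tools the Quick Start relies on.
REQUIRED_TOOLS = ("git", "ffmpeg", "uv")


def preflight(
    which: Callable[[str], Optional[str]] = shutil.which,
    version: tuple[int, int] = (sys.version_info.major, sys.version_info.minor),
) -> list[str]:
    """Return a list of problems; an empty list means the basics are in place.

    `which` and `version` are injectable so the check can be tested
    without touching the real environment.
    """
    problems: list[str] = []
    if version < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("Environment looks OK")
```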
### 🔧 Installation Steps
#### 1️⃣ Clone the Project
```bash
# Clone the main project
git clone https://github.com/JackyHua23/mcp_demo.git
cd mcp_demo
# Initialize submodules
git submodule update --init --recursive
```
#### 2️⃣ Install Dependencies
```bash
# Install main project dependencies using uv
uv sync
# Install FFmpeg MCP Submodule Dependencies
cd ffmpeg-mcp
uv sync
cd ..
```
#### 3️⃣ Configure Environment Variables
```bash
# Copy Environment Variable Template
cp env.example .env
# Edit Configuration File
nano .env
```
**Environment Variable Configuration:**
```bash
# NVIDIA API Key (Required) - Get it at: https://build.nvidia.com/
NVIDIA_API_KEY="your_nvidia_api_key_here"
```
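The application needs `NVIDIA_API_KEY` at runtime. How `ffmpeg_mcp_config.py` actually reads `.env` is not shown in this README (python-dotenv would be a common choice, but that is an assumption). The sketch below is a dependency-free illustration of the same idea, plus a check that the placeholder value shown above has been replaced; all function names are hypothetical.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments.

    Minimal stand-in for python-dotenv: no variable expansion, no
    multi-line values, no `export` prefix, no inline comments.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "abc" -> abc
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Read `path` and copy its values into os.environ.

    Variables already set in the process environment win unless override=True.
    """
    values = parse_env(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values


def api_key_configured() -> bool:
    """True if NVIDIA_API_KEY is set to something other than the template placeholder."""
    key = os.environ.get("NVIDIA_API_KEY", "")
    return bool(key) and key != "your_nvidia_api_key_here"
```

Calling `load_env_file()` from the project root and then `api_key_configured()` gives an early, readable failure instead of an authentication error on the first request.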
#### 4️⃣ Start the Application
```bash
# Method 1: Start with the demo script (Recommended)
uv run python demo_web.py
# Method 2: Start FastAPI Application Directly
uv run python app.py
# Method 3: Start with uvicorn (Development Mode)
uv run uvicorn app:app --host 0.0.0.0 --port 8000 --reload
```
🎉 **Access the application**: http://localhost:8000
## 💻 User Guide
### 🌐 Web Interface Operations
#### 📤 File Upload
1. **Drag and Drop Upload**: Drag the video file to the upload area on the left
2. **Click to Upload**: Click the upload button to select a file
3. **Supported Formats**: MP4, AVI, MOV, MKV, WMV, FLV, WebM
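If you script uploads, a client-side pre-check against the formats listed above can save a round trip. This is a hypothetical helper that looks only at the file extension; whether `app.py` enforces the same list, or inspects the file contents, is an assumption not confirmed here.

```python
from pathlib import Path

# Extensions matching the formats listed above (assumed to mirror the server's list).
SUPPORTED_VIDEO_EXTENSIONS = frozenset(
    {".mp4", ".avi", ".mov", ".mkv", ".wmv", ".flv", ".webm"}
)


def is_supported_video(filename: str) -> bool:
    """Case-insensitive check of the file extension only, not the actual container."""
    return Path(filename).suffix.lower() in SUPPORTED_VIDEO_EXTENSIONS
```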
#### 💬 Intelligent Dialogue
Enter natural language commands in the chat area on the right:
```text
✅ Supported command examples:
• "Get detailed information about the current video"
• "Cut 1 minute of content starting from the 30th second"
• "Adjust the video resolution to 1920x1080"
• "Extract audio from the video and save it in MP3 format"
• "Add a watermark effect to the top right corner of the video"
```
#### ⚡ Quick Actions
Use preset buttons to quickly perform common operations:
- 🔍 **Get Info** - View detailed parameters of the video
- ✂️ **Smart Cut** - Quickly trim video clips
- 🎵 **Extract Audio** - Export audio files
- 📐 **Adjust Size** - Modify video resolution
### 🖥️ Command Line Usage
#### Basic Example
```python
import asyncio

from ffmpeg_mcp_demo import FFmpegMCPClient


async def main():
    client = FFmpegMCPClient()
    # Send a natural-language request; the agent chooses which FFmpeg tools to call
    response = await client.process_video_request(
        "Cut uploads/video.mp4 starting from the 10th second for 30 seconds"
    )
    print(response)


asyncio.run(main())
```
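For the model to pick tools from a natural-language request, an MCP client driving an OpenAI-compatible endpoint such as NVIDIA NIM typically has to advertise the MCP server's tools in the model's function-calling format. We have not checked that `FFmpegMCPClient` does exactly this; the sketch below only shows the shape of that mapping, and the `get_video_info` schema in it is a hypothetical example.

```python
from typing import Any


def mcp_tool_to_openai(tool: dict[str, Any]) -> dict[str, Any]:
    """Map one MCP tool definition (name, description, inputSchema as returned
    by `tools/list`) to an OpenAI-style function-tool entry for chat completions."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP describes tool arguments with a JSON Schema object, which is
            # also what the OpenAI-style `parameters` field expects.
            "parameters": tool.get("inputSchema") or {"type": "object", "properties": {}},
        },
    }


# Hypothetical schema for get_video_info; the real server's schema may differ.
EXAMPLE_TOOL = {
    "name": "get_video_info",
    "description": "Get detailed information about the video",
    "inputSchema": {
        "type": "object",
        "properties": {"video_path": {"type": "string"}},
        "required": ["video_path"],
    },
}

openai_tools = [mcp_tool_to_openai(EXAMPLE_TOOL)]
```

A list like `openai_tools` would go in the `tools` field of a chat-completions request to `https://integrate.api.nvidia.com/v1`; whether a particular NIM model supports tool calling is worth confirming in NVIDIA's model documentation.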
#### Advanced Configuration
```python
from ffmpeg_mcp_config import FFmpegMCPConfig
from ffmpeg_mcp_demo import FFmpegMCPClient

# Custom configuration
config = FFmpegMCPConfig(
    api_key="your_nvidia_api_key",
    model="nvidia/llama-3.1-nemotron-ultra-253b-v1",
    base_url="https://integrate.api.nvidia.com/v1",
)
client = FFmpegMCPClient(
    api_key=config.api_key,
    model=config.model,
    base_url=config.base_url,
)
```
## 🛠️ API Documentation
### 🌐 Web API Endpoints
| Method | Endpoint | Description | Parameters |
|--------|----------|-------------|------------|
| `GET` | `/` | Main page | - |
| `GET` | `/demo` | AI conversation demo page | - |
| `POST` | `/api/upload` | File upload | `file: UploadFile` |
| `GET` | `/api/files` | Get file list | - |
| `POST` | `/api/process` | Process video request | `message: str, video_path?: str` |
| `POST` | `/api/process-stream` | Stream processing request | `message: str, video_path?: str` |
| `GET` | `/api/tools` | Get available tools | - |
| `GET` | `/api/download/{type}/{filename}` | File download | `type: str, filename: str` |
| `DELETE` | `/api/files/{type}/{filename}` | File deletion | `type: str, filename: str` |
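For scripting against the Web API without the browser, the sketch below builds requests using only the standard library. The table does not say how `message` and `video_path` are encoded, so sending them as a JSON body is an assumption (they might instead be form fields, in which case the body would need `urllib.parse.urlencode`). Treating `/api/process-stream` as Server-Sent Events with `data:` lines is likewise an assumption; adjust the hypothetical `iter_sse_data` helper to whatever framing the endpoint actually emits.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional

BASE_URL = "http://localhost:8000"  # address from the Quick Start section


def build_process_request(
    message: str,
    video_path: Optional[str] = None,
    stream: bool = False,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/process or /api/process-stream.

    Assumption: the endpoint accepts a JSON body with `message` and an
    optional `video_path`.
    """
    body = {"message": message}
    if video_path is not None:
        body["video_path"] = video_path
    path = "/api/process-stream" if stream else "/api/process"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the payload of each `data:` line, assuming Server-Sent Events framing.

    Simplification: multi-line events are yielded line by line rather than joined.
    """
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            payload = line[len("data:"):]
            # In SSE, a single space after the colon is not part of the data
            yield payload[1:] if payload.startswith(" ") else payload


# Usage (requires the server from the Quick Start to be running):
# req = build_process_request("Get details of video.mp4", "uploads/video.mp4", stream=True)
# with urllib.request.urlopen(req) as resp:
#     for payload in iter_sse_data(resp):
#         print(payload)
```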
### 🎬 FFmpeg MCP Tools
| Tool Name | Function Description | Parameter Description |
|-----------|---------------------|----------------------|
| `find_video_path` | Smartly find video files | `root_path`, `video_name` |
| `get_video_info` | Get detailed information about the video | `video_path` |
| `clip_video` | Precisely cut video segments | `video_path`, `start`, `end/duration`, `output_path?` |
| `concat_videos` | Seamlessly merge multiple videos | `input_files[]`, `output_path?`, `fast?` |
| `scale_video` | Adjust video resolution | `video_path`, `width`, `height`, `output_path?` |
| `overlay_video` | Video overlay effects | `background_video`, `overlay_video`, `position?`, `dx?`, `dy?` |
| `extract_audio_from_video` | Extract audio tracks | `video_path`, `output_path?`, `audio_format?` |
| `extract_frames_from_video` | Extract video frames | `video_path`, `fps?`, `output_folder?`, `format?` |
| `play_video` | Play video preview | `video_path`, `speed?`, `loop?` |
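Each row above is an MCP tool exposed by the server. On the wire, MCP invokes a tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and an `arguments` object. The sketch below builds such messages, including from an OpenAI-style tool call as returned by an OpenAI-compatible endpoint like NIM. The argument values (plain seconds for `start` and `duration`) are assumptions; check the server's input schema for the exact types. In practice, the MCP Python SDK's `ClientSession.call_tool(name, arguments)` sends this message for you.

```python
import itertools
import json
from typing import Any

# Monotonic JSON-RPC request ids; a real client tracks these per session.
_request_ids = itertools.count(1)


def tools_call_message(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP `tools/call` request (JSON-RPC 2.0) for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def from_openai_tool_call(tool_call: dict[str, Any]) -> dict[str, Any]:
    """Translate an OpenAI-style tool call from the model into an MCP request.

    OpenAI-compatible APIs return `function.arguments` as a JSON-encoded string.
    """
    fn = tool_call["function"]
    return tools_call_message(fn["name"], json.loads(fn.get("arguments") or "{}"))


# Example: cut 30 seconds starting at 10 s (argument types are assumed).
clip_request = tools_call_message(
    "clip_video", {"video_path": "uploads/video.mp4", "start": 10, "duration": 30}
)
print(json.dumps(clip_request))
```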
## 🎯 Technical Stack Overview
### 🔧 Backend Technology
- **FastAPI**: A high-performance asynchronous web framework that generates interactive API documentation automatically (served at `/docs` by default)
- **MCP**: Model Context Protocol, an open protocol that standardizes how AI applications invoke external tools
- **NVIDIA NIM**: NVIDIA's inference microservices, accessed here through an OpenAI-compatible API to run Llama 3.1 Nemotron
- **FFmpeg**: The industry-standard multimedia processing tool
- **uv**: A fast Python package and project manager written in Rust; its authors report it is 10-100x faster than pip
### 🎨 Frontend Technology
- **HTML5**: Semantic markup, supports Drag and Drop API
- **CSS3**: Modern styles, Grid + Flexbox layout, CSS animations
- **JavaScript ES6+**: Native JavaScript, Fetch API, WebSocket
- **Font Awesome**: Vector icon library
### 📦 Dependency Management
- **pyproject.toml**: Modern Python project configuration standard
- **uv.lock**: Ensures consistency of dependency versions
- **Git Submodules**: Modular code management