# Visara - Visual MCP Server
Visara is a Model Context Protocol (MCP) server for visual analysis. It analyzes images, extracts text, interprets scenes, and produces detailed descriptions to support frontend development workflows.
## Features
- **MCP Compliance**: Implements the Model Context Protocol specification using the official `@modelcontextprotocol/sdk`
- **Image Analysis**: Analyze images and extract detailed information including objects, text, and scene understanding
- **Frontend Development Support**: Specialized prompts for UI/UX analysis and frontend development
- **Local File Path Support**: Automatically converts local file paths to base64 data URLs
- **Production Ready**: Includes Docker support, health checks, and caching
- **Qwen-VL Plus Integration**: Connects to Qwen-VL Plus multimodal API for advanced image analysis
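The local file path conversion listed above can be pictured roughly as follows. This is a minimal sketch, not Visara's actual implementation: the helper names `bytesToDataUrl` and `filePathToDataUrl` and the extension-to-MIME table are hypothetical, and the table only mirrors the default `ALLOWED_MIME_TYPES`.

```typescript
import { readFileSync } from "node:fs";
import { extname } from "node:path";

// Hypothetical lookup table; it mirrors the default ALLOWED_MIME_TYPES values.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
};

// Encode raw image bytes as a base64 data URL.
function bytesToDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Read a local image file and return it as a data URL.
// Throws if the extension is not in the table above.
function filePathToDataUrl(filePath: string): string {
  const mimeType = MIME_BY_EXTENSION[extname(filePath).toLowerCase()];
  if (mimeType === undefined) {
    throw new Error(`Unsupported image extension: ${filePath}`);
  }
  return bytesToDataUrl(readFileSync(filePath), mimeType);
}
```

Rejecting unknown extensions before reading the file keeps non-image files from being sent to the model. Whether Visara detects the MIME type from the extension or from the file contents is not stated here.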
## Installation
```bash
git clone <repository-url>
cd visara
npm install
```
## Usage
### Development
```bash
# Build the project
npm run build
# Start the server
npm start
```
By default, the server listens at `http://localhost:9451`.
### Docker
```bash
# Copy environment variables
cp .env.example .env
# Edit .env with your Qwen-VL API key
# Build and run with Docker Compose
docker-compose up --build
```
## MCP Endpoints
- `GET /health` - Health check endpoint
- `GET /tools` - List available tools
- `GET /resources` - List available resources
- `GET /prompts` - List available prompts
- `POST /` - Main MCP endpoint for tool calls
- `POST /images/upload` - File upload endpoint for direct image processing
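As an illustration of the main endpoint, the sketch below builds a tool call and posts it to `POST /`. It assumes the endpoint accepts a standard MCP JSON-RPC 2.0 `tools/call` request over plain HTTP, which this README does not confirm. The helpers `buildToolCall` and `callTool` are hypothetical names, not part of Visara.

```typescript
// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a `tools/call` request for the named tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Send the request to the server root (assumed transport: plain HTTP POST with JSON).
async function callTool(baseUrl: string, request: JsonRpcRequest): Promise<unknown> {
  const response = await fetch(baseUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) {
    throw new Error(`MCP request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A call might look like `await callTool("http://localhost:9451/", buildToolCall(1, "analyze_image", { imageUrl: "https://example.com/mockup.png" }))`, where the image URL is a placeholder.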
## Tools
### `analyze_image`
Analyze an image and extract detailed information.
**Parameters:**
- `imageUrl` (string, required): URL of the image to analyze or local file path
- `imageBase64` (string, optional): Base64 encoded image data
- `prompt` (string, optional): Custom prompt for image analysis
- `model` (string, optional): Model to use (default: qwen-vl-plus)
- `temperature` (number, optional): Temperature for generation (0.0-1.0)
- `maxTokens` (number, optional): Maximum tokens for response
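The parameter list above maps naturally onto a typed argument object. The following is a sketch of client-side validation, not the server's own schema: the `AnalyzeImageArgs` interface and `validateAnalyzeImageArgs` function are hypothetical, and the rule that `maxTokens` must be a positive integer is an assumption, since the README only says it is a number.

```typescript
// Argument shape taken from the parameter list; optional fields may be omitted.
interface AnalyzeImageArgs {
  imageUrl: string;
  imageBase64?: string;
  prompt?: string;
  model?: string; // the server defaults to "qwen-vl-plus"
  temperature?: number; // documented range: 0.0 to 1.0
  maxTokens?: number;
}

// Return a list of human-readable problems; an empty list means the arguments look valid.
function validateAnalyzeImageArgs(args: AnalyzeImageArgs): string[] {
  const problems: string[] = [];
  if (args.imageUrl.trim() === "") {
    problems.push("imageUrl must not be empty");
  }
  if (
    args.temperature !== undefined &&
    (Number.isNaN(args.temperature) || args.temperature < 0 || args.temperature > 1)
  ) {
    problems.push("temperature must be between 0.0 and 1.0");
  }
  // Assumption: maxTokens is treated as a positive integer.
  if (
    args.maxTokens !== undefined &&
    (!Number.isInteger(args.maxTokens) || args.maxTokens <= 0)
  ) {
    problems.push("maxTokens must be a positive integer");
  }
  return problems;
}
```

Returning every problem at once, rather than throwing on the first, lets a caller report all invalid fields in a single pass.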
## Prompts
- **detailed_description**: Get a detailed description of all visible elements in the image
- **frontend_ui_analysis**: Analyze UI/UX prototype and extract component structure, layout, and styling information
- **react_component_generation**: Generate React component structure based on UI prototype
- **css_style_extraction**: Extract detailed CSS styles, colors, typography, and spacing
- **ui_component_inventory**: Create inventory of all UI components and elements present in the prototype
- **responsive_design_analysis**: Analyze responsive design aspects and breakpoints
- **object_detection**: Identify and list all objects in the image with their positions
- **text_extraction**: Extract all visible text from the image
- **scene_understanding**: Provide high-level understanding of the scene context
## Environment Variables
- `QWEN_VL_API_KEY`: Your Qwen-VL API key from https://dashscope.console.aliyun.com/apiKey
- `QWEN_VL_API_BASE_URL`: Qwen-VL API base URL (default: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation)
- `PORT`: Server port (default: 9451)
- `HOST`: Server host (default: 0.0.0.0)
- `CACHE_TTL`: Cache time-to-live in seconds (default: 3600)
- `MAX_FILE_SIZE`: Maximum file size for uploads in bytes (default: 10485760, i.e. 10 MiB)
- `ALLOWED_MIME_TYPES`: Allowed MIME types for file uploads (default: image/jpeg,image/png,image/webp)
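One way to read these variables with their documented defaults is sketched below. The `VisaraConfig` shape and the `loadConfig` and `parsePositiveInt` helpers are hypothetical and do not describe Visara's internal configuration code. Treating `QWEN_VL_API_KEY` as mandatory is also an assumption, based only on the fact that no default is listed for it.

```typescript
interface VisaraConfig {
  apiKey: string;
  apiBaseUrl: string;
  port: number;
  host: string;
  cacheTtlSeconds: number;
  maxFileSizeBytes: number;
  allowedMimeTypes: string[];
}

// Parse a positive integer, falling back to the default when the variable is unset or blank.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`Expected a positive integer but got "${raw}"`);
  }
  return parsed;
}

// Build the configuration from an environment-like object (for example, process.env).
function loadConfig(env: Record<string, string | undefined>): VisaraConfig {
  const apiKey = env["QWEN_VL_API_KEY"];
  if (!apiKey) {
    // Assumption: the key is required because no default is documented.
    throw new Error("QWEN_VL_API_KEY is required");
  }
  return {
    apiKey,
    apiBaseUrl:
      env["QWEN_VL_API_BASE_URL"] ??
      "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation",
    port: parsePositiveInt(env["PORT"], 9451),
    host: env["HOST"] ?? "0.0.0.0",
    cacheTtlSeconds: parsePositiveInt(env["CACHE_TTL"], 3600),
    maxFileSizeBytes: parsePositiveInt(env["MAX_FILE_SIZE"], 10485760),
    allowedMimeTypes: (env["ALLOWED_MIME_TYPES"] ?? "image/jpeg,image/png,image/webp")
      .split(",")
      .map((entry) => entry.trim())
      .filter((entry) => entry.length > 0),
  };
}
```

Passing the environment in as a parameter, instead of reading `process.env` inside the function, makes the defaults easy to check in isolation: `loadConfig(process.env)` in the server, a plain object in tests.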
## License
MIT