# Extractous MCP Server
A Model Context Protocol (MCP) server for text extraction using the extractous library.
## Features
- Extract text from various file formats (PDF, DOCX, HTML, images with OCR, audio/video with speech-to-text)
- Get file metadata along with extracted text
- Support for MIME type override
- List supported file formats
## Installation
```bash
# Install from a local checkout in editable mode (run from the repository root)
uv pip install -e .
# Or install the package by name with pip
pip install extractous-mcp
```
## Usage
### As MCP Server
```bash
# Run the server
extractous-mcp
# Or run directly
python -m extractous_mcp.server
```
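Most MCP clients start a server from a configuration entry rather than from a shell. The sketch below prints one such entry. It assumes the client reads a JSON file with a top-level `mcpServers` object (a layout several clients use), that the server communicates over stdio, and that `extractous-mcp` is on the `PATH`. The label `extractous` is an arbitrary name chosen for illustration. Check your client's documentation for the exact file location and schema.

```python
import json

# Assumed layout: a top-level "mcpServers" object keyed by an arbitrary label.
# "extractous" is a hypothetical label; the command comes from the Usage section.
config = {
    "mcpServers": {
        "extractous": {
            "command": "extractous-mcp",
            "args": [],
        }
    }
}

# Print the entry so it can be pasted into the client's configuration file.
print(json.dumps(config, indent=2))
```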
### Available Tools
1. **extract_text**
   - Extract text from a file
   - Parameters: `file_path` (required), `mime_type` (optional)
2. **extract_text_with_metadata**
   - Extract text and metadata from a file
   - Parameters: `file_path` (required), `mime_type` (optional)
3. **get_supported_formats**
   - Get list of supported file formats
   - No parameters
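
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the tools listed above. The helper name `tool_call_request` is hypothetical and not part of this package. The response shape is not shown because this README does not document it.

```python
import json
from typing import Any, Dict, Optional


def tool_call_request(
    request_id: int, name: str, arguments: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Tools without parameters receive an empty arguments object.
        "params": {"name": name, "arguments": arguments or {}},
    }


# `file_path` is required; `mime_type` is optional and may be omitted.
request = tool_call_request(
    1,
    "extract_text",
    {"file_path": "/path/to/file", "mime_type": "application/pdf"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library usually builds and sends this message for you, so the sketch is mainly useful for debugging or for reading server logs.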
### Example Usage
```python
# Tool calls are shown in Python-style shorthand; issue them through your MCP client.
# Extract text from a PDF
extract_text("/path/to/document.pdf")
# Extract with specific MIME type
extract_text("/path/to/file", mime_type="application/pdf")
# Get text with metadata
extract_text_with_metadata("/path/to/document.docx")
# Check supported formats
get_supported_formats()
```
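The second call above passes `mime_type` for a path that has no extension. A client could decide when to offer such an override by checking whether the file name alone maps to a known type. The sketch below does this with Python's standard `mimetypes` module. The helper `suggest_mime_override` is hypothetical, and whether the server actually needs the hint depends on its own type detection, which this README does not describe.

```python
import mimetypes
from typing import Optional


def suggest_mime_override(file_path: str, fallback: str) -> Optional[str]:
    """Return `fallback` when the file name gives no MIME hint, else None.

    Hypothetical client-side helper: it inspects only the file name and
    never opens the file.
    """
    guessed, _encoding = mimetypes.guess_type(file_path)
    return None if guessed else fallback


# No extension, so the caller-supplied type is suggested as an override.
print(suggest_mime_override("/path/to/file", "application/pdf"))
# A recognised extension, so no override is suggested.
print(suggest_mime_override("/path/to/document.pdf", "application/pdf"))
```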
## Supported Formats
- **Documents**: PDF, DOC, DOCX, RTF, ODT
- **Spreadsheets**: XLS, XLSX, ODS, CSV
- **Presentations**: PPT, PPTX, ODP
- **Web**: HTML, HTM, XML
- **Text**: TXT, MD, JSON, YAML, YML
- **Images**: JPG, JPEG, PNG, TIFF, BMP, GIF (with OCR)
- **Audio**: MP3, WAV, FLAC, M4A, OGG (with speech-to-text)
- **Video**: MP4, AVI, MOV, MKV, WEBM (with speech-to-text)
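
As a convenience, a client could pre-check a file's extension against the categories above before calling the server. The sketch below copies the extensions from this list into a lookup table. `FORMAT_CATEGORIES` and `format_category` are hypothetical names, and the server's `get_supported_formats` tool remains the authoritative source.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the Supported Formats list in this README.
FORMAT_CATEGORIES = {
    "Documents": {"pdf", "doc", "docx", "rtf", "odt"},
    "Spreadsheets": {"xls", "xlsx", "ods", "csv"},
    "Presentations": {"ppt", "pptx", "odp"},
    "Web": {"html", "htm", "xml"},
    "Text": {"txt", "md", "json", "yaml", "yml"},
    "Images": {"jpg", "jpeg", "png", "tiff", "bmp", "gif"},
    "Audio": {"mp3", "wav", "flac", "m4a", "ogg"},
    "Video": {"mp4", "avi", "mov", "mkv", "webm"},
}


def format_category(file_path: str) -> Optional[str]:
    """Return the category for a file's extension, or None if it is not listed."""
    extension = Path(file_path).suffix.lower().lstrip(".")
    for category, extensions in FORMAT_CATEGORIES.items():
        if extension in extensions:
            return category
    return None


print(format_category("/path/to/document.PDF"))
print(format_category("/path/to/file"))
```

A `None` result only means the extension is not in this table. The file may still be extractable, for example by supplying `mime_type` explicitly.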
## Development
```bash
# Install in development mode
uv pip install -e .
# Run tests
python -m pytest tests/
```