# Lynkr - Claude Code Proxy with Multi-Provider Support
[npm](https://www.npmjs.com/package/lynkr) · [Homebrew](https://github.com/vishalveerareddy123/homebrew-lynkr) · [License](LICENSE) · [DeepWiki](https://deepwiki.com/vishalveerareddy123/Lynkr) · [Databricks](https://www.databricks.com/) · [AWS Bedrock](https://aws.amazon.com/bedrock/) · [OpenAI](https://openai.com/) · [Ollama](https://ollama.ai/) · [llama.cpp](https://github.com/ggerganov/llama.cpp)
> **Production-ready Claude Code proxy supporting 9+ LLM providers with 60-80% cost reduction through token optimization.**
---
## Overview
Lynkr is a **self-hosted proxy server** that sits between Claude Code CLI or Cursor IDE and your model backend, giving you:
- 🚀 **Any LLM Provider** - Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Ollama (local), llama.cpp, Azure OpenAI, Azure Anthropic, OpenAI, LM Studio
- 💰 **60-80% Cost Reduction** - Built-in token optimization with smart tool selection, prompt caching, and memory deduplication
- 🔒 **100% Local/Private** - Run completely offline with Ollama or llama.cpp
- 🎯 **Zero Code Changes** - Drop-in replacement for Anthropic's backend
- 🏢 **Enterprise-Ready** - Circuit breakers, load shedding, Prometheus metrics, health checks
**Perfect for:**
- Developers who want provider flexibility and cost control
- Enterprises needing self-hosted AI with observability
- Privacy-focused teams requiring local model execution
- Teams seeking 60-80% cost reduction through optimization
---
## 💰 Cost Savings
Lynkr reduces AI costs by **60-80%** through intelligent token optimization:
### Real-World Savings Example
**Scenario:** 100,000 API requests/month, 50k input tokens, 2k output tokens per request
| Provider | Without Lynkr | With Lynkr | **Monthly Savings** | **Annual Savings** |
|----------|---------------|------------|---------------------|-------------------|
| **Claude Sonnet 4.5** (Databricks) | $16,000 | $6,400 | **$9,600** | **$115,200** |
| **GPT-4o** (OpenRouter) | $12,000 | $4,800 | **$7,200** | **$86,400** |
| **Ollama** (Local) | $12,000+ (cloud API) | **$0** | **$12,000+** | **$144,000+** |
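As a quick sanity check on the Databricks row, the figures follow from applying the low end of the 60-80% range directly to spend. This is an illustration only; it assumes token reduction translates one-to-one into cost:
```bash
# Illustrative check of the Databricks row above; assumes the 60% token
# reduction (low end of the range) maps directly onto monthly spend.
awk 'BEGIN {
  without    = 16000                       # monthly spend without Lynkr (from the table)
  reduction  = 0.60                        # assumed low-end reduction
  with_lynkr = without * (1 - reduction)   # 6400
  printf "With Lynkr: $%d/mo, saving $%d/mo ($%d/yr)\n",
         with_lynkr, without - with_lynkr, (without - with_lynkr) * 12
}'
```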
### How We Achieve 60-80% Cost Reduction
**6 Token Optimization Phases:**
1. **Smart Tool Selection** (50-70% reduction)
- Filters tools based on request type
- Chat queries don't get file/git tools
- Only sends relevant tools to model
2. **Prompt Caching** (30-45% reduction)
- Caches repeated prompts and system messages
- Reuses context across conversations
- Reduces redundant token usage
3. **Memory Deduplication** (20-30% reduction)
- Removes duplicate conversation context
- Compresses historical messages
- Eliminates redundant information
4. **Tool Response Truncation** (15-25% reduction)
- Truncates long tool outputs intelligently
- Keeps only relevant portions
- Reduces tool result tokens
5. **Dynamic System Prompts** (10-20% reduction)
- Adapts prompts to request complexity
- Shorter prompts for simple queries
- Full prompts only when needed
6. **Conversation Compression** (15-25% reduction)
- Summarizes old conversation turns
- Keeps recent context detailed
- Archives historical context
📖 **[Detailed Token Optimization Guide](documentation/token-optimization.md)**
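To make phase 1 concrete, the sketch below filters a tool list before a request is forwarded upstream. It is a conceptual illustration only, not Lynkr's implementation; the file name, tool names, and the "does this look like chat?" heuristic are all hypothetical:
```bash
# Conceptual sketch of smart tool selection (phase 1), not Lynkr's actual code.
# request.json: a hypothetical Anthropic-format request containing a "tools" array.
# If no message mentions files/git, strip file/git tools (tool names are made up).
jq '
  if ([.messages[].content] | tostring | test("file|git|repo"; "i")) then .
  else .tools |= map(select(.name | test("^(read_file|write_file|git_)") | not))
  end
' request.json > filtered-request.json
```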
---
## 🚀 Key Features
### Multi-Provider Support (9+ Providers)
- ✅ **Cloud Providers:** Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Azure OpenAI, Azure Anthropic, OpenAI
- ✅ **Local Providers:** Ollama (free), llama.cpp (free), LM Studio (free)
- ✅ **Hybrid Routing:** Automatically route between local (fast/free) and cloud (powerful) based on complexity
- ✅ **Automatic Fallback:** Transparent failover if primary provider is unavailable
### Cost Optimization
- 💰 **60-80% Token Reduction** - 6-phase optimization pipeline
- 💰 **$77k-$115k Annual Savings** - For typical enterprise usage (100k requests/month)
- 💰 **100% FREE Option** - Run completely locally with Ollama or llama.cpp
- 💰 **Hybrid Routing** - 65-100% cost savings by using local models for simple requests
### Privacy & Security
- 🔒 **100% Local Operation** - Run completely offline with Ollama/llama.cpp
- 🔒 **Air-Gapped Deployments** - No internet required for local providers
- 🔒 **Self-Hosted** - Full control over your data and infrastructure
- 🔒 **Local Embeddings** - Private @Codebase search with Ollama/llama.cpp
- 🔐 **Policy Enforcement** - Git restrictions, test requirements, web fetch controls
- 🔐 **Sandboxing** - Optional Docker isolation for MCP tools
### Enterprise Features
- 🏢 **Production-Ready** - Circuit breakers, load shedding, graceful shutdown
- 🏢 **Observability** - Prometheus metrics, structured logging, health checks
- 🏢 **Kubernetes-Ready** - Liveness, readiness, startup probes
- 🏢 **High Performance** - ~7μs overhead, 140K req/sec throughput
- 🏢 **Reliability** - Exponential backoff, automatic retries, error resilience
- 🏢 **Scalability** - Horizontal scaling, connection pooling, load balancing
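Once the proxy is running, the metrics and health endpoints give a quick view of that observability surface. The paths below are conventional defaults and only an assumption here; see the production guide for the exact routes:
```bash
# Assumed endpoint paths; documentation/production.md has the authoritative routes.
curl -s http://localhost:8081/metrics | head -n 20   # Prometheus metrics (assumed path)
curl -s http://localhost:8081/health                 # liveness/readiness probe (assumed path)
```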
### IDE Integration
- ✅ **Claude Code CLI** - Drop-in replacement for Anthropic backend
- ✅ **Cursor IDE** - Full OpenAI API compatibility (Requires Cursor Pro)
- ✅ **Continue.dev** - Works with any OpenAI-compatible client
- ✅ **Cline + VS Code** - Configure it like Cursor, using the OpenAI-compatible settings
### Advanced Capabilities
- 🧠 **Long-Term Memory** - Titans-inspired memory system with surprise-based filtering
- 🧠 **Semantic Memory** - FTS5 search with multi-signal retrieval (recency, importance, relevance)
- 🧠 **Automatic Extraction** - Zero-latency memory updates (<50ms retrieval, <100ms async extraction)
- 🔧 **MCP Integration** - Automatic Model Context Protocol server discovery
- 🔧 **Tool Calling** - Full tool support with server and client execution modes
- 🔧 **Custom Tools** - Easy integration of custom tool implementations
- 🔍 **Embeddings Support** - 4 options: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
- 📊 **Token Tracking** - Real-time usage monitoring and cost attribution
### Developer Experience
- 🎯 **Zero Code Changes** - Works with existing Claude Code CLI/Cursor setups
- 🎯 **Hot Reload** - Development mode with auto-restart
- 🎯 **Comprehensive Logging** - Structured logs with request ID correlation
- 🎯 **Easy Configuration** - Environment variables or .env file
- 🎯 **Docker Support** - docker-compose with GPU support
- 🎯 **400+ Tests** - Comprehensive test coverage for reliability
### Streaming & Performance
- ⚡ **Real-Time Streaming** - Token-by-token streaming for all providers
- ⚡ **Low Latency** - Minimal overhead (~7μs per request)
- ⚡ **High Throughput** - 140K requests/second capacity
- ⚡ **Connection Pooling** - Efficient connection reuse
- ⚡ **Prompt Caching** - LRU cache with SHA-256 keying
📖 **[Complete Feature Documentation](documentation/features.md)**
---
## Quick Start
### Installation
**Option 1: NPM Package (Recommended)**
```bash
# Install globally
npm install -g lynkr
# Or run directly with npx
npx lynkr
```
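After a global install, a minimal local-only run might look like the following. This assumes the npm entry point reads the same environment variables as `npm start` (see the Quick Configuration Examples further down) and that Ollama is already running with the model pulled:
```bash
# Assumes the npm entry point honours the same env vars as `npm start`,
# and that Ollama is running locally with this model available.
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=qwen2.5-coder:latest
npx lynkr
```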
**Option 2: Git Clone**
```bash
# Clone repository
git clone https://github.com/vishalveerareddy123/Lynkr.git
cd Lynkr
# Install dependencies
npm install
# Create .env from example
cp .env.example .env
# Edit .env with your provider credentials
nano .env
# Start server
npm start
```
**Option 3: Homebrew (macOS/Linux)**
```bash
brew tap vishalveerareddy123/lynkr
brew install lynkr
lynkr start
```
**Option 4: Docker**
```bash
docker-compose up -d
```
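For the Docker path, provider settings can live in a `.env` file next to `docker-compose.yml`, since Compose reads that file automatically. Whether a given variable is forwarded into the container depends on the compose file, so treat this as a sketch and check the Docker guide for the supported keys:
```bash
# Sketch only: Compose reads .env from the working directory, but
# documentation/docker.md lists which variables the compose file actually forwards.
cat > .env <<'EOF'
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key
EOF
docker-compose up -d
```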
---
## Supported Providers
Lynkr supports **9+ LLM providers**:
| Provider | Type | Models | Cost | Privacy |
|----------|------|--------|------|---------|
| **AWS Bedrock** | Cloud | 100+ (Claude, Titan, Llama, Mistral, etc.) | $$-$$$ | Cloud |
| **Databricks** | Cloud | Claude Sonnet 4.5, Opus 4.5 | $$$ | Cloud |
| **OpenRouter** | Cloud | 100+ (GPT, Claude, Llama, Gemini, etc.) | $-$$ | Cloud |
| **Ollama** | Local | Any Ollama model (free, offline) | **FREE** | 🔒 100% Local |
| **llama.cpp** | Local | GGUF models | **FREE** | 🔒 100% Local |
| **Azure OpenAI** | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud |
| **Azure Anthropic** | Cloud | Claude models | $$$ | Cloud |
| **OpenAI** | Cloud | GPT-4o, o1, o3 | $$$ | Cloud |
| **LM Studio** | Local | Local models with GUI | **FREE** | 🔒 100% Local |
📖 **[Full Provider Configuration Guide](documentation/providers.md)**
---
## Claude Code Integration
Configure Claude Code CLI to use Lynkr:
```bash
# Set Lynkr as backend
export ANTHROPIC_BASE_URL=http://localhost:8081
export ANTHROPIC_API_KEY=dummy
# Run Claude Code
claude "Your prompt here"
```
That's it! Claude Code now uses your configured provider.
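To confirm the proxy is reachable before pointing Claude Code at it, you can send a minimal request to the messages route Claude Code itself calls. The path and payload below assume Lynkr mirrors the standard Anthropic `/v1/messages` API; the model name is just a placeholder that your configured provider resolves:
```bash
# Smoke test; assumes the standard Anthropic-style /v1/messages route and a
# placeholder model name that the configured provider maps to a real model.
curl -s http://localhost:8081/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: dummy" \
  -d '{
    "model": "claude-3-5-sonnet",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```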
📖 **[Detailed Claude Code Setup](documentation/claude-code-cli.md)**
---
## Cursor Integration
Configure Cursor IDE to use Lynkr:
1. **Open Cursor Settings**
- Mac: `Cmd+,` | Windows/Linux: `Ctrl+,`
- Navigate to: **Features** → **Models**
2. **Configure OpenAI API Settings**
- **API Key**: `sk-lynkr` (any non-empty value)
- **Base URL**: `http://localhost:8081/v1`
- **Model**: `claude-3.5-sonnet` (or your provider's model)
3. **Test It**
- Chat: `Cmd+L` / `Ctrl+L`
- Inline edits: `Cmd+K` / `Ctrl+K`
- @Codebase search: Requires [embeddings setup](documentation/embeddings.md)
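If chat does not respond, a quick check from a terminal against the same Base URL usually narrows things down. This uses the generic OpenAI-compatible chat completions route with the placeholder key from step 2 (route assumed from the OpenAI compatibility noted above):
```bash
# Connectivity check against the same Base URL Cursor uses (assumed
# OpenAI-compatible /v1/chat/completions route, placeholder API key).
curl -s http://localhost:8081/v1/chat/completions \
  -H "content-type: application/json" \
  -H "authorization: Bearer sk-lynkr" \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```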
📖 **[Full Cursor Setup Guide](documentation/cursor-integration.md)** | **[Embeddings Configuration](documentation/embeddings.md)**
---
## Documentation
### Getting Started
- 📦 **[Installation Guide](documentation/installation.md)** - Detailed installation for all methods
- ⚙️ **[Provider Configuration](documentation/providers.md)** - Complete setup for all 9+ providers
- 🎯 **[Quick Start Examples](documentation/installation.md#quick-start-examples)** - Copy-paste configs
### IDE Integration
- 🖥️ **[Claude Code CLI Setup](documentation/claude-code-cli.md)** - Connect Claude Code CLI
- 🎨 **[Cursor IDE Setup](documentation/cursor-integration.md)** - Full Cursor integration with troubleshooting
- 🔍 **[Embeddings Guide](documentation/embeddings.md)** - Enable @Codebase semantic search (4 options: Ollama, llama.cpp, OpenRouter, OpenAI)
### Features & Capabilities
- ✨ **[Core Features](documentation/features.md)** - Architecture, request flow, format conversion
- 🧠 **[Memory System](documentation/memory-system.md)** - Titans-inspired long-term memory
- 💰 **[Token Optimization](documentation/token-optimization.md)** - 60-80% cost reduction strategies
- 🔧 **[Tools & Execution](documentation/tools.md)** - Tool calling, execution modes, custom tools
### Deployment & Operations
- 🐳 **[Docker Deployment](documentation/docker.md)** - docker-compose setup with GPU support
- 🏭 **[Production Hardening](documentation/production.md)** - Circuit breakers, load shedding, metrics
- 📊 **[API Reference](documentation/api.md)** - All endpoints and formats
### Support
- 🔧 **[Troubleshooting](documentation/troubleshooting.md)** - Common issues and solutions
- ❓ **[FAQ](documentation/faq.md)** - Frequently asked questions
- 🧪 **[Testing Guide](documentation/testing.md)** - Running tests and validation
---
## External Resources
- 📚 **[DeepWiki Documentation](https://deepwiki.com/vishalveerareddy123/Lynkr)** - AI-powered documentation search
- 💬 **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Community Q&A
- 🐛 **[Report Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Bug reports and feature requests
- 📦 **[NPM Package](https://www.npmjs.com/package/lynkr)** - Official npm package
---
## Key Features Highlights
- ✅ **Multi-Provider Support** - 9+ providers including local (Ollama, llama.cpp) and cloud (Bedrock, Databricks, OpenRouter)
- ✅ **60-80% Cost Reduction** - Token optimization with smart tool selection, prompt caching, memory deduplication
- ✅ **100% Local Option** - Run completely offline with Ollama/llama.cpp (zero cloud dependencies)
- ✅ **OpenAI Compatible** - Works with Cursor IDE, Continue.dev, and any OpenAI-compatible client
- ✅ **Embeddings Support** - 4 options for @Codebase search: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
- ✅ **MCP Integration** - Automatic Model Context Protocol server discovery and orchestration
- ✅ **Enterprise Features** - Circuit breakers, load shedding, Prometheus metrics, K8s health checks
- ✅ **Streaming Support** - Real-time token streaming for all providers
- ✅ **Memory System** - Titans-inspired long-term memory with surprise-based filtering
- ✅ **Tool Calling** - Full tool support with server and passthrough execution modes
- ✅ **Production Ready** - Battle-tested with 400+ tests, observability, and error resilience
---
## Architecture
```
┌─────────────────┐
│ Claude Code CLI │  or Cursor IDE
└────────┬────────┘
         │ Anthropic/OpenAI Format
         ↓
┌─────────────────┐
│   Lynkr Proxy   │
│   Port: 8081    │
│                 │
│ • Format Conv.  │
│ • Token Optim.  │
│ • Provider Route│
│ • Tool Calling  │
│ • Caching       │
└────────┬────────┘
         │
         ├──→ Databricks (Claude 4.5)
         ├──→ AWS Bedrock (100+ models)
         ├──→ OpenRouter (100+ models)
         ├──→ Ollama (local, free)
         ├──→ llama.cpp (local, free)
         ├──→ Azure OpenAI (GPT-4o, o1)
         ├──→ OpenAI (GPT-4o, o3)
         └──→ Azure Anthropic (Claude)
```
📖 **[Detailed Architecture](documentation/features.md#architecture)**
---
## Quick Configuration Examples
**100% Local (FREE)**
```bash
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=qwen2.5-coder:latest
export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
npm start
```
**AWS Bedrock (100+ models)**
```bash
export MODEL_PROVIDER=bedrock
export AWS_BEDROCK_API_KEY=your-key
export AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
npm start
```
**OpenRouter (simplest cloud)**
```bash
export MODEL_PROVIDER=openrouter
export OPENROUTER_API_KEY=sk-or-v1-your-key
npm start
```
📖 **[More Examples](documentation/providers.md#quick-start-examples)**
---
## Contributing
We welcome contributions! Please see:
- **[Contributing Guide](documentation/contributing.md)** - How to contribute
- **[Testing Guide](documentation/testing.md)** - Running tests
---
## License
Apache 2.0 - See [LICENSE](LICENSE) file for details.
---
## Community & Support
- ⭐ **Star this repo** if Lynkr helps you!
- 💬 **[Join Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions, share tips
- 🐛 **[Report Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Bug reports welcome
- 📖 **[Read the Docs](documentation/)** - Comprehensive guides
---
**Made with ❤️ by developers, for developers.**