llm-cli-gateway
Unified MCP server providing access to Claude Code, Codex, and Gemini CLIs through a single gateway. Features multi-LLM orchestration, persistent session management, async job execution with polling, approval gates, retry with circuit breakers, and token optimization. Install: npx -y llm-cli-gateway
"Without consultation, plans are frustrated, but with many counselors they succeed." — Proverbs 15:22 (LSB)
A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, and Gemini CLIs with session management, retry logic, and async job orchestration.
Features
Core Capabilities
- Multi-LLM Orchestration: Unified interface for Claude Code, Codex, and Gemini CLIs
- Session Management: Track and resume conversations across all CLIs with persistent storage
- Token Optimization: Automatic 44% reduction on prompts, 37% on responses (opt-in)
- Correlation ID Tracking: Full request tracing across all LLM interactions
- Cross-Tool Collaboration: LLMs can use each other via MCP (validated through dogfooding)
Reliability & Performance
- Retry Logic: Exponential backoff with circuit breaker for transient failures
- Atomic File Writes: Process-specific temp files with fsync for data integrity
- Memory Limits: 50MB cap on CLI output prevents DoS attacks
- NVM Path Caching: Eliminates I/O overhead on every request
- Long-Running Jobs: Non-time-bound async execution via *_request_async + polling tools
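The retry behavior described above can be sketched roughly as follows. This is a minimal illustration of exponential backoff combined with a failure-counting circuit breaker; the names `withRetry` and `CircuitBreaker` are illustrative, not the gateway's actual internals:

```typescript
// Counts consecutive failures; once the threshold is hit, callers fail fast
// instead of hammering a CLI that keeps erroring out.
class CircuitBreaker {
  private failures = 0;
  constructor(private threshold = 5) {}
  get open(): boolean { return this.failures >= this.threshold; }
  recordSuccess(): void { this.failures = 0; }
  recordFailure(): void { this.failures++; }
}

async function withRetry<T>(
  fn: () => Promise<T>,
  breaker: CircuitBreaker,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (breaker.open) throw new Error("circuit open: failing fast");
    try {
      const result = await fn();
      breaker.recordSuccess();
      return result;
    } catch (err) {
      breaker.recordFailure();
      if (attempt === maxAttempts - 1) throw err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw new Error("unreachable");
}
```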
Security & Quality
- Comprehensive Testing: 221 tests covering unit, integration, and regression scenarios
- Input Validation: Zod schemas prevent injection attacks
- No Secret Leakage: Generic session descriptions only (file permissions 0o600)
- No ReDoS: Bounded regex patterns prevent catastrophic backtracking
- Type Safety: Strict TypeScript with comprehensive error handling
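The "No ReDoS" point can be illustrated with a bounded pattern. The regex below is an example shape, not one of the gateway's actual patterns: nested unbounded quantifiers such as `(a+)+$` can backtrack exponentially, while explicit repetition limits keep matching cheap:

```typescript
// Bounded: character class plus an explicit {1,64} repetition limit means the
// engine never explores an exponential number of backtracking paths.
const boundedSessionId = /^[A-Za-z0-9-]{1,64}$/;

function isValidSessionId(id: string): boolean {
  return boundedSessionId.test(id);
}
```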
Prerequisites
Before using this gateway, you need to install the CLI tools you want to use:
Claude Code CLI
# Installation instructions for Claude Code
# Visit: https://docs.anthropic.com/claude-code
npm install -g @anthropic-ai/claude-code
Codex CLI
npm install -g @openai/codex
codex login
Gemini CLI
npm install -g @google/gemini-cli
# Or: https://github.com/google-gemini/gemini-cli
Installation
As an MCP server (npm)
npm install -g llm-cli-gateway
Or use directly with npx:
{
"mcpServers": {
"llm-gateway": {
"command": "npx",
"args": ["-y", "llm-cli-gateway"]
}
}
}
From source
git clone https://github.com/verivus-oss/llm-cli-gateway.git
cd llm-cli-gateway
npm install
npm run build
Usage
As an MCP Server
Add to your MCP client configuration (e.g., Claude Desktop):
{
"mcpServers": {
"llm-cli-gateway": {
"command": "node",
"args": ["/path/to/llm-cli-gateway/dist/index.js"]
}
}
}
Available Tools
LLM Request Tools
claude_request
Execute a Claude Code request with optional session management.
Parameters:
- prompt (string, required): The prompt to send (1-100,000 chars)
- model (string, optional): Model name or alias (use list_models for available values; supports latest)
- outputFormat (string, optional): Output format ("text" or "json"), default: "text"
- sessionId (string, optional): Specific session ID to use
- continueSession (boolean, optional): Continue the active session
- createNewSession (boolean, optional): Always create a new session
- allowedTools (string[], optional): Restrict Claude tools to this allow-list
- disallowedTools (string[], optional): Explicitly deny listed Claude tools
- dangerouslySkipPermissions (boolean, optional): Request CLI-side permission bypass (legacy mode only)
- approvalStrategy (string, optional): "legacy" (default) or "mcp_managed"
- approvalPolicy (string, optional): "strict", "balanced", or "permissive"
- mcpServers (string[], optional): Claude MCP servers to expose (default: ["sqry", "exa", "ref_tools"]; "trstr" available as opt-in)
- strictMcpConfig (boolean, optional): Require Claude to use only the supplied MCP config, default: true (request fails if any requested server is unavailable)
- optimizePrompt (boolean, optional): Optimize prompt for token efficiency (44% reduction), default: false
- optimizeResponse (boolean, optional): Optimize response for token efficiency (37% reduction), default: false
- correlationId (string, optional): Request trace ID (auto-generated if omitted)
Response extras:
- approval: Approval decision record when approvalStrategy="mcp_managed"
- mcpServers: Requested/enabled/missing MCP servers for this call
Example:
{
"prompt": "Write a Python function to calculate fibonacci numbers",
"model": "sonnet",
"continueSession": true,
"optimizePrompt": true,
"optimizeResponse": true
}
codex_request
Execute a Codex request with optional session tracking.
Parameters:
- prompt (string, required): The prompt to send (1-100,000 chars)
- model (string, optional): Model name or alias (use list_models for available values; supports latest; recommended: gpt-5.4)
- fullAuto (boolean, optional): Enable full-auto mode, default: false
- dangerouslyBypassApprovalsAndSandbox (boolean, optional): Request Codex bypass flags
- approvalStrategy (string, optional): "legacy" (default) or "mcp_managed"
- approvalPolicy (string, optional): "strict", "balanced", or "permissive"
- mcpServers (string[], optional): MCP servers expected for Codex execution context
- sessionId (string, optional): Session identifier for tracking
- createNewSession (boolean, optional): Always create a new session
- optimizePrompt (boolean, optional): Optimize prompt for token efficiency, default: false
- optimizeResponse (boolean, optional): Optimize response for token efficiency, default: false
- correlationId (string, optional): Request trace ID (auto-generated if omitted)
- idleTimeoutMs (number, optional): Kill a stuck Codex process after output inactivity; 30,000 to 3,600,000 ms
Response extras:
- approval: Approval decision record when approvalStrategy="mcp_managed"
- mcpServers: Requested MCP servers for this call
Example:
{
"prompt": "Create a REST API endpoint",
"model": "gpt-5.4",
"fullAuto": true,
"optimizePrompt": true
}
gemini_request
Execute a Gemini CLI request with session support.
Parameters:
- prompt (string, required): The prompt to send (1-100,000 chars)
- model (string, optional): Model name or alias (use list_models for available values; supports latest, pro, flash)
- sessionId (string, optional): Session ID to resume
- resumeLatest (boolean, optional): Resume the latest session automatically
- createNewSession (boolean, optional): Always create a new session
- approvalMode (string, optional): Gemini approval mode (default | auto_edit | yolo) in legacy mode
- approvalStrategy (string, optional): "legacy" (default) or "mcp_managed"
- approvalPolicy (string, optional): "strict", "balanced", or "permissive"
- mcpServers (string[], optional): Allowed Gemini MCP server names
- allowedTools (string[], optional): Restrict Gemini tools to this allow-list
- includeDirs (string[], optional): Additional workspace directories for Gemini
- optimizePrompt (boolean, optional): Optimize prompt for token efficiency, default: false
- optimizeResponse (boolean, optional): Optimize response for token efficiency, default: false
- correlationId (string, optional): Request trace ID (auto-generated if omitted)
Response extras:
- approval: Approval decision record when approvalStrategy="mcp_managed"
- mcpServers: Requested MCP servers for this call
Example:
{
"prompt": "Explain quantum computing",
"model": "latest",
"resumeLatest": true,
"optimizePrompt": true
}
claude_request_async / codex_request_async
Start a long-running Claude or Codex request without waiting for completion in the same MCP call.
Use this flow when analysis/runtime can exceed client tool-call limits:
- Start the job with *_request_async
- Poll with llm_job_status
- Fetch output with llm_job_result
- Optionally stop with llm_job_cancel
Async request tools accept the same approval strategy fields as their sync variants:
- approvalStrategy: "legacy" (default) or "mcp_managed"
- approvalPolicy: "strict", "balanced", or "permissive" override
- mcpServers: Requested MCP servers (sqry, exa, ref_tools, trstr)
- claude_request_async also supports strictMcpConfig and fails fast when requested servers are unavailable
llm_job_status
Return lifecycle status (running, completed, failed, canceled) and metadata for an async job.
llm_job_result
Return captured stdout/stderr for an async job (with configurable max chars per stream).
llm_job_cancel
Cancel a running async job.
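From an MCP client's perspective, the start, poll, and fetch steps can be sketched as below. `callTool` stands in for your MCP client library's tool-invocation function, and the `jobId`/`state` response field names are assumptions for illustration; consult the actual tool responses for the exact shape:

```typescript
// Run a long job via the async tools: start, poll until done, fetch output.
async function runLongJob(
  callTool: (name: string, args: Record<string, unknown>) => Promise<any>,
  pollMs = 2000,
) {
  // 1. Start the job without waiting for completion
  const { jobId } = await callTool("claude_request_async", {
    prompt: "Analyze this large repository",
  });

  // 2. Poll llm_job_status until the job leaves the running state
  let status;
  do {
    await new Promise((r) => setTimeout(r, pollMs));
    status = await callTool("llm_job_status", { jobId });
  } while (status.state === "running");

  // 3. Fetch captured stdout/stderr (or call llm_job_cancel to stop early)
  return callTool("llm_job_result", { jobId });
}
```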
approval_list
List recent MCP-managed approval decisions recorded by the gateway.
Parameters:
- limit (number, optional): Max records (1-500), default: 50
- cli (string, optional): Filter by "claude", "codex", or "gemini"
Approval records are persisted to ~/.llm-cli-gateway/approvals.jsonl.
Session Management Tools
session_create
Create a new session for a specific CLI.
Parameters:
- cli (string, required): CLI to create the session for ("claude", "codex", "gemini")
- description (string, optional): Description for the session
- setAsActive (boolean, optional): Set as active session, default: true
Example:
{
"cli": "claude",
"description": "Code review session",
"setAsActive": true
}
session_list
List all sessions, optionally filtered by CLI.
Parameters:
- cli (string, optional): Filter by CLI ("claude", "codex", "gemini")
Response includes:
- Total session count
- Session details (ID, CLI, description, timestamps, active status)
- Active session IDs for each CLI
session_set_active
Set the active session for a specific CLI.
Parameters:
- cli (string, required): CLI to set the active session for
- sessionId (string, required): Session ID to activate (or null to clear)
session_get
Retrieve details for a specific session.
Parameters:
- sessionId (string, required): Session ID to retrieve
session_delete
Delete a specific session.
Parameters:
- sessionId (string, required): Session ID to delete
session_clear_all
Clear all sessions, optionally for a specific CLI.
Parameters:
- cli (string, optional): Clear sessions for a specific CLI only
Utility Tools
list_models
List available models for each CLI.
Parameters:
- cli (string, optional): Specific CLI to list models for ("claude", "codex", "gemini")
Response includes:
- Model names and descriptions
- Best use cases for each model
- CLI-specific information
Session Management
How It Works
- Automatic Session Tracking: By default, the gateway automatically tracks sessions for each CLI
- Active Sessions: Each CLI can have one active session that's used by default
- Persistent Storage: Sessions are stored in ~/.llm-cli-gateway/sessions.json
- Context Reuse: Using sessions maintains conversation history and context
Session Workflow
// 1. Create a new session
await callTool("session_create", {
cli: "claude",
description: "Debugging session",
setAsActive: true
});
// 2. Make requests (automatically uses active session)
await callTool("claude_request", {
prompt: "What's the bug in this code?",
// sessionId is automatically used
});
// 3. Continue the conversation
await callTool("claude_request", {
prompt: "Can you explain that fix in more detail?",
continueSession: true
});
// 4. List all sessions
await callTool("session_list", { cli: "claude" });
// 5. Switch to a different session
await callTool("session_set_active", {
cli: "claude",
sessionId: "some-other-session-id"
});
// 6. Delete when done
await callTool("session_delete", {
sessionId: "session-id-to-delete"
});
Configuration
Environment Variables
- DEBUG: Enable debug logging (set to any value), e.g. DEBUG=1 node dist/index.js
- LLM_GATEWAY_APPROVAL_POLICY: Default approval policy when the request does not pass approvalPolicy (strict, balanced, or permissive), e.g. LLM_GATEWAY_APPROVAL_POLICY=strict node dist/index.js
CLI-Specific Settings
Each CLI can be configured through its own configuration files:
- Claude Code: ~/.claude/config.json
- Codex: ~/.codex/config.toml
- Gemini: ~/.gemini/config.json
Development
Project Structure
llm-cli-gateway/
├── src/
│ ├── index.ts # Main MCP server and tool definitions
│ ├── executor.ts # CLI execution with timeout support
│ ├── session-manager.ts # Session management logic
│ └── __tests__/
│ ├── executor.test.ts # Unit tests for executor
│ └── integration.test.ts # Integration tests
├── dist/ # Compiled JavaScript
├── package.json
├── tsconfig.json
└── vitest.config.ts
Running Tests
# Run all tests
npm test
# Run unit tests only
npm run test:unit
# Run integration tests only
npm run test:integration
# Watch mode
npm run test:watch
Building
npm run build
Starting the Server
npm start
Error Handling
The gateway provides detailed error messages for common issues:
CLI Not Found
Error executing claude CLI:
spawn claude ENOENT
The 'claude' command was not found. Please ensure claude CLI is installed and in your PATH.
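The translation from a raw spawn ENOENT into the friendly message above can be sketched like this. The `runCli` helper name is illustrative, not the gateway's actual executor:

```typescript
import { spawn } from "node:child_process";

// Spawn a CLI and convert ENOENT into an actionable "not found" error.
function runCli(command: string, args: string[]): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let out = "";
    child.stdout.on("data", (chunk) => (out += chunk));
    child.on("error", (err: NodeJS.ErrnoException) => {
      if (err.code === "ENOENT") {
        // The binary is missing or not on PATH
        reject(new Error(
          `The '${command}' command was not found. ` +
          `Please ensure ${command} CLI is installed and in your PATH.`,
        ));
      } else {
        reject(err);
      }
    });
    child.on("close", () => resolve(out));
  });
}
```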
External Timeout / Legacy Timeout Option
Error executing codex CLI: Command timed out
Process timed out after 120000ms
Invalid Parameters
Prompt cannot be empty
Prompt too long (max 100k chars)
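The gateway performs this validation with Zod schemas (see Security & Quality); the plain-TypeScript sketch below only mirrors the documented prompt constraint of 1 to 100,000 characters, and is not the actual schema:

```typescript
const MAX_PROMPT_CHARS = 100_000;

// Reject empty or oversized prompts before anything reaches a CLI.
function validatePrompt(prompt: string): string {
  if (prompt.length === 0) throw new Error("Prompt cannot be empty");
  if (prompt.length > MAX_PROMPT_CHARS) {
    throw new Error("Prompt too long (max 100k chars)");
  }
  return prompt;
}
```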
Logging
Logs are written to stderr (stdout is reserved for MCP protocol):
[INFO] 2026-01-24T05:00:00.000Z - Starting llm-cli-gateway MCP server
[INFO] 2026-01-24T05:00:01.000Z - claude_request invoked with model=sonnet, prompt length=150
[INFO] 2026-01-24T05:00:05.000Z - claude_request completed successfully in 4523ms, response length=2048
[ERROR] 2026-01-24T05:00:10.000Z - codex CLI execution failed: spawn codex ENOENT
Enable debug logging:
DEBUG=1 node dist/index.js
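The stderr-only convention matters because stdout carries MCP protocol frames; any stray print there corrupts the stream. A minimal sketch of a logger following that rule (the exact log format is illustrative):

```typescript
// All diagnostics go to stderr, never stdout (stdout is the MCP channel).
function log(level: "INFO" | "ERROR", message: string): void {
  process.stderr.write(`[${level}] ${new Date().toISOString()} - ${message}\n`);
}

// Debug lines appear only when the DEBUG env var is set to any value.
function debug(message: string): void {
  if (process.env.DEBUG) log("INFO", message);
}
```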
Troubleshooting
CLIs Not Found
Make sure the CLIs are installed and in your PATH:
which claude
which codex
which gemini
The gateway extends PATH to include common locations:
- ~/.local/bin
- /usr/local/bin
- /usr/bin
- All ~/.nvm/versions/node/*/bin directories
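A sketch of how that PATH extension could work, including the NVM path caching mentioned under Reliability so the directory scan happens once rather than per request. Function and variable names are illustrative:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

let cachedPath: string | null = null;

// Build PATH extended with common CLI locations; cache the result so the
// nvm directory scan (filesystem I/O) runs only on the first call.
function extendedPath(): string {
  if (cachedPath !== null) return cachedPath;
  const extra = [
    path.join(os.homedir(), ".local", "bin"),
    "/usr/local/bin",
    "/usr/bin",
  ];
  const nvmRoot = path.join(os.homedir(), ".nvm", "versions", "node");
  if (fs.existsSync(nvmRoot)) {
    for (const version of fs.readdirSync(nvmRoot)) {
      extra.push(path.join(nvmRoot, version, "bin"));
    }
  }
  cachedPath = [process.env.PATH ?? "", ...extra]
    .filter(Boolean)
    .join(path.delimiter);
  return cachedPath;
}
```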
Permission Errors
If you encounter permission errors, ensure the CLI tools have proper permissions:
chmod +x $(which claude)
chmod +x $(which codex)
chmod +x $(which gemini)
Session Storage Issues
Sessions are stored in ~/.llm-cli-gateway/sessions.json. If you encounter issues:
- Check file permissions: ls -la ~/.llm-cli-gateway/
- Reset sessions: rm ~/.llm-cli-gateway/sessions.json
- Or inspect the session file: cat ~/.llm-cli-gateway/sessions.json
Performance
Timeouts
The gateway does not enforce a default execution timeout for LLM CLI requests.
If your MCP client/runtime enforces per-tool-call deadlines, use async tools (*_request_async + llm_job_status/llm_job_result) so long-running jobs can complete outside a single call window.
Concurrent Requests
The gateway supports concurrent requests across different CLIs. Each request spawns a separate process.
Security Considerations
- Input Validation: All prompts are validated (min 1 char, max 100k chars)
- Command Execution: Uses spawn with separate arguments (not shell execution)
- No Eval: No dynamic code evaluation
- Sandboxing: Consider running in containers for production use
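The no-shell execution point can be demonstrated directly: when the command and each argument are passed as separate array entries, prompt content is never interpreted by a shell. The wrapper name is illustrative:

```typescript
import { spawn } from "node:child_process";

// shell is NOT enabled, so metacharacters like $( ) in arguments stay literal
// strings instead of being executed as commands.
function safeSpawn(command: string, args: string[]) {
  return spawn(command, args, { shell: false });
}
```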
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: npm test
- Build: npm run build
- Submit a pull request
License
MIT. See LICENSE for details.
Support
For issues and questions:
- Open an issue on GitHub
- Check existing issues and documentation
- Review CLI-specific documentation for CLI-related problems
Changelog
See CHANGELOG.md for detailed release history.