llm-cli-gateway

Unified MCP server providing access to Claude Code, Codex, Gemini, and Grok CLIs through a single gateway. Features multi-LLM orchestration, persistent session management, async job execution with polling, approval gates, retry with circuit breakers, and token optimization. Install: npx -y llm-cli-gateway

"Without consultation, plans are frustrated, but with many counselors they succeed." — Proverbs 15:22 (LSB)

A Model Context Protocol (MCP) server providing unified access to Claude Code, Codex, Gemini, and Grok CLIs with session management, retry logic, and async job orchestration.

Features

Core Capabilities

Multi-LLM Orchestration: Unified interface for Claude Code, Codex, Gemini, and Grok CLIs
Session Management: Track and resume conversations across all CLIs with persistent storage
Token Optimization: Opt-in prompt and response optimization (reported reductions of 44% on prompts and 37% on responses)
Correlation ID Tracking: Full request tracing across all LLM interactions
Cross-Tool Collaboration: LLMs can use each other via MCP (validated through dogfooding)

Observability

SQLite Flight Recorder: Every request/response logged to ~/.llm-cli-gateway/logs.db with correlation IDs, token usage, duration, retry counts, and circuit breaker state. Browse with Datasette: datasette ~/.llm-cli-gateway/logs.db
Structured Metadata: Tool responses include machine-readable structuredContent (model, cli, correlationId, sessionId, durationMs, token counts)

Reliability & Performance

Retry Logic: Exponential backoff with circuit breaker for transient failures
Atomic File Writes: Process-specific temp files with fsync for data integrity
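Memory Limits: 50MB cap on captured CLI output guards against memory exhaustion from runaway or malicious output
NVM Path Caching: Resolved NVM bin directories are cached, so PATH discovery does not hit the filesystem on every request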
Long-Running Jobs: Non-time-bound async execution via *_request_async + polling tools
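
To make the retry behaviour above concrete, here is a minimal sketch of exponential backoff combined with a circuit breaker. It is not the gateway's implementation: the class name, thresholds, delays, and the `withRetry` helper are all assumptions chosen for illustration.

```typescript
// Illustrative sketch only: names and thresholds are assumptions, not the gateway's internals.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  state: BreakerState = "closed";

  constructor(
    private readonly failureThreshold = 3,
    private readonly resetAfterMs = 30_000,
    private readonly now: () => number = Date.now,
  ) {}

  /** Returns true if a call may proceed right now. */
  canAttempt(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open"; // allow a single probe call after the cool-down
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed probe, or too many consecutive failures, opens the circuit.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}

/** Delay before retry number `attempt` (0-based): base * 2^attempt, capped. */
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(
  fn: () => Promise<T>,
  breaker: CircuitBreaker,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (!breaker.canAttempt()) throw new Error("circuit open: skipping call");
    try {
      const result = await fn();
      breaker.recordSuccess();
      return result;
    } catch (err) {
      lastError = err;
      breaker.recordFailure();
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

The breaker takes an injectable clock so the open, half-open, and closed transitions can be exercised without real waiting.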

Security & Quality

Comprehensive Testing: 284 unit, integration, and regression tests, including real CLI execution
Input Validation: Zod schemas prevent injection attacks
No Secret Leakage: Generic session descriptions only (file permissions 0o600)
No ReDoS: Bounded regex patterns prevent catastrophic backtracking
Type Safety: Strict TypeScript with comprehensive error handling

Prerequisites

Before using this gateway, you need to install the CLI tools you want to use:

Claude Code CLI

# Installation instructions for Claude Code
# Visit: https://docs.anthropic.com/claude-code
npm install -g @anthropic-ai/claude-code

Codex CLI

npm install -g @openai/codex
codex login

Gemini CLI

npm install -g @google/gemini-cli
# Or: https://github.com/google-gemini/gemini-cli

Grok CLI (xAI)

npm install -g grok-build
grok login   # OAuth flow, or set GROK_CODE_XAI_API_KEY
# Docs: https://docs.x.ai/build/cli

Installation

As an MCP server (npm)

npm install -g llm-cli-gateway

Or use directly with npx:

{
  "mcpServers": {
    "llm-gateway": {
      "command": "npx",
      "args": ["-y", "llm-cli-gateway"]
    }
  }
}

From source

git clone https://github.com/verivus-oss/llm-cli-gateway.git
cd llm-cli-gateway
npm install
npm run build

Usage

As an MCP Server

Add to your MCP client configuration (e.g., Claude Desktop):

{
  "mcpServers": {
    "llm-cli-gateway": {
      "command": "node",
      "args": ["/path/to/llm-cli-gateway/dist/index.js"]
    }
  }
}

Available Tools

LLM Request Tools

`claude_request`

Execute a Claude Code request with optional session management.

Parameters:

prompt (string, required): The prompt to send (1-100,000 chars)
model (string, optional): Model name or alias (use list_models for available values; supports latest)
outputFormat (string, optional): Output format ("text" or "json"), default: "text"
sessionId (string, optional): Specific session ID to use
continueSession (boolean, optional): Continue the active session
createNewSession (boolean, optional): Always create a new session
allowedTools (string[], optional): Restrict Claude tools to this allow-list
disallowedTools (string[], optional): Explicitly deny listed Claude tools
dangerouslySkipPermissions (boolean, optional): Request CLI-side permission bypass (legacy mode only)
approvalStrategy (string, optional): "legacy" (default) or "mcp_managed"
approvalPolicy (string, optional): "strict", "balanced", or "permissive"
mcpServers (string[], optional): Claude MCP servers to expose (default: ["sqry","exa","ref_tools"]; "trstr" available as opt-in)
strictMcpConfig (boolean, optional): Require Claude to use only supplied MCP config, default: true (request fails if any requested server is unavailable)
optimizePrompt (boolean, optional): Optimize prompt for token efficiency (44% reduction), default: false
optimizeResponse (boolean, optional): Optimize response for token efficiency (37% reduction), default: false
correlationId (string, optional): Request trace ID (auto-generated if omitted)

Response extras:

approval: Approval decision record when approvalStrategy="mcp_managed"
mcpServers: Requested/enabled/missing MCP servers for this call

Example:

{
  "prompt": "Write a Python function to calculate fibonacci numbers",
  "model": "sonnet",
  "continueSession": true,
  "optimizePrompt": true,
  "optimizeResponse": true
}

`codex_request`

Execute a Codex request with optional session tracking.

Parameters:

prompt (string, required): The prompt to send (1-100,000 chars)
model (string, optional): Model name or alias (use list_models for available values; supports latest, recommended: gpt-5.4)
fullAuto (boolean, optional): Enable full-auto mode, default: false
dangerouslyBypassApprovalsAndSandbox (boolean, optional): Request Codex bypass flags
approvalStrategy (string, optional): "legacy" (default) or "mcp_managed"
approvalPolicy (string, optional): "strict", "balanced", or "permissive"
mcpServers (string[], optional): MCP servers expected for Codex execution context
sessionId (string, optional): Session identifier for tracking
createNewSession (boolean, optional): Always create a new session
optimizePrompt (boolean, optional): Optimize prompt for token efficiency, default: false
optimizeResponse (boolean, optional): Optimize response for token efficiency, default: false
correlationId (string, optional): Request trace ID (auto-generated if omitted)
idleTimeoutMs (number, optional): Kill a stuck Codex process after output inactivity; 30,000 to 3,600,000 ms
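
As a rough illustration of the idleTimeoutMs semantics, the sketch below tracks the time since the last output and reports when a process should be considered stuck. The `IdleWatchdog` class and its method names are hypothetical; only the 30,000 to 3,600,000 ms range comes from the parameter description above.

```typescript
// Hypothetical helper: shows the idle-timeout idea, not the gateway's actual code.
const MIN_IDLE_MS = 30_000;
const MAX_IDLE_MS = 3_600_000;

class IdleWatchdog {
  private lastOutputAt: number;

  constructor(
    private readonly idleTimeoutMs: number,
    private readonly now: () => number = Date.now,
  ) {
    if (idleTimeoutMs < MIN_IDLE_MS || idleTimeoutMs > MAX_IDLE_MS) {
      throw new RangeError(`idleTimeoutMs must be between ${MIN_IDLE_MS} and ${MAX_IDLE_MS}`);
    }
    this.lastOutputAt = now();
  }

  /** Call whenever the child process writes to stdout or stderr. */
  noteOutput(): void {
    this.lastOutputAt = this.now();
  }

  /** True once no output has been seen for the full idle window. */
  isStuck(): boolean {
    return this.now() - this.lastOutputAt >= this.idleTimeoutMs;
  }
}
```

A caller would check `isStuck()` on a timer and kill the child process when it returns true; any output resets the window, so a slow but active job is left alone.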

Response extras:

approval: Approval decision record when approvalStrategy="mcp_managed"
mcpServers: Requested MCP servers for this call

Example:

{
  "prompt": "Create a REST API endpoint",
  "model": "gpt-5.4",
  "fullAuto": true,
  "optimizePrompt": true
}

`gemini_request`

Execute a Gemini CLI request with session support.

Parameters:

prompt (string, required): The prompt to send (1-100,000 chars)
model (string, optional): Model name or alias (use list_models for available values; supports latest, pro, flash)
sessionId (string, optional): Session ID to resume
resumeLatest (boolean, optional): Resume the latest session automatically
createNewSession (boolean, optional): Always create a new session
approvalMode (string, optional): Gemini approval mode (default|auto_edit|yolo) in legacy mode
approvalStrategy (string, optional): "legacy" (default) or "mcp_managed"
approvalPolicy (string, optional): "strict", "balanced", or "permissive"
mcpServers (string[], optional): Allowed Gemini MCP server names
allowedTools (string[], optional): Restrict Gemini tools to this allow-list
includeDirs (string[], optional): Additional workspace directories for Gemini
optimizePrompt (boolean, optional): Optimize prompt for token efficiency, default: false
optimizeResponse (boolean, optional): Optimize response for token efficiency, default: false
correlationId (string, optional): Request trace ID (auto-generated if omitted)

Response extras:

approval: Approval decision record when approvalStrategy="mcp_managed"
mcpServers: Requested MCP servers for this call

Example:

{
  "prompt": "Explain quantum computing",
  "model": "latest",
  "resumeLatest": true,
  "optimizePrompt": true
}

`grok_request`

Execute a Grok CLI (xAI) request with session support.

Parameters:

prompt (string, required): The prompt to send (1-100,000 chars)
model (string, optional): Model name or alias (e.g. grok-build, latest)
outputFormat (string, optional): "plain" (default), "json", or "streaming-json"
sessionId (string, optional): Session ID to resume (--resume <id>)
resumeLatest (boolean, optional): Resume the most recent session in the current cwd (--continue)
createNewSession (boolean, optional): Always create a new session
alwaysApprove (boolean, optional): Auto-approve all tool executions (--always-approve) in legacy mode
permissionMode (string, optional): default|acceptEdits|auto|dontAsk|bypassPermissions|plan
effort (string, optional): low|medium|high|xhigh|max
reasoningEffort (string, optional): Reasoning effort for reasoning models
approvalStrategy (string, optional): "legacy" (default) or "mcp_managed"
approvalPolicy (string, optional): "strict", "balanced", or "permissive"
mcpServers (string[], optional): MCP server names tracked for approvals (Grok manages its own MCP config via grok mcp)
allowedTools (string[], optional): Allowed built-in tools (passed as --tools comma list)
disallowedTools (string[], optional): Disallowed built-in tools (passed as --disallowed-tools comma list)
optimizePrompt (boolean, optional): Optimize prompt for token efficiency, default: false
optimizeResponse (boolean, optional): Optimize response for token efficiency, default: false
correlationId (string, optional): Request trace ID (auto-generated if omitted)

Example:

{
  "prompt": "Summarize the latest commit message in 1 sentence",
  "model": "grok-build",
  "effort": "low"
}

Durable job results & automatic dedup

Every async job is persisted to a jobs table in ~/.llm-cli-gateway/logs.db as it transitions through running → completed/failed/canceled. This makes the gateway a durable collection layer:

Re-issuing a request is safe. Identical *_request / *_request_async calls within the dedup window (default 1 hour) short-circuit onto the existing running or completed job — the caller gets back the same job ID instead of starting a duplicate run. This directly fixes the "agent times out polling, re-issues, and the whole job starts over" failure mode.
llm_job_status and llm_job_result work across gateway restarts. Job rows live for 30 days by default; callers can fetch results long after the in-memory cache has evicted them.
Jobs running at shutdown are marked orphaned on the next gateway boot (the detached child can't be reattached to). Their captured partial output remains readable.
Pass forceRefresh: true on any request tool to bypass dedup and force a fresh CLI run.
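
One way to picture the dedup behaviour described above is a lookup keyed on a hash of the tool name and its arguments. The sketch below is an assumption about how such a key could be built; the `JobRecord` shape, the hashing scheme, and the function names are illustrative and not taken from the gateway source.

```typescript
import { createHash } from "node:crypto";

// Assumption: a job is identified by its tool name plus its canonicalized arguments.
interface JobRecord {
  jobId: string;
  key: string;
  startedAt: number; // epoch milliseconds
  status: "running" | "completed" | "failed" | "canceled";
}

/** Serialize with sorted object keys so argument order does not change the hash. */
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

function dedupKey(tool: string, args: Record<string, unknown>): string {
  // forceRefresh controls dedup itself, so it is left out of the key.
  const rest = Object.fromEntries(Object.entries(args).filter(([k]) => k !== "forceRefresh"));
  return createHash("sha256").update(`${tool}\n${canonicalize(rest)}`).digest("hex");
}

function findReusableJob(
  jobs: JobRecord[],
  tool: string,
  args: Record<string, unknown>,
  now: number,
  windowMs = 3_600_000, // mirrors the documented 1 hour default
): JobRecord | undefined {
  if (windowMs === 0 || args["forceRefresh"] === true) return undefined;
  const key = dedupKey(tool, args);
  return jobs.find(
    (job) =>
      job.key === key &&
      (job.status === "running" || job.status === "completed") &&
      now - job.startedAt <= windowMs,
  );
}
```

Only running or completed jobs are reused here, matching the description above; failed and canceled jobs always trigger a fresh run.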

Environment variables:

LLM_GATEWAY_JOB_RETENTION_DAYS — how long completed jobs stay queryable. Default 30.
LLM_GATEWAY_DEDUP_WINDOW_MS — how recent an existing job must be to dedup against. Default 3600000 (1 hour). Set 0 to disable dedup.
LLM_GATEWAY_JOBS_DB — override the sqlite path. Defaults to the value of LLM_GATEWAY_LOGS_DB, then ~/.llm-cli-gateway/logs.db. Set to none to disable durability entirely (in-memory only).
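
The precedence among these variables can be sketched as a small resolver. This is an illustration under stated assumptions rather than the gateway's code: the function names are hypothetical, and the choice to fall back to the default path when LLM_GATEWAY_LOGS_DB is empty or none is a guess, since the text above does not specify that case.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

/** Returns the jobs database path, or null when durability is disabled. */
function resolveJobsDbPath(env: Env, home: string = homedir()): string | null {
  const explicit = env["LLM_GATEWAY_JOBS_DB"];
  if (explicit !== undefined && explicit !== "") {
    return explicit === "none" ? null : explicit; // "none" means in-memory only
  }
  const logsDb = env["LLM_GATEWAY_LOGS_DB"];
  // Assumption: a disabled or empty logs DB does not disable jobs; use the default path instead.
  if (logsDb !== undefined && logsDb !== "" && logsDb !== "none") return logsDb;
  return join(home, ".llm-cli-gateway", "logs.db");
}

/** Reads a non-negative integer variable, falling back when unset or malformed. */
function readIntEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}
```

For example, `readIntEnv(process.env, "LLM_GATEWAY_DEDUP_WINDOW_MS", 3_600_000)` returns 0 when the variable is set to 0, which corresponds to the documented way of disabling dedup.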

`claude_request_async` / `codex_request_async` / `gemini_request_async` / `grok_request_async`

Start a long-running Claude, Codex, Gemini, or Grok request without waiting for completion in the same MCP call.

Use this flow when analysis/runtime can exceed client tool-call limits:

Start job with *_request_async
Poll with llm_job_status
Fetch output with llm_job_result
Optionally stop with llm_job_cancel
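
The four steps above can be sketched as a polling loop. The `callTool` function stands in for whatever your MCP client exposes, and the `jobId` and `status` field names are assumptions, since the exact request and response shapes are not documented here.

```typescript
// Hypothetical shapes: field names such as `jobId` and `status` are assumptions.
type JobStatus = "running" | "completed" | "failed" | "canceled";
type CallTool = (
  name: string,
  args: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

async function runAsyncJob(
  callTool: CallTool,
  tool: string,
  args: Record<string, unknown>,
  pollIntervalMs = 2_000,
  maxPolls = 1_000,
): Promise<{ status: JobStatus; result?: Record<string, unknown> }> {
  // 1. Start the job; assume the response carries a `jobId` field.
  const started = await callTool(tool, args);
  const jobId = String(started["jobId"]);

  for (let i = 0; i < maxPolls; i++) {
    // 2. Poll; assume the response carries a `status` field.
    const { status } = await callTool("llm_job_status", { jobId });
    if (status === "completed") {
      // 3. Fetch captured output once the job is done.
      return { status, result: await callTool("llm_job_result", { jobId }) };
    }
    if (status === "failed" || status === "canceled") return { status };
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }

  // 4. Stop the job if it never finished within our polling budget.
  await callTool("llm_job_cancel", { jobId });
  return { status: "canceled" };
}
```

Because each poll is its own short MCP call, the loop stays within per-call deadlines even when the underlying job runs for a long time.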

Async request tools accept the same approval strategy fields as their sync variants:

approvalStrategy: "legacy" (default) or "mcp_managed"
approvalPolicy: "strict"|"balanced"|"permissive" override
mcpServers: Requested MCP servers (sqry, exa, ref_tools, trstr)
claude_request_async also supports strictMcpConfig and fails fast when requested servers are unavailable

`llm_job_status`

Return lifecycle status (running, completed, failed, canceled) and metadata for an async job.

`llm_job_result`

Return captured stdout/stderr for an async job (with configurable max chars per stream).

`llm_job_cancel`

Cancel a running async job.

`approval_list`

List recent MCP-managed approval decisions recorded by the gateway.

Parameters:

limit (number, optional): Max records (1-500), default: 50
cli (string, optional): Filter by "claude", "codex", or "gemini"

Approval records are persisted to ~/.llm-cli-gateway/approvals.jsonl.

Session Management Tools

`session_create`

Create a new session for a specific CLI.

Parameters:

cli (string, required): CLI to create session for ("claude", "codex", "gemini", "grok")
description (string, optional): Description for the session
setAsActive (boolean, optional): Set as active session, default: true

Example:

{
  "cli": "claude",
  "description": "Code review session",
  "setAsActive": true
}

`session_list`

List all sessions, optionally filtered by CLI.

Parameters:

cli (string, optional): Filter by CLI ("claude", "codex", "gemini", "grok")

Response includes:

Total session count
Session details (ID, CLI, description, timestamps, active status)
Active session IDs for each CLI

`session_set_active`

Set the active session for a specific CLI.

Parameters:

cli (string, required): CLI to set active session for
sessionId (string or null, required): Session ID to activate, or null to clear the active session

`session_get`

Retrieve details for a specific session.

Parameters:

sessionId (string, required): Session ID to retrieve

`session_delete`

Delete a specific session.

Parameters:

sessionId (string, required): Session ID to delete

`session_clear_all`

Clear all sessions, optionally for a specific CLI.

Parameters:

cli (string, optional): Clear sessions for specific CLI only

Utility Tools

`list_models`

List available models for each CLI.

Parameters:

cli (string, optional): Specific CLI to list models for ("claude", "codex", "gemini", "grok")

Response includes:

Model names and descriptions
Best use cases for each model
CLI-specific information
defaultModel and defaultModelSource when a default is explicitly configured
modelMetadata with source/confidence (fallback, config, env, observed)
aliases and warnings when configured or when discovery degrades gracefully

The registry treats explicit configuration as authoritative. Bundled fallback models are low-confidence hints, and Gemini models observed in local session history are merged as low-confidence entries only; they do not become the default model.

Model registry environment overrides:

# Explicit defaults
CLAUDE_DEFAULT_MODEL=haiku
CODEX_DEFAULT_MODEL=<codex-model-id>
GEMINI_DEFAULT_MODEL=gemini-2.5-flash

# Additional models: comma/newline list, JSON array, or JSON object of model->description
GEMINI_MODELS='{"gemini-team-default":"Team-approved Gemini model"}'

# Aliases
GEMINI_MODEL_ALIASES='team=gemini-team-default'
LLM_GATEWAY_MODEL_ALIASES='codex.fast=gpt-5.3-codex-spark,gemini.fast=gemini-team-default'

# Deterministic config/discovery paths
CODEX_CONFIG_PATH=/path/to/config.toml
CLAUDE_SETTINGS_PATH=/path/to/settings.json
CLAUDE_SETTINGS_LOCAL_PATH=/path/to/settings.local.json
GEMINI_SETTINGS_PATH=/path/to/settings.json
GEMINI_HISTORY_ROOT=/path/to/.gemini/tmp

# Disable local model-history discovery
LLM_GATEWAY_DISABLE_MODEL_DISCOVERY=1
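
To illustrate the alias formats shown above, here is a small parsing sketch. It assumes comma-separated alias=model pairs, with gateway-wide aliases prefixed by the CLI name (as in codex.fast). The function names and the precedence of per-CLI aliases over gateway-wide ones are assumptions, not documented behaviour.

```typescript
/** Parse "alias=model,alias2=model2" into a Map, skipping malformed entries. */
function parseAliases(raw: string | undefined): Map<string, string> {
  const aliases = new Map<string, string>();
  for (const entry of (raw ?? "").split(",")) {
    const idx = entry.indexOf("=");
    if (idx <= 0) continue; // no "=" or empty alias name
    const alias = entry.slice(0, idx).trim();
    const model = entry.slice(idx + 1).trim();
    if (alias && model) aliases.set(alias, model);
  }
  return aliases;
}

/**
 * Resolve a requested model name for one CLI.
 * Assumed order: per-CLI alias, then gateway-wide "<cli>.<alias>", then the name as given.
 */
function resolveModel(
  cli: string,
  requested: string,
  perCli: Map<string, string>,
  shared: Map<string, string>,
): string {
  return perCli.get(requested) ?? shared.get(`${cli}.${requested}`) ?? requested;
}
```

With the values from the block above, `resolveModel("gemini", "team", ...)` would yield gemini-team-default, and an unknown name is passed through unchanged.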

`cli_versions`

Report installed CLI versions.

Parameters:

cli (string, optional): Specific CLI to inspect ("claude", "codex", "gemini", "grok")

`cli_upgrade`

Plan or run an upgrade for one CLI.

Parameters:

cli (string, required): CLI to upgrade ("claude", "codex", "gemini", "grok")
target (string, optional): Package tag/version/target, default: latest
dryRun (boolean, optional): Return the upgrade plan without running it, default: true
timeoutMs (number, optional): Upgrade timeout when dryRun=false

Upgrade strategies:

Claude latest: claude update
Claude explicit target: claude install <target>
Codex latest: codex update
Codex explicit target: npm install -g @openai/codex@<target>
Gemini: npm install -g @google/gemini-cli@<target>

Example dry run:

{
  "cli": "gemini",
  "target": "latest",
  "dryRun": true
}

Session Management

How It Works

Automatic Session Tracking: By default, the gateway automatically tracks sessions for each CLI
Active Sessions: Each CLI can have one active session that's used by default
Persistent Storage: Sessions are stored in ~/.llm-cli-gateway/sessions.json
Context Reuse: Using sessions maintains conversation history and context
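
Persisting sessions safely depends on the atomic write pattern listed under Reliability & Performance: write to a process-specific temp file, fsync, then rename over the target. The sketch below shows that pattern with Node's fs module. The function name and temp-file naming are assumptions, and the usage lines write to a scratch directory rather than the real sessions file.

```typescript
import {
  closeSync,
  fsyncSync,
  mkdtempSync,
  openSync,
  readFileSync,
  renameSync,
  writeSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function writeFileAtomic(targetPath: string, data: string): void {
  // The process ID in the name keeps concurrent processes from sharing a temp file.
  const tmpPath = `${targetPath}.${process.pid}.tmp`;
  const fd = openSync(tmpPath, "w", 0o600); // owner read/write only
  try {
    writeSync(fd, data);
    fsyncSync(fd); // flush file contents to disk before the rename
  } finally {
    closeSync(fd);
  }
  // The rename swaps the file in one step, so readers see old or new content, not a partial write.
  renameSync(tmpPath, targetPath);
}

// Usage: round-trip a tiny session store through a scratch directory.
const scratchDir = mkdtempSync(join(tmpdir(), "gateway-sketch-"));
const sessionsFile = join(scratchDir, "sessions.json");
writeFileAtomic(sessionsFile, JSON.stringify({ sessions: [] }));
const roundTripped = readFileSync(sessionsFile, "utf8");
```

If the process crashes before the rename, the previous sessions file is left untouched and only a stray temp file remains.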

Session Workflow

// 1. Create a new session
await callTool("session_create", {
  cli: "claude",
  description: "Debugging session",
  setAsActive: true
});

// 2. Make requests (automatically uses active session)
await callTool("claude_request", {
  prompt: "What's the bug in this code?",
  // sessionId is automatically used
});

// 3. Continue the conversation
await callTool("claude_request", {
  prompt: "Can you explain that fix in more detail?",
  continueSession: true
});

// 4. List all sessions
await callTool("session_list", { cli: "claude" });

// 5. Switch to a different session
await callTool("session_set_active", {
  cli: "claude",
  sessionId: "some-other-session-id"
});

// 6. Delete when done
await callTool("session_delete", {
  sessionId: "session-id-to-delete"
});

Configuration

Environment Variables

DEBUG: Enable debug logging (set to any value)
```
DEBUG=1 node dist/index.js
```
LLM_GATEWAY_APPROVAL_POLICY: Default approval policy when request does not pass approvalPolicy (strict, balanced, permissive)
```
LLM_GATEWAY_APPROVAL_POLICY=strict node dist/index.js
```

LLM_GATEWAY_LOGS_DB: Path to SQLite flight recorder database. Default: ~/.llm-cli-gateway/logs.db. Set to empty string or none to disable logging.

# Custom path
LLM_GATEWAY_LOGS_DB=/var/log/gateway/logs.db node dist/index.js
# Disable flight recorder
LLM_GATEWAY_LOGS_DB=none node dist/index.js

CLI-Specific Settings

Each CLI can be configured through its own configuration files:

Claude Code: ~/.claude/settings.json
Codex: ~/.codex/config.toml
Gemini: ~/.gemini/settings.json

For Fans of Simon Willison

Simon's llm tool made it trivially easy to talk to any LLM from the command line. But as AI-assisted development matures, the challenge shifts from "how do I call a model" to "how do I orchestrate multiple models reliably, and what did they actually do?"

Multiple models increase the confidence factor. When Claude writes code, Codex reviews it, and Gemini checks for bugs -- each bringing different training data and reasoning patterns -- the result is more robust than any single model alone. And often this isn't even enough. Having the models do iterative reviews is where you start getting real confidence.

Every interaction should be queryable data. Inspired by llm's SQLite logging philosophy, the gateway records every request and response to a local SQLite database. Not just prompts and responses -- retry counts, circuit breaker states, approval decisions, thinking blocks, cost estimates. Open it with Datasette and you have a complete operational picture of your AI usage:

datasette ~/.llm-cli-gateway/logs.db

The llm-gateway plugin bridges both worlds. Install it, and your existing llm workflows gain orchestration features without changing how you work:

llm install llm-gateway
llm -m gateway-claude "explain this function"

Your gateway interactions appear in both llm logs (for your personal history) and the gateway's flight recorder (for operational observability). Two audiences, one workflow.

Composability over monoliths. The gateway doesn't replace llm -- it complements it. Use llm directly when you want simplicity. Route through the gateway when you want resilience, multi-model coordination, or detailed operational telemetry. The plugin is the bridge, not the destination.

Development

Project Structure

llm-cli-gateway/
├── src/
│   ├── index.ts              # Main MCP server and tool definitions
│   ├── executor.ts           # CLI execution with timeout support
│   ├── session-manager.ts    # Session management logic
│   └── __tests__/
│       ├── executor.test.ts  # Unit tests for executor
│       └── integration.test.ts # Integration tests
├── dist/                     # Compiled JavaScript
├── package.json
├── tsconfig.json
└── vitest.config.ts

Running Tests

# Run all tests
npm test

# Run unit tests only
npm run test:unit

# Run integration tests only
npm run test:integration

# Watch mode
npm run test:watch

Building

npm run build

Starting the Server

npm start

Error Handling

The gateway provides detailed error messages for common issues:

CLI Not Found

Error executing claude CLI:
spawn claude ENOENT

The 'claude' command was not found. Please ensure claude CLI is installed and in your PATH.

External Timeout / Legacy Timeout Option

Error executing codex CLI: Command timed out
Process timed out after 120000ms

Invalid Parameters

Prompt cannot be empty
Prompt too long (max 100k chars)
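
The prompt limits behind these messages can be expressed as a short validator. The gateway itself uses Zod schemas; the plain TypeScript below is only an equivalent sketch, and the `validatePrompt` name and the non-string error message are hypothetical.

```typescript
const MAX_PROMPT_CHARS = 100_000;

/** Returns the prompt unchanged if it is a string of 1 to 100,000 characters. */
function validatePrompt(prompt: unknown): string {
  if (typeof prompt !== "string") {
    throw new TypeError("Prompt must be a string"); // hypothetical message
  }
  if (prompt.length < 1) {
    throw new RangeError("Prompt cannot be empty");
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    throw new RangeError("Prompt too long (max 100k chars)");
  }
  return prompt;
}
```

Whether a whitespace-only prompt counts as empty is not specified above; this sketch checks length only.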

Logging

Logs are written to stderr (stdout is reserved for MCP protocol):

[INFO] 2026-01-24T05:00:00.000Z - Starting llm-cli-gateway MCP server
[INFO] 2026-01-24T05:00:01.000Z - claude_request invoked with model=sonnet, prompt length=150
[INFO] 2026-01-24T05:00:05.000Z - claude_request completed successfully in 4523ms, response length=2048
[ERROR] 2026-01-24T05:00:10.000Z - codex CLI execution failed: spawn codex ENOENT

Enable debug logging:

DEBUG=1 node dist/index.js

Troubleshooting

CLIs Not Found

Make sure the CLIs are installed and in your PATH:

which claude
which codex
which gemini

The gateway extends PATH to include common locations:

~/.local/bin
/usr/local/bin
/usr/bin
All ~/.nvm/versions/node/*/bin directories
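
As an illustration of how such a PATH could be assembled, the sketch below appends the locations listed above and caches the NVM directory listing after the first lookup. The function names, the append order, and the single module-level cache are assumptions and may differ from the gateway's behaviour.

```typescript
import { readdirSync } from "node:fs";
import { homedir } from "node:os";
import { delimiter, join } from "node:path";

// Cached after the first lookup; assumes one home directory per process.
let cachedNvmBins: string[] | undefined;

function nvmBinDirs(
  home: string,
  listDir: (dir: string) => string[] = (dir) => readdirSync(dir),
): string[] {
  if (cachedNvmBins) return cachedNvmBins;
  const root = join(home, ".nvm", "versions", "node");
  try {
    cachedNvmBins = listDir(root).map((version) => join(root, version, "bin"));
  } catch {
    cachedNvmBins = []; // nvm is not installed
  }
  return cachedNvmBins;
}

function extendedPath(
  currentPath: string,
  home: string = homedir(),
  listDir?: (dir: string) => string[],
): string {
  const extra = [
    join(home, ".local", "bin"),
    "/usr/local/bin",
    "/usr/bin",
    ...nvmBinDirs(home, listDir),
  ];
  const existing = currentPath.split(delimiter).filter(Boolean);
  // Keep the caller's entries first and skip directories already present.
  const merged = [...existing, ...extra.filter((dir) => !existing.includes(dir))];
  return merged.join(delimiter);
}
```

The result would typically be passed as the `PATH` entry of the environment given to the spawned CLI process.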

Permission Errors

If you encounter permission errors, ensure the CLI tools have proper permissions:

chmod +x $(which claude)
chmod +x $(which codex)
chmod +x $(which gemini)

Session Storage Issues

Sessions are stored in ~/.llm-cli-gateway/sessions.json. If you encounter issues:

Check file permissions:

ls -la ~/.llm-cli-gateway/

Reset sessions:

rm ~/.llm-cli-gateway/sessions.json

Or inspect the session file before editing it by hand:

cat ~/.llm-cli-gateway/sessions.json

Performance

Timeouts

The gateway does not enforce a default execution timeout for LLM CLI requests.

If your MCP client/runtime enforces per-tool-call deadlines, use async tools (*_request_async + llm_job_status/llm_job_result) so long-running jobs can complete outside a single call window.

Concurrent Requests

The gateway supports concurrent requests across different CLIs. Each request spawns a separate process.

Security Considerations

Input Validation: All prompts are validated (min 1 char, max 100k chars)
Command Execution: Uses spawn with separate arguments (not shell execution)
No Eval: No dynamic code evaluation
Sandboxing: Consider running in containers for production use
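
The point about separate arguments can be demonstrated with Node's child_process module. This sketch uses `spawnSync` for brevity, where the gateway is described as using `spawn`. The `runCli` name is hypothetical, and the 50MB `maxBuffer` mirrors the output cap mentioned under Reliability & Performance.

```typescript
import { spawnSync } from "node:child_process";

function runCli(command: string, args: string[], maxOutputBytes = 50 * 1024 * 1024) {
  const result = spawnSync(command, args, {
    shell: false, // arguments go straight to the program; no shell parses them
    encoding: "utf8",
    maxBuffer: maxOutputBytes, // the call fails if output exceeds this many bytes
  });
  if (result.error) throw result.error;
  return { stdout: result.stdout, stderr: result.stderr, exitCode: result.status };
}

// Usage: shell metacharacters in the prompt arrive as literal text.
const hostilePrompt = "hello; echo injected $(whoami) | cat";
const echoed = runCli(process.execPath, [
  "-e",
  "process.stdout.write(process.argv[process.argv.length - 1])",
  hostilePrompt,
]);
```

Because no shell is involved, the semicolon, command substitution, and pipe in the prompt are never interpreted; the child process receives them as one plain argument.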

Contributing

Fork the repository
Create a feature branch
Make your changes
Run tests: npm test
Build: npm run build
Submit a pull request

License

MIT. See LICENSE for details.

Support

For issues and questions:

Open an issue on GitHub
Check existing issues and documentation
Review CLI-specific documentation for CLI-related problems

Changelog

See CHANGELOG.md for detailed release history.
