vibops-mcp Server

GPU infrastructure control plane + per-agent LLM FinOps. 74 tools. Deploy, scale, track cost per agent, enforce budgets. MIT license.

License: MIT. Python 3.11+.

The provider-agnostic MCP server for GPU infrastructure — one interface for any cloud, any cluster, any provider.

The problem

Large enterprises and CSPs managing GPU infrastructure deal with fragmentation — AWS, GCP, Azure, on-prem, neoclouds, each with their own API, dashboard, and cost model. Correlating utilisation, cost, workload type, and compliance posture across providers requires jumping between 5 tools.

The solution

vibops-mcp is a single MCP server that abstracts this complexity. One pip install, 74 tools, and your AI assistant can observe, operate, govern, and optimize your entire GPU fleet, regardless of where it runs.

  • Observe: GPU utilisation, workload breakdown, MTTR, cost estimates, live K8s deployments
  • Act: deploy models, scale deployments, run Helm/kubectl, trigger pipelines, submit Slurm jobs
  • Govern: anomaly detection, AI Act compliance, SOC 2/GDPR (RGPD)/HIPAA reports, immutable audit chain, policy management
  • FinOps: budget tracking, chargeback, spend trends, waste analysis

All operations go through your VibOps instance and are recorded in the audit log.

Installation

pip install git+https://github.com/VibOpsai/vibops-mcp.git

Configuration

You need two environment variables:

| Variable | Description |
| --- | --- |
| VIBOPS_URL | Base URL of your VibOps instance, e.g. https://vibops.example.com |
| VIBOPS_TOKEN | API token; create one in VibOps → Settings → API Tokens |
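
Because the server depends on both variables, a small preflight check can catch a missing or malformed value before an MCP client starts it. The following is a minimal sketch, not part of vibops-mcp: the function name `check_vibops_env` and the rule that the URL must be absolute http(s) are assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

REQUIRED_VARS = ("VIBOPS_URL", "VIBOPS_TOKEN")


def check_vibops_env(env=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: vibops-mcp may validate its configuration differently.
    """
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    url = env.get("VIBOPS_URL", "")
    if url:
        parsed = urlparse(url)
        # Assumption: the base URL must carry an explicit http or https scheme.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("VIBOPS_URL must be an absolute http(s) URL")
    return problems
```

Calling `check_vibops_env()` with no argument inspects the current process environment; passing a dict makes the check easy to unit-test.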

Claude Desktop

Add to ~/.config/claude/claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "vibops": {
      "command": "vibops-mcp",
      "env": {
        "VIBOPS_URL": "https://vibops.example.com",
        "VIBOPS_TOKEN": "your-token-here"
      }
    }
  }
}
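
If the config file already lists other MCP servers, the entry above has to be merged into the existing `mcpServers` object rather than replacing the file. A minimal Python sketch of that merge follows; the helper name `add_vibops_server` is hypothetical and the script is not shipped with vibops-mcp.

```python
import json
from pathlib import Path


def add_vibops_server(config_path, url, token):
    """Add or update the "vibops" entry under mcpServers, keeping other entries.

    Hypothetical helper for illustration; it only edits the JSON file.
    """
    path = Path(config_path).expanduser()
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["vibops"] = {
        "command": "vibops-mcp",
        "env": {"VIBOPS_URL": url, "VIBOPS_TOKEN": token},
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

The same function works for `.cursor/mcp.json`, since both files use the same `mcpServers` layout shown in this README.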

Cursor

Add to .cursor/mcp.json in your project root, or to the global config:

{
  "mcpServers": {
    "vibops": {
      "command": "vibops-mcp",
      "env": {
        "VIBOPS_URL": "https://vibops.example.com",
        "VIBOPS_TOKEN": "your-token-here"
      }
    }
  }
}

Claude Code (CLI)

claude mcp add vibops vibops-mcp \
  -e VIBOPS_URL=https://vibops.example.com \
  -e VIBOPS_TOKEN=your-token-here

Available tools

Observation (16 tools — read-only)

| Tool | Description |
| --- | --- |
| list_clusters | List clusters and GPU utilisation |
| list_kubectl_contexts | List available kubectl contexts |
| get_cluster_deployments | Live K8s deployment status for a cluster |
| get_cluster_rate | Get configured GPU cost rate for a cluster |
| list_jobs | List recent jobs with optional filters |
| get_job | Get job details and result |
| get_job_metrics | Job success rate, latency P50/P95/P99, error breakdown |
| get_gpu_metrics | Hourly GPU utilisation time-series |
| get_workload_breakdown | Job count by workload type |
| get_mttr | Mean Time To Resolve GPU alerts |
| get_cost_estimate | Estimated GPU spend |
| list_gateways | List registered gateways and status |
| list_alerts | List GPU alerts (open or resolved) |
| list_secrets | List secrets (names only, never values) |
| list_providers | List configured AI/GPU cloud providers |
| list_pipelines | List automation pipelines |

Actions (18 tools — write)

| Tool | Description |
| --- | --- |
| scale_deployment | Scale a K8s deployment replica count |
| deploy_model | Deploy an AI model onto a GPU cluster |
| helm_upgrade | Run helm upgrade --install |
| helm_uninstall | Uninstall a Helm release |
| run_kubectl | Run an arbitrary kubectl command |
| git_clone | Clone a git repository |
| create_secret | Store an encrypted secret |
| trigger_pipeline | Manually trigger an automation pipeline |
| slurm_get_cluster_info | Get Slurm cluster info and partition details |
| slurm_list_jobs | List Slurm jobs with optional filters |
| slurm_get_job_status | Get status of a specific Slurm job |
| slurm_get_job_output | Retrieve stdout/stderr of a completed Slurm job |
| slurm_submit_job | Submit a new Slurm job |
| slurm_cancel_job | Cancel a running or pending Slurm job |
| registry_list_repos | List container registry repositories |
| registry_list_tags | List tags for a container image |
| registry_check_image | Check image details (size, layers, created date) |
| registry_delete_tag | Delete a stale image tag (requires confirmed=True) |

Configuration (3 tools)

| Tool | Description |
| --- | --- |
| set_cluster_rate | Set GPU cost rate for a cluster (admin only) |
| register_gateway | Register a new gateway (returns one-time token) |
| delete_gateway | Revoke a gateway |

Agent Infrastructure Control Plane (12 tools)

The missing layer between your AI agents and your GPU fleet. Works with any framework (n8n, LangChain, CrewAI, Dify) — just point to the VibOps LLM Proxy.

| Group | Tool | Description |
| --- | --- | --- |
| FinOps per agent | get_agent_usage | GPU cost per agent: tokens, requests, cost, GPU-hours. "Which agent costs the most?" |
| FinOps per agent | get_agent_usage_detail | Drill-down on one agent: daily breakdown, model distribution, cost trend |
| FinOps per agent | get_agent_budget | Current budget + MTD spend for an agent |
| FinOps per agent | set_agent_budget | Set monthly spend limit: soft alert at 80%, hard block at 100% (HTTP 429) |
| Model access control | get_agent_model_rules | List model access rules: which agent can use which LLM |
| Model access control | update_agent_model_rule | Create a rule: glob patterns, deny-first. "HR agents → Mistral only" |
| Identity lifecycle | list_agent_identities | List machine identities for agents |
| Identity lifecycle | create_agent_identity | Create a new machine identity (key shown once) |
| Identity lifecycle | rotate_agent_identity | Rotate the key for an existing identity |
| Identity lifecycle | revoke_agent_identity | Revoke an identity immediately |
| Dependency graph | get_agent_dependency_graph | Full org-wide graph: agent→model, agent→connector, agent→sub-agent |
| Dependency graph | get_agent_dependencies | Dependencies for one agent, for impact analysis before migration |
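
The budget thresholds and the deny-first glob rules described for set_agent_budget and update_agent_model_rule can be illustrated with a short Python sketch. This is not VibOps code: the function names, the rule dictionary shape (`agent`, `model`, `effect`), the use of `>=` at the thresholds, and the default-deny behaviour when no rule matches are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


def budget_state(spend_mtd, monthly_limit, soft_ratio=0.8):
    """Classify month-to-date spend as "ok", "soft_alert" or "hard_block"."""
    if monthly_limit <= 0:
        raise ValueError("monthly_limit must be positive")
    if spend_mtd >= monthly_limit:
        # The tool description says the hard block surfaces as HTTP 429.
        return "hard_block"
    if spend_mtd >= soft_ratio * monthly_limit:
        return "soft_alert"
    return "ok"


def is_model_allowed(agent_id, model, rules):
    """Deny-first evaluation: a matching deny rule wins over every allow rule.

    Assumption: with no matching allow rule the request is denied.
    """
    matching = [
        rule for rule in rules
        if fnmatchcase(agent_id, rule["agent"]) and fnmatchcase(model, rule["model"])
    ]
    if any(rule["effect"] == "deny" for rule in matching):
        return False
    return any(rule["effect"] == "allow" for rule in matching)
```

Under these assumptions, a single allow rule such as `{"agent": "hr-*", "model": "mistral*", "effect": "allow"}` expresses "HR agents → Mistral only", because every other model falls through to the default deny.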

Governance & Compliance (21 tools)

| Tool | Description |
| --- | --- |
| list_anomalies | List GPU anomalies with optional cluster/status filter |
| get_open_anomalies | Get all currently open anomalies |
| resolve_anomaly | Mark an anomaly as resolved |
| list_ai_act_controls | List AI Act compliance controls |
| get_ai_act_score | Get the overall AI Act compliance score |
| update_ai_act_control | Update status, notes, or evidence URL for a control |
| list_compliance_reports | List generated compliance reports |
| generate_compliance_report | Generate a SOC 2, GDPR (RGPD), or HIPAA report asynchronously |
| get_compliance_report | Poll/retrieve a generated compliance report |
| list_audit_logs | Query the immutable audit log with filters |
| verify_audit_chain | Verify HMAC-SHA256 integrity of the full audit chain |
| get_policy | Get the current organisation policy |
| update_policy | Replace the organisation policy (immediate effect) |
| list_eval_rubrics | List LLM-as-judge evaluation rubrics |
| evaluate_job | Trigger LLM-as-judge evaluation for a job |
| get_job_evaluations | Retrieve evaluation results for a job |
| get_ldap_config | Get LDAP / Active Directory configuration |
| update_ldap_config | Configure or enable/disable LDAP integration |
| get_siem_config | Get SIEM push export configuration |
| update_siem_config | Set Splunk/Datadog SIEM destination |
| push_to_siem | Export audit events to configured SIEM |
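
To make the idea behind verify_audit_chain concrete, here is a minimal Python sketch of an HMAC-SHA256 chained log, where each entry's MAC covers the previous entry's MAC plus the current event. The record layout (`event`, `mac`), the canonical JSON encoding, and the function names are assumptions for illustration; the actual VibOps audit format is not documented here and may differ.

```python
import hashlib
import hmac
import json


def entry_mac(key, prev_mac, event):
    """HMAC-SHA256 over the previous MAC plus a canonical JSON encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, prev_mac.encode() + payload, hashlib.sha256).hexdigest()


def append_entry(key, chain, event):
    """Append an event, linking it to the MAC of the previous entry (empty for the first)."""
    prev_mac = chain[-1]["mac"] if chain else ""
    chain.append({"event": event, "mac": entry_mac(key, prev_mac, event)})


def verify_chain(key, chain):
    """Return the index of the first entry whose MAC does not match, or None if intact."""
    prev_mac = ""
    for index, entry in enumerate(chain):
        expected = entry_mac(key, prev_mac, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return index
        prev_mac = entry["mac"]
    return None
```

Because every MAC depends on the one before it, editing, removing, or reordering any entry causes verification to fail at that entry or at the first one after it.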

GPU FinOps (4 tools)

| Tool | Description |
| --- | --- |
| get_budget | Get current GPU budget and consumed spend |
| get_chargeback | Get chargeback breakdown by tenant for a given month |
| get_spend_trend | Get daily GPU spend trend (default: last 30 days) |
| get_waste_analysis | Identify idle GPU resources and cost optimisation opportunities |

LLM Inference Proxy

VibOps includes a transparent OpenAI-compatible proxy (port 8004) that sits between your AI agents and LLM inference servers (vLLM, Ollama, TGI). Every inference request is logged with agent attribution for FinOps.

Your agents point to the proxy instead of the LLM directly:

# Before
OPENAI_BASE_URL=http://vllm:8000/v1

# After
OPENAI_BASE_URL=http://vibops-proxy:8004/v1

Add an X-VibOps-Agent-Id header to attribute costs per agent (the example also sends X-VibOps-Team for team attribution):

curl -X POST http://vibops-proxy:8004/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-VibOps-Agent-Id: pricing-agent-v2" \
  -H "X-VibOps-Team: supply-chain" \
  -d '{"model": "mistral:7b", "messages": [...]}'

The proxy captures: agent ID, team, model, tokens, latency, GPU cost — visible in the console FinOps dashboard and queryable via get_agent_usage.
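
The same attributed request can be built from Python using only the standard library. The sketch below constructs the request without sending it; the helper name `build_chat_request` is hypothetical, and the header names and proxy URL are taken from the curl example above.

```python
import json
import urllib.request


def build_chat_request(base_url, agent_id, team, model, messages):
    """Build (but do not send) a chat completion request with attribution headers."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-VibOps-Agent-Id": agent_id,  # attributes cost to this agent
            "X-VibOps-Team": team,          # attributes cost to this team
        },
    )
```

Pass the result to `urllib.request.urlopen` to send it. A non-2xx response raises `urllib.error.HTTPError`, so a caller can check for status 429, the code the set_agent_budget description associates with an exhausted budget.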

Example prompts

"What's our GPU utilisation trend over the last 7 days?"
"Show me the cost breakdown per cluster this week."
"Deploy llama3:8b on vibops-dev with 2 replicas."
"Which clusters have open critical GPU alerts?"
"Scale the inference deployment to 4 replicas on prod-cluster."
"What's our MTTR for critical alerts?"
"Are there any open GPU anomalies right now?"
"What's our AI Act compliance score and which controls are non-compliant?"
"Generate a SOC 2 report for Q1 2026."
"Verify the audit chain hasn't been tampered with."
"Show me the spend trend for the last 7 days and flag any waste."
"Create a machine identity for the pricing-agent with a 1-year expiry."
"Which agents depend on the claude-opus-4-6 model?"
"Which agent costs the most in GPU this month?"
"Show me the inference cost breakdown for the pricing agent."
"What's the GPU spend per team for the last 7 days?"

Contributing

See CONTRIBUTING.md. All contributions require a DCO sign-off (git commit -s).

License

MIT — free to use, modify, and distribute. See LICENSE.

Built on FastMCP and the VibOps platform.