Bernstein MCP Server

Multi-agent orchestration MCP server. Start parallel agent runs, manage task queues, track costs, and verify quality gates across 40+ CLI coding agents.

"To achieve great things, two things are needed: a plan and not quite enough time." - Leonard Bernstein

why the name?

Bernstein is named after Leonard Bernstein, the American conductor and composer. The project orchestrates a crew of CLI coding agents the way Bernstein conducted the New York Philharmonic: every player on cue, the score deterministic, the conductor accountable for the result. He is the original orchestrator the project takes its name from.

deterministic multi-agent CLI orchestration

website · docs · install · first run · glossary · limitations · sponsor

Bernstein is a deterministic Python scheduler that runs a crew of CLI coding agents (Claude Code, Codex, Gemini CLI, and 40 more) against a single goal in parallel git worktrees, with an HMAC-signed audit chain over every step.

at a glance

44 CLI agent adapters in v2.2.x: 41 third-party wrappers, 2 leaf-node delegators, plus a generic --prompt wrapper. Source of truth: the supported agents table below.
HMAC-SHA256 audit chain per RFC 2104, one record per scheduling decision, tamper-evident. Operator guide: docs/security/audit-log.md.
Bearer-token task server authenticates the manager and every worker. Per-session zero-trust JWT in .sdd/runtime/agent_tokens/, legacy BERNSTEIN_AUTH_TOKEN fallback, opt-out via BERNSTEIN_AUTH_DISABLED=1. Flow + diagnostics: docs/security/manager-auth.md.
Signed agent cards use detached JWS (RFC 7515, Appendix F) over RFC 8785 (JCS) canonicalization, with Ed25519 / EdDSA keys. Code: src/bernstein/core/security/agent_card_signer.py.
Per-artefact lineage records every file write linked back to producer + inputs + prompt SHA + model + cost. CLI: bernstein lineage verify <run_id>.
Deterministic scheduler: zero LLM in the coordination loop. Plain Python decides who runs, where, with what budget. Replay yesterday's plan, get yesterday's task graph.
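
To make the per-artefact lineage bullet concrete, here is a minimal sketch of what one lineage record could carry and how a verify step could check it. The field names, `record_write`, and `verify` are hypothetical illustrations, not Bernstein's actual schema or the logic behind `bernstein lineage verify`.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    # All field names are hypothetical; the real schema may differ.
    artefact_path: str
    artefact_sha256: str
    producer: str                  # the agent or task that wrote the file
    input_sha256: tuple[str, ...]  # hashes of the inputs the producer read
    prompt_sha256: str
    model: str
    cost_usd: float


def record_write(path: str, content: bytes, producer: str, inputs: list[bytes],
                 prompt: str, model: str, cost_usd: float) -> LineageRecord:
    """Build one record for one file write."""
    return LineageRecord(
        artefact_path=path,
        artefact_sha256=sha256_hex(content),
        producer=producer,
        input_sha256=tuple(sha256_hex(i) for i in inputs),
        prompt_sha256=sha256_hex(prompt.encode("utf-8")),
        model=model,
        cost_usd=cost_usd,
    )


def verify(record: LineageRecord, content: bytes) -> bool:
    """True if the bytes on disk still match what the record says was written."""
    return sha256_hex(content) == record.artefact_sha256
```

Storing hashes rather than raw prompts or inputs keeps a record small and still lets anyone holding the original bytes confirm they match.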

why this exists

i wrote bernstein because i was paying $400/month in claude bills running three coding agents in parallel and getting nondeterministic merges.

Apache 2.0, solo maintained. Live stats: bernstein.run.

install in 30 seconds

pipx install bernstein
bernstein init
bernstein run -g "fix the failing test in tests/test_foo.py"

See installed integrations: bernstein integrations list --installed.

sponsor

If Bernstein routed a model that saved you a Claude bill, $25 covers a month of my coffee.

github.com/sponsors/chernistry

who this is for

Specific shapes where the value lands:

engineering teams running >=3 CLI coding agents in parallel: each agent gets its own git worktree, the merge queue serialises landings, no race conditions
operators running compliance-sensitive workflows: every routing decision is plaintext, the audit log is HMAC-signed and tamper-evident, no SaaS hop, no third-party data plane
platform teams that need an audit log of agent decisions: the orchestrator writes one row per scheduling decision, you can grep it
anyone burning more than $1k/mo on coding agents who wants determinism: you can replay yesterday's plan and get yesterday's task graph
forward-deployed engineers dropping into a client repo: credentials stay in your env, not the client's; agents you spawn are whichever CLI tool the client already trusts

If you nodded at two of those bullets, this fits.

who this is NOT for

"I want one pair-programmer to chat with about my code": a single CLI agent is fine. Bernstein adds orchestration overhead you don't need.
prototypes where merge gates are overkill: the lint/types/tests/cross-model-review pipeline pays off when the cost of a bad merge is real, and is pure friction when you're throwing the repo away on Friday.
non-coding tasks (research, writing, data analysis pipelines): Bernstein wraps CLI coding agents specifically, not generic LLM workflows.
anyone who wants a SaaS wrapper with a credit-card form: Bernstein is on-prem only by design.
teams that need a vendor with a support SLA and a contract: solo open-source project. GitHub issues are how support happens.
research-shape "let the agents collaborate emergently" use cases: the deterministic scheduler is a hard wall there.

how it compares

Closest neighbours in this category live in docs/compare/README.md. What Bernstein does well is the auditability surface: HMAC-chained audit, signed agent cards, per-artefact lineage, air-gap deploy profile, plus the widest CLI adapter coverage.

what is this, in one paragraph

You tell Bernstein what you want built. It splits the work across several AI coding agents, runs them in parallel inside isolated git worktrees, records every handoff in an HMAC-SHA256-chained audit log (RFC 2104), runs the tests, and merges the code that actually passes. State is file-based (.sdd/), credentials are scoped per agent, and the audit trail is signed.

other install methods

curl -fsSL https://bernstein.run/install.sh | sh        # macOS / Linux one-liner
irm https://bernstein.run/install.ps1 | iex             # Windows PowerShell
pip install bernstein                                   # pip
uv tool install bernstein                               # uv
brew tap chernistry/tap && brew install bernstein       # Homebrew

See the full install matrix for dnf copr, npx, optional extras, and the wheelhouse path for air-gapped sites.

why the scheduler is plain Python

Most agent orchestrators use an LLM to decide who does what. That is non-deterministic and burns tokens on scheduling instead of code. Bernstein makes one LLM call to break down your goal; everything after that (running agents in parallel, isolating their git branches, running tests, routing retries) is plain Python. Given the same plan, the scheduler produces the same task graph, and every step is logged and replayable.

No framework to learn. No vendor lock-in. Swap any agent, any model, any provider.
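
As an illustration of what "plain Python" scheduling can look like, the sketch below turns a task list with dependencies into waves of tasks that can run in parallel. `Task` and `schedule` are hypothetical names for this example, not Bernstein's internals; the point is only that sorted tie-breaking makes the output a pure function of the plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    depends_on: tuple[str, ...] = ()


def schedule(tasks: list[Task]) -> list[list[str]]:
    """Return waves of task ids; every task in a wave can run in parallel.

    Ties are broken by sorting ids, so the same plan always yields the
    same waves regardless of the order the tasks were listed in.
    """
    pending = {t.task_id: set(t.depends_on) for t in tasks}
    unknown = {d for deps in pending.values() for d in deps} - set(pending)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    waves: list[list[str]] = []
    done: set[str] = set()
    while pending:
        ready = sorted(tid for tid, deps in pending.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for tid in ready:
            del pending[tid]
    return waves
```

Because nothing in `schedule` depends on wall-clock time, randomness, or a model call, replaying a stored plan reproduces the same waves.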

[Image: Bernstein in action: parallel AI agents orchestrated in real time]

What you see while it runs:

$ bernstein -g "Add JWT auth"
[manager] decomposed into 4 tasks
[agent-1] claude-sonnet: src/auth/middleware.py  (done, 2m 14s)
[agent-2] codex:         tests/test_auth.py      (done, 1m 58s)
[verify]  all gates pass. merging to main.

YAML workflow manifests (optional)

When bernstein run -g "<goal>" is too coarse-grained, bernstein workflow runs a declarative DAG of agent / command / loop nodes. Manifests are plain YAML, validated up-front, dispatched through the same AgentSpawner the rest of Bernstein uses.

bernstein workflow list                          # bundled + user-installed
bernstein workflow run idea-to-pr -g "Add JWT auth"
bernstein workflow init my-flow                  # scaffold a starter manifest
bernstein workflow validate path/to/flow.yaml

Stock workflows shipping in the wheel: idea-to-pr, refactor-with-tests, security-review, doc-update, dependency-bump, hot-fix. Loop nodes re-fire until a bash predicate exits 0. fresh_context: true mints a new agent session per iteration. Per-step CLI/model routing: docs/workflows/per-step-routing.md.
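
The loop-node behaviour described above (re-fire until a predicate command exits 0) can be sketched as follows. This is an assumption-laden illustration: `run_loop`, its `max_iterations` cap, and `demo` are hypothetical and are not taken from Bernstein's workflow engine.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def run_loop(body: Callable[[int], object], predicate: Sequence[str],
             max_iterations: int = 5) -> int:
    """Run body, then the predicate command; stop once the predicate exits 0.

    Returns the number of iterations used. Raises RuntimeError when the cap
    is hit first, so a predicate that never passes cannot spin forever.
    """
    for iteration in range(1, max_iterations + 1):
        body(iteration)  # a fresh_context-style loop would start a new agent session here
        if subprocess.run(list(predicate)).returncode == 0:
            return iteration
    raise RuntimeError(f"predicate still failing after {max_iterations} iterations")


def demo() -> int:
    """Body writes the iteration number to a file; the predicate passes at 3."""
    with tempfile.TemporaryDirectory() as tmp:
        counter = Path(tmp) / "count.txt"
        check = "import sys; sys.exit(0 if int(open(sys.argv[1]).read()) >= 3 else 1)"
        # The Python interpreter stands in for bash so the demo runs anywhere;
        # a shell predicate would look like ["bash", "-c", "test -f done.flag"].
        predicate = [sys.executable, "-c", check, str(counter)]
        return run_loop(lambda i: counter.write_text(str(i)), predicate)
```

The iteration cap is a design choice of this sketch: an exit-code predicate gives a clear stopping rule, but only a bound guarantees termination.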

use cases

forward-deployed engineering: drop the crew onto a client repo when you arrive, take it with you when you leave.
self-evolving projects: point Bernstein at its own repo and let it execute the backlog (this codebase is one).
CI fleets: run a crew of agents in parallel on PRs, with per-agent credential scoping and signed audit trail.
air-gapped deployment: install from a signed wheelhouse, run with --profile airgap to deny outbound by default. See Air-gap installation.

supported agents

Bernstein auto-discovers installed CLI agents. Mix them in the same run. Cheap local models for boilerplate, heavier cloud models for architecture.

44 CLI agent adapters: 41 third-party wrappers, 2 leaf-node delegators, plus a generic wrapper for anything with --prompt.

| Agent | Models | Install |
| --- | --- | --- |
| Claude Code | Opus 4, Sonnet 4.6, Haiku 4.5 | `npm install -g @anthropic-ai/claude-code` |
| Codex CLI | GPT-5, GPT-5 mini | `npm install -g @openai/codex` |
| OpenAI Agents SDK v2 | GPT-5, GPT-5 mini, o4 | `pip install 'bernstein[openai]'` |
| GitHub Copilot CLI | Copilot-managed (GPT-5, Sonnet 4.6) | `npm install -g @github/copilot` |
| Gemini CLI | Gemini 2.5 Pro, Gemini Flash | `npm install -g @google/gemini-cli` |
| Cursor | Sonnet 4.6, Opus 4, GPT-5 | Cursor app |
| Devin Terminal (Cognition) | Devin-managed | `curl -fsSL https://cli.devin.ai/install.sh \| bash` then `devin auth login` |
| Aider | Any OpenAI/Anthropic-compatible | `pip install aider-chat` |
| Amp | Amp-managed | `npm install -g @sourcegraph/amp` |
| CLM gateway (sovereign / on-prem LLM) | Any OpenAI-compatible CLM endpoint | `pip install aider-chat`, then set `CLM_ENDPOINT` / `CLM_TOKEN` |
| Cody | Sourcegraph-hosted | `npm install -g @sourcegraph/cody` |
| Continue | Any OpenAI/Anthropic-compatible | `npm install -g @continuedev/cli` (binary: `cn`) |
| Goose | Any provider Goose supports | See Goose docs |
| IaC (Terraform/Pulumi) | Any provider the base agent uses | Built-in |
| Junie | BYOK (Anthropic, OpenAI, Google, xAI, OpenRouter, Copilot) | `curl -fsSL https://junie.jetbrains.com/install.sh \| bash` |
| Kilo | Kilo-hosted | See Kilo docs |
| Kiro | Kiro-hosted | See Kiro docs |
| AWS Q Developer | Amazon Q-managed (Claude-backed) | `brew install --cask amazon-q` then `q login` |
| Ollama + Aider | Local models (offline) | `brew install ollama` |
| OpenCode | Any provider OpenCode supports | See OpenCode docs |
| Qwen | Qwen Code models | `npm install -g @qwen-code/qwen-code` |
| Cloudflare Agents | Workers AI models | `bernstein cloud login` |
| OpenHands | Any LiteLLM-supported (Anthropic, OpenAI, ...) | `uv tool install openhands --python 3.12` |
| Open Interpreter | Any (LiteLLM-backed) | `pip install open-interpreter` |
| gptme | Anthropic, OpenAI, OpenRouter | `pipx install gptme` |
| Plandex | Plandex Cloud or self-hosted models | `curl -sL https://plandex.ai/install.sh \| bash` |
| AIChat | OpenAI, Anthropic, OpenRouter, Groq, Gemini | `cargo install aichat` |
| Letta Code | Letta-routed (Anthropic, OpenAI) | `npm install -g @letta-ai/letta-code` |
| Generic | Any CLI with `--prompt` | Built-in |

Any adapter can also serve as the internal LLM, the one Bernstein calls to decompose a goal before the plain-Python scheduler takes over:

internal_llm_provider: gemini            # or qwen, ollama, codex, goose, ...
internal_llm_model: gemini-3.1-pro

> [!TIP]
> Run `bernstein --headless` for CI pipelines. No TUI, structured JSON output, non-zero exit on failure.

quick start

cd your-project
bernstein init                    # creates .sdd/ workspace + bernstein.yaml
bernstein -g "Add rate limiting"  # agents spawn, work in parallel, verify, exit
bernstein live                    # watch progress in the TUI dashboard
bernstein stop                    # graceful shutdown with drain

For multi-stage projects, define a YAML plan:

bernstein run plan.yaml           # skips LLM planning, goes straight to execution
bernstein run --dry-run plan.yaml # preview tasks and estimated cost

web UI

Since v2.0.0, Bernstein ships a minimal web UI. It was added at operators' request and remains a side surface; the core orchestrator is the priority.

bernstein gui serve               # http://127.0.0.1:8052/ui/
bernstein gui serve --dev         # expects `npm run dev` on :5173
bernstein gui serve --minimal     # skip the full /api/v1/* surface

The Vite bundle is committed under src/bernstein/gui/static/, so wheel installs work without a Node toolchain. Surface tour + per-task drawer: docs/web-ui.md.

how it works

Bernstein runs a four-stage pipeline per goal:

Decompose. The manager breaks your goal into tasks with roles, owned files, and completion signals. One LLM call, then plain Python from there.
Spawn. Agents start in isolated git worktrees, one per task. Main branch stays clean.
Verify. The janitor checks concrete signals: tests pass, files exist, lint clean, types correct.
Merge. Verified work lands in main. Failed tasks get retried or routed to a different model.
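
The four stages above can be sketched as a small control loop. This is a simplified illustration with stub callables: `run_pipeline`, `Result`, and the gate signature are hypothetical, and the sketch runs tasks sequentially for brevity where Bernstein runs them in parallel worktrees.

```python
from dataclasses import dataclass
from typing import Callable

Gate = Callable[[str], bool]  # takes a task id, returns True when the gate passes


@dataclass
class Result:
    task_id: str
    model: str
    merged: bool
    attempts: int


def run_pipeline(goal: str,
                 decompose: Callable[[str], list[str]],
                 spawn: Callable[[str, str], None],
                 gates: dict[str, Gate],
                 models: list[str]) -> list[Result]:
    """Decompose once, then spawn, verify, and merge each task."""
    results = []
    for task_id in decompose(goal):            # 1. Decompose
        merged, attempts, used = False, 0, models[0]
        for model in models:                   # a failed task is retried on the next model
            attempts += 1
            used = model
            spawn(task_id, model)              # 2. Spawn (one worktree per task in practice)
            if all(gate(task_id) for gate in gates.values()):  # 3. Verify
                merged = True                  # 4. Merge
                break
        results.append(Result(task_id, used, merged, attempts))
    return results
```

Every gate returns a plain boolean, so whether a task merges depends only on concrete signals such as test, lint, and type-check results.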

The orchestrator is a Python scheduler, not an LLM. Scheduling decisions are deterministic, auditable, and reproducible. Every step writes a record to the HMAC-chained audit log (.sdd/audit/YYYY-MM-DD.jsonl) per RFC 2104.
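
For readers unfamiliar with HMAC chaining, the sketch below shows the general technique: each record's MAC covers the previous record's MAC plus its own payload, so editing any earlier record invalidates everything after it. The record layout, `GENESIS` value, and function names are assumptions for illustration and do not describe Bernstein's actual on-disk format.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical anchor value for the first record


def _mac(key: bytes, prev_mac: str, payload: dict) -> str:
    # Serialise deterministically so the same payload always yields the same MAC.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    message = prev_mac.encode("ascii") + b"\n" + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append(chain: list[dict], key: bytes, payload: dict) -> None:
    """Add one record whose MAC is bound to the record before it."""
    prev = chain[-1]["mac"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "mac": _mac(key, prev, payload)})


def verify(chain: list[dict], key: bytes) -> bool:
    """Recompute every MAC from the start; any edit or reordering fails."""
    prev = GENESIS
    for record in chain:
        expected = _mac(key, prev, record["payload"])
        if record["prev"] != prev or not hmac.compare_digest(record["mac"], expected):
            return False
        prev = record["mac"]
    return True
```

A chain like this detects modification and reordering. Detecting truncation of the newest records needs an extra anchor, for example storing the latest MAC somewhere the writer cannot modify.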

cloud execution (Cloudflare)

bernstein cloud runs agents on Cloudflare Workers with R2-backed workspace sync. See docs/cloudflare/.

bernstein cloud login      # authenticate with Bernstein Cloud
bernstein cloud deploy     # push agent workers
bernstein cloud run plan.yaml  # execute a plan on Cloudflare

capabilities

Out of the box, Bernstein ships parallel execution, worktree isolation, a janitor that gates merges on tests/lint/types, signed lineage records, MCP server mode, an HMAC-SHA256 audit chain, and 44 CLI adapters. Pluggable sandbox backends (worktree, Docker, E2B, Modal), pluggable artifact sinks (local, S3, GCS, Azure Blob, R2), progressive-disclosure skill packs, and a lethal-trifecta capability gate round it out.

Full feature matrix: docs/reference/FEATURE_MATRIX.md. Recent features: docs/whats-new.md.

regulatory anchors

Regulatory mappings (EU AI Act Article 12, SOC 2 CC4/CC7, DORA / NIS2, OWASP ASI06, RFC 2104/7515/8785/8037/7636/8707) live in docs/compliance/. These are mappings, not certifications.

operator commands

Highest-value commands; full list in docs/operations/commands.md.

| Command | What it does |
| --- | --- |
| `bernstein pr` | Auto-creates a GitHub PR from a completed session; body carries the janitor's gate results and cost breakdown. |
| `bernstein from-ticket <url>` | Imports a Linear / GitHub Issues / Jira ticket as a Bernstein task. |
| `bernstein autofix` | Daemon that monitors open Bernstein PRs; spawns a fixer agent when CI fails. |
| `bernstein hooks` | Lifecycle hooks (`pre_task`, `post_task`, `pre_merge`, etc.) as shell scripts or pluggy `@hookimpl`s. |
| `bernstein backlog claim --role reviewer` | Atomically claims one eligible row from `.sdd/runtime/task-backlog.json` for external workers. |
| `bernstein chat serve --platform=telegram\|discord\|slack` | Drive runs from chat with `/run`, `/status`, `/approve`, `/reject`. |
| `bernstein workflow run <name>` | Run a YAML workflow manifest. |
| `bernstein schedule add\|list\|run` | Manage operator-registered recurring schedules; `schedule audit` walks persisted fire receipts to prove the sequence is replayable. |

retrieval & caching: what's actually under the hood

Bernstein deliberately uses no neural embeddings, no vector databases, and no external embedding APIs. There are two retrieval/caching layers, both keyword/lexical:

Codebase RAG (core/knowledge/rag.py): SQLite FTS5 with BM25 ranking and AST-aware chunking for Python files.
Semantic cache (core/knowledge/semantic_cache.py): TF (term-frequency) cosine similarity over word counts, not learned embeddings.
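
To show what term-frequency cosine similarity means in practice, here is a minimal sketch of a lexical cache lookup. The tokenizer, the `lookup` helper, and the 0.8 threshold are assumptions chosen for the example, not values read from `semantic_cache.py`.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count word-like tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def lookup(cache: dict[str, str], prompt: str, threshold: float = 0.8):
    """Return the cached value whose key is most similar to prompt, or None."""
    query = term_counts(prompt)
    best_key, best_score = None, 0.0
    for key in cache:
        score = cosine(query, term_counts(key))
        if score > best_score:
            best_key, best_score = key, score
    if best_key is not None and best_score >= threshold:
        return cache[best_key]
    return None
```

Because the vectors are raw word counts, "car" and "automobile" score 0.0 against each other. That is the trade-off of a lexical cache: no embedding model to run, but synonyms never match.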

If you need real semantic retrieval (vector DB, neural embeddings), wire it yourself via the retrieval role/skill in templates/; nothing in core performs vector search.
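
For the codebase RAG layer, the combination of AST-aware chunking and FTS5 with BM25 ranking can be sketched with the standard library alone. The table layout and the `chunk_python`, `build_index`, and `search` helpers are hypothetical, and the sketch assumes the local SQLite build has FTS5 enabled; it is not a copy of `rag.py`.

```python
import ast
import sqlite3


def chunk_python(source: str) -> list[tuple[str, str]]:
    """Split a module into (name, source) chunks, one per top-level def or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


def build_index(files: dict[str, str]) -> sqlite3.Connection:
    """Index every chunk of every file in an in-memory FTS5 table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, name, body)")
    for path, source in files.items():
        db.executemany(
            "INSERT INTO chunks (path, name, body) VALUES (?, ?, ?)",
            [(path, name, body) for name, body in chunk_python(source)],
        )
    return db


def search(db: sqlite3.Connection, query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Return (path, name) of the best-matching chunks."""
    # bm25() gives better matches a lower score, so ascending order ranks best first.
    rows = db.execute(
        "SELECT path, name FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Chunking on function and class boundaries means a hit returns a whole definition rather than an arbitrary window of lines.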

install

| Method | Command |
| --- | --- |
| One-liner (macOS / Linux) | `curl -fsSL https://bernstein.run/install.sh \| sh` |
| One-liner (Windows) | `irm https://bernstein.run/install.ps1 \| iex` |
| pip | `pip install bernstein` |
| pipx | `pipx install bernstein` |
| uv | `uv tool install bernstein` |
| Homebrew | `brew tap chernistry/tap && brew install bernstein` |
| Fedora / RHEL | `sudo dnf copr enable alexchernysh/bernstein && sudo dnf install bernstein` |
| npm (wrapper) | `npx bernstein-orchestrator` |
| Docker (GHCR) | `docker run --rm -v "$PWD:/work" -w /work -e ANTHROPIC_API_KEY ghcr.io/sipyourdrink-ltd/bernstein:latest run -g "fix tests/test_foo.py"` |

The one-liner scripts check for Python 3.12+, bootstrap pipx when it's missing, fix PATH for the current session, and install (or upgrade) bernstein. Script sources: install.sh · install.ps1.

optional extras

Provider SDKs are optional so the base install stays lean.

| Extra | Enables |
| --- | --- |
| `bernstein[openai]` | OpenAI Agents SDK v2 adapter (`openai_agents`) |
| `bernstein[docker]` | Docker sandbox backend |
| `bernstein[e2b]` | E2B microVM sandbox backend (needs `E2B_API_KEY`) |
| `bernstein[modal]` | Modal sandbox backend, optional GPU (needs `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`) |
| `bernstein[s3]` | S3 artifact sink (via `boto3`) |
| `bernstein[gcs]` | Google Cloud Storage artifact sink |
| `bernstein[azure]` | Azure Blob artifact sink |
| `bernstein[r2]` | Cloudflare R2 artifact sink (S3-compatible `boto3`) |
| `bernstein[grpc]` | gRPC bridge |
| `bernstein[k8s]` | Kubernetes integrations |

Combine extras with brackets, e.g. pip install 'bernstein[openai,docker,s3]'.

Editor extensions: VS Marketplace · Open VSX

security

OpenSSF Scorecard. Weekly run via .github/workflows/scorecard.yml. Results uploaded to GitHub Code Scanning.
Fuzzing. ClusterFuzzLite config at .clusterfuzzlite/ plus a cifuzz-pr workflow (.github/workflows/cifuzz-pr.yml) provide an OSSF-recognized fuzzing harness on top of the existing Hypothesis property-test suite.
Vulnerability disclosure. See SECURITY.md.

contributing

PRs welcome. See CONTRIBUTING.md for setup and code style.

support

If Bernstein saves you time: GitHub Sponsors.

Contact: [email protected].

featured in

Augment Code - 9 Open-Source Agent Orchestrators for AI Coding (2026); editorial roundup.
nibzard/awesome-agentic-patterns; Bernstein cited as the production implementation of the "deterministic zero-LLM orchestration" pattern.
Python Weekly; newsletter mention.
Future Digest; cost-cutting playbook write-up.

More awesome-lists, MCP catalogs, and prior-art citations

Awesome lists: Jenqyang/Awesome-AI-Agents, jamesmurdza/awesome-ai-devtools, jim-schwoebel/awesome_ai_agents, Piebald-AI/awesome-gemini-cli, ComposioHQ/awesome-codex-skills, punkpeye/awesome-mcp-servers, jxzhangjhu/Awesome-LLM-RAG, rohitg00/awesome-claude-code-toolkit, numtide/llm-agents.nix, andyrewlee/awesome-agent-orchestrators, bradAGI/awesome-cli-coding-agents, milisp/awesome-codex-cli, yaolifeng0629/Awesome-independent-tools, caramaschiHG/awesome-ai-agents-2026, ai-for-developers/awesome-vibe-coding, taishi-i/awesome-ChatGPT-repositories, eudk/awesome-ai-tools, killop/anything_about_game, vinta/awesome-python, Zijian-Ni/awesome-ai-agents-2026, rohitg00/awesome-devops-mcp-servers, Glama MCP Catalog. Mirrors: icopy-site/awesome, icopy-site/awesome-cn, trackawesomelist/trackawesomelist.

Prior-art citations by peer projects: mkb23/overcode, Vintersong/NOVA-Cognition-Framework, AJV009/drupal-contrib-workbench, danielvaughan/codex-blog.

Directories: AlternativeTo.

cite

Machine-readable metadata lives in CITATION.cff (CFF 1.2.0); GitHub renders the "Cite this repository" button automatically. A Zenodo DOI will be minted on the next release.

license

Apache License 2.0

Alex Chernysh · GitHub · X · bernstein.run

Translations available in 11 languages: see docs/i18n/.