release-readiness-triage-mcp

Aggregates CI failures and outputs GO/NO_GO release verdicts

Stop reading CI logs. Start getting verdicts.

MCP server that aggregates test failures, cross-references flakiness history, and outputs a GO / NO_GO / INVESTIGATE release decision, so your AI agent can triage a broken CI run in seconds instead of asking you to read 3000 lines of logs.


🤔 The problem

In any real codebase, CI always has something failing. The hard question isn't "are there failures?" but "are these failures real regressions, or just the usual noise?"

Answering that requires correlating three signals at once:

  • 🔍 Error signatures: is this the same failure repeated 12 times, or 12 different problems?
  • 📊 Flakiness history: is this test known to be unreliable?
  • 🔗 Code changes: is the failing test actually related to what changed?

An AI agent can't do this without structured tools. Raw CI logs are thousands of lines. Flakiness databases are external. Code→test mapping requires AST analysis. Without this MCP, the agent just guesses.


🛠️ Tools

aggregate_suite_failures

Groups failures by normalized error signature, deduplicates repeated errors, categorizes as assertion / timeout / network / crash. Pass customInfraPatterns for cloud-specific errors.
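
As a rough illustration of what grouping by normalized error signature can look like, here is a minimal sketch. It is not the server's implementation: the `Failure` and `FailureGroup` shapes, the regexes, and the keyword lists are all assumptions made for this example. Mapping custom infra patterns to the network category mirrors the sample output later in this README.

```typescript
// Hypothetical shapes for illustration; the server's real schema may differ.
interface Failure {
  suite: string;
  test: string;
  error: string;
}

type Category = "assertion" | "timeout" | "network" | "crash";

interface FailureGroup {
  signature: string;
  category: Category;
  count: number;
  tests: string[];
}

// Replace volatile details (hex ids, addresses, numbers) so that repeats
// of the same error collapse into one signature.
function normalizeSignature(error: string): string {
  return error
    .replace(/0x[0-9a-f]+/gi, "<hex>")
    .replace(/\d+(\.\d+)*(:\d+)?/g, "<n>")
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
}

// Keyword-based guess at the category; custom infra patterns are checked first.
function categorize(error: string, customInfraPatterns: string[] = []): Category {
  const text = error.toLowerCase();
  const isCustomInfra = customInfraPatterns.some(
    (p) => text.indexOf(p.toLowerCase()) !== -1
  );
  if (isCustomInfra) return "network";
  if (/econnrefused|econnreset|enotfound|socket hang up/.test(text)) return "network";
  if (/timed? ?out/.test(text)) return "timeout";
  if (/segfault|sigsegv|sigkill|out of memory/.test(text)) return "crash";
  return "assertion";
}

function aggregateFailures(
  failures: Failure[],
  customInfraPatterns: string[] = []
): FailureGroup[] {
  // Prototype-free object so signatures never collide with built-in keys.
  const groups: Record<string, FailureGroup> = Object.create(null);
  for (const f of failures) {
    const signature = normalizeSignature(f.error);
    let group = groups[signature];
    if (!group) {
      group = {
        signature,
        category: categorize(f.error, customInfraPatterns),
        count: 0,
        tests: [],
      };
      groups[signature] = group;
    }
    group.count += 1;
    group.tests.push(`${f.suite} > ${f.test}`);
  }
  // Most frequent signature first.
  return Object.keys(groups)
    .map((key) => groups[key])
    .sort((a, b) => b.count - a.count);
}
```

With this sketch, two `ECONNREFUSED` errors that differ only in port number end up in a single group with a count of 2, which is the deduplication behaviour described above.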

cross_reference_flakiness

Scores each failure against your flakiness history: KNOWN FLAKY, MILDLY FLAKY, or NO HISTORY.
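
The scoring step could be sketched roughly as follows. The `TestHistory` shape and both thresholds are assumptions chosen for illustration (this README does not document the real cut-offs); they are merely consistent with the sample output, where a 73% failure rate is reported as historically flaky and 22% as mildly flaky.

```typescript
type FlakinessLabel = "KNOWN FLAKY" | "MILDLY FLAKY" | "NO HISTORY";

// Hypothetical history record: how often a test ran and how often it failed.
interface TestHistory {
  runs: number;
  failures: number;
}

interface FlakinessScore {
  label: FlakinessLabel;
  failureRate: number;
}

// Assumed cut-offs, not the server's documented values.
const KNOWN_FLAKY_THRESHOLD = 0.5;
const MILDLY_FLAKY_THRESHOLD = 0.1;

function scoreFlakiness(
  testName: string,
  history: Record<string, TestHistory>
): FlakinessScore {
  const entry: TestHistory | undefined = Object.prototype.hasOwnProperty.call(
    history,
    testName
  )
    ? history[testName]
    : undefined;
  if (!entry || entry.runs === 0) {
    return { label: "NO HISTORY", failureRate: 0 };
  }
  const failureRate = entry.failures / entry.runs;
  if (failureRate >= KNOWN_FLAKY_THRESHOLD) {
    return { label: "KNOWN FLAKY", failureRate };
  }
  if (failureRate >= MILDLY_FLAKY_THRESHOLD) {
    return { label: "MILDLY FLAKY", failureRate };
  }
  // History exists but shows no meaningful flakiness (an assumption of this sketch).
  return { label: "NO HISTORY", failureRate };
}
```

Keeping the thresholds as named constants makes it easy to tune them per project, since what counts as "flaky" varies between codebases.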

correlate_code_changes

Matches changed files against failing tests. Works standalone or with pre-computed affected test lists from ast-impact-mapper-mcp.
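
One plausible way to picture the two modes is sketched below. The function name, the `Correlation` shape, and the name-matching heuristic used in standalone mode are all hypothetical; the real tool may match files to tests quite differently.

```typescript
interface Correlation {
  test: string;
  related: boolean;
  reason: string;
}

// "src/components/Button.tsx" -> "button"
function moduleName(path: string): string {
  const file = path.split("/").pop() || path;
  return file.replace(/\.[^.]+$/, "").toLowerCase();
}

function correlateCodeChanges(
  failingTests: string[],
  changedFiles: string[],
  affectedTests?: string[]
): Correlation[] {
  return failingTests.map((test) => {
    const name = test.toLowerCase();
    if (affectedTests && affectedTests.length > 0) {
      // Trust the pre-computed list when one is supplied.
      const hit = affectedTests.some((t) => name.indexOf(t.toLowerCase()) !== -1);
      return {
        test,
        related: hit,
        reason: hit ? "listed in affectedTests" : "not in affectedTests",
      };
    }
    // Standalone fallback: does the test name mention a changed module?
    const matches = changedFiles.filter((f) => {
      const mod = moduleName(f);
      return mod.length > 0 && name.indexOf(mod) !== -1;
    });
    const related = matches.length > 0;
    return {
      test,
      related,
      reason: related ? `name mentions ${matches[0]}` : "no changed file matched",
    };
  });
}
```

The name-based fallback is deliberately crude and will miss tests that exercise a changed module indirectly, which is why a pre-computed affected test list takes priority in this sketch when it is available.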

generate_release_recommendation

The final step. Outputs GO / NO_GO / INVESTIGATE with confidence score and full breakdown. Supports format: "markdown" for GitHub PR comments and Slack.
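
The decision itself might reduce to a rule like the one sketched here. This is an assumed rule set for illustration only (any regression blocks, any unexplained failure asks for investigation, otherwise go). The confidence score is left out because this README does not describe how it is computed.

```typescript
type Classification = "regression" | "flaky" | "infra" | "unknown";
type Verdict = "GO" | "NO_GO" | "INVESTIGATE";

interface Recommendation {
  verdict: Verdict;
  counts: Record<Classification, number>;
}

// Assumed decision rule: regressions block, unknowns need a human look,
// and flaky or infra failures alone do not hold up a release.
function recommend(classified: Classification[]): Recommendation {
  const counts: Record<Classification, number> = {
    regression: 0,
    flaky: 0,
    infra: 0,
    unknown: 0,
  };
  for (const c of classified) {
    counts[c] += 1;
  }

  let verdict: Verdict = "GO";
  if (counts.regression > 0) {
    verdict = "NO_GO";
  } else if (counts.unknown > 0) {
    verdict = "INVESTIGATE";
  }
  return { verdict, counts };
}
```

Checking regressions before unknowns means a single confirmed regression yields NO_GO even if other failures are still unexplained.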


🧪 What it looks like in practice

5 failures in CI. What's real, what's noise?

failures:
  - Auth Suite > login with expired token   → "Expected status 200, got 401"
  - API Suite > health check                → "connect ECONNREFUSED 127.0.0.1:3000"
  - Button Suite > renders button correctly → "Expected null, got <button>Submit</button>"
  - Search Suite > debounce timing          → "Expected 42, received 43"
  - Storage Suite > upload avatar           → "GCP quota exceeded for this project"

changedFiles: ["src/components/Button.tsx"]
affectedTests: ["renders button correctly"]
customInfraPatterns: ["GCP quota exceeded"]
format: "markdown"

Output:

## 🔴 Release Recommendation: NO_GO (75% confidence)

> 1 confirmed regression(s) directly correlated with code changes. Do not release.

| Category            | Count |
| ------------------- | ----- |
| Total failures      | 5     |
| 🔴 Real regressions | 1     |
| 🟡 Known flaky      | 2     |
| ⚪ Infra blips      | 2     |
| ❓ Unknown          | 0     |

### 🔴 Blockers (must fix before release)

**Button Suite > renders button correctly**

- Test is directly affected by code changes in this commit
- `Expected null, got <button>Submit</button>`

### ✅ Safe to ignore

- ~~Auth Suite > login with expired token~~ - Historically flaky: 73% failure rate in history
- ~~API Suite > health check~~ - Error pattern matches infrastructure issues (network)
- ~~Search Suite > debounce timing~~ - Mildly flaky: 22% historical failure rate
- ~~Storage Suite > upload avatar~~ - Error pattern matches infrastructure issues (network)

One tool call. One verdict. Go fix Button.tsx.


⚡ Setup

{
  "mcpServers": {
    "release-readiness-triage": {
      "command": "npx",
      "args": ["-y", "release-readiness-triage-mcp"]
    }
  }
}

🚀 Usage

"Here are the failures from our CI run, our flakiness database, and the files changed in this PR. Is it safe to release?"

The agent calls generate_release_recommendation and returns a verdict with a full breakdown, ready to paste into a PR comment or Slack.
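
Under the hood, an MCP client invokes a tool with a JSON-RPC `tools/call` request. The sketch below builds such a request for generate_release_recommendation. The argument names follow the example input earlier in this README, but the shape of each failure item is an assumption, and `buildRecommendationCall` is a hypothetical helper rather than part of this package.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps tool arguments in a tools/call request.
function buildRecommendationCall(
  id: number,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "generate_release_recommendation", arguments: args },
  };
}

const toolCall = buildRecommendationCall(1, {
  // The failure item shape (suite / test / error) is assumed for illustration.
  failures: [
    {
      suite: "Button Suite",
      test: "renders button correctly",
      error: "Expected null, got <button>Submit</button>",
    },
  ],
  changedFiles: ["src/components/Button.tsx"],
  affectedTests: ["renders button correctly"],
  customInfraPatterns: ["GCP quota exceeded"],
  format: "markdown",
});

console.log(JSON.stringify(toolCall, null, 2));
```

In normal use the agent's MCP client constructs this request for you, so the sketch is mainly useful for debugging or for calling the server from a script.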

Works standalone, or as a meta-orchestrator on top of other MCP servers such as ast-impact-mapper-mcp.


License

MIT
