AgentDesk MCP
Adversarial AI quality review for LLM pipelines. Dual-reviewer consensus with anti-gaming protection. BYOK — works with Claude Code, Claude Desktop, and any MCP client.
AgentDesk MCP — Adversarial AI Review
Quality control for AI pipelines in one MCP tool.
29.5% of teams do NO evaluation of AI outputs. (LangChain Survey) Knowledge workers spend 4.3 hours/week fact-checking AI outputs. (Microsoft 2025)
AgentDesk MCP fixes this. Add independent adversarial review to any AI pipeline in 30 seconds.
Quick Start
npm (recommended)
npx @ezark-publish/agentdesk-mcp
Claude Code
claude mcp add agentdesk-mcp -- npx @ezark-publish/agentdesk-mcp
Claude Desktop
{
  "mcpServers": {
    "agentdesk-mcp": {
      "command": "npx",
      "args": ["-y", "@ezark-publish/agentdesk-mcp"],
      "env": { "ANTHROPIC_API_KEY": "sk-ant-..." }
    }
  }
}
HTTP Transport (Streamable HTTP)
Run as an HTTP server for remote access, Smithery hosting, or multi-client setups:
# Start with HTTP transport on port 3100
MCP_HTTP_PORT=3100 npx @ezark-publish/agentdesk-mcp
# Or use the --http flag (defaults to port 3100)
npx @ezark-publish/agentdesk-mcp --http
MCP endpoint: POST http://localhost:3100/mcp
Health check: GET http://localhost:3100/health
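A quick way to confirm the HTTP transport is up before pointing clients at it is to probe the health endpoint. The sketch below is illustrative only: `isServerHealthy` and the `FetchLike` type are hypothetical names, and because the shape of the health response body is not documented here, the code checks the HTTP status alone.

```typescript
// Minimal fetch-like signature so the helper can be exercised with a stub.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Hypothetical helper: returns true when GET <baseUrl>/health answers with a 2xx status.
// Assumption: only the status code is inspected; the response body is ignored.
async function isServerHealthy(baseUrl: string, fetchImpl: FetchLike): Promise<boolean> {
  try {
    const res = await fetchImpl(`${baseUrl}/health`);
    return res.ok;
  } catch {
    // Connection refused, DNS failure, etc. all count as "not healthy".
    return false;
  }
}

// Example against a local server (Node 18+ ships a global fetch):
// isServerHealthy("http://localhost:3100", fetch).then(console.log);
```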
Install from GitHub (alternative)
npm install github:Rih0z/agentdesk-mcp
Requirements
- ANTHROPIC_API_KEY environment variable (bring your own key, BYOK)
Tools
review_output
Adversarial quality review of any AI-generated output. An independent reviewer assumes the author made mistakes and actively looks for problems.
Input:
| Parameter | Required | Description |
|---|---|---|
| output | Yes | The AI-generated output to review |
| criteria | No | Custom review criteria |
| review_type | No | Category: code, content, factual, translation, etc. |
| model | No | Reviewer model (default: claude-sonnet-4-6) |
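MCP clients normally build the request for you, but it can help to see what the parameters look like on the wire. The sketch below wraps them in a standard MCP `tools/call` JSON-RPC request. `ReviewArgs` and `buildReviewRequest` are hypothetical names, and treating `criteria` as a plain string is an assumption, since the table does not specify its type.

```typescript
// Arguments for review_output, mirroring the parameter table.
// Assumption: all values are strings; the table does not state types.
interface ReviewArgs {
  output: string;       // required: the AI-generated output to review
  criteria?: string;    // optional: custom review criteria
  review_type?: string; // optional: e.g. "code", "content", "factual", "translation"
  model?: string;       // optional: reviewer model
}

// Hypothetical helper: wraps the arguments in an MCP "tools/call" JSON-RPC 2.0 request.
function buildReviewRequest(
  id: number,
  tool: "review_output" | "review_dual",
  args: ReviewArgs
) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

const request = buildReviewRequest(1, "review_output", {
  output: "The Eiffel Tower was completed in 1889.",
  review_type: "factual",
});
console.log(JSON.stringify(request, null, 2));
```

Optional parameters that are left out are simply absent from `arguments`, so the server falls back to its defaults.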
Output:
{
  "verdict": "PASS | FAIL | CONDITIONAL_PASS",
  "score": 82,
  "issues": [
    {
      "severity": "high",
      "category": "accuracy",
      "description": "Claim about X is unsupported",
      "suggestion": "Add citation or remove claim"
    }
  ],
  "checklist": [
    {
      "item": "Factual accuracy",
      "status": "pass",
      "evidence": "All statistics match cited sources"
    }
  ],
  "summary": "Overall assessment...",
  "reviewer_model": "claude-sonnet-4-6"
}
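Because the result is structured, a pipeline can gate on it instead of parsing free text. The following is a rough sketch of such a gate, with types inferred from the sample above. `shouldShip`, the default threshold of 80, and the decision to block on any high-severity issue are illustrative policy choices, not behavior of the tool.

```typescript
type Verdict = "PASS" | "FAIL" | "CONDITIONAL_PASS";

// Field names follow the sample output; value types are inferred from it.
interface ReviewIssue {
  severity: string;
  category: string;
  description: string;
  suggestion: string;
}
interface ChecklistItem {
  item: string;
  status: string;
  evidence: string;
}
interface ReviewResult {
  verdict: Verdict;
  score: number;
  issues: ReviewIssue[];
  checklist: ChecklistItem[];
  summary: string;
  reviewer_model: string;
}

// Hypothetical pipeline gate: block on FAIL, on a low score, or on any high-severity issue.
function shouldShip(result: ReviewResult, minScore = 80): boolean {
  if (result.verdict === "FAIL") return false;
  const hasHighSeverity = result.issues.some((issue) => issue.severity === "high");
  return result.score >= minScore && !hasHighSeverity;
}

const sample: ReviewResult = {
  verdict: "CONDITIONAL_PASS",
  score: 82,
  issues: [
    {
      severity: "high",
      category: "accuracy",
      description: "Claim about X is unsupported",
      suggestion: "Add citation or remove claim",
    },
  ],
  checklist: [
    { item: "Factual accuracy", status: "pass", evidence: "All statistics match cited sources" },
  ],
  summary: "Overall assessment...",
  reviewer_model: "claude-sonnet-4-6",
};

console.log(shouldShip(sample)); // blocked: the score clears 80 but a high-severity issue remains
```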
review_dual
Dual adversarial review — two independent reviewers assess the output from different angles, then a merge agent combines findings.
- If either reviewer finds a critical issue → merged verdict is FAIL
- Takes the lower score
- Combines and deduplicates all issues
Use for high-stakes outputs where quality is critical.
Same parameters as review_output.
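The three merge rules for review_dual can be approximated in plain code. This is only a sketch: the real merge step is performed by a merge agent, so the exact-match deduplication key and the "stricter verdict wins" fallback used when no critical issue exists are assumptions, and `mergeReviews` is a hypothetical name.

```typescript
interface Issue {
  severity: string;
  category: string;
  description: string;
}
interface Review {
  verdict: "PASS" | "FAIL" | "CONDITIONAL_PASS";
  score: number;
  issues: Issue[];
}

// Assumption: when no critical issue is present, the stricter of the two verdicts wins.
const STRICTNESS = { PASS: 0, CONDITIONAL_PASS: 1, FAIL: 2 } as const;

function mergeReviews(a: Review, b: Review): Review {
  const all = [...a.issues, ...b.issues];

  // Rule 3: combine and deduplicate. Assumption: two issues are duplicates when
  // category and normalized description match exactly.
  const seen = new Set<string>();
  const issues = all.filter((issue) => {
    const key = `${issue.category}|${issue.description.trim().toLowerCase()}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });

  // Rule 1: a critical issue from either reviewer forces FAIL.
  const hasCritical = all.some((issue) => issue.severity === "critical");
  const stricter = STRICTNESS[a.verdict] >= STRICTNESS[b.verdict] ? a.verdict : b.verdict;

  return {
    verdict: hasCritical ? "FAIL" : stricter,
    score: Math.min(a.score, b.score), // Rule 2: take the lower score
    issues,
  };
}

const reviewerA: Review = {
  verdict: "PASS",
  score: 90,
  issues: [{ severity: "low", category: "style", description: "Inconsistent naming" }],
};
const reviewerB: Review = {
  verdict: "CONDITIONAL_PASS",
  score: 72,
  issues: [
    { severity: "low", category: "style", description: "inconsistent naming " },
    { severity: "critical", category: "security", description: "SQL injection in query builder" },
  ],
};

const merged = mergeReviews(reviewerA, reviewerB);
console.log(merged.verdict, merged.score, merged.issues.length);
```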
How It Works
- Adversarial prompting: The reviewer is instructed to assume mistakes were made. No benefit of the doubt.
- Evidence-based checklist: Every PASS item requires specific evidence. Items without evidence are automatically downgraded to FAIL.
- Anti-gaming validation: If >30% of checklist items lack evidence, the entire review is forced to FAIL with a capped score of 50.
- Structured output: Verdict + numeric score + categorized issues + checklist (not just "looks good").
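The evidence and anti-gaming rules above amount to a small post-processing step on the reviewer's raw output. The sketch below shows one way to express them. It is not the package's actual code: `enforceEvidence` is a hypothetical name, lowercase "pass"/"fail" status values are assumed from the sample output, and an item counts as lacking evidence when its evidence is missing or blank.

```typescript
interface CheckItem {
  item: string;
  status: "pass" | "fail";
  evidence?: string;
}
interface RawReview {
  verdict: "PASS" | "FAIL" | "CONDITIONAL_PASS";
  score: number;
  checklist: CheckItem[];
}

// Assumption: blank or missing evidence both count as "no evidence".
const lacksEvidence = (c: CheckItem): boolean => !c.evidence || c.evidence.trim() === "";

function enforceEvidence(review: RawReview): RawReview {
  const total = review.checklist.length;
  const missing = review.checklist.filter(lacksEvidence).length;

  // Evidence rule: a "pass" without evidence is downgraded to "fail".
  const checklist: CheckItem[] = review.checklist.map((c) =>
    c.status === "pass" && lacksEvidence(c) ? { ...c, status: "fail" } : c
  );

  // Anti-gaming rule: more than 30% of items without evidence forces FAIL, score capped at 50.
  if (total > 0 && missing / total > 0.3) {
    return { verdict: "FAIL", score: Math.min(review.score, 50), checklist };
  }
  // Otherwise verdict and score are left as reported (whether the tool recomputes them is not stated).
  return { ...review, checklist };
}

const gamed = enforceEvidence({
  verdict: "PASS",
  score: 88,
  checklist: [
    { item: "Factual accuracy", status: "pass", evidence: "Figures match the cited report" },
    { item: "Completeness", status: "pass", evidence: "" },
    { item: "Tone", status: "pass" },
    { item: "Formatting", status: "pass", evidence: "Headings follow the style guide" },
  ],
});
console.log(gamed.verdict, gamed.score); // 2 of 4 items lack evidence, so the review is forced to FAIL at 50
```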
Use Cases
- Code review: Check for bugs, security issues, performance problems
- Content review: Verify accuracy, readability, SEO, audience fit
- Factual verification: Validate claims in AI-generated text
- Translation quality: Check accuracy and naturalness
- Data extraction: Verify completeness and correctness
- Any AI output: Summaries, reports, proposals, emails, etc.
Why Not Just Ask the Same AI to Review?
Self-review has systematic leniency bias. An LLM reviewing its own output shares the same blind spots that created the errors. Research shows models are 34% more likely to use confident language when hallucinating.
AgentDesk uses a separate reviewer invocation with adversarial prompting — fundamentally different from self-review.
Comparison
| Feature | AgentDesk MCP | Manual prompt | Braintrust | DeepEval |
|---|---|---|---|---|
| One-tool setup | Yes | No | No | No |
| Adversarial review | Yes | DIY | No | No |
| Dual reviewer | Yes | DIY | No | No |
| Anti-gaming validation | Yes | No | No | No |
| No SDK required | Yes | Yes | No | No |
| MCP native | Yes | No | No | No |
Limitations
- Prompt injection: Like all LLM-as-judge systems, adversarial inputs could attempt to manipulate reviewer verdicts. The anti-gaming validation layer mitigates superficial gaming, but determined adversarial inputs remain a challenge. For high-stakes use cases, combine with deterministic validation.
- BYOK cost: Each review_output call makes 1 LLM API call; review_dual makes 3. Factor this into your pipeline costs.
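To budget for this, the call counts above can be turned into a simple estimate. The helper below is a hypothetical sketch that counts API calls only; token usage and pricing depend on your inputs and chosen model and are left out.

```typescript
// LLM API calls per tool invocation, as stated above.
const CALLS_PER_REVIEW = { review_output: 1, review_dual: 3 } as const;

// Hypothetical helper: total LLM API calls for a planned mix of reviews.
function estimateApiCalls(counts: { review_output: number; review_dual: number }): number {
  return (
    counts.review_output * CALLS_PER_REVIEW.review_output +
    counts.review_dual * CALLS_PER_REVIEW.review_dual
  );
}

// 200 single reviews plus 25 dual reviews: 200 * 1 + 25 * 3 = 275 calls.
const plannedCalls = estimateApiCalls({ review_output: 200, review_dual: 25 });
console.log(plannedCalls);
```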
Hosted API (Separate Product)
For teams that prefer HTTP integration, a hosted REST API with additional features (agent marketplace, context learning, workflows) is available at agentdesk.usedevtools.com.
Development
git clone https://github.com/Rih0z/agentdesk-mcp.git
cd agentdesk-mcp
npm install
npm test # 35 tests
npm run build
License
MIT
Built by EZARK Consulting | Web Version