MCP Test Utils

Desktop UI automation for AI agents: screenshots, window management, mouse, keyboard, UI Automation tree, OCR

100% AI Code · Human Reviewed

Version: 3.10.1 · Tools: 19 · AI generated: 100%

MCP server for automated desktop UI testing. A single binary — no runtime, no dependencies, no installation.

Windows x64 only. macOS and Linux support is planned.

Gives AI agents eyes and hands: screenshots, window management, mouse, keyboard, UI Automation, OCR, file search.

Why

AI agents can trigger actions in applications but can't see the screen. This server bridges that gap:

Agent triggers action → takes screenshot → sees the result →
switches window → clicks a button → verifies → writes report

Fully autonomous, no user involvement required.

Demo

10 tasks. One take. Watch on YouTube →

MCP Test Utils vs Anthropic Computer Use

Claude Cowork now includes built-in Computer Use — Claude takes screenshots and clicks through interfaces visually. It works with zero setup. MCP Test Utils takes a different approach: instead of guessing where to click from a screenshot, it reads the actual UI structure through Windows APIs.

| | MCP Test Utils | Computer Use |
|---|---|---|
| Click precision | Exact: UI Automation API | Visual estimate from screenshot |
| Speed & token cost | Fast, low cost: text responses | Slower, costly: image on every step |
| UI structure | Full tree: roles, states, coordinates | Not available |
| OCR | Word-level coordinates, multi-language | Not available (model vision only) |
| Window management | API-based, window-relative coords | Visual navigation |
| File search | Ripgrep engine built-in | Not available |
| Session logging | JSONL + screenshots | Not available |
| Visual analysis | ✅ Same Claude model, full-res 1:1 | ✅ Same Claude model |
| Setup | Download binary, add to config | Built-in, one toggle |
| Mobile / Dispatch | | ✅ Tasks from phone |
| Cross-platform | Windows (macOS/Linux planned) | macOS + Windows |

MCP Test Utils is faster, more precise, and cheaper per action. Computer Use is easier to start and works across platforms. They complement each other.

Platforms

| Platform | Status |
|---|---|
| Windows x64 | ✅ Full support |
| macOS arm64 | ⏳ Planned |
| Linux x64 | ⏳ Planned |

Tools (19)

Vision

| Tool | Description |
|---|---|
| take_screenshot | Screenshot of the entire desktop with configurable quality |
| take_window_screenshot | Screenshot of a specific window (screen or window capture mode) |
| read_screen_text | OCR the entire screen (Windows.Media.Ocr) |
| read_region_text | OCR a screen region with precise word coordinates |

Window Management

| Tool | Description |
|---|---|
| list_windows | List windows with id, title, app, position, size, minimized, focused |
| focus_window | Bring a window to front, restore if minimized |

Input

| Tool | Description |
|---|---|
| mouse_click | Click (left / right / middle) at screen or window-relative coordinates |
| mouse_move | Move cursor to a point |
| mouse_drag | Drag from point A to point B |
| mouse_scroll | Scroll the mouse wheel |
| keyboard_type | Type text (full Unicode: Latin, Cyrillic, CJK, emoji) |
| keyboard_press | Press a key (Enter, Tab, F1–F12, arrows, etc.) |
| keyboard_shortcut | Key combinations (Ctrl+S, Alt+F4, Ctrl+Shift+P, etc.) |
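
The window-relative coordinates mentioned for mouse_click can be related to absolute screen coordinates with simple arithmetic on a window's position and size, which list_windows reports. The following is a minimal sketch of that conversion; the `WindowRect` type and its field names are hypothetical and are not taken from the server's actual output.

```python
from dataclasses import dataclass


@dataclass
class WindowRect:
    # Hypothetical field names; the real list_windows output may differ.
    x: int        # left edge of the window on the screen
    y: int        # top edge of the window on the screen
    width: int
    height: int


def window_to_screen(window, rel_x, rel_y):
    """Translate a window-relative point to absolute screen coordinates."""
    return window.x + rel_x, window.y + rel_y


def is_inside(window, rel_x, rel_y):
    """Check that a window-relative point falls within the window bounds."""
    return 0 <= rel_x < window.width and 0 <= rel_y < window.height
```

Checking `is_inside` before clicking is one way an agent could avoid sending a click that lands outside the target window after it has been moved or resized.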

Structured UI Access

| Tool | Description |
|---|---|
| list_ui_elements | UI Automation tree: buttons, fields, menus with exact coordinates |

File Search

| Tool | Description |
|---|---|
| search_in_files | Search text or regex in files within allowed directories (like VS Code Find in Files) |
| find_files | Find files and directories by name pattern (glob), like "Go to File" |

Agent Guide

| Tool | Description |
|---|---|
| get_usage_guide | Compact workflow guide for LLM agents: precision clicking, coordinate metadata, quality tips |

Session Logging

| Tool | Description |
|---|---|
| enable_logging | Start recording tool calls to JSONL + screenshots (opt-in) |
| disable_logging | Stop recording, get session stats |

Installation

  1. Download the binary from Releases.
  2. Add it to your MCP client config. Example below is for Claude Desktop — for other clients, refer to their documentation.

Claude Desktop: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "test-utils": {
      "command": "D:\\path\\to\\mcp-test-utils.exe"
    }
  }
}

  3. Restart Claude Desktop.
  4. In chat, try: "Take a screenshot". The agent will return an image of your desktop.

With Logging and File Search (optional)

{
  "mcpServers": {
    "test-utils": {
      "command": "D:\\path\\to\\mcp-test-utils.exe",
      "env": {
        "MCP_LOG_DIR": "D:\\path\\to\\logs",
        "MCP_LOG_MAX_MB": "500",
        "MCP_LOG_RETAIN_DAYS": "30",
        "MCP_SEARCH_DIRS": "D:\\Projects\\app1;D:\\Projects\\app2"
      }
    }
  }
}

Quality Presets

Screenshots support configurable quality to balance detail and token cost:

| Preset | Scale | Format | Use Case |
|---|---|---|---|
| full | 100% | JPEG q90 | Maximum detail |
| standard | 50% | JPEG q70 | Balanced (default) |
| compact | 50% | PNG | When PNG is needed |
| minimal | 25% | Grayscale | Lowest token cost |
| custom | 10–100% | JPEG / PNG / Grayscale | Full control |
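
Because most presets downscale the image, a point located in a screenshot does not map 1:1 to a screen pixel. The sketch below shows the arithmetic under the scale factors listed above; the function names are illustrative, the server's own rounding is not documented here, and any coordinate metadata the server returns should take precedence over this calculation.

```python
# Scale factors taken from the preset table; "custom" is omitted because
# its scale is chosen per call (10-100%).
PRESET_SCALE = {"full": 1.0, "standard": 0.5, "compact": 0.5, "minimal": 0.25}


def scaled_size(width, height, preset):
    """Approximate image size for a screen of width x height at a given preset."""
    scale = PRESET_SCALE[preset]
    return round(width * scale), round(height * scale)


def to_screen_point(x, y, preset):
    """Map a point measured in the scaled screenshot back to screen pixels."""
    scale = PRESET_SCALE[preset]
    return round(x / scale), round(y / scale)
```

For example, on a 1920×1080 screen the standard preset yields roughly a 960×540 image, and a point at (480, 270) in that image corresponds to about (960, 540) on screen.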

Environment Variables

| Variable | Description | Default |
|---|---|---|
| MCP_LOG_DIR | Path for log sessions. Without it, logging tools are hidden | |
| MCP_LOG_MAX_MB | Session size limit (warning on exceed) | 500 |
| MCP_LOG_RETAIN_DAYS | Auto-delete sessions older than N days. 0 to disable | 30 |
| MCP_SEARCH_DIRS | Allowed directories for search_in_files (; on Windows, : on macOS/Linux). Without it, the tool is hidden | |
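
When the client config is generated by a script, the variables above can be assembled programmatically. This is a minimal sketch that produces the same JSON shape as the Claude Desktop examples; the helper name `build_server_entry` and its parameters are assumptions made for illustration, and only the environment variable names come from this README.

```python
import json


def build_server_entry(command, log_dir=None, search_dirs=(), windows=True):
    """Build an mcpServers config dict for the test-utils server."""
    env = {}
    if log_dir:
        env["MCP_LOG_DIR"] = log_dir
    if search_dirs:
        # Directories are joined with ';' on Windows and ':' elsewhere.
        separator = ";" if windows else ":"
        env["MCP_SEARCH_DIRS"] = separator.join(search_dirs)
    entry = {"command": command}
    if env:
        entry["env"] = env
    return {"mcpServers": {"test-utils": entry}}


if __name__ == "__main__":
    config = build_server_entry(
        "D:\\path\\to\\mcp-test-utils.exe",
        log_dir="D:\\path\\to\\logs",
        search_dirs=["D:\\Projects\\app1", "D:\\Projects\\app2"],
    )
    print(json.dumps(config, indent=2))
```

Omitting `log_dir` or `search_dirs` leaves the corresponding variable unset, which matches the behavior described in the table: the related tools stay hidden.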

How It Works

MCP Test Utils is a JSON-RPC 2.0 server communicating over stdin/stdout. Any MCP-compatible client launches the binary, sends tool calls, and receives structured responses (text, base64 images). Tested with Claude Desktop.
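
To make the message flow concrete, here is a minimal sketch of how a client might frame one tool call and read its result. It assumes the standard MCP conventions (the `tools/call` method and newline-delimited JSON on stdio); the helper names are illustrative, and a real client must also complete the MCP `initialize` handshake before calling tools. Arguments are left empty because the tools' parameter names are not documented here.

```python
import json


def build_tool_call(request_id, tool_name, arguments=None):
    """Serialize one MCP tools/call request as a newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    return json.dumps(message, separators=(",", ":")) + "\n"


def extract_content(response_line):
    """Return the content items of a tools/call result, raising on a JSON-RPC error."""
    response = json.loads(response_line)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown error"))
    return response["result"]["content"]
```

A client would write the line returned by `build_tool_call(1, "list_windows")` to the server's stdin, read one line from its stdout, and pass that line to `extract_content` to obtain the text or image items.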

The server uses native Windows APIs directly — Win32 GDI for screenshots, SendInput for mouse and keyboard, UI Automation COM API for element inspection, WinRT Windows.Media.Ocr for text recognition. File search uses the ripgrep engine (grep-regex, grep-searcher, ignore) — cross-platform, no external dependencies. No PowerShell, no external tools, no network access.

Use Cases

  • Automated QA — agent navigates the app, clicks through flows, takes screenshots at each step, writes a test report
  • Desktop automation — fill forms, copy data between windows, run workflows
  • Accessibility audit — scan UI Automation tree for missing labels or roles
  • Visual regression — screenshot comparison across releases
  • Data extraction — OCR text from applications that don't expose APIs
  • Code search — find patterns across multiple projects without leaving the agent session

Security

  • Responds only to requests from the MCP client
  • Opens no network ports
  • Writes nothing to disk (except opt-in logging)
  • Sends no data externally
  • Screenshots can capture the entire screen, so make sure no sensitive information is visible
  • File search is sandboxed — only directories in MCP_SEARCH_DIRS are accessible
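
The idea behind the file search sandbox can be illustrated with a simple allow-list check. This is a sketch of the general technique only, not the server's actual implementation; the function name is hypothetical, and it works on path strings without touching the file system, so it does not account for symbolic links.

```python
from pathlib import PureWindowsPath


def is_allowed(path, allowed_dirs):
    """Return True if path lies inside one of the allowed directories."""
    candidate = PureWindowsPath(path)
    # Reject parent-directory segments outright, since pure paths
    # do not resolve ".." and it could be used to escape a root.
    if ".." in candidate.parts:
        return False
    return any(
        candidate.is_relative_to(PureWindowsPath(root)) for root in allowed_dirs
    )
```

Comparing path components rather than raw string prefixes keeps a directory such as `D:\Projects\app10` from being treated as part of `D:\Projects\app1`.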

Support us

Free and unrestricted. If you find it useful — jeenyjai.github.io

License

Copyright 2026 JeenyJAI. All rights reserved.


🚀 Created with Claude
