MCP Test Utils

Desktop UI automation for AI agents: screenshots, window management, mouse, keyboard, UI Automation tree, OCR

100% AI Code · Human Reviewed

MCP server for automated desktop UI testing. A single binary — no runtime, no dependencies, no installation.

Windows x64 only. macOS and Linux support is planned.

Gives AI agents eyes and hands: screenshots, window management, mouse, keyboard, UI Automation, OCR, file search.

Why

AI agents can trigger actions in applications but can't see the screen. This server bridges that gap:

Agent triggers action → takes screenshot → sees the result →
switches window → clicks a button → verifies → writes report

Fully autonomous, no user involvement required.

Demo

10 tasks. One take. Watch on YouTube →

MCP Test Utils vs Anthropic Computer Use

Claude Cowork now includes built-in Computer Use — Claude takes screenshots and clicks through interfaces visually. It works with zero setup. MCP Test Utils takes a different approach: instead of guessing where to click from a screenshot, it reads the actual UI structure through Windows APIs.

Feature	MCP Test Utils	Computer Use
Click precision	Exact — UI Automation API	Visual estimate from screenshot
Speed & token cost	Fast, low cost — text responses	Slower, costly — image on every step
UI structure	Full tree: roles, states, coordinates	Not available
OCR	Word-level coordinates, multi-language	Not available (model vision only)
Window management	API-based, window-relative coords	Visual navigation
File search	Ripgrep engine built-in	Not available
Session logging	JSONL + screenshots	Not available
Visual analysis	✅ Same Claude model, full-res 1:1	✅ Same Claude model
Setup	Download binary, add to config	Built-in, one toggle
Mobile / Dispatch	—	✅ Tasks from phone
Cross-platform	Windows (macOS/Linux planned)	macOS + Windows

MCP Test Utils is faster, more precise, and cheaper per action. Computer Use is easier to start and works across platforms. They complement each other.

Platforms

Platform	Status
Windows x64	✅ Full support
macOS arm64	⏳ Planned
Linux x64	⏳ Planned

Tools (19)

Vision

Tool	Description
`take_screenshot`	Screenshot of the entire desktop with configurable quality
`take_window_screenshot`	Screenshot of a specific window (screen or window capture mode)
`read_screen_text`	OCR the entire screen (Windows.Media.Ocr)
`read_region_text`	OCR a screen region with precise word coordinates

Window Management

Tool	Description
`list_windows`	List windows with id, title, app, position, size, minimized, focused
`focus_window`	Bring a window to front, restore if minimized

Input

Tool	Description
`mouse_click`	Click (left / right / middle) at screen or window-relative coordinates
`mouse_move`	Move cursor to a point
`mouse_drag`	Drag from point A to point B
`mouse_scroll`	Scroll the mouse wheel
`keyboard_type`	Type text (full Unicode — Latin, Cyrillic, CJK, emoji)
`keyboard_press`	Press a key (Enter, Tab, F1–F12, arrows, etc.)
`keyboard_shortcut`	Key combinations (Ctrl+S, Alt+F4, Ctrl+Shift+P, etc.)
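The window-relative coordinates accepted by `mouse_click` come down to adding the window's top-left position (as reported by `list_windows`) to an offset inside the window. A minimal sketch of that arithmetic follows. The `Window` record and its field names are assumptions made for illustration, not the server's documented schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical window record; the field names are assumptions."""
    x: int       # screen X of the window's top-left corner
    y: int       # screen Y of the window's top-left corner
    width: int
    height: int


def to_screen(window: Window, rel_x: int, rel_y: int) -> tuple[int, int]:
    """Convert a window-relative point to absolute screen coordinates."""
    if not (0 <= rel_x < window.width and 0 <= rel_y < window.height):
        raise ValueError("point lies outside the window")
    return window.x + rel_x, window.y + rel_y


# A point 10 px right and 20 px down from the window's top-left corner.
point = to_screen(Window(x=100, y=200, width=800, height=600), 10, 20)
```

Working in window-relative coordinates keeps a click target valid when the window moves, since only the window's position has to be looked up again.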

Structured UI Access

Tool	Description
`list_ui_elements`	UI Automation tree — buttons, fields, menus with exact coordinates

File Search

Tool	Description
`search_in_files`	Search text or regex in files within allowed directories (like VS Code Find in Files)
`find_files`	Find files and directories by name pattern (glob), like "Go to File"

Agent Guide

Tool	Description
`get_usage_guide`	Compact workflow guide for LLM agents — precision clicking, coordinate metadata, quality tips

Session Logging

Tool	Description
`enable_logging`	Start recording tool calls to JSONL + screenshots (opt-in)
`disable_logging`	Stop recording, get session stats

Installation

Download the binary from Releases.
Add it to your MCP client config. Example below is for Claude Desktop — for other clients, refer to their documentation.

Claude Desktop: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "test-utils": {
      "command": "D:\\path\\to\\mcp-test-utils.exe"
    }
  }
}

Restart Claude Desktop.
In chat, try: "Take a screenshot" — the agent will return an image of your desktop.

With Logging and File Search (optional)

{
  "mcpServers": {
    "test-utils": {
      "command": "D:\\path\\to\\mcp-test-utils.exe",
      "env": {
        "MCP_LOG_DIR": "D:\\path\\to\\logs",
        "MCP_LOG_MAX_MB": "500",
        "MCP_LOG_RETAIN_DAYS": "30",
        "MCP_SEARCH_DIRS": "D:\\Projects\\app1;D:\\Projects\\app2"
      }
    }
  }
}
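If you prefer to script the config change rather than edit the file by hand, the entry above can be produced along these lines. This is a sketch only: `add_server` is a hypothetical helper, and it assumes the config has the `mcpServers` layout shown above.

```python
import json
from typing import Optional


def add_server(config: dict, name: str, command: str,
               env: Optional[dict] = None) -> dict:
    """Return a copy of an MCP client config with one server entry added."""
    servers = dict(config.get("mcpServers", {}))
    entry = {"command": command}
    if env:
        entry["env"] = dict(env)
    servers[name] = entry
    return {**config, "mcpServers": servers}


config = add_server(
    {},
    "test-utils",
    r"D:\path\to\mcp-test-utils.exe",
    env={"MCP_LOG_DIR": r"D:\path\to\logs"},
)
# json.dumps escapes each backslash, which is why the JSON above uses "\\".
print(json.dumps(config, indent=2))
```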

Quality Presets

Screenshots support configurable quality to balance detail and token cost:

Preset	Scale	Format	Use Case
`full`	100%	JPEG q90	Maximum detail
`standard`	50%	JPEG q70	Balanced (default)
`compact`	50%	PNG	When PNG is needed
`minimal`	25%	Grayscale	Lowest token cost
`custom`	10–100%	JPEG / PNG / Grayscale	Full control
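To get a feel for what the scale column means in pixels, the sketch below applies the preset scales from the table to a source resolution. The rounding rule is an assumption; the server may round differently.

```python
# Scale factors taken from the preset table above.
PRESET_SCALE = {"full": 1.0, "standard": 0.5, "compact": 0.5, "minimal": 0.25}


def scaled_size(width: int, height: int, preset: str = "standard") -> tuple[int, int]:
    """Return the output dimensions for a preset (rounding is an assumption)."""
    scale = PRESET_SCALE[preset]
    return max(1, round(width * scale)), max(1, round(height * scale))


standard = scaled_size(1920, 1080, "standard")
minimal = scaled_size(1920, 1080, "minimal")
```

Because both dimensions shrink, the pixel count falls with the square of the scale: a 50% preset keeps a quarter of the pixels and a 25% preset keeps one sixteenth.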

Environment Variables

Variable	Description	Default
`MCP_LOG_DIR`	Path for log sessions. Without it, logging tools are hidden	—
`MCP_LOG_MAX_MB`	Session size limit (warning on exceed)	`500`
`MCP_LOG_RETAIN_DAYS`	Auto-delete sessions older than N days. `0` to disable	`30`
`MCP_SEARCH_DIRS`	Allowed directories for `search_in_files` (`;` on Windows, `:` on macOS/Linux). Without it, the tool is hidden	—
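The separator rule for `MCP_SEARCH_DIRS` matches the platform's usual path-list separator, which Python exposes as `os.pathsep`. The sketch below shows how such a value might be split. `parse_search_dirs` is a hypothetical helper, not the server's own parser.

```python
import os


def parse_search_dirs(value: str, sep: str = os.pathsep) -> list[str]:
    """Split an MCP_SEARCH_DIRS-style value, dropping empty entries."""
    return [part.strip() for part in value.split(sep) if part.strip()]


# The separator is passed explicitly so the example behaves the same on any OS.
dirs = parse_search_dirs(r"D:\Projects\app1;D:\Projects\app2", sep=";")
```

Using `;` on Windows avoids a clash with the colon in drive letters such as `D:`.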

How It Works

MCP Test Utils is a JSON-RPC 2.0 server communicating over stdin/stdout. Any MCP-compatible client launches the binary, sends tool calls, and receives structured responses (text, base64 images). Tested with Claude Desktop.
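As an illustration of what travels over stdin, here is a sketch of a single MCP `tools/call` request serialized as one line of JSON. The message shape follows the MCP specification. The empty `arguments` object for `list_windows` is an assumption, since the tools' parameter names are not listed here, and a real client would first complete the MCP `initialize` handshake.

```python
import json


def tool_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Build one newline-terminated JSON-RPC 2.0 `tools/call` request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, so each message occupies exactly one line.
    return json.dumps(request) + "\n"


# Assumption: list_windows is callable with no arguments.
line = tool_call_message(1, "list_windows", {})
```

The client writes this line to the server's stdin and reads the response, which carries the same `id`, from its stdout.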

The server uses native Windows APIs directly — Win32 GDI for screenshots, SendInput for mouse and keyboard, UI Automation COM API for element inspection, WinRT Windows.Media.Ocr for text recognition. File search uses the ripgrep engine (grep-regex, grep-searcher, ignore) — cross-platform, no external dependencies. No PowerShell, no external tools, no network access.

Use Cases

Automated QA — agent navigates the app, clicks through flows, takes screenshots at each step, writes a test report
Desktop automation — fill forms, copy data between windows, run workflows
Accessibility audit — scan UI Automation tree for missing labels or roles
Visual regression — screenshot comparison across releases
Data extraction — OCR text from applications that don't expose APIs
Code search — find patterns across multiple projects without leaving the agent session

Security

Responds only to requests from the MCP client
Opens no network ports
Writes nothing to disk (except opt-in logging)
Sends no data externally
Screenshots capture the entire screen — make sure no sensitive information is visible
File search is sandboxed — only directories in MCP_SEARCH_DIRS are accessible
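The file search sandbox can be pictured as a containment check: a path is accepted only if, once resolved, it sits inside one of the allowed directories. The following is a sketch of that idea under the stated assumption, not the server's actual implementation, and `is_allowed` is a hypothetical name.

```python
from pathlib import Path


def is_allowed(candidate: str, allowed_dirs: list[str]) -> bool:
    """Return True if candidate resolves to a path inside an allowed directory."""
    # resolve() makes the path absolute and collapses ".." segments,
    # so traversal tricks are judged by where they actually point.
    target = Path(candidate).resolve()
    for root in allowed_dirs:
        root_path = Path(root).resolve()
        if target == root_path or root_path in target.parents:
            return True
    return False


inside = is_allowed("proj/src/main.py", ["proj"])
escaped = is_allowed("proj/../other/main.py", ["proj"])
```

Comparing resolved paths, rather than raw strings, is what stops a request like `proj/../other` from slipping past a simple prefix test.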

Support us

Free and unrestricted. If you find it useful, you can support the project at jeenyjai.github.io

License

🚀 Created with Claude
