omniparser-autogui-mcp
(The Japanese version is available here.)
This MCP server analyzes the screen with OmniParser and operates the GUI automatically.
It has been confirmed to work on Windows.
License notes
This project is released under the MIT license, excluding submodules and subpackages.
OmniParser's repository is licensed under CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
- Please do the following:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
(On systems other than Windows, use export instead of set.)
(If you want langchain_example.py to work, run uv sync --extra langchain instead of uv sync.)
- Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
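The same entry can also be generated with a short script, which avoids escaping the backslashes of a Windows path by hand. This is a minimal sketch and not part of this repository: the helper name build_server_entry and the example path are hypothetical, and the script only prints the JSON instead of editing claude_desktop_config.json.

```python
import json


def build_server_entry(clone_dir, ocr_lang="en", extra_env=None):
    """Return the "mcpServers" fragment for this server as a dict.

    build_server_entry is a hypothetical helper used for illustration only.
    """
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    if extra_env:
        # Optional additional settings, passed through unchanged.
        env.update(extra_env)
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {
                "command": "uv",
                "args": ["--directory", clone_dir, "run", "omniparser-autogui-mcp"],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path: replace it with the directory you cloned.
    entry = build_server_entry(r"D:\CLONED_PATH\omniparser-autogui-mcp")
    # json.dumps escapes the backslashes, so the output is valid JSON as printed.
    print(json.dumps(entry, indent=2))
```

If the config file already lists other servers, merge the printed "omniparser_autogui_mcp" entry into the existing "mcpServers" object rather than replacing the whole file.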
env allows for the following additional configurations:
- OMNI_PARSER_BACKEND_LOAD
  If it does not work with other clients (such as LibreChat), specify 1.
- TARGET_WINDOW_NAME
  If you want to specify the window to operate, please specify the window name.
  If not specified, operates on the entire screen.
- OMNI_PARSER_SERVER
  If you want OmniParser processing to be done on another device, specify the server's address and port, such as 127.0.0.1:8000.
  The server can be started with uv run omniparserserver.
- SSE_HOST, SSE_PORT
  If specified, communication will be done via SSE instead of stdio.
- SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
  These are for OmniParser configuration.
  Usually, they are not necessary.
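As an illustration of how these optional settings fit into the env block, the sketch below collects them into a dictionary of strings. It rests on assumptions: build_extra_env and its parameter names are hypothetical, the window name "Notepad" is only an example, and the address:port check is a convenience of this sketch, not validation performed by the server itself.

```python
def build_extra_env(backend_load=False, target_window=None,
                    omniparser_server=None, sse_host=None, sse_port=None):
    """Collect optional settings into a dict suitable for the "env" block.

    Hypothetical helper: only the environment variable names come from
    the list of settings; everything else is illustrative.
    """
    env = {}
    if backend_load:
        env["OMNI_PARSER_BACKEND_LOAD"] = "1"
    if target_window:
        env["TARGET_WINDOW_NAME"] = target_window
    if omniparser_server:
        # Expect "address:port", for example "127.0.0.1:8000".
        host, sep, port = omniparser_server.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError("expected address:port, e.g. 127.0.0.1:8000")
        env["OMNI_PARSER_SERVER"] = omniparser_server
    if sse_host is not None:
        env["SSE_HOST"] = sse_host
    if sse_port is not None:
        # Values in a JSON "env" block are strings, so convert the port.
        env["SSE_PORT"] = str(sse_port)
    return env


if __name__ == "__main__":
    print(build_extra_env(target_window="Notepad",
                          omniparser_server="127.0.0.1:8000"))
```

The returned dictionary can be merged into the "env" object alongside PYTHONIOENCODING and OCR_LANG.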
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.