omniparser-autogui-mcp
An MCP server that analyzes the screen with OmniParser to automate GUI operations.
(Japanese version is here)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
It has been confirmed to work on Windows.
License notes
This project is released under the MIT license, excluding submodules and sub-packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
- Please do the following:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
(On platforms other than Windows, use export instead of set.)
(If you want to run langchain_example.py, run uv sync --extra langchain instead of uv sync.)
- Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
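If claude_desktop_config.json already lists other servers, the entry above has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge, not part of this repository: the helper name add_server and the sample other_server entry are hypothetical. It also shows why the path is written with doubled backslashes: the JSON encoder escapes each backslash of a Windows path.

```python
import json

SERVER_NAME = "omniparser_autogui_mcp"


def add_server(config: dict, clone_dir: str, ocr_lang: str = "en") -> dict:
    """Return a copy of `config` with the omniparser-autogui-mcp entry added.

    Hypothetical helper for illustration; the entry mirrors the JSON shown above.
    """
    merged = dict(config)
    # Copy the existing server table so the caller's dict is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers[SERVER_NAME] = {
        "command": "uv",
        "args": ["--directory", clone_dir, "run", "omniparser-autogui-mcp"],
        "env": {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang},
    }
    merged["mcpServers"] = servers
    return merged


# A made-up existing configuration with one unrelated server.
existing = {"mcpServers": {"other_server": {"command": "node", "args": ["server.js"]}}}

# A raw string holds the real path; json.dumps doubles the backslashes on output.
updated = add_server(existing, r"D:\CLONED_PATH\omniparser-autogui-mcp")
config_text = json.dumps(updated, indent=2)
print(config_text)
```

The printed text can be saved as claude_desktop_config.json once the placeholder path has been replaced with the real clone directory.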
env allows for the following additional configurations:
- OMNI_PARSER_BACKEND_LOAD
  If the server does not work with other clients (such as LibreChat), set this to 1.
- TARGET_WINDOW_NAME
  To operate on a specific window, set this to the window name. If not specified, the server operates on the entire screen.
- OMNI_PARSER_SERVER
  To run OmniParser processing on another device, set this to the server's address and port, such as 127.0.0.1:8000. The server can be started with uv run omniparserserver.
- SSE_HOST, SSE_PORT
  If specified, communication uses SSE instead of stdio.
- SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
  These configure OmniParser itself and are usually not necessary.
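The optional settings above can be assembled programmatically before being added to the env block. The sketch below is an illustration only: the function optional_env and its parameter names are hypothetical, and the address:port check is an assumption about what a sensible OMNI_PARSER_SERVER value looks like, not the server's own validation. Every value is converted to a string because the env block holds environment variables.

```python
import re


def optional_env(target_window=None, omniparser_server=None,
                 sse_host=None, sse_port=None, backend_load=False) -> dict:
    """Build the optional part of the "env" block (hypothetical helper)."""
    env = {}
    if backend_load:
        # The README says to specify 1 for clients such as LibreChat.
        env["OMNI_PARSER_BACKEND_LOAD"] = "1"
    if target_window:
        env["TARGET_WINDOW_NAME"] = target_window
    if omniparser_server:
        # Assumed format check: "address:port", as in 127.0.0.1:8000.
        if not re.fullmatch(r"[^\s:]+:\d{1,5}", omniparser_server):
            raise ValueError("expected address:port, e.g. 127.0.0.1:8000")
        env["OMNI_PARSER_SERVER"] = omniparser_server
    if sse_host is not None:
        env["SSE_HOST"] = str(sse_host)
    if sse_port is not None:
        env["SSE_PORT"] = str(sse_port)
    return env


# Start from the two variables shown in the README and add the optional ones.
env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": "en"}
env.update(optional_env(target_window="Notepad",
                        omniparser_server="127.0.0.1:8000"))
print(env)
```

Here "Notepad" is only a placeholder window name; leaving target_window unset keeps the default of operating on the entire screen.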
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.