omniparser-autogui-mcp
An MCP server that analyzes the screen with OmniParser to automate GUI operations.
(The Japanese version is here)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Operation has been confirmed on Windows.
License notes
This project is released under the MIT license, excluding submodules and subpackages.
The OmniParser repository is licensed under CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
- Run the following commands:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
(On platforms other than Windows, use export instead of set.)
(If you want langchain_example.py to work, run uv sync --extra langchain instead of uv sync.)
- Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
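If you would rather generate the entry than edit the JSON by hand, the following is a minimal sketch of one way to do it. The helper names `build_server_entry` and `merge_into_config` are hypothetical and not part of this project; the sketch only assumes the entry layout shown above. Passing the path as a raw string and letting `json.dumps` serialize it produces the doubled backslashes that JSON requires.

```python
import json


def build_server_entry(cloned_path: str, ocr_lang: str = "en") -> dict:
    """Build the "omniparser_autogui_mcp" entry in the layout shown above.

    Hypothetical helper: cloned_path is the directory you cloned the
    repository into.
    """
    return {
        "command": "uv",
        "args": ["--directory", cloned_path, "run", "omniparser-autogui-mcp"],
        "env": {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang},
    }


def merge_into_config(config: dict, entry: dict) -> dict:
    """Return a copy of config with the entry added under "mcpServers".

    Any servers already present in the config are kept.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["omniparser_autogui_mcp"] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Placeholder path from the example above; replace with your own.
    entry = build_server_entry(r"D:\CLONED_PATH\omniparser-autogui-mcp")
    # json.dumps writes each backslash as "\\" in the output.
    print(json.dumps(merge_into_config({}, entry), indent=2))
```

The printed JSON can then be pasted into claude_desktop_config.json, or merged with the parsed contents of your existing file.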
env allows for the following additional configurations:
- OMNI_PARSER_BACKEND_LOAD
  If it does not work with other clients (such as LibreChat), specify 1.
- TARGET_WINDOW_NAME
  If you want to specify the window to operate, please specify the window name.
  If not specified, operates on the entire screen.
- OMNI_PARSER_SERVER
  If you want OmniParser processing to be done on another device, specify the server's address and port, such as 127.0.0.1:8000.
  The server can be started with uv run omniparserserver.
- SSE_HOST, SSE_PORT
  If specified, communication will be done via SSE instead of stdio.
- SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
  These are for OmniParser configuration.
  Usually, they are not necessary.
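As an illustration of how the optional variables above fit into the env block, here is a small sketch that assembles the dictionary and leaves out anything you did not set. The function `build_env` and its parameter names are hypothetical; only the environment variable names come from this README. The window name "Notepad" in the usage example is likewise just a placeholder. Values are emitted as strings, since that is how they appear in the JSON configuration.

```python
import json
from typing import Optional


def build_env(
    ocr_lang: str = "en",
    backend_load: bool = False,
    target_window_name: Optional[str] = None,
    omni_parser_server: Optional[str] = None,
    sse_host: Optional[str] = None,
    sse_port: Optional[int] = None,
) -> dict:
    """Assemble the "env" block, skipping options that were not given."""
    # Base settings taken from the configuration example in this README.
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    optional = {
        # "1" is the value the README says to specify for other clients.
        "OMNI_PARSER_BACKEND_LOAD": "1" if backend_load else None,
        "TARGET_WINDOW_NAME": target_window_name,
        "OMNI_PARSER_SERVER": omni_parser_server,
        "SSE_HOST": sse_host,
        "SSE_PORT": None if sse_port is None else str(sse_port),
    }
    # Only keep options that were actually provided.
    env.update({k: v for k, v in optional.items() if v is not None})
    return env


if __name__ == "__main__":
    # Placeholder values for illustration only.
    env = build_env(
        target_window_name="Notepad",
        omni_parser_server="127.0.0.1:8000",
    )
    print(json.dumps(env, indent=2))
```

With no arguments, the function returns only the two base settings, which matches the default configuration shown earlier.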
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.