omniparser-autogui-mcp
An MCP server that analyzes the screen with OmniParser to automate GUI operations.
(Japanese version is here)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Confirmed to work on Windows.
License notes
This project is released under the MIT license, excluding submodules and sub-packages.
OmniParser's repository is licensed under CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
- Please do the following:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
(On platforms other than Windows, use export instead of set.)
(If you want langchain_example.py to work, run uv sync --extra langchain instead of uv sync.)
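The set/export difference can be avoided by passing OCR_LANG through the child process environment instead of the shell. The following is only a minimal sketch of the steps above in Python; the helper names installation_steps and run_installation are hypothetical and not part of this repository, and it assumes git and uv are already on the PATH.

```python
import os
import subprocess

REPO_URL = "https://github.com/NON906/omniparser-autogui-mcp.git"
REPO_DIR = "omniparser-autogui-mcp"


def installation_steps():
    """Return (command, working directory) pairs mirroring the steps above."""
    return [
        (["git", "clone", "--recursive", REPO_URL], None),
        (["uv", "sync"], REPO_DIR),
        (["uv", "run", "download_models.py"], REPO_DIR),
    ]


def run_installation(ocr_lang="en"):
    """Run each step in order, setting OCR_LANG without any shell syntax."""
    # Copy the current environment and add OCR_LANG (assumed helper behaviour).
    env = dict(os.environ, OCR_LANG=ocr_lang)
    for command, cwd in installation_steps():
        # check=True stops at the first failing step.
        subprocess.run(command, cwd=cwd, env=env, check=True)
```

Calling run_installation() from the directory that should contain the clone would perform the same sequence on Windows and on other platforms.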
- Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
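Hand-editing the path is error-prone because JSON requires every backslash in a Windows path to be doubled. As a minimal sketch, the entry above could be generated with Python's standard json module, which handles the escaping. The helper name build_server_entry and the example path are hypothetical and not part of this repository.

```python
import json


def build_server_entry(cloned_path: str, ocr_lang: str = "en") -> dict:
    """Return the mcpServers entry shown above for a given clone directory."""
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {
                "command": "uv",
                "args": ["--directory", cloned_path, "run", "omniparser-autogui-mcp"],
                "env": {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang},
            }
        }
    }


# Hypothetical clone location; the raw string keeps single backslashes as typed.
entry = build_server_entry(r"D:\CLONED_PATH\omniparser-autogui-mcp")

# json.dumps doubles each backslash, giving the escaped form that JSON requires.
config_text = json.dumps(entry, indent=2)
print(config_text)
```

If claude_desktop_config.json already lists other servers, merge the printed omniparser_autogui_mcp entry into the existing mcpServers object rather than overwriting the file.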
env also accepts the following additional settings:
- OMNI_PARSER_BACKEND_LOAD
  If it does not work with other clients (such as LibreChat), specify 1.
- TARGET_WINDOW_NAME
  If you want to operate on a specific window, specify its window name.
  If not specified, the entire screen is operated on.
- OMNI_PARSER_SERVER
  If you want OmniParser processing to be done on another device, specify the server's address and port, such as 127.0.0.1:8000.
  The server can be started with uv run omniparserserver.
- SSE_HOST, SSE_PORT
  If specified, communication will be done via SSE instead of stdio.
- SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
  These are for OmniParser configuration.
  Usually, they are not necessary.
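The optional settings above all end up in the same env object. The following sketch assembles one for a setup where OmniParser runs on another machine and only a single window is operated. The helper name build_env, the window title, and the address are illustrative assumptions. The host:port check only mirrors the example format given above and is not validation the server is known to perform.

```python
def build_env(ocr_lang="en", target_window=None, omniparser_server=None,
              sse_host=None, sse_port=None, backend_load=False):
    """Assemble the env mapping for the MCP server entry (illustrative helper)."""
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    if backend_load:
        # Suggested above for clients such as LibreChat.
        env["OMNI_PARSER_BACKEND_LOAD"] = "1"
    if target_window:
        env["TARGET_WINDOW_NAME"] = target_window
    if omniparser_server:
        # Expect "address:port", as in the 127.0.0.1:8000 example.
        host, sep, port = omniparser_server.rpartition(":")
        if not (host and sep and port.isdigit()):
            raise ValueError("expected address and port, e.g. 127.0.0.1:8000")
        env["OMNI_PARSER_SERVER"] = omniparser_server
    if sse_host is not None:
        env["SSE_HOST"] = sse_host
    if sse_port is not None:
        # Environment variable values are strings, so convert the port.
        env["SSE_PORT"] = str(sse_port)
    return env


# Hypothetical values: operate only on a window titled "Notepad" and
# delegate OmniParser processing to a server at 127.0.0.1:8000.
env = build_env(target_window="Notepad", omniparser_server="127.0.0.1:8000")
print(env)
```

The resulting mapping can be placed under "env" in the server entry. Settings left unspecified are simply omitted, so the server falls back to its defaults.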
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.