Desktop Automation
Automate desktop actions and interact with your local environment using LLM applications.
Desktop Automation MCP Server
A Model Context Protocol (MCP) server that exposes desktop automation capabilities, allowing LLM applications to interact with your desktop environment through standardized tools.
Overview
This MCP server bridges LLM applications and desktop automation functionality. It exposes mouse, keyboard, and screen automation capabilities through MCP, enabling AI assistants to:
- Click, right-click, and double-click at specific coordinates
- Move the mouse cursor with smooth or instant movement
- Type text with optional character delays
- Press individual keys or key combinations
- Get current mouse cursor position
- Capture screenshots of the desktop
- Get screen dimensions
Features
Mouse Automation
- Click: Left click at specified coordinates
- Right Click: Right click at specified coordinates
- Double Click: Double click at specified coordinates
- Move Mouse: Move cursor with optional smooth animation
- Get Position: Retrieve current mouse coordinates
Keyboard Automation
- Type Text: Type text at current cursor position
- Press Key: Press individual keys or key combinations
- Delayed Typing: Type with configurable delays between characters
Screen Automation
- Take Screenshot: Capture the full screen and save to file
- Get Screen Size: Retrieve screen dimensions
Installation
Prerequisites
- Go 1.23.0 or later
- Task runner (optional, for using Taskfile commands)
# Install Task (optional)
go install github.com/go-task/task/v3/cmd/task@latest
Build from Source
# Clone the repository (if not already available)
git clone <repository-url>
cd desktop-automation-mcp
# Download dependencies
go mod download
go mod tidy
# Build the server
go build -o mcp-server ./cmd/mcp-server
# Or using Task
task build
Usage
Running the Server
The server uses stdio transport for communication:
# Run directly
./mcp-server
# Or using Task
task run
Integration with LLM Applications
Configure your LLM application (Claude Desktop, etc.) to connect to this MCP server:
{
  "mcpServers": {
    "desktop-automation": {
      "command": "/path/to/mcp-server"
    }
  }
}
Available Tools
click
Click at specified screen coordinates.
Parameters:
- x (number, required): X coordinate
- y (number, required): Y coordinate
right_click
Right click at specified screen coordinates.
Parameters:
- x (number, required): X coordinate
- y (number, required): Y coordinate
double_click
Double click at specified screen coordinates.
Parameters:
- x (number, required): X coordinate
- y (number, required): Y coordinate
move_mouse
Move mouse cursor to specified coordinates.
Parameters:
- x (number, required): X coordinate
- y (number, required): Y coordinate
- smooth (boolean, optional): Use smooth movement animation
- duration (number, optional): Duration for smooth movement in seconds (default: 1.0)
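One way to picture smooth movement is as a series of small cursor moves along a straight line. The sketch below is a minimal illustration of that idea, not the server's actual implementation: the `Point` type, the `interpolate` function, and the step count are all hypothetical.

```go
package main

import "fmt"

// Point is a screen coordinate in pixels (hypothetical helper type).
type Point struct{ X, Y int }

// interpolate returns `steps` points on the straight line from `from` to `to`.
// The start point is not included; the last point is always exactly `to`.
func interpolate(from, to Point, steps int) []Point {
	if steps < 1 {
		steps = 1
	}
	pts := make([]Point, 0, steps)
	for i := 1; i <= steps; i++ {
		pts = append(pts, Point{
			X: from.X + (to.X-from.X)*i/steps,
			Y: from.Y + (to.Y-from.Y)*i/steps,
		})
	}
	return pts
}

func main() {
	// Print the intermediate positions for a move from (0, 0) to (100, 50).
	for _, p := range interpolate(Point{0, 0}, Point{100, 50}, 4) {
		fmt.Println(p.X, p.Y)
	}
}
```

Under this assumption, a duration of 1.0 seconds spread over four steps would mean pausing roughly 250 ms between moves.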
get_mouse_position
Get current mouse cursor position.
Parameters: None
type_text
Type text at current cursor position.
Parameters:
- text (string, required): Text to type
- delay (number, optional): Delay between characters in milliseconds
press_key
Press a key or key combination.
Parameters:
- key (string, required): Key to press (e.g., 'enter', 'space', 'ctrl')
- modifiers (array, optional): Modifier keys (e.g., ['ctrl', 'shift'])
take_screenshot
Capture a screenshot of the screen.
Parameters:
- path (string, optional): Path to save the screenshot (if not provided, saves to temp directory)
get_screen_size
Get the screen dimensions.
Parameters: None
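Because the server speaks MCP over stdio, a client invokes any of the tools above by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `click` using only the standard library. The type and function names (`request`, `callParams`, `newToolCall`) are hypothetical, and a real session would first perform the MCP `initialize` handshake, which is omitted here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// callParams carries the tool name and its arguments.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments,omitempty"`
}

// request is a JSON-RPC 2.0 request envelope.
type request struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// newToolCall encodes a tools/call request for the given tool and arguments.
func newToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(request{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: tool, Arguments: args},
	})
}

func main() {
	msg, err := newToolCall(1, "click", map[string]any{"x": 100, "y": 200})
	if err != nil {
		panic(err)
	}
	// Over stdio, each message is written as a single line.
	fmt.Println(string(msg))
}
```

In practice an MCP client library or the host application (such as Claude Desktop) builds these messages for you; the sketch only shows what travels over the wire.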
Architecture
desktop-automation-mcp/
├── cmd/
│   └── mcp-server/          # MCP server entry point
│       └── main.go
├── internal/
│   └── automation/          # Desktop automation logic (copied from desktop-automation)
│       ├── keyboard.go
│       ├── mouse.go
│       └── screen.go
├── go.mod                   # Go module definition
├── Taskfile.yml             # Task runner configuration
├── .gitignore               # Git ignore rules
└── README.md                # This file
Development
Task Commands
# Build the server
task build
# Run the server
task run
# Clean build artifacts
task clean
# Download and tidy dependencies
task deps
# Run tests
task test
# Install to GOPATH/bin
task install
Manual Commands
# Build
go build -o mcp-server ./cmd/mcp-server
# Run
./mcp-server
# Test
go test ./...
# Clean
go clean
rm -f mcp-server
Dependencies
- github.com/mark3labs/mcp-go: MCP protocol implementation for Go
- github.com/go-vgo/robotgo: Cross-platform desktop automation library
Safety Considerations
- Screen Bounds: All coordinate inputs are validated against screen dimensions
- Input Validation: Negative coordinates and invalid parameters are rejected
- Error Handling: Comprehensive error handling with descriptive messages
- Recovery: Built-in panic recovery for robust operation
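The checks listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the server's code: `validatePoint` and `safely` are hypothetical names, and the screen size is passed in explicitly rather than queried from the system.

```go
package main

import "fmt"

// validatePoint rejects negative coordinates and coordinates that fall
// outside a screen of the given width and height (in pixels).
func validatePoint(x, y, width, height int) error {
	if x < 0 || y < 0 {
		return fmt.Errorf("coordinates (%d, %d) must not be negative", x, y)
	}
	if x >= width || y >= height {
		return fmt.Errorf("coordinates (%d, %d) are outside the %dx%d screen", x, y, width, height)
	}
	return nil
}

// safely runs action and converts a panic into an ordinary error, so one
// failing tool call does not take down the whole process.
func safely(action func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return action()
}

func main() {
	fmt.Println(validatePoint(100, 200, 1920, 1080))
	fmt.Println(validatePoint(-5, 10, 1920, 1080))
	fmt.Println(safely(func() error { panic("boom") }))
}
```

Wrapping each tool handler in something like `safely` lets the server report a descriptive error to the client instead of exiting.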
Platform Support
This server supports the same platforms as robotgo:
- Windows
- macOS
- Linux
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
This project follows the same license as the parent desktop-automation project.
Related Projects
- desktop-automation: The original CLI-based desktop automation tool
- mcp-go: Go implementation of the Model Context Protocol