# Vast.ai

Interact with Vast.ai's cloud GPU services for on-demand computing power.
## Installation

### macOS Installation Guide

#### Prerequisites

Install uv (the Python package manager):

```bash
brew install uv
```

#### Install the Vast.ai MCP Server

1. Clone the repository:

   ```bash
   git clone https://github.com/your-repo/vastai-mcp.git
   cd vastai-mcp
   ```

2. Install dependencies using uv:

   ```bash
   uv sync
   ```

3. Install the server as a tool:

   ```bash
   uv tool install -e .  # Install from the current directory
   ```
#### Configuration

1. Get your Vast.ai API key:
   - Log in to console.vast.ai
   - Go to Account > API Keys
   - Create or copy your API key

2. Configure your MCP client (Claude Desktop or another MCP client). Update your MCP configuration file (`~/.cursor/mcp.json` for Cursor):

   ```json
   {
     "mcpServers": {
       "vast-ai": {
         "command": "uv",
         "args": ["run", "vast-mcp-server"],
         "env": {
           "VAST_API_KEY": "your_vast_api_key_here",
           "SSH_KEY_FILE": "~/.ssh/id_rsa",
           "SSH_KEY_PUBLIC_FILE": "~/.ssh/id_rsa.pub"
         }
       }
     }
   }
   ```
#### Verify Installation

Test the server directly:

```bash
uv tool run vast-mcp-server --help
```
#### SSH Key Setup (Recommended)

For full functionality, make sure you have SSH keys set up:

1. Generate an SSH key pair (if you don't already have one):

   ```bash
   ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
   ```

2. Verify that the key files exist:

   ```bash
   ls -la ~/.ssh/id_rsa*
   ```

   You should see both `id_rsa` (private) and `id_rsa.pub` (public).
#### Troubleshooting

- Permission denied: Make sure your SSH keys have the correct permissions:

  ```bash
  chmod 600 ~/.ssh/id_rsa
  chmod 644 ~/.ssh/id_rsa.pub
  ```

- API key issues: Verify that your API key is correct and has the proper permissions on Vast.ai.
- Network issues: Ensure you can reach console.vast.ai from your network.
## Vast.ai MCP Server Usage Guide

This document describes how to use the Vast.ai MCP (Model Context Protocol) server to interact with the Vast.ai GPU cloud platform.

### Available Tools

This server provides 24 tools for managing Vast.ai GPU instances:
#### 1. show_user_info()

Show current user information and account balance.

Returns:
- Username, email, account balance, user ID
- SSH key information (if available)
- Total spent amount
#### 2. show_instances(owner: str = "me")

Show the user's instances (running, stopped, etc.).

Parameters:
- `owner` (optional): Owner of the instances to show (default: "me")

Returns:
- List of all instances with their details:
  - Instance ID and status
  - Label and machine ID
  - GPU type and specifications
  - Hourly cost
  - Docker image
  - Public IP address (if available)
  - Creation date
#### 3. search_offers(query: str = "", limit: int = 20, order: str = "score-")

Search for available GPU offers/machines to rent.

Parameters:
- `query` (optional): Search query in key=value format (e.g., "gpu_name=RTX_4090 num_gpus=2")
- `limit` (optional): Maximum number of results to return (default: 20)
- `order` (optional): Sort order; append '-' for descending (default: "score-")

Returns:
- List of available offers with:
  - Offer ID
  - GPU specifications (name, count)
  - CPU and RAM details
  - Storage space
  - Hourly cost
  - Location and reliability score
  - CUDA version
  - Internet speeds

Example queries:
- `"gpu_name=RTX_4090"` - Search for RTX 4090 GPUs
- `"num_gpus=2 cpu_ram>=32"` - Search for dual-GPU setups with 32GB+ RAM
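Query strings are plain `field<op>value` clauses separated by spaces. As a minimal sketch of how a client could assemble one programmatically (the helper name and dict shape below are our own illustration, not part of the server's API):

```python
def build_query(filters: dict) -> str:
    """Build a search_offers query string from a mapping of
    field -> (operator, value), e.g.
    {"gpu_name": ("=", "RTX_4090"), "cpu_ram": (">=", 32)}."""
    allowed_ops = {"=", "==", "!=", ">", ">=", "<", "<="}
    parts = []
    for field, (op, value) in filters.items():
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op!r}")
        parts.append(f"{field}{op}{value}")
    return " ".join(parts)
```

The resulting string can then be passed directly as the `query` argument of `search_offers`.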
#### 4. create_instance(offer_id: int, image: str, disk: float = 10.0, ssh: bool = False, jupyter: bool = False, direct: bool = False, env: str = "", label: str = "", bid_price: float = None)

Create a new instance from an offer.

Parameters:
- `offer_id`: ID of the offer to rent (from search_offers)
- `image`: Docker image to run (e.g., "pytorch/pytorch:latest")
- `disk` (optional): Disk size in GB (default: 10.0)
- `ssh` (optional): Enable SSH access (default: False)
- `jupyter` (optional): Enable Jupyter notebook (default: False)
- `direct` (optional): Use direct connections (default: False)
- `env` (optional): Environment variables as dict (default: None)
- `label` (optional): Label for the instance
- `bid_price` (optional): Bid price for interruptible instances

Returns:
- Success message with instance ID or error details

Example:

```python
create_instance(
    offer_id=12345,
    image="pytorch/pytorch:latest",
    disk=40.0,
    ssh=True,
    direct=True,
    env={"JUPYTER_ENABLE_LAB": "yes"},
    label="My PyTorch Training"
)
```
#### 5. destroy_instance(instance_id: int)

Destroy an instance, completely removing it from the system. There is no need to stop it first.

Parameters:
- `instance_id`: ID of the instance to destroy

Returns:
- Success/failure message

#### 6. start_instance(instance_id: int)

Start a stopped instance.

Parameters:
- `instance_id`: ID of the instance to start

Returns:
- Success/failure message

#### 7. stop_instance(instance_id: int)

Stop a running instance (without destroying it).

Parameters:
- `instance_id`: ID of the instance to stop

Returns:
- Success/failure message
#### 8. search_volumes(query: str = "", limit: int = 20)

Search for available storage volume offers.

Parameters:
- `query` (optional): Search query in key=value format
- `limit` (optional): Maximum number of results to return (default: 20)

Returns:
- List of available volume offers with:
  - Volume offer ID
  - Storage capacity
  - Cost per GB per month
  - Location and reliability
  - Disk bandwidth
  - Internet speeds
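Since volume offers quote cost per GB per month, the total monthly cost is just capacity times rate. A trivial helper (our own illustration, not a server tool) makes the arithmetic explicit:

```python
def monthly_volume_cost(size_gb: float, cost_per_gb_month: float) -> float:
    """Total monthly cost of a volume, given its size and per-GB rate."""
    return size_gb * cost_per_gb_month

# e.g., a 100 GB volume at $0.15/GB/month costs $15/month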
#### 9. label_instance(instance_id: int, label: str)

Set a label on an instance for easier identification.

Parameters:
- `instance_id`: ID of the instance to label
- `label`: Label text to set

Returns:
- Success/failure message
#### 10. launch_instance_workflow(gpu_name: str, num_gpus: int, image: str, region: str = "", disk: float = 16.0, ssh: bool = True, jupyter: bool = False, direct: bool = True, label: str = "")

Launch the top instance from the search offers matching the given parameters (a streamlined alternative to create_instance).

Parameters:
- `gpu_name`: Name of the GPU model (e.g., "RTX_4090")
- `num_gpus`: Number of GPUs required
- `image`: Docker image to run
- `region` (optional): Geographical region preference
- `disk` (optional): Disk size in GB (default: 16.0)
- `ssh` (optional): Enable SSH access (default: True)
- `jupyter` (optional): Enable Jupyter notebook (default: False)
- `direct` (optional): Use direct connections (default: True)
- `label` (optional): Label for the instance

Returns:
- Success message with instance details or error

Example:

```python
launch_instance_workflow(
    gpu_name="RTX_4090",
    num_gpus=2,
    image="pytorch/pytorch:latest",
    region="North_America",
    disk=40.0,
    ssh=True,
    direct=True,
    label="My Training Job"
)
```
#### 11. prepay_instance(instance_id: int, amount: float)

Deposit credits into a reserved instance for discounted rates.

Parameters:
- `instance_id`: ID of the instance to prepay
- `amount`: Amount of credits to deposit

Returns:
- Details about the discount rate and coverage period
#### 12. reboot_instance(instance_id: int)

Reboot an instance (stop/start) without losing GPU priority.

Parameters:
- `instance_id`: ID of the instance to reboot

Returns:
- Success/failure message

#### 13. recycle_instance(instance_id: int)

Recycle an instance (destroy and recreate it from a freshly pulled image) without losing GPU priority.

Parameters:
- `instance_id`: ID of the instance to recycle

Returns:
- Success/failure message
#### 14. show_instance(instance_id: int)

Show detailed information about a specific instance.

Parameters:
- `instance_id`: ID of the instance to show

Returns:
- Detailed instance information, including:
  - Status and specifications
  - Connection details (IP, SSH, Jupyter)
  - Cost and runtime information
  - Configuration details
#### 15. logs(instance_id: int, tail: str = "1000", filter_text: str = "", daemon_logs: bool = False)

Get logs for an instance.

Parameters:
- `instance_id`: ID of the instance to get logs for
- `tail` (optional): Number of lines from the end of the logs (default: "1000")
- `filter_text` (optional): Grep filter for log entries
- `daemon_logs` (optional): Get daemon system logs instead of container logs

Returns:
- Instance log text or a status message
#### 16. attach_ssh(instance_id: int)

Attach an SSH key to an instance for secure access.

Parameters:
- `instance_id`: ID of the instance to attach the SSH key to

Returns:
- Success/failure message

Example:

```python
# Attach the SSH key from the configured public key file
attach_ssh(12345)
```

Notes:
- Uses the SSH public key file configured in the SSH_KEY_PUBLIC_FILE environment variable
- Only public SSH keys are accepted (not private keys)
- The SSH key must start with an 'ssh-' prefix (e.g., ssh-rsa, ssh-ed25519)
- After attaching, you can SSH to the instance using the corresponding private key
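Before calling attach_ssh, it can be worth sanity-checking that the configured file really contains a public key, since only public keys are accepted. A rough client-side check (our own sketch, not the server's actual validation) could look like:

```python
def looks_like_public_key(key_text: str) -> bool:
    """Rough check that a string is an OpenSSH public key, not a private key.
    PEM-encoded private keys contain a 'PRIVATE KEY' marker; public keys
    start with a type prefix such as 'ssh-rsa' or 'ssh-ed25519'."""
    key_text = key_text.strip()
    if "PRIVATE KEY" in key_text:
        return False
    return key_text.startswith("ssh-") or key_text.startswith("ecdsa-")
```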
#### 17. search_templates()

Search for available templates on Vast.ai.

Parameters:
- None

Returns:
- List of available templates with:
  - Template ID and name
  - Docker image
  - Description (if available)
  - Environment variables
  - Run type configuration
  - SSH and Jupyter settings

Example:

```python
# Get all available templates
search_templates()
```

Notes:
- Templates are pre-configured environments that simplify instance creation
- Templates may include specific Docker images, environment setups, and startup scripts
#### 18. execute_command(instance_id: int, command: str)

Execute a constrained remote command; available only on stopped instances. Use SSH to run commands on running instances.

Parameters:
- `instance_id`: ID of the instance to execute the command on
- `command`: Command to execute (limited to ls, rm, and du)

Returns:
- Command output or a status message

Available commands:
- `ls`: List directory contents
- `rm`: Remove files or directories
- `du`: Summarize disk usage for a set of files

Examples:

```python
# List directory contents
execute_command(12345, "ls -l -o -r")

# Remove files
execute_command(12345, "rm -r home/delete_this.txt")

# Check disk usage
execute_command(12345, "du -d2 -h")
```

Notes:
- Only works on stopped instances
- For running instances, use ssh_execute_command instead
- Limited to specific safe commands for security
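Because only ls, rm, and du are accepted, a client can reject anything else before making the call. The guard below is purely illustrative (the server performs its own validation):

```python
def is_allowed_command(command: str) -> bool:
    """Return True if the command's first word is one of the safe
    commands accepted by execute_command (ls, rm, du)."""
    allowed = {"ls", "rm", "du"}
    words = command.strip().split()
    return bool(words) and words[0] in allowed
```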
#### 19. ssh_execute_command(remote_host: str, remote_user: str, remote_port: int, command: str)

Execute a command on a remote host via SSH.

Parameters:
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as (e.g., 'root', 'ubuntu', 'ec2-user')
- `remote_port`: The SSH port number (typically 22, or a custom port like 34608)
- `command`: The command to execute on the remote host

Returns:
- Command output with exit status, stdout, and stderr

Example:

```python
# Execute a command on a running instance
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="nvidia-smi"
)
```

Notes:
- Works with any SSH-accessible server, not just Vast.ai instances
- Uses the SSH private key file configured in the SSH_KEY_FILE environment variable
- Automatically handles different SSH key types (RSA, Ed25519, ECDSA, DSS)
- Returns detailed output including the exit status and both stdout and stderr
#### 20. ssh_execute_background_command(remote_host: str, remote_user: str, remote_port: int, command: str, task_name: str = None)

Execute a long-running command in the background on a remote host via SSH, using nohup.

Parameters:
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as (e.g., 'root', 'ubuntu', 'ec2-user')
- `remote_port`: The SSH port number (typically 22, or a custom port like 34608)
- `command`: The command to execute in the background
- `task_name` (optional): Name for the task (for easier identification)

Returns:
- Task information including the task ID, process ID, and log file path

Example:

```python
# Start a long-running training job
ssh_execute_background_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="python train.py --epochs 100",
    task_name="training_job"
)
```

Notes:
- Returns a task_id and process_id for monitoring
- Creates log files on the remote server to capture output
- Use ssh_check_background_task to monitor progress
- Use ssh_kill_background_task to stop the task if needed
#### 21. ssh_check_background_task(remote_host: str, remote_user: str, remote_port: int, task_id: str, process_id: int, tail_lines: int = 50)

Check the status of a background SSH task and get its output.

Parameters:
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as
- `remote_port`: The SSH port number
- `task_id`: The task ID returned by ssh_execute_background_command
- `process_id`: The process ID returned by ssh_execute_background_command
- `tail_lines` (optional): Number of recent log lines to show (default: 50)

Returns:
- Status report with the process status, log output, and progress information

Example:

```python
# Check on a background task
ssh_check_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="training_job_a1b2c3d4",
    process_id=12345,
    tail_lines=100
)
```

Notes:
- Shows whether the task is still running or has completed
- Displays recent log output from the task
- Provides the total log line count as a progress indication
#### 22. ssh_kill_background_task(remote_host: str, remote_user: str, remote_port: int, task_id: str, process_id: int)

Kill a running background SSH task.

Parameters:
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as
- `remote_port`: The SSH port number
- `task_id`: The task ID returned by ssh_execute_background_command
- `process_id`: The process ID returned by ssh_execute_background_command

Returns:
- Status of the kill operation and cleanup results

Example:

```python
# Stop a background task
ssh_kill_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="training_job_a1b2c3d4",
    process_id=12345
)
```

Notes:
- Attempts graceful termination first, then force-kills if necessary
- Automatically cleans up temporary log and PID files
- Safe to call even if the process has already completed
#### 23. disable_sudo_password(remote_host: str, remote_user: str, remote_port: int)

Disable the sudo password requirement for the sudo group on a remote host via SSH. This function safely modifies the sudoers file to allow passwordless sudo access for users in the sudo group.

Parameters:
- `remote_host`: The hostname or IP address of the remote server
- `remote_user`: The username to connect as (e.g., 'root', 'ubuntu', 'ec2-user')
- `remote_port`: The SSH port number (typically 22, or a custom port like 34608)

Returns:
- Detailed status of the sudoers modification, including:
  - Previous and new sudo configuration
  - Validation results
  - Backup file location

Example:

```python
# Disable the sudo password on a running instance
disable_sudo_password(
    remote_host="ssh1.vast.ai",
    remote_user="root",
    remote_port=26378
)
```

Safety features:
- Creates an automatic backup of the sudoers file before modification
- Validates sudoers syntax with `visudo -c` after changes
- Automatically restores the backup if validation fails
- Shows the before/after configuration for verification

Notes:
- Modifies sudoers to: `%sudo ALL=(ALL) NOPASSWD: ALL`
- Requires the connecting user to have sudo privileges
- Backup files are timestamped for safety
- Works with any SSH-accessible Linux system
- Test the change with `sudo -l` after execution
#### 24. configure_mcp_rules(auto_attach_ssh: bool = None, auto_label: bool = None, wait_for_ready: bool = None, label_prefix: str = None)

Configure MCP automation rules that control automatic behaviors during instance creation.

Parameters:
- `auto_attach_ssh` (optional): Enable/disable automatic SSH key attachment for SSH/Jupyter instances
- `auto_label` (optional): Enable/disable automatic instance labeling
- `wait_for_ready` (optional): Enable/disable waiting for instance readiness after creation
- `label_prefix` (optional): Set the prefix for automatic instance labels

Returns:
- Current configuration status and any changes made

Example:

```python
# Configure MCP rules
configure_mcp_rules(
    auto_attach_ssh=True,
    auto_label=True,
    label_prefix="my-project",
    wait_for_ready=True
)

# View the current configuration
configure_mcp_rules()
```

Notes:
- These rules affect the behavior of create_instance and launch_instance_workflow
- Auto-attach SSH applies only when SSH or Jupyter is enabled
- Auto-labeling creates timestamped labels when no label is provided
- Wait-for-ready monitors the instance status until it becomes "running"
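As a sketch of what an auto-generated label might look like: a common scheme is `<prefix>-<timestamp>`. The exact format the server uses is an assumption here; the helper below only illustrates the idea.

```python
from datetime import datetime, timezone

def make_auto_label(prefix: str = "my-project") -> str:
    """Build a timestamped label like 'my-project-20240101-120000'.
    The '<prefix>-<UTC timestamp>' format is assumed, not the server's
    documented behavior."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{prefix}-{stamp}"
```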
### Configuration

#### API Key Setup

The server requires a Vast.ai API key. You can configure it in several ways:

1. Environment variable:

   ```bash
   export VAST_API_KEY="your_api_key_here"
   ```

2. API key file: Create `~/.vastai_api_key` containing your API key.

3. Hardcoded (for development only): The current server has a hardcoded API key for testing purposes.

#### Running the Server

```bash
# Run with default settings (localhost:8000)
python vast_mcp_server.py

# Run with a custom host and port
python vast_mcp_server.py --host 0.0.0.0 --port 9000
```
### Common Workflows

#### 1. Basic Instance Creation Workflow

```python
# 1. Check your account
show_user_info()

# 2. Search for available offers
search_offers("gpu_name=RTX_4090", limit=10)

# 3. Create an instance from an offer
create_instance(
    offer_id=12345,
    image="pytorch/pytorch:latest",
    disk=20.0,
    ssh=True,
    direct=True
)

# 4. Check the instance status
show_instances()
```
#### 2. Instance Management

```python
# View all instances
show_instances()

# Stop an instance
stop_instance(instance_id=67890)

# Start it again later
start_instance(instance_id=67890)

# Permanently destroy the instance when done
destroy_instance(instance_id=67890)
```
#### 3. Finding Storage

```python
# Search for storage volumes
search_volumes("disk_space>=100", limit=5)
```
#### 4. Advanced Instance Management

```python
# Launch an instance with specific GPU requirements (streamlined approach)
launch_instance_workflow(
    gpu_name="RTX_4090",
    num_gpus=2,
    image="pytorch/pytorch:latest",
    region="North_America",
    disk=40.0,
    ssh=True,
    direct=True,
    label="Training Job"
)

# Get detailed information about an instance
show_instance(instance_id=12345)

# Set a label for easier identification
label_instance(instance_id=12345, label="Production Model")

# Get instance logs
logs(instance_id=12345, tail="500", filter_text="error")

# Reboot the instance without losing GPU priority
reboot_instance(instance_id=12345)
```
#### 5. Instance Monitoring and Maintenance

```python
# Monitor instance logs with filtering
logs(instance_id=12345, filter_text="WARNING|ERROR", tail="100")

# Check instance details
show_instance(instance_id=12345)

# Recycle the instance to update to the latest image
recycle_instance(instance_id=12345)

# Prepay for discounted rates
prepay_instance(instance_id=12345, amount=50.0)
```
#### 6. SSH Access Management

```python
# Create an instance with SSH enabled
create_instance(
    offer_id=12345,
    image="ubuntu:22.04",
    ssh=True,
    direct=True,
    label="SSH Server"
)

# Attach your SSH key for access
attach_ssh(instance_id=67890)

# Get instance details, including SSH connection info
show_instance(instance_id=67890)

# Monitor the instance through its logs
logs(instance_id=67890, tail="50")
```
#### 7. Template Browsing

```python
# Browse available templates
search_templates()
```
#### 8. Instance Command Execution

```python
# For stopped instances, use the constrained execute_command
stop_instance(instance_id=12345)

# Execute safe commands on the stopped instance
execute_command(instance_id=12345, command="ls -la /workspace")
execute_command(instance_id=12345, command="du -sh /workspace")
execute_command(instance_id=12345, command="rm -rf /tmp/old_files")

# For running instances, use SSH commands
start_instance(instance_id=12345)

# Get the instance connection details
instance_details = show_instance(instance_id=12345)
# Extract the SSH host and port from the output

# Execute commands via SSH on the running instance
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="nvidia-smi"
)

# Check system resources
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="df -h && free -h && ps aux"
)
```
#### 9. Background Task Management

```python
# Start a long-running training job in the background
task_info = ssh_execute_background_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="python train.py --epochs 100 --batch-size 32",
    task_name="pytorch_training"
)

# Extract the task_id and process_id from the task_info output.
# Format: "Task ID: pytorch_training_a1b2c3d4" and "Process ID: 12345"

# Monitor progress periodically
ssh_check_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="pytorch_training_a1b2c3d4",
    process_id=12345,
    tail_lines=100
)

# Stop the task if needed
ssh_kill_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="pytorch_training_a1b2c3d4",
    process_id=12345
)
```
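The "extract the task_id and process_id" step can be automated with a small regex helper. This is our own sketch; it assumes the `Task ID: …` / `Process ID: …` line format shown in the comments above:

```python
import re

def parse_task_info(task_info: str) -> tuple:
    """Pull (task_id, process_id) out of the text returned by
    ssh_execute_background_command, assuming lines of the form
    'Task ID: <id>' and 'Process ID: <pid>'."""
    task_match = re.search(r"Task ID:\s*(\S+)", task_info)
    pid_match = re.search(r"Process ID:\s*(\d+)", task_info)
    if not task_match or not pid_match:
        raise ValueError("could not find task or process ID in output")
    return task_match.group(1), int(pid_match.group(1))
```

The returned pair can be passed straight to ssh_check_background_task or ssh_kill_background_task.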
#### 10. Complete ML Training Workflow

```python
# 1. Find and create a GPU instance
search_offers("gpu_name=RTX_4090", limit=5)
create_instance(
    offer_id=12345,
    image="pytorch/pytorch:latest",
    disk=50.0,
    ssh=True,
    direct=True,
    env={},
    label="ML Training"
)

# 2. Get connection details
instance_details = show_instance(instance_id=67890)

# 3. Set up the environment
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="pip install wandb tensorboard"
)

# 4. Upload your training code (assumed already done)

# 5. Start training in the background
training_task = ssh_execute_background_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="cd /workspace && python train.py --config config.yaml",
    task_name="main_training"
)

# 6. Monitor training progress
ssh_check_background_task(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    task_id="main_training_a1b2c3d4",
    process_id=12345,
    tail_lines=50
)

# 7. Check GPU utilization
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="nvidia-smi"
)

# 8. When training is complete, save the results
ssh_execute_command(
    remote_host="116.43.148.85",
    remote_user="root",
    remote_port=26378,
    command="tar -czf model_results.tar.gz /workspace/outputs"
)

# 9. Clean up
destroy_instance(instance_id=67890)
```
### Query Syntax

When searching for offers or volumes, you can use these operators:

- `=` or `==` - Equal to
- `!=` - Not equal to
- `>` - Greater than
- `>=` - Greater than or equal to
- `<` - Less than
- `<=` - Less than or equal to

Example queries:
- `"gpu_name=RTX_4090 num_gpus>=2"` - RTX 4090 with 2 or more GPUs
- `"cpu_ram>64 reliability2>=99"` - High RAM and reliability
- `"dph_total<=1.0"` - Cost under $1/hour
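To make the grammar above concrete: a query is a space-separated list of `field<op>value` clauses using exactly the operators listed. The parser below is an illustrative sketch of that grammar (not the server's implementation):

```python
import re

# Longer operators must come first in the alternation so ">=" is not
# split into ">" followed by "=".
CLAUSE = re.compile(r"^(\w+)(==|!=|>=|<=|=|>|<)(.+)$")

def parse_query(query: str) -> list:
    """Split a query string into (field, operator, value) triples."""
    clauses = []
    for token in query.split():
        m = CLAUSE.match(token)
        if not m:
            raise ValueError(f"bad clause: {token!r}")
        clauses.append((m.group(1), m.group(2), m.group(3)))
    return clauses
```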
### Error Handling

All methods include error handling and will return descriptive error messages if:

- The API key is missing or invalid
- Network connectivity issues occur
- Invalid parameters are provided
- The Vast.ai API returns errors

Check the server logs for detailed error information during development.