Vision MCP Server

MCP server for image analysis: gives LLMs vision via an external vision API (Qwen, GPT-4o, etc.)


🖼️ Vision MCP

Lets models without native vision (e.g. DeepSeek) "see" images by calling an external vision API that returns a text description of each image.


English Documentation

MCP Server (Recommended)

Tools

Tool | Description | Annotation
vision_analyze | Analyze image(s): local path or URL, 1-5 images, custom prompt | Read-only
vision_get_config | View current API configuration (key masked) | Read-only
vision_update_config | Update API key / model / base URL | -
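
vision_get_config shows the API key in masked form. The exact masking rule is not documented here; the sketch below assumes a common scheme (keep the first and last four characters, hide the rest) and uses a hypothetical maskKey helper that is not part of this package.

```typescript
// Hypothetical helper: masks an API key for display.
// Assumption: keep the first 4 and last 4 characters, replace the rest with '*'.
function maskKey(key: string | undefined): string {
  if (!key) return "(not set)";
  // Short keys are hidden entirely so no meaningful part leaks.
  if (key.length <= 8) return "*".repeat(key.length);
  return `${key.slice(0, 4)}${"*".repeat(key.length - 8)}${key.slice(-4)}`;
}

console.log(maskKey("sk-1234567890abcdef"));
```

Showing only the ends lets you tell which key is configured without exposing it in chat logs.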

Quick Start

# Local development
npm install
npm run build

# Or run via npx directly (no install needed)
npx -y @guorui99/vision-mcp

Add to Claude Desktop / claude.json:

{
  "mcpServers": {
    "vision": {
      "command": "npx",
      "args": ["-y", "@guorui99/vision-mcp"]
    }
  }
}

First-Time Setup

Set your API key via the MCP tool:

vision_update_config api_key="your-api-key"
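
Your MCP client turns the tool invocation above into a JSON-RPC 2.0 "tools/call" message, as defined by the Model Context Protocol. The sketch below shows that message shape; buildToolCall is a hypothetical helper for illustration (clients build this for you), and api_key is the only argument name taken from this README.

```typescript
// Shape of an MCP JSON-RPC 2.0 "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: build the request a client sends to invoke a tool.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Equivalent of: vision_update_config api_key="your-api-key"
const req = buildToolCall(1, "vision_update_config", { api_key: "your-api-key" });
console.log(JSON.stringify(req));
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"vision_update_config","arguments":{"api_key":"your-api-key"}}}
```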

Or use environment variables (no disk writes):

VISION_API_KEY=your-key npx -y @guorui99/vision-mcp

Environment variables:

Variable | Description | Default
VISION_API_KEY | API key for the vision service | -
VISION_BASE_URL | API base URL | https://dashscope.aliyuncs.com/compatible-mode/v1
VISION_MODEL | Model name | qwen3.5-omni-plus
VISION_LANGUAGE | Response language (zh or en) | zh
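
A minimal sketch of how these variables could be combined with the saved config and the defaults. Two things here are assumptions, not documented behavior: the precedence (environment variable, then config.json, then default) and the camelCase field names of the hypothetical VisionConfig shape.

```typescript
interface VisionConfig {
  apiKey?: string;
  baseUrl: string;
  model: string;
  language: "zh" | "en";
}

// Defaults taken from the environment-variable table.
const DEFAULTS: VisionConfig = {
  baseUrl: "https://dashscope.aliyuncs.com/compatible-mode/v1",
  model: "qwen3.5-omni-plus",
  language: "zh",
};

// Assumed precedence: environment variable > config file value > default.
function resolveConfig(
  env: Record<string, string | undefined>,
  file: Partial<VisionConfig> = {},
): VisionConfig {
  const lang = env.VISION_LANGUAGE ?? file.language ?? DEFAULTS.language;
  return {
    apiKey: env.VISION_API_KEY ?? file.apiKey,
    baseUrl: env.VISION_BASE_URL ?? file.baseUrl ?? DEFAULTS.baseUrl,
    model: env.VISION_MODEL ?? file.model ?? DEFAULTS.model,
    // Only "zh" and "en" are documented; anything else falls back to the default.
    language: lang === "en" || lang === "zh" ? lang : DEFAULTS.language,
  };
}

console.log(resolveConfig({ VISION_MODEL: "qwen-vl-max" }));
```

In a real process you would pass process.env as the first argument; passing a plain object keeps the function easy to test.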

Supported Vision Services

Service | Model | Notes
Alibaba Cloud Bailian (recommended) | qwen3.5-omni-plus | 1M free tokens for new users
Alibaba Cloud Bailian | qwen-vl-max | Same as above
OpenAI | gpt-4o-mini | Requires an international payment method
Others | Any OpenAI-compatible model | Change base_url + model name
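
Switching services means changing only base_url and model. The PRESETS map below is hypothetical (the package does not ship it), and the OpenAI base URL is OpenAI's public endpoint. It also assumes the server appends the standard /chat/completions path to base_url, which is how OpenAI-compatible APIs are usually addressed.

```typescript
// Hypothetical presets pairing a base URL with a model from the table.
const PRESETS = {
  bailian: {
    baseUrl: "https://dashscope.aliyuncs.com/compatible-mode/v1",
    model: "qwen3.5-omni-plus",
  },
  openai: { baseUrl: "https://api.openai.com/v1", model: "gpt-4o-mini" },
} as const;

// OpenAI-compatible APIs expose chat completions at <base_url>/chat/completions.
// Trailing slashes are stripped so "…/v1" and "…/v1/" give the same URL.
function chatCompletionsUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
}

console.log(chatCompletionsUrl(PRESETS.openai.baseUrl));
// https://api.openai.com/v1/chat/completions
```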

Supported Formats

jpg, jpeg, png, gif, webp, bmp (max 5 images per request, max 20MB per file)
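
The limits above can be checked before any API call. This is a hypothetical pre-flight check, not the server's own code; it works on file names and sizes only, and it assumes "20MB" means 20 MiB (20 × 1024 × 1024 bytes).

```typescript
// Limits stated in "Supported Formats".
const ALLOWED_EXTENSIONS = new Set(["jpg", "jpeg", "png", "gif", "webp", "bmp"]);
const MAX_IMAGES = 5;
const MAX_BYTES = 20 * 1024 * 1024; // assumption: 20MB = 20 MiB

interface ImageInfo {
  name: string; // file name or URL path
  sizeBytes: number;
}

// Hypothetical validator: returns a list of problems (empty list = OK).
function validateImages(images: ImageInfo[]): string[] {
  const errors: string[] = [];
  if (images.length < 1 || images.length > MAX_IMAGES) {
    errors.push(`expected 1-${MAX_IMAGES} images, got ${images.length}`);
  }
  for (const img of images) {
    // Extension check is case-insensitive, so "photo.JPG" is accepted.
    const ext = img.name.split(".").pop()?.toLowerCase() ?? "";
    if (!ALLOWED_EXTENSIONS.has(ext)) errors.push(`${img.name}: unsupported format`);
    if (img.sizeBytes > MAX_BYTES) errors.push(`${img.name}: larger than 20MB`);
  }
  return errors;
}
```

Collecting all problems instead of stopping at the first lets a caller report every bad file in one message.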

How It Works

  1. Read image → base64 encode
  2. Call OpenAI-compatible vision API
  3. Return text description
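
The three steps can be sketched against the standard OpenAI chat completions format, where images are sent as "image_url" content parts carrying a base64 data URL. This is an illustration, not the code in src/: the function names are hypothetical, and passing remote http(s) URLs through unchanged (rather than downloading and encoding them) is an assumption. It relies on the global fetch available in Node.js 18+.

```typescript
import { readFile } from "node:fs/promises";
import { extname } from "node:path";

// MIME types for the supported formats.
const MIME: Record<string, string> = {
  ".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png",
  ".gif": "image/gif", ".webp": "image/webp", ".bmp": "image/bmp",
};

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Step 1: read a local image and base64-encode it as a data URL.
// Assumption: remote http(s) URLs are passed through unchanged.
async function toImageUrl(source: string): Promise<string> {
  if (/^https?:\/\//i.test(source)) return source;
  const mime = MIME[extname(source).toLowerCase()] ?? "application/octet-stream";
  const data = await readFile(source);
  return `data:${mime};base64,${data.toString("base64")}`;
}

// Step 2a: build an OpenAI-compatible request body (prompt first, then images).
function buildBody(model: string, prompt: string, imageUrls: string[]) {
  const content: ContentPart[] = [
    { type: "text", text: prompt },
    ...imageUrls.map((url): ContentPart => ({ type: "image_url", image_url: { url } })),
  ];
  return { model, messages: [{ role: "user", content }] };
}

// Steps 2b-3: call the API and return the model's text description.
async function analyze(
  baseUrl: string, apiKey: string, model: string, prompt: string, sources: string[],
): Promise<string> {
  const urls = await Promise.all(sources.map(toImageUrl));
  const res = await fetch(`${baseUrl.replace(/\/+$/, "")}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(buildBody(model, prompt, urls)),
  });
  if (!res.ok) throw new Error(`vision API error ${res.status}: ${await res.text()}`);
  const json = (await res.json()) as { choices: { message: { content: string } }[] };
  return json.choices[0].message.content;
}
```

Usage would look like analyze(baseUrl, apiKey, "qwen3.5-omni-plus", "Describe this image", ["photo.jpg"]). Sending every image in a single user message lets the model compare them in one answer.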

CLI Mode (legacy)

# Single image
node scripts/vision.cjs photo.jpg "Describe this image"

# Remote image
node scripts/vision.cjs --url https://example.com/img.png "What is this?"

# Multiple images
node scripts/vision.cjs img1.jpg img2.jpg "Compare these two"

# Setup / Config
node scripts/vision.cjs --setup
node scripts/vision.cjs --config
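
The examples above imply a simple argument convention: image paths come first, "--url" marks a remote image, and the last plain argument is the prompt. The parser below is a hypothetical reconstruction of that convention from the examples only; scripts/vision.cjs may parse its arguments differently.

```typescript
// Hypothetical parser inferred from the CLI examples; not the actual vision.cjs logic.
interface CliArgs {
  mode: "analyze" | "setup" | "config";
  images: string[]; // local paths and remote URLs, in order
  prompt?: string;
}

function parseCliArgs(argv: string[]): CliArgs {
  if (argv.includes("--setup")) return { mode: "setup", images: [] };
  if (argv.includes("--config")) return { mode: "config", images: [] };
  const items: { value: string; fromUrl: boolean }[] = [];
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--url" && i + 1 < argv.length) {
      items.push({ value: argv[++i], fromUrl: true }); // token after --url is a remote image
    } else {
      items.push({ value: argv[i], fromUrl: false });
    }
  }
  // Assumption: a trailing non-URL argument is the prompt when there is more than one item.
  const last = items[items.length - 1];
  const prompt = items.length > 1 && last && !last.fromUrl ? items.pop()!.value : undefined;
  return { mode: "analyze", images: items.map((x) => x.value), prompt };
}

console.log(parseCliArgs(["img1.jpg", "img2.jpg", "Compare these two"]));
```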

Project Structure

vision-mcp/
├── src/                 # MCP Server source (TypeScript)
├── dist/                # Compiled output
├── scripts/vision.cjs   # CLI tool (legacy)
├── config.json          # Shared config file
├── package.json
└── tsconfig.json

Requirements

  • Node.js >= 18
  • API key for a vision service (Alibaba Cloud Bailian / OpenAI / etc.)