VisionSqueezer

LLM-native image optimization MCP server

LLM-native image optimization middleware & MCP server. Reduces vision model token consumption by preprocessing images into tile-boundary-aligned, padding-free formats.

Works with any agent or editor that speaks MCP — Claude, GPT, Gemini, Codex, or your own.


Install

Claude Code (one-liner)

claude mcp add vision-squeezer -- npx -y vision-squeezer

Claude Desktop

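Add to claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json, Windows: %APPDATA%\Claude\claude_desktop_config.json, Linux: ~/.config/claude/claude_desktop_config.json):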

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Cursor

cursor --add-mcp '{"name":"vision-squeezer","type":"stdio","command":"npx","args":["-y","vision-squeezer"]}'

Or add to .cursor/mcp.json:

{
  "mcpServers": {
    "vision-squeezer": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Other IDEs and agents (VS Code, JetBrains, Windsurf, Zed, etc.)

VS Code Copilot

Add to .vscode/mcp.json:

{
  "servers": {
    "vision-squeezer": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

JetBrains (IntelliJ, WebStorm, PyCharm)

Open Tools → GitHub Copilot → Model Context Protocol (MCP) → Configure, then add:

{
  "servers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Gemini CLI

Add to ~/.gemini/settings.json:

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Codex CLI

codex mcp add vision-squeezer -- npx -y vision-squeezer

Or add to ~/.codex/config.toml:

[mcp_servers.vision-squeezer]
command = "npx"
args = ["-y", "vision-squeezer"]

Qwen Code

qwen mcp add vision-squeezer -- npx -y vision-squeezer

Zed

Add to ~/.config/zed/settings.json:

{
  "context_servers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Kiro

Add to .kiro/settings/mcp.json (workspace) or ~/.kiro/settings/mcp.json (global):

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Antigravity

MCP-only, no hooks needed. Configure via the Antigravity MCP settings:

{
  "mcpServers": {
    "vision-squeezer": {
      "command": "npx",
      "args": ["-y", "vision-squeezer"]
    }
  }
}

Manual install (Rust binary)

# From crates.io
cargo install vision-squeezer

# Or from source
git clone https://github.com/eralpozcan/vision-squeezer && cd vision-squeezer
make install   # builds → ~/.local/bin/

Then use the binary path directly in any config above instead of npx:

{ "command": "vision-squeezer-mcp" }

Tip: Run npx -y vision-squeezer --setup to print ready configs with auto-detected paths.


CLI Usage

vision-squeezer path/to/image.jpg [options]

  --mode auto|ocr|standard            default: auto (detects text/grayscale)
  --format jpeg|webp|avif             default: jpeg
  --quality 85                        output quality 1-100 (default: 75)
  --tile-size 256                     patch size in px (default: 512)
  --no-crop                           disable padding removal
  --smart-crop                        edge-energy crop (vs corner-tolerance)
  --auto-quality 0.95                 binary-search quality to hit SSIM target
  --bg-tolerance 25                   background detection 0-255 (default: 15)
  --model claude|gpt4o|gpt5|gemini    model-aware resizing
  --max-tiles 20                      hard cap on tile count
  --json                              machine-readable JSON output
  --dry-run                           run pipeline, skip disk write
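
The --auto-quality option is described as a binary search over quality toward an SSIM target. The Python sketch below shows that search pattern only. The function name find_min_quality and the stand-in scorer are hypothetical, and the sketch assumes the score never decreases as quality rises; it is not the tool's actual implementation.

```python
from typing import Callable


def find_min_quality(score: Callable[[int], float], target: float,
                     lo: int = 1, hi: int = 100) -> int:
    """Return the lowest quality in [lo, hi] whose score reaches `target`.

    Assumes score(quality) never decreases as quality rises. If even `hi`
    misses the target, `hi` is returned as the best available setting.
    """
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if score(mid) >= target:
            best = mid        # good enough: try a lower quality
            hi = mid - 1
        else:
            lo = mid + 1      # too lossy: raise the quality
    return best


def fake_ssim(quality: int) -> float:
    """Stand-in scorer for demonstration only. A real scorer would encode
    the image at `quality` and measure SSIM against the original."""
    return 0.96 if quality >= 62 else 0.90


print(find_min_quality(fake_ssim, 0.95))
```

Each probe of a real scorer costs one encode, so the search needs at most seven encodes for the 1-100 range instead of one hundred.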

Batch mode

Pass a directory instead of a file:

vision-squeezer ./screenshots --recursive --output-dir ./optimized
vision-squeezer ./screenshots --recursive --json > report.json

The Math: How Vision Models Bill You in 2026

If you send raw images to an LLM, you are leaking tokens. Modern vision models do not bill by file size (MB/KB); they bill by pixel dimensions, and each provider calculates the cost differently. vision-squeezer simulates these algorithms to find the smallest resize that lowers your token count without losing visual context.

1. Claude (Area-Based)

As of 2026 (Claude 3.5 / 4.5+), Anthropic uses an area-based formula: Tokens ≈ (Width × Height) / 750. Every single pixel of solid background or padding costs you tokens.

  • The Fix: vision-squeezer aggressively crops padding (removing solid-color borders). A 1025×1025 screenshot shrinks just enough to drop from 1,400 tokens to 1,024 tokens (26.8% savings).
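
As a quick check of the area formula, here is a minimal Python sketch. The function name claude_tokens is illustrative and not part of vision-squeezer, and rounding down is an assumption inferred from the sample outputs in this README.

```python
def claude_tokens(width: int, height: int) -> int:
    """Estimate tokens with the area rule quoted above: (W x H) / 750, rounded down."""
    return (width * height) // 750


# Dimensions taken from the istanbul.jpg case study in this README.
print(claude_tokens(2400, 1670))  # original image
print(claude_tokens(2048, 1536))  # after the default optimization
```

Because the cost is linear in pixel area, every cropped row or column of padding lowers the estimate directly.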

2. GPT-4.5 / GPT-4o (Tiling & Short-side Scaling)

OpenAI scales your image to fit inside a 2048px box, then rescales it so the shortest side is 768px. Finally, it chops the image into a grid of 512×512 tiles. Each tile costs 170 tokens, on top of a flat 85-token base.

  • The Fix: If a side of the scaled image lands just past a multiple of 512 (for example 1025px instead of 1024px), OpenAI bills an entire extra row or column of 512×512 tiles. vision-squeezer simulates this math and snaps the image down by a few pixels so it fits into the minimum number of tiles.
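
The tiling steps can be sketched in Python as follows. The function name gpt4o_tokens is illustrative. The 85-token base comes from the Supported Models table in this README, and two details are assumptions: images are never upscaled, and scaled sides are rounded to whole pixels.

```python
import math


def gpt4o_tokens(width: int, height: int) -> int:
    """Estimate tokens for the fit-2048, short-side-768, 512px-tile scheme."""
    # Step 1: shrink to fit inside a 2048 x 2048 box (assumed: never upscale).
    long_side = max(width, height)
    if long_side > 2048:
        width = round(width * 2048 / long_side)
        height = round(height * 2048 / long_side)
    # Step 2: shrink so the shortest side is 768 px (assumed: downscale only).
    short_side = min(width, height)
    if short_side > 768:
        width = round(width * 768 / short_side)
        height = round(height * 768 / short_side)
    # Step 3: count 512 x 512 tiles; a partially covered tile is billed in full.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles


print(gpt4o_tokens(1025, 768))  # one pixel past the 1024 boundary: 3 x 2 tiles
print(gpt4o_tokens(1024, 768))  # exactly on the boundary: 2 x 2 tiles
```

The two calls differ by a single pixel of width but by two tiles of cost, which is the boundary effect the snapping step targets.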

3. Gemini 2.0 / 3.0 (Massive Tiles)

Gemini uses a massive 768×768 tile system (if the image is > 384px). Each tile is a flat 258 tokens.

  • The Fix: An 800×600 image will trigger a 2×1 tile grid (1,032 tokens). vision-squeezer snaps it down slightly to fit exactly inside a 768×768 box, dropping the cost to 258 tokens (75% savings).

Pipeline

Input image
  → crop_padding               remove solid-color borders
  → calculate_optimal_dims     snap to tile boundary (always down)
  → [enforce_max_tiles]        optional tile budget cap
  → resize_exact               Lanczos3
  → [binarize]                 OCR mode only: Otsu threshold
  → JPEG/WebP encode           configurable quality & format
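
The "snap to tile boundary (always down)" step can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the code behind calculate_optimal_dims: the names snap_down and tile_count are hypothetical, and leaving sides shorter than one tile untouched is an assumption.

```python
import math


def snap_down(width: int, height: int, tile: int = 512) -> tuple:
    """Round each side down to a multiple of `tile`.

    Sides already shorter than one tile are left unchanged (assumption),
    so the function never enlarges an image.
    """
    def snap(side: int) -> int:
        return side if side < tile else side // tile * tile

    return snap(width), snap(height)


def tile_count(width: int, height: int, tile: int = 512) -> int:
    """Number of tiles needed to cover the image; partial tiles count in full."""
    return math.ceil(width / tile) * math.ceil(height / tile)


print(snap_down(2400, 1670))                          # istanbul.jpg input size
print(tile_count(2400, 1670), tile_count(2048, 1536))
```

With the default 512px tile, the 2400×1670 input snaps to 2048×1536, the same output size as the agnostic run in Case Study 1.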

🦀 Why Rust?

VisionSqueezer is a performance-critical middleware. We chose Rust for three uncompromising reasons:

  • Invisible Latency: Image processing should never be the bottleneck. Rust ensures that snapping, cropping, and encoding happen in milliseconds, making the optimization layer truly invisible to the developer's workflow.
  • Minimal Footprint: As an MCP server running in the background of your IDE, VisionSqueezer is designed to be ultra-lightweight, consuming near-zero CPU and RAM when idle.
  • Wasm-Ready: Rust's first-class support for WebAssembly allows us to bring the same high-performance optimization to the browser and the Edge (Cloudflare Workers), enabling client-side squeezing before the image even hits the network.

Case Study 1: Standard Image (istanbul.jpg)

To demonstrate the impact on standard images, here is the run on a 2400×1670 image (4 MP, 0.5 MB) across the three scenarios:

Example 1: Agnostic Optimization (Default)

When no target model is specified, Squeezer reduces the file size and mathematically optimizes boundaries to be generally efficient across all models.

vision-squeezer data/istanbul.jpg
Input:  2400×1670  (0.5 MB)
Output: 2048×1536  (0.3 MB, JPG q75)
File:   28.6% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude           5344     4194     1150 (21.5%)
GPT-4o           1105      765      340 (30.8%)
GPT-5            1536     1536        0 (0.0%)
Gemini           3096     1548     1548 (50.0%)
────────────────────────────────────────────────────────────

Example 2: Model-Targeted Optimization (GPT-4o)

If you tell Squeezer the target model, it reverses the model's exact internal calculation (e.g. GPT-4.5's 768px short-side scaling algorithm) and mathematically shrinks the image just enough to fit the absolute minimum tile grid.

vision-squeezer data/istanbul.jpg --model gpt4o
Input:  2400×1670  (0.5 MB)
Output: 2399×1200  (0.3 MB, JPG q75)
File:   33.6% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude           5344     3838     1506 (28.2%)
GPT-4o           1105     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           3096     2064     1032 (33.3%)
────────────────────────────────────────────────────────────

Notice how targeting gpt4o fits the image into an exact 6-tile boundary (2399×1200), calculated backwards from OpenAI's short-side scaling algorithm. It maximizes resolution up to the point where an extra tile would be billed.

Example 3: Model-Targeted Optimization (Claude)

Since Claude uses an area-based calculation (W × H / 750), Squeezer primarily focuses on aggressively cropping solid-color borders and padding to shrink the pixel area without drastically downscaling the core visual detail.

vision-squeezer data/istanbul.jpg --model claude
Input:  2400×1670  (0.5 MB)
Output: 2304×1536  (0.4 MB, JPG q75)
File:   21.3% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude           5344     4718      626 (11.7%)
GPT-4o           1105     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           3096     1548     1548 (50.0%)
────────────────────────────────────────────────────────────

Claude benefits tremendously from even minor dimension reductions. By snapping the width and height slightly downwards, we immediately shaved off over 600 tokens while preserving the massive 2304×1536 resolution.

Case Study 2: 12-Megapixel High-Res Image (istanbul2.jpg)

To demonstrate the impact on massive images, here is the run on a 4096×3072 image (12 MP, 2.2 MB) across the three scenarios:

Example 1: Agnostic Optimization

vision-squeezer data/istanbul2.jpg
Input:  4096×3072  (2.2 MB)
Output: 3584×2560  (1.3 MB, JPG q75)
File:   39.6% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude          16777    12233     4544 (27.1%)
GPT-4o            765     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           6192     5160     1032 (16.7%)
────────────────────────────────────────────────────────────

(Notice the OpenAI aspect-ratio anomaly: the GPT-4o estimate rises from 765 to 1,105 tokens, which the Saved column reports as 0. Squeezer removed heavy letterboxing (padding) from this image, which made it wider. Because OpenAI's API scales the short side to 768px, the wider aspect ratio pushed the long side into a third tile column. This is an edge case where cropping padding increases your GPT-4o token cost. If you pass --model gpt4o for this image, Squeezer detects the problem and uses a different grid constraint.)


Example 2: Model-Targeted Optimization (GPT-4o)

vision-squeezer data/istanbul2.jpg --model gpt4o
Input:  4096×3072  (2.2 MB)
Output: 4095×2048  (1.2 MB, JPG q75)
File:   43.2% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude          16777    11182     5595 (33.3%)
GPT-4o            765     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           6192     4644     1548 (25.0%)
────────────────────────────────────────────────────────────

(By explicitly targeting gpt4o, Squeezer optimizes the boundaries such that the new aspect ratio is safely contained. While GPT-4o still bills for the 6-tile layout due to the image's inherent width, Squeezer shrinks the file footprint by 43% without sacrificing high-resolution details.)

Example 3: Model-Targeted Optimization (Claude)

vision-squeezer data/istanbul2.jpg --model claude
Input:  4096×3072  (2.2 MB)
Output: 3840×2816  (1.5 MB, JPG q75)
File:   31.5% smaller

── Token Estimates ─────────────────────────────────────────
Model          Before    After      Saved
------------------------------------------
Claude          16777    14417     2360 (14.1%)
GPT-4o            765     1105        0 (0.0%)
GPT-5            1536     1536        0 (0.0%)
Gemini           6192     5160     1032 (16.7%)
────────────────────────────────────────────────────────────

(Claude's area-based formula again allows massive token savings simply by trimming to the 3840×2816 boundary, preventing you from paying for over 2,300 tokens of pure padding while retaining 10+ megapixels of fidelity).

💡 FAQ: Why did targeting gpt4o save 33% of Claude tokens, while targeting claude saved only 14%?

Because of the quality vs. aggression trade-off. OpenAI enforces a strict maximum internal resolution (2048px). When you target gpt4o, Squeezer must aggressively shrink the 4096px image to fit OpenAI's constraints (4095×2048). That large loss in total pixel area translates directly into a large token drop for Claude.

Claude has no such maximum. When you explicitly target claude, Squeezer does not need to sacrifice resolution. It keeps the 3840×2816 size to preserve fine detail and trims only the minimum padding, giving you the most cost-efficient full-resolution version.


Benchmark / Savings

Real-world token consumption before and after vision-squeezer (using standard photos and screenshots without --max-tiles). Calculations use updated 2026 billing formulas.

Original Size                 Model          Tokens Before   Tokens After   Saved
---------------------------------------------------------------------------------
1025 × 1025 (Screenshot)      Claude 4.5+            1,400          1,024   26.8%
                              GPT-4.5                  425            255   40.0%
                              Gemini 2.0+            1,032            258   75.0%
4032 × 3024 (Phone Camera)    Claude 4.5+           16,257         12,232   24.8%
                              GPT-4.5                2,125          1,745   17.9%
                              Gemini 2.0+            6,192          4,128   33.3%
800 × 600 (Web Image)         Claude 4.5+              640            341   46.7%
                              GPT-4.5                  255            255    0.0%
                              Gemini 2.0+            1,032            258   75.0%

(Note: GPT-5's high limits mean it rarely requires tiling optimization unless the image exceeds 6000px, but vision-squeezer will still crop padding and compress the file size dramatically).


MCP Tool: optimize_image

Argument        Type                                       Required   Default
-----------------------------------------------------------------------------
image_base64    string                                     yes
mode            "auto" | "standard" | "ocr"                no         "auto"
output_format   "jpeg" | "webp"                            no         "jpeg"
quality         integer 1–100                              no         75
tile_size       integer                                    no         512
crop            boolean                                    no         true
bg_tolerance    integer 0–255                              no         15
max_tiles       integer                                    no
target_model    "claude" | "gpt4o" | "gpt5" | "gemini"     no

Response:

{
  "optimized_base64": "...",
  "savings_report": {
    "tiles_before": 48,
    "tiles_after": 35,
    "tiles_saved": 13,
    "token_reduction_pct": "27.1",
    "size_reduction_pct": "58.9"
  }
}
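
For clients that talk to the server directly, the sketch below builds a JSON-RPC tools/call request (the standard MCP method for invoking a tool) with the arguments from the table above, and decodes a result shaped like the response shown. The helper names are hypothetical, and how the server wraps this payload inside the MCP result envelope is not shown in this README, so decode_optimized assumes you have already extracted the JSON object.

```python
import base64
import json


def build_optimize_request(image_bytes: bytes, request_id: int = 1, **options) -> str:
    """Serialize a JSON-RPC `tools/call` request for the optimize_image tool."""
    arguments = {"image_base64": base64.b64encode(image_bytes).decode("ascii")}
    arguments.update(options)  # e.g. target_model="claude", quality=80
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "optimize_image", "arguments": arguments},
    })


def decode_optimized(tool_result: dict) -> bytes:
    """Decode the image bytes from a result shaped like the response above."""
    return base64.b64decode(tool_result["optimized_base64"])


request = build_optimize_request(b"abc", target_model="claude")
print(request)
```

In practice an MCP client library handles the transport; the point here is only the shape of the arguments object.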

Config Reference

Parameter       Type         Default   Description
--------------------------------------------------------------------------------------------
quality         u8 1–100     75        JPEG/WebP output quality
tile_size       u32          512       Model patch size (512 = Claude/GPT, 256 = Gemini)
crop            bool         true      Remove solid-color padding borders
bg_tolerance    u8 0–255     15        Max channel delta for background detection
output_format   jpeg/webp    jpeg      Output encoding. WebP is ~30-50% smaller
max_tiles       u32                    Hard cap on tile count (progressive downscale)
target_model    string                 Model-aware: claude, gpt4o, gpt5, gemini
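
To make bg_tolerance concrete, the following Python sketch finds the bounding box of non-background pixels using a per-channel delta. It is an illustration only: the function names are hypothetical, the image is a plain nested list of RGB tuples, and taking the top-left pixel as the background color is an assumption about how corner-based detection might work.

```python
def is_background(pixel, reference, tolerance=15):
    """True when every channel is within `tolerance` of the reference color."""
    return all(abs(a - b) <= tolerance for a, b in zip(pixel, reference))


def content_box(rows, tolerance=15):
    """Return (left, top, right, bottom) of non-background pixels.

    `rows` is a list of rows, each a list of (r, g, b) tuples. Right and
    bottom are exclusive. The top-left pixel is taken as the background
    color (assumption). Returns None when the whole image is background.
    """
    reference = rows[0][0]
    xs, ys = [], []
    for y, row in enumerate(rows):
        for x, pixel in enumerate(row):
            if not is_background(pixel, reference, tolerance):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# 5 x 4 white canvas with two dark pixels and one near-white pixel.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
canvas = [[WHITE] * 5 for _ in range(4)]
canvas[1][2] = BLACK
canvas[2][3] = BLACK
canvas[0][4] = (245, 250, 255)   # within tolerance, still background
print(content_box(canvas))
```

A higher tolerance treats noisier borders (for example JPEG artifacts around a screenshot) as padding, at the risk of cropping faint content.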

Supported Models

Model                 Tile Size   Pre-scaling                        Token Formula
----------------------------------------------------------------------------------------------------
Claude 3.5/4.5/4.7    N/A         None                               Tokens ≈ (W × H) / 750
GPT-4o / GPT-4.5      512×512     fit 2048px → scale short 768px     85 + tiles × 170
GPT-5/5.5             512×512     fit 6000px / 10.24M px             min(85 + tiles × 170, 1536)
Gemini 2.0/3.0        768×768     > 384×384 → fit 4096px             258 per tile (flat 258 if small)
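
The GPT-5 and Gemini rows can be read as the Python sketch below. The function names are illustrative, and several points are assumptions: "small" is taken to mean both sides at most 384px, scaled sides are rounded to whole pixels, and the 10.24M-pixel limit in the GPT-5 row is not modeled.

```python
import math


def fit(width: int, height: int, max_side: int) -> tuple:
    """Shrink proportionally so the longer side is at most `max_side`."""
    long_side = max(width, height)
    if long_side <= max_side:
        return width, height
    return round(width * max_side / long_side), round(height * max_side / long_side)


def gemini_tokens(width: int, height: int) -> int:
    """258 tokens per 768 x 768 tile, or a flat 258 for small images."""
    if width <= 384 and height <= 384:
        return 258
    width, height = fit(width, height, 4096)
    return 258 * math.ceil(width / 768) * math.ceil(height / 768)


def gpt5_tokens(width: int, height: int) -> int:
    """85 + 170 per 512 x 512 tile, capped at 1536 tokens."""
    width, height = fit(width, height, 6000)
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return min(85 + 170 * tiles, 1536)


# Dimensions from Case Study 1 (istanbul.jpg) before and after optimization.
print(gemini_tokens(2400, 1670), gemini_tokens(2048, 1536))
print(gpt5_tokens(2400, 1670))
```

The cap in the GPT-5 formula is why that column stays at 1536 in both case studies: any image needing nine or more tiles already hits the ceiling.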

Advanced Features

Persistence & Analytics

VisionSqueezer tracks every optimization locally in ~/.vision-squeezer/stats.db.

vision-squeezer stats
── VisionSqueezer Analytics ────────────────────────────────
Total Optimizations: 42
Total Tokens Saved:  842,500
Total Bytes Saved:   156.40 MB
Estimated USD Saved: $2.11
────────────────────────────────────────────────────────────

Shell Hook & Claude Code Skills

Add to .zshrc / .bashrc:

eval "$(vision-squeezer setup-hook)"

This installs two things at once:

  • squeeze alias — optimize and capture the output path in one step:
    img_path=$(squeeze data/logo.png --model gemini)
    
  • Claude Code skills — written to ~/.claude/skills/ automatically on first run

Available Skills

Skill            Trigger           What it does
------------------------------------------------------------------------------------------------------------------
vision-stats     /vision-stats     Show cumulative token & byte savings; reads local stats.db, zero MCP overhead
vision-doctor    /vision-doctor    Check installed version vs latest npm release, show update command if outdated
vision-upgrade   /vision-upgrade   Detect install method (cargo/npm/npx) and run the correct upgrade command

Example output:

/vision-stats
── VisionSqueezer Analytics ────────────────────────────────
Total Optimizations: 42
Total Tokens Saved:  842,500
Total Bytes Saved:   156.40 MB
Estimated USD Saved: $2.11
────────────────────────────────────────────────────────────

/vision-doctor
## VisionSqueezer Doctor
- [x] Binary found: /usr/local/bin/vision-squeezer
- [x] Installed version: 0.1.8
- [x] Latest version (npm): 0.1.8
- [x] Status: Up to date

Marketplace install (alternative to setup-hook): Add to ~/.claude/settings.json:

{
  "extraKnownMarketplaces": {
    "vision-squeezer": {
      "source": { "source": "github", "repo": "eralpozcan/vision-squeezer" }
    }
  }
}

Then run /plugins add vision-stats@vision-squeezer, /plugins add vision-doctor@vision-squeezer, or /plugins add vision-upgrade@vision-squeezer in Claude Code.

Sandbox Mode: "Think in Code"

Execute atomic operations locally before the image ever reaches an LLM. The agent decides how to process the image; you pay zero tokens for the intermediate steps.

CLI:

vision-squeezer screenshot.png --ops '[{"op":"crop","x":10,"y":20,"width":500,"height":500},{"op":"binarize"}]'

MCP tool — sandbox_execute:

Argument       Type     Description
-----------------------------------------------------
image_base64   string   Input image
operations     array    Ordered list of ops to apply

Supported ops: crop, grayscale, binarize, resize, contrast, brightness.
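
When an agent or script generates the ops list, it helps to validate op names before shelling out. The Python sketch below does that and emits the JSON string passed to --ops. The helper name build_ops is hypothetical, and only the op names listed above are checked, since the parameters of ops other than crop are not documented here.

```python
import json

SUPPORTED_OPS = {"crop", "grayscale", "binarize", "resize", "contrast", "brightness"}


def build_ops(*ops: dict) -> str:
    """Validate op names and return a compact JSON array for --ops."""
    for op in ops:
        name = op.get("op")
        if name not in SUPPORTED_OPS:
            raise ValueError(f"unsupported op: {name!r}")
    return json.dumps(list(ops), separators=(",", ":"))


ops_json = build_ops(
    {"op": "crop", "x": 10, "y": 20, "width": 500, "height": 500},
    {"op": "binarize"},
)
print(ops_json)
```

The same list can be passed as the operations argument of sandbox_execute without serializing it first.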

Crawler Integration

Optimize images on the fly in any Playwright/Puppeteer scraping pipeline — zero API token waste on raw screenshots.

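// Playwright route handler. `squeeze` is a placeholder for your own helper
// that pipes a Buffer through vision-squeezer and resolves to the optimized bytes.
await page.route('**/*.{png,jpg}', async (route) => {
  const response = await route.fetch();   // fetch the original image
  const body = await response.body();     // raw bytes as a Buffer
  const optimized = await squeeze(body);  // pipe through vision-squeezer
  await route.fulfill({ response, body: optimized }); // keep status and headers, swap the body
});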

Contributing

PRs welcome. See CONTRIBUTING.md for setup and guidelines. Please follow our Code of Conduct.

License

Elastic License 2.0 (ELv2) — See LICENSE for details.
