x402-vision-cropper MCP Server
MCP server that lets AI agents crop screenshots to exact pixel regions before vision LLM inference, paying 0.0005 USDC per crop on Base L2 via x402 — no API key required.
x402 AI Vision Sub-Element Cropper
llms.txt — machine-readable service contract for autonomous AI agents
Spec: https://llmstxt.org
Stateless image cropping API for AI agents. Submit a base64 screenshot and pixel bounding-box coordinates, receive a cropped sub-element as base64 PNG. Gated by a 0.0005 USDC micropayment on Base L2. Designed to reduce vision LLM token costs by isolating only the screen region of interest before inference.
Service Identity
- Name: x402 AI Vision Sub-Element Cropper
- Version: 1.0.0
- Protocol: HTTP/1.1, JSON
- Authentication: x402 micropayment (no API keys, no accounts)
- Statefulness: Stateless — no session, no stored data, no database
Payment Protocol
This API uses the HTTP 402 Payment Required pattern for autonomous machine-to-machine payments.
Step 1 — Trigger the challenge
Send a POST /crop request without a payment header. The server responds with 402 and includes the payment instructions in both the response body and the HTTP headers.
Step 2 — Pay on-chain
Transfer exactly the required USDC amount to the recipient wallet on the specified network. The required values are machine-readable in the 402 response headers:
- x-payment-price-usdc: amount in USDC (e.g. 0.0005)
- x-payment-recipient: destination wallet address
- x-payment-network: network name (e.g. base)
- x-payment-chain-id: EVM chain ID (e.g. 8453 for Base mainnet)
- x-payment-token: always USDC
- x-payment-token-contract: USDC contract address on that network
- x-payment-submit-header: the header name to use when resubmitting
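The headers above can be turned into a typed value before paying. The following is a minimal sketch, not part of the service: the names PaymentTerms and parse_payment_headers are hypothetical, and the conversion assumes USDC's 6 decimal places, so 0.0005 USDC becomes 500 base units.

```python
from dataclasses import dataclass
from decimal import Decimal

USDC_DECIMALS = 6  # USDC uses 6 decimal places


@dataclass(frozen=True)
class PaymentTerms:  # hypothetical container, not defined by the service
    amount_base_units: int  # amount in the token's smallest unit
    recipient: str
    network: str
    chain_id: int
    token_contract: str
    submit_header: str


def parse_payment_headers(headers: dict[str, str]) -> PaymentTerms:
    """Build PaymentTerms from the x-payment-* headers of a 402 response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    if h.get("x-payment-token", "").upper() != "USDC":
        raise ValueError("unexpected payment token")
    # Decimal avoids binary floating-point error on values like 0.0005.
    amount = Decimal(h["x-payment-price-usdc"]) * (10 ** USDC_DECIMALS)
    if amount != amount.to_integral_value():
        raise ValueError("price has more precision than USDC supports")
    return PaymentTerms(
        amount_base_units=int(amount),
        recipient=h["x-payment-recipient"],
        network=h["x-payment-network"],
        chain_id=int(h["x-payment-chain-id"]),
        token_contract=h["x-payment-token-contract"],
        submit_header=h["x-payment-submit-header"],
    )
```

Reading the submit header name from the response, rather than hard-coding it, keeps the client aligned with whatever the server advertises.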
Step 3 — Resubmit with proof
Retry the identical POST /crop request, adding the transaction hash header:
x-payment-tx-hash: 0x...
The server verifies the transaction on-chain (checks receipt status, USDC Transfer log, recipient address, and amount), then executes the crop and returns the result.
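The four checks named above can be illustrated against a transaction receipt in the shape returned by the standard eth_getTransactionReceipt JSON-RPC call. This is a sketch of the idea only: the server's actual code is not published here, the function name receipt_pays is hypothetical, and whether the server requires an exact amount or also accepts overpayment is an assumption.

```python
# keccak256("Transfer(address,address,uint256)"), the ERC-20 Transfer event topic
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def receipt_pays(receipt: dict, token_contract: str,
                 recipient: str, expected_amount: int) -> bool:
    """Return True if the receipt contains the expected USDC transfer."""
    # 1. Receipt status: 0x1 means the transaction succeeded.
    if int(receipt.get("status", "0x0"), 16) != 1:
        return False
    for log in receipt.get("logs", []):
        topics = log.get("topics", [])
        # 2. The log must be a Transfer event emitted by the USDC contract.
        if log.get("address", "").lower() != token_contract.lower():
            continue
        if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
            continue
        # 3. topics[2] is the recipient, left-padded to 32 bytes.
        to_addr = "0x" + topics[2][-40:]
        # 4. The data field holds the amount as a uint256.
        amount = int(log["data"], 16)
        # Assumption: exact match, following "transfer exactly" in Step 2.
        if to_addr.lower() == recipient.lower() and amount == expected_amount:
            return True
    return False
```

Amounts are compared in base units (500 for 0.0005 USDC), which avoids floating-point comparisons.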
Payment rules
- One payment per request. Each tx hash is single-use within a 60-second window.
- Do not reuse tx hashes across concurrent requests — the server will reject duplicates.
- If the server returns a 5xx error after accepting your payment, the tx hash is released and you may retry with the same hash.
- If the server returns a 4xx error after accepting your payment (e.g. invalid coordinates), the tx hash is consumed. Pay again for a new request.
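One way to picture these rules is a small in-memory lock table. The sketch below is an illustration under assumptions, not the server's implementation: the class name ReplayGuard is hypothetical, the cap of 1000 tracked hashes comes from the limits listed later in this document, and rejecting new hashes when the table is full is a guess.

```python
import time


class ReplayGuard:  # hypothetical name
    """Locks each tx hash for a fixed window; the lock can be released early."""

    def __init__(self, window_s: float = 60.0, max_tracked: int = 1000,
                 clock=time.monotonic):
        self._window = window_s
        self._max = max_tracked
        self._clock = clock
        self._locked: dict[str, float] = {}  # tx hash -> lock expiry time

    def try_lock(self, tx_hash: str) -> bool:
        """Return True if the hash was free and is now locked."""
        now = self._clock()
        # Drop locks whose window has passed.
        self._locked = {h: t for h, t in self._locked.items() if t > now}
        key = tx_hash.lower()
        # Assumption: a full table rejects new hashes rather than evicting.
        if key in self._locked or len(self._locked) >= self._max:
            return False
        self._locked[key] = now + self._window
        return True

    def release(self, tx_hash: str) -> None:
        """Free the hash, e.g. after a 5xx, so the client can retry with it."""
        self._locked.pop(tx_hash.lower(), None)

    def tracked(self) -> int:
        return len(self._locked)
```

In this model a 4xx outcome simply leaves the lock in place (the hash stays consumed), while a 5xx outcome calls release.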
Endpoints
POST /crop
Crops a rectangular region from a base64-encoded image.
Request headers:
- Content-Type: application/json (required)
- x-payment-tx-hash: 0x... (required after payment)
Request body (JSON):
- image (string, required): Base64-encoded source image. Do NOT include a data URI prefix (no "data:image/png;base64," — strip it first). Supports PNG, JPEG, WebP, AVIF, TIFF, GIF.
- x (integer, required): Left edge of crop region in pixels. Must be >= 0.
- y (integer, required): Top edge of crop region in pixels. Must be >= 0.
- width (integer, required): Width of crop region in pixels. Must be >= 1.
- height (integer, required): Height of crop region in pixels. Must be >= 1.
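Coordinates are image pixels measured from the top-left corner. Values from getBoundingClientRect() in browser automation tools can be passed directly when the screenshot was captured at a device pixel ratio of 1. At other ratios, getBoundingClientRect() returns CSS pixels, so multiply each value by the ratio first. Floats are truncated to integers.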
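A client can normalise its inputs before sending them. The sketch below is one possible helper, with the hypothetical name build_crop_body: it strips a data URI prefix if one is present, truncates float coordinates itself so the body contains plain integers, and applies the minimum values listed above.

```python
import math
import re

# Matches prefixes such as "data:image/png;base64,"
_DATA_URI_PREFIX = re.compile(r"^data:[^;,]+;base64,", re.IGNORECASE)


def build_crop_body(image: str, rect: dict) -> dict:
    """Build the POST /crop JSON body from an image string and a rect.

    rect needs the keys x, y, width and height, as found in
    getBoundingClientRect() output.
    """
    image_b64 = _DATA_URI_PREFIX.sub("", image.strip(), count=1)
    # Truncate toward zero, mirroring the truncation described above.
    x, y = math.trunc(rect["x"]), math.trunc(rect["y"])
    width, height = math.trunc(rect["width"]), math.trunc(rect["height"])
    if x < 0 or y < 0:
        raise ValueError("x and y must be >= 0")
    if width < 1 or height < 1:
        raise ValueError("width and height must be >= 1")
    return {"image": image_b64, "x": x, "y": y, "width": width, "height": height}
```

Validating locally matters here because a rejected request can still consume a paid tx hash.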
Success response (200):
{
  "success": true,
  "data": {
    "base64": "<PNG image as base64 string>",
    "mime": "image/png",
    "width": 640,
    "height": 80,
    "bytes": 14821,
    "clamped": false
  },
  "meta": {
    "tx_hash": "0x...",
    "crop_input": { "x": 120, "y": 45, "width": 640, "height": 80 }
  }
}
Output is always PNG (lossless). Suitable for direct embedding in vision LLM prompts as a base64 data URI: prepend "data:image/png;base64," to the returned base64 string.
The "clamped" field is true if the requested crop region extended beyond the image boundaries and was automatically reduced to fit. Recalibrate your coordinate source if you receive clamped: true.
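The clamping behaviour can be reproduced locally to predict what the server will return. This is a sketch of the rule as described, not the server's code: the function name clamp_region is hypothetical, and it assumes x and y are non-negative and width and height are at least 1, as the request body requires.

```python
def clamp_region(x: int, y: int, width: int, height: int,
                 img_w: int, img_h: int) -> tuple[tuple[int, int, int, int], bool]:
    """Shrink a crop region to fit the image; report whether it changed."""
    # A region starting at or past the right or bottom edge has no overlap
    # with the image (the documented 422 case).
    if x >= img_w or y >= img_h:
        raise ValueError("crop region is entirely outside image bounds")
    new_w = min(width, img_w - x)
    new_h = min(height, img_h - y)
    clamped = (new_w, new_h) != (width, height)
    return (x, y, new_w, new_h), clamped
```

For a 1280x720 screenshot, a request at (1000, 700) for 640x80 pixels would come back as 280x20 with clamped set to true under this rule.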
402 response (no payment or invalid payment):
{
  "error": "Payment Required",
  "code": 402,
  "payment": {
    "amount_usdc": "0.0005",
    "recipient_wallet": "0x...",
    "network": "base",
    "chain_id": 8453,
    "token": "USDC",
    "token_contract": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
    "instructions": "Transfer 0.0005 USDC to 0x... on base (chain 8453), then retry with the transaction hash in the 'x-payment-tx-hash' header."
  }
}
Error responses:
- 400: Malformed request body or invalid image data
- 402: Payment required, invalid, or replayed
- 413: Request body exceeds size limit (~14 MB)
- 422: Crop region is entirely outside image bounds
- 500: Server-side processing error (tx hash is released; safe to retry)
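An agent can map each status code to its next action. The sketch below combines this list with the payment rules; the names NextStep and next_step are hypothetical. It conservatively assumes that any 4xx other than 402 received on a request that carried a tx hash has consumed that hash, which may be stricter than the server's behaviour for errors raised before payment verification.

```python
from enum import Enum


class NextStep(Enum):  # hypothetical names for illustration
    DONE = "use the cropped image"
    PAY_THEN_RETRY = "pay, then resend with the new tx hash"
    FIX_THEN_RETRY = "correct the request and resend"
    FIX_THEN_PAY_AGAIN = "correct the request, pay again, resend"
    RETRY_UNCHANGED = "resend the identical request, same tx hash if any"


def next_step(status: int, sent_tx_hash: bool) -> NextStep:
    """Decide what to do after a POST /crop response."""
    if status == 200:
        return NextStep.DONE
    if status == 402:
        return NextStep.PAY_THEN_RETRY
    if 500 <= status < 600:
        # The tx hash is released on server errors, so it can be reused.
        return NextStep.RETRY_UNCHANGED
    if 400 <= status < 500:
        # Assumption: a hash sent with a rejected request is consumed.
        return NextStep.FIX_THEN_PAY_AGAIN if sent_tx_hash else NextStep.FIX_THEN_RETRY
    raise ValueError(f"unexpected status {status}")
```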
GET /health
Liveness check. No payment required.
Response (200):
{
  "status": "ok",
  "service": "x402-vision-cropper",
  "ts": "2025-01-01T00:00:00.000Z",
  "replay_guard": { "tracked_hashes": 0 }
}
Agent Integration Pattern
1. GET /health → confirm service is live
2. POST /crop (no payment header) → receive 402 + payment.instructions
3. Parse x-payment-recipient, x-payment-price-usdc, x-payment-chain-id
4. Execute USDC transfer on Base L2
5. Wait for transaction confirmation (at least 1 block)
6. POST /crop (same body + x-payment-tx-hash header) → receive cropped PNG
7. Prepend "data:image/png;base64," to data.base64
8. Pass to vision LLM as image input
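Steps 2 to 7 can be sketched as a single function. To stay independent of any HTTP or wallet library, the sketch takes two hypothetical callables as assumptions: post(body, headers) performs the POST /crop request and returns (status, headers, json), with header names lower-cased, and pay(headers) executes the USDC transfer, waits for confirmation, and returns the transaction hash.

```python
from typing import Callable

Response = tuple[int, dict, dict]  # (status, response headers, parsed JSON body)


def crop_with_payment(body: dict,
                      post: Callable[[dict, dict], Response],
                      pay: Callable[[dict], str]) -> str:
    """Run the challenge, pay, retry flow and return a PNG data URI."""
    # Step 2: send without a payment header to trigger the 402 challenge.
    status, headers, payload = post(body, {})
    if status != 402:
        raise RuntimeError(f"expected a 402 challenge, got {status}")
    # Steps 3-5: pay using the terms in the headers and wait for confirmation.
    tx_hash = pay(headers)
    # Step 6: resend the identical body with the proof-of-payment header.
    header_name = headers.get("x-payment-submit-header", "x-payment-tx-hash")
    status, headers, payload = post(body, {header_name: tx_hash})
    if status != 200 or not payload.get("success"):
        raise RuntimeError(f"crop failed with status {status}")
    # Step 7: build a data URI suitable for a vision LLM image input.
    return "data:image/png;base64," + payload["data"]["base64"]
```

Injecting post and pay also makes the flow easy to exercise against fakes before spending real funds.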
Recommended Use Cases
- OCR on specific UI elements (buttons, labels, input fields, table cells)
- Visual regression testing on isolated components
- Extracting price, stock, or data fields from financial dashboards
- Reading CAPTCHA sub-regions before passing to specialist solvers
- Any task where a full-screenshot vision call is wasteful or inaccurate
Constraints and Limits
- Maximum image input: ~10 MB decoded (~14 MB as base64)
- Maximum crop dimension: 4000px per side
- Concurrent requests: optimised for 10–13 simultaneous requests
- No persistent storage: all data is discarded after each response
- Replay window: tx hashes are locked for 60 seconds, max 1000 tracked simultaneously
- Output format: always PNG, regardless of input format
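Checking these limits before paying avoids spending a tx hash on a request that will be rejected. The sketch below is illustrative: the helper names are hypothetical, and the exact byte threshold is an assumption, since the limit is only stated as approximately 10 MB.

```python
MAX_DECODED_BYTES = 10 * 1024 * 1024  # assumption: the limit is given as "~10 MB"
MAX_CROP_SIDE = 4000                  # pixels per side


def decoded_size(image_b64: str) -> int:
    """Decoded byte length of a padded base64 string, without decoding it."""
    padding = image_b64[-2:].count("=")
    return len(image_b64) * 3 // 4 - padding


def limit_violations(image_b64: str, width: int, height: int) -> list[str]:
    """Return a list of human-readable problems; empty means within limits."""
    problems = []
    if decoded_size(image_b64) > MAX_DECODED_BYTES:
        problems.append("image exceeds the decoded size limit")
    if width > MAX_CROP_SIDE or height > MAX_CROP_SIDE:
        problems.append("crop dimension exceeds 4000px per side")
    return problems
```

Base64 encodes every 3 bytes as 4 characters, which is why roughly 10 MB of image data becomes roughly 14 MB of request body.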
Network Details
- Payment network: Base mainnet (chain ID 8453)
- USDC contract on Base: 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
- Gas: near-zero (Base L2)
- Settlement: typically 1–2 seconds
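Paying means calling the ERC-20 transfer(address,uint256) function on the USDC contract above. The sketch below builds only the call data for that function, using the standard selector 0xa9059cbb; signing and broadcasting are left to whatever wallet library the agent uses. The function name usdc_transfer_calldata is hypothetical.

```python
TRANSFER_SELECTOR = "a9059cbb"  # first 4 bytes of keccak256("transfer(address,uint256)")


def usdc_transfer_calldata(recipient: str, amount_base_units: int) -> str:
    """ABI-encode transfer(recipient, amount) as a 0x-prefixed hex string."""
    addr = recipient.lower().removeprefix("0x")
    if len(addr) != 40 or any(c not in "0123456789abcdef" for c in addr):
        raise ValueError("recipient must be a 20-byte hex address")
    if not 0 <= amount_base_units < 2 ** 256:
        raise ValueError("amount must fit in a uint256")
    # Each argument is left-padded to 32 bytes (64 hex characters).
    return ("0x" + TRANSFER_SELECTOR
            + addr.rjust(64, "0")
            + format(amount_base_units, "064x"))
```

The transaction is sent to the USDC contract address, not to the recipient wallet; for 0.0005 USDC the amount argument is 500 because USDC has 6 decimals.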
Optional: Full llms-full.txt
See /llms-full.txt for extended examples including multi-step agent pseudocode, error recovery flows, and coordinate extraction patterns from common browser automation frameworks (Playwright, Puppeteer, Selenium).