PDF to Markdown MCP Server

Hosted MCP server that converts PDFs into clean, LLM-ready Markdown with tables, formulas (LaTeX) and OCR. Own engines (MinerU + Docling), not an LLM wrapper.

Developer hub

Build with the PDF to Markdown API

One predictable job lifecycle over a REST API and an equivalent hosted MCP – create a job, wait for ready, fetch the Markdown, free the slot. The API and MCP never bypass product limits.

At a glance

Two surfaces, one engine

Pick the integration that fits. Both call the same conversion engine and obey the same slots, limits and retention.

REST API

HTTPS endpoints with a bearer API key. Stable DTOs, predictable errors, idempotent create.

Hosted MCP

A managed Model Context Protocol endpoint exposing conversion as agent tools – a thin wrapper over the same API.

Custom GPT Actions

Import a reduced OpenAPI spec into a ChatGPT Custom GPT so it can convert PDFs as a built-in tool.

Authentication

Bearer API keys over HTTPS

The API and MCP use bearer API keys – distinct from the device-signed path the Chrome extension uses. A free Google account is required to generate keys.

Get a key

Sign in with Google (free account).
Generate an API key in your account; it is shown once.
Send it as Authorization: Bearer p2m_… on every request.
Keys are secrets: store them server-side, rotate and revoke any time.
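
As a minimal sketch of the steps above, the helper below builds the header from a key kept in the server's environment. The environment variable name PDF2MD_API_KEY and the function name are assumptions made for illustration; only the Bearer scheme and the p2m_ prefix come from this page.

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for the REST API or the hosted MCP.

    PDF2MD_API_KEY is a hypothetical environment variable name.
    """
    key = api_key if api_key is not None else os.environ.get("PDF2MD_API_KEY", "")
    # Keys shown on this page start with "p2m_"; reject anything else early.
    if not key.startswith("p2m_"):
        raise ValueError("expected an API key starting with 'p2m_'")
    return {"Authorization": f"Bearer {key}"}
```

Reading the key from the environment keeps it out of source control and out of any code shipped to users.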

Honest defaults

Keys, not passwords. The extension stays anonymous and device-signed; API/MCP keys are a separate, account-bound credential.

HTTPS only. Always send keys over TLS; never embed a key in client-side code shipped to users.

Idempotent create. An optional Idempotency-Key on create lets you retry safely without duplicate jobs.
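
A short sketch of how a safe retry might look: generate the key once per logical job and send the same value on every attempt. The Idempotency-Key header name is from this page; using a UUID as the value is an assumption, since the accepted format is not stated here.

```python
import uuid


def new_idempotency_key():
    # Assumption: any unique opaque string is accepted; UUID4 is a common choice.
    return str(uuid.uuid4())


def create_headers(api_key, idempotency_key=None):
    """Headers for a JSON create call; the idempotency key stays optional."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if idempotency_key is not None:
        headers["Idempotency-Key"] = idempotency_key
    return headers


# One key per logical job, reused unchanged when the request is retried.
key = new_idempotency_key()
first_attempt = create_headers("p2m_example", key)
retry_attempt = create_headers("p2m_example", key)
```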

Scopes. Each API key carries scopes: jobs:create, jobs:read, jobs:download, jobs:delete (the defaults), plus settings:read / settings:write. Mint least-privilege keys; the REST API and MCP tools both enforce the key's scopes.
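
To pick a least-privilege scope set, it can help to work backwards from the tools a workflow will call. The sketch below uses the tool-to-scope pairs listed in the Hosted MCP section; the helper names are hypothetical, and get_limits is left out because this page does not state which scope it needs.

```python
# Scope required by each MCP tool, as listed on this page.
TOOL_SCOPES = {
    "create_job_from_url": "jobs:create",
    "create_job_from_upload": "jobs:create",
    "list_jobs": "jobs:read",
    "get_job": "jobs:read",
    "get_markdown": "jobs:download",
    "delete_job": "jobs:delete",
}


def scopes_needed(tools):
    """Smallest scope set that covers the given tool names."""
    return {TOOL_SCOPES[t] for t in tools if t in TOOL_SCOPES}


def missing_scopes(key_scopes, tools):
    """Scopes the workflow needs that the key does not carry."""
    return scopes_needed(tools) - set(key_scopes)
```

For example, a converter that never deletes jobs would need only jobs:create, jobs:read and jobs:download.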

REST API

Create a job, wait, fetch Markdown, clean the slot

One predictable lifecycle, two ways to drive it: call the REST API from your own code, or use the equivalent hosted MCP tools. Never claim a result before status=ready.

Create the job

POST a PDF URL or upload bytes. Get back a job id and slot. Idempotency-Key is honored but optional.

REST: POST /api/v2/jobs · MCP: pdf_to_markdown_create_job_from_url

Check status

Poll the job until ready or error, or register a signed webhook on paid tiers instead of polling.

REST: GET /api/v2/jobs/{id} · MCP: pdf_to_markdown_get_job

Fetch the Markdown

Download the result once ready. Read truncated and pages to know if a long document was returned partially.

REST: GET /api/v2/jobs/{id}/download · MCP: pdf_to_markdown_get_markdown

Delete / clean the slot

Free a slot when you're done. Deleting queued or processing jobs is destructive – confirm it in user-facing clients.

REST: DELETE /api/v2/jobs/{id} · MCP: pdf_to_markdown_delete_job

# 1. create a job from a PDF URL
curl -X POST https://pdf2md.dev/api/v2/jobs \
  -H "Authorization: Bearer p2m_…" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/report.pdf"}'
# → { "job_id": "job_9f3c…", "status": "queued" }

# 2. poll status
curl https://pdf2md.dev/api/v2/jobs/job_9f3c… \
  -H "Authorization: Bearer p2m_…"
# → { "status": "ready", "pages": 24, "truncated": false }

# 3. fetch the Markdown
curl https://pdf2md.dev/api/v2/jobs/job_9f3c…/download \
  -H "Authorization: Bearer p2m_…"

# 4. free the slot
curl -X DELETE https://pdf2md.dev/api/v2/jobs/job_9f3c… \
  -H "Authorization: Bearer p2m_…"

Errors. Responses use stable shapes and predictable HTTP codes (400 bad input, 401 auth, 404 unknown job, 409 no free slot / slots_full, 413 too large, 429 rate limited). The full schema lives in the OpenAPI spec.
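
One way a client might act on these codes is a small lookup that says whether retrying can help. The status codes are the ones listed above; the retry flags and hint strings are this sketch's own reading, not part of the API contract.

```python
# (retry_can_help, suggested next step) for each status code listed on this page.
NEXT_STEP = {
    400: (False, "fix the request body or parameters"),
    401: (False, "check the API key"),
    404: (False, "the job id is unknown"),
    409: (True, "no free slot: delete a finished job or wait, then retry"),
    413: (False, "the PDF exceeds the size limit for the tier"),
    429: (True, "rate limited: wait for Retry-After, then retry"),
}


def next_step(status_code):
    """Look up what to do after an error response."""
    return NEXT_STEP.get(
        status_code,
        (False, "not listed on this page; consult the OpenAPI spec"),
    )
```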

More job endpoints

Create from file (multipart): POST /api/v2/jobs
List your jobs: GET /api/v2/jobs
Batch create (paid): POST /api/v2/jobs/batch

Create accepts a JSON url or a multipart file, plus optional file_name, external_id, tags and callback_url / callback_secret for a per-job webhook. Batch create is all-or-nothing and must fit your free slots.
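
The optional fields can be assembled as in the sketch below, which omits anything left unset. The field names are from the paragraph above; treating tags as a list of strings is an assumption, and both helper names are hypothetical.

```python
def build_create_body(url, file_name=None, external_id=None, tags=None,
                      callback_url=None, callback_secret=None):
    """JSON body for POST /api/v2/jobs, leaving out options that were not set."""
    body = {"url": url}
    optional = {
        "file_name": file_name,
        "external_id": external_id,
        "tags": tags,  # assumed here to be a list of strings
        "callback_url": callback_url,
        "callback_secret": callback_secret,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def batch_fits(batch_size, free_slots):
    """Batch create is all-or-nothing, so check the batch fits before sending it."""
    return 0 < batch_size <= free_slots
```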

The job object

job_id: string
status: queued · processing · ready · error
pages, output_size: integer
truncated: boolean
error_code, error_message: reason (when error)
download_url: string (when ready)
external_id, tags: your metadata
slot_usage, tier: quota context
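
A typed wrapper around the job object might look like the following sketch. The field names mirror the list above; the Python types for tags, slot_usage and tier are not given on this page, so they are left loose, and the class itself is illustrative rather than an official client type.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class Job:
    job_id: str
    status: str  # queued, processing, ready or error
    pages: Optional[int] = None
    output_size: Optional[int] = None
    truncated: bool = False
    error_code: Optional[str] = None
    error_message: Optional[str] = None
    download_url: Optional[str] = None
    external_id: Optional[str] = None
    tags: Any = None        # shape not specified on this page
    slot_usage: Any = None  # shape not specified on this page
    tier: Any = None

    @classmethod
    def from_dict(cls, data):
        # Ignore unknown keys so fields added later do not break parsing.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    @property
    def finished(self):
        return self.status in ("ready", "error")
```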

Account & usage. Check your tier, limits and usage at runtime with GET /api/v2/me, /api/v2/limits and /api/v2/usage; manage keys at /api/v2/api-keys and webhooks at /api/v2/webhooks.

Quickstart

Convert a PDF to Markdown in Python

The same four calls from any language. Here it is with requests. For a full step-by-step walkthrough with error handling, see the Python tutorial.

# pip install requests
import time
import requests

API = "https://pdf2md.dev/api/v2"
H = {"Authorization": "Bearer p2m_…"}

# 1. create a job from a PDF URL (or post a file with files={"file": ...})
resp = requests.post(f"{API}/jobs", headers=H,
    json={"url": "https://example.com/report.pdf"}, timeout=30)
resp.raise_for_status()  # surfaces errors such as 409 (no free slot) or 429
jid = resp.json()["job_id"]

# 2. poll until ready or error (or register a webhook instead)
while True:
    j = requests.get(f"{API}/jobs/{jid}", headers=H, timeout=30).json()
    if j["status"] in ("ready", "error"):
        break
    time.sleep(3)

# 3. download the Markdown only when the job is ready
if j["status"] == "ready":
    md = requests.get(f"{API}/jobs/{jid}/download", headers=H, timeout=30).text
    print(md)
else:
    print("conversion failed:", j.get("error_code"), j.get("error_message"))
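
The polling loop in the quickstart runs until the job finishes. A bounded variant is sketched below, assuming you would rather give up after a deadline; the 900-second default mirrors the free-tier 15 min time budget, and the function name and its injectable sleep and clock parameters are choices of this sketch.

```python
import time


def wait_for_job(get_job, timeout_s=900.0, interval_s=3.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_job() until status is ready or error, or until timeout_s elapses.

    get_job is any zero-argument callable returning the job dict, for example
    lambda: requests.get(f"{API}/jobs/{jid}", headers=H).json()
    """
    deadline = clock() + timeout_s
    while True:
        job = get_job()
        if job["status"] in ("ready", "error"):
            return job
        # Stop if the next poll would land after the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout_s} s")
        sleep(interval_s)
```

Passing the fetch as a callable keeps the loop independent of the HTTP library and easy to exercise with a fake.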

More recipes: file upload & Node

Create from a local file (curl)

# multipart upload of a local PDF
curl -X POST https://pdf2md.dev/api/v2/jobs \
  -H "Authorization: Bearer p2m_…" \
  -F "file=@document.pdf" \
  -F "file_name=document.pdf"

Node 18+ (global fetch)

// create from URL, poll, download
const API = "https://pdf2md.dev/api/v2";
const H = { Authorization: "Bearer p2m_…" };

let job = await (await fetch(`${API}/jobs`, {
  method: "POST",
  headers: { ...H, "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://example.com/report.pdf" })
})).json();

while (job.status === "queued" || job.status === "processing") {
  await new Promise(s => setTimeout(s, 2000));
  job = await (await fetch(`${API}/jobs/${job.job_id}`, { headers: H })).json();
}

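if (job.status === "ready") {
  const md = await (await fetch(`${API}/jobs/${job.job_id}/download`, { headers: H })).text();
  console.log(md);
} else {
  // any other final status: report the reason instead of downloading
  console.error("conversion failed:", job.error_code, job.error_message);
}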

Webhook signature verification is in the Webhooks section; the MCP client config is in the MCP section. Full schema: OpenAPI.

Hosted MCP

Conversion as agent tools

Connect a compatible agent to our managed MCP endpoint. The tools are a thin wrapper over the REST API, so every call obeys the same slots, limits and retention.

Point the agent at the endpoint

JSON-RPC 2.0 over Streamable HTTP with your API key as the bearer token. No local server to run. Methods: initialize, tools/list, tools/call, ping.

POST https://pdf2md.dev/api/v2/mcp
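
For orientation, the request bodies an MCP client sends to this endpoint can be sketched as plain JSON-RPC 2.0 objects. The method names and the prefixed tool name are from this page; the "url" argument name is an assumption (tools/list returns the real input schema), and the helper names are hypothetical.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids, unique per session


def rpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def tool_call(tool, arguments):
    """tools/call request: MCP carries the tool name and arguments under params."""
    return rpc_request("tools/call", {"name": tool, "arguments": arguments})


list_tools = rpc_request("tools/list")
create = tool_call(
    "pdf_to_markdown_create_job_from_url",
    {"url": "https://example.com/report.pdf"},  # argument name is an assumption
)
body = json.dumps(create)  # POST this with the bearer header to the endpoint above
```

A real client would send initialize first, and the MCP Streamable HTTP transport expects an Accept header covering both application/json and text/event-stream; an MCP client library normally takes care of both.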

Call the tools

The same lifecycle plus limits, exposed as seven tools. Each respects the key's scopes; tools/call responses include slot_usage and tier.

create_job_from_url · create_job_from_upload (jobs:create)
list_jobs · get_job (jobs:read)
get_markdown (jobs:download) · delete_job (jobs:delete)
get_limits

Respect the rules

Wait for ready before using output; confirm before deleting queued/processing jobs; handle truncated and 429 Retry-After. (Tool names are prefixed pdf_to_markdown_.)

// MCP client config (hosted, no local process)
{
  "mcpServers": {
    "pdf2md": {
      "url": "https://pdf2md.dev/api/v2/mcp",
      "headers": {
        "Authorization": "Bearer p2m_…"
      }
    }
  }
}

OpenAPI & Custom GPT Actions

Import the spec, get a built-in tool

We publish two specs: the full OpenAPI for developers, and a reduced action spec with the safe, minimal subset for AI clients and ChatGPT Custom GPT Actions.

Full OpenAPI

The complete contract: every endpoint, parameter, DTO and error. Generate clients or explore it in your tooling.

Reduced spec for Custom GPT

A minimal action subset (create, status, fetch) for ChatGPT Custom GPT Actions. Import the URL, set your API key as the auth, and your GPT converts PDFs natively.

The reduced spec is a convenience for AI clients, not a security boundary – the same auth, scopes and limits apply as on the full API.

Limits & rate limits

Per-tier limits, applied to API and MCP alike

Limits come from your tier and apply identically across every surface. Live values are on the pricing page.

Free tier (with an account)

Active slots (queue depth): 3
Max PDF size: 10 MB
Time budget per document: 15 min
Ready result retention: 1 hour

Paid tiers raise slots, file size, time budget, retention and rate limits, and add webhooks and batch create.
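
A client can check the size limit locally before uploading and so avoid a 413 round trip. The sketch below assumes 1 MB means 1,000,000 bytes, the stricter reading, because this page does not say which definition the service uses; live limits should come from /api/v2/limits.

```python
FREE_TIER_MAX_MB = 10  # free-tier value from this page; paid tiers are higher


def fits_size_limit(num_bytes, max_mb=FREE_TIER_MAX_MB):
    """True if a file of num_bytes is within the tier's size limit."""
    # Assumption: decimal megabytes, so a file that passes here also passes
    # if the service counts 1 MB as 1,048,576 bytes.
    return num_bytes <= max_mb * 1_000_000
```

With a local file, os.path.getsize(path) supplies num_bytes.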

Rate limits & backpressure

Per-tier rate limits. Requests are rate limited per key; exceed them and you get 429 with a Retry-After header – back off and retry.

Slot pressure. If all slots are busy, create returns 409. Free a slot with delete, or wait for a job to finish.

Priority on paid. Paid jobs run with higher queue priority on a dedicated paid conversion pool, so they don't wait behind the free backlog.
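
Handling a 429 can be sketched as below: use Retry-After when it is a number of seconds, and otherwise fall back to capped exponential backoff. The fallback and its default values are this sketch's choice, not something this page specifies.

```python
def retry_delay(headers, attempt, base_s=1.0, cap_s=60.0):
    """Seconds to wait after a 429 response.

    headers is a mapping of response headers; attempt counts retries from 0.
    """
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # for example an HTTP-date, which this sketch does not parse
    # No usable header: double the wait on each attempt, up to cap_s.
    return min(cap_s, base_s * 2 ** attempt)
```

A plain dict lookup is case-sensitive; the headers object on a requests response is case-insensitive, so it can be passed in directly.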

Webhooks

Get notified instead of polling

On paid tiers, register a signed webhook (or pass a per-job callback_url) and we POST you on every notable terminal event: job.ready, job.error, job.truncated and job.deleted. The event is a notification, not a delivery: it carries no document content, so fetch the Markdown over the API after you receive it.

Register an endpoint

POST an HTTPS URL (SSRF-guarded) and an optional events filter. The signing secret whsec_… is returned once. Or set callback_url + callback_secret on a single job.

POST /api/v2/webhooks
GET /api/v2/webhooks/deliveries

Receive the event

We POST JSON with headers X-P2M-Event, X-P2M-Timestamp, X-P2M-Delivery and X-P2M-Signature.

Verify, then act

Recompute the signature, ack with 2xx, and be idempotent (deliveries can retry with backoff). Then download the Markdown.

# delivery → your endpoint
X-P2M-Event: job.ready
X-P2M-Timestamp: 1718900000
X-P2M-Signature: sha256=9a8b7c…

{
  "event": "job.ready",
  "job": {
    "job_id": "job_9f3c…",
    "status": "ready",
    "pages": 24, "truncated": false,
    "download_url": "/api/v2/jobs/job_9f3c…/download"
  }
}

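# verify (Python): signature = sha256= + hex(HMAC(secret, "ts.rawbody"))
import hashlib
import hmac

def verify(secret, ts, raw_body, sig):
    # raw_body must be the unparsed request body as bytes; re-serialized JSON will not match
    expected = "sha256=" + hmac.new(
        secret.encode(), f"{ts}.".encode() + raw_body,
        hashlib.sha256).hexdigest()
    # constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected, sig)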
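
Because deliveries can be retried, a receiver should verify first and then deduplicate before acting. The sketch below keys deduplication on X-P2M-Delivery, assuming that header is a stable per-delivery id. The in-memory set, the 401 reply for a bad signature and the function names are choices of this sketch.

```python
import hashlib
import hmac
import json


def verify(secret, ts, raw_body, sig):
    expected = "sha256=" + hmac.new(
        secret.encode(), f"{ts}.".encode() + raw_body, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, sig)


_seen_deliveries = set()  # in-memory for the sketch; use a shared store in production


def handle_delivery(secret, headers, raw_body):
    """Return (http_status, job_id_to_download_or_None) for one webhook POST."""
    if not verify(secret, headers["X-P2M-Timestamp"], raw_body,
                  headers["X-P2M-Signature"]):
        return 401, None
    delivery_id = headers["X-P2M-Delivery"]
    if delivery_id in _seen_deliveries:
        return 200, None  # retry of a delivery already handled: ack and do nothing
    _seen_deliveries.add(delivery_id)
    event = json.loads(raw_body)
    if event["event"] == "job.ready":
        return 200, event["job"]["job_id"]  # caller fetches the Markdown over the API
    return 200, None
```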

Prompts & safety

Portable instructions for agents

Drop these rules into an agent's system prompt so it drives the tools correctly and never invents results.

Wait for ready

Never claim or summarize a result before status=ready. While queued or processing, keep polling or wait for the webhook.

Confirm deletes

Deleting a queued or processing job is destructive. Ask the user before calling pdf_to_markdown_delete_job on a non-finished job.

Handle truncation

If truncated=true, tell the user the document was returned partially up to the tier time budget, and offer a higher tier or splitting the file.

Respect 429

On 429, wait for Retry-After seconds before retrying. Don't hammer the queue.

Clean slots

Delete finished jobs you no longer need so you don't exhaust your slots.

Read discovery

Start from /llms.txt and the OpenAPI spec rather than guessing endpoints from prose.
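
The same rules can also be enforced in the code that wraps the tools, so they hold even if the model ignores the prompt. The guards below are a minimal sketch over the job fields described on this page; the function names and the notice wording are illustrative.

```python
def can_use_output(job):
    """Wait for ready: only a ready job has a result to claim or summarize."""
    return job.get("status") == "ready"


def delete_needs_confirmation(job):
    """Confirm deletes: removing an unfinished job is destructive."""
    return job.get("status") in ("queued", "processing")


def truncation_notice(job):
    """Handle truncation: message for the user, or None if the result is complete."""
    if job.get("truncated"):
        return ("The document was returned partially, up to the tier time budget. "
                "Consider a higher tier or splitting the file.")
    return None


def finished_job_ids(jobs):
    """Clean slots: jobs that can be deleted without interrupting work in progress."""
    return [j["job_id"] for j in jobs if j.get("status") in ("ready", "error")]
```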

Machine-readable discovery

Everything an agent needs to integrate without reading source code: a compact capabilities file, a detailed context file and the OpenAPI spec.

llms.txt llms-full.txt OpenAPI