GitPrism

A fast, token-efficient, stateless pipeline that converts public GitHub repositories into LLM-ready Markdown. Deployed as a single Cloudflare Worker serving humans, AI agents, and MCP clients from one shared core engine.

                    ┌─────────────────────────────────────────────┐
                    │          Single Cloudflare Worker           │
                    │               (gitprism)                    │
                    │                                             │
   Humans ────────► │  /              → Astro Static UI           │
                    │                   (Workers Static Assets)   │
                    │                                             │
   AI Agents ─────► │  /ingest?...    → REST API                  │
                    │  /<github-url>  → URL Proxy (shorthand)     │
                    │                                             │
   MCP Clients ───► │  /mcp           → Stateless MCP Server      │
                    │                   (createMcpHandler)        │
                    │                                             │
                    │         ┌───────────────────┐               │
                    │         │   Core Engine     │               │
                    │         │  URL Parser       │               │
                    │         │  Zipball Fetch    │               │
                    │         │  fflate Decomp    │               │
                    │         │  Filter/Ignore    │               │
                    │         │  MD Formatter     │               │
                    │         └───────────────────┘               │
                    └─────────────────────────────────────────────┘
                                       │
                                       ▼
                              GitHub Zipball API
                          (authenticated via secret)
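
The routing shown in the diagram can be sketched as a small dispatcher. This is a minimal illustration, not GitPrism's actual router: the function name `classifyRoute` and the `Route` type are hypothetical, and the fallback to the static UI for every other path is an assumption.

```typescript
type Route = "ui" | "api" | "proxy" | "mcp";

// Hypothetical dispatcher: maps a request pathname to one of the four
// entry points in the diagram. Not GitPrism's actual routing code.
function classifyRoute(pathname: string): Route {
  if (pathname === "/mcp" || pathname.startsWith("/mcp/")) return "mcp";
  if (pathname === "/ingest") return "api";
  // URL-proxy shorthand: the path itself is a GitHub URL. Some proxies
  // collapse "//" into "/", so both spellings are accepted here.
  if (/^\/https?:\/{1,2}github\.com\//.test(pathname)) return "proxy";
  // Assumption: everything else is served from the static Astro assets.
  return "ui";
}
```

All three dynamic routes would then hand off to the same core engine, which is what keeps the REST, proxy, and MCP outputs identical.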

Usage

Web UI

Visit https://gitprism.cloudemo.org/ and paste any GitHub URL.

REST API

Canonical form (recommended for programmatic use):

GET /ingest?repo=owner/repo&ref=main&path=src&detail=full

URL-appended shorthand (human-friendly):

GET /https://github.com/owner/repo/tree/main/src

Branch, ref, and subdirectory are automatically extracted from the GitHub URL. Append a detail shorthand to control output:

GET /https://github.com/owner/repo?summary
GET /https://github.com/owner/repo/tree/main/src?file-list
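
As a rough sketch of how the ref and subdirectory might be extracted from a GitHub URL: the helper `parseGitHubUrl` and the `RepoTarget` type below are hypothetical and not taken from the GitPrism source. The sketch assumes the segment right after `/tree/` is the ref, so branch names that contain a slash would be split incorrectly.

```typescript
interface RepoTarget {
  owner: string;
  repo: string;
  ref?: string;  // undefined means "use the default branch"
  path?: string; // undefined means "whole repository"
}

// Hypothetical parser, not GitPrism's actual implementation.
// Accepts a full GitHub URL or the owner/repo shorthand.
function parseGitHubUrl(input: string): RepoTarget | null {
  // Drop the query string or fragment, then any trailing slashes.
  const cleaned = input.split(/[?#]/)[0].replace(/\/+$/, "");
  const match = cleaned.match(
    /^(?:https?:\/\/github\.com\/)?([\w.-]+)\/([\w.-]+)(?:\/tree\/([^/]+)(?:\/(.+))?)?$/
  );
  if (!match) return null;
  const [, owner, repo, ref, path] = match;
  return { owner, repo: repo.replace(/\.git$/, ""), ref, path };
}
```

For example, `parseGitHubUrl("https://github.com/owner/repo/tree/main/src")` yields owner `owner`, repo `repo`, ref `main`, and path `src`, while the bare `owner/repo` form leaves `ref` and `path` undefined.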

Parameters (canonical form):

| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| repo | Yes (canonical) | - | owner/repo, e.g. cloudflare/workers-sdk |
| ref | No | default branch | Branch, tag, or commit SHA |
| path | No | - | Subdirectory to scope results to |
| detail | No | full | Output level: summary, structure, file-list, or full |
| no-cache | No | false | Set to true to bypass response cache |

Detail level shorthand — instead of ?detail=<level>, append the level as a bare key. Works on both the canonical and URL-proxy forms:

/ingest?repo=owner/repo&summary
/https://github.com/owner/repo?structure
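
One way the bare-key shorthand could be resolved is sketched below. The helper `resolveDetail` is hypothetical, and two behaviours are assumptions rather than documented facts: an explicit `detail=<level>` wins over a bare key, and unknown values fall back to `full`.

```typescript
const DETAIL_LEVELS = ["summary", "structure", "file-list", "full"] as const;
type DetailLevel = (typeof DETAIL_LEVELS)[number];

// Hypothetical helper: resolves the detail level from a raw query string.
function resolveDetail(query: string): DetailLevel {
  const isLevel = (value: string): value is DetailLevel =>
    (DETAIL_LEVELS as readonly string[]).indexOf(value) !== -1;

  let bare: DetailLevel | undefined;
  for (const pair of query.replace(/^\?/, "").split("&")) {
    const [key, value] = pair.split("=");
    // Explicit form: ?detail=<level>
    if (key === "detail" && value !== undefined && isLevel(value)) return value;
    // Bare-key form: ?summary, ?structure, ... (first one wins)
    if (value === undefined && isLevel(key) && bare === undefined) bare = key;
  }
  return bare ?? "full";
}
```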

Detail levels:

| Level | Shorthand | Returns |
|-------|-----------|---------|
| summary | ?summary | YAML front-matter with repo name, ref, file count, total size |
| structure | ?structure | Summary + ASCII directory tree |
| file-list | ?file-list | Structure + table of every included file with byte size and line count |
| full | ?full | Summary + structure + complete file contents in fenced code blocks. Streamed. |
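
To illustrate the "fenced code blocks" part of the full level, here is a minimal sketch of formatting one file. The function `formatFile` and the `## path` heading are assumptions, not GitPrism's actual output format. The fence is made one backtick longer than the longest backtick run in the file, so a file that itself contains fences cannot close the block early.

```typescript
// Hypothetical formatter for a single file at the "full" detail level.
function formatFile(path: string, content: string): string {
  // Find the longest run of backticks inside the file content.
  const runs = content.match(/`+/g) ?? [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  const fence = "`".repeat(Math.max(3, longest + 1));

  // Use the file extension (if any) as the fence info string.
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  const lang = dot > 0 ? base.slice(dot + 1) : "";

  const body = content.endsWith("\n") ? content : content + "\n";
  return `## ${path}\n\n${fence}${lang}\n${body}${fence}\n`;
}
```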

Response headers:

| Header | Description |
|--------|-------------|
| Content-Type | text/markdown; charset=utf-8 |
| X-Repo | owner/repo |
| X-Ref | Original ref requested (branch, tag, or SHA) |
| X-Commit-Sha | Resolved commit SHA used for cache key |
| X-File-Count | Number of files included |
| X-Total-Size | Total size of included files in bytes |
| X-Truncated | true if output was truncated |
| X-RateLimit-Remaining | GitHub API rate limit remaining |
| X-RateLimit-Reset | GitHub API rate limit reset timestamp |
| X-Cache | HIT or MISS |
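
A client might turn these headers into typed metadata along the following lines. The helper `readIngestMeta` and the `IngestMeta` shape are hypothetical; the `get` parameter mirrors `Headers.get()`, so the function can be called as `readIngestMeta((name) => response.headers.get(name))`.

```typescript
interface IngestMeta {
  repo: string | null;
  commitSha: string | null;
  fileCount: number | null;
  totalBytes: number | null;
  truncated: boolean;
  cacheHit: boolean;
}

// Hypothetical client-side helper for the response headers listed above.
function readIngestMeta(get: (name: string) => string | null): IngestMeta {
  const toInt = (value: string | null): number | null => {
    if (value === null) return null;
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? null : parsed;
  };
  return {
    repo: get("X-Repo"),
    commitSha: get("X-Commit-Sha"),
    fileCount: toInt(get("X-File-Count")),
    totalBytes: toInt(get("X-Total-Size")),
    truncated: get("X-Truncated") === "true",
    cacheHit: get("X-Cache") === "HIT",
  };
}
```

Checking `truncated` before passing the Markdown to a model is worthwhile, since a truncated response omits part of the repository.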

Error responses (JSON):

| Status | Condition |
|--------|-----------|
| 400 | Malformed input |
| 404 | Repository not found or private |
| 413 | Archive exceeds 50 MB limit |
| 429 | Rate limited (30 req/min per IP) |
| 502 | GitHub API error |
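
A caller could map these statuses to a retry policy roughly as follows. The policy itself is an assumption for illustration (GitPrism does not prescribe one), and `actionForStatus` is a hypothetical name.

```typescript
type ErrorAction = "fix-request" | "not-retryable" | "retry-later";

// Hypothetical client policy for the documented error statuses.
function actionForStatus(status: number): ErrorAction | null {
  switch (status) {
    case 400:
      return "fix-request"; // malformed input: resending unchanged will not help
    case 404: // repository missing or private
    case 413: // the whole archive is over the size limit
      return "not-retryable";
    case 429: // per-IP rate limit
    case 502: // upstream GitHub error
      return "retry-later";
    default:
      return null; // not one of the errors listed above
  }
}
```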

MCP Tool

Connect any MCP-compatible client to https://gitprism.cloudemo.org/mcp.

Available tool: ingest_repo

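| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| url | Yes | - | GitHub URL or owner/repo shorthand |
| detail | No | full | summary, structure, file-list, or full |

Example arguments:

{
  "url": "https://github.com/owner/repo",
  "detail": "summary"
}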
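
On the wire, an MCP client wraps those arguments in a JSON-RPC 2.0 `tools/call` request. The sketch below only builds the request body; `buildToolCall` is a hypothetical helper, and a real client (for example one built on an MCP SDK) would also perform the protocol's initialization handshake before calling tools.

```typescript
interface IngestRepoArgs {
  url: string;
  detail?: "summary" | "structure" | "file-list" | "full";
}

// Hypothetical helper: builds the JSON-RPC 2.0 body for an MCP tools/call.
function buildToolCall(id: number, args: IngestRepoArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "ingest_repo", arguments: args },
  };
}
```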

The tool is fully compatible with Code Mode agents — the strongly-typed Zod input schema and descriptive annotations allow client-side createCodeTool() to wrap it automatically.

Deployment

Option A — Workers Builds (recommended)

Workers Builds connects your GitHub repo to Cloudflare and deploys automatically on every push to main. The Astro UI is compiled during the build step; ui/dist/ is intentionally not committed to git.

Steps:

  1. Go to the Cloudflare dashboard → Workers & Pages → Create → Import a Git repository

  2. Connect your GitHub account and select this repo

  3. Configure Build settings:

    | Setting | Value |
    |---------|-------|
    | Branch | main |
    | Build command | npm install && npm run build |
    | Deploy command | npx wrangler deploy (default) |

  4. Click Save and Deploy — the first build will run immediately

  5. Once deployed, go to your Worker → Settings → Variables and Secrets → Add a secret:

    | Name | Value |
    |------|-------|
    | GITHUB_TOKEN | Fine-grained PAT with public repo read-only scope |

    Without this secret the Worker still functions, but GitHub API rate limits drop from 5,000 to 60 requests/hour (shared across all requests from the Worker's outbound IP).

  6. Optional (custom domain): Worker → Settings → Custom Domains → add your domain. This enables the Workers Cache API. Without a custom domain the Worker deploys to <name>.<subdomain>.workers.dev and caching is silently a no-op (the code handles this gracefully). To enable routing once you have a domain, uncomment and update the routes block in wrangler.jsonc:

    "routes": [
      { "pattern": "yourdomain.com/*", "custom_domain": true }
    ],
    

Option B — Manual deploy (Wrangler CLI)

git clone https://github.com/cougz/gitprism.git
cd gitprism
npm install
npm run build          # builds ui/dist/
npx wrangler secret put GITHUB_TOKEN
npx wrangler deploy

Environment Variables

Configured in wrangler.jsonc under vars. Override in the Cloudflare dashboard under Worker → Settings → Variables and Secrets if needed:

| Variable | Default | Description |
|----------|---------|-------------|
| MAX_ZIP_BYTES | 52428800 (50 MB) | Maximum zip archive size before rejecting with 413 |
| MAX_OUTPUT_BYTES | 10485760 (10 MB) | Maximum output size before truncation |
| MAX_FILE_COUNT | 5000 | Maximum file count before truncation |
| CACHE_TTL_SECONDS | 86400 (24 hours) | Cache TTL for SHA-based cache keys |
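
Since Worker vars arrive as strings, the limits need to be parsed with a fallback. A minimal sketch, assuming a hypothetical `readLimits` helper and the defaults from the table above (invalid or non-positive values fall back to the default, which is an assumption about the desired behaviour):

```typescript
interface Limits {
  maxZipBytes: number;
  maxOutputBytes: number;
  maxFileCount: number;
  cacheTtlSeconds: number;
}

// Hypothetical helper: reads the documented variables from the Worker env.
function readLimits(env: Record<string, string | undefined>): Limits {
  const num = (name: string, fallback: number): number => {
    const parsed = Number(env[name]);
    // Missing, non-numeric, zero, or negative values use the default.
    return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    maxZipBytes: num("MAX_ZIP_BYTES", 52428800),     // 50 MB
    maxOutputBytes: num("MAX_OUTPUT_BYTES", 10485760), // 10 MB
    maxFileCount: num("MAX_FILE_COUNT", 5000),
    cacheTtlSeconds: num("CACHE_TTL_SECONDS", 86400),  // 24 hours
  };
}
```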

Secrets

| Secret | How to set | Purpose |
|--------|------------|---------|
| GITHUB_TOKEN | Dashboard → Secrets, or npx wrangler secret put GITHUB_TOKEN | Fine-grained PAT, public repo read-only. Raises GitHub rate limit from 60 to 5,000 req/hr. |
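
Because the Worker still functions without the secret, the token has to be optional when outbound request headers are built. A minimal sketch, with `githubHeaders` and the `gitprism-example` User-Agent value as illustrative assumptions:

```typescript
// Hypothetical helper: headers for a GitHub API request, token optional.
function githubHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "application/vnd.github+json",
    // GitHub rejects API requests that carry no User-Agent header.
    "User-Agent": "gitprism-example",
  };
  // Without a token the request is unauthenticated and subject to the
  // lower 60 requests/hour limit.
  if (token) headers.Authorization = `Bearer ${token}`;
  return headers;
}
```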

Why a build step is required

ui/dist/ (the compiled Astro frontend) is excluded from git. Wrangler reads assets.directory = "./ui/dist" from wrangler.jsonc and uploads those files as static assets during deploy. If that directory does not exist at deploy time, the Worker deploys with no UI. The npm run build step compiles the Astro source in ui/src/ into ui/dist/ before Wrangler runs.

Development

# Build the Astro UI (required before deploying or running wrangler dev)
npm run build

# Run tests (169 tests)
npm test

# Watch mode
npm run test:watch

# Type-check
npm run typecheck

# Local dev server (requires ui/dist/ to exist — run npm run build first)
npm run dev

Architecture

Project Structure

gitprism/
├── src/
│   ├── index.ts              # Worker entry point, routing
│   ├── types.ts              # Shared interfaces and error classes
│   ├── engine/
│   │   ├── parser.ts         # URL parsing and validation
│   │   ├── fetcher.ts        # GitHub zipball download + size check
│   │   ├── decompressor.ts   # fflate decompression + processing
│   │   ├── filter.ts         # Ignore lists, .gitignore, binary detection
│   │   ├── formatter.ts      # Markdown output generators (4 levels)
│   │   └── ingest.ts         # Shared pipeline (used by API + MCP)
│   ├── mcp/
│   │   └── server.ts         # createMcpHandler setup
│   ├── api/
│   │   ├── handler.ts        # REST API handler, streaming, caching
│   │   └── llmstxt.ts        # /llms.txt endpoint
│   └── utils/
│       ├── cache.ts          # Workers Cache API helpers
│       ├── ratelimit.ts      # Rate limiting helper
│       └── headers.ts        # Response header builder
├── test/                     # Vitest test files (169 tests)
├── ui/
│   ├── src/                  # Astro source
│   ├── dist/                 # Build output (gitignored)
│   └── astro.config.mjs
├── PLAN.md                   # Detailed implementation plan
└── wrangler.jsonc

Key Decisions

| Decision | Rationale |
|----------|-----------|
| Single Worker (no Pages) | Workers Static Assets is the recommended approach. No CORS, simpler deployment. |
| createMcpHandler() (no Durable Objects) | Tool is stateless. No per-session state needed. |
| fflate over jszip | Streaming decompression, smaller bundle, lower peak memory in V8 isolates. |
| Server-side GITHUB_TOKEN | Raises rate limit from 60 to 5,000 req/hr without user auth. |
| Pre-flight size check | Prevents OOM crashes from large repos. |
| Cache API from day one | Identical repo+ref+detail produces identical output. Caching cuts latency and GitHub API usage. |
| Streaming TransformStream for full | Reduces peak memory, improves time-to-first-byte. |
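
The pre-flight size check could look something like the sketch below. How GitPrism actually learns the archive size is not stated here, so this assumes a declared size taken from a `Content-Length` header; that header may be absent, in which case the caller would need to count bytes while reading instead. `checkDeclaredSize` and `ArchiveTooLargeError` are hypothetical names.

```typescript
class ArchiveTooLargeError extends Error {
  readonly status = 413;
}

// Hypothetical guard, run before any decompression starts.
// Returns "unknown" when no usable size was declared.
function checkDeclaredSize(
  contentLength: string | null,
  maxBytes: number
): "ok" | "unknown" {
  if (contentLength === null || contentLength.trim() === "") return "unknown";
  const size = Number(contentLength);
  if (!Number.isFinite(size)) return "unknown";
  if (size > maxBytes) {
    throw new ArchiveTooLargeError(
      `archive is ${size} bytes, limit is ${maxBytes}`
    );
  }
  return "ok";
}
```

Rejecting early matters because decompressing an oversized archive inside an isolate is what risks exhausting memory.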

File Filtering

Hardcoded Ignore List

The following are always excluded regardless of .gitignore:

Directories: node_modules/, vendor/, .git/, __pycache__/, .venv/, venv/, dist/, build/, .next/, .nuxt/, .svelte-kit/, .output/, .cache/, .parcel-cache/, coverage/, .tox/, .mypy_cache/

Files: package-lock.json, yarn.lock, pnpm-lock.yaml, bun.lockb, Cargo.lock, composer.lock, Gemfile.lock, go.sum, poetry.lock, *.min.js, *.min.css, *.map, *.wasm, *.pb.go, *.pyc, *.pyo

Binary extensions: .png, .jpg, .jpeg, .gif, .ico, .webp, .bmp, .tiff, .svg, .woff, .woff2, .ttf, .eot, .otf, .pdf, .zip, .tar, .gz, .bz2, .7z, .rar, .exe, .dll, .so, .dylib, .bin, .o, .a, .mp3, .mp4, .avi, .mov, .mkv, .flac, .wav, .ogg, .sqlite, .db, .DS_Store
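
A check against these lists might be sketched as follows. Only a small subset of the entries is included for brevity, `isHardIgnored` is a hypothetical name, and matching directory names at any depth is an assumption.

```typescript
// Small illustrative subsets of the lists above.
const IGNORED_DIRS = new Set(["node_modules", "vendor", ".git", "dist", "build"]);
const IGNORED_FILES = new Set(["package-lock.json", "yarn.lock", "Cargo.lock", "go.sum"]);
const IGNORED_SUFFIXES = [".min.js", ".min.css", ".map", ".wasm", ".png", ".jpg", ".pdf", ".zip"];

// Hypothetical check for a repo-relative path such as "src/app/main.ts".
function isHardIgnored(path: string): boolean {
  const segments = path.split("/");
  const file = segments[segments.length - 1];
  const dirs = segments.slice(0, -1);
  // Assumption: an ignored directory name excludes the file at any depth.
  if (dirs.some((dir) => IGNORED_DIRS.has(dir))) return true;
  if (IGNORED_FILES.has(file)) return true;
  const lower = file.toLowerCase();
  return IGNORED_SUFFIXES.some((suffix) => lower.endsWith(suffix));
}
```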

Binary content detection: Files containing null bytes in their first 8 KB are skipped regardless of extension.
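
The null-byte heuristic is small enough to show in full. This is a sketch of the technique as described, not the project's own code; `looksBinary` and `SNIFF_BYTES` are hypothetical names.

```typescript
const SNIFF_BYTES = 8 * 1024; // only the first 8 KB are inspected

// Returns true if a null byte appears within the first 8 KB.
function looksBinary(bytes: Uint8Array): boolean {
  const limit = Math.min(bytes.length, SNIFF_BYTES);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}
```

Limiting the scan to a fixed prefix keeps the cost constant per file, at the price of missing a null byte that first appears later in the file.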

.gitignore Support

The root .gitignore of the repository is parsed and applied. Supports:

  • Wildcard patterns (*.log, **/*.tmp)
  • Directory patterns with trailing slash (logs/)
  • Rooted patterns (/build)
  • Negation patterns (!important.log)
  • Comments (# this line is ignored)

Limitation: Only the root .gitignore is evaluated. Nested .gitignore files (e.g., src/.gitignore) are not supported in v1.
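
The supported pattern types can be illustrated with a simplified matcher that compiles each line to a regular expression, with the last matching rule winning. This is a sketch under stated assumptions, not GitPrism's filter: `compileGitignore` and `isGitignored` are hypothetical names, character classes such as `[abc]` and escaped `\#` or `\!` are not handled, and Git's rule that a file cannot be re-included once its parent directory is excluded is ignored.

```typescript
interface IgnoreRule {
  regex: RegExp;
  negated: boolean;
}

// Hypothetical, simplified compiler for a root .gitignore file.
function compileGitignore(text: string): IgnoreRule[] {
  const rules: IgnoreRule[] = [];
  for (const raw of text.split(/\r?\n/)) {
    let line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank or comment
    const negated = line.startsWith("!");
    if (negated) line = line.slice(1);
    const dirOnly = line.endsWith("/");
    if (dirOnly) line = line.slice(0, -1);
    // A slash at the start or in the middle anchors the pattern to the root.
    const anchored = line.includes("/");
    if (line.startsWith("/")) line = line.slice(1);
    const body = line
      .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
      .replace(/\*\*\//g, "\u0000")         // placeholder for "**/"
      .replace(/\*\*/g, "\u0001")           // placeholder for a bare "**"
      .replace(/\*/g, "[^/]*")              // "*" stays within one segment
      .replace(/\?/g, "[^/]")
      .replace(/\u0000/g, "(?:.*/)?")
      .replace(/\u0001/g, ".*");
    const prefix = anchored ? "^" : "^(?:.*/)?";
    const suffix = dirOnly ? "/.*$" : "(?:/.*)?$";
    rules.push({ regex: new RegExp(prefix + body + suffix), negated });
  }
  return rules;
}

// Later rules override earlier ones, which is how negation takes effect.
function isGitignored(path: string, rules: IgnoreRule[]): boolean {
  let ignored = false;
  for (const rule of rules) {
    if (rule.regex.test(path)) ignored = !rule.negated;
  }
  return ignored;
}
```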

Code Mode Compatibility

The ingest_repo MCP tool is compatible with Code Mode agents by design:

  • Clear, descriptive tool name (ingest_repo)
  • Multi-sentence description explaining all four detail levels
  • Strongly-typed Zod schemas with .describe() on every parameter
  • No server-side changes needed — standard MCP tools with typed schemas are inherently Code Mode compatible

Limits

| Limit | Value | Configurable |
|-------|-------|--------------|
| Max zip archive size | 50 MB | MAX_ZIP_BYTES env var |
| Max output size | 10 MB | MAX_OUTPUT_BYTES env var |
| Max file count | 5,000 | MAX_FILE_COUNT env var |
| Rate limit | 30 req/min per IP | wrangler.jsonc ratelimits binding |
| Cache TTL | 24 hours | CACHE_TTL_SECONDS env var |
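
How the file-count and output-size limits lead to truncation can be sketched as below. The details are assumptions: files are taken in order until either limit would be exceeded, and raw file sizes stand in for the rendered output size, which in practice also includes Markdown overhead. `applyLimits` and `RepoFile` are hypothetical names.

```typescript
interface RepoFile {
  path: string;
  size: number; // bytes
}

// Hypothetical selection step that produces the X-Truncated flag.
function applyLimits(files: RepoFile[], maxFiles: number, maxBytes: number) {
  const included: RepoFile[] = [];
  let totalBytes = 0;
  for (const file of files) {
    if (included.length >= maxFiles || totalBytes + file.size > maxBytes) {
      // Stop at the first file that would break a limit.
      return { included, totalBytes, truncated: true };
    }
    included.push(file);
    totalBytes += file.size;
  }
  return { included, totalBytes, truncated: false };
}
```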

Caching behavior:

Cache keys use resolved commit SHAs for automatic invalidation when repos update. Old cache entries expire naturally after TTL. If SHA resolution fails, caching is skipped and fresh data is always fetched.
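
A SHA-based cache key might be built as in the sketch below. The Workers Cache API is keyed by a request or URL, so the key is expressed as a URL; the `cache.example` hostname, the parameter names, and the function `buildCacheKey` are placeholders, not GitPrism's actual scheme.

```typescript
// Hypothetical cache-key builder. Returns null when caching must be skipped.
function buildCacheKey(
  repo: string,
  sha: string | null,
  path: string,
  detail: string
): string | null {
  // SHA resolution failed: skip the cache and always fetch fresh data.
  if (sha === null) return null;
  const parts = [
    `repo=${encodeURIComponent(repo)}`,
    `sha=${encodeURIComponent(sha)}`,
    `path=${encodeURIComponent(path)}`,
    `detail=${encodeURIComponent(detail)}`,
  ];
  return `https://cache.example/ingest?${parts.join("&")}`;
}
```

Because the key contains the commit SHA rather than the branch name, a new push changes the key and the stale entry is simply never read again until its TTL expires.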

License

MIT
