GitPrism

A fast, token-efficient, stateless pipeline that converts public GitHub repositories into LLM-ready Markdown. Deployed as a single Cloudflare Worker serving humans, AI agents, and MCP clients from one shared core engine.
                 ┌─────────────────────────────────────────────┐
                 │           Single Cloudflare Worker          │
                 │                 (gitprism)                  │
                 │                                             │
Humans ────────► │ /             → Astro Static UI             │
                 │                 (Workers Static Assets)     │
                 │                                             │
AI Agents ─────► │ /ingest?...   → REST API                    │
                 │ /<github-url> → URL Proxy (shorthand)       │
                 │                                             │
MCP Clients ───► │ /mcp          → Stateless MCP Server        │
                 │                 (createMcpHandler)          │
                 │                                             │
                 │        ┌───────────────────┐                │
                 │        │    Core Engine    │                │
                 │        │  URL Parser       │                │
                 │        │  Zipball Fetch    │                │
                 │        │  fflate Decomp    │                │
                 │        │  Filter/Ignore    │                │
                 │        │  MD Formatter     │                │
                 │        └───────────────────┘                │
                 └─────────────────────────────────────────────┘
                                        │
                                        ▼
                               GitHub Zipball API
                           (authenticated via secret)
Usage
Web UI
Visit https://gitprism.cloudemo.org/ and paste any GitHub URL.
REST API
Canonical form (recommended for programmatic use):
GET /ingest?repo=owner/repo&ref=main&path=src&detail=full
URL-appended shorthand (human-friendly):
GET /https://github.com/owner/repo/tree/main/src
The ref (branch, tag, or commit SHA) and subdirectory are automatically extracted from the GitHub URL. Append a detail shorthand to control output:
GET /https://github.com/owner/repo?summary
GET /https://github.com/owner/repo/tree/main/src?file-list
Parameters (canonical form):
| Parameter | Required | Default | Description |
|---|---|---|---|
| repo | Yes (canonical) | — | owner/repo, e.g. cloudflare/workers-sdk |
| ref | No | default branch | Branch, tag, or commit SHA |
| path | No | — | Subdirectory to scope results to |
| detail | No | full | Output level: summary, structure, file-list, or full |
| no-cache | No | false | Set to true to bypass response cache |
Detail level shorthand — instead of ?detail=<level>, append the level as a bare key. Works on both the canonical and URL-proxy forms:
/ingest?repo=owner/repo&summary
/https://github.com/owner/repo?structure
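For programmatic callers, the canonical form can be assembled with a small client-side helper. This is a hypothetical sketch, not part of GitPrism; the base URL and option names simply mirror the parameters documented above:

```typescript
// Sketch of a client-side URL builder for the canonical /ingest form.
// Hypothetical helper — only the parameter names come from the docs above.
type Detail = "summary" | "structure" | "file-list" | "full";

interface IngestOptions {
  repo: string;       // owner/repo, required
  ref?: string;       // branch, tag, or commit SHA
  path?: string;      // subdirectory to scope results to
  detail?: Detail;    // server defaults to "full"
  noCache?: boolean;  // set true to bypass the response cache
}

function buildIngestUrl(opts: IngestOptions, base = "https://gitprism.cloudemo.org"): string {
  const params: string[] = [`repo=${opts.repo}`];
  if (opts.ref) params.push(`ref=${encodeURIComponent(opts.ref)}`);
  if (opts.path) params.push(`path=${encodeURIComponent(opts.path)}`);
  if (opts.detail) params.push(`detail=${opts.detail}`);
  if (opts.noCache) params.push("no-cache=true");
  return `${base}/ingest?${params.join("&")}`;
}

// buildIngestUrl({ repo: "owner/repo", ref: "main", path: "src", detail: "full" })
// → "https://gitprism.cloudemo.org/ingest?repo=owner/repo&ref=main&path=src&detail=full"
```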
Detail levels:
| Level | Shorthand | Returns |
|---|---|---|
| summary | ?summary | YAML front-matter with repo name, ref, file count, total size |
| structure | ?structure | Summary + ASCII directory tree |
| file-list | ?file-list | Structure + table of every included file with byte size and line count |
| full | ?full | Summary + structure + complete file contents in fenced code blocks. Streamed. |
Response headers:
| Header | Description |
|---|---|
| Content-Type | text/markdown; charset=utf-8 |
| X-Repo | owner/repo |
| X-Ref | Original ref requested (branch, tag, or SHA) |
| X-Commit-Sha | Resolved commit SHA used for cache key |
| X-File-Count | Number of files included |
| X-Total-Size | Total size of included files in bytes |
| X-Truncated | true if output was truncated |
| X-RateLimit-Remaining | GitHub API rate limit remaining |
| X-RateLimit-Reset | GitHub API rate limit reset timestamp |
| X-Cache | HIT or MISS |
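A caller can collect these diagnostic headers from a fetch() Response into a small metadata object. The helper below is a hypothetical sketch (the IngestMeta shape is this example's own invention; only the header names come from the table above):

```typescript
// Sketch: read GitPrism's diagnostic headers from a Response's Headers.
// Hypothetical helper — header names from the docs, object shape invented here.
interface IngestMeta {
  repo: string | null;
  commitSha: string | null;
  fileCount: number;
  truncated: boolean;
  cache: "HIT" | "MISS" | null;
}

function readIngestMeta(headers: Headers): IngestMeta {
  return {
    repo: headers.get("X-Repo"),
    commitSha: headers.get("X-Commit-Sha"),
    fileCount: Number(headers.get("X-File-Count") ?? 0),
    truncated: headers.get("X-Truncated") === "true",
    cache: headers.get("X-Cache") as "HIT" | "MISS" | null,
  };
}
```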
Error responses (JSON):
| Status | Condition |
|---|---|
| 400 | Malformed input |
| 404 | Repository not found or private |
| 413 | Archive exceeds 50 MB limit |
| 429 | Rate limited (30 req/min per IP) |
| 502 | GitHub API error |
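Because errors come back as JSON while successful responses are Markdown, callers can branch on the status code before reading the body. A minimal sketch, with message strings paraphrasing the table above:

```typescript
// Sketch: map GitPrism's documented error statuses to human-readable messages.
// Illustrative only; the actual JSON error bodies may carry more detail.
function describeIngestError(status: number): string {
  switch (status) {
    case 400: return "Malformed input";
    case 404: return "Repository not found or private";
    case 413: return "Archive exceeds 50 MB limit";
    case 429: return "Rate limited (30 req/min per IP)";
    case 502: return "GitHub API error";
    default:  return `Unexpected status ${status}`;
  }
}
```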
MCP Tool
Connect any MCP-compatible client to https://gitprism.cloudemo.org/mcp.
Available tool: ingest_repo
| Argument | Required | Default | Description |
|---|---|---|---|
| url | Yes | — | GitHub URL or owner/repo shorthand |
| detail | No | full | summary, structure, file-list, or full |
Example arguments:

{
"url": "https://github.com/owner/repo",
"detail": "summary"
}
The tool is fully compatible with Code Mode agents — the strongly-typed Zod input schema and descriptive annotations allow client-side createCodeTool() to wrap it automatically.
Deployment
Option A — Workers Builds (recommended)
Workers Builds connects your GitHub repo to Cloudflare and deploys automatically on every push to main. The Astro UI is compiled during the build step; ui/dist/ is intentionally not committed to git.
Steps:

1. Go to the Cloudflare dashboard → Workers & Pages → Create → Import a Git repository.
2. Connect your GitHub account and select this repo.
3. Configure Build settings:

   | Setting | Value |
   |---|---|
   | Branch | main |
   | Build command | npm install && npm run build |
   | Deploy command | npx wrangler deploy (default) |

4. Click Save and Deploy — the first build will run immediately.
5. Once deployed, go to your Worker → Settings → Variables and Secrets → Add a secret:

   | Name | Value |
   |---|---|
   | GITHUB_TOKEN | Fine-grained PAT with public repo read-only scope |

   Without this secret the Worker still functions, but GitHub API rate limits drop from 5,000 to 60 requests/hour (shared across all requests from the Worker's outbound IP).

6. Optional — Custom domain: Worker → Settings → Custom Domains → add your domain. This enables the Workers Cache API. Without a custom domain the Worker deploys to <name>.<subdomain>.workers.dev and caching silently no-ops (the code handles this gracefully). To enable routing once you have a domain, uncomment and update the routes block in wrangler.jsonc:

   "routes": [
     { "pattern": "yourdomain.com/*", "custom_domain": true }
   ],
Option B — Manual deploy (Wrangler CLI)
git clone https://github.com/cougz/gitprism.git
cd gitprism
npm install
npm run build # builds ui/dist/
npx wrangler secret put GITHUB_TOKEN
npx wrangler deploy
Environment Variables
Configured in wrangler.jsonc under vars. Override in the Cloudflare dashboard under Worker → Settings → Variables and Secrets if needed:
| Variable | Default | Description |
|---|---|---|
| MAX_ZIP_BYTES | 52428800 (50 MB) | Maximum zip archive size before rejecting with 413 |
| MAX_OUTPUT_BYTES | 10485760 (10 MB) | Maximum output size before truncation |
| MAX_FILE_COUNT | 5000 | Maximum file count before truncation |
| CACHE_TTL_SECONDS | 86400 (24 hours) | Cache TTL for SHA-based cache keys |
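A corresponding vars block in wrangler.jsonc might look like the following sketch (values shown are the defaults from the table above; check the repo's actual wrangler.jsonc for the authoritative shape):

```jsonc
{
  // Illustrative fragment — defaults taken from the table above.
  "vars": {
    "MAX_ZIP_BYTES": "52428800",
    "MAX_OUTPUT_BYTES": "10485760",
    "MAX_FILE_COUNT": "5000",
    "CACHE_TTL_SECONDS": "86400"
  }
}
```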
Secrets
| Secret | How to set | Purpose |
|---|---|---|
| GITHUB_TOKEN | Dashboard → Secrets, or npx wrangler secret put GITHUB_TOKEN | Fine-grained PAT, public repo read-only. Raises GitHub rate limit from 60 to 5,000 req/hr. |
Why a build step is required
ui/dist/ (the compiled Astro frontend) is excluded from git. Wrangler reads assets.directory = "./ui/dist" from wrangler.jsonc and uploads those files as static assets during deploy. If that directory does not exist at deploy time, the Worker deploys with no UI. The npm run build step compiles the Astro source in ui/src/ into ui/dist/ before Wrangler runs.
Development
# Build the Astro UI (required before deploying or running wrangler dev)
npm run build
# Run tests (169 tests)
npm test
# Watch mode
npm run test:watch
# Type-check
npm run typecheck
# Local dev server (requires ui/dist/ to exist — run npm run build first)
npm run dev
Architecture
Project Structure
gitprism/
├── src/
│ ├── index.ts # Worker entry point, routing
│ ├── types.ts # Shared interfaces and error classes
│ ├── engine/
│ │ ├── parser.ts # URL parsing and validation
│ │ ├── fetcher.ts # GitHub zipball download + size check
│ │ ├── decompressor.ts # fflate decompression + processing
│ │ ├── filter.ts # Ignore lists, .gitignore, binary detection
│ │ ├── formatter.ts # Markdown output generators (4 levels)
│ │ └── ingest.ts # Shared pipeline (used by API + MCP)
│ ├── mcp/
│ │ └── server.ts # createMcpHandler setup
│ ├── api/
│ │ ├── handler.ts # REST API handler, streaming, caching
│ │ └── llmstxt.ts # /llms.txt endpoint
│ └── utils/
│ ├── cache.ts # Workers Cache API helpers
│ ├── ratelimit.ts # Rate limiting helper
│ └── headers.ts # Response header builder
├── test/ # Vitest test files (169 tests)
├── ui/
│ ├── src/ # Astro source
│ ├── dist/ # Build output (gitignored)
│ └── astro.config.mjs
├── PLAN.md # Detailed implementation plan
└── wrangler.jsonc
Key Decisions
| Decision | Rationale |
|---|---|
| Single Worker (no Pages) | Workers Static Assets is the recommended approach. No CORS, simpler deployment. |
| createMcpHandler() (no Durable Objects) | Tool is stateless. No per-session state needed. |
| fflate over jszip | Streaming decompression, smaller bundle, lower peak memory in V8 isolates. |
| Server-side GITHUB_TOKEN | Raises rate limit from 60 to 5,000 req/hr without user auth. |
| Pre-flight size check | Prevents OOM crashes from large repos. |
| Cache API from day one | Identical repo+ref+detail produces identical output. Caching cuts latency and GitHub API usage. |
| Streaming TransformStream for full | Reduces peak memory, improves time-to-first-byte. |
File Filtering
Hardcoded Ignore List
The following are always excluded regardless of .gitignore:
Directories: node_modules/, vendor/, .git/, __pycache__/, .venv/, venv/, dist/, build/, .next/, .nuxt/, .svelte-kit/, .output/, .cache/, .parcel-cache/, coverage/, .tox/, .mypy_cache/
Files: package-lock.json, yarn.lock, pnpm-lock.yaml, bun.lockb, Cargo.lock, composer.lock, Gemfile.lock, go.sum, poetry.lock, *.min.js, *.min.css, *.map, *.wasm, *.pb.go, *.pyc, *.pyo
Binary extensions: .png, .jpg, .jpeg, .gif, .ico, .webp, .bmp, .tiff, .svg, .woff, .woff2, .ttf, .eot, .otf, .pdf, .zip, .tar, .gz, .bz2, .7z, .rar, .exe, .dll, .so, .dylib, .bin, .o, .a, .mp3, .mp4, .avi, .mov, .mkv, .flac, .wav, .ogg, .sqlite, .db, .DS_Store
Binary content detection: Files containing null bytes in their first 8 KB are skipped regardless of extension.
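The null-byte heuristic can be sketched as follows. This is a simplified stand-in for illustration, not the actual filter.ts implementation:

```typescript
// Sketch of null-byte binary detection: scan at most the first 8 KB of a
// file's bytes; any 0x00 byte marks the file as binary and it is skipped.
// Simplified illustration — not the real filter.ts code.
const SNIFF_BYTES = 8 * 1024;

function looksBinary(bytes: Uint8Array): boolean {
  const limit = Math.min(bytes.length, SNIFF_BYTES);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}
```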
.gitignore Support
The root .gitignore of the repository is parsed and applied. Supports:
- Wildcard patterns (*.log, **/*.tmp)
- Directory patterns with trailing slash (logs/)
- Rooted patterns (/build)
- Negation patterns (!important.log)
- Comments (# this line is ignored)
Limitation: Only the root .gitignore is evaluated. Nested .gitignore files (e.g., src/.gitignore) are not supported in v1.
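The wildcard matching above can be approximated with a small pattern compiler. The sketch below handles only the * and ** cases as a simplified illustration of the rules listed above; it is not the real filter.ts and omits directory, rooted, and negation patterns:

```typescript
// Sketch: compile a single .gitignore-style wildcard pattern to a RegExp.
// "*" matches within one path segment, "**" crosses directories.
// Simplified illustration — not the actual filter.ts matcher.
function globToRegExp(pattern: string): RegExp {
  let re = "";
  for (let i = 0; i < pattern.length; i++) {
    const c = pattern[i];
    if (c === "*") {
      if (pattern[i + 1] === "*") { re += ".*"; i++; }  // ** crosses directories
      else re += "[^/]*";                               // * stays within one segment
    } else if ("\\^$.|?+()[]{}".includes(c)) {
      re += "\\" + c;                                   // escape regex metacharacters
    } else {
      re += c;
    }
  }
  // Anchor so the pattern matches a full path segment at the end of the path.
  return new RegExp(`(^|/)${re}$`);
}
```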
Code Mode Compatibility
The ingest_repo MCP tool is compatible with Code Mode agents by design:
- Clear, descriptive tool name (ingest_repo)
- Multi-sentence description explaining all four detail levels
- Strongly-typed Zod schemas with .describe() on every parameter
- No server-side changes needed — standard MCP tools with typed schemas are inherently Code Mode compatible
Limits
| Limit | Value | Configurable |
|---|---|---|
| Max zip archive size | 50 MB | MAX_ZIP_BYTES env var |
| Max output size | 10 MB | MAX_OUTPUT_BYTES env var |
| Max file count | 5,000 | MAX_FILE_COUNT env var |
| Rate limit | 30 req/min per IP | wrangler.jsonc ratelimits binding |
| Cache TTL | 24 hours | CACHE_TTL_SECONDS env var |
Caching behavior:
Cache keys use resolved commit SHAs for automatic invalidation when repos update. Old cache entries expire naturally after TTL. If SHA resolution fails, caching is skipped and fresh data is always fetched.
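A SHA-based cache key might be constructed along these lines. This is purely illustrative — the key format (including the internal host name) is invented for this sketch; the real construction lives in src/utils/cache.ts:

```typescript
// Sketch: derive a cache key from the resolved commit SHA plus the request's
// detail level and optional subdirectory. Hypothetical format — the internal
// host and layout are this example's own, not GitPrism's actual key scheme.
function cacheKey(repo: string, sha: string, detail: string, path = ""): string {
  // Keying on the immutable SHA (not the branch name) means a new push to the
  // branch naturally produces a new key, so stale entries are never served.
  return `https://cache.gitprism.internal/${repo}@${sha}/${detail}${path ? `/${path}` : ""}`;
}
```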
License
MIT