PromptThin
The invisible savings layer for AI agents. Save up to 70% on tokens with zero code changes.
PromptThin is a transparent proxy that sits between your AI agents and LLM providers. Two environment variables and you're done — every API call gets four compounding savings routes applied automatically.
```
Your app ──→ PromptThin ──→ OpenAI / Anthropic / Gemini / Groq
```
How much can I save?
Savings depend on your workload. The four routes compound — each one reduces what the next has to work with.
- High cache hit rate (repeated or similar queries): up to 90%+ reduction
- Long context agents (multi-turn, large prompts): 40–60% reduction from pruning + compression
- Mixed workloads (some unique, some repeated): typically 20–40% reduction
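To see why the routes compound rather than simply add, note that each stage only processes the tokens the previous stages let through. A quick sketch of the arithmetic, using hypothetical per-route reductions (not measured PromptThin figures):

```python
def compounded_reduction(stage_reductions):
    """Combine token reductions that apply sequentially.

    Each stage only sees the tokens the previous stages let through,
    so the surviving fraction is the product of (1 - r) per stage.
    """
    remaining = 1.0
    for r in stage_reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining

# Hypothetical per-route reductions: 30% from cache hits, 40% from
# compression, 20% from pruning. Individually modest, together ~66%.
total = compounded_reduction([0.30, 0.40, 0.20])
print(f"{total:.0%}")  # 66%
```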
Check your actual savings anytime from the dashboard or API:
```shell
curl https://promptthin.tech/usage/summary -H "X-API-Key: ts_your_key"
```
Four savings routes
| Route | What it does | Saving |
|---|---|---|
| Semantic Cache | Returns cached answers for similar questions — even if worded differently | Up to 100% on repeated queries |
| Prompt Compression | Compresses verbose prompts with LLMLingua-2 before sending | Up to 50% on input tokens |
| Model Router | Automatically routes simple tasks to cheaper models in <1ms | Up to 90% per request |
| Context Pruning | Summarises long conversation history when it exceeds 8K tokens | Up to 60% on long threads |
All four routes run by default on every request; you can skip any of them per request with control headers.
Get started in 2 minutes
Step 1 — Create a free account
```shell
curl -X POST https://promptthin.tech/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "[email protected]", "password": "yourpassword"}'
```
Returns your API key (`ts_xxx`). Save it.
Or sign up at promptthin.tech and get your key from the dashboard.
Step 2 — Register your LLM provider key
OpenAI
```shell
curl -X POST https://promptthin.tech/keys/openai \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-your-openai-key"}'
```
Anthropic
```shell
curl -X POST https://promptthin.tech/keys/anthropic \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-ant-your-anthropic-key"}'
```
Gemini
```shell
curl -X POST https://promptthin.tech/keys/gemini \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "AIza-your-gemini-key"}'
```
Groq
```shell
curl -X POST https://promptthin.tech/keys/groq \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "gsk_your-groq-key"}'
```
Your provider keys are encrypted with AES-256 and never appear in logs or responses.
Step 3 — Point your app at PromptThin
.env — two lines, no other changes needed:
```shell
OPENAI_BASE_URL=https://promptthin.tech/v1
OPENAI_API_KEY=ts_your_key
```
Done. Every LLM call now routes through PromptThin and savings start immediately.
Integration examples
OpenAI SDK — Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
)

# Identical to regular OpenAI usage
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
OpenAI SDK — JavaScript / TypeScript
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://promptthin.tech/v1",
  apiKey: "ts_your_key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
```
Anthropic SDK — Python
```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://promptthin.tech",
    api_key="ts_your_key",
)
```
Anthropic SDK — JavaScript
```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://promptthin.tech",
  apiKey: "ts_your_key",
});
```
LangChain — Python
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
    model="gpt-4o",
)
```
LangChain — JavaScript
```javascript
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  configuration: { baseURL: "https://promptthin.tech/v1" },
  apiKey: "ts_your_key",
});
```
AutoGen — Python
```python
from autogen import AssistantAgent

config_list = [{
    "model": "gpt-4o",
    "base_url": "https://promptthin.tech/v1",
    "api_key": "ts_your_key",
}]

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
```
CrewAI
.env — CrewAI reads from the environment automatically:
```shell
OPENAI_BASE_URL=https://promptthin.tech/v1
OPENAI_API_KEY=ts_your_key
```
Vercel AI SDK
```javascript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openai = createOpenAI({
  baseURL: "https://promptthin.tech/v1",
  apiKey: "ts_your_key",
});

const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Hello!",
});
```
LiteLLM
```python
import litellm

litellm.api_base = "https://promptthin.tech/v1"
litellm.api_key = "ts_your_key"

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Cursor / Continue.dev / Open WebUI
In settings, set:
- OpenAI API Base URL: `https://promptthin.tech/v1`
- API Key: `ts_your_key`
Supported models
PromptThin infers the provider from the model name — no extra config needed:
| Model prefix | Routes to |
|---|---|
| gpt-*, o1-*, o3-* | OpenAI |
| claude-* | Anthropic |
| gemini-* | Google Gemini |
| llama-*, mixtral-*, gemma-* | Groq |
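The prefix matching above can be sketched as a simple lookup. This is a hypothetical reimplementation for illustration — the proxy's actual routing code is proprietary and may differ:

```python
# Hypothetical sketch of prefix-based provider inference, mirroring
# the routing table in the docs. Not PromptThin's actual code.
PREFIX_TO_PROVIDER = {
    "gpt-": "openai",
    "o1-": "openai",
    "o3-": "openai",
    "claude-": "anthropic",
    "gemini-": "gemini",
    "llama-": "groq",
    "mixtral-": "groq",
    "gemma-": "groq",
}

def infer_provider(model: str) -> str:
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Unknown model prefix: {model}")

print(infer_provider("claude-3-5-sonnet-latest"))  # anthropic
```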
MCP server
PromptThin exposes an MCP server for agents that support the Model Context Protocol (Claude Desktop, Claude Code, Cursor, Cline, Windsurf, Continue.dev).
Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "promptthin": {
      "url": "https://promptthin.tech/mcp",
      "headers": { "X-API-Key": "ts_your_key" }
    }
  }
}
```
Available tools your agent can call:
| Tool | What it does |
|---|---|
| get_usage_summary | Tokens saved, cache hit rate, cost saved |
| get_billing_status | Plan, requests remaining this month |
| flush_cache | Mark all cached responses as stale |
| get_recent_requests | Last N proxied requests with details |
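Under the hood, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request — your MCP client builds this for you, but a hand-rolled sketch of the payload shape (per the Model Context Protocol spec; this is illustration, not a PromptThin-specific format) looks like:

```python
import json

# Build a JSON-RPC 2.0 payload for an MCP tools/call request.
# Normally your MCP client (Claude Desktop, Cursor, ...) does this.
def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(payload)

body = build_tool_call("get_usage_summary", {})
# POST this body to https://promptthin.tech/mcp with
# X-API-Key: ts_your_key and Content-Type: application/json.
```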
Per-request controls
Skip specific savings routes on individual requests by adding headers:
```python
# Skip cache for this request only
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the latest news?"}],
    extra_headers={"X-Cache-Control": "no-cache"},
)
```
| Header | Value | Effect |
|---|---|---|
| X-Cache-Control | no-cache | Skip cache lookup and storage |
| X-Prune-Control | no-prune | Skip context pruning |
| X-Compress-Control | no-compress | Skip prompt compression |
| X-Router-Control | no-route | Skip model routing |
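To skip several routes at once, combine headers in one `extra_headers` dict. A small local helper can make this less error-prone — the header names and values below come from the table above, but the helper itself is hypothetical convenience code:

```python
# Map short route names to PromptThin's per-request control headers
# (header names/values are from the docs; the helper is local sugar).
CONTROL_HEADERS = {
    "cache": ("X-Cache-Control", "no-cache"),
    "prune": ("X-Prune-Control", "no-prune"),
    "compress": ("X-Compress-Control", "no-compress"),
    "route": ("X-Router-Control", "no-route"),
}

def skip_routes(*routes: str) -> dict:
    """Build an extra_headers dict that disables the named routes."""
    return dict(CONTROL_HEADERS[r] for r in routes)

# e.g. bypass cache and model routing for a time-sensitive request:
headers = skip_routes("cache", "route")
# headers == {"X-Cache-Control": "no-cache", "X-Router-Control": "no-route"}
```

Pass the result as `extra_headers=skip_routes("cache", "route")` in any OpenAI-SDK call.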
Check your savings
```shell
curl https://promptthin.tech/usage/summary \
  -H "X-API-Key: ts_your_key"
```
Or open the live dashboard — shows total tokens saved, tokens you would have used, tokens actually used, and cache hit rate.
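From those numbers you can derive your savings rate yourself. A minimal sketch — the field names `tokens_saved` and `tokens_would_have_used` are assumptions for illustration, so check them against the actual `/usage/summary` response:

```python
# Compute a savings rate from a usage summary. The field names below
# are ASSUMED for illustration; verify against the real API response.
def savings_rate(summary: dict) -> float:
    would_have = summary.get("tokens_would_have_used", 0)
    if would_have == 0:
        return 0.0
    return summary["tokens_saved"] / would_have

example = {"tokens_saved": 42_000, "tokens_would_have_used": 120_000}
print(f"{savings_rate(example):.0%}")  # 35%
```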
Pricing
| Plan | Price | Requests/month |
|---|---|---|
| Free | $0 | 500 req + 7-day unlimited trial |
| Pro | $4.99 first month, then $11.99/mo | 10,000 req |
| Enterprise | Custom | 100,000+ req, SLA, dedicated support |
Sign up free →
Security
- Provider keys are encrypted with AES-256 before storage — never in logs or responses
- All traffic is HTTPS only
- Keys are stored in GCP Secret Manager, not in the application database
- PromptThin never modifies response content — only compresses prompts and manages context
FAQ
**Do I need to change my code?** No. Set two environment variables and everything works automatically.

**Does PromptThin slow down my requests?** The semantic cache adds <2ms. Prompt compression and model routing add <1ms. Cache hits skip the LLM call entirely, dramatically reducing latency.
**What if I want to use my own provider key instead of registering it?** Pass it directly in the Authorization header alongside your PromptThin key:

```python
client = OpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
    default_headers={"Authorization": "Bearer sk-your-openai-key"},
)
```

PromptThin detects the provider key prefix and uses it directly.
**Can I use multiple providers?** Yes. Register keys for each provider. PromptThin routes to the right one based on the model name.

**Is this open source?** The integration documentation and examples are public. The proxy server is proprietary.
Contact
- Website: promptthin.tech
- Enterprise: [email protected]
- Issues: Open an issue on this repository