PromptThin
The invisible savings layer for AI agents. Save up to 70% on tokens with zero code changes.
PromptThin is a transparent proxy that sits between your AI agents and LLM providers. Two environment variables and you're done — every API call gets four compounding savings routes applied automatically.
Your app ──→ PromptThin ──→ OpenAI / Anthropic / Gemini / Groq
How much can I save?
Savings depend on your workload. The four routes compound — each one reduces what the next has to work with.
- High cache hit rate (repeated or similar queries): up to 90%+ reduction
- Long context agents (multi-turn, large prompts): 40–60% reduction from pruning + compression
- Mixed workloads (some unique, some repeated): typically 20–40% reduction
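To see why the routes compound rather than simply add, consider a hypothetical long-context thread where pruning and compression each remove 30% of tokens. A quick sketch (the percentages are illustrative, not measured):

```python
# Illustrative only: how two routes compound on the same request.
# Each route only sees the tokens the previous routes left behind.
def remaining_fraction(reductions):
    fraction = 1.0
    for r in reductions:
        fraction *= (1.0 - r)
    return fraction

# Hypothetical long-context thread: 30% pruning, then 30% compression.
left = remaining_fraction([0.30, 0.30])
print(f"Tokens actually sent: {left:.0%}")  # 0.7 * 0.7 = 0.49, a ~51% total cut
```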
Check your actual savings anytime from the dashboard or API:
```shell
curl https://promptthin.tech/usage/summary -H "X-API-Key: ts_your_key"
```
Four savings routes
| Route | What it does | Saving |
|---|---|---|
| Semantic Cache | Returns cached answers for similar questions — even if worded differently | Up to 100% on repeated queries |
| Prompt Compression | Compresses verbose prompts with LLMLingua 2 before sending | Up to 50% on input tokens |
| Model Router | Automatically routes simple tasks to cheaper models in <1ms | Up to 90% per request |
| Context Pruning | Summarises long conversation history when it exceeds 8K tokens | Up to 60% on long threads |
All four routes are applied by default on every request; you can skip any of them per request with headers.
Get started in 2 minutes
Step 1 — Create a free account
```shell
curl -X POST https://promptthin.tech/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "[email protected]", "password": "yourpassword"}'
```
Returns your API key (ts_xxx). Save it.
Or sign up at promptthin.tech and get your key from the dashboard.
Step 2 — Register your LLM provider key
OpenAI
```shell
curl -X POST https://promptthin.tech/keys/openai \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-your-openai-key"}'
```
Anthropic
```shell
curl -X POST https://promptthin.tech/keys/anthropic \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-ant-your-anthropic-key"}'
```
Gemini
```shell
curl -X POST https://promptthin.tech/keys/gemini \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "AIza-your-gemini-key"}'
```
Groq
```shell
curl -X POST https://promptthin.tech/keys/groq \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "gsk_your-groq-key"}'
```
Your provider keys are encrypted with AES-256 and never appear in logs or responses.
Step 3 — Point your app at PromptThin
```shell
# .env — two lines, no other changes needed
OPENAI_BASE_URL=https://promptthin.tech/v1
OPENAI_API_KEY=ts_your_key
```
Done. Every LLM call now routes through PromptThin and savings start immediately.
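To sanity-check the wiring without an SDK, you can build the same OpenAI-style request by hand in stdlib Python. Note the `/v1/chat/completions` path is an assumption based on the OpenAI-compatible `/v1` base URL, and `ts_your_key` is a placeholder:

```python
import json
import urllib.request

# Build (but don't yet send) an OpenAI-style chat request against PromptThin.
# The /v1/chat/completions path is assumed from the /v1 base URL.
def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    body = json.dumps({
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://promptthin.tech/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("ts_your_key", "Hello!")
# urllib.request.urlopen(req) would send it and return the JSON response.
print(req.full_url)
```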
Integration examples
OpenAI SDK — Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
)

# Identical to regular OpenAI usage
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
OpenAI SDK — JavaScript / TypeScript
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://promptthin.tech/v1",
  apiKey: "ts_your_key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
```
Anthropic SDK — Python
```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://promptthin.tech",
    api_key="ts_your_key",
)
```
Anthropic SDK — JavaScript
```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://promptthin.tech",
  apiKey: "ts_your_key",
});
```
LangChain — Python
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
    model="gpt-4o",
)
```
LangChain — JavaScript
```javascript
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  configuration: { baseURL: "https://promptthin.tech/v1" },
  apiKey: "ts_your_key",
});
```
AutoGen — Python
```python
from autogen import AssistantAgent

config_list = [{
    "model": "gpt-4o",
    "base_url": "https://promptthin.tech/v1",
    "api_key": "ts_your_key",
}]

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
```
CrewAI
```shell
# .env — CrewAI reads from environment automatically
OPENAI_BASE_URL=https://promptthin.tech/v1
OPENAI_API_KEY=ts_your_key
```
Vercel AI SDK
```javascript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openai = createOpenAI({
  baseURL: "https://promptthin.tech/v1",
  apiKey: "ts_your_key",
});

const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Hello!",
});
```
LiteLLM
```python
import litellm

litellm.api_base = "https://promptthin.tech/v1"
litellm.api_key = "ts_your_key"

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Cursor / Continue.dev / Open WebUI
In settings, set:
- OpenAI API Base URL: `https://promptthin.tech/v1`
- API Key: `ts_your_key`
Supported models
PromptThin infers the provider from the model name — no extra config needed:
| Model prefix | Routes to |
|---|---|
| gpt-*, o1-*, o3-* | OpenAI |
| claude-* | Anthropic |
| gemini-* | Google Gemini |
| llama-*, mixtral-*, gemma-* | Groq |
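The prefix rules above can be sketched as a simple lookup. This is illustrative only; the proxy's actual matching logic is not published:

```python
# Prefix-based provider inference, mirroring the table above.
PREFIX_MAP = {
    ("gpt-", "o1-", "o3-"): "openai",
    ("claude-",): "anthropic",
    ("gemini-",): "gemini",
    ("llama-", "mixtral-", "gemma-"): "groq",
}

def infer_provider(model: str) -> str:
    # str.startswith accepts a tuple of candidate prefixes.
    for prefixes, provider in PREFIX_MAP.items():
        if model.startswith(prefixes):
            return provider
    raise ValueError(f"Unknown model prefix: {model}")

print(infer_provider("claude-3-5-sonnet"))  # anthropic
```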
MCP server
PromptThin exposes an MCP server for agents that support the Model Context Protocol (Claude Desktop, Claude Code, Cursor, Cline, Windsurf, Continue.dev).
Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "promptthin": {
      "url": "https://promptthin.tech/mcp",
      "headers": { "X-API-Key": "ts_your_key" }
    }
  }
}
```
Available tools your agent can call:
| Tool | What it does |
|---|---|
| get_usage_summary | Tokens saved, cache hit rate, cost saved |
| get_billing_status | Plan, requests remaining this month |
| flush_cache | Mark all cached responses as stale |
| get_recent_requests | Last N proxied requests with details |
Per-request controls
Skip specific savings routes on individual requests by adding headers:
```python
# Skip cache for this request only
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the latest news?"}],
    extra_headers={"X-Cache-Control": "no-cache"},
)
```
| Header | Value | Effect |
|---|---|---|
| X-Cache-Control | no-cache | Skip cache lookup and storage |
| X-Prune-Control | no-prune | Skip context pruning |
| X-Compress-Control | no-compress | Skip prompt compression |
| X-Router-Control | no-route | Skip model routing |
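When you need several headers at once, a small helper keeps call sites tidy. The header names and values come from the table above; the short route nicknames (`cache`, `prune`, `compress`, `route`) are this sketch's own invention:

```python
# Map short route names to the documented per-request control headers.
CONTROL_HEADERS = {
    "cache": ("X-Cache-Control", "no-cache"),
    "prune": ("X-Prune-Control", "no-prune"),
    "compress": ("X-Compress-Control", "no-compress"),
    "route": ("X-Router-Control", "no-route"),
}

def skip(*routes: str) -> dict:
    """Build an extra_headers dict that disables the named routes."""
    return dict(CONTROL_HEADERS[r] for r in routes)

# e.g. a time-sensitive request that must hit the exact model requested:
headers = skip("cache", "route")
print(headers)  # {'X-Cache-Control': 'no-cache', 'X-Router-Control': 'no-route'}
```

Pass the result as `extra_headers=skip("cache", "route")` in any of the SDK calls shown earlier.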
Check your savings
```shell
curl https://promptthin.tech/usage/summary \
  -H "X-API-Key: ts_your_key"
```
Or open the live dashboard — shows total tokens saved, tokens you would have used, tokens actually used, and cache hit rate.
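If you want to compute a savings rate from the summary yourself, something like the following works. Note that the field names (`tokens_would_have_used`, `tokens_used`) are assumptions based on the dashboard metrics listed above, not a documented response schema:

```python
# Compute a savings rate from a usage summary payload.
# Field names below are assumed from the dashboard metrics,
# not a documented response schema.
def savings_rate(summary: dict) -> float:
    would_have = summary["tokens_would_have_used"]
    used = summary["tokens_used"]
    return (would_have - used) / would_have if would_have else 0.0

sample = {"tokens_would_have_used": 120_000, "tokens_used": 78_000}
print(f"Saved {savings_rate(sample):.0%} of tokens")  # 35%
```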
Pricing
| Plan | Price | Requests/month |
|---|---|---|
| Free | $0 | 500 req + 7-day unlimited trial |
| Pro | $4.99 first month, then $11.99/mo | 10,000 req |
| Enterprise | Custom | 100,000+ req, SLA, dedicated support |
Sign up free →
Security
- Provider keys are encrypted with AES-256 before storage — never in logs or responses
- All traffic is HTTPS only
- Keys are stored in GCP Secret Manager, not in the application database
- PromptThin never modifies response content — only compresses prompts and manages context
FAQ
**Do I need to change my code?**
No. Set two environment variables and everything works automatically.
**Does PromptThin slow down my requests?**
The semantic cache adds <2ms. Prompt compression and model routing add <1ms. Cache hits completely skip the LLM call, dramatically reducing latency.
**What if I want to use my own provider key instead of registering it?**
Pass it directly in the Authorization header alongside your PromptThin key:

```python
client = OpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
    default_headers={"Authorization": "Bearer sk-your-openai-key"},
)
```

PromptThin detects the provider key prefix and uses it directly.
**Can I use multiple providers?**
Yes. Register keys for each provider. PromptThin routes to the right one based on the model name.

**Is this open source?**
The integration documentation and examples are public. The proxy server is proprietary.
Contact
- Website: promptthin.tech
- Enterprise: [email protected]
- Issues: Open an issue on this repository