PromptThin
The invisible savings layer for AI agents. Save up to 70% on tokens with zero code changes.
PromptThin is a transparent proxy that sits between your AI agents and LLM providers. Two environment variables and you're done — every API call gets four compounding savings routes applied automatically.
```
Your app ──→ PromptThin ──→ OpenAI / Anthropic / Gemini / Groq
```
How much can I save?
Savings depend on your workload. The four routes compound — each one reduces what the next has to work with.
- High cache hit rate (repeated or similar queries): up to 90%+ reduction
- Long context agents (multi-turn, large prompts): 40–60% reduction from pruning + compression
- Mixed workloads (some unique, some repeated): typically 20–40% reduction
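To see why the routes compound rather than simply add, note that each stage only processes the tokens the previous stages let through. A quick sketch of the arithmetic, using hypothetical per-route reductions (not measured PromptThin figures):

```python
def compounded_reduction(stage_reductions):
    """Combine token reductions that apply sequentially.

    Each stage only sees the tokens the previous stages let through,
    so the surviving fraction is the product of (1 - r) per stage.
    """
    remaining = 1.0
    for r in stage_reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining

# Hypothetical per-route reductions: 30% from cache hits, 40% from
# compression, 20% from pruning. Individually modest, together ~66%.
total = compounded_reduction([0.30, 0.40, 0.20])
print(f"{total:.0%}")  # 66%
```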
Check your actual savings anytime from the dashboard or API:
```shell
curl https://promptthin.tech/usage/summary -H "X-API-Key: ts_your_key"
```
Four savings routes
| Route | What it does | Saving |
|---|---|---|
| Semantic Cache | Returns cached answers for similar questions — even if worded differently | Up to 100% on repeated queries |
| Prompt Compression | Compresses verbose prompts with LLMLingua-2 before sending | Up to 50% on input tokens |
| Model Router | Automatically routes simple tasks to cheaper models in <1ms | Up to 90% per request |
| Context Pruning | Summarises long conversation history when it exceeds 8K tokens | Up to 60% on long threads |
All four routes run by default on every request; you can skip any of them per request with control headers.
Get started in 2 minutes
Step 1 — Create a free account
```shell
curl -X POST https://promptthin.tech/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "[email protected]", "password": "yourpassword"}'
```
Returns your API key (`ts_xxx`). Save it.
Or sign up at promptthin.tech and get your key from the dashboard.
Step 2 — Register your LLM provider key
OpenAI
```shell
curl -X POST https://promptthin.tech/keys/openai \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-your-openai-key"}'
```
Anthropic
```shell
curl -X POST https://promptthin.tech/keys/anthropic \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-ant-your-anthropic-key"}'
```
Gemini
```shell
curl -X POST https://promptthin.tech/keys/gemini \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "AIza-your-gemini-key"}'
```
Groq
```shell
curl -X POST https://promptthin.tech/keys/groq \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "gsk_your-groq-key"}'
```
Your provider keys are encrypted with AES-256 and never appear in logs or responses.
Step 3 — Point your app at PromptThin
.env — two lines, no other changes needed:
```shell
OPENAI_BASE_URL=https://promptthin.tech/v1
OPENAI_API_KEY=ts_your_key
```
Done. Every LLM call now routes through PromptThin and savings start immediately.
Integration examples
OpenAI SDK — Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
)

# Identical to regular OpenAI usage
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
OpenAI SDK — JavaScript / TypeScript
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://promptthin.tech/v1",
  apiKey: "ts_your_key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
```
Anthropic SDK — Python
```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://promptthin.tech",
    api_key="ts_your_key",
)
```
Anthropic SDK — JavaScript
```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://promptthin.tech",
  apiKey: "ts_your_key",
});
```
LangChain — Python
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
    model="gpt-4o",
)
```
LangChain — JavaScript
```javascript
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  configuration: { baseURL: "https://promptthin.tech/v1" },
  apiKey: "ts_your_key",
});
```
AutoGen — Python
```python
from autogen import AssistantAgent

config_list = [{
    "model": "gpt-4o",
    "base_url": "https://promptthin.tech/v1",
    "api_key": "ts_your_key",
}]

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
```
CrewAI
.env — CrewAI reads from the environment automatically:
```shell
OPENAI_BASE_URL=https://promptthin.tech/v1
OPENAI_API_KEY=ts_your_key
```
Vercel AI SDK
```javascript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openai = createOpenAI({
  baseURL: "https://promptthin.tech/v1",
  apiKey: "ts_your_key",
});

const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Hello!",
});
```
LiteLLM
```python
import litellm

litellm.api_base = "https://promptthin.tech/v1"
litellm.api_key = "ts_your_key"

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Cursor / Continue.dev / Open WebUI
In settings, set:
- OpenAI API Base URL: `https://promptthin.tech/v1`
- API Key: `ts_your_key`
Supported models
PromptThin infers the provider from the model name — no extra config needed:
| Model prefix | Routes to |
|---|---|
| gpt-*, o1-*, o3-* | OpenAI |
| claude-* | Anthropic |
| gemini-* | Google Gemini |
| llama-*, mixtral-*, gemma-* | Groq |
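The prefix matching above can be sketched as a simple lookup. This is a hypothetical reimplementation for illustration — the proxy's actual routing code is proprietary and may differ:

```python
# Hypothetical sketch of prefix-based provider inference, mirroring
# the routing table in the docs. Not PromptThin's actual code.
PREFIX_TO_PROVIDER = {
    "gpt-": "openai",
    "o1-": "openai",
    "o3-": "openai",
    "claude-": "anthropic",
    "gemini-": "gemini",
    "llama-": "groq",
    "mixtral-": "groq",
    "gemma-": "groq",
}

def infer_provider(model: str) -> str:
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Unknown model prefix: {model}")

print(infer_provider("claude-3-5-sonnet-latest"))  # anthropic
```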
MCP server
PromptThin exposes an MCP server for agents that support the Model Context Protocol (Claude Desktop, Claude Code, Cursor, Cline, Windsurf, Continue.dev).
Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "promptthin": {
      "url": "https://promptthin.tech/mcp",
      "headers": { "X-API-Key": "ts_your_key" }
    }
  }
}
```
Available tools your agent can call:
| Tool | What it does |
|---|---|
| get_usage_summary | Tokens saved, cache hit rate, cost saved |
| get_billing_status | Plan, requests remaining this month |
| flush_cache | Mark all cached responses as stale |
| get_recent_requests | Last N proxied requests with details |
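Under the hood, an MCP tool invocation is a JSON-RPC 2.0 `tools/call` request — your MCP client builds this for you, but a hand-rolled sketch of the payload shape (per the Model Context Protocol spec; this is illustration, not a PromptThin-specific format) looks like:

```python
import json

# Build a JSON-RPC 2.0 payload for an MCP tools/call request.
# Normally your MCP client (Claude Desktop, Cursor, ...) does this.
def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(payload)

body = build_tool_call("get_usage_summary", {})
# POST this body to https://promptthin.tech/mcp with
# X-API-Key: ts_your_key and Content-Type: application/json.
```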
Per-request controls
Skip specific savings routes on individual requests by adding headers:
```python
# Skip cache for this request only
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the latest news?"}],
    extra_headers={"X-Cache-Control": "no-cache"},
)
```
| Header | Value | Effect |
|---|---|---|
| X-Cache-Control | no-cache | Skip cache lookup and storage |
| X-Prune-Control | no-prune | Skip context pruning |
| X-Compress-Control | no-compress | Skip prompt compression |
| X-Router-Control | no-route | Skip model routing |
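To skip several routes at once, combine headers in one `extra_headers` dict. A small local helper can make this less error-prone — the header names and values below come from the table above, but the helper itself is hypothetical convenience code:

```python
# Map short route names to PromptThin's per-request control headers
# (header names/values are from the docs; the helper is local sugar).
CONTROL_HEADERS = {
    "cache": ("X-Cache-Control", "no-cache"),
    "prune": ("X-Prune-Control", "no-prune"),
    "compress": ("X-Compress-Control", "no-compress"),
    "route": ("X-Router-Control", "no-route"),
}

def skip_routes(*routes: str) -> dict:
    """Build an extra_headers dict that disables the named routes."""
    return dict(CONTROL_HEADERS[r] for r in routes)

# e.g. bypass cache and model routing for a time-sensitive request:
headers = skip_routes("cache", "route")
# headers == {"X-Cache-Control": "no-cache", "X-Router-Control": "no-route"}
```

Pass the result as `extra_headers=skip_routes("cache", "route")` in any OpenAI-SDK call.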
Check your savings
```shell
curl https://promptthin.tech/usage/summary \
  -H "X-API-Key: ts_your_key"
```
Or open the live dashboard — shows total tokens saved, tokens you would have used, tokens actually used, and cache hit rate.
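From those numbers you can derive your savings rate yourself. A minimal sketch — the field names `tokens_saved` and `tokens_would_have_used` are assumptions for illustration, so check them against the actual `/usage/summary` response:

```python
# Compute a savings rate from a usage summary. The field names below
# are ASSUMED for illustration; verify against the real API response.
def savings_rate(summary: dict) -> float:
    would_have = summary.get("tokens_would_have_used", 0)
    if would_have == 0:
        return 0.0
    return summary["tokens_saved"] / would_have

example = {"tokens_saved": 42_000, "tokens_would_have_used": 120_000}
print(f"{savings_rate(example):.0%}")  # 35%
```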
Pricing
| Plan | Price | Requests/month |
|---|---|---|
| Free | $0 | 500 req + 7-day unlimited trial |
| Pro | $4.99 first month, then $11.99/mo | 10,000 req |
| Enterprise | Custom | 100,000+ req, SLA, dedicated support |
Sign up free →
Security
- Provider keys are encrypted with AES-256 before storage — never in logs or responses
- All traffic is HTTPS only
- Keys are stored in GCP Secret Manager, not in the application database
- PromptThin never modifies response content — only compresses prompts and manages context
FAQ
**Do I need to change my code?** No. Set two environment variables and everything works automatically.

**Does PromptThin slow down my requests?** The semantic cache adds <2ms. Prompt compression and model routing add <1ms. Cache hits skip the LLM call entirely, dramatically reducing latency.
**What if I want to use my own provider key instead of registering it?** Pass it directly in the Authorization header alongside your PromptThin key:

```python
client = OpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
    default_headers={"Authorization": "Bearer sk-your-openai-key"},
)
```

PromptThin detects the provider key prefix and uses it directly.
**Can I use multiple providers?** Yes. Register keys for each provider. PromptThin routes to the right one based on the model name.

**Is this open source?** The integration documentation and examples are public. The proxy server is proprietary.
Contact
- Website: promptthin.tech
- Enterprise: [email protected]
- Issues: Open an issue on this repository