PromptThin

Reduce LLM API costs through caching, compression, and smart routing. Zero code changes.

PromptThin is a transparent proxy that sits between your AI agents and LLM providers. Two environment variables and you're done — every API call gets four compounding savings routes applied automatically.

Your app ──→ PromptThin ──→ OpenAI / Anthropic / Gemini / Groq


How much can I save?

Savings depend on your workload. The four routes compound — each one reduces what the next has to work with.

High cache hit rate (repeated or similar queries): up to 90%+ reduction
Long context agents (multi-turn, large prompts): 40–60% reduction from pruning + compression
Mixed workloads (some unique, some repeated): typically 20–40% reduction
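To make the compounding concrete, here is a minimal sketch that assumes each route removes a fixed fraction of whatever cost the previous route left behind. The multiplicative model, the function name `combined_reduction`, and the example percentages are illustrative assumptions, not PromptThin's published accounting.

```python
from math import prod


def combined_reduction(route_reductions):
    """Overall fractional reduction when each route trims what the previous one left.

    route_reductions: iterable of fractions in [0, 1], one per route.
    """
    remaining = prod(1.0 - r for r in route_reductions)
    return 1.0 - remaining


# Hypothetical per-route reductions, not measured PromptThin figures:
# 20% from one route, then 30% of what is left from the next.
example = [0.20, 0.30]
print(f"{combined_reduction(example):.0%}")  # prints 44%
```

Under this model, two routes that save 20% and 30% individually save 44% together rather than 50%, because the second route only acts on the 80% that remains.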

Check your actual savings anytime from the dashboard or API:

curl https://promptthin.tech/usage/summary -H "X-API-Key: ts_your_key"

Four savings routes

Route	What it does	Saving
Semantic Cache	Returns cached answers for similar questions — even if worded differently	Up to 100% on repeated queries
Prompt Compression	Compresses verbose prompts with LLMLingua 2 before sending	Up to 50% on input tokens
Model Router	Automatically routes simple tasks to cheaper models in <1ms	Up to 90% per request
Context Pruning	Summarises long conversation history when it exceeds 8K tokens	Up to 60% on long threads

All four routes run on every request by default. You can skip individual routes on a per-request basis with the control headers listed under Per-request controls.
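The idea behind a semantic cache can be sketched as a similarity-threshold lookup. The toy below uses bag-of-words vectors and cosine similarity so that it runs with the standard library alone. PromptThin's actual embedding model, similarity measure, and threshold are not documented here, so every name and number in this sketch (`ToySemanticCache`, `embed`, the 0.8 threshold) is an assumption for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. A real cache would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    def __init__(self, threshold=0.8):  # threshold value is an assumption
        self.threshold = threshold
        self.entries = []  # list of (vector, cached response)

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

    def get(self, prompt):
        """Return the best cached response at or above the threshold, else None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = ToySemanticCache()
cache.put("What is the capital of France?", "Paris")
print(cache.get("Capital of France, what is it?"))  # reworded question, cache hit
print(cache.get("How do I bake bread?"))            # unrelated question, cache miss
```

A miss falls through to the provider. A hit returns the stored answer without an LLM call, which is where the "up to 100%" figure for repeated queries comes from.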

Get started in 2 minutes

Step 1 — Create a free account

curl -X POST https://promptthin.tech/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "[email protected]", "password": "yourpassword"}'

Returns your API key (ts_xxx). Save it.

Or sign up at promptthin.tech and get your key from the dashboard.

Step 2 — Register your LLM provider key

OpenAI

curl -X POST https://promptthin.tech/keys/openai \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-your-openai-key"}'

Anthropic

curl -X POST https://promptthin.tech/keys/anthropic \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-ant-your-anthropic-key"}'

Gemini

curl -X POST https://promptthin.tech/keys/gemini \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "AIza-your-gemini-key"}'

Groq

curl -X POST https://promptthin.tech/keys/groq \
  -H "X-API-Key: ts_your_key" \
  -H "Content-Type: application/json" \
  -d '{"key": "gsk_your-groq-key"}'

Your provider keys are encrypted with AES-256 and never appear in logs or responses.
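The same registration call can be made from Python. The sketch below builds the request with the standard library only. The endpoint path, headers, and JSON body mirror the curl commands above, while the helper name `build_key_registration` and the provider allow-list are assumptions added for illustration.

```python
import json
import urllib.request

# Providers shown in the curl examples above.
PROVIDERS = {"openai", "anthropic", "gemini", "groq"}


def build_key_registration(provider, provider_key, promptthin_key):
    """Build (but do not send) the POST request that registers a provider key."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return urllib.request.Request(
        f"https://promptthin.tech/keys/{provider}",
        data=json.dumps({"key": provider_key}).encode("utf-8"),
        headers={
            "X-API-Key": promptthin_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_key_registration("openai", "sk-your-openai-key", "ts_your_key")
print(request.get_method(), request.full_url)
```

Pass the returned object to `urllib.request.urlopen(request)` to actually send it. The response body format is not documented here, so the sketch stops at building the request.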

Step 3 — Point your app at PromptThin

# .env (two lines, no other changes needed)
OPENAI_BASE_URL=https://promptthin.tech/v1
OPENAI_API_KEY=ts_your_key

Done. Every LLM call now routes through PromptThin and savings start immediately.
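Recent versions of the OpenAI Python SDK pick up `OPENAI_BASE_URL` and `OPENAI_API_KEY` from the environment on their own. For code that does not use an SDK, the following sketch shows how the two variables could be turned into a request by hand. The helper name `build_chat_request` is hypothetical, and the `/chat/completions` path is assumed from the OpenAI-compatible base URL.

```python
import json
import os
import urllib.request


def build_chat_request(prompt, model="gpt-4o"):
    """Build (but do not send) a chat completion request from the two env vars."""
    # Falls back to OpenAI's own endpoint when OPENAI_BASE_URL is unset.
    base_url = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/")
    api_key = os.environ["OPENAI_API_KEY"]
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With the `.env` values above loaded, the request targets `https://promptthin.tech/v1/chat/completions` and carries the `ts_` key as the bearer token. Removing `OPENAI_BASE_URL` sends the same code back to the provider directly.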

Integration examples

OpenAI SDK — Python

from openai import OpenAI

client = OpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
)

# Identical to regular OpenAI usage
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

OpenAI SDK — JavaScript / TypeScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://promptthin.tech/v1",
  apiKey: "ts_your_key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

Anthropic SDK — Python

import anthropic

client = anthropic.Anthropic(
    base_url="https://promptthin.tech",
    api_key="ts_your_key",
)

Anthropic SDK — JavaScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://promptthin.tech",
  apiKey: "ts_your_key",
});

LangChain — Python

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
    model="gpt-4o",
)

LangChain — JavaScript

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  configuration: { baseURL: "https://promptthin.tech/v1" },
  apiKey: "ts_your_key",
});

AutoGen — Python

from autogen import AssistantAgent

config_list = [{
    "model": "gpt-4o",
    "base_url": "https://promptthin.tech/v1",
    "api_key": "ts_your_key",
}]

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)

CrewAI

# .env (CrewAI reads from the environment automatically)
OPENAI_BASE_URL=https://promptthin.tech/v1
OPENAI_API_KEY=ts_your_key

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openai = createOpenAI({
  baseURL: "https://promptthin.tech/v1",
  apiKey: "ts_your_key",
});

const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Hello!",
});

LiteLLM

import litellm

litellm.api_base = "https://promptthin.tech/v1"
litellm.api_key = "ts_your_key"

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

Cursor / Continue.dev / Open WebUI

In settings, set:

OpenAI API Base URL: https://promptthin.tech/v1
API Key: ts_your_key

Supported models

PromptThin infers the provider from the model name — no extra config needed:

Model prefix	Routes to
gpt-*, o1-*, o3-*	OpenAI
claude-*	Anthropic
gemini-*	Google Gemini
llama-*, mixtral-*, gemma-*	Groq
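The prefix table can be read as a simple lookup. The sketch below is one way to express it, assuming plain prefix matching on the model name. The function name `infer_provider` is hypothetical, and the proxy's real matching rules may differ (for example, in how it treats names outside the table).

```python
# Prefixes taken from the table above: gpt-, o1-, o3-, claude-, gemini-,
# llama-, mixtral-, gemma-.
PREFIX_TO_PROVIDER = [
    ("gpt-", "OpenAI"),
    ("o1-", "OpenAI"),
    ("o3-", "OpenAI"),
    ("claude-", "Anthropic"),
    ("gemini-", "Google Gemini"),
    ("llama-", "Groq"),
    ("mixtral-", "Groq"),
    ("gemma-", "Groq"),
]


def infer_provider(model):
    """Return the provider for a model name, or None if no prefix matches."""
    for prefix, provider in PREFIX_TO_PROVIDER:
        if model.startswith(prefix):
            return provider
    return None


for name in ("gpt-4o", "claude-3-5-sonnet", "gemma-7b-it", "my-custom-model"):
    print(name, "->", infer_provider(name))
```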

MCP server

PromptThin exposes an MCP server for agents that support the Model Context Protocol (Claude Desktop, Claude Code, Cursor, Cline, Windsurf, Continue.dev).

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "promptthin": {
      "url": "https://promptthin.tech/mcp",
      "headers": { "X-API-Key": "ts_your_key" }
    }
  }
}

Available tools your agent can call:

Tool	What it does
get_usage_summary	Tokens saved, cache hit rate, cost saved
get_billing_status	Plan, requests remaining this month
flush_cache	Mark all cached responses as stale
get_recent_requests	Last N proxied requests with details

Per-request controls

Skip specific savings routes on individual requests by adding headers:

# Skip cache for this request only
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the latest news?"}],
    extra_headers={"X-Cache-Control": "no-cache"},
)

Header	Value	Effect
X-Cache-Control	no-cache	Skip cache lookup and storage
X-Prune-Control	no-prune	Skip context pruning
X-Compress-Control	no-compress	Skip prompt compression
X-Router-Control	no-route	Skip model routing
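When several routes need to be skipped at once, a small helper can assemble the headers from the table above. The helper name `skip_routes` and the short route labels (`cache`, `prune`, `compress`, `route`) are assumptions for this sketch. Only the header names and values come from the table.

```python
# Header name and value per route, copied from the table above.
SKIP_HEADERS = {
    "cache": ("X-Cache-Control", "no-cache"),
    "prune": ("X-Prune-Control", "no-prune"),
    "compress": ("X-Compress-Control", "no-compress"),
    "route": ("X-Router-Control", "no-route"),
}


def skip_routes(*routes):
    """Return a headers dict that disables the named routes for one request."""
    unknown = [r for r in routes if r not in SKIP_HEADERS]
    if unknown:
        raise ValueError(f"unknown route(s): {unknown}")
    return dict(SKIP_HEADERS[r] for r in routes)


print(skip_routes("cache", "route"))
```

The resulting dict can be passed as `extra_headers=skip_routes("cache", "route")` in the OpenAI SDK call shown earlier in this section.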

Check your savings

curl https://promptthin.tech/usage/summary \
  -H "X-API-Key: ts_your_key"

Or open the live dashboard — shows total tokens saved, tokens you would have used, tokens actually used, and cache hit rate.
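For scripted checks, the same endpoint can be queried from Python. This is a minimal sketch using only the standard library. The helper names are hypothetical, and because the response schema is not documented here, the code assumes a JSON body and returns it as-is rather than guessing at field names.

```python
import json
import urllib.request


def build_usage_request(promptthin_key):
    """Build (but do not send) the GET request for the usage summary."""
    return urllib.request.Request(
        "https://promptthin.tech/usage/summary",
        headers={"X-API-Key": promptthin_key},
        method="GET",
    )


def fetch_usage_summary(promptthin_key, timeout=10):
    """Send the request and decode the body, assuming the API returns JSON."""
    request = build_usage_request(promptthin_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_usage_request("ts_your_key")
print(request.get_method(), request.full_url)
```

Calling `fetch_usage_summary("ts_your_key")` performs the network call. Printing the returned object once is a quick way to discover which fields the API actually exposes.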

Pricing

Plan	Price	Requests/month
Free	$0	500 req + 7-day unlimited trial
Pro	$4.99 first month, then $11.99/mo	10,000 req
Enterprise	Custom	100,000+ req, SLA, dedicated support

Security

Provider keys are encrypted with AES-256 before storage — never in logs or responses
All traffic is HTTPS only
Keys are stored in GCP Secret Manager, not in the application database
PromptThin never modifies response content — only compresses prompts and manages context

FAQ

**Do I need to change my code?** No. Set two environment variables and everything works automatically.

**Does PromptThin slow down my requests?** The semantic cache adds <2ms. Prompt compression and model routing add <1ms. Cache hits skip the LLM call entirely, which reduces latency dramatically.

**What if I want to use my own provider key instead of registering it?** Pass it directly in the Authorization header alongside your PromptThin key:

client = OpenAI(
    base_url="https://promptthin.tech/v1",
    api_key="ts_your_key",
    default_headers={"Authorization": "Bearer sk-your-openai-key"},
)

PromptThin detects the provider key prefix and uses it directly.
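Prefix detection of this kind could look like the sketch below. The prefixes are taken from the example keys in Step 2 (`sk-ant-`, `sk-`, `AIza`, `gsk_`). The function name `detect_provider_from_key` and the exact matching order are assumptions, since the proxy's real detection rules are not documented here.

```python
# Order matters: "sk-ant-" must be checked before the shorter "sk-".
KEY_PREFIXES = [
    ("sk-ant-", "anthropic"),
    ("gsk_", "groq"),
    ("AIza", "gemini"),
    ("sk-", "openai"),
]


def detect_provider_from_key(provider_key):
    """Return the provider implied by a key's prefix, or None if unrecognised."""
    for prefix, provider in KEY_PREFIXES:
        if provider_key.startswith(prefix):
            return provider
    return None


print(detect_provider_from_key("sk-ant-your-anthropic-key"))  # prints anthropic
print(detect_provider_from_key("sk-your-openai-key"))         # prints openai
```

Checking the longer Anthropic prefix first avoids misclassifying `sk-ant-` keys as OpenAI keys, since both begin with `sk-`.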

**Can I use multiple providers?** Yes. Register keys for each provider. PromptThin routes to the right one based on the model name.

**Is this open source?** The integration documentation and examples are public. The proxy server is proprietary.

Contact

Website: promptthin.tech
Enterprise: [email protected]
Issues: Open an issue on this repository
