Argus

Multi-provider search broker for AI agents. Routes across SearXNG, Brave, Serper, Tavily, and Exa with automatic fallback, RRF ranking, content extraction, and budget enforcement.

Python 3.11+ · License: MIT · PyPI · CI · MCP Server

Search companies give you free web searches — thousands per month across every major provider, and two of them are unlimited with no API key at all. Argus puts them all in one place and automatically picks the right one for each query so you don't waste credits.

Two Ways to Use Argus

1. No server, no Docker, no API keys

pip install argus-search
argus search -q "python web frameworks"

That's it. DuckDuckGo handles the search — no accounts, no keys, no containers. You get unlimited free search from your laptop right now. Add API keys whenever you want more providers, or don't.

Works on any machine with Python 3.11+: your laptop, a Mac Mini, a Raspberry Pi, a cloud VM. Nothing to host.

2. Full install on hardware you already have

Got a Raspberry Pi running Pi-hole? A Mac Mini on your desk? An old laptop? That's enough to run the full stack — SearXNG (your own private search engine) plus local JS-rendering content extraction that doesn't depend on anyone's API.

docker compose up -d    # SearXNG + Argus

What runs where:

| What you have | What you get | How |
|---|---|---|
| Raspberry Pi 3 (1GB, probably running Pi-hole) | SearXNG + search via all providers | Pi-hole uses ~100MB, SearXNG needs ~512MB (confirmed by maintainers). They fit together. |
| Raspberry Pi 4 (4GB) | Everything — SearXNG, all providers, Crawl4AI | Same as above plus local JS-rendering extraction. Crawl4AI basic mode runs on Pi 4 (per their docs). |
| Mac Mini M1+ (8GB+) | Everything, plus headroom | Full stack with room for other services. Runs alongside whatever else is on there. |
| Any old laptop (4GB+) | Everything | Same as Pi 4. Docker + Python = full Argus. |
| Free cloud VM (1GB, e.g. OCI/AWS free tier) | SearXNG + search providers | Enough for search. Skip Crawl4AI — external APIs (Jina, You.com, Wayback) handle extraction. |
| No machine at all | DuckDuckGo + API providers | `pip install argus-search` on your laptop. No server needed. |

SearXNG is the most useful thing you're not running. It takes 512MB of RAM and gives you a private Google-style search engine that nobody can rate-limit, block, or charge for. It's the cheapest infrastructure upgrade you can make — it runs alongside Pi-hole on hardware millions of people already own.

Why Argus

You don't need one search API. You need all of them — and you need them free.

| Provider | Credit type | Free capacity | Setup |
|---|---|---|---|
| DuckDuckGo | Free (scraped) | Unlimited | None |
| SearXNG | Free (self-hosted) | Unlimited | Docker |
| Brave Search | Monthly recurring | 2,000 queries/month | dashboard |
| Tavily | Monthly recurring | 1,000 queries/month | signup |
| Exa | Monthly recurring | 1,000 queries/month | signup |
| Linkup | Monthly recurring | 1,000 queries/month | signup |
| Serper | One-time signup | 2,500 credits | signup |
| Parallel AI | One-time signup | 4,000 credits | signup |
| You.com | One-time signup | $20 credit | platform |

5,000 free queries per month from the four recurring providers. Two providers need no API key at all (unlimited). Three more give you ~6,500 one-time credits for signing up. Argus routes to free providers first, monthly recurring next, one-time credits last. Budget-exhausted providers are skipped until they reset. When credits refresh, they come back online automatically.
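
The budget logic described above can be pictured with a small sketch. This is illustrative only, not Argus's internals: each provider tracks usage against a per-period limit, exhausted providers drop out of rotation, and a new period restores them automatically.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Budget:
    limit: int              # 0 means unlimited
    used: int = 0
    period: str = ""        # month key, e.g. "2024-06"

    def available(self, today: date) -> bool:
        key = today.strftime("%Y-%m")
        if self.period != key:      # a new month resets usage automatically
            self.period, self.used = key, 0
        return self.limit == 0 or self.used < self.limit

def eligible(budgets: dict[str, Budget], today: date) -> list[str]:
    """Providers that still have budget; exhausted ones are skipped."""
    return [name for name, b in budgets.items() if b.available(today)]
```

Nothing has to be "reactivated" by hand: availability is recomputed on every check, so a provider comes back the moment its period rolls over.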

What It Does

You ask Argus a question. It checks the free providers first — if DuckDuckGo or SearXNG returns enough good results, it stops there. No credits touched. If you need more, it moves on to the monthly-credit providers (Brave, Tavily, Exa, Linkup), and only reaches for the one-time signup credits as a last resort. When a provider runs out of budget, Argus skips it and tries the next one. You get back one ranked, deduplicated list of results; which provider(s) actually answered is only visible in the traces.

Content extraction — Found a result but need the full text? Argus tries up to eight different ways to get it: first the fast local methods (trafilatura, Crawl4AI, Playwright), then external APIs if those fail (Jina, You.com, Wayback Machine, archive.is). It checks each attempt for garbage output — paywall stubs, blank pages, error messages — and moves on if the quality isn't good enough.
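
The fallback-with-quality-gate pattern can be sketched in a few lines. This is a simplified illustration, not Argus's extraction module: each extractor is tried in order, and output that is too short or matches known junk markers is rejected so the chain moves on.

```python
def extract_with_fallback(url, extractors, min_len=200):
    """Try extractors in order; a simple quality gate rejects junk output.

    `extractors` is a list of (name, callable) pairs, tried in order.
    """
    STUBS = ("subscribe to continue", "enable javascript", "404")
    for name, fn in extractors:
        try:
            text = fn(url)
        except Exception:
            continue  # a failed extractor just means "try the next one"
        # Quality gate: non-empty, long enough, and no paywall/error markers.
        if text and len(text) >= min_len and not any(s in text.lower() for s in STUBS):
            return name, text
    return None, ""
```

The real chain runs the same gate between every step, which is why a paywall stub from a fast local extractor never shadows a clean copy from the Wayback Machine later in the chain.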

Multi-turn sessions — Searching for something and want to refine? Pass a session_id and Argus remembers what you asked before. Your follow-up searches get context from prior queries automatically. Sessions persist to SQLite so they survive restarts.
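
Conceptually, session-aware refinement looks like the sketch below (an in-memory toy, not Argus's SQLite-backed store): terms from recent queries are carried forward as context for the next search.

```python
class Session:
    """Toy multi-turn session: prior queries enrich the next one."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def refine(self, query: str) -> str:
        # Naive refinement: prepend the last couple of queries as context.
        context = " ".join(self.history[-2:])
        self.history.append(query)
        return f"{context} {query}".strip()
```

A follow-up like "what about async?" only makes sense with the earlier query attached, which is exactly what `session_id` buys you in the real API.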

Content Extractors

| Extractor | Needs hosting? | Cost | Notes |
|---|---|---|---|
| trafilatura | No (pure Python) | Free | Primary, fast |
| Crawl4AI | Yes (4GB RAM min) | Free | JS rendering, optional dep |
| Playwright | Yes (512MB per instance) | Free | Headless browser fallback |
| Jina Reader | No (external API) | Token-based | Works without any server |
| You.com Contents | No (external API) | $1/1k pages | Works without any server |
| Wayback Machine | No (external) | Free | Dead page recovery |
| archive.is | No (external) | Free | Dead page recovery |

The first three require a local machine. The last four are external APIs that work in any deployment — including serverless.

How Routing Works

Argus picks which provider to ask based on two things: how much the provider costs (tier) and how well it handles that type of query (mode).

┌──────────────────────────────────────────────────────┐
│  Tier 0: FREE (SearXNG, DuckDuckGo)                  │  ← always first, unlimited
├──────────────────────────────────────────────────────┤
│  Tier 1: MONTHLY RECURRING                           │
│    Brave · Tavily · Exa · Linkup                     │  ← 5,000 free queries/mo
├──────────────────────────────────────────────────────┤
│  Tier 3: ONE-TIME CREDITS                            │
│    Serper · Parallel · You.com · SearchAPI           │  ← budget-enforced, last resort
└──────────────────────────────────────────────────────┘

The idea is simple: burn through the free stuff first. If that's not enough, dip into the monthly credits. Save the one-time signup credits for when you really need them. When a provider runs out, it gets skipped — and it comes back automatically when its budget resets.

Search Modes

Not all search providers are equally good at everything. Discovery mode favors Exa and Brave for finding related pages. Research mode leads with Tavily for broad retrieval. Recovery mode prioritizes Brave and Tavily for finding moved content. Tier sorting always applies first; within each tier, the mode determines the order.

| Mode | When to use | Runtime order |
|---|---|---|
| discovery | Related pages, canonical sources | SearXNG → DuckDuckGo → Brave → Exa → Tavily → Linkup → Serper → Parallel → You |
| recovery | Dead/moved URL recovery | SearXNG → DuckDuckGo → Brave → Tavily → Exa → Linkup → Serper → Parallel → You |
| grounding | Few live sources for fact-checking | SearXNG → DuckDuckGo → Brave → Linkup → Serper → Parallel → You |
| research | Broad exploratory retrieval | SearXNG → DuckDuckGo → Tavily → Exa → Brave → Linkup → Serper → Parallel → You |

Free providers (SearXNG, DuckDuckGo) always lead. Within each tier, the order reflects which provider handles that query type best.
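
The two-level ordering above amounts to a sort on a (tier, mode-rank) key. A minimal sketch, with illustrative rank numbers rather than Argus's real tables (tier numbers mirror the diagram's 0/1/3):

```python
TIER = {"searxng": 0, "duckduckgo": 0,
        "brave": 1, "tavily": 1, "exa": 1, "linkup": 1,
        "serper": 3, "parallel": 3, "you": 3}

# Intra-tier preference per mode (lower sorts earlier); illustrative values.
MODE_RANK = {
    "discovery": {"searxng": 0, "duckduckgo": 1, "brave": 2, "exa": 3,
                  "tavily": 4, "linkup": 5, "serper": 6, "parallel": 7, "you": 8},
    "research":  {"searxng": 0, "duckduckgo": 1, "tavily": 2, "exa": 3,
                  "brave": 4, "linkup": 5, "serper": 6, "parallel": 7, "you": 8},
}

def runtime_order(mode: str) -> list[str]:
    # Tier dominates; the mode only reorders providers *within* a tier.
    rank = MODE_RANK[mode]
    return sorted(TIER, key=lambda p: (TIER[p], rank.get(p, 99)))
```

Because the tier is the first sort key, no mode preference can ever promote a paid provider above a free one.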

Integration

HTTP API

All endpoints prefixed with /api. OpenAPI docs at http://localhost:8000/docs.

# Search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "python web frameworks", "mode": "discovery", "max_results": 5}'

# Multi-turn search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "what about async?", "session_id": "my-session"}'

# Extract content
curl -X POST http://localhost:8000/api/extract \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Recover a dead URL
curl -X POST http://localhost:8000/api/recover-url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/old-page", "title": "Example Article"}'

# Health & budgets
curl http://localhost:8000/api/health/detail
curl http://localhost:8000/api/budgets

CLI

argus search -q "python web framework"              # zero-config, uses DuckDuckGo
argus search -q "python web framework" --mode research -n 20
argus search -q "fastapi" --session my-session       # multi-turn context
argus extract -u "https://example.com/article"       # extract clean text
argus extract -u "https://example.com/article" -d nytimes.com  # auth extraction
argus recover-url -u "https://dead.link" -t "Title"
argus health                                         # provider status
argus budgets                                        # budget + token balances
argus set-balance -s jina -b 9833638                 # track token balance
argus test-provider -p brave                         # smoke-test a provider
argus serve                                          # start API server
argus mcp serve                                      # start MCP server

All commands support --json for structured output.

MCP

Add to your MCP client config:

{
  "mcpServers": {
    "argus": {
      "command": "argus",
      "args": ["mcp", "serve"]
    }
  }
}

Works with Claude Code, Cursor, VS Code, and any MCP-compatible client. For remote access via SSE:

{
  "mcpServers": {
    "argus": {
      "command": "argus",
      "args": ["mcp", "serve", "--transport", "sse", "--host", "127.0.0.1", "--port", "8001"]
    }
  }
}

Available tools: search_web, extract_content, recover_url, expand_links, search_health, search_budgets, test_provider, cookie_health

Python

import asyncio

from argus.broker.router import create_broker
from argus.models import SearchQuery, SearchMode
from argus.extraction import extract_url

async def main() -> None:
    broker = create_broker()

    response = await broker.search(
        SearchQuery(query="python web frameworks", mode=SearchMode.DISCOVERY, max_results=10)
    )
    for r in response.results:
        print(f"{r.title}: {r.url} (score: {r.score:.3f})")

    content = await extract_url(response.results[0].url)
    print(content.title)
    print(content.text)

asyncio.run(main())

Architecture

Caller (CLI / HTTP / MCP / Python)
  → SearchBroker
    → routing policy (tier-sorted, mode-specific within tiers)
      → provider executor (budget check → health check → search → early stop)
    → result pipeline (cache → dedupe → RRF ranking → response)
  → SessionStore (optional, per-request)
    → query refinement from prior context
  → Extractor (on demand)
    → SSRF guard → cache → rate limit → auth → QG →
      trafilatura → QG → crawl4ai → QG → playwright → QG →
      jina → QG → you_contents → QG → wayback → QG →
      archive.is → QG → return best      (QG = quality gate)
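
The pipeline's RRF step merges per-provider result lists with Reciprocal Rank Fusion: each document scores the sum of 1/(k + rank) over every list it appears in. A minimal sketch (k=60 is the conventional constant from the RRF literature, not necessarily Argus's value):

```python
def rrf_merge(result_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(url) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for urls in result_lists.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    # Highest combined score first; URLs seen by several providers win.
    return sorted(scores, key=scores.get, reverse=True)
```

A URL that two providers both rank highly beats a URL that only one provider loved, which is exactly the behavior you want when merging engines of uneven quality.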

| Module | Responsibility |
|---|---|
| `argus/broker/` | Tier-based routing, ranking, dedup, caching, health, budgets |
| `argus/providers/` | Provider adapters (one per search API) |
| `argus/extraction/` | 8-step URL extraction fallback chain with quality gates |
| `argus/sessions/` | Multi-turn session store and query refinement |
| `argus/api/` | FastAPI HTTP endpoints |
| `argus/cli/` | Click CLI commands |
| `argus/mcp/` | MCP server for LLM integration |
| `argus/persistence/` | PostgreSQL query/result storage |

Configuration

All config via environment variables. See .env.example for the full list. Missing keys degrade gracefully: providers without keys are skipped rather than treated as errors.

| Variable | Default | Description |
|---|---|---|
| `ARGUS_SEARXNG_BASE_URL` | `http://127.0.0.1:8080` | SearXNG endpoint |
| `ARGUS_BRAVE_API_KEY` | | Brave Search API key |
| `ARGUS_SERPER_API_KEY` | | Serper API key |
| `ARGUS_TAVILY_API_KEY` | | Tavily API key |
| `ARGUS_EXA_API_KEY` | | Exa API key |
| `ARGUS_LINKUP_API_KEY` | | Linkup API key |
| `ARGUS_PARALLEL_API_KEY` | | Parallel AI API key |
| `ARGUS_YOU_API_KEY` | | You.com API key |
| `ARGUS_*_MONTHLY_BUDGET_USD` | `0` (unlimited) | Query-count budget per provider |
| `ARGUS_CRAWL4AI_ENABLED` | `false` | Enable Crawl4AI extraction step |
| `ARGUS_YOU_CONTENTS_ENABLED` | `false` | Enable You.com Contents API extraction |
| `ARGUS_CACHE_TTL_HOURS` | `168` | Result cache TTL |
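
A minimal `.env` might look like the following. Values are placeholders, and the per-provider budget line instantiates the `ARGUS_*_MONTHLY_BUDGET_USD` pattern from the table:

```bash
# Self-hosted SearXNG (tier 0)
ARGUS_SEARXNG_BASE_URL=http://127.0.0.1:8080

# Monthly-recurring providers (tier 1); omit a key to skip that provider
ARGUS_BRAVE_API_KEY=your-brave-key
ARGUS_TAVILY_API_KEY=your-tavily-key

# Optional guardrails
ARGUS_BRAVE_MONTHLY_BUDGET_USD=0     # 0 = unlimited
ARGUS_CRAWL4AI_ENABLED=false
ARGUS_CACHE_TTL_HOURS=168
```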

License

MIT
