Argus

Multi-provider web search broker for AI agents. Routes across SearXNG, DuckDuckGo, GitHub, Brave, Tavily, Exa, and more — using RRF fusion, content extraction, and budget-aware routing so you don't waste your free search credits.

Features at a glance:

  • Multi-provider search — 11 providers, one API, free-first tier routing
  • 5,000+ free queries/month — automatic budget tracking, exhausted providers skipped
  • Content extraction — 9-step fallback chain with quality gates (local + external)
  • Multi-turn sessions — pass session_id for conversational search refinement
  • 4 search modes — discovery, research, recovery, grounding
  • Dead URL recovery — first-class /recover-url endpoint with archive fallbacks
  • 4 integration paths — HTTP API, CLI, MCP server, Python SDK

Built for AI agent builders, RAG infra, and ops teams who don't want to hand-wire search APIs.

Quickstart

Mode 1: Local CLI (zero config)

pip install argus-search && argus search -q "python web frameworks"

That's it. DuckDuckGo handles the search — no accounts, no keys, no containers. You get unlimited free search from your laptop right now. Add API keys whenever you want more providers, or don't.

argus extract -u "https://example.com/article"       # extract clean text from any URL

Works on any machine with Python 3.11+ — laptop, Mac Mini, Raspberry Pi, cloud VM. Nothing to host.

For MCP (Claude Code, Cursor, VS Code):

pipx install 'argus-search[mcp]' && argus mcp serve

Then add to your MCP config:

{"mcpServers": {"argus": {"command": "argus", "args": ["mcp", "serve"]}}}

Or install from the MCP Registry:

{
  "mcpServers": {
    "argus": {
      "registryType": "pypi",
      "identifier": "argus-search",
      "runtimeHint": "uvx"
    }
  }
}

One command to install, one JSON block to connect. No server to run, no keys to configure.

Mode 2: Full Stack Server

Got a Raspberry Pi running Pi-hole? A Mac Mini on your desk? An old laptop? That's enough to run the full stack — SearXNG (your own private search engine) plus local JS-rendering content extraction.

docker compose up -d    # SearXNG + Argus

| What you have | What you get |
| --- | --- |
| Any machine with Python 3.11+ | DuckDuckGo + API providers (no server) |
| Raspberry Pi 4 / old laptop (4GB+) | Everything — SearXNG, all providers, Crawl4AI |
| Mac Mini M1+ (8GB+) | Full stack with headroom |
| Free cloud VM (1GB) | SearXNG + search providers (skip Crawl4AI) |

SearXNG takes 512MB of RAM and gives you a private Google-style search engine that nobody can rate-limit, block, or charge for. It runs alongside Pi-hole on hardware millions of people already own.

Providers

| Provider | Credit type | Free capacity | Setup |
| --- | --- | --- | --- |
| DuckDuckGo | Free (scraped) | Unlimited | None |
| SearXNG | Free (self-hosted) | Unlimited | Docker |
| GitHub | Free (API) | Unlimited | None (token for higher rate limit) |
| Brave Search | Monthly recurring | 2,000 queries/month | dashboard |
| Tavily | Monthly recurring | 1,000 queries/month | signup |
| Exa | Monthly recurring | 1,000 queries/month | signup |
| Linkup | Monthly recurring | 1,000 queries/month | signup |
| Serper | One-time signup | 2,500 credits | signup |
| Parallel AI | One-time signup | 4,000 credits | signup |
| You.com | One-time signup | $20 credit | platform |
| Valyu | One-time signup | $10 credit | platform |

5,000 free queries/month from the four recurring providers. Three providers need no API key at all. Routing priority: Tier 0 (free: SearXNG, DuckDuckGo, GitHub) → Tier 1 (monthly: Brave, Tavily, Exa, Linkup) → Tier 2 (one-time: Serper, Parallel, You.com, Valyu, SearchAPI). Budget-exhausted providers are skipped automatically.
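The tier-first, budget-aware ordering can be sketched roughly like this. This is a minimal illustration under assumed names (`Provider`, `route`, `mode_order` are not part of the SDK); the real broker lives in `argus/broker/` and is more involved:

```python
# Hypothetical sketch of tier-first routing with budget skipping.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    tier: int          # 0 = free, 1 = monthly recurring, 2 = one-time credits
    exhausted: bool = False

def route(providers: list[Provider], mode_order: list[str]) -> list[Provider]:
    """Sort by tier first, then by the mode's preferred order within each
    tier, skipping budget-exhausted providers entirely."""
    rank = {name: i for i, name in enumerate(mode_order)}
    usable = [p for p in providers if not p.exhausted]
    return sorted(usable, key=lambda p: (p.tier, rank.get(p.name, len(rank))))

providers = [
    Provider("brave", 1), Provider("duckduckgo", 0),
    Provider("serper", 2), Provider("tavily", 1, exhausted=True),
]
order = route(providers, mode_order=["brave", "duckduckgo", "serper"])
print([p.name for p in order])  # ['duckduckgo', 'brave', 'serper'] — tavily skipped
```

Note how tier always dominates: even if a mode prefers Brave, DuckDuckGo (tier 0) is still tried first.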

HTTP API

All endpoints prefixed with /api. OpenAPI docs at http://localhost:8000/docs.

# Search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "python web frameworks", "mode": "discovery", "max_results": 5}'

# Multi-turn search (conversational refinement)
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "what about async?", "session_id": "my-session"}'

# Extract content from a working URL
curl -X POST http://localhost:8000/api/extract \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Recover a dead or moved URL
curl -X POST http://localhost:8000/api/recover-url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/old-page", "title": "Example Article"}'

# Health & budgets
curl http://localhost:8000/api/health/detail
curl http://localhost:8000/api/budgets

Search modes

| Mode | Use for | Example |
| --- | --- | --- |
| discovery | Related pages, canonical sources | "Find the official docs for X" |
| research | Broad exploratory retrieval | "Latest approaches to Y?" |
| recovery | Finding moved/dead content | "This URL is 404" |
| grounding | Fact-checking with live sources | "Verify this claim about Z" |

Tier-based routing always applies first. Within each tier, the mode selects provider order.

Response format

{
  "query": "python web frameworks",
  "mode": "discovery",
  "results": [
    {"url": "https://fastapi.tiangolo.com", "title": "FastAPI", "snippet": "Modern Python web framework", "score": 0.942}
  ],
  "total_results": 1,
  "cached": false,
  "traces": [
    {"provider": "duckduckgo", "status": "success", "results_count": 5, "latency_ms": 312}
  ]
}

Each result includes url, title, snippet, domain, provider, and score. The traces array shows which providers were called and their outcomes.
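A consumer might pick the top-scored result and note which providers answered. This helper is illustrative (`summarize` is not an SDK function); the field names come from the example response above:

```python
# Sketch: summarize an /api/search response dict.
def summarize(response: dict) -> str:
    best = max(response["results"], key=lambda r: r["score"])
    ok = [t["provider"] for t in response["traces"] if t["status"] == "success"]
    return f"top: {best['url']} (score {best['score']:.3f}) via {', '.join(ok)}"

response = {
    "results": [{"url": "https://fastapi.tiangolo.com", "title": "FastAPI",
                 "snippet": "Modern Python web framework", "score": 0.942}],
    "traces": [{"provider": "duckduckgo", "status": "success",
                "results_count": 5, "latency_ms": 312}],
}
print(summarize(response))
# top: https://fastapi.tiangolo.com (score 0.942) via duckduckgo
```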

Budgets

{
  "budgets": {
    "brave": {"remaining": 1847, "monthly_usage": 153, "usage_count": 153, "exhausted": false},
    "duckduckgo": {"remaining": 0, "monthly_usage": 0, "usage_count": 42, "exhausted": false}
  },
  "token_balances": {"jina": 9833638}
}

Each provider tracks usage per calendar month. When a provider hits its budget, Argus skips it and moves to the next tier. Free providers (SearXNG, DuckDuckGo, GitHub) have no limit. Set ARGUS_*_MONTHLY_BUDGET_USD to enforce custom limits per provider.

Integration

CLI

argus search -q "python web framework"              # zero-config, uses DuckDuckGo
argus search -q "python web framework" --mode research -n 20
argus search -q "fastapi" --session my-session       # multi-turn context
argus extract -u "https://example.com/article"       # extract clean text
argus extract -u "https://example.com/article" -d nytimes.com  # auth extraction
argus recover-url -u "https://dead.link" -t "Title"
argus health                                         # provider status
argus budgets                                        # budget + token balances
argus set-balance -s jina -b 9833638                 # track token balance
argus test-provider -p brave                         # smoke-test a provider
argus serve                                          # start API server
argus mcp serve                                      # start MCP server

All commands support --json for structured output.

How sessions work

Pass session_id to any search call. Argus stores each query and extracted URL in a SQLite-backed session. Reusing the same session_id gives the broker context from prior queries — follow-up searches are automatically refined using earlier conversation context. Sessions persist across restarts. Omit session_id for stateless, one-shot searches.

MCP

Add to your MCP client config:

{
  "mcpServers": {
    "argus": {
      "command": "argus",
      "args": ["mcp", "serve"]
    }
  }
}

Works with Claude Code, Cursor, VS Code, and any MCP-compatible client. For remote access via SSE:

{
  "mcpServers": {
    "argus": {
      "command": "argus",
      "args": ["mcp", "serve", "--transport", "sse", "--host", "127.0.0.1", "--port", "8001"]
    }
  }
}

Available tools: search_web, extract_content, recover_url, expand_links, search_health, search_budgets, test_provider, cookie_health, valyu_answer

Python

import asyncio

from argus.broker.router import create_broker
from argus.models import SearchQuery, SearchMode
from argus.extraction import extract_url

async def main():
    broker = create_broker()

    response = await broker.search(
        SearchQuery(query="python web frameworks", mode=SearchMode.DISCOVERY, max_results=10)
    )
    for r in response.results:
        print(f"{r.title}: {r.url} (score: {r.score:.3f})")

    content = await extract_url(response.results[0].url)
    print(content.title)
    print(content.text)

asyncio.run(main())

Content Extraction

Argus tries up to nine methods to extract content from any URL: first local (trafilatura, Crawl4AI, Playwright), then external APIs (Jina, Valyu Contents, Firecrawl, You.com, Wayback, archive.is). Each attempt is quality-checked for garbage output. See docs/providers.md for the full extractor comparison.

Extract gets the full text of a working URL. Recover-URL finds alternatives when a URL is dead, paywalled, or radically changed by querying archival sources (Wayback, archive.is) and running a question-guided extraction loop.

Architecture

Caller (CLI/HTTP/MCP/Python) → SearchBroker → tier-sorted providers → RRF ranking → response
                                     ↕ SessionStore (optional)
                            Extractor (on demand) → 9-step fallback chain with quality gates

| Module | Responsibility |
| --- | --- |
| argus/broker/ | Tier-based routing, ranking, dedup, caching, health, budgets |
| argus/providers/ | Provider adapters (one per search API) |
| argus/extraction/ | 9-step URL extraction fallback chain with quality gates |
| argus/sessions/ | Multi-turn session store and query refinement |
| argus/api/ | FastAPI HTTP endpoints |
| argus/cli/ | Click CLI commands |
| argus/mcp/ | MCP server for LLM integration |
| argus/persistence/ | PostgreSQL query/result storage |

Add new providers or extractors with a single adapter file. See CONTRIBUTING.md for the interface.

Configuration

All config via environment variables. See .env.example for the full list. Missing keys degrade gracefully: a provider without its key is skipped rather than raising an error.

| Variable | Default | Description |
| --- | --- | --- |
| ARGUS_SEARXNG_BASE_URL | http://127.0.0.1:8080 | SearXNG endpoint |
| ARGUS_BRAVE_API_KEY | | Brave Search API key |
| ARGUS_SERPER_API_KEY | | Serper API key |
| ARGUS_TAVILY_API_KEY | | Tavily API key |
| ARGUS_EXA_API_KEY | | Exa API key |
| ARGUS_LINKUP_API_KEY | | Linkup API key |
| ARGUS_PARALLEL_API_KEY | | Parallel AI API key |
| ARGUS_YOU_API_KEY | | You.com API key |
| ARGUS_VALYU_API_KEY | | Valyu API key (search, contents, answer) |
| ARGUS_FIRECRAWL_API_KEY | | Firecrawl API key (content extraction) |
| ARGUS_GITHUB_API_KEY | | GitHub token (higher rate limit) |
| ARGUS_*_MONTHLY_BUDGET_USD | 0 (unlimited) | Query-count budget per provider |
| ARGUS_CRAWL4AI_ENABLED | false | Enable Crawl4AI extraction step |
| ARGUS_YOU_CONTENTS_ENABLED | false | Enable You.com Contents API extraction |
| ARGUS_CACHE_TTL_HOURS | 168 | Result cache TTL |
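The graceful degradation described above amounts to filtering providers by the presence of their key. A minimal sketch, assuming a simple name-to-variable mapping (`enabled_providers` is illustrative, not an SDK function):

```python
# Sketch: providers whose API keys are missing are skipped, not errors.
import os

def enabled_providers(keys: dict[str, str]) -> list[str]:
    """Return the providers whose environment variable is set and non-empty."""
    return [name for name, var in keys.items() if os.environ.get(var)]

keys = {"brave": "ARGUS_BRAVE_API_KEY", "tavily": "ARGUS_TAVILY_API_KEY"}
os.environ["ARGUS_BRAVE_API_KEY"] = "example-key"   # placeholder value
print(enabled_providers(keys))  # ['brave'] — tavily is silently skipped
```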

FAQ

How is this different from calling Tavily/Serper directly? Argus calls them for you — plus 9 other providers. You get one ranked, deduplicated result set instead of managing multiple API keys and stitching results together. Free providers are tried first, so you only burn credits when needed.

Can I run only one provider? Yes. Set only the API key for the provider you want. All others are silently skipped. For zero-config, just install and go — DuckDuckGo handles everything with no keys.

Do I need Docker? No. pip install argus-search works immediately on any machine with Python 3.11+. Docker is only needed for SearXNG (self-hosted search) or Crawl4AI (local JS rendering).

License

MIT — see CHANGELOG.md for release history.
