
BrowseAI Dev


Research infrastructure for AI agents — real-time web search, evidence extraction, and structured citations. Every claim is backed by a URL. Every answer has a confidence score.

Agent → BrowseAI Dev → Internet → Verified answers + sources

Website · Playground · API Docs · Discord

Package names: npm: browseai-dev · PyPI: browseaidev · LangChain: langchain-browseaidev — Previously browse-ai and browseai. Old names still work and redirect automatically.


How It Works

search → fetch pages → neural rerank → extract claims → verify → cited answer (streamed)

Every answer goes through a multi-step verification pipeline. No hallucination. Every claim is backed by a real source.
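The pipeline above can be sketched as composed stages. This is an illustrative sketch of the data flow only; every stage function here is a stub, and none of the names come from the actual codebase.

```python
# Sketch of the search -> fetch -> rerank -> extract -> verify flow.
# Every stage function is an illustrative stub, not the real implementation.

def run_pipeline(query, search, fetch, rerank, extract, verify):
    """Chain the stages; each stage's output feeds the next."""
    urls = search(query)                      # candidate result URLs
    pages = [fetch(u) for u in urls]          # raw page text
    pages = rerank(query, pages)              # cross-encoder ordering (stubbed)
    claims = extract(query, pages)            # atomic claims
    return verify(claims, pages)              # per-claim verification scores

# Minimal stubs that show the data flow end to end:
answer = run_pipeline(
    "what is a qubit",
    search=lambda q: ["https://example.org/a"],
    fetch=lambda u: f"text of {u}",
    rerank=lambda q, pages: pages,
    extract=lambda q, pages: ["A qubit is a two-state quantum system"],
    verify=lambda claims, pages: {c: 0.9 for c in claims},
)
```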

Verification & Confidence Scoring

Confidence scores are evidence-based — not LLM self-assessed. After the LLM extracts claims and sources, a post-extraction verification engine checks every claim against the actual source page text:

  1. Atomic claim decomposition — Compound claims are auto-split into individual verifiable facts. "Tesla had $96B revenue and 1.8M deliveries" becomes two atomic claims, each verified independently.
  2. Neural re-ranking — Search results are re-scored by a cross-encoder model for semantic query-document relevance before page fetching. Then for each claim, BM25 finds the top-3 candidate sentences from source text. A DeBERTa-v3 NLI model reranks candidates by semantic entailment, picking the best supporting evidence — not just the best keyword match.
  3. Hybrid BM25 + NLI verification — Each claim is scored using BM25 lexical matching + NLI semantic entailment (30% BM25, 70% NLI). Catches paraphrased claims that keyword matching alone would miss, with contradiction penalties and paraphrase boosts.
  4. Multi-provider search — Parallel search across multiple providers for broader source diversity. More independent sources = stronger cross-reference = higher confidence.
  5. Domain authority scoring — 10,000+ domains across 5 tiers (institutional .gov/.edu → major news → tech journalism → community → low-quality), stored in Supabase with Majestic Million bulk import. Self-improving via Bayesian cold-start smoothing.
  6. Source quote verification — LLM-extracted quotes verified against actual page text using hybrid matching (exact substring → BM25 fallback).
  7. Cross-source consensus — Each claim verified against all available page texts. Claims supported by 3+ independent domains get "strong consensus". Single-source claims flagged as "weak".
  8. Contradiction detection — Claim pairs analyzed for semantic conflicts using topic overlap + NLI contradiction classification. Detected contradictions surfaced in the response and penalize confidence.
  9. Multi-pass consistency — In thorough mode, claims are cross-checked across independent extraction passes. Claims confirmed by both passes get boosted; inconsistent claims are penalized (SelfCheckGPT-inspired).
  10. Auto-calibrated confidence — 7-factor confidence formula auto-adjusts from user feedback using isotonic calibration curves. Predicted confidence aligns with actual accuracy over time. Factors: verification rate (25%), domain authority (20%), source count (15%), consensus (15%), domain diversity (10%), claim grounding (10%), citation depth (5%).
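The 7-factor formula in step 10 can be sketched as a weighted sum. Only the weights (25/20/15/15/10/10/5) come from the description above; the factor inputs being normalized 0-1 scores is an assumption.

```python
# Hedged sketch of the 7-factor confidence formula. The weights match
# the documented split; the 0-1 factor inputs are an assumption.

CONFIDENCE_WEIGHTS = {
    "verification_rate": 0.25,
    "domain_authority": 0.20,
    "source_count": 0.15,
    "consensus": 0.15,
    "domain_diversity": 0.10,
    "claim_grounding": 0.10,
    "citation_depth": 0.05,
}

def confidence(factors: dict) -> float:
    """Weighted sum of normalized (0-1) factor scores."""
    return sum(CONFIDENCE_WEIGHTS[k] * factors.get(k, 0.0)
               for k in CONFIDENCE_WEIGHTS)

score = confidence({
    "verification_rate": 0.9, "domain_authority": 0.7,
    "source_count": 0.8, "consensus": 1.0,
    "domain_diversity": 0.6, "claim_grounding": 0.8, "citation_depth": 0.5,
})
```

Auto-calibration would then adjust these weights per query type from feedback; the static weights here are only the starting point.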

Claims include verified, verificationScore, consensusCount, consensusLevel, and optional nliScore fields. Sources include verified and authority. Detected contradictions (with optional nliConfidence) are returned at the top level. Agents can use these fields to make trust decisions programmatically.
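For example, an agent could gate downstream use of claims on these fields. The field names match the response schema; the acceptance thresholds are an illustrative policy, not part of the API.

```python
# Illustrative trust filter over the documented per-claim fields.
# The policy (min score 0.7, reject "weak" consensus) is an example.

def trusted_claims(claims, min_score=0.7):
    """Keep claims that were verified, scored above a threshold,
    and are not single-source ("weak" consensus)."""
    return [c for c in claims
            if c.get("verified")
            and c.get("verificationScore", 0) >= min_score
            and c.get("consensusLevel") != "weak"]

claims = [
    {"claim": "A", "verified": True, "verificationScore": 0.87, "consensusLevel": "strong"},
    {"claim": "B", "verified": True, "verificationScore": 0.55, "consensusLevel": "moderate"},
    {"claim": "C", "verified": False, "verificationScore": 0.80, "consensusLevel": "weak"},
]
kept = trusted_claims(claims)
```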

NLI graceful fallback: When HF_API_KEY is not set, the system runs BM25-only verification — the same pipeline that shipped before NLI was added. No degradation, no errors. NLI is a transparent enhancement.
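The hybrid score from step 3 and the BM25-only fallback can be sketched together. The 30/70 split comes from the docs; the contradiction-penalty magnitude is invented for illustration, and the paraphrase boost is omitted.

```python
# Sketch of the hybrid claim score: 30% BM25 lexical + 70% NLI entailment,
# dropping to BM25-only when no NLI score is available (no HF_API_KEY).
# The 0.5 contradiction penalty is an illustrative assumption.
from typing import Optional

def claim_score(bm25: float, nli: Optional[dict]) -> float:
    if nli is None:                       # BM25-only fallback
        return bm25
    score = 0.3 * bm25 + 0.7 * nli["entailment"]
    if nli["contradiction"] > 0.5:        # contradiction penalty (illustrative)
        score *= 0.5
    return score

with_nli = claim_score(0.6, {"entailment": 0.9, "contradiction": 0.02, "neutral": 0.08})
without = claim_score(0.6, None)
```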

Depth Modes

Three depth levels control research thoroughness:

| Depth | Behavior | Use case |
| --- | --- | --- |
| fast (default) | Single search → extract → verify pass | Quick lookups, real-time agents |
| thorough | Auto-retries with rephrased query when confidence < 60%, multi-pass consistency checking | Important research, fact-checking |
| deep | Premium multi-step agentic research: iterative think-search-extract-evaluate cycles (up to 4 total steps). Gap analysis identifies missing info, generates follow-up queries. Claims/sources merged across steps with final re-verification. Target confidence: 0.85. Requires BAI key + sign-in. Falls back to thorough when quota exhausted. | Complex research questions, comprehensive analysis |
# Thorough mode
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "X-Tavily-Key: tvly-xxx" \
  -H "X-OpenRouter-Key: sk-or-xxx" \
  -d '{"query": "What is quantum computing?", "depth": "thorough"}'

# Deep mode (requires BAI key — uses premium features)
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "Compare CRISPR approaches for sickle cell disease", "depth": "deep"}'

Deep mode runs iterative think-search-extract-evaluate cycles: each step performs gap analysis to identify what's missing, generates targeted follow-up queries, and merges claims/sources across steps with a final re-verification pass. It targets a confidence threshold of 0.85 (DEEP_CONFIDENCE_THRESHOLD) and runs up to 3 follow-up steps (MAX_FOLLOW_UP_STEPS, 4 total including the initial pass). Uses NLI reranking, multi-provider search, and multi-pass consistency. Each deep query costs 3x quota (~33 deep queries/day on the 100-query daily quota). When quota is exhausted, deep mode gracefully falls back to thorough. Without a BAI key, deep mode also falls back to thorough.

Deep mode responses include reasoningSteps showing the multi-step research process (step number, query, gap analysis, claim count, confidence per step).
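The loop can be sketched as follows. The threshold (0.85) and follow-up limit (3) come from the docs; the step() function is a stand-in for a full search+extract+verify pass, and gap analysis is reduced here to a simple query rewrite.

```python
# Hedged sketch of deep mode's think-search-extract-evaluate loop:
# initial pass plus up to 3 follow-ups, stopping early once confidence
# reaches the 0.85 target or no knowledge gap remains.

DEEP_CONFIDENCE_THRESHOLD = 0.85
MAX_FOLLOW_UP_STEPS = 3

def deep_research(query, step):
    steps, confidence = [], 0.0
    for i in range(1 + MAX_FOLLOW_UP_STEPS):      # initial pass + follow-ups
        claim_count, confidence, gap = step(query)
        steps.append({"step": i + 1, "query": query,
                      "claimCount": claim_count, "confidence": confidence})
        if confidence >= DEEP_CONFIDENCE_THRESHOLD or gap is None:
            break
        query = f"{query} {gap}"                  # targeted follow-up query
    return steps, confidence

# Fake step function whose confidence improves each pass:
fake = iter([(5, 0.60, "missing comparison"), (9, 0.88, None)])
steps, final = deep_research("CRISPR approaches", lambda q: next(fake))
```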

Streaming API

Get real-time progress with per-token answer streaming. The streaming endpoint sends Server-Sent Events (SSE) as each pipeline step completes. Deep mode steps are grouped by research pass for clean progress display:

curl -N -X POST https://browseai.dev/api/browse/answer/stream \
  -H "Content-Type: application/json" \
  -H "X-Tavily-Key: tvly-xxx" \
  -H "X-OpenRouter-Key: sk-or-xxx" \
  -d '{"query": "What is quantum computing?"}'

Events: trace (progress), sources (discovered early), token (streamed answer text), result (final answer), done.
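A client consumes these as standard Server-Sent Events. The parser below follows the generic SSE wire format (event:/data: lines, blank-line delimited); the event names match the list above, but the sample payloads are made up.

```python
# Minimal SSE parser for the stream above. The payloads here are
# illustrative samples, not real API output.
import json

def parse_sse(text):
    events = []
    for block in text.strip().split("\n\n"):
        event, data = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        events.append((event, json.loads("\n".join(data))))
    return events

sample = (
    "event: trace\ndata: {\"step\": \"Search Web\"}\n\n"
    "event: token\ndata: {\"text\": \"Quantum\"}\n\n"
    "event: done\ndata: {}\n"
)
events = parse_sse(sample)
```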

Retry with Backoff

All external API calls (search providers, LLM, page fetching) automatically retry on transient failures (429 rate limits, 5xx server errors) with exponential backoff and jitter. Auth errors (401/403) fail immediately — no wasted retries.
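The policy can be sketched like this: retry 429/5xx with exponentially growing, jittered delays, and raise immediately on auth errors. Attempt counts and base delay are illustrative assumptions; the sleep is injectable so the example runs instantly.

```python
# Sketch of retry with exponential backoff + jitter. Retryable and
# fatal status sets follow the description above; max_attempts and
# base delay are assumed values.
import random

RETRYABLE = {429, 500, 502, 503, 504}
FATAL = {401, 403}

def with_retry(call, max_attempts=4, base=0.5, sleep=lambda s: None):
    for attempt in range(max_attempts):
        status, body = call()
        if status < 400:
            return body
        if status in FATAL or status not in RETRYABLE:
            raise RuntimeError(f"non-retryable status {status}")
        if attempt == max_attempts - 1:
            raise RuntimeError(f"gave up after {max_attempts} attempts")
        delay = base * (2 ** attempt) * (1 + random.random())  # backoff + jitter
        sleep(delay)

# Stub that rate-limits twice, then succeeds:
responses = iter([(429, None), (503, None), (200, "ok")])
result = with_retry(lambda: next(responses))
```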

Research Memory (Sessions)

Persistent research sessions that accumulate knowledge across multiple queries. Later queries automatically recall prior verified claims, building deeper understanding over time.

Sessions require a BrowseAI Dev API key (bai_xxx) for identity and ownership. BYOK users can use search/answer but cannot use sessions. Get a free key at browseai.dev/dashboard. For MCP, set BROWSE_API_KEY env var. For Python SDK, pass api_key="bai_xxx". For REST API, use Authorization: Bearer bai_xxx.

# Python SDK
session = client.session("quantum-research")
r1 = session.ask("What is quantum entanglement?")       # 13 claims stored
r2 = session.ask("How is entanglement used in computing?")  # 12 claims recalled!
knowledge = session.knowledge()  # Export all accumulated claims

# Share with other agents or humans
share = session.share()  # Returns shareId + URL
# Another agent forks and continues the research
forked = client.fork_session(share.share_id)
# REST API
curl -X POST https://browseai.dev/api/session \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"name": "my-research"}'
# Returns session ID, then:
curl -X POST https://browseai.dev/api/session/{id}/ask \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "What is quantum entanglement?"}'

# Share a session publicly
curl -X POST https://browseai.dev/api/session/{id}/share \
  -H "Authorization: Bearer bai_xxx"

# Fork a shared session (copies all knowledge)
curl -X POST https://browseai.dev/api/session/share/{shareId}/fork \
  -H "Authorization: Bearer bai_xxx"

Each session response includes recalledClaims and newClaimsStored. Sessions can be shared publicly and forked by other agents — enabling collaborative, multi-agent research workflows.

Query Planning

Complex queries are automatically decomposed into focused sub-queries with intent labels (definition, evidence, comparison, counterargument, technical, historical). Each sub-query targets a different aspect of the question, maximizing source diversity. Simple factual queries skip planning entirely — no added latency.

Self-Improving Accuracy

The entire verification pipeline improves automatically with usage:

  • Domain authority — Bayesian cold-start smoothing adjusts domain trust scores as evidence accumulates. Static tier scores dominate initially, then real verification rates take over.
  • Adaptive BM25 thresholds — Claim verification thresholds tune per query type based on observed verification rates. Too strict? Loosens up. Too lenient? Tightens.
  • Consensus threshold tuning — Cross-source agreement thresholds adapt based on query type performance.
  • Confidence weight optimization — The 7-factor confidence formula rebalances weights per query type when user feedback indicates inaccuracy.
  • Page count optimization — Source fetch counts adjust based on confidence outcomes per query type.
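The domain-authority smoothing in the first bullet can be sketched as a prior-weighted success rate: the static tier score acts as a pseudo-count prior that dominates until enough verifications accumulate. The pseudo-count k is an assumption.

```python
# Illustrative Bayesian cold-start smoothing for domain authority:
# blend a static tier prior with the observed verification rate.
# k (prior strength, in pseudo-observations) is an assumed value.

def smoothed_authority(tier_prior, verified, total, k=20):
    """(k * prior + verified) / (k + total): prior-weighted success rate."""
    return (k * tier_prior + verified) / (k + total)

cold = smoothed_authority(0.7, verified=1, total=1)      # stays near the 0.7 prior
warm = smoothed_authority(0.7, verified=90, total=100)   # pulled toward observed 0.9
```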

Feedback Loop

Submit feedback on results to accelerate learning. Agents and users can rate results as good, bad, or wrong — this feeds directly into the adaptive threshold engine.

curl -X POST https://browseai.dev/api/browse/feedback \
  -H "Content-Type: application/json" \
  -d '{"resultId": "abc123", "rating": "good"}'
# Python SDK
client.feedback(result_id="abc123", rating="good")
# Or flag a specific wrong claim:
client.feedback(result_id="abc123", rating="wrong", claim_index=2)

Quick Start

Python SDK

pip install browseaidev
from browseaidev import BrowseAIDev

client = BrowseAIDev(api_key="bai_xxx")

# Research with citations
result = client.ask("What is quantum computing?")
print(result.answer)
print(f"Confidence: {result.confidence:.0%}")
for source in result.sources:
    print(f"  - {source.title}: {source.url}")

# Thorough mode — auto-retries if confidence < 60%
thorough = client.ask("What is quantum computing?", depth="thorough")

# Deep mode — multi-step reasoning with gap analysis (requires BAI key)
deep = client.ask("Compare CRISPR approaches for sickle cell disease", depth="deep")
for step in deep.reasoning_steps or []:
    print(f"  Step {step.step}: {step.query} ({step.confidence:.0%})")

LangChain integration (PyPI: langchain-browseaidev):

pip install langchain-browseaidev
from langchain_browseaidev import BrowseAIDevAnswerTool, BrowseAIDevSearchTool

# Use with any LangChain agent
tools = [
    BrowseAIDevAnswerTool(api_key="bai_xxx"),   # Verified search with citations
    BrowseAIDevSearchTool(api_key="bai_xxx"),    # Basic web search
]

# Standalone usage
tool = BrowseAIDevAnswerTool(api_key="bai_xxx")
result = tool.invoke({"query": "What is quantum computing?", "depth": "thorough"})

4 tools available: BrowseAIDevSearchTool, BrowseAIDevAnswerTool (verified), BrowseAIDevExtractTool, BrowseAIDevCompareTool.

MCP Server (Claude Desktop, Cursor, Windsurf)

npx browseai-dev setup

Or manually add to your MCP config:

{
  "mcpServers": {
    "browseai-dev": {
      "command": "npx",
      "args": ["-y", "browseai-dev"],
      "env": {
        "SERP_API_KEY": "your-search-key",
        "OPENROUTER_API_KEY": "your-llm-key",
        "BROWSE_API_KEY": "bai_xxx"
      }
    }
  }
}

BROWSE_API_KEY is optional for search/answer but required for Research Memory (sessions).

REST API

# With your own keys (BYOK — free, no limits)
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "X-Tavily-Key: tvly-xxx" \
  -H "X-OpenRouter-Key: sk-or-xxx" \
  -d '{"query": "What is quantum computing?"}'

# Thorough mode (auto-retries if confidence < 60%)
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "X-Tavily-Key: tvly-xxx" \
  -H "X-OpenRouter-Key: sk-or-xxx" \
  -d '{"query": "What is quantum computing?", "depth": "thorough"}'

# Deep mode (multi-step reasoning — requires BAI key)
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "Compare CRISPR approaches", "depth": "deep"}'

# With a BrowseAI Dev API key
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{"query": "What is quantum computing?"}'

Self-Host

git clone https://github.com/BrowseAI-HQ/BrowseAI-Dev.git
cd BrowseAI-Dev
pnpm install
cp .env.example .env
# Fill in: SERP_API_KEY, OPENROUTER_API_KEY
pnpm dev

API Keys

No account needed — MCP, Python SDK, and REST API all work with BYOK (bring your own keys) out of the box. No signup, no limits. Sign in for free to unlock premium verification features.

Four ways to authenticate:

| Method | How | Verification | Limits |
| --- | --- | --- | --- |
| BrowseAI Dev API Key (Free) | Authorization: Bearer bai_xxx | Full premium — NLI, multi-provider, multi-pass consistency | Generous quota with graceful BM25 fallback |
| BrowseAI Dev API Key (Pro) | Authorization: Bearer bai_xxx | Full premium — unlimited, no fallback | Unlimited + priority queue, managed keys, team seats |
| BYOK (MCP, SDK, API) | X-Tavily-Key + X-OpenRouter-Key headers | BM25 keyword verification | Unlimited, free (search/answer only — no sessions) |
| Demo (website) | No auth needed | BM25 keyword verification | 5 queries/hour per IP |

Sign in at browseai.dev to create a free BAI key — it bundles your keys into one key and unlocks the premium verification pipeline (NLI semantic matching, multi-provider search, consistency checking) with a generous daily quota (100 premium queries/day, or ~33 deep queries/day at 3x cost each). When the quota is reached, queries gracefully fall back to BM25 keyword verification (or deep falls back to thorough) — still works, just basic matching. Quota resets every 24 hours. Pro removes all limits. BYOK works for all packages (MCP, Python SDK, REST API) without an account.
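The quota and fallback behavior described above can be sketched as a small decision function. The costs (1 for premium, 3 for deep) and the fallback order match the docs; the function itself is illustrative, not the server code.

```python
# Sketch of quota accounting: deep costs 3 units, other premium queries
# cost 1; when the quota can't cover the cost, depth degrades
# (deep -> thorough, premium verification -> BM25 fallback).

def effective_depth(requested, used, limit, has_bai_key=True):
    cost = 3 if requested == "deep" else 1
    if requested == "deep" and (not has_bai_key or used + cost > limit):
        requested, cost = "thorough", 1        # deep falls back to thorough
    premium = has_bai_key and used + cost <= limit
    return requested, premium
```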

API responses include quota info when using a BAI key:

{
  "success": true,
  "result": { ... },
  "quota": { "used": 12, "limit": 100, "premiumActive": true }
}

Project Structure

/apps/api              Fastify API server (port 3001)
/apps/mcp              MCP server (stdio transport, npm: browseai-dev)
/packages/shared       Shared types, Zod schemas, constants
/packages/python-sdk   Python SDK (PyPI: browseaidev)
/src                   React frontend (Vite, port 8080)
/supabase              Database migrations

API Endpoints

| Endpoint | Description |
| --- | --- |
| POST /browse/search | Search the web |
| POST /browse/open | Fetch and parse a page |
| POST /browse/extract | Extract structured claims from a page |
| POST /browse/answer | Full pipeline: search + extract + cite. depth: "fast", "thorough", or "deep" |
| POST /browse/answer/stream | Streaming answer via SSE — real-time token streaming + progress events |
| POST /browse/compare | Compare raw LLM vs evidence-backed answer |
| GET /browse/share/:id | Get a shared result |
| GET /browse/stats | Total queries answered |
| GET /browse/sources/top | Top cited source domains |
| GET /browse/analytics/summary | Usage analytics (authenticated) |
| POST /session | Create a research session |
| POST /session/:id/ask | Research with session memory (recalls + stores claims) |
| POST /session/:id/recall | Query session knowledge without new search |
| GET /session/:id/knowledge | Export all session claims |
| POST /session/:id/share | Share a session publicly (returns shareId) |
| GET /session/share/:shareId | View a shared session (public, no auth) |
| POST /session/share/:shareId/fork | Fork a shared session into your account |
| GET /session/:id | Get session details |
| GET /sessions | List your sessions (authenticated) |
| DELETE /session/:id | Delete a session (authenticated) |
| POST /browse/feedback | Submit feedback on a result (good/bad/wrong) |
| GET /browse/learning/stats | Self-learning engine stats |
| GET /user/stats | Your query stats (authenticated) |
| GET /user/history | Your query history (authenticated) |

MCP Tools

| Tool | Description |
| --- | --- |
| browse_search | Search the web for information on any topic |
| browse_open | Fetch and parse a web page into clean text |
| browse_extract | Extract structured claims from a page |
| browse_answer | Full pipeline: search + extract + cite. depth: "fast", "thorough", or "deep" |
| browse_compare | Compare raw LLM vs evidence-backed answer |
| browse_session_create | Create a research session (persistent memory) |
| browse_session_ask | Research within a session (recalls prior knowledge) |
| browse_session_recall | Query session knowledge without new web search |
| browse_session_share | Share a session publicly (returns share URL) |
| browse_session_knowledge | Export all claims from a session |
| browse_session_fork | Fork a shared session to continue the research |
| browse_feedback | Submit feedback on a result to improve accuracy |

Python SDK

| Method | Description |
| --- | --- |
| client.search(query) | Search the web |
| client.open(url) | Fetch and parse a page |
| client.extract(url, query=) | Extract claims from a page |
| client.ask(query, depth=) | Full pipeline with citations. depth: "fast", "thorough", or "deep" |
| client.compare(query) | Raw LLM vs evidence-backed |
| client.session(name) | Create a research session |
| session.ask(query, depth=) | Research with memory recall |
| session.recall(query) | Query session knowledge |
| session.knowledge() | Export all session claims |
| session.share() | Share session publicly (returns shareId + URL) |
| client.get_session(id) | Resume an existing session by ID |
| client.list_sessions() | List all your sessions |
| client.fork_session(share_id) | Fork a shared session into your account |
| session.delete() | Delete a session |
| client.feedback(result_id, rating) | Submit feedback (good/bad/wrong) to improve accuracy |

Async support: AsyncBrowseAIDev with the same API.

Enterprise Search Providers

Use BrowseAI Dev with your own data sources instead of — or alongside — public web search. Supports Elasticsearch, Confluence, and custom endpoints with optional zero data retention for compliance.

# Elasticsearch
result = client.ask("What is our refund policy?", search_provider={
    "type": "elasticsearch",
    "endpoint": "https://es.internal.company.com/kb/_search",
    "authHeader": "Bearer es-token-xxx",
    "index": "docs",
})

# Confluence
result = client.ask("PCI compliance process?", search_provider={
    "type": "confluence",
    "endpoint": "https://company.atlassian.net/wiki/rest/api",
    "authHeader": "Basic base64-creds",
    "spaceKey": "ENG",
})

# Zero data retention (nothing stored, cached, or logged)
result = client.ask("Patient protocols", search_provider={
    "type": "elasticsearch",
    "endpoint": "https://es.hipaa.company.com/medical/_search",
    "authHeader": "Bearer token",
    "dataRetention": "none",
})
# REST API — enterprise search
curl -X POST https://browseai.dev/api/browse/answer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer bai_xxx" \
  -d '{
    "query": "What is our refund policy?",
    "searchProvider": {
      "type": "elasticsearch",
      "endpoint": "https://es.internal.company.com/kb/_search",
      "authHeader": "Bearer es-token-xxx",
      "index": "docs"
    }
  }'

Response Structure

Every answer includes structured fields for programmatic trust decisions:

{
  "answer": "Quantum computing uses qubits...",
  "confidence": 0.82,
  "shareId": "abc123def456",
  "effectiveDepth": "thorough",
  "claims": [
    {
      "claim": "Qubits can exist in superposition",
      "sources": ["https://en.wikipedia.org/wiki/Qubit"],
      "verified": true,
      "verificationScore": 0.87,
      "consensusCount": 3,
      "consensusLevel": "strong",
      "nliScore": { "entailment": 0.92, "contradiction": 0.03, "neutral": 0.05, "label": "entailment" }
    }
  ],
  "sources": [
    {
      "url": "https://en.wikipedia.org/wiki/Qubit",
      "title": "Qubit - Wikipedia",
      "domain": "en.wikipedia.org",
      "quote": "A qubit is the basic unit of quantum information...",
      "verified": true,
      "authority": 0.70
    }
  ],
  "contradictions": [
    {
      "claimA": "Quantum computers are faster for all tasks",
      "claimB": "Quantum advantage only applies to specific problems",
      "topic": "quantum computing performance",
      "nliConfidence": 0.89
    }
  ],
  "reasoningSteps": [
    { "step": 1, "query": "quantum computing basics", "gapAnalysis": "Initial research pass", "claimCount": 8, "confidence": 0.65 },
    { "step": 2, "query": "quantum computing vs classical comparison", "gapAnalysis": "Missing classical vs quantum comparison", "claimCount": 14, "confidence": 0.82 }
  ],
  "trace": [
    { "step": "Search Web", "duration_ms": 423, "detail": "5 results" },
    { "step": "Fetch Pages", "duration_ms": 1205, "detail": "4 pages" }
  ],
  "quota": { "used": 12, "limit": 50, "premiumActive": true }
}

Key fields:

  • confidence — 7-factor evidence-based score (0-1), not LLM self-assessed
  • shareId — unique ID for sharing this result (use with /browse/share/:id)
  • effectiveDepth — actual depth used ("fast", "thorough", or "deep") — may differ from requested depth due to fallback
  • claims[].verified — whether the claim was verified against source text
  • claims[].consensusLevel — "strong" (3+ sources), "moderate", or "weak"
  • claims[].nliScore — NLI semantic entailment breakdown (when HF_API_KEY set)
  • contradictions — detected conflicts between claims (with NLI confidence)
  • reasoningSteps — deep mode only: multi-step research iterations with gap analysis
  • trace — execution timeline for debugging and monitoring
  • quota — premium quota usage (BAI key users only): used, limit, premiumActive
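Putting the key fields together, a client might reduce a response to a one-line trust summary. The field names match the response above; the summary shape is illustrative.

```python
# Example of condensing the documented response fields into a summary
# an agent can act on. The policy here is illustrative.

def summarize(resp):
    strong = sum(1 for c in resp.get("claims", [])
                 if c.get("consensusLevel") == "strong")
    return {"confidence": resp["confidence"],
            "strong_claims": strong,
            "has_contradictions": bool(resp.get("contradictions"))}

resp = {"confidence": 0.82,
        "claims": [{"consensusLevel": "strong"}, {"consensusLevel": "weak"}],
        "contradictions": [{"topic": "quantum computing performance"}]}
summary = summarize(resp)
```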

Examples

See the examples/ directory for ready-to-run agent recipes:

Agent Recipes

| Example | Description |
| --- | --- |
| research-agent.py | Simple research agent with citations |
| deep-research-agent.py | Multi-step deep reasoning with gap analysis |
| streaming-agent.py | Real-time SSE streaming with progress events |
| contradiction-detector.py | Surface contradictions across sources |
| enterprise-search.py | Custom data sources + zero retention mode |
| code-research-agent.py | Research libraries/docs before writing code |
| hallucination-detector.py | Compare raw LLM vs evidence-backed answers |
| langchain-agent.py | BrowseAI Dev as a LangChain tool |
| crewai-research-team.py | Multi-agent research team with CrewAI |
| research-session.py | Research sessions with persistent memory |

Tutorials

| Tutorial | What You'll Build |
| --- | --- |
| coding-agent/ | Agent that researches before writing code — never recommends deprecated libraries |
| support-agent/ | Agent that verifies answers before responding — escalates when confidence is low |
| content-agent/ | Agent that writes blog posts where every stat has a citation |
| fact-checker-bot/ | Discord bot that verifies any claim with !verify and !compare |
| is-this-true/ | Web app — paste any sentence, get a confidence score and sources |
| debate-settler/ | CLI tool — two claims battle it out, evidence decides the winner |
| docs-verifier/ | Verify every factual claim in your README or docs |
| podcast-prep/ | Research brief builder for podcast interviews |

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| SERP_API_KEY | Yes | Web search API key (Tavily) |
| OPENROUTER_API_KEY | Yes | LLM API key (OpenRouter) |
| KV_REST_API_URL | No | Vercel KV / Upstash Redis REST URL (falls back to in-memory cache) |
| KV_REST_API_TOKEN | No | Vercel KV / Upstash Redis REST token |
| SUPABASE_URL | No | Supabase project URL |
| SUPABASE_SERVICE_ROLE_KEY | No | Supabase service role key |
| BRAVE_API_KEY | No | Brave Search API key (adds source diversity) |
| HF_API_KEY | No | HuggingFace API token (enables NLI semantic verification) |
| PORT | No | API server port (default: 3001) |

Tech Stack

  • API: Node.js, TypeScript, Fastify, Zod
  • Search: Multi-provider (parallel search across sources)
  • Parsing: @mozilla/readability + linkedom
  • AI: Gemini 2.5 Flash via OpenRouter
  • Caching: Redis or in-memory with intelligent TTL (time-sensitive queries get shorter TTL)
  • Frontend: React, Tailwind CSS, shadcn/ui, Framer Motion
  • Verification: Hybrid BM25 + NLI semantic entailment
  • MCP: @modelcontextprotocol/sdk
  • Python SDK: httpx, Pydantic
  • Database: Supabase (PostgreSQL)
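The "intelligent TTL" caching note can be illustrated with a simple policy: time-sensitive queries get short-lived cache entries, everything else long-lived ones. The keyword list and durations here are assumptions, not the real heuristic.

```python
# Illustrative TTL policy: shorter cache lifetime for queries that
# look time-sensitive. Keywords and durations are assumed values.

TIME_SENSITIVE = ("today", "latest", "current", "price", "news")

def cache_ttl(query, short=300, long=86400):
    """Return a TTL in seconds: 5 minutes if time-sensitive, else 1 day."""
    q = query.lower()
    return short if any(w in q for w in TIME_SENSITIVE) else long
```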

Agent Skills

Pre-built skills that teach AI coding agents (Claude Code, Codex, Cursor, etc.) when and how to use BrowseAI Dev:

npx skills add BrowseAI-HQ/browseAIDev_Skills

| Skill | What it does |
| --- | --- |
| browse-research | Evidence-backed answers with citations and confidence |
| browse-fact-check | Compare raw LLM vs evidence-backed, verify claims |
| browse-extract | Structured claim extraction from URLs |
| browse-sessions | Multi-query research with persistent knowledge |

View all skills →

Community

Contributing

See CONTRIBUTING.md for setup instructions, coding conventions, and PR process.

License

MIT

Related Servers