MCP Research Friend
Research tools, including a SQLite-backed document stash
Research Friend
A friendly helper for AI assistants that need to look things up on the web and manage a local research stash.
Research Friend is an MCP server that gives your AI tools the ability to fetch web pages and search the internet. It uses a real web browser behind the scenes, so it works even with modern websites that rely heavily on JavaScript. It also includes a local “stash” for storing documents, extracting text, and searching across your library.
To make use of all its features, you'll want an MCP client that supports prompts (common) and sampling (less common). We're building Research Friend alongside Chabeau, which supports both.
What can it do?
- Fetch web pages with a real browser (including JS-heavy sites)
- Fetch PDFs and extract their text content
- Search the web via DuckDuckGo or Google
- Maintain a local stash of documents for search, listing, and extraction
Getting started
You'll need Node.js version 20 or newer installed on your computer.
1. Install dependencies
Open a terminal in this folder and run:
npm install
2. Install browser support
Research Friend uses Playwright to control a web browser. After installing dependencies, you'll need to install the browser:
npx playwright install chromium
This downloads a copy of Chromium that Playwright will use. It's separate from any browsers you already have installed.
3. Start the server
node src/index.js
The server communicates over stdio (standard input/output), which is how MCP clients connect to it.
Adding to your MCP client
How you add Research Friend depends on which MCP client you're using. Here's a general example of what the configuration might look like:
[[mcp_servers]]
id = "research-friend"
command = "node"
args = ["/path/to/mcp-research-friend/src"]
transport = "stdio"
Replace /path/to/mcp-research-friend with the actual path to this folder on your computer.
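If you want to sanity-check the server outside a full MCP client, a small script can spawn it over stdio, list its tools, and call them. The sketch below is a rough example, not part of this project: it assumes the official MCP TypeScript SDK (@modelcontextprotocol/sdk) and its Client / StdioClientTransport API, so check your own client or SDK for the exact imports and signatures. The tool examples later in this document reuse the connected client from this sketch.

// sketch: connect to Research Friend over stdio and list its tools
// assumes: npm install @modelcontextprotocol/sdk (not part of this project)
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// spawn the server the same way you would run it by hand
const transport = new StdioClientTransport({
  command: "node",
  args: ["/path/to/mcp-research-friend/src/index.js"],
});

const client = new Client({ name: "example-client", version: "0.1.0" }, { capabilities: {} });
await client.connect(transport);

const { tools } = await client.listTools();
console.log(tools.map((t) => t.name)); // friendly_web_fetch, friendly_search, stash_*, ...

// keep `client` around to call tools, then close when done:
// await client.close();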
Tools
Web tools
friendly_web_fetch
Fetches a web page and returns its content. By default, returns markdown with links preserved — ideal for LLMs. Uses Readability to extract the main content (stripping navigation, ads, etc.). For PDFs, pagination, or searching within content, use friendly_web_extract instead.
Parameters:
- url (required) - The web address to fetch
- outputFormat - Output format: markdown (default), text, or html
- waitMs - Extra time to wait after the page loads, in case content appears slowly
- timeoutMs - How long to wait before giving up (default: 15 seconds)
- maxChars - Maximum amount of content to return (default: 40,000 characters)
- includeHtml - Set to true to also return the raw HTML alongside the content
- headless - Set to false to see the browser window (useful for debugging)
Returns:
- url - The URL that was requested
- finalUrl - The URL after any redirects
- title - The page title
- content - The extracted content (in the requested format)
- html - Raw HTML (only if includeHtml is true)
- meta - Page metadata (description, author, published time, etc.)
- fetchedAt - ISO timestamp of when the page was fetched
- truncated - Whether the content was truncated to fit maxChars
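As an illustration, calling this tool from the connection sketch above might look like the following; the URL and argument values are placeholders, and the exact shape of the returned content blocks depends on your client or SDK.

// sketch: fetch a page as markdown (assumes a connected `client` from the earlier sketch)
const fetched = await client.callTool({
  name: "friendly_web_fetch",
  arguments: {
    url: "https://example.com/article", // placeholder URL
    outputFormat: "markdown",
    maxChars: 10000,
  },
});
console.log(fetched.content); // tool output: url, finalUrl, title, content, ...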
friendly_search
Searches the web and returns a list of results.
Parameters:
- query (required) - What to search for
- engine - Which search engine to use (duckduckgo or google)
- maxResults - How many results to return (default: 10, maximum: 50)
- timeoutMs - How long to wait before giving up (default: 15 seconds)
- headless - Set to false to see the browser window
Returns:
- query - The search query that was used
- engine - Which search engine was used
- results - Array of results, each with title, url, and snippet
- searchedAt - ISO timestamp of when the search was performed
- fallback_result_html - Raw HTML of the page (only included if no results were found)
- debug_info - Diagnostic information about the search attempt
CAPTCHA handling:
If a CAPTCHA is detected while running in headless mode, the tool automatically retries with a visible browser window. This gives you a chance to solve the CAPTCHA manually. The debug_info.retried field indicates whether this fallback was used.
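A rough example of a search call, under the same assumptions as the earlier sketch (a connected `client`; the query is a placeholder):

// sketch: search DuckDuckGo for a handful of results
const search = await client.callTool({
  name: "friendly_search",
  arguments: {
    query: "model context protocol servers", // placeholder query
    engine: "duckduckgo",
    maxResults: 5,
  },
});
console.log(search.content); // results: [{ title, url, snippet }, ...]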
friendly_web_extract
Extracts content from a URL. Auto-detects whether the URL points to a PDF or a web page and handles each appropriately.
Parameters:
- url (required) - The URL to fetch (PDF or web page)
- maxChars - Maximum amount of text to return (default: 40,000 characters)
- offset - Character position to start from (default: 0). Use this to paginate through large content.
- search - Search for a phrase and return matches with surrounding context instead of full content
- contextChars - Characters of context around each search match (default: 200)
- waitMs - Extra time to wait after page load for dynamic content (web pages only)
- timeoutMs - How long to wait before giving up (default: 15 seconds, web pages only)
- headless - Set to false to see the browser window (web pages only)
Returns (normal mode):
- url - The URL that was requested
- contentType - Either pdf or html
- title - The page/document title
- author - The PDF author (PDFs only, if available)
- creationDate - When the PDF was created (PDFs only, if available)
- pageCount - Number of pages (PDFs only)
- totalChars - Total characters (use with offset to paginate)
- offset - The offset that was used
- content - The extracted text content
- fetchedAt - ISO timestamp
- truncated - Whether more content remains after this chunk
Returns (search mode):
- url, contentType, title, totalChars, fetchedAt - Same as above
- search - The search phrase that was used
- matchCount - Number of matches found
- matches - Array of matches, each with position, context, prefix, and suffix
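The offset and totalChars fields are what make pagination work: request a chunk, and while truncated is true, request again starting where the previous chunk ended. The sketch below assumes a connected `client` as before, and additionally assumes the tool's result arrives as a single JSON text block; check your client or SDK for the actual result shape.

// sketch: page through a long document with friendly_web_extract
let offset = 0;
let truncated = true;
while (truncated) {
  const res = await client.callTool({
    name: "friendly_web_extract",
    arguments: { url: "https://example.com/report.pdf", offset, maxChars: 40000 }, // placeholder URL
  });
  // assumption: the first content block is JSON text describing the result
  const chunk = JSON.parse(res.content[0].text);
  console.log(chunk.content);     // this slice of the document text
  offset += chunk.content.length; // continue where this chunk ended
  truncated = chunk.truncated;    // false once the end is reached
}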
friendly_web_ask
Fetches a URL (PDF or web page) and has an LLM answer questions about it. Auto-detects content type. The document is processed in a separate context, keeping your main conversation compact.
Parameters:
- url (required) - The URL to fetch (PDF or web page)
- ask (required) - Question or instruction for the LLM (summarize, extract info, answer questions, etc.)
- askMaxInputTokens - Maximum input tokens per LLM call (default: 150,000)
- askMaxOutputTokens - Maximum output tokens per LLM call (default: 4,096)
- askTimeout - Timeout in milliseconds (default: 300,000 = 5 minutes)
- askSplitAndSynthesize - For large documents: split into chunks, process each, then synthesize results (default: false). Warning: consumes many tokens.
- waitMs - Extra time to wait after page load for dynamic content (web pages only)
- timeoutMs - How long to wait before giving up (default: 15 seconds, web pages only)
- headless - Set to false to see the browser window (web pages only)
Returns:
- url - The URL that was requested
- contentType - Either pdf or html
- title - The page/document title
- totalChars - Total characters in the document
- ask - The instruction that was given
- answer - The LLM's response
- model - The model that generated the response
- chunksProcessed - Number of chunks processed (1 for small documents, more when using askSplitAndSynthesize)
- fetchedAt - ISO timestamp
Ask mode uses MCP sampling to have an LLM process the document with any instruction. This is useful for:
- Large documents that would overwhelm context
- Keeping token costs down on the main conversation
When askSplitAndSynthesize is enabled, documents exceeding askMaxInputTokens are automatically split into overlapping chunks. Each chunk is processed separately, and the results are synthesized into a single coherent answer. The final response is provided in the same language as your request, regardless of the document's language.
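For example, summarizing a long PDF without pulling it into the main conversation might look like this sketch (same `client` assumption as earlier; note that the client must support MCP sampling for this tool to work):

// sketch: ask an LLM about a remote PDF in a separate context
const asked = await client.callTool({
  name: "friendly_web_ask",
  arguments: {
    url: "https://example.com/long-report.pdf", // placeholder URL
    ask: "Summarize the key findings in five bullet points",
    askSplitAndSynthesize: true, // chunk, process, and synthesize; uses many tokens
  },
});
console.log(asked.content); // answer, model, chunksProcessed, ...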
Document stash
The stash is a local, searchable library of documents. It supports PDFs, HTML files, and plaintext (Markdown/TXT). When you add a document, Research Friend stores the original file, extracts text (for PDFs/HTML), and saves metadata in a local database. Searches use ripgrep under the hood for fast, phrase-aware matching.
Stash location
The stash lives under ~/.research-friend/:
- inbox/ - Drop files here to be processed
- store/ - Organized document storage and extracted text
- stash.db - Metadata database
Supported file types
- PDF: .pdf (text extracted)
- HTML: .html, .htm (text extracted)
- Markdown: .md, .markdown (stored as plaintext)
- Text: .txt (stored as plaintext)
Stash tools
stash_open_inbox
Open the stash inbox folder in your file manager for easier drag-and-drop.
Returns:
- opened - Whether the folder open request was sent
- inboxPath - Absolute path to the inbox
- command - OS command used
- args - Command arguments used
stash_process_inbox
Process files in inbox/, classify them into topics, extract text, and store results.
For long documents, classification uses sampled sections (start/middle/end plus a few random chunks) to improve topic accuracy.
Returns:
- processed - Array of filenames successfully processed
- errors - Any errors encountered
- documents - Array of created document records
reindex_stash
Regenerate summaries, re-allocate topics, and update store metadata for stashed documents. If ids is omitted or empty, all documents are reindexed.
Parameters:
- ids - Document IDs to reindex (optional)
Returns:
- reindexed - Document IDs reindexed
- errors - Any errors encountered
- documents - Array of updated document records
stash_list
List documents in the stash.
Parameters:
- topic - Filter to a topic (optional)
- limit - Max results (default: 50)
- offset - Pagination offset (default: 0)
Returns:
- type - all or topic
- totalDocuments - Total documents (only when type is all)
- count - Results returned after pagination
- offset - Pagination offset used
- limit - Pagination limit used
- topics - Summary of known topics and doc counts
- documents - Document list with metadata (includes isPrimary when listing a topic)
stash_search
Search filenames and content across the stash. All search terms must be present (AND logic). Filename matches are listed first. Use quotes for exact phrases.
Parameters:
- query (required) - Search terms. Use quotes for phrases: "sparkling wine"
- topic - Filter to a topic (optional)
- ids - Filter to specific document IDs (optional)
- limit - Max documents to return (default: 20)
- offset - Pagination offset (default: 0)
- maxMatchesPerDoc - Max matches per document (default: 50)
- context - Lines of context around each match (default: 1, max: 5). Controls both how close terms must appear to match and how much surrounding text is returned.
Returns:
- totalMatches - Total documents matched before pagination
- count - Results returned after pagination
- results - Array of documents, each with:
  - id, filename, fileType, summary, charCount, createdAt
  - matchType - filename, content, or filename+content
  - matches - Array of { line, context } for each match location
Use the line values with stash_extract to jump directly to match locations.
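A sketch of a phrase search, under the same `client` assumption as before (the query and options are placeholders):

// sketch: quoted phrase plus an extra term; every term must match (AND)
const hits = await client.callTool({
  name: "stash_search",
  arguments: {
    query: '"sparkling wine" production',
    limit: 5,
    context: 2, // controls how close terms must appear and how much context comes back
  },
});
console.log(hits.content); // per-document matches with { line, context }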
stash_extract
Extract content from a stashed document for reading. Use line numbers from stash_search results to jump directly to matches.
Parameters:
- id (required) - Document ID from stash_list / stash_search
- maxChars - Maximum amount of text to return (default: 40,000 characters)
- offset - Character position to start from (mutually exclusive with line)
- line - Line number to start from (mutually exclusive with offset)
Returns:
- id, filename, fileType, summary - Document metadata
- totalChars - Total characters in the document
- offset - Character offset (included for reference when using line)
- line - Line number (only when the line parameter was used)
- content - The extracted text content
- truncated - Whether more content remains after this chunk
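To jump straight to a hit found by stash_search, pass the match's line number. A sketch under the same assumptions; the document ID and line number here are made up, so use values actually returned by stash_search:

// sketch: read a stashed document starting at a line reported by stash_search
const extracted = await client.callTool({
  name: "stash_extract",
  arguments: {
    id: 42,    // hypothetical document ID from stash_list / stash_search
    line: 318, // hypothetical line number from a search match
    maxChars: 8000,
  },
});
console.log(extracted.content); // text starting at that line, plus the truncated flag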
stash_ask
Have an LLM answer questions about a stashed document. The document is processed in a separate context, keeping your main conversation compact.
Parameters:
- id (required) - Document ID from stash_list / stash_search
- ask (required) - Question or instruction for the LLM
- askMaxInputTokens - Maximum input tokens per LLM call (default: 150,000)
- askMaxOutputTokens - Maximum output tokens per LLM call (default: 4,096)
- askTimeout - Timeout in milliseconds (default: 300,000 = 5 minutes)
- askSplitAndSynthesize - For large documents: split into chunks, process each, then synthesize results (default: false)
Returns:
- id, filename, fileType, summary - Document metadata
- totalChars - Total characters in the document
- ask - The instruction that was given
- answer - The LLM's response
- model - The model that generated the response
- chunksProcessed - Number of chunks processed
Typical flow
1. Drop files into ~/.research-friend/inbox/
2. Run stash_process_inbox
3. Use stash_list to browse topics
4. Use stash_search to find relevant docs
5. Use stash_extract to read a specific doc, or stash_ask to ask questions about it
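From a client's point of view, that flow might look roughly like the sketch below (same assumptions as the earlier sketches: a connected `client`, files already dropped into the inbox, and a client that supports sampling for the stash_ask step; the query and document ID are placeholders):

// sketch: process the inbox, browse, search, then ask about one document
await client.callTool({ name: "stash_process_inbox", arguments: {} });

const listing = await client.callTool({ name: "stash_list", arguments: {} });
console.log(listing.content); // topics and document metadata

const found = await client.callTool({
  name: "stash_search",
  arguments: { query: '"carbon capture"' }, // placeholder phrase
});
console.log(found.content);

const summary = await client.callTool({
  name: "stash_ask",
  arguments: { id: 42, ask: "Give a three-sentence summary" }, // hypothetical ID
});
console.log(summary.content); // answer, model, chunksProcessed, ...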
Troubleshooting
"Browser closed unexpectedly" or similar errors
Try reinstalling the browser:
npx playwright install chromium --force
On Linux, you might also need system dependencies:
npx playwright install-deps chromium
The server won't start
Make sure you're using Node.js 20 or newer:
node --version
If your version is older, visit nodejs.org to download a newer one.
License
MIT