Research Friend

A friendly helper for AI assistants that need to look things up on the web and manage a local research stash.

Research Friend is an MCP server that gives your AI tools the ability to fetch web pages and search the internet. It uses a real web browser behind the scenes, so it works even with modern websites that rely heavily on JavaScript. It also includes a local “stash” for storing documents, extracting text, and searching across your library.

To make use of all its features, you'll want an MCP client that supports prompts (common) and sampling (less common). We're building Research Friend alongside Chabeau, which supports both.

What can it do?

Fetch web pages with a real browser (including JS-heavy sites)
Fetch PDFs and extract their text content
Search the web via DuckDuckGo or Google
Maintain a local stash of documents for search, listing, and extraction

Getting started

You'll need Node.js version 20 or newer installed on your computer.

1. Install dependencies

Open a terminal in this folder and run:

npm install

2. Install browser support

Research Friend uses Playwright to control a web browser. After installing dependencies, you'll need to install the browser:

npx playwright install chromium

This downloads a copy of Chromium that Playwright will use. It's separate from any browsers you already have installed.

3. Start the server

node src/index.js

The server communicates over stdio (standard input/output), which is how MCP clients connect to it.

Adding to your MCP client

How you add Research Friend depends on which MCP client you're using. Here's a general example of what the configuration might look like:

[[mcp_servers]]
id = "research-friend"
command = "node"
args = ["/path/to/mcp-research-friend/src/index.js"]
transport = "stdio"

Replace /path/to/mcp-research-friend with the actual path to this folder on your computer.

Tools

Web tools

friendly_web_fetch

Fetches a web page and returns its content. By default, it returns markdown with links preserved, which works well for LLMs. It uses Readability to extract the main content, stripping navigation, ads, and similar clutter. For PDFs, pagination, or searching within content, use friendly_web_extract instead.

Parameters:

url (required) - The web address to fetch
outputFormat - Output format: markdown (default), text, or html
waitMs - Extra time to wait after the page loads, in case content appears slowly
timeoutMs - How long to wait before giving up (default: 15 seconds)
maxChars - Maximum amount of content to return (default: 40,000 characters)
includeHtml - Set to true to also return the raw HTML alongside the content
headless - Set to false to see the browser window (useful for debugging)

Returns:

url - The URL that was requested
finalUrl - The URL after any redirects
title - The page title
content - The extracted content (in the requested format)
html - Raw HTML (only if includeHtml is true)
meta - Page metadata (description, author, published time, etc.)
fetchedAt - ISO timestamp of when the page was fetched
truncated - Whether the content was truncated to fit maxChars
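
As a rough illustration, an MCP client invokes this tool by sending a JSON-RPC tools/call request. The sketch below builds one such request. The helper name buildToolCall and the example URL are placeholders, and most MCP clients assemble this message for you.

```javascript
// Hypothetical helper: wraps a tool name and its arguments in an MCP
// JSON-RPC "tools/call" request. Not part of Research Friend itself.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const fetchRequest = buildToolCall(1, "friendly_web_fetch", {
  url: "https://example.com/article", // placeholder URL
  outputFormat: "markdown", // the documented default, shown for clarity
  maxChars: 20000, // lower than the 40,000 default
});

console.log(JSON.stringify(fetchRequest, null, 2));
```

Lowering maxChars is one way to keep a long page from filling the conversation; the truncated field in the response then tells you whether anything was cut off.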

friendly_search

Searches the web and returns a list of results.

Parameters:

query (required) - What to search for
engine - Which search engine to use (duckduckgo or google)
maxResults - How many results to return (default: 10, maximum: 50)
timeoutMs - How long to wait before giving up (default: 15 seconds)
headless - Set to false to see the browser window

Returns:

query - The search query that was used
engine - Which search engine was used
results - Array of results, each with title, url, and snippet
searchedAt - ISO timestamp of when the search was performed
fallback_result_html - Raw HTML of the page (only included if no results were found)
debug_info - Diagnostic information about the search attempt

CAPTCHA handling: If a CAPTCHA is detected while running in headless mode, the tool automatically retries with a visible browser window. This gives you a chance to solve the CAPTCHA manually. The debug_info.retried field indicates whether this fallback was used.
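
The headless-first fallback described above can be sketched roughly as follows. This is not the server's actual code: runSearch and the captchaDetected field are hypothetical stand-ins, used only to show the shape of the retry.

```javascript
// Sketch of a headless-first retry. `runSearch` is a hypothetical async
// function supplied by the caller; `captchaDetected` is an assumed field.
async function searchWithCaptchaFallback(runSearch, query) {
  const first = await runSearch(query, { headless: true });
  if (!first.captchaDetected) {
    return { ...first, debug_info: { retried: false } };
  }
  // A CAPTCHA appeared in headless mode: retry once with a visible
  // window so a person can solve it.
  const second = await runSearch(query, { headless: false });
  return { ...second, debug_info: { retried: true } };
}
```

The retry happens at most once, so a CAPTCHA that is never solved ends in an ordinary result (or timeout) rather than a loop.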

friendly_web_extract

Extracts content from a URL. Auto-detects whether the URL points to a PDF or a web page and handles each appropriately.

Parameters:

url (required) - The URL to fetch (PDF or web page)
maxChars - Maximum amount of text to return (default: 40,000 characters)
offset - Character position to start from (default: 0). Use this to paginate through large content.
search - Search for a phrase and return matches with surrounding context instead of full content
contextChars - Characters of context around each search match (default: 200)
waitMs - Extra time to wait after page load for dynamic content (web pages only)
timeoutMs - How long to wait before giving up (default: 15 seconds, web pages only)
headless - Set to false to see the browser window (web pages only)

Returns (normal mode):

url - The URL that was requested
contentType - Either pdf or html
title - The page/document title
author - The PDF author (PDFs only, if available)
creationDate - When the PDF was created (PDFs only, if available)
pageCount - Number of pages (PDFs only)
totalChars - Total characters (use with offset to paginate)
offset - The offset that was used
content - The extracted text content
fetchedAt - ISO timestamp
truncated - Whether more content remains after this chunk
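
To show how offset, totalChars, and truncated fit together, here is a small sketch that pages through a string the same way a client might page through a long document. extractChunk is a local stand-in that mimics the documented fields; it does not call the real tool.

```javascript
// Local stand-in that mimics the documented offset/maxChars/truncated
// fields on a plain string. It does not call friendly_web_extract.
function extractChunk(text, offset = 0, maxChars = 40000) {
  const content = text.slice(offset, offset + maxChars);
  return {
    totalChars: text.length,
    offset,
    content,
    truncated: offset + content.length < text.length,
  };
}

// Read everything by advancing offset until truncated is false.
function readAll(text, maxChars) {
  const pages = [];
  let offset = 0;
  let result;
  do {
    result = extractChunk(text, offset, maxChars);
    pages.push(result.content);
    offset += result.content.length;
  } while (result.truncated);
  return pages;
}

console.log(readAll("abcdefghij", 4)); // [ 'abcd', 'efgh', 'ij' ]
```

With the real tool, each loop iteration would be one friendly_web_extract call with the next offset.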

Returns (search mode):

url, contentType, title, totalChars, fetchedAt - Same as above
search - The search phrase that was used
matchCount - Number of matches found
matches - Array of matches, each with position, context, prefix, and suffix
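
The shape of a search-mode match can be illustrated with a short sketch. It assumes that prefix and suffix are the contextChars characters on either side of the match and that context is the three pieces joined together; the real tool may define these differently, and this version matches case-sensitively for simplicity.

```javascript
// Illustrative only: finds every occurrence of `phrase` and returns the
// surrounding text. Field meanings are assumptions, not the tool's spec.
function findMatches(text, phrase, contextChars = 200) {
  const matches = [];
  if (phrase.length === 0) return matches;
  let position = text.indexOf(phrase);
  while (position !== -1) {
    const end = position + phrase.length;
    const prefix = text.slice(Math.max(0, position - contextChars), position);
    const suffix = text.slice(end, end + contextChars);
    matches.push({ position, prefix, suffix, context: prefix + phrase + suffix });
    position = text.indexOf(phrase, end); // continue after this match
  }
  return matches;
}

const wineMatches = findMatches("red wine and white wine", "wine", 4);
console.log(wineMatches.length); // 2
console.log(wineMatches[0].context); // red wine and
```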

friendly_web_ask

Fetches a URL (PDF or web page) and has an LLM answer questions about it. Auto-detects content type. The document is processed in a separate context, keeping your main conversation compact.

Parameters:

url (required) - The URL to fetch (PDF or web page)
ask (required) - Question or instruction for the LLM (summarize, extract info, answer questions, etc.)
askMaxInputTokens - Maximum input tokens per LLM call (default: 150,000)
askMaxOutputTokens - Maximum output tokens per LLM call (default: 4,096)
askTimeout - Timeout in milliseconds (default: 300,000 = 5 minutes)
askSplitAndSynthesize - For large documents: split into chunks, process each, then synthesize results (default: false). Warning: consumes many tokens.
waitMs - Extra time to wait after page load for dynamic content (web pages only)
timeoutMs - How long to wait before giving up (default: 15 seconds, web pages only)
headless - Set to false to see the browser window (web pages only)

Returns:

url - The URL that was requested
contentType - Either pdf or html
title - The page/document title
totalChars - Total characters in the document
ask - The instruction that was given
answer - The LLM's response
model - The model that generated the response
chunksProcessed - Number of chunks processed (1 for small documents, more when using askSplitAndSynthesize)
fetchedAt - ISO timestamp

Ask mode uses MCP sampling to have an LLM process the document with any instruction. This is useful for:

Large documents that would overwhelm context
Keeping token costs down on the main conversation

When askSplitAndSynthesize is enabled, documents exceeding askMaxInputTokens are automatically split into overlapping chunks. Each chunk is processed separately, and the results are synthesized into a single coherent answer. The final response is provided in the same language as your request, regardless of the document's language.
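
The idea of overlapping chunks can be sketched as below. This is only an illustration: the tool sizes chunks by tokens, whereas this sketch counts characters, and the function name and overlap value are assumptions.

```javascript
// Split text into chunks of at most `size` characters, where each chunk
// repeats the last `overlap` characters of the previous one.
// Characters stand in for tokens here to keep the sketch self-contained.
function splitWithOverlap(text, size, overlap) {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("need size > 0 and 0 <= overlap < size");
  }
  const pieces = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return pieces;
}

console.log(splitWithOverlap("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

The overlap means a sentence that straddles a chunk boundary appears whole in at least one chunk, at the cost of processing some text twice.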

Document stash

The stash is a local, searchable library of documents. It supports PDFs, HTML files, and plaintext (Markdown/TXT). When you add a document, Research Friend stores the original file, extracts text (for PDFs/HTML), and saves metadata in a local database. Searches use ripgrep under the hood for fast, phrase-aware matching.

Stash location

The stash lives under ~/.research-friend/:

inbox/ - Drop files here to be processed
store/ - Organized document storage and extracted text
stash.db - Metadata database

Supported file types

PDF: .pdf (text extracted)
HTML: .html, .htm (text extracted)
Markdown: .md, .markdown (stored as plaintext)
Text: .txt (stored as plaintext)
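
The extension list above amounts to a small lookup table. The sketch below shows one way to express it; the type labels on the right-hand side are illustrative and may not match the fileType values the stash actually stores.

```javascript
// Extension-to-type lookup based on the list above. The type labels are
// illustrative assumptions, not the stash's real fileType values.
const EXTENSION_TYPES = {
  ".pdf": "pdf",
  ".html": "html",
  ".htm": "html",
  ".md": "markdown",
  ".markdown": "markdown",
  ".txt": "text",
};

function detectFileType(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".txt"
  return EXTENSION_TYPES[filename.slice(dot).toLowerCase()] ?? null;
}

console.log(detectFileType("Report.PDF")); // pdf
console.log(detectFileType("notes.docx")); // null
```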

Stash tools

stash_open_inbox

Open the stash inbox folder in your file manager for easier drag-and-drop.

Returns:

opened - Whether the folder open request was sent
inboxPath - Absolute path to the inbox
command - OS command used
args - Command arguments used

stash_process_inbox

Process files in inbox/, classify them into topics, extract text, and store results. For long documents, classification uses sampled sections (start/middle/end plus a few random chunks) to improve topic accuracy.

Returns:

processed - Array of filenames successfully processed
errors - Any errors encountered
documents - Array of created document records
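
The section sampling used for classifying long documents (start, middle, end, plus a few random chunks) might look something like the sketch below. The function name, section size, and random count are all assumptions; the random source is a parameter so the behaviour can be made repeatable.

```javascript
// Pick the start, middle, and end of a long text plus a few random
// sections. All names and sizes here are illustrative assumptions.
function sampleSections(text, sectionChars, randomCount, random = Math.random) {
  if (text.length <= sectionChars * (3 + randomCount)) {
    return [text]; // short enough to classify as a whole
  }
  const lastStart = text.length - sectionChars;
  const starts = [0, Math.floor(lastStart / 2), lastStart];
  for (let i = 0; i < randomCount; i++) {
    starts.push(Math.floor(random() * (lastStart + 1)));
  }
  // Keep sections in document order; random picks may overlap others.
  return starts
    .sort((a, b) => a - b)
    .map((start) => text.slice(start, start + sectionChars));
}

const sections = sampleSections("0123456789".repeat(10), 5, 1, () => 0.5);
console.log(sections.length); // 4
```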

reindex_stash

Regenerate summaries, re-allocate topics, and update store metadata for stashed documents. If ids is omitted or empty, all documents are reindexed.

Parameters:

ids - Document IDs to reindex (optional)

Returns:

reindexed - Document IDs reindexed
errors - Any errors encountered
documents - Array of updated document records

stash_list

List documents in the stash.

Parameters:

topic - Filter to a topic (optional)
limit - Max results (default: 50)
offset - Pagination offset (default: 0)

Returns:

type - all or topic
totalDocuments - Total documents (only when type is all)
count - Results returned after pagination
offset - Pagination offset used
limit - Pagination limit used
topics - Summary of known topics and doc counts
documents - Document list with metadata (includes isPrimary when listing a topic)

stash_search

Search filenames and content across the stash. All search terms must be present (AND logic). Filename matches are listed first. Use quotes for exact phrases.

Parameters:

query (required) - Search terms. Use quotes for phrases: "sparkling wine"
topic - Filter to a topic (optional)
ids - Filter to specific document IDs (optional)
limit - Max documents to return (default: 20)
offset - Pagination offset (default: 0)
maxMatchesPerDoc - Max matches per document (default: 50)
context - Lines of context around each match (default: 1, max: 5). Controls both how close terms must appear to match AND how much surrounding text is returned.

Returns:

totalMatches - Total documents matched before pagination
count - Results returned after pagination
results - Array of documents, each with:
- id, filename, fileType, summary, charCount, createdAt
- matchType - filename, content, or filename+content
- matches - Array of { line, context } for each match location

Use the line values with stash_extract to jump directly to match locations.
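
To make the query rules concrete, here is a simplified sketch of quoted phrases and AND logic. It is not how the server searches (that goes through ripgrep): it matches case-insensitively against a single string and ignores the proximity rule controlled by context.

```javascript
// Split a query into terms, treating "quoted text" as a single phrase.
function parseQuery(query) {
  const terms = [];
  const pattern = /"([^"]+)"|(\S+)/g;
  for (const match of query.matchAll(pattern)) {
    terms.push((match[1] ?? match[2]).toLowerCase());
  }
  return terms;
}

// AND logic: every term or phrase must appear somewhere in the text.
function matchesAll(text, query) {
  const haystack = text.toLowerCase();
  return parseQuery(query).every((term) => haystack.includes(term));
}

console.log(parseQuery('"sparkling wine" France')); // [ 'sparkling wine', 'france' ]
console.log(matchesAll("Sparkling wine from France", '"sparkling wine" France')); // true
console.log(matchesAll("French wine, sparkling", '"sparkling wine" France')); // false
```

The last call fails because both words are present but not as the exact phrase, and "France" does not appear at all.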

stash_extract

Extract content from a stashed document for reading. Use line numbers from stash_search results to jump directly to matches.

Parameters:

id (required) - Document ID from stash_list/stash_search
maxChars - Maximum amount of text to return (default: 40,000 characters)
offset - Character position to start from (mutually exclusive with line)
line - Line number to start from (mutually exclusive with offset)

Returns:

id, filename, fileType, summary - Document metadata
totalChars - Total characters in the document
offset - Character offset where the returned content starts (also included, for reference, when line was used)
line - Line number (only when line parameter was used)
content - The extracted text content
truncated - Whether more content remains after this chunk
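
The relationship between line and offset can be shown with a short sketch that converts a line number into the character position where that line begins. It assumes line numbers start at 1, which this README does not state explicitly.

```javascript
// Convert a line number into the character offset where that line
// starts. Assumes 1-based line numbers (an assumption, not documented).
function lineToOffset(text, line) {
  if (!Number.isInteger(line) || line < 1) {
    throw new RangeError("line must be a positive integer");
  }
  let offset = 0;
  for (let current = 1; current < line; current++) {
    const newline = text.indexOf("\n", offset);
    if (newline === -1) return text.length; // requested line is past the end
    offset = newline + 1;
  }
  return offset;
}

const sample = "alpha\nbeta\ngamma";
console.log(lineToOffset(sample, 2)); // 6
console.log(sample.slice(lineToOffset(sample, 3))); // gamma
```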

stash_ask

Have an LLM answer questions about a stashed document. The document is processed in a separate context, keeping your main conversation compact.

Parameters:

id (required) - Document ID from stash_list/stash_search
ask (required) - Question or instruction for the LLM
askMaxInputTokens - Maximum input tokens per LLM call (default: 150,000)
askMaxOutputTokens - Maximum output tokens per LLM call (default: 4,096)
askTimeout - Timeout in milliseconds (default: 300,000 = 5 minutes)
askSplitAndSynthesize - For large documents: split into chunks, process each, then synthesize results (default: false)

Returns:

id, filename, fileType, summary - Document metadata
totalChars - Total characters in the document
ask - The instruction that was given
answer - The LLM's response
model - The model that generated the response
chunksProcessed - Number of chunks processed

Typical flow

Drop files into ~/.research-friend/inbox/
Run stash_process_inbox
Use stash_list to browse topics
Use stash_search to find relevant docs
Use stash_extract to read a specific doc, or stash_ask to ask questions about it
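
Expressed as MCP tool calls, the flow above might look like the sequence below. The document ID, line number, query, and question are placeholders; in practice the ID and line come from earlier stash_list or stash_search results, and your MCP client sends these requests on your behalf.

```javascript
// Hypothetical helper that builds an MCP JSON-RPC "tools/call" request.
function toolCall(id, name, args = {}) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Placeholder values: "DOC_ID" and line 120 stand in for real results.
const flow = [
  toolCall(1, "stash_process_inbox"),
  toolCall(2, "stash_list", { limit: 50 }),
  toolCall(3, "stash_search", { query: '"sparkling wine"', context: 2 }),
  toolCall(4, "stash_extract", { id: "DOC_ID", line: 120 }),
  toolCall(5, "stash_ask", { id: "DOC_ID", ask: "Summarize the key findings." }),
];

for (const request of flow) {
  console.log(request.params.name, JSON.stringify(request.params.arguments));
}
```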

Troubleshooting

"Browser closed unexpectedly" or similar errors

Try reinstalling the browser:

npx playwright install chromium --force

On Linux, you might also need system dependencies:

npx playwright install-deps chromium

The server won't start

Make sure you're using Node.js 20 or newer:

node --version

If your version is older, visit nodejs.org to download a newer one.

License

MIT
