A RAG-based Q&A server using a vector store built from Gemini CLI documentation.
This project builds a standalone RAG service, transforming the static gemini-cli documentation into a dynamic, queryable tool. The service exposes this knowledge via a protocol (MCP), making it accessible to any integrated client, so environments like gemini-cli, VS Code, or Cursor can give developers instant, accurate answers in natural language, directly within their workflow. This accelerates learning and lets you intuitively leverage the tool's full potential.
The project implements a RAG pipeline: it ingests the `gemini-cli/docs` directory and its sub-directories, processes the markdown, and builds an `SKLearnVectorStore` from it. The system is composed of the following parts:
- `extract.py`: walks through the `gemini-cli/docs` directory, finds all `.md` files, and concatenates their content into a single `gemini_cli_docs.txt` file.
- `create_vectorstore.py`: loads the `gemini_cli_docs.txt` file, splits it into chunks, and creates a `gemini_cli_vectorstore.parquet` file using `HuggingFaceEmbeddings` and `SKLearnVectorStore`.
- `gemini_cli_mcp.py`: runs a `FastMCP` server that loads the vector store and exposes two endpoints:
  - `gemini_cli_query_tool(query: str)`: a tool that takes a user query, retrieves relevant documents from the vector store, and returns them.
  - `docs://gemini-cli/full`: a resource that returns the entire content of the `gemini_cli_docs.txt` file.
- `gemini-cli/`: the official Gemini CLI, which can be configured to use the MCP server.

You need a local `gemini-cli` installation. If you don't have it, you can clone the official repository:

```bash
git clone https://github.com/google-gemini/gemini-cli.git
```
Clone the repository:

```bash
git clone https://github.com/your-username/gemini-cli-rag-mcp.git
cd gemini-cli-rag-mcp
```

Install the Python dependencies:

```bash
pip install -r requirements.txt
```
Prepare the documentation data by running the `extract.py` script to gather all the markdown documentation into a single file:

```bash
python extract.py
```

Create the vector store by running the `create_vectorstore.py` script on the documentation file:

```bash
python create_vectorstore.py
```
Before running with Docker, try running the MCP server in dev mode and test it:

```bash
mcp dev gemini_cli_mcp.py
```

In the Command field type `python`, in the Arguments field type `gemini_cli_mcp.py`, and press Connect.
The most efficient way to run the MCP server is with Docker Compose. This starts a container in the background and keeps it ready for Gemini CLI to connect to.
```bash
docker-compose up -d
```
The container will keep running, but the Python MCP script itself will only be executed on-demand by Gemini CLI.
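For reference, a minimal `docker-compose.yml` consistent with this setup might look like the sketch below. The service name and build details are assumptions, not the project's actual file; the container name matches the `settings.json` configuration that follows.

```yaml
# Hypothetical sketch of docker-compose.yml; the project's actual file may differ.
services:
  gemini-cli-mcp:
    build: .                                  # image containing Python and the project code
    container_name: gemini-cli-mcp-container  # name referenced by `docker exec` in settings.json
    stdin_open: true                          # keep stdin open so the stdio transport can attach
    command: tail -f /dev/null                # idle process keeps the container running
```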
To make Gemini CLI aware of your local MCP server, you need to create a configuration file. Inside the `.gemini` directory, add the following content to the `settings.json` file:
```json
{
  "mcpServers": {
    "local_rag_server": {
      "command": "docker",
      "args": [
        "exec",
        "-i",
        "gemini-cli-mcp-container",
        "python",
        "gemini_cli_mcp.py"
      ]
    }
  }
}
```
This configuration tells Gemini CLI how to launch your MCP server using `docker exec`.
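You can sanity-check the wiring by running the same command the CLI will issue. If the container is up, the MCP server should start and wait for input on stdin (press Ctrl+C to exit):

```bash
docker exec -i gemini-cli-mcp-container python gemini_cli_mcp.py
```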
Note: to use it in VS Code, go to Settings, type 'mcp', and click on `settings.json`. Then switch to Agent mode and ask Copilot to implement the gemini-cli-mcp server (give the JSON above as context).
After restarting your terminal for the changes to take effect, simply run `gemini`. It will automatically discover the `local_rag_server` and use its tools when needed.
Example:
How do I customize my gemini-cli?
or something more specific:
My gemini cli is not showing an interactive prompt when I run it on my build server, it just exits. I have a CI_TOKEN environment variable set. Why is this happening and how can I fix it?
The `extract.py` script recursively finds all markdown files in the `gemini-cli/docs` directory, reads their content, and combines it into a single text file, `gemini_cli_docs.txt`.
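A minimal sketch of this step, assuming the default paths above (the actual script may differ in details such as file ordering and separators):

```python
# Sketch of extract.py; paths come from the README, other details are assumptions.
from pathlib import Path

DOCS_DIR = Path("gemini-cli/docs")
OUTPUT_FILE = Path("gemini_cli_docs.txt")

def main() -> None:
    # Recursively collect every markdown file under the docs directory.
    md_files = sorted(DOCS_DIR.rglob("*.md"))
    # Concatenate their contents into one text file.
    with OUTPUT_FILE.open("w", encoding="utf-8") as out:
        for md_file in md_files:
            out.write(md_file.read_text(encoding="utf-8"))
            out.write("\n\n")

if __name__ == "__main__":
    main()
```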
The `create_vectorstore.py` script then takes this text file and:

1. Splits it into chunks with a `RecursiveCharacterTextSplitter`.
2. Uses `HuggingFaceEmbeddings` (with the `BAAI/bge-large-en-v1.5` model) to create embeddings for each chunk.
3. Stores the embeddings in an `SKLearnVectorStore`, which is persisted to `gemini_cli_vectorstore.parquet`.
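Sketched out, this step might look like the following; the splitter parameters are assumptions, and on newer LangChain versions these classes live in `langchain_community` and `langchain_text_splitters`:

```python
# Sketch of create_vectorstore.py; model and file names come from the README,
# chunk sizes are assumptions.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import SKLearnVectorStore

with open("gemini_cli_docs.txt", encoding="utf-8") as f:
    text = f.read()

# 1. Split the documentation into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(text)

# 2. Embed each chunk with the BGE model.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")

# 3. Build the store and persist it as parquet.
store = SKLearnVectorStore.from_texts(
    chunks,
    embedding=embeddings,
    persist_path="gemini_cli_vectorstore.parquet",
    serializer="parquet",
)
store.persist()
```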
The `gemini_cli_mcp.py` script creates a `FastMCP` server. This server defines a tool, `gemini_cli_query_tool`, which can be called by the Gemini CLI or by VS Code, Cursor, etc. When this tool is invoked, it loads the persisted `SKLearnVectorStore`, retrieves the documents most relevant to the query, and returns them.
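In outline, the server might look like the sketch below; the tool and resource names come from the README, while the retrieval details (such as the number of documents returned) are assumptions:

```python
# Sketch of gemini_cli_mcp.py; endpoint names come from the README,
# retrieval details are assumptions.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import SKLearnVectorStore
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("gemini-cli-docs")

# Load the persisted vector store built by create_vectorstore.py.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")
store = SKLearnVectorStore(
    embedding=embeddings,
    persist_path="gemini_cli_vectorstore.parquet",
    serializer="parquet",
)

@mcp.tool()
def gemini_cli_query_tool(query: str) -> str:
    """Retrieve documentation chunks relevant to the query."""
    docs = store.similarity_search(query, k=4)  # k=4 is an assumption
    return "\n\n".join(doc.page_content for doc in docs)

@mcp.resource("docs://gemini-cli/full")
def full_docs() -> str:
    """Return the entire concatenated documentation file."""
    with open("gemini_cli_docs.txt", encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    mcp.run(transport="stdio")  # stdio matches the docker exec setup above
```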
The Gemini CLI is designed to be extensible through MCP servers. The CLI discovers available tools by connecting to the servers defined in the `mcpServers` object of a `settings.json` file (either in the project's `.gemini` directory or in the user's home `~/.gemini` directory).
Gemini CLI supports three transport mechanisms for communication:

- Stdio transport: the CLI spawns a subprocess and communicates over `stdin` and `stdout`. This is the method used in this project, via the `command` property in `settings.json`.
- SSE transport: the CLI connects to a server-sent-events endpoint, configured with the `url` property.
- HTTP streaming transport: the CLI connects to an HTTP streaming endpoint, configured with the `httpUrl` property.

By using the `docker exec` command, we are leveraging the stdio transport to create a direct communication channel with the Python script inside the container.
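For illustration, here is how each transport might be declared in `settings.json`; the server names and URLs below are placeholders, not endpoints this project provides:

```json
{
  "mcpServers": {
    "stdio_server": { "command": "python", "args": ["gemini_cli_mcp.py"] },
    "sse_server": { "url": "http://localhost:8000/sse" },
    "http_server": { "httpUrl": "http://localhost:8000/mcp" }
  }
}
```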
- `extract.py`: extracts the documentation from the markdown files.
- `create_vectorstore.py`: creates the vector store.
- `gemini_cli_mcp.py`: runs the MCP server.

The main Python dependencies are listed in `requirements.txt`:
- `langchain`: for text splitting, vector stores, and embeddings.
- `tiktoken`: for token counting.
- `sentence-transformers`: for the embedding model.
- `scikit-learn`: for the vector store.
- `mcp`: for the MCP server.
- `fastapi`: for the MCP server.

The project also relies on the `gemini-cli` package and its dependencies. See `gemini-cli/package.json` for more details.