OneKGPd-MCP
Real-time access to the 1000 Genomes Project dataset
1000 Genomes Project dataset MCP Server
Natural language access to the 1000 Genomes Project dataset, hosted online in the Dnaerys variant store
The dataset was sequenced and aligned to GRCh38 by the New York Genome Center
- 2504 unrelated samples from the phase three panel
- an additional 698 samples from 602 family trios
- 3202 samples total (1598 males, 1604 females)
- dataset details
Key Features
- real-time access to 138 044 724 unique variants and about 442 billion individual genotypes in 3202 samples
- variant, sample, and genotype selection based on coordinates, annotations, and zygosity
- filtering by VEP, ClinVar, gnomAD AF, and AlphaMissense annotations
- filtering by inheritance model (de novo, heterozygous dominant, homozygous recessive)
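How the server defines each inheritance model is not spelled out here. As a minimal sketch, assuming a parent-child trio and one common reading of each model, a single variant's trio genotypes (alt-allele counts 0, 1, 2) could be classified as below. `InheritanceModel` and `TrioInheritance` are hypothetical names, not the server's API.

```java
// Hypothetical trio-based classifier (not the server's implementation).
// Genotypes are alt-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt.
enum InheritanceModel { DE_NOVO, HET_DOMINANT, HOM_RECESSIVE, NONE }

class TrioInheritance {
    static InheritanceModel classify(int child, int mother, int father) {
        for (int g : new int[] {child, mother, father}) {
            if (g < 0 || g > 2) {
                throw new IllegalArgumentException("genotype must be 0, 1 or 2: " + g);
            }
        }
        // De novo: the child carries the allele, neither parent does.
        if (child > 0 && mother == 0 && father == 0) return InheritanceModel.DE_NOVO;
        // Recessive: the child is hom-alt and each parent contributes one alt allele.
        if (child == 2 && mother >= 1 && father >= 1) return InheritanceModel.HOM_RECESSIVE;
        // Dominant: the child is het and exactly one parent carries the allele.
        if (child == 1 && (mother >= 1) != (father >= 1)) return InheritanceModel.HET_DOMINANT;
        return InheritanceModel.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify(1, 0, 0)); // DE_NOVO
        System.out.println(classify(2, 1, 1)); // HOM_RECESSIVE
        System.out.println(classify(1, 0, 1)); // HET_DOMINANT
    }
}
```

Trios that fit none of the patterns, such as a het child with two het parents or a Mendelian-inconsistent trio, fall through to NONE; a real filter would also have to handle missing calls and hemizygous chrX genotypes in males.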
Deployments
The remote MCP service is available online via Streamable HTTP at https://db.dnaerys.org:443/mcp
For a local build with stdio transport, see the details below
Architecture
The MCP server is implemented as a Java EE service that accesses the 1KGP dataset via gRPC calls to the public Dnaerys variant store service.
- the service implementation is based on Quarkus MCP Server
- it provides MCP over Streamable HTTP, HTTP/SSE, and STDIO transports
Examples
Many of the questions below were flagged by Opus 4.5’s safety filters and left unanswered, so Sonnet 4.5 was used for most of them unless specified otherwise. The answers below come from Sonnet 4.5: some from a multi-agent research system, some with extended thinking mode, and some from a single-agent system in normal mode.
Incomplete Penetrance & Genetic Resilience
Identify potential modifier variants for well-known pathogenic alleles in TTN - variants that consistently co-occur in the same haplotype block with pathogenic alleles and may alter severity or penetrance. Conduct research for pathogenic alleles documented in the literature. Use KGP dataset of healthy individuals to find potential modifier variants. Start with 100kb for "the same haplotype block" definition, then extend if required. Evaluate statistical significance for the best modifier candidates found. No initial constraints for modifier types.
- it feels unreal how easily this thing can pull not-entirely-nonsensical events with p = 2.29×10⁻¹³ (vis) out of a dataset. It makes one wonder what would be possible with a proper study design, clinical and control cohorts, and a bit more dedication
- same task for KCNH2, SCN5A, CACNA1C, LMNA, SPAST and BMPR2
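The prompt leaves the choice of significance test to the agent. One standard option for asking whether a candidate modifier is carried together with a pathogenic allele more often than chance is a two-sided Fisher's exact test on a 2×2 table of carrier counts. The sketch below assumes that table layout and a hypothetical `FisherExact` class; it works at the carrier level, ignores phase, and is not presented as the method behind the p-value quoted above.

```java
// Two-sided Fisher's exact test on a 2x2 carrier table (illustrative sketch).
//                    modifier+   modifier-
//   pathogenic+          a           b
//   pathogenic-          c           d
class FisherExact {
    // ln P(table with top-left cell x | fixed margins), hypergeometric.
    private static double logProb(double[] lf, int x, int r1, int c1, int n) {
        int a = x, b = r1 - x, c = c1 - x, d = n - r1 - c1 + x;
        return lf[r1] + lf[n - r1] + lf[c1] + lf[n - c1]
             - lf[n] - lf[a] - lf[b] - lf[c] - lf[d];
    }

    static double twoSided(int a, int b, int c, int d) {
        if (a < 0 || b < 0 || c < 0 || d < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        int n = a + b + c + d, r1 = a + b, c1 = a + c;
        double[] lf = new double[n + 1]; // lf[i] = ln(i!)
        for (int i = 2; i <= n; i++) lf[i] = lf[i - 1] + Math.log(i);

        double observed = logProb(lf, a, r1, c1, n);
        double p = 0.0;
        // Sum every table with the same margins that is no more likely than the observed one;
        // the small tolerance keeps ties that differ only by rounding.
        for (int x = Math.max(0, r1 + c1 - n); x <= Math.min(r1, c1); x++) {
            double lp = logProb(lf, x, r1, c1, n);
            if (lp <= observed + 1e-7) p += Math.exp(lp);
        }
        return Math.min(1.0, p);
    }

    public static void main(String[] args) {
        System.out.println(twoSided(3, 1, 1, 3)); // ~0.4857 (= 34/70)
    }
}
```

Working in log space keeps the factorials of cohort-sized counts (for example 3202 samples) from overflowing.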
Identify samples in the KGP dataset that are homozygous for variants classified as 'Pathogenic' in ClinVar for severe autosomal recessive metabolic disorders. For these specific samples, scan their exomes for enrichment of variants in known suppressor genes or alternative metabolic pathways that might compensate for the primary defect. Propose a mechanism of compensation based on pathway analysis.
- reports for AATD - SERPINA1 - PiZZ and Cystic Fibrosis & Sickle Cell Disease
Select samples carrying known dominant-negative variants in KRT5 or KRT14 genes (Epidermolysis Bullosa) in the KGP. Search for potential cis- or trans-acting rescue modifiers. Specifically, check if these samples carry variants that promote the upregulation of the homologous KRT6 or KRT16 genes (paralog compensation). Can you detect a statistically significant enrichment of 'paralog-boosting' promoter variants in these resilient carriers ?
Structural Intolerance
Which regions in XXXX gene are most likely disease-critical, with strong purifying selection, based on available variation patterns across functional domains in KGP ? Do statistical evaluation.
In what cardiac related genes, e.g. ion channels, variants in KGP dataset near catalytic residues or ligand-binding pockets show strong depletion compared to flanking residues (±20 amino acids) ?
- results might be some
Reclassification & AlphaMissense Integration
Retrieve all variants in KGP dataset in the voltage-gated sodium channel gene family (SCN1A, SCN2A, SCN5A) currently classified as 'VUS' in ClinVar. Correlate their 'Likely Pathogenic' AlphaMissense classification with their frequency in this healthy cohort. Synthesize a reasoned argument to reclassify a subset of these as 'Likely Benign' based on the logic that pathogenic predictions by AlphaMissense are incompatible with the observed allele frequency in this healthy population.
Oligogenic Burden
Calculate the 'Ciliary Mutational Load' for every individual in the KGP dataset. Aggregate all rare, non-synonymous variants across the entire Bardet-Biedl Syndrome (BBS) gene panel (BBS1 through BBS21). Is there a clear 'cliff' or maximum mutational burden observed in healthy individuals ? Determine if the healthy cohort contains any 'triallelic' carriers (homozygous at one locus, heterozygous at another) and model why they do not display the BBS phenotype.
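The burden and "triallelic" definitions in the prompt can be made concrete with a small sketch. It assumes the calls have already been restricted to rare, non-synonymous variants in the BBS panel and represents them as hypothetical `BbsCall` records (sample, gene, alt-allele count); counting burden as alternate alleles per sample is one possible choice, not necessarily the one the agent used.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical input: one record per rare, non-synonymous BBS-panel variant call;
// altCount is 1 (heterozygous) or 2 (homozygous).
record BbsCall(String sample, String gene, int altCount) {}

class CiliaryLoad {
    // Per-sample burden = total alternate alleles across the panel.
    static Map<String, Integer> burden(List<BbsCall> calls) {
        Map<String, Integer> load = new TreeMap<>();
        for (BbsCall c : calls) load.merge(c.sample(), c.altCount(), Integer::sum);
        return load;
    }

    // "Triallelic" as defined in the prompt: homozygous in one gene and heterozygous
    // in a different gene. Compound hets within one gene are ignored (phase unknown).
    static Set<String> triallelic(List<BbsCall> calls) {
        Map<String, Set<String>> hom = new HashMap<>();
        Map<String, Set<String>> het = new HashMap<>();
        for (BbsCall c : calls) {
            Map<String, Set<String>> target = c.altCount() == 2 ? hom : het;
            target.computeIfAbsent(c.sample(), k -> new HashSet<>()).add(c.gene());
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : hom.entrySet()) {
            for (String gene : het.getOrDefault(e.getKey(), Set.of())) {
                if (!e.getValue().contains(gene)) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<BbsCall> calls = List.of(
            new BbsCall("S1", "BBS1", 2), new BbsCall("S1", "BBS10", 1),
            new BbsCall("S2", "BBS2", 1), new BbsCall("S2", "BBS4", 1));
        System.out.println(burden(calls));     // {S1=3, S2=2}
        System.out.println(triallelic(calls)); // [S1]
    }
}
```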
Protein-Protein Interactions
Analyze samples in the KGP dataset with missense variants located at the 'hinge' or 'head' domains in Cohesin complex genes (SMC1A, SMC3, RAD21). Perform a 'co-evolution' analysis - do samples with a destabilizing mutation in the SMC1A head domain tend to carry a complementary variant in the SMC3 head domain that restores electrostatic compatibility (e.g., a charge swap from Glu->Lys in one and Lys->Glu in the other) ?
- results might be some
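The charge-swap criterion in the prompt (Glu->Lys in one subunit, Lys->Glu in the other) can be written as a simple predicate over two amino-acid substitutions. This sketch assumes one-letter residue codes and treats only K/R as positive and D/E as negative, counting His as neutral; `ChargeSwap` is a hypothetical helper.

```java
import java.util.Map;

// Hypothetical predicate for a complementary charge swap between two substitutions,
// e.g. E->K in an SMC1A head-domain residue paired with K->E in SMC3.
class ChargeSwap {
    // Approximate side-chain charge at physiological pH (assumption: His treated as neutral).
    private static final Map<Character, Integer> CHARGE = Map.of(
        'K', +1, 'R', +1, 'D', -1, 'E', -1);

    static int charge(char residue) {
        return CHARGE.getOrDefault(Character.toUpperCase(residue), 0);
    }

    // A substitution ref->alt flips charge when both residues are charged with opposite signs.
    static boolean flipsCharge(char ref, char alt) {
        int before = charge(ref), after = charge(alt);
        return before != 0 && before == -after;
    }

    // Complementary: both substitutions flip charge, in opposite directions, so an
    // originally attractive pair (e.g. Glu facing Lys) is attractive again after both.
    static boolean complementary(char refA, char altA, char refB, char altB) {
        return flipsCharge(refA, altA) && flipsCharge(refB, altB)
            && charge(refA) == charge(altB);
    }

    public static void main(String[] args) {
        System.out.println(complementary('E', 'K', 'K', 'E')); // true
        System.out.println(complementary('E', 'K', 'E', 'K')); // false
    }
}
```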
More examples here
Available Tools
Descriptions of all 30 tools and their parameters can be found here
Installation
The project can be run locally with MCP over stdio and/or HTTP transports
Option A - build & run locally
- build the project and package it as a single über-jar:
./mvnw package -DskipTests -Dquarkus.package.jar.type=uber-jar
- the jar is located in target/onekgpd-mcp-runner.jar and includes all dependencies
- run it locally with the dev profile:
java -Dquarkus.profile=dev -jar <full path>/onekgpd-mcp-runner.jar
- both stdio and HTTP transports are enabled
- the HTTP transport listens on the port set by quarkus.http.port
- the project expects JRE 21 to be available at runtime
Option B - build & run in docker
- to run in Docker, the stdio transport needs to be disabled; otherwise the application stops itself because stdio is closed in containers
- this is already configured in the prod profile, which is also the default configuration
- build with the prod profile:
docker build -f Dockerfile -t onekgpd-mcp .
- run as you prefer, e.g.:
docker run -p 9000:9000 --name onekgpd-mcp --rm onekgpd-mcp
Connecting with MCP clients
- to connect via HTTP transport, remote or local, point the client to the appropriate endpoint, e.g. http://localhost:9000/mcp or https://db.dnaerys.org:443/mcp
- to connect via stdio transport, the MCP client should start the application with the dev profile and the full path to the jar file
- e.g. for Claude Desktop and stdio transport, add to claude_desktop_config.json:
{
"mcpServers": {
"OneKGPd": {
"command": "java",
"args": ["-Dquarkus.profile=dev", "-jar", "/full/path/onekgpd-mcp-runner.jar"]
}
}
}
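For clients without built-in MCP support, the HTTP endpoint can be exercised directly. The sketch below assumes the Streamable HTTP transport as defined in MCP specification revision 2025-03-26: every JSON-RPC message is POSTed to the endpoint, the client accepts JSON or SSE responses, and an `Mcp-Session-Id` returned by `initialize` is echoed on later requests. `McpHttpSketch` is a hypothetical class, and raw responses are printed without parsing.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

// Minimal MCP-over-Streamable-HTTP client sketch (assumption: spec revision 2025-03-26).
class McpHttpSketch {
    static final String PROTOCOL_VERSION = "2025-03-26";

    // JSON-RPC payloads as plain strings to stay dependency-free.
    static String initializeBody() {
        return "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"initialize\",\"params\":{"
             + "\"protocolVersion\":\"" + PROTOCOL_VERSION + "\","
             + "\"capabilities\":{},"
             + "\"clientInfo\":{\"name\":\"sketch-client\",\"version\":\"0.1\"}}}";
    }

    static final String INITIALIZED_NOTIFICATION =
        "{\"jsonrpc\":\"2.0\",\"method\":\"notifications/initialized\"}";

    static final String TOOLS_LIST =
        "{\"jsonrpc\":\"2.0\",\"id\":2,\"method\":\"tools/list\"}";

    // Each client message is a POST to the single MCP endpoint.
    static HttpRequest post(URI endpoint, String json, Optional<String> sessionId) {
        HttpRequest.Builder b = HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "application/json")
            .header("Accept", "application/json, text/event-stream")
            .POST(HttpRequest.BodyPublishers.ofString(json));
        // Echo the session id issued by the server, if any.
        sessionId.ifPresent(id -> b.header("Mcp-Session-Id", id));
        return b.build();
    }

    public static void main(String[] args) throws Exception {
        URI endpoint = URI.create(args.length > 0 ? args[0] : "http://localhost:9000/mcp");
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();

        HttpResponse<String> init = client.send(
            post(endpoint, initializeBody(), Optional.empty()), HttpResponse.BodyHandlers.ofString());
        Optional<String> session = init.headers().firstValue("Mcp-Session-Id");
        System.out.println("initialize -> " + init.statusCode() + " " + init.body());

        client.send(post(endpoint, INITIALIZED_NOTIFICATION, session),
            HttpResponse.BodyHandlers.discarding());

        HttpResponse<String> tools = client.send(
            post(endpoint, TOOLS_LIST, session), HttpResponse.BodyHandlers.ofString());
        System.out.println("tools/list -> " + tools.statusCode() + " " + tools.body());
    }
}
```

Run it against a local server (the default endpoint above) or pass https://db.dnaerys.org:443/mcp as the first argument; if the assumptions hold, the tools/list response lists the server's tools and their parameters.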
Verification
How many variants exist in the 1000 Genomes Project?
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.