OneKGPd-MCP
Real-time access to 1000 Genomes Project dataset
1000 Genomes Project Dataset MCP Server
Natural language access to 1000 Genomes Project dataset, hosted online in Dnaerys variant store
Sequenced & aligned by New York Genome Center (GRCh38). 3202 samples: 2504 unrelated samples from phase three panel + 698 samples from 602 family trios - dataset details
Key Features
- real-time access to 138 044 723 unique variants and ~442 billion individual genotypes
- variant, sample and genotype selection based on coordinates, annotations, zygosity, population
- filtering by VEP (impact, biotype, feature type, variant class, consequences), ClinVar Clinical Significance (202502), gnomADe + gnomADg 4.1, AlphaMissense Score & AlphaMissense Class annotations
  - annotated with VEP 115 / GENCODE 49
  - GENCODE Primary set transcripts
  - full annotation composition
- returned variants annotated with HGVSp, gnomADe + gnomADg, AlphaMissense score + cohort-wide statistics
  - HGVSp annotations are for Canonical transcripts to reduce LLM cognitive load
- samples annotated with: familyId, gender, paternalId, maternalId, relationship, children, population, superpopulation, phase3 indicator
Online Service
Remote MCP service via Streamable HTTP:
Examples
Macromolecular structural complexes
Treat the 26S Proteasome as a mechanically redundant 3D machine and map every missense variant from the KGP individuals across all 33 subunits. Perform a spatial analysis to determine if pathogenic variation is statistically partitioned toward the distal 'Lid' (Zone C) rather than the more evolutionary constrained 'Core' (Zone A) or 'Gating' (Zone B) interfaces. Identify individuals with a high cumulative burden (2+ 'Likely Pathogenic' variants) to investigate inter-subunit compensation, searching for paired 'weakening' and 'stabilizing' mutations at protein-protein hinges. Finally, define the 'mechanical tolerance' of the proteasome by establishing the maximum cumulative structural disruption observed in a single healthy individual based on AlphaMissense scores and calculated ΔΔG values.
Case study: workflow, task reports, manuscript drafts →
Macromolecular structural complexes
The MCM2-7 Complex (The "DNA Helicase Motor") is a molecular masterpiece. It’s a heterohexameric ring where each subunit is a distinct "gear" in the DNA-unzipping motor. Unlike homomeric rings (where every subunit is the same), this complex is asymmetric. Each interface between subunits is unique, and they don't all burn ATP at the same rate. The MCM2/5 interface is the "gate" that must physically open to allow DNA to enter the ring and then snap shut. This is a high-stress mechanical point.
Identify individuals in the KGP cohort carrying missense variants at the MCM2/5 interface. Specifically, look for 'charge-reversal' variants (e.g., Aspartate to Lysine). In these specific samples, analyze the 'compensatory coupling': do they carry a secondary, reciprocal charge-reversal variant on the opposing subunit interface that restores the electrostatic 'latch' ?
Identify individuals in the KGP cohort who carry high-pathogenicity variants in the Walker A or Walker B motifs (the ATP-burning heart) of any MCM subunit in MCM2-7 Complex. For these individuals, perform a 'Systemic Flux' analysis: look at their variants in the leading-strand polymerase (POLE) and the sliding clamp (PCNA). Do you detect a signature of 'Coordinated Deceleration' where the motor, the clamp, and the polymerase all carry variants that suggest a slower but highly-accurate replication fork ?
Macromolecular structural complexes
The human RNA Exosome (Exo-9 core) is a "dead machine" that acts as a scaffold. In lower organisms the ring itself can degrade RNA. In humans, the 9-subunit ring has lost all its catalytic teeth and is purely a structural tunnel that guides RNA into the catalytic subunits (DIS3 or EXOSC10) attached at the bottom. Since RNA is a highly negatively charged polymer, the residues lining this pore are typically positively charged (Lysine, Arginine), but not too "sticky" or RNA will jam. So, to reach the "shredder" at the bottom it must slide through a narrow pore formed by the Exo-9 ring.
The task: analyse all missense variants in the KGP cohort that map to the internal pore-lining residues of the Exo-9 ring. Look for 'charge-swap' variants where a positive residue (K, R) is replaced by a negative one (D, E). If an individual is healthy despite having a 'negative patch' in the tunnel that should repel RNA, do they carry a compensatory variant in the cap subunits (EXOSC1, 2, 3) that widens the entrance? Use a 3D electrostatic surface map to determine if the 'healthy' cohort maintains a specific electrostatic gradient.
Synergistic Epistasis in Redox Homeostasis
Cellular redox homeostasis is maintained by two parallel antioxidant systems: the glutathione system and the thioredoxin system. Complete loss of either GSR or TXNRD1 is incompatible with mammalian development, yet population databases contain individuals carrying variants predicted to impair enzyme function.
Identify clusters of individuals in the KGP cohort who carry multiple 'Moderate' impact VEP variants across both systems. Reasoning through the AlphaMissense structural implications, can you detect a 'balancing act' where a loss of efficiency in Glutathione reductase is consistently paired with high-confidence benign or potentially activating variants in the Thioredoxin system ? Synthesize a model of 'Redox Robustness' based on the co-occurrence of these variants across the cohort.
Architecture
Implemented as a Java EE service, accessing KGP dataset via gRPC calls to public Dnaerys variant store service.
- provides MCP over Streamable HTTP, HTTP/SSE and STDIO transports
- service implementation is based on Quarkus MCP Server framework
- sample population and metadata are managed by an embedded DuckDB instance
- MCP Tools:
  - Genomics database: countSamples, countSamplesHomozygousReference, countVariants, countVariantsInSamples, getDatasetInfo, getKinshipDegree, selectSamples, selectSamplesHomozygousReference, selectVariants, selectVariantsInSamples, computeAlphaMissenseAvg, computeVariantBurden
  - Population and metadata: listPopulations, listSuperpopulations, getPopulationStats, getSuperpopulationSummary, getSampleMetadata, selectSamplesByPopulation
  - implementation
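The tools listed above are reached through standard MCP JSON-RPC messages. The sketch below only builds the message bodies for `tools/list` and `tools/call`; it does not send them. The class and method names are hypothetical, and calling `getDatasetInfo` with an empty argument object is an assumption, since the tool schemas are not reproduced here. Running `tools/list` against the server is the reliable way to obtain the real parameter names.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the project: builds MCP JSON-RPC bodies as strings.
public class McpToolMessages {

    // "tools/list" asks the server to describe every tool and its input schema.
    static String toolsList(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"tools/list\"}";
    }

    // "tools/call" invokes one tool by name. For brevity this sketch supports
    // string-valued arguments only; real tools may expect numbers or arrays.
    static String toolsCall(int id, String tool, Map<String, String> arguments) {
        String args = arguments.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"tools/call\","
                + "\"params\":{\"name\":" + quote(tool) + ",\"arguments\":" + args + "}}";
    }

    // Minimal JSON string quoting (backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] unused) {
        System.out.println(toolsList(2));
        // Assumption: getDatasetInfo accepts an empty argument object.
        System.out.println(toolsCall(3, "getDatasetInfo", Map.of()));
    }
}
```

In normal use an MCP client generates these messages itself; the sketch is only meant to show what travels over the transport when a tool from the list is invoked.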
Installation
Project can be run locally with MCP over stdio and/or http transports
- build the project and package it as a single über-jar:
  - jar is located in target/onekgpd-mcp-runner.jar and includes all dependencies

    ./mvnw clean package -DskipTests -Dquarkus.package.jar.type=uber-jar

  with skipping test compilation

    ./mvnw clean package -Dmaven.test.skip=true -Dquarkus.package.jar.type=uber-jar

- run it locally with dev profile
  - both stdio and http transports are enabled
  - http transport is on port 9000 (quarkus.http.port in config)
  - project expects JRE 21 to be available at runtime

    java -Dquarkus.profile=dev -jar <full path>/onekgpd-mcp-runner.jar
Connecting with MCP clients
- to connect via http transport, remote or local, simply direct the client to a destination, e.g. http://localhost:9000/mcp or https://db.dnaerys.org:443/mcp
  - NB: Claude Desktop won't work with the http://localhost:9000/mcp option. This option is for clients like Goose.
- to connect via stdio transport, MCP client should start application with dev profile and with a full path to the jar file
  - e.g. for Claude Desktop add to config files (e.g. claude_desktop_config.json):
{
  "mcpServers": {
    "OneKGPd": {
      "command": "java",
      "args": ["-Dquarkus.profile=dev", "-jar", "/full/path/onekgpd-mcp-runner.jar"]
    }
  }
}
Verification
How many variants exist in the 1000 Genomes Project?
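For a lower-level check of the http transport, without an LLM client in the loop, an MCP `initialize` request can be posted to the endpoint directly. The following is a minimal sketch using the JDK's `java.net.http` client. The class name, the client name/version, and the `2025-03-26` protocol revision string are assumptions made for illustration; the revision the server actually negotiates may differ, and a server that answers with a long-lived event stream would need a streaming body handler instead of `ofString()`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical probe, not part of the project.
public class McpInitializeProbe {

    // JSON-RPC 2.0 "initialize" message; clientInfo values are placeholders.
    static String initializeBody(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"initialize\","
                + "\"params\":{\"protocolVersion\":\"2025-03-26\","
                + "\"capabilities\":{},"
                + "\"clientInfo\":{\"name\":\"probe\",\"version\":\"0.1\"}}}";
    }

    // Streamable HTTP clients POST JSON and accept either a JSON reply or an SSE stream.
    static HttpRequest initializeRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json, text/event-stream")
                .POST(HttpRequest.BodyPublishers.ofString(initializeBody(1)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the local dev endpoint; pass another URL as the first argument.
        String endpoint = args.length > 0 ? args[0] : "http://localhost:9000/mcp";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(initializeRequest(endpoint), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

A successful status code with a JSON-RPC result in the body suggests the transport is reachable; the natural-language question above then exercises the tools end to end.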
Test Coverage Status
| Component | Type | Tests | Status |
|---|---|---|---|
| Entity Mappers (9 classes) | Unit | 314 | ✅ Complete |
| DnaerysClient | Unit | 58 (7 disabled) | ✅ Complete |
| DnaerysClient | Integration | 5 (1 disabled) | ✅ Complete |
| OneKGPdMCPServer | Unit | 26 | ✅ Complete |
| OneKGPdMCPServer | Integration | 5 | ✅ Complete |
| Other | Unit | 1 | ✅ Complete |
| Other | Integration | 1 | ✅ Complete |
| Total | | 410 | 402 passing, 8 disabled |
Test Breakdown:
- Unit tests: 399 (7 disabled, 392 passing)
- Integration tests: 11 (1 disabled, 10 passing)
Disabled Tests:
- 7 DnaerysClient unit tests (PaginationTests, streaming gRPC limitation - wiremock-grpc-extension:0.11.0 cannot mock streaming RPCs yet)
- 1 DnaerysClient integration test (PaginationLogicTests, streaming gRPC limitation - wiremock-grpc-extension:0.11.0 cannot mock streaming RPCs yet)
Running Tests
# Unit tests only (no server required)
./mvnw test
# Integration tests (requires db.dnaerys.org access)
./mvnw verify -DskipITs=false
# Update test baselines after data changes
./mvnw verify -DskipITs=false -DupdateBaseline=true
The test part of this project was written by Claude. The fun part was written by humans.
Privacy Policy
OneKGPd MCP Server operates as a read-only interface layer for the 1000 Genomes Project dataset. The server does not collect, store, or transmit any user data. No conversation data is recorded. No personal information is collected. No cookies, tracking mechanisms or authentication are used.
Support
- Issues and questions: https://github.com/dnaerys/onekgpd-mcp/issues
- Email: [email protected]
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.