Data.gov.il MCP Server

Access Israeli government open data from the data.gov.il portal.

data-gov-il-mcp

Production-grade Model Context Protocol (MCP) server for Israeli Government Open Data from data.gov.il.

This server gives MCP-compatible clients structured access to Israeli government datasets, resources, tags, organizations, and tabular records through the official CKAN API. It is written in TypeScript, validates inputs and outputs with Zod, returns JSON-first tool responses, and supports both local stdio and remote Streamable HTTP transports.

Highlights

| Area | Status |
| --- | --- |
| MCP tools | 9 tools by default, 10 with optional Sampling enabled |
| MCP resources | 5 static resources + 1 dataset resource template |
| MCP prompts | 3 domain prompts with argument completions |
| Discovery | In-memory catalog snapshot with Hebrew normalization, fuzzy matching, tag ranking, and co-occurrence |
| Transports | stdio and Streamable HTTP |
| Auth | none, API key bearer auth, OAuth 2.1 JWT/JWKS |
| HTTP hardening | Helmet, CORS, rate limiting, request IDs, Host/Origin validation |
| Responses | structuredContent plus identical JSON in content[0].text |
| Runtime config | Environment-driven, validated at startup with Zod |

Quick Start

Claude Desktop / Local Stdio

Use the published package directly:

{
  "mcpServers": {
    "data-gov-il": {
      "command": "npx",
      "args": ["-y", "data-gov-il-mcp"]
    }
  }
}

Streamable HTTP

npm install -g data-gov-il-mcp
data-gov-il-mcp-http

Then configure your MCP client:

{
  "mcpServers": {
    "data-gov-il": {
      "url": "http://localhost:3664/mcp"
    }
  }
}

Docker

docker run -p 3664:3664 ghcr.io/davidosherproceed/data-gov-il-mcp:latest

For local development:

npm install
npm run build
npm run start:stdio
# or
npm run start:http

Catalog Discovery Layer

The server ships with a committed catalog snapshot at src/data/catalog/catalog.snapshot.json. The snapshot is generated from data.gov.il and bundled into the build. At startup, it is validated and indexed in memory.

The discovery layer powers:

  • find_datasets: catalog-first search with live CKAN fallback.
  • list_all_datasets: instant local enumeration with an optional organization filter.
  • list_organizations: instant local organization list with dataset counts.
  • list_available_tags and search_tags: backed by real CKAN tag/facet data.
  • Dynamic completions for datagov://dataset/{id}.
  • datagov://tags and datagov://catalog/stats resources.

It includes:

  • Hebrew normalization, including nikud removal and final-letter normalization.
  • Tokenization and trigram indexes for fuzzy matches and typo tolerance.
  • Dataset, tag, and organization maps.
  • Tag-to-dataset and organization-to-dataset indexes.
  • Tag co-occurrence for related tag suggestions.
  • Weighted ranking across exact, token, tag, organization, and fuzzy signals.
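
The normalization and fuzzy-matching ideas listed above can be sketched as follows. This is a minimal illustration, not the server's actual code: the function names (normalizeHebrew, trigrams, trigramSimilarity) and the choice of the Dice coefficient as the similarity measure are assumptions made for the example.

```typescript
// Hypothetical helpers; names and the similarity measure are illustrative only.

// Hebrew combining marks (nikud and cantillation). Punctuation in the same
// block, such as maqaf (U+05BE), is deliberately left out of the class.
const HEBREW_MARKS = /[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]/g;

// Final-form letters mapped to their regular forms.
const FINAL_LETTERS: Record<string, string> = {
  "\u05DA": "\u05DB", // final kaf   -> kaf
  "\u05DD": "\u05DE", // final mem   -> mem
  "\u05DF": "\u05E0", // final nun   -> nun
  "\u05E3": "\u05E4", // final pe    -> pe
  "\u05E5": "\u05E6", // final tsadi -> tsadi
};

function normalizeHebrew(input: string): string {
  return input
    .replace(HEBREW_MARKS, "")
    .replace(/[\u05DA\u05DD\u05DF\u05E3\u05E5]/g, (ch) => FINAL_LETTERS[ch] ?? ch)
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

// Character trigrams over a space-padded string, so that word boundaries
// contribute to the match.
function trigrams(text: string): Set<string> {
  const padded = `  ${text} `;
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

// Dice coefficient over trigram sets: 1 for identical strings, lower as the
// strings diverge. A typo changes only a few trigrams, so the score stays high.
function trigramSimilarity(a: string, b: string): number {
  const ga = trigrams(normalizeHebrew(a));
  const gb = trigrams(normalizeHebrew(b));
  let shared = 0;
  ga.forEach((g) => {
    if (gb.has(g)) shared++;
  });
  return (2 * shared) / (ga.size + gb.size);
}
```

A ranking layer could combine a score like this with exact, token, tag, and organization signals; how the server weights them is not specified here.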

Refresh the snapshot:

npm run catalog:refresh
npm run build

A scheduled GitHub Actions workflow (.github/workflows/catalog-refresh.yml) refreshes the snapshot and opens a PR when catalog data changes.

More detail: docs/catalog-discovery-layer.md.

Tools

All successful tool responses return:

  • structuredContent: the typed JSON object.
  • content[0].text: the same object serialized as JSON for text-only clients.
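
As a rough sketch of that response shape, a result builder might look like the following. The helper name buildToolResult and the local interfaces are assumptions for illustration; the server's real builders live under src/formatting/ and may differ.

```typescript
// Hypothetical builder; mirrors the two fields described above.
interface TextContent {
  type: "text";
  text: string;
}

interface ToolResult<T> {
  structuredContent: T;
  content: TextContent[];
}

function buildToolResult<T extends Record<string, unknown>>(payload: T): ToolResult<T> {
  return {
    // Typed object for clients that understand structured output.
    structuredContent: payload,
    // The same object as a JSON string for text-only clients.
    content: [{ type: "text", text: JSON.stringify(payload) }],
  };
}
```

Serializing the same payload into both fields means a text-only client can recover the structured object with JSON.parse.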

Default Tools

| Tool | Purpose |
| --- | --- |
| find_datasets | Primary dataset discovery tool. Uses the local catalog first, then live CKAN fallback when needed. |
| get_dataset_info | Full CKAN metadata for a dataset, including tags and resources. |
| list_all_datasets | Instant catalog-backed dataset summaries, optionally filtered by organization. |
| list_resources | Lists files/datastores inside a dataset and identifies datastore_active resources. |
| search_records | Queries CKAN datastore records with full-text search, filters, fields, sorting, pagination, and distinct values. |
| list_organizations | Catalog-backed organizations with Hebrew titles and dataset counts. |
| get_organization_info | Live CKAN organization metadata. |
| list_available_tags | Ranked catalog tags with dataset counts and related tags. |
| search_tags | Fuzzy tag search with related tag suggestions. |

Optional Tools

| Tool | Enabled By | Purpose |
| --- | --- | --- |
| summarize_dataset | MCP_ENABLE_SAMPLING=true | Requests MCP Sampling from compatible clients to produce a human-readable dataset summary. Falls back to metadata when Sampling is unavailable. |

Elicitation in find_datasets

find_datasets can optionally expose an interactive parameter:

{
  "query": "תחבורה",
  "interactive": true
}

This is only registered when:

MCP_ENABLE_ELICITATION=true

When enabled, compatible clients may show a clarification form for broad searches. For example, the server can ask the user to narrow many matching datasets by publisher organization. If the client does not support Elicitation, the user declines, or the request times out, the tool falls back to normal search results.

This is disabled by default because MCP client support varies.
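
The fallback behaviour described above can be sketched as a pure function. The action values (accept, decline, cancel) follow the MCP Elicitation result shape; the function name applyElicitation, the DatasetSummary type, and the organization field in the form content are assumptions made for this example.

```typescript
// Hypothetical types and function; not the server's actual implementation.
type ElicitOutcome =
  | { action: "accept"; content: { organization?: string } }
  | { action: "decline" }
  | { action: "cancel" };

interface DatasetSummary {
  name: string;
  organization: string;
}

// `outcome` is undefined when the client does not support Elicitation
// or the request timed out.
function applyElicitation(
  results: DatasetSummary[],
  outcome: ElicitOutcome | undefined,
): DatasetSummary[] {
  if (!outcome || outcome.action !== "accept") {
    return results; // normal search results
  }
  const organization = outcome.content.organization;
  if (!organization) {
    return results;
  }
  const narrowed = results.filter((r) => r.organization === organization);
  // Never return an empty list just because the user's choice matched nothing.
  return narrowed.length > 0 ? narrowed : results;
}
```

Every path other than an accepted form with a usable answer returns the original results, which matches the fall-back-to-normal-search behaviour.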

Resources

| URI | Type | Description |
| --- | --- | --- |
| datagov://organizations | Static JSON | Organization list from CKAN, cached. |
| datagov://tags | Static JSON | Ranked tags from the committed catalog snapshot. |
| datagov://featured | Static JSON | Curated high-value datasets with ready-to-use resource_id values and field schemas. |
| datagov://guide | Static text | Usage guide for tools, resources, and recommended workflows. |
| datagov://catalog/stats | Static JSON | Snapshot metadata: generated time, dataset count, tag count, organization count, top organizations. |
| datagov://dataset/{id} | Template JSON | Full dataset metadata by dataset slug or ID. Includes dynamic completions and a catalog-backed resource list. |

The server also implements resource subscriptions in a minimal, standards-compliant way:

  • Advertises resources.subscribe.
  • Handles resources/subscribe and resources/unsubscribe.
  • Sends notifications/resources/updated only for resources a client subscribed to.
  • Does not poll CKAN in real time.
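
One way to picture that behaviour is a small registry that tracks which session subscribed to which URI and notifies only those sessions. The class name SubscriptionRegistry, the session IDs, and the send callback are assumptions for this sketch; only the notification method name comes from the MCP specification.

```typescript
// Hypothetical registry; illustrates "notify only subscribers", nothing more.
interface ResourceUpdatedNotification {
  method: "notifications/resources/updated";
  params: { uri: string };
}

class SubscriptionRegistry {
  // Resource URI -> set of subscribed session IDs.
  private readonly byUri = new Map<string, Set<string>>();

  subscribe(sessionId: string, uri: string): void {
    const sessions = this.byUri.get(uri) ?? new Set<string>();
    sessions.add(sessionId);
    this.byUri.set(uri, sessions);
  }

  unsubscribe(sessionId: string, uri: string): void {
    const sessions = this.byUri.get(uri);
    if (!sessions) return;
    sessions.delete(sessionId);
    if (sessions.size === 0) this.byUri.delete(uri);
  }

  // Sends an update to each subscriber of `uri` and returns how many were sent.
  notifyUpdated(
    uri: string,
    send: (sessionId: string, notification: ResourceUpdatedNotification) => void,
  ): number {
    const sessions = this.byUri.get(uri);
    if (!sessions) return 0;
    sessions.forEach((sessionId) => {
      send(sessionId, { method: "notifications/resources/updated", params: { uri } });
    });
    return sessions.size;
  }
}
```

Because the server does not poll CKAN, something else has to call notifyUpdated, for example a cache refresh or a snapshot reload.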

Prompts

| Prompt | Argument | Purpose |
| --- | --- | --- |
| food-nutrition-analysis | analysis_type | Food prices, nutrition, kosher, safety, import/export. |
| environmental-sustainability-analysis | analysis_focus | Air quality, green buildings, waste, water, contaminated sites. |
| real-estate-market-analysis | market_focus | Housing, urban renewal, subsidized housing, city/property analysis. |

Prompt arguments use MCP completions. Domain focus suggestions are curated, and organization completions are catalog-backed.

Optional MCP Client Features

These features are off by default. Enable them only when your target MCP client supports them and you want the server to expose them.

| Variable | Default | Effect |
| --- | --- | --- |
| MCP_ENABLE_ELICITATION | false | Adds interactive to find_datasets and allows server-initiated clarification forms. Works in clients such as Cursor and Claude Code. |
| MCP_ENABLE_SAMPLING | false | Registers summarize_dataset, which requests client-side model generation through MCP Sampling. |

Client support differs:

  • Cursor supports Elicitation, but does not currently expose Sampling.
  • Claude Code supports Elicitation in recent versions.
  • Claude Desktop supports many MCP features, but its Elicitation support is unreliable or unavailable.
  • Sampling availability varies; the server always falls back safely.

Configuration

Copy .env.example to .env:

cp .env.example .env

Core

| Variable | Default | Description |
| --- | --- | --- |
| TRANSPORT | stdio | Default transport when using the generic entry point. Dedicated binaries are also available. |
| PORT | 3664 | HTTP port. |
| HOST | 0.0.0.0 | HTTP bind host. |
| CORS_ORIGIN | * | Allowed CORS origins. Avoid wildcard in production browser deployments. |
| LOG_LEVEL | info | fatal, error, warn, info, debug, or trace. |
| NODE_ENV | production | development, production, or test. |
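
To illustrate what startup validation of these core variables involves, here is a dependency-free sketch. The server itself validates with Zod; this example swaps in plain TypeScript so it runs without extra packages. The function name loadCoreConfig and the assumption that TRANSPORT accepts exactly stdio or http are illustrative, not confirmed by the project.

```typescript
// Hypothetical loader using the defaults from the table above.
const LOG_LEVELS = ["fatal", "error", "warn", "info", "debug", "trace"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];
type Transport = "stdio" | "http"; // assumed value set

interface CoreConfig {
  transport: Transport;
  port: number;
  host: string;
  logLevel: LogLevel;
}

function isTransport(value: string): value is Transport {
  return value === "stdio" || value === "http";
}

function isLogLevel(value: string): value is LogLevel {
  return (LOG_LEVELS as readonly string[]).indexOf(value) !== -1;
}

function loadCoreConfig(env: Record<string, string | undefined>): CoreConfig {
  const transport = env["TRANSPORT"] ?? "stdio";
  if (!isTransport(transport)) {
    throw new Error(`Invalid TRANSPORT: ${transport}`);
  }
  const port = Number(env["PORT"] ?? "3664");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  const logLevel = env["LOG_LEVEL"] ?? "info";
  if (!isLogLevel(logLevel)) {
    throw new Error(`Invalid LOG_LEVEL: ${logLevel}`);
  }
  return { transport, port, host: env["HOST"] ?? "0.0.0.0", logLevel };
}
```

Failing fast on a bad value at startup is the point: a typo in PORT or LOG_LEVEL surfaces immediately instead of at the first request.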

CKAN

| Variable | Default | Description |
| --- | --- | --- |
| CKAN_BASE_URL | https://data.gov.il/api/3/action | CKAN action API base URL. |
| CKAN_TIMEOUT_MS | 10000 | Default CKAN request timeout. |
| CKAN_SEARCH_TIMEOUT_MS | 15000 | Timeout for heavier datastore/search requests. |
| CACHE_TTL_MS | 300000 | Default cache TTL. |
| CACHE_MAX_ITEMS | 500 | Max entries per in-memory cache. |
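
The interaction between CACHE_TTL_MS and CACHE_MAX_ITEMS can be sketched with a small TTL plus LRU cache. This is an illustration under assumptions: the class name TtlLruCache, its constructor parameters, and the injectable clock are invented for the example and are not the code in src/cache/.

```typescript
// Hypothetical cache; a Map preserves insertion order, which gives LRU ordering
// as long as entries are re-inserted on every read and write.
class TtlLruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly maxItems: number;
  private readonly now: () => number;

  constructor(ttlMs = 300000, maxItems = 500, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.maxItems = maxItems;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Move to the end so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxItems) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With the defaults above, an entry lives for five minutes and each cache holds at most 500 entries before the least recently used one is evicted.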

HTTP Hardening

| Variable | Default | Description |
| --- | --- | --- |
| TRUST_PROXY | false | Trust reverse proxy headers. Set when behind nginx/Caddy/ALB. |
| RATE_LIMIT_WINDOW_MS | 60000 | Rate limit window. Set 0 to disable. |
| RATE_LIMIT_MAX | 120 | Requests per IP per window. Set 0 to disable. |
| ALLOWED_HOSTS | empty | Host allowlist for DNS rebinding protection. Defaults to loopback/local hosts when unset. |
| ALLOWED_ORIGINS | empty | Browser Origin allowlist. Falls back to CORS_ORIGIN when appropriate. |
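
The Host allowlist check for DNS rebinding protection could look roughly like this. The function names, the exact loopback list, and the assumption that ALLOWED_HOSTS is a comma-separated list are all illustrative; check the project's own guard for the real rules.

```typescript
// Hypothetical Host header guard.
const LOOPBACK_HOSTS = ["localhost", "127.0.0.1", "[::1]"];

// Strips an optional port from a Host header value, keeping IPv6 brackets.
function hostnameOf(hostHeader: string): string {
  const value = hostHeader.trim().toLowerCase();
  if (value.charAt(0) === "[") {
    const end = value.indexOf("]");
    return end === -1 ? value : value.slice(0, end + 1);
  }
  const colon = value.lastIndexOf(":");
  return colon === -1 ? value : value.slice(0, colon);
}

function isAllowedHost(
  hostHeader: string | undefined,
  allowedHostsEnv: string | undefined,
): boolean {
  if (!hostHeader) return false;
  const configured = (allowedHostsEnv ?? "")
    .split(",") // assumed separator
    .map((h) => h.trim().toLowerCase())
    .filter((h) => h.length > 0);
  // With nothing configured, only loopback hosts pass.
  const allowlist = configured.length > 0 ? configured : LOOPBACK_HOSTS;
  return allowlist.indexOf(hostnameOf(hostHeader)) !== -1;
}
```

Rejecting unknown Host values stops a malicious page from reaching a locally bound server through a DNS name that it controls.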

Authentication

| Variable | Default | Description |
| --- | --- | --- |
| AUTH_MODE | none | none, apikey, or oauth. |
| API_KEYS | empty | Comma-separated bearer tokens for AUTH_MODE=apikey. |
| OAUTH_ISSUER | unset | Expected JWT issuer for AUTH_MODE=oauth. |
| OAUTH_AUDIENCE | unset | Expected JWT audience for AUTH_MODE=oauth. |
| OAUTH_JWKS_URI | unset | JWKS URL for JWT verification. |
| OAUTH_RESOURCE_SERVER | unset | Canonical MCP resource URL for OAuth protected resource metadata. Usually includes /mcp. |
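
For AUTH_MODE=apikey, the check amounts to parsing the comma-separated API_KEYS value and comparing it with the bearer token in the Authorization header. The sketch below uses hypothetical function names and a plain Set lookup; a production implementation would likely prefer a constant-time comparison.

```typescript
// Hypothetical API key check; not the server's actual auth provider.
function parseApiKeys(raw: string | undefined): Set<string> {
  const keys = (raw ?? "")
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  return new Set(keys);
}

// Returns the token from "Authorization: Bearer <token>", or null.
function extractBearer(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match && match[1] ? match[1] : null;
}

function isAuthorized(header: string | undefined, keys: Set<string>): boolean {
  const token = extractBearer(header);
  return token !== null && keys.has(token);
}
```

A client would then send requests to the /mcp endpoint with an `Authorization: Bearer <key>` header, where the key is one of the values in API_KEYS.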

Service Identity

| Variable | Default | Description |
| --- | --- | --- |
| SERVICE_NAME | package/server default | MCP server name override. |
| SERVICE_VERSION | package version | MCP server version override. |
| SERVICE_DESCRIPTION | built-in description | MCP server semantic description override. |

Recommended Workflows

Find and Query a Dataset

  1. Use find_datasets with natural Hebrew or English terms.
  2. Use get_dataset_info or list_resources for a chosen dataset.
  3. Pick a resource with datastore_active=true.
  4. Use search_records with limit=5 first to inspect fields.
  5. Add filters, fields, sort, distinct, or pagination as needed.

Example flow:

find_datasets({ "query": "מחיר למשתכן" })
get_dataset_info({ "dataset": "mechir-lamishtaken" })
search_records({
  "resource_id": "7c8255d0-49ef-49db-8904-4cf917586031",
  "limit": 5,
  "include_total": true
})
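
For orientation, a search_records call like the one above corresponds roughly to a CKAN datastore_search request. The sketch below only builds the request URL; the helper name buildDatastoreSearchUrl and the DatastoreQuery type are assumptions, and the server's own CKAN client may construct requests differently.

```typescript
// Hypothetical URL builder for CKAN's datastore_search action.
interface DatastoreQuery {
  resource_id: string;
  q?: string;
  filters?: Record<string, string | number>;
  fields?: string[];
  sort?: string;
  limit?: number;
  offset?: number;
  include_total?: boolean;
}

function buildDatastoreSearchUrl(
  query: DatastoreQuery,
  baseUrl = "https://data.gov.il/api/3/action",
): string {
  const params = new URLSearchParams();
  params.set("resource_id", query.resource_id);
  if (query.q !== undefined) params.set("q", query.q);
  // In a GET request, CKAN expects filters as a JSON-encoded object.
  if (query.filters !== undefined) params.set("filters", JSON.stringify(query.filters));
  if (query.fields !== undefined) params.set("fields", query.fields.join(","));
  if (query.sort !== undefined) params.set("sort", query.sort);
  if (query.limit !== undefined) params.set("limit", String(query.limit));
  if (query.offset !== undefined) params.set("offset", String(query.offset));
  if (query.include_total !== undefined) params.set("include_total", String(query.include_total));
  return `${baseUrl}/datastore_search?${params.toString()}`;
}

// Usage (network call left commented out so the sketch stays side-effect free):
// const url = buildDatastoreSearchUrl({
//   resource_id: "7c8255d0-49ef-49db-8904-4cf917586031",
//   limit: 5,
//   include_total: true,
// });
// const body = await (await fetch(url)).json();
```

Starting with a small limit, as step 4 recommends, keeps the first response short enough to inspect field names before adding filters.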

Discover Tags

search_tags({ "keyword": "דיור", "limit": 5 })
find_datasets({ "query": "תחבורה", "tags": "תחבורה ציבורית" })

Use Interactive Discovery

Requires:

MCP_ENABLE_ELICITATION=true

Then an agent may call:

find_datasets({ "query": "תחבורה", "interactive": true })

Compatible clients may show a form asking the user to narrow results.

Use Client-Side Summaries

Requires:

MCP_ENABLE_SAMPLING=true

Then:

summarize_dataset({ "dataset": "mechir-lamishtaken", "language": "he" })

If Sampling is unavailable, the tool returns the dataset metadata and sampling.used=false.

Development

npm install

# Type-check
npm run typecheck

# Lint
npm run lint

# Test
npm test

# Build
npm run build

# Refresh local catalog snapshot
npm run catalog:refresh

Run locally:

# stdio
npm run build
npm run start:stdio

# HTTP
npm run build
npm run start:http

Enable optional features locally:

MCP_ENABLE_ELICITATION=true MCP_ENABLE_SAMPLING=true npm run start:http

On PowerShell:

$env:MCP_ENABLE_ELICITATION="true"
$env:MCP_ENABLE_SAMPLING="true"
npm run start:http

Project Structure

src/
  auth/           Authentication providers and Express middleware
  bin/            stdio and HTTP entry points
  cache/          In-memory TTL/LRU cache
  catalog/        Snapshot validation, indexing, fuzzy search, CatalogService
  ckan/           Typed CKAN API client and CKAN response types
  config/         Zod env config, constants, server identity
  core/           Dependency container, MCP server factory, lifecycle
  data/catalog/   Committed catalog snapshot artifact
  formatting/     JSON response builders and guidance text
  observability/  Pino logger
  prompts/        MCP prompt definitions, templates, registration
  resources/      MCP resources, templates, subscriptions
  services/       Domain services for CKAN data access
  tools/          MCP tool definitions and Zod schemas
  transports/     stdio and Streamable HTTP transports
tests/
  fixtures/       Test fixtures
  unit/           Unit tests
scripts/
  refresh-catalog.ts
docs/
  catalog-discovery-layer.md
  MIGRATION.md

Docker

docker build -t data-gov-il-mcp .
docker run -p 3664:3664 data-gov-il-mcp

With optional features:

docker run -p 3664:3664 \
  -e MCP_ENABLE_ELICITATION=true \
  -e MCP_ENABLE_SAMPLING=true \
  data-gov-il-mcp

Quality

The project is expected to pass:

npm run typecheck
npm run lint
npm test
npm run build

The current implementation includes unit tests for environment parsing, auth providers, CKAN errors, the cache, formatting, catalog logic (text normalization, fuzzy matching, indexing, and search), snapshot validation, services, resources, subscriptions, and the HTTP Host/Origin guard.

License

MIT