OrionBelt Semantic Layer
API-first engine and MCP server that transforms declarative YAML model definitions into optimized SQL for Postgres, Snowflake, ClickHouse, Dremio, and Databricks
OrionBelt Semantic Layer is an API-first engine that transforms declarative YAML model definitions into optimized SQL for Postgres, Snowflake, ClickHouse, Dremio, and Databricks. It provides a unified abstraction over your data warehouse, so analysts and applications can query using business concepts (dimensions, measures, metrics) instead of raw SQL. Every capability — model loading, validation, query compilation, and diagram generation — is exposed through a REST API, making OrionBelt easy to integrate into any application, workflow, or AI assistant.
Features
- 5 SQL Dialects — Postgres, Snowflake, ClickHouse, Dremio, Databricks SQL with dialect-specific optimizations
- AST-Based SQL Generation — Custom SQL AST ensures correct, injection-safe SQL (no string concatenation)
- OrionBelt ML (OBML) — YAML-based semantic models with data objects, dimensions, measures, metrics, and joins
- Star Schema & CFL Planning — Automatic join path resolution with Composite Fact Layer support for multi-fact queries
- Vendor-Specific SQL Validation — Post-generation syntax validation via sqlglot for each target dialect (non-blocking)
- Validation with Source Positions — Precise error reporting with line/column numbers from YAML source, including join graph analysis (cycle and multipath detection, secondary join constraints)
- Session Management — TTL-scoped sessions with per-client model stores
- ER Diagram Generation — Mermaid ER diagrams via API and Gradio UI with theme support, zoom, and secondary join visualization
- REST API — FastAPI-powered session endpoints for model loading, validation, compilation, diagram generation, and management
- MCP Server — Available as a separate thin client in orionbelt-semantic-layer-mcp — delegates to the REST API via HTTP, deployable independently (e.g. to Prefect Horizon)
- Gradio UI — Interactive web interface for model editing, query testing, and SQL compilation with live validation feedback
- OSI Interoperability — Bidirectional conversion between OBML and the Open Semantic Interchange format via REST API (
/convert) and Gradio UI, with validation for both directions - Plugin Architecture — Extensible dialect system with capability flags and registry
Quick Start
Prerequisites
- Python 3.12+
- uv package manager
Installation
git clone https://github.com/ralfbecher/orionbelt-semantic-layer.git
cd orionbelt-semantic-layer
uv sync
Run Tests
uv run pytest
Start the REST API Server
uv run orionbelt-api
# or with reload:
uv run uvicorn orionbelt.api.app:create_app --factory --reload
The API is available at http://127.0.0.1:8000. Interactive docs at /docs (Swagger UI) and /redoc.
Example
Define a Semantic Model
# yaml-language-server: $schema=schema/obml-schema.json
version: 1.0
dataObjects:
Customers:
code: CUSTOMERS
database: WAREHOUSE
schema: PUBLIC
synonyms: [client, buyer, purchaser]
columns:
Customer ID:
code: CUSTOMER_ID
abstractType: string
Country:
code: COUNTRY
abstractType: string
Orders:
code: ORDERS
database: WAREHOUSE
schema: PUBLIC
columns:
Order ID:
code: ORDER_ID
abstractType: string
Order Customer ID:
code: CUSTOMER_ID
abstractType: string
Price:
code: PRICE
abstractType: float
numClass: non-additive
Quantity:
code: QUANTITY
abstractType: int
numClass: additive
joins:
- joinType: many-to-one
joinTo: Customers
columnsFrom:
- Order Customer ID
columnsTo:
- Customer ID
dimensions:
Country:
dataObject: Customers
column: Country
resultType: string
measures:
Revenue:
resultType: float
aggregation: sum
expression: "{[Orders].[Price]} * {[Orders].[Quantity]}"
synonyms: [sales, income, turnover]
The yaml-language-server comment enables schema validation in editors that support it (VS Code with YAML extension, IntelliJ, etc.). The JSON Schema is at schema/obml-schema.json.
Define a Query
Queries select dimensions and measures by their business names:
select:
dimensions:
- Country
measures:
- Revenue
limit: 100
Compile to SQL (Python)
from orionbelt.compiler.pipeline import CompilationPipeline
from orionbelt.models.query import QueryObject, QuerySelect
from orionbelt.parser.loader import TrackedLoader
from orionbelt.parser.resolver import ReferenceResolver
# Load and parse the model
loader = TrackedLoader()
raw, source_map = loader.load("model.yaml")
model, result = ReferenceResolver().resolve(raw, source_map)
# Define a query
query = QueryObject(
select=QuerySelect(
dimensions=["Country"],
measures=["Revenue"],
),
limit=100,
)
# Compile to SQL
pipeline = CompilationPipeline()
result = pipeline.compile(query, model, "postgres")
print(result.sql)
Generated SQL (Postgres):
SELECT
"Customers"."COUNTRY" AS "Country",
SUM("Orders"."PRICE" * "Orders"."QUANTITY") AS "Revenue"
FROM WAREHOUSE.PUBLIC.ORDERS AS "Orders"
LEFT JOIN WAREHOUSE.PUBLIC.CUSTOMERS AS "Customers"
ON "Orders"."CUSTOMER_ID" = "Customers"."CUSTOMER_ID"
GROUP BY "Customers"."COUNTRY"
LIMIT 100
Change the dialect to "snowflake", "clickhouse", "dremio", or "databricks" to get dialect-specific SQL.
Use the REST API with Sessions
# Start the server
uv run orionbelt-api
# Create a session
curl -s -X POST http://127.0.0.1:8000/sessions | jq
# → {"session_id": "a1b2c3d4e5f6", "model_count": 0, ...}
# Load a model into the session
curl -s -X POST http://127.0.0.1:8000/sessions/a1b2c3d4e5f6/models \
-H "Content-Type: application/json" \
-d '{"model_yaml": "version: 1.0\ndataObjects:\n ..."}' | jq
# → {"model_id": "abcd1234", "data_objects": 2, ...}
# Compile a query
curl -s -X POST http://127.0.0.1:8000/sessions/a1b2c3d4e5f6/query/sql \
-H "Content-Type: application/json" \
-d '{
"model_id": "abcd1234",
"query": {"select": {"dimensions": ["Country"], "measures": ["Revenue"]}},
"dialect": "postgres"
}' | jq .sql
Architecture
YAML Model Query Object
| |
v v
┌───────────┐ ┌──────────────┐
│ Parser │ │ Resolution │ ← Phase 1: resolve refs, select fact table,
│ (ruamel) │ │ │ find join paths, classify filters
└────┬──────┘ └──────┬───────┘
│ │
v v
SemanticModel ResolvedQuery
│ │
│ ┌─────────────┘
│ │
v v
┌───────────────┐
│ Planner │ ← Phase 2: Star Schema or CFL (multi-fact)
│ (star / cfl) │ builds SQL AST with joins, grouping, CTEs
└───────┬───────┘
│
v
SQL AST (Select, Join, Expr...)
│
v
┌───────────────┐
│ Codegen │ ← Phase 3: dialect renders AST to SQL string
│ (dialect) │ handles quoting, time grains, functions
└───────┬───────┘
│
v
SQL String (dialect-specific)
Gradio UI
OrionBelt includes an interactive web UI built with Gradio for exploring and testing the compilation pipeline visually.
Local Development
For local development, the Gradio UI is automatically mounted at /ui on the REST API server when the ui extra is installed:
uv sync --extra ui
uv run orionbelt-api
# → API at http://localhost:8000
# → UI at http://localhost:8000/ui
Standalone Mode
The UI can also run as a separate process, connecting to the API via API_BASE_URL:
uv sync --extra ui
# Start the REST API (required backend)
uv run orionbelt-api &
# Launch the Gradio UI (standalone on port 7860)
API_BASE_URL=http://localhost:8000 uv run orionbelt-ui
API and UI Live Demo Hosting at Google Cloud Run
OrionBelt Semantic Layer API and UI is available as a hosted live demo:
API endpoint: http://35.187.174.102 — Interactive docs: Swagger UI | ReDoc
The API and UI services share a single IP via a Google Cloud Application Load Balancer with path-based routing. Cloud Armor provides WAF protection.
The API and UI are deployed as separate Cloud Run services behind a shared load balancer. The API image (Dockerfile) excludes Gradio for faster cold starts (~2-3s vs ~12s), while the UI image (Dockerfile.ui) connects to the API via API_BASE_URL:
Load Balancer (single IP)
├── /ui/* → orionbelt-ui (Gradio)
└── /* → orionbelt-api (FastAPI)
The UI provides:
- Side-by-side editors — OBML model (YAML) and query (YAML) with syntax highlighting
- Dialect selector — Switch between Postgres, Snowflake, ClickHouse, Dremio, and Databricks
- One-click compilation — Compile button generates formatted SQL output
- SQL validation feedback — Warnings and validation errors from sqlglot are displayed as comments above the generated SQL
- ER Diagram tab — Visualize the semantic model as a Mermaid ER diagram with left-to-right layout, FK annotations, dotted lines for secondary joins, and an adjustable zoom slider
- OSI Import / Export — Import OSI format models (converted to OBML) and export OBML models to OSI format, with validation feedback
- Dark / light mode — Toggle via the header button; all inputs and UI state are persisted across mode switches
The bundled example model (examples/sem-layer.obml.yml) is loaded automatically on startup.
The ER diagram is also available as download (MD, or PNG) or via the REST API.
Docker
Build and Run
Two separate images — API-only (fast) and UI (with Gradio):
# API image (no Gradio, fast cold starts)
docker build -t orionbelt-api .
docker run -p 8080:8080 orionbelt-api
# UI image (Gradio, connects to API)
docker build -f Dockerfile.ui -t orionbelt-ui .
docker run -p 7860:7860 \
-e API_BASE_URL=http://host.docker.internal:8080 \
orionbelt-ui
The API is available at http://localhost:8080. The UI is at http://localhost:7860. Sessions are ephemeral (in-memory, lost on container restart).
Run Integration Tests
# Build image and run 15 endpoint tests
./tests/docker/test_docker.sh
# Skip build (use existing image)
./tests/docker/test_docker.sh --no-build
# Run 30 tests against a live Cloud Run deployment
./tests/cloudrun/test_cloudrun.sh https://orionbelt-semantic-layer-mw2bqg2mva-ew.a.run.app
Configuration
Configuration is via environment variables or a .env file. See .env.example for all options:
| Variable | Default | Description |
|---|---|---|
LOG_LEVEL | INFO | Logging level |
API_SERVER_HOST | localhost | REST API bind host |
API_SERVER_PORT | 8000 | REST API bind port |
PORT | — | Override port (Cloud Run sets this) |
DISABLE_SESSION_LIST | false | Disable GET /sessions endpoint |
SESSION_TTL_SECONDS | 1800 | Session inactivity timeout (30 min) |
SESSION_CLEANUP_INTERVAL | 60 | Cleanup sweep interval (seconds) |
MODEL_FILE | — | Path to OBML YAML for single-model mode |
API_BASE_URL | — | API URL for standalone UI |
ROOT_PATH | — | ASGI root path for UI behind LB |
Single-Model Mode
When MODEL_FILE is set to a path to an OBML YAML file, the server starts in single-model mode:
- The model file is validated at startup (the server refuses to start if it's invalid)
- Every new session is automatically pre-loaded with the configured model
- Model upload (
POST /sessions/{id}/models) and removal (DELETE /sessions/{id}/models/{id}) return 403 Forbidden - All other endpoints (sessions, query, validate, diagram, etc.) work normally
# Start in single-model mode
MODEL_FILE=./examples/sem-layer.obml.yml uv run orionbelt-api
Development
# Install all dependencies (including dev tools)
uv sync
# Run the test suite
uv run pytest
# Lint
uv run ruff check src/
# Type check
uv run mypy src/
# Format code
uv run ruff format src/ tests/
# Build documentation
uv sync --extra docs
uv run mkdocs serve
Documentation
Full documentation is available at the docs site or can be built locally:
uv sync --extra docs
uv run mkdocs serve # http://127.0.0.1:8080
OSI Interoperability
OrionBelt includes a bidirectional converter between OBML and the Open Semantic Interchange (OSI) format. The converter handles the structural differences between the two formats — including metric decomposition, relationship restructuring, and lossless ai_context preservation via customExtensions — with built-in validation for both directions.
The conversion is available via REST API endpoints:
# Convert OSI → OBML
curl -X POST http://127.0.0.1:8000/convert/osi-to-obml \
-H "Content-Type: application/json" \
-d '{"input_yaml": "version: \"0.1.1\"\nsemantic_model:\n ..."}' | jq
# Convert OBML → OSI
curl -X POST http://127.0.0.1:8000/convert/obml-to-osi \
-H "Content-Type: application/json" \
-d '{"input_yaml": "version: 1.0\ndataObjects:\n ..."}' | jq
The Gradio UI also provides Import OSI / Export to OSI buttons that use these API endpoints.
See the OSI ↔ OBML Mapping Analysis for a detailed comparison and conversion reference.
Companion Project
OrionBelt Analytics
OrionBelt Analytics is an ontology-based MCP server that analyzes relational database schemas and generates RDF/OWL ontologies with embedded SQL mappings. It connects to PostgreSQL, Snowflake, and Dremio, providing AI assistants with deep structural and semantic understanding of your data.
Together, the two projects form a powerful combination for AI-guided analytical workflows:
- OrionBelt Analytics gives the AI contextual knowledge of your database schema, relationships, and business semantics
- OrionBelt Semantic Layer ensures correct, optimized SQL generation from business concepts (dimensions, measures, metrics)
By combining both, an AI assistant can navigate your data landscape through ontologies and compile safe, dialect-aware analytical SQL — enabling a seamless end-to-end analytical journey.
License
Copyright 2025 RALFORION d.o.o.
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Related Servers
NSE Ticker MCP Server
Provides access to National Stock Exchange (NSE) data using the Upstox API.
kintone
An MCP server for accessing and managing data on the kintone low-code platform.
Octodet Elasticsearch MCP Server
An MCP server for interacting with Elasticsearch clusters, enabling LLM-powered applications to search, update, and manage data.
MongoDB Lens
Full Featured MCP Server for MongoDB Database.
Notion Content Database
Manage content databases in Notion using the Notion API.
MSSQL MCP Server
Interact with Microsoft SQL Server (MSSQL) databases. List tables, read data, and execute SQL queries with controlled access.
CData Google Sheets MCP Server
A read-only MCP server for Google Sheets, enabling LLMs to query live data using the CData JDBC Driver.
Cryptocurrency Market Data
Provides real-time and historical cryptocurrency market data from major exchanges using the CCXT library.
Teradata MCP Server
Interact with Teradata databases for data queries and business intelligence.
MCP Registry - Lyra
Lyra Registry is a standalone API service that catalogs, scores, and serves metadata for all tools in the Lyra ecosystem. It enables discovery, evaluation, and integration of 800+ crypto, blockchain, DeFi, memecoin, NFT, metaverse, trading tools, MCP tools.