OrionBelt Semantic Layer

API-first engine and MCP server that transforms declarative YAML model definitions into optimized SQL for Postgres, Snowflake, ClickHouse, Dremio, and Databricks

OrionBelt Logo

OrionBelt Semantic Layer

Compile and execute YAML semantic models as analytical SQL across multiple database dialects

GitHub stars Version 1.1.0 Python 3.12+ License: BSL 1.1 Docker Hub FastAPI Pydantic v2 Gradio sqlglot Arrow Flight SQL DB-API 2.0 Ruff mypy

BigQuery PostgreSQL Snowflake ClickHouse Dremio Databricks DuckDB

OrionBelt Semantic Layer is an API-first semantic engine and query planner for AI agents that compiles and executes declarative YAML model definitions as optimized SQL for BigQuery, ClickHouse, Databricks, Dremio, DuckDB/MotherDuck, Postgres, and Snowflake. It provides a unified abstraction over your data warehouse, so analysts and applications can query using business concepts (dimensions, measures, metrics) instead of raw SQL. Every capability — model loading, validation, query compilation and execution, and diagram generation — is exposed through a REST API, making OrionBelt easy to integrate into any application, workflow, or AI assistant.

Features

  • 7 SQL Dialects — BigQuery, ClickHouse, Databricks, Dremio, DuckDB/MotherDuck, Postgres, Snowflake with dialect-specific optimizations
  • AST-Based SQL Generation — Custom SQL AST ensures correct, injection-safe SQL (no string concatenation)
  • OrionBelt ML (OBML) — YAML-based semantic models with data objects, dimensions, measures, metrics, and joins
  • Star Schema & CFL Planning — Automatic join path resolution with Composite Fact Layer support for multi-fact queries and dimension-only queries through intermediate tables
  • Dimension Exclusion — Anti-join queries via dimensionsExclude flag to find non-existing combinations (e.g., directors and producers who never collaborated)
  • Vendor-Specific SQL Validation — Post-generation syntax validation via sqlglot for each target dialect (non-blocking)
  • Validation with Source Positions — Precise error reporting with line/column numbers from YAML source, including join graph analysis (cycle and multipath detection, secondary join constraints)
  • Session Management — TTL-scoped sessions with per-client model stores
  • ER Diagram Generation — Mermaid ER diagrams via API and Gradio UI with theme support, zoom, and secondary join visualization
  • REST API — FastAPI-powered session endpoints for model loading, validation, compilation, execution, diagram generation, and management
  • MCP Server — Available as a separate thin client in orionbelt-semantic-layer-mcp — delegates to the REST API via HTTP, deployable independently (e.g. to Prefect Horizon)
  • Gradio UI — Interactive web interface for model editing, query testing, and SQL compilation with live validation feedback
  • OSI Interoperability — Bidirectional conversion between OBML and the Open Semantic Interchange format via REST API (/convert) and Gradio UI, with validation for both directions
  • DB-API 2.0 Drivers — PEP 249 drivers for all 7 databases with transparent OBML-to-SQL compilation via REST API
  • Arrow Flight SQL — Embedded gRPC server for DBeaver, Tableau, and Power BI — single container, two ports (8080 + 8815)
  • Plugin Architecture — Extensible dialect system with capability flags and registry

Quick Start

Prerequisites

  • Python 3.12+
  • uv package manager

Installation

git clone https://github.com/ralfbecher/orionbelt-semantic-layer.git
cd orionbelt-semantic-layer
uv sync

Run Tests

uv run pytest

Start the REST API Server

uv run orionbelt-api
# or with reload:
uv run uvicorn orionbelt.api.app:create_app --factory --reload

The API is available at http://127.0.0.1:8000. Interactive docs at /docs (Swagger UI) and /redoc.

Interactive Notebook

The Quickstart Notebook walks through the full workflow using TPC-H in DuckDB — explore the model, compile queries across dialects, execute against real data, and see multi-fact CFL and secondary join paths in action. No cloud database needed.

Example

Define a Semantic Model

# yaml-language-server: $schema=schema/obml-schema.json
version: 1.0

dataObjects:
  Customers:
    code: CUSTOMERS
    database: WAREHOUSE
    schema: PUBLIC
    synonyms: [client, buyer, purchaser]
    columns:
      Customer ID:
        code: CUSTOMER_ID
        abstractType: string
      Country:
        code: COUNTRY
        abstractType: string

  Orders:
    code: ORDERS
    database: WAREHOUSE
    schema: PUBLIC
    columns:
      Order ID:
        code: ORDER_ID
        abstractType: string
      Order Customer ID:
        code: CUSTOMER_ID
        abstractType: string
      Price:
        code: PRICE
        abstractType: float
        numClass: non-additive
      Quantity:
        code: QUANTITY
        abstractType: int
        numClass: additive
    joins:
      - joinType: many-to-one
        joinTo: Customers
        columnsFrom:
          - Order Customer ID
        columnsTo:
          - Customer ID

dimensions:
  Country:
    dataObject: Customers
    column: Country
    resultType: string

measures:
  Revenue:
    resultType: float
    aggregation: sum
    expression: "{[Orders].[Price]} * {[Orders].[Quantity]}"
    synonyms: [sales, income, turnover]

The yaml-language-server comment enables schema validation in editors that support it (VS Code with YAML extension, IntelliJ, etc.). The JSON Schema is at schema/obml-schema.json.

Define a Query

Queries select dimensions and measures by their business names:

select:
  dimensions:
    - Country
  measures:
    - Revenue
limit: 100

Compile to SQL (Python)

from orionbelt.compiler.pipeline import CompilationPipeline
from orionbelt.models.query import QueryObject, QuerySelect
from orionbelt.parser.loader import TrackedLoader
from orionbelt.parser.resolver import ReferenceResolver

# Load and parse the model
loader = TrackedLoader()
raw, source_map = loader.load("model.yaml")
model, result = ReferenceResolver().resolve(raw, source_map)

# Define a query
query = QueryObject(
    select=QuerySelect(
        dimensions=["Country"],
        measures=["Revenue"],
    ),
    limit=100,
)

# Compile to SQL
pipeline = CompilationPipeline()
result = pipeline.compile(query, model, "postgres")
print(result.sql)

Generated SQL (Postgres):

SELECT
  "Customers"."COUNTRY" AS "Country",
  SUM("Orders"."PRICE" * "Orders"."QUANTITY") AS "Revenue"
FROM WAREHOUSE.PUBLIC.ORDERS AS "Orders"
LEFT JOIN WAREHOUSE.PUBLIC.CUSTOMERS AS "Customers"
  ON "Orders"."CUSTOMER_ID" = "Customers"."CUSTOMER_ID"
GROUP BY "Customers"."COUNTRY"
LIMIT 100

Change the dialect to "bigquery", "clickhouse", "databricks", "dremio", "duckdb", or "snowflake" to get dialect-specific SQL.

Interactive notebook: Try the full workflow in examples/quickstart.ipynb — uses the TPC-H dataset in DuckDB to demonstrate multi-dialect compilation, query execution, multi-fact CFL, metrics, filters, and ER diagrams.

Use the REST API with Sessions

# Start the server
uv run orionbelt-api

# Create a session
curl -s -X POST http://127.0.0.1:8000/v1/sessions | jq
# → {"session_id": "a1b2c3d4e5f6", "model_count": 0, ...}

# Load a model into the session
curl -s -X POST http://127.0.0.1:8000/v1/sessions/a1b2c3d4e5f6/models \
  -H "Content-Type: application/json" \
  -d '{"model_yaml": "version: 1.0\ndataObjects:\n  ..."}' | jq
# → {"model_id": "abcd1234", "data_objects": 2, ...}

# Compile a query
curl -s -X POST http://127.0.0.1:8000/v1/sessions/a1b2c3d4e5f6/query/sql \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "abcd1234",
    "query": {"select": {"dimensions": ["Country"], "measures": ["Revenue"]}},
    "dialect": "postgres"
  }' | jq .sql

Architecture

YAML Model          Query Object
    |                    |
    v                    v
 ┌───────────┐    ┌──────────────┐
 │  Parser   │    │  Resolution  │  ← Phase 1: resolve refs, select fact table,
 │  (ruamel) │    │              │    find join paths, classify filters
 └────┬──────┘    └──────┬───────┘
      │                  │
      v                  v
 SemanticModel    ResolvedQuery
      │                  │
      │    ┌─────────────┘
      │    │
      v    v
 ┌───────────────┐
 │   Planner     │  ← Phase 2: Star Schema or CFL (multi-fact)
 │  (star / cfl) │    builds SQL AST with joins, grouping, CTEs
 └───────┬───────┘
         │
         v
    SQL AST (Select, Join, Expr...)
         │
         v
 ┌───────────────┐
 │   Codegen     │  ← Phase 3: dialect renders AST to SQL string
 │  (dialect)    │    handles quoting, time grains, functions
 └───────┬───────┘
         │
         v
    SQL String (dialect-specific)

Gradio UI

OrionBelt includes an interactive web UI built with Gradio for exploring and testing the compilation pipeline visually.

Local Development

For local development, the Gradio UI is automatically mounted at /ui on the REST API server when the ui extra is installed:

uv sync --extra ui
uv run orionbelt-api
# → API at http://localhost:8000
# → UI  at http://localhost:8000/ui

Standalone Mode

The UI can also run as a separate process, connecting to the API via API_BASE_URL:

uv sync --extra ui

# Start the REST API (required backend)
uv run orionbelt-api &

# Launch the Gradio UI (standalone on port 7860)
API_BASE_URL=http://localhost:8000 uv run orionbelt-ui

API and UI Live Demo Hosting at Google Cloud Run

OrionBelt Semantic Layer API and UI is available as a hosted live demo:

http://35.187.174.102/ui

API endpoint: http://35.187.174.102 — Interactive docs: Swagger UI | ReDoc

The API and UI services share a single IP via a Google Cloud Application Load Balancer with path-based routing. Cloud Armor provides WAF protection.

The API and UI are deployed as separate Cloud Run services behind a shared load balancer. The API image (Dockerfile) excludes Gradio for faster cold starts (~2-3s vs ~12s), while the UI image (Dockerfile.ui) connects to the API via API_BASE_URL:

Load Balancer (single IP)
  ├── /ui/*     → orionbelt-ui   (Gradio)
  └── /*        → orionbelt-api  (FastAPI)

SQL Compiler in Gradio UI (dark mode)

The UI provides:

  • Side-by-side editors — OBML model (YAML) and query (YAML) with syntax highlighting
  • Dialect selector — Switch between BigQuery, ClickHouse, Databricks, Dremio, DuckDB, Postgres, and Snowflake
  • One-click compilation — Compile button generates formatted SQL output
  • SQL validation feedback — Warnings and validation errors from sqlglot are displayed as comments above the generated SQL
  • ER Diagram tab — Visualize the semantic model as a Mermaid ER diagram with left-to-right layout, FK annotations, dotted lines for secondary joins, and an adjustable zoom slider
  • OSI Import / Export — Import OSI format models (converted to OBML) and export OBML models to OSI format, with validation feedback
  • Dark / light mode — Toggle via the header button; all inputs and UI state are persisted across mode switches

The bundled example model (examples/sem-layer.obml.yml) is loaded automatically on startup.

ER Diagram in Gradio UI (dark mode)

The ER diagram is also available as download (MD, or PNG) or via the REST API.

Docker

Docker Hub

Pre-built multi-platform images (linux/amd64, linux/arm64) are available on Docker Hub:

# API-only (REST API on :8080)
docker pull ralforion/orionbelt-api
docker run -p 8080:8080 ralforion/orionbelt-api

# API + Arrow Flight SQL (REST on :8080, Flight on :8815)
docker pull ralforion/orionbelt-flight
docker run -p 8080:8080 -p 8815:8815 --env-file .env ralforion/orionbelt-flight

# UI (Gradio on :7860, connects to API)
docker pull ralforion/orionbelt-ui
docker run -p 7860:7860 \
  -e API_BASE_URL=http://host.docker.internal:8080 \
  ralforion/orionbelt-ui

See docs/drivers.md for Flight SQL configuration and BI tool setup (DBeaver, Tableau, Power BI).

Build and Run (from source)

Two separate images — API-only (fast) and UI (with Gradio):

# API image (no Gradio, fast cold starts)
docker build -t orionbelt-api .
docker run -p 8080:8080 orionbelt-api

# UI image (Gradio, connects to API)
docker build -f Dockerfile.ui -t orionbelt-ui .
docker run -p 7860:7860 \
  -e API_BASE_URL=http://host.docker.internal:8080 \
  orionbelt-ui

The API is available at http://localhost:8080. The UI is at http://localhost:7860. Sessions are ephemeral (in-memory, lost on container restart).

Run Integration Tests

# Build image and run 15 endpoint tests
./tests/docker/test_docker.sh

# Skip build (use existing image)
./tests/docker/test_docker.sh --no-build

# Run 30 tests against a live Cloud Run deployment
./tests/cloudrun/test_cloudrun.sh https://orionbelt-semantic-layer-mw2bqg2mva-ew.a.run.app

DB-API 2.0 Drivers & Arrow Flight SQL

OrionBelt provides PEP 249 DB-API 2.0 drivers for 7 databases and an Arrow Flight SQL server that enables BI tools like DBeaver, Tableau, and Power BI to run OBML queries directly.

PackageDatabaseNative ConnectorArrow Support
ob-driver-core— (shared foundation)
ob-bigqueryBigQuerygoogle-cloud-bigqueryto_arrow()
ob-duckdbDuckDBduckdbfetch_arrow_table()
ob-postgresPostgreSQLadbc-driver-postgresqlADBC native
ob-snowflakeSnowflakesnowflake-connector-pythonfetch_arrow_all()
ob-clickhouseClickHouseclickhouse-connectquery_arrow()
ob-dremioDremiopyarrow.flightFlight native
ob-databricksDatabricksdatabricks-sql-connectorfetchall_arrow()
ob-flight-extensionArrow Flight SQL serverpyarrow.flight

All drivers work against the OrionBelt REST API in single-model mode (MODEL_FILE set). OBML queries are compiled transparently via POST /v1/query/sql — the user writes OBML, the driver returns SQL results. Plain SQL queries bypass the API entirely.

import ob_duckdb

conn = ob_duckdb.connect(database=":memory:")
with conn.cursor() as cur:
    # OBML query — compiled via API, executed on DuckDB
    cur.execute("select:\n  dimensions:\n    - Region\n  measures:\n    - Revenue\n")
    print(cur.fetchall())

The Arrow Flight SQL server (ob-flight-extension) runs inside the API process as a daemon thread, enabling JDBC/ODBC BI tools to connect directly. It is designed for on-premise or hybrid deployments — Cloud Run uses the standard API-only image.

# On-premise with Flight SQL enabled
docker build -f Dockerfile.flight -t orionbelt-flight .
docker run -p 8080:8080 -p 8815:8815 \
  -v /path/to/models/:/app/models/ \
  --env-file .env \
  orionbelt-flight

See Drivers Documentation for full usage examples, connect() parameters, Flight SQL configuration, Docker Compose setup, and DBeaver/Tableau instructions.

Configuration

Configuration is via environment variables or a .env file. See .env.template for all options:

VariableDefaultDescription
LOG_LEVELINFOLogging level
LOG_FORMATconsoleconsole (pretty) or json (structured)
API_SERVER_HOSTlocalhostREST API bind host
API_SERVER_PORT8000REST API bind port
PORTOverride port (Cloud Run sets this)
DISABLE_SESSION_LISTfalseDisable GET /sessions endpoint
SESSION_TTL_SECONDS1800Session inactivity timeout (30 min)
SESSION_CLEANUP_INTERVAL60Cleanup sweep interval (seconds)
MODEL_FILEPath to OBML YAML for single-model mode
API_BASE_URLAPI URL for standalone UI
ROOT_PATHASGI root path for UI behind LB
FLIGHT_ENABLEDfalseEnable Flight SQL + query execution
FLIGHT_PORT8815Arrow Flight SQL gRPC port
FLIGHT_AUTH_MODEnonenone or token
FLIGHT_API_TOKENStatic token (when auth mode = token)
DB_VENDORduckdbDatabase vendor for query execution

Single-Model Mode

When MODEL_FILE is set to a path to an OBML YAML file, the server starts in single-model mode:

  • The model file is validated at startup (the server refuses to start if it's invalid)
  • Every new session is automatically pre-loaded with the configured model
  • Model upload (POST /v1/sessions/{id}/models) and removal (DELETE /v1/sessions/{id}/models/{id}) return 403 Forbidden
  • All other endpoints (sessions, query, validate, diagram, etc.) work normally
# Start in single-model mode
MODEL_FILE=./examples/sem-layer.obml.yml uv run orionbelt-api

Development

# Install all dependencies (including dev tools)
uv sync

# Run the test suite
uv run pytest

# Lint
uv run ruff check src/

# Type check
uv run mypy src/

# Format code
uv run ruff format src/ tests/

# Build documentation
uv sync --extra docs
uv run mkdocs serve

Documentation

Full documentation is available at the docs site or can be built locally:

uv sync --extra docs
uv run mkdocs serve   # http://127.0.0.1:8080

OSI Interoperability

OrionBelt includes a bidirectional converter between OBML and the Open Semantic Interchange (OSI) format. The converter handles the structural differences between the two formats — including metric decomposition, relationship restructuring, and lossless ai_context preservation via customExtensions — with built-in validation for both directions.

The conversion is available via REST API endpoints:

# Convert OSI → OBML
curl -X POST http://127.0.0.1:8000/v1/convert/osi-to-obml \
  -H "Content-Type: application/json" \
  -d '{"input_yaml": "version: \"0.1.1\"\nsemantic_model:\n  ..."}' | jq

# Convert OBML → OSI
curl -X POST http://127.0.0.1:8000/v1/convert/obml-to-osi \
  -H "Content-Type: application/json" \
  -d '{"input_yaml": "version: 1.0\ndataObjects:\n  ..."}' | jq

The Gradio UI also provides Import OSI / Export to OSI buttons that use these API endpoints.

See the OSI ↔ OBML Mapping Analysis for a detailed comparison and conversion reference.

Companion Project

OrionBelt Analytics

OrionBelt Analytics is an ontology-based MCP server that analyzes relational database schemas and generates RDF/OWL ontologies with embedded SQL mappings. It connects to PostgreSQL, Snowflake, and Dremio, providing AI assistants with deep structural and semantic understanding of your data.

OrionBelt Analytics Architecture

Together, the two projects form a powerful combination for AI-guided analytical workflows:

  • OrionBelt Analytics gives the AI contextual knowledge of your database schema, relationships, and business semantics
  • OrionBelt Semantic Layer ensures correct, optimized SQL compilation and execution from business concepts (dimensions, measures, metrics)

By combining both, an AI assistant can navigate your data landscape through ontologies and compile safe, dialect-aware analytical SQL — enabling a seamless end-to-end analytical journey.

License

Copyright 2025 RALFORION d.o.o.

Licensed under the Business Source License 1.1. The Licensed Work will convert to Apache License 2.0 on 2030-03-16.

By contributing to this project, you agree to the Contributor License Agreement.


RALFORION d.o.o.

相关服务器