airflow-plugins

Author: astronomer

Build Airflow 3.1+ plugins that embed FastAPI apps, custom UI pages, React components, middleware, macros, and operator links directly into the Airflow UI. Use…

npx skills add https://github.com/astronomer/agents --skill airflow-plugins

Airflow 3 Plugins

Airflow 3 plugins let you embed FastAPI apps, React UIs, middleware, macros, operator buttons, and custom timetables directly into the Airflow process. No sidecar, no extra server.

CRITICAL: Plugin components (fastapi_apps, react_apps, external_views) require Airflow 3.1+. NEVER import flask, flask_appbuilder, or use appbuilder_views / flask_blueprints — these are Airflow 2 patterns and will not work in Airflow 3. If existing code uses them, rewrite the entire registration block using FastAPI.

Security: FastAPI plugin endpoints are not automatically protected by Airflow auth. If your endpoints need to be private, implement authentication explicitly using FastAPI's security utilities.

Restart required: Changes to Python plugin files require restarting the API server. Static file changes (HTML, JS, CSS) are picked up immediately. Set AIRFLOW__CORE__LAZY_LOAD_PLUGINS=False during development to load plugins at startup rather than lazily.

Relative paths always: In external_views, href must have no leading slash. In HTML and JavaScript, use relative paths for all assets and fetch() calls. Absolute paths break behind reverse proxies.

Before writing any code, verify

  1. Am I using fastapi_apps / FastAPI — not appbuilder_views / Flask?
  2. Are all HTML/JS asset paths and fetch() calls relative (no leading slash)?
  3. Are all synchronous SDK or SQLAlchemy calls wrapped in asyncio.to_thread()?
  4. Do the static/ and assets/ directories exist before the FastAPI app mounts them?
  5. If the endpoint must be private, did I add explicit FastAPI authentication?

Step 1: Choose plugin components

A single plugin class can register multiple component types at once.

Component | What it does | Field
Custom API endpoints | FastAPI app mounted in Airflow process | fastapi_apps
Nav / page link | Embeds a URL as an iframe or links out | external_views
React component | Custom React app embedded in Airflow UI | react_apps
API middleware | Intercepts all Airflow API requests/responses | fastapi_root_middlewares
Jinja macros | Reusable Python functions in DAG templates | macros
Task instance button | Extra link button in task Detail view | operator_extra_links / global_operator_extra_links
Custom timetable | Custom scheduling logic | timetables
Event hooks | Listener callbacks for Airflow events | listeners

Step 2: Plugin registration skeleton

Project file structure

Give each plugin its own subdirectory under plugins/ — this keeps the Python file, static assets, and templates together and makes multi-plugin projects manageable:

plugins/
  my-plugin/
    plugin.py       # AirflowPlugin subclass — auto-discovered by Airflow
    static/
      index.html
      app.js
    assets/
      icon.svg

BASE_DIR = Path(__file__).parent in plugin.py resolves to plugins/my-plugin/ — static and asset paths will be correct relative to that. Create the subdirectory and any static/assets folders before starting Airflow, or StaticFiles will raise on import.

from pathlib import Path
from airflow.plugins_manager import AirflowPlugin
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi.responses import FileResponse

BASE_DIR = Path(__file__).parent

app = FastAPI(title="My Plugin")

# Both directories must exist before Airflow starts or FastAPI raises on import
app.mount("/static", StaticFiles(directory=BASE_DIR / "static"), name="static")
app.mount("/assets", StaticFiles(directory=BASE_DIR / "assets"), name="assets")


class MyPlugin(AirflowPlugin):
    name = "my_plugin"

    fastapi_apps = [
        {
            "app": app,
            "url_prefix": "/my-plugin",   # plugin available at {AIRFLOW_HOST}/my-plugin/
            "name": "My Plugin",
        }
    ]

    external_views = [
        {
            "name": "My Plugin",
            "href": "my-plugin/ui",              # NO leading slash: a leading slash breaks on Astro and reverse proxies
            "destination": "nav",                # see locations table below
            "category": "browse",                # nav bar category (nav destination only)
            "url_route": "my-plugin",            # unique route name (required for React apps)
            "icon": "/my-plugin/assets/icon.svg" # DOES use a leading slash; served from the /assets mount (assets/icon.svg)
        }
    ]
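
StaticFiles checks for its directory when it is constructed, so one defensive option is to create the folders from plugin.py before mounting. The following is a minimal sketch, assuming it is acceptable for the plugin to create empty directories at import time. ensure_dirs is a hypothetical helper, not an Airflow or FastAPI API:

```python
from pathlib import Path


def ensure_dirs(base_dir: Path, names: tuple[str, ...] = ("static", "assets")) -> list[Path]:
    """Make sure each named subdirectory of base_dir exists and return the paths."""
    paths = []
    for name in names:
        path = base_dir / name
        # exist_ok=True makes this safe to run on every import
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

In plugin.py this would run as ensure_dirs(BASE_DIR) just before the app.mount(...) calls. On a read-only filesystem the directories still have to be created at build time.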

External view locations

destination | Where it appears
"nav" | Left navigation bar (also set category)
"dag" | Extra tab on every Dag page
"dag_run" | Extra tab on every Dag run page
"task" | Extra tab on every task page
"task_instance" | Extra tab on every task instance page

Nav bar categories (destination: "nav")

Set "category" to place the link under a specific nav group: "browse", "admin", or omit for top-level.

External URLs and minimal plugins

href can be a relative path to an internal endpoint ("my-plugin/ui") or a full external URL. A plugin with only external_views and no fastapi_apps is valid — no backend needed for a simple link or tab:

from airflow.plugins_manager import AirflowPlugin

class LearnViewPlugin(AirflowPlugin):
    name = "learn_view_plugin"

    external_views = [
        {
            "name": "Learn Airflow 3",
            "href": "https://www.astronomer.io/docs/learn",
            "destination": "dag",   # adds a tab to every Dag page
            "url_route": "learn"
        }
    ]

The no-leading-slash rule applies to internal paths only — full https:// URLs are fine.
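
The href rule can be checked mechanically before a plugin is registered. This is a small sketch in plain Python; href_is_valid is a hypothetical helper written for illustration and is not part of Airflow:

```python
from urllib.parse import urlsplit


def href_is_valid(href: str) -> bool:
    """Return True for full http(s) URLs and for internal paths without a leading slash."""
    parts = urlsplit(href)
    if parts.scheme in ("http", "https") and parts.netloc:
        return True  # full external URL
    # Internal path: must be non-empty, scheme-less, and relative
    return bool(href) and not parts.scheme and not href.startswith("/")


print(href_is_valid("my-plugin/ui"))                          # True
print(href_is_valid("https://www.astronomer.io/docs/learn"))  # True
print(href_is_valid("/my-plugin/ui"))                         # False
```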


Step 3: Serve the UI entry point

@app.get("/ui", response_class=FileResponse)
async def serve_ui():
    return FileResponse(BASE_DIR / "static" / "index.html")

In HTML, always use relative paths. Absolute paths break when Airflow is mounted at a sub-path:

<!-- correct -->
<link rel="stylesheet" href="static/app.css" />
<script src="static/app.js?v=20240315"></script>

<!-- breaks behind a reverse proxy -->
<script src="/my-plugin/static/app.js"></script>

Same rule in JavaScript:

fetch('api/dags')           // correct — relative to current page
fetch('/my-plugin/api/dags') // breaks on Astro and sub-path deploys
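
To see why the relative form survives a sub-path deployment, the URL resolution can be reproduced with Python's urljoin, which follows the same RFC 3986 rules for these cases. The host and the /deploy-abc/ prefix below are made-up values for illustration:

```python
from urllib.parse import urljoin

# Hypothetical deployment where a proxy serves Airflow under /deploy-abc/
page_url = "https://example.com/deploy-abc/my-plugin/ui"

relative = urljoin(page_url, "api/dags")
absolute = urljoin(page_url, "/my-plugin/api/dags")

print(relative)  # https://example.com/deploy-abc/my-plugin/api/dags
print(absolute)  # https://example.com/my-plugin/api/dags (proxy prefix lost)
```

The relative result depends on the page URL having no trailing slash: a page served at .../ui/ would resolve api/dags to .../ui/api/dags instead.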

Step 4: Call the Airflow API from your plugin

Only needed if your plugin calls the Airflow REST API. Plugins that only serve static files, register external_views, or use direct DB access do not need this step — skip to Step 5 or Step 6.

Add the dependency

Only if REST API communication is being implemented: add apache-airflow-client to the project's dependencies. Check which file exists and act accordingly:

File found | Action
requirements.txt | Append apache-airflow-client
pyproject.toml (uv / poetry) | uv add apache-airflow-client or poetry add apache-airflow-client
None of the above | Tell the user: "Add apache-airflow-client to your dependencies before running the plugin."
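
The lookup above can be sketched as a small function. Two details are assumptions of this sketch and not stated in the table: requirements.txt is checked first, and a poetry.lock file next to pyproject.toml is taken to mean Poetry, with uv as the fallback. dependency_action is a hypothetical name:

```python
from pathlib import Path


def dependency_action(project_dir: Path) -> str:
    """Map the dependency file found in project_dir to the matching action."""
    if (project_dir / "requirements.txt").is_file():
        return "append apache-airflow-client to requirements.txt"
    if (project_dir / "pyproject.toml").is_file():
        # Assumption: poetry.lock signals a Poetry project, otherwise treat it as uv
        if (project_dir / "poetry.lock").is_file():
            return "poetry add apache-airflow-client"
        return "uv add apache-airflow-client"
    return "tell the user to add apache-airflow-client to their dependencies"
```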

Use apache-airflow-client to talk to Airflow's own REST API. The SDK is synchronous but FastAPI routes are async — never call blocking SDK methods directly inside async def or you will stall the event loop and freeze all concurrent requests.

JWT token management

Replace MYPLUGIN_ with a short uppercase prefix derived from the plugin name (e.g. if the plugin is called "Trip Analyzer", use TRIP_ANALYZER_). If no plugin name has been given yet, ask the user before writing env var names.

Cache one token per process and refresh it 5 minutes before the 1-hour expiry. Use double-checked locking so that concurrent requests do not all race to refresh at once:

import asyncio
import os
import threading
import time
import airflow_client.client as airflow_sdk
import requests

AIRFLOW_HOST  = os.environ.get("MYPLUGIN_HOST",     "http://localhost:8080")
AIRFLOW_USER  = os.environ.get("MYPLUGIN_USERNAME", "admin")
AIRFLOW_PASS  = os.environ.get("MYPLUGIN_PASSWORD", "admin")
AIRFLOW_TOKEN = os.environ.get("MYPLUGIN_TOKEN")    # Astronomer Astro: Deployment API token

_cached_token: str | None = None
_token_expires_at: float  = 0.0
_token_lock = threading.Lock()


def _fetch_fresh_token() -> str:
    """Exchange username/password for a JWT via Airflow's auth endpoint."""
    response = requests.post(
        f"{AIRFLOW_HOST}/auth/token",
        json={"username": AIRFLOW_USER, "password": AIRFLOW_PASS},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["access_token"]


def _get_token() -> str:
    # Astronomer Astro production: use static Deployment API token directly
    if AIRFLOW_TOKEN:
        return AIRFLOW_TOKEN
    global _cached_token, _token_expires_at
    now = time.monotonic()
    # Fast path — no lock if still valid
    if _cached_token and now < _token_expires_at:
        return _cached_token
    # Slow path — one thread refreshes, others wait
    with _token_lock:
        if _cached_token and now < _token_expires_at:
            return _cached_token
        _cached_token = _fetch_fresh_token()
        _token_expires_at = now + 55 * 60  # refresh 5 min before 1-hour expiry
    return _cached_token


def _make_config() -> airflow_sdk.Configuration:
    config = airflow_sdk.Configuration(host=AIRFLOW_HOST)
    config.access_token = _get_token()
    return config
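
The double-checked locking used in _get_token() can be exercised in isolation. The sketch below is a stand-alone variant, not the plugin code itself: TokenCache and fake_fetch are hypothetical names, and the token and its expiry are stored as one tuple so the lock-free read always sees a consistent pair:

```python
import threading
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache one token and refresh it at most once per expiry window."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._state: Optional[Tuple[str, float]] = None  # (token, expires_at)
        self._lock = threading.Lock()

    def get(self) -> str:
        # First check, without the lock: the common case when the token is fresh
        state = self._state
        if state is not None and time.monotonic() < state[1]:
            return state[0]
        with self._lock:
            # Second check, with the lock: another thread may have refreshed already
            state = self._state
            if state is None or time.monotonic() >= state[1]:
                state = (self._fetch(), time.monotonic() + self._ttl)
                self._state = state
            return state[0]


calls = []


def fake_fetch() -> str:
    # Stand-in for _fetch_fresh_token(): records the call and simulates latency
    calls.append(1)
    time.sleep(0.05)
    return f"token-{len(calls)}"


cache = TokenCache(fake_fetch, ttl_seconds=55 * 60)
threads = [threading.Thread(target=cache.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1
```

Eight threads ask for a token at the same time, but only the first one to take the lock performs the fetch; the rest find the cached value on the second check.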

After implementing auth, tell the user:

  • Local development: set MYPLUGIN_USERNAME and MYPLUGIN_PASSWORD in .env — JWT exchange happens automatically.

  • Astronomer Astro (production): create a Deployment API token and set it as MYPLUGIN_TOKEN — the JWT exchange is skipped entirely:

    1. Astro UI → open the Deployment → Access → API Tokens → + Deployment API Token
    2. Copy the token value (shown only once)
    3. astro deployment variable create MYPLUGIN_TOKEN=<token>

    MYPLUGIN_USERNAME and MYPLUGIN_PASSWORD are not needed on Astro.

Wrapping SDK calls with asyncio.to_thread

from fastapi import HTTPException
from airflow_client.client.api import DAGApi

@app.get("/api/dags")
async def list_dags():
    try:
        def _fetch():
            with airflow_sdk.ApiClient(_make_config()) as client:
                return DAGApi(client).get_dags(limit=100).dags
        dags = await asyncio.to_thread(_fetch)
        return [{"dag_id": d.dag_id, "is_paused": d.is_paused, "timetable_summary": d.timetable_summary} for d in dags]
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

API field names: Never guess response field names — verify against the REST API reference. Key DAGResponse fields: dag_id, dag_display_name, description, is_paused, timetable_summary, timetable_description, fileloc, owners, tags.

The pattern is always: define a plain inner def _fetch() with all SDK logic, then await asyncio.to_thread(_fetch).
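
The effect of this pattern can be observed without Airflow. In the sketch below, blocking_call is a stand-in for a synchronous SDK request (an assumption for illustration), and three concurrent handlers are timed with and without asyncio.to_thread:

```python
import asyncio
import time


def blocking_call() -> str:
    # Stand-in for a synchronous SDK request that takes 0.2 s
    time.sleep(0.2)
    return "ok"


async def direct() -> str:
    return blocking_call()  # blocks the event loop for the full 0.2 s


async def threaded() -> str:
    return await asyncio.to_thread(blocking_call)  # loop stays free while the thread waits


async def timed(handler, n: int = 3) -> float:
    """Run n concurrent calls to handler and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    return await timed(direct), await timed(threaded)


if __name__ == "__main__":
    direct_s, threaded_s = asyncio.run(main())
    print(f"direct: {direct_s:.2f}s, threaded: {threaded_s:.2f}s")
```

The direct version should take roughly three times as long, because each call holds the event loop until it returns, while the threaded calls overlap.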

Alternative: Direct database access

Warning — use with caution and tell the user. The Airflow metadb is not a public interface. Direct writes or poorly-formed queries can corrupt scheduler state. Whenever you use this pattern, explicitly tell the user: "This accesses Airflow's internal database directly. The internal models are not part of the public API, can change between Airflow versions, and incorrect queries can cause issues in the metadb. Prefer apache-airflow-client unless the operation is not exposed via the REST API."

Since FastAPI plugin endpoints run inside the API server process (not in a task worker), they have direct access to Airflow's internal SQLAlchemy models — no HTTP round-trip or JWT needed. Use only for read operations not exposed via the REST API, or when the extra HTTP overhead genuinely matters. Always wrap DB calls in asyncio.to_thread() — SQLAlchemy queries are blocking.

from airflow.models import DagBag, DagModel
from airflow.utils.session import provide_session  # canonical home of provide_session

@app.get("/api/dags/status")
async def dag_status():
    def _fetch():
        @provide_session
        def _query(session=None):
            dagbag = DagBag()
            paused = sum(
                1 for dag_id in dagbag.dags
                if (m := session.query(DagModel).filter(DagModel.dag_id == dag_id).first())
                and m.is_paused
            )
            return {"total": len(dagbag.dags), "paused": paused}
        return _query()
    return await asyncio.to_thread(_fetch)

Step 5: Common API endpoint patterns

If you need an SDK method or field not shown in the examples below, verify it before generating code — do not guess. Either run python3 -c "from airflow_client.client.api import <Class>; print([m for m in dir(<Class>) if not m.startswith('_')])" in any environment where the SDK is installed, or search the apache/airflow-client-python repo for the class definition.

from airflow_client.client.api import DAGApi, DagRunApi
from airflow_client.client.models import TriggerDAGRunPostBody, DAGPatchBody


@app.post("/api/dags/{dag_id}/trigger")
async def trigger_dag(dag_id: str):
    def _run():
        with airflow_sdk.ApiClient(_make_config()) as client:
            return DagRunApi(client).trigger_dag_run(dag_id, TriggerDAGRunPostBody())
    result = await asyncio.to_thread(_run)
    return {"run_id": result.dag_run_id, "state": normalize_state(result.state)}


@app.patch("/api/dags/{dag_id}/pause")
async def toggle_pause(dag_id: str, is_paused: bool):
    def _run():
        with airflow_sdk.ApiClient(_make_config()) as client:
            DAGApi(client).patch_dag(dag_id, DAGPatchBody(is_paused=is_paused))
    await asyncio.to_thread(_run)
    return {"dag_id": dag_id, "is_paused": is_paused}


@app.delete("/api/dags/{dag_id}")
async def delete_dag(dag_id: str):
    def _run():
        with airflow_sdk.ApiClient(_make_config()) as client:
            DAGApi(client).delete_dag(dag_id)
    await asyncio.to_thread(_run)
    return {"deleted": dag_id}


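def normalize_state(raw) -> str:
    """Convert SDK enum objects to plain strings before sending to the frontend."""
    if raw is None:
        return "never_run"
    # Enum members carry the wire value in .value; str() on a str-mixin Enum
    # would return "ClassName.MEMBER" instead of the state string
    return str(getattr(raw, "value", raw)).lower()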

DAG runs, task instances, and logs

These are the most common calls beyond basic DAG CRUD. For anything not shown here, consult the REST API reference for available endpoints and the matching Python SDK class/method names.

from airflow_client.client.api import DagRunApi, TaskInstanceApi

# Latest run for a DAG
@app.get("/api/dags/{dag_id}/runs/latest")
async def latest_run(dag_id: str):
    def _fetch():
        with airflow_sdk.ApiClient(_make_config()) as client:
            runs = DagRunApi(client).get_dag_runs(dag_id, limit=1, order_by="-start_date").dag_runs
            return runs[0] if runs else None
    run = await asyncio.to_thread(_fetch)
    if not run:
        return {"state": "never_run"}
    return {"run_id": run.dag_run_id, "state": normalize_state(run.state)}


# Task instances for a specific run
@app.get("/api/dags/{dag_id}/runs/{run_id}/tasks")
async def task_instances(dag_id: str, run_id: str):
    def _fetch():
        with airflow_sdk.ApiClient(_make_config()) as client:
            return TaskInstanceApi(client).get_task_instances(dag_id, run_id).task_instances
    tasks = await asyncio.to_thread(_fetch)
    return [{"task_id": t.task_id, "state": normalize_state(t.state)} for t in tasks]


# Task log (try_number starts at 1)
@app.get("/api/dags/{dag_id}/runs/{run_id}/tasks/{task_id}/logs/{try_number}")
async def task_log(dag_id: str, run_id: str, task_id: str, try_number: int):
    def _fetch():
        with airflow_sdk.ApiClient(_make_config()) as client:
            return TaskInstanceApi(client).get_log(
                dag_id, run_id, task_id, try_number, map_index=-1
            )
    result = await asyncio.to_thread(_fetch)
    return {"log": result.content if hasattr(result, "content") else str(result)}

Streaming proxy

Use StreamingResponse to proxy binary content from an external URL through the plugin — useful when the browser can't fetch the resource directly (CORS, auth, etc.):

import requests
from starlette.responses import StreamingResponse

@app.get("/api/files/{filename}")
async def proxy_file(filename: str):
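    def _open():
        # stream=True defers the body download; timeout bounds connect and read waits
        r = requests.get(f"https://files.example.com/{filename}", stream=True, timeout=30)
        r.raise_for_status()
        return r
    response = await asyncio.to_thread(_open)

    def _chunks():
        # Release the upstream connection once the body has been fully sent
        try:
            yield from response.iter_content(chunk_size=8192)
        finally:
            response.close()

    return StreamingResponse(
        _chunks(),
        media_type="application/octet-stream",
        headers={"Content-Disposition": f'attachment; filename="{filename}"'},
    )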

Note that requests.get() is blocking — fetch in asyncio.to_thread so the event loop isn't stalled while waiting for the remote server.
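
Because filename ends up in both the upstream URL and the Content-Disposition header, it is worth validating before use. This is a minimal sketch, assuming an allow-list of letters, digits, dot, dash, and underscore is sufficient for your files. is_safe_filename is a hypothetical helper:

```python
import re

# Assumption: names start with a letter or digit and use only these characters
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,254}")


def is_safe_filename(name: str) -> bool:
    """Accept only names that cannot alter the upstream URL or the response header."""
    return bool(_SAFE_NAME.fullmatch(name)) and ".." not in name


print(is_safe_filename("report-2024.csv"))  # True
print(is_safe_filename('a"b.txt'))          # False
print(is_safe_filename("../secret"))        # False
```

In the endpoint, a failed check would typically be answered with HTTPException(status_code=400) before any request is sent upstream.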


Step 6: Other plugin component types

Macros

Macros are loaded by the scheduler (and DAG processor), not the API server. Restart the scheduler after changes.

from airflow.plugins_manager import AirflowPlugin

def format_confidence(confidence: float) -> str:
    return f"{confidence * 100:.2f}%"

class MyPlugin(AirflowPlugin):
    name = "my_plugin"
    macros = [format_confidence]

Use in any templated field — including with XCom:

{{ macros.my_plugin.format_confidence(0.95) }}

{{ macros.my_plugin.format_confidence(ti.xcom_pull(task_ids='score_task')['confidence']) }}

The naming pattern is always macros.{plugin_name}.{function_name}.
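
Because a macro is an ordinary Python function, its output can be checked in plain Python before restarting the scheduler. A quick sketch using the function from the example above:

```python
def format_confidence(confidence: float) -> str:
    return f"{confidence * 100:.2f}%"


print(format_confidence(0.95))   # 95.00%
print(format_confidence(1 / 3))  # 33.33%
```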

Middleware

Middleware applies to all Airflow API requests, including the built-in REST API and any FastAPI plugins. Use sparingly and filter requests explicitly if needed:

from airflow.plugins_manager import AirflowPlugin
from starlette.middleware.base import BaseHTTPMiddleware
from fastapi import Request, Response

class AuditMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next) -> Response:
        # runs before every request to the Airflow API server
        response = await call_next(request)
        return response

class MyPlugin(AirflowPlugin):
    name = "my_plugin"
    fastapi_root_middlewares = [
        {"middleware": AuditMiddleware, "args": [], "kwargs": {}, "name": "Audit"}
    ]
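
Since the middleware sees every request, the path filter is easiest to get right as a small pure function. The sketch below is one way to write it; in_scope and the "/my-plugin" prefix are assumptions for illustration, and behind a proxy the path seen by the server may carry an extra prefix:

```python
def in_scope(path: str, prefixes: tuple[str, ...] = ("/my-plugin",)) -> bool:
    """Return True if path equals one of the prefixes or lies beneath it."""
    for prefix in prefixes:
        base = prefix.rstrip("/")
        # Require an exact match or a "/" boundary so "/my-plugin-other" is not matched
        if path == base or path.startswith(base + "/"):
            return True
    return False


# Inside AuditMiddleware.dispatch() this could gate the extra work:
#     if not in_scope(request.url.path):
#         return await call_next(request)

print(in_scope("/my-plugin/api/dags"))  # True
print(in_scope("/my-plugin-other/x"))   # False
print(in_scope("/api/v2/dags"))         # False
```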

Operator extra links

from airflow.plugins_manager import AirflowPlugin
from airflow.sdk.bases.operatorlink import BaseOperatorLink

class MyDashboardLink(BaseOperatorLink):
    name = "Open in Dashboard"

    def get_link(self, operator, *, ti_key, **context) -> str:
        return f"https://my-dashboard.example.com/tasks/{ti_key.task_id}"

class MyPlugin(AirflowPlugin):
    name = "my_plugin"
    global_operator_extra_links = [MyDashboardLink()]  # appears on every task
    # operator_extra_links = [MyDashboardLink()]       # attach to specific operator instead
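
When a link embeds identifiers such as run IDs, which often contain ":" and "+", the values should be percent-encoded. The sketch below shows only the URL building. FakeKey is a stand-in for the ti_key argument, and the dag/run/task URL layout is a made-up example for a hypothetical dashboard:

```python
from typing import NamedTuple
from urllib.parse import quote


class FakeKey(NamedTuple):
    """Stand-in for ti_key; only the fields used below are modelled."""
    dag_id: str
    run_id: str
    task_id: str


def dashboard_url(ti_key: FakeKey) -> str:
    # safe="" also escapes "/" so each value stays a single path segment
    parts = (quote(v, safe="") for v in (ti_key.dag_id, ti_key.run_id, ti_key.task_id))
    return "https://my-dashboard.example.com/tasks/" + "/".join(parts)


print(dashboard_url(FakeKey("etl", "manual__2024-03-15T00:00:00+00:00", "load.users")))
# https://my-dashboard.example.com/tasks/etl/manual__2024-03-15T00%3A00%3A00%2B00%3A00/load.users
```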

React apps

React apps are embedded as JavaScript bundles served via FastAPI. The bundle must expose its component as a global variable whose key matches the plugin's "name" as registered in react_apps ("My Plugin" in this example):

// In your bundle (e.g. my-app.js)
globalThis['My Plugin'] = MyComponent;   // matches "name" in react_apps
globalThis.AirflowPlugin = MyComponent;  // fallback Airflow looks for

class MyPlugin(AirflowPlugin):
    name = "my_plugin"
    fastapi_apps = [{"app": app, "url_prefix": "/my-plugin", "name": "My Plugin"}]
    react_apps = [
        {
            "name": "My Plugin",
            "bundle_url": "/my-plugin/my-app.js",
            "destination": "nav",
            "category": "browse",
            "url_route": "my-plugin",
        }
    ]

The same bundle can be registered to multiple destinations by adding multiple entries — each needs a unique url_route:

react_apps = [
    {"name": "My Widget", "bundle_url": "/my-plugin/widget.js", "destination": "nav",  "url_route": "my-widget-nav"},
    {"name": "My Widget", "bundle_url": "/my-plugin/widget.js", "destination": "dag",  "url_route": "my-widget-dag"},
]
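
The uniqueness requirement on url_route is easy to break when entries are copied. A small check, sketched here as a hypothetical helper (duplicate_routes is not an Airflow API), can be run against the list before assigning it to the plugin class:

```python
from collections import Counter


def duplicate_routes(entries: list[dict]) -> list[str]:
    """Return every url_route value that appears more than once, in first-seen order."""
    counts = Counter(e["url_route"] for e in entries if "url_route" in e)
    return [route for route, n in counts.items() if n > 1]


entries = [
    {"name": "My Widget", "destination": "nav", "url_route": "my-widget"},
    {"name": "My Widget", "destination": "dag", "url_route": "my-widget"},
]
print(duplicate_routes(entries))  # ['my-widget']
```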

React app integration is experimental in Airflow 3.1. Interfaces may change in future releases.


Step 7: Environment variables and deployment

Never hardcode credentials:

AIRFLOW_HOST = os.environ.get("MYPLUGIN_HOST",     "http://localhost:8080")
AIRFLOW_USER = os.environ.get("MYPLUGIN_USERNAME", "admin")
AIRFLOW_PASS = os.environ.get("MYPLUGIN_PASSWORD", "admin")

Local Astro CLI:

# .env
MYPLUGIN_HOST=http://localhost:8080
MYPLUGIN_USERNAME=admin
MYPLUGIN_PASSWORD=admin

astro dev restart              # required after any Python plugin change

# Check logs by component (Astro CLI):
astro dev logs --api-server    # FastAPI apps, external_views — plugin import errors show here
astro dev logs --scheduler     # macros, timetables, listeners, operator links
astro dev logs --dag-processor # DAG parsing errors

# Non-Astro:
airflow plugins                # CLI — lists all loaded plugins

Production Astronomer:

astro deployment variable create --deployment-id <id> MYPLUGIN_HOST=https://airflow.example.com

Load plugins at process startup instead of lazily during development (a restart is still required after Python changes):

AIRFLOW__CORE__LAZY_LOAD_PLUGINS=False

Cache busting for static files after deploy:

<script src="static/app.js?v=20240315-1"></script>
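
Instead of a hand-maintained date string, the version parameter can be derived from the file contents so it changes only when the asset changes. This is a minimal sketch; asset_version is a hypothetical helper, and wiring its result into index.html (for example at build time) is left open:

```python
import hashlib
from pathlib import Path


def asset_version(path: Path, length: int = 8) -> str:
    """Return a short hex digest that changes whenever the file's bytes change."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:length]
```

For example, asset_version(BASE_DIR / "static" / "app.js") would yield the value to place after ?v= in the script tag.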

Verify the plugin loaded: open Admin > Plugins in the Airflow UI.

OpenAPI docs are auto-generated for FastAPI plugins:

  • Swagger UI: {AIRFLOW_HOST}/{url_prefix}/docs
  • OpenAPI JSON: {AIRFLOW_HOST}/{url_prefix}/openapi.json

Common pitfalls

Problem | Cause | Fix
Nav link goes to 404 | Leading / in href | "my-plugin/ui" not "/my-plugin/ui"
Nav icon not showing | Missing / in icon | icon takes an absolute path: "/my-plugin/assets/icon.svg"
Event loop freezes under load | Sync SDK called directly in async def | Wrap with asyncio.to_thread()
401 errors after 1 hour | JWT expires with no refresh | Use the 5-minute pre-expiry refresh pattern
StaticFiles raises on startup | Directory missing | Create assets/ and static/ before starting
Plugin not showing up | Python file changed without restart | astro dev restart
Endpoints accessible without login | FastAPI apps are not auto-authenticated | Add FastAPI security (e.g. OAuth2, API key) if endpoints must be private
Middleware affecting wrong routes | Middleware applies to all API traffic | Filter by request.url.path inside dispatch()
JS fetch() breaks on Astro | Absolute path in fetch() | Always use relative paths: fetch('api/dags')
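
For endpoints that must be private, one simple option is a shared-secret bearer check. The sketch below shows only the comparison logic in plain Python. bearer_token_matches and the shared-secret scheme are assumptions for illustration, not part of Airflow or FastAPI; in a plugin the function would be called from a FastAPI dependency that raises HTTPException(status_code=401) on failure:

```python
import hmac
from typing import Optional


def bearer_token_matches(authorization: Optional[str], expected: str) -> bool:
    """Check an Authorization header value against the expected secret."""
    if not authorization or not expected:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(token.strip().encode(), expected.encode())


print(bearer_token_matches("Bearer s3cret", "s3cret"))  # True
print(bearer_token_matches("Bearer wrong", "s3cret"))   # False
print(bearer_token_matches(None, "s3cret"))             # False
```

The expected secret would come from an environment variable following the MYPLUGIN_ prefix convention, never from source code.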
