
Astronomer Skills

airflow
by astronomer
Query, manage, and troubleshoot Apache Airflow DAGs, runs, tasks, and system configuration. Supports 30+ commands across DAG inspection, run management, task logging, configuration queries, and direct REST API access. Manages multiple Airflow instances with persistent configuration and auto-discovers local and Astro deployments. Triggers DAG runs synchronously (waiting for completion) or asynchronously, diagnoses failures, clears runs for retry, and accesses task logs with retry/map-index filtering. Output...
airflow-hitl
by astronomer
Human approval gates, form inputs, and branching in Airflow DAGs using deferrable operators. Four operator types: ApprovalOperator for approve/reject decisions, HITLOperator for multi-option selection with forms, HITLBranchOperator for human-driven task routing, and HITLEntryOperator for form data collection. All operators are deferrable, releasing worker slots while awaiting a human response via the Airflow UI's Required Actions tab or the REST API. Supports optional features including custom...
airflow-plugins
by astronomer
Build Airflow 3.1+ plugins that embed FastAPI apps, custom UI pages, React components, middleware, macros, and operator links directly into the Airflow UI. Use…
analyzing-data
by astronomer
Query your data warehouse to answer business questions with cached patterns and concept mappings. Supports pattern lookup and caching for repeated question types, with outcome recording to improve future queries. Includes a concept-to-table mapping cache and table schema discovery via INFORMATION_SCHEMA or codebase grep. Provides run_sql() and run_sql_pandas() kernel functions returning Polars or Pandas DataFrames for analysis. CLI commands for managing concept, pattern, and table caches, plus...
annotating-task-lineage
by astronomer
Annotate Airflow tasks with data lineage using inlets and outlets. Supports OpenLineage Dataset objects, Airflow Assets, and Airflow Datasets for defining inputs and outputs across databases, data warehouses, and cloud storage. Use as a fallback when operators lack built-in OpenLineage extractors; follows a four-tier precedence system where custom extractors and OpenLineage methods take priority. Includes dataset naming helpers for Snowflake, BigQuery, S3, and PostgreSQL to ensure consistent...
authoring-dags
by astronomer
Guided workflow for creating Apache Airflow DAGs with validation and testing integration. Structured six-phase approach: discover the environment and existing patterns, plan the DAG structure, implement following best practices, validate with af CLI commands, test with user consent, and iterate on fixes. CLI commands for discovery (af config connections, af config providers, af dags list) and validation (af dags errors, af dags get, af dags explore) provide immediate feedback on DAG...
blueprint
by astronomer
Define reusable Airflow task group templates with Pydantic validation and compose DAGs from YAML. Use when creating blueprint templates, composing DAGs from…
checking-freshness
by astronomer
Verify data freshness by checking table timestamps and update patterns against a staleness scale. Identifies timestamp columns using common ETL naming patterns (_loaded_at, _updated_at, created_at, etc.) and queries their maximum values to determine age. Classifies data into four freshness statuses: Fresh (< 4 hours), Stale (4–24 hours), Very Stale (> 24 hours), or Unknown (no timestamp found). Provides SQL templates for checking last update time and row count trends over recent days to...
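The four-status staleness scale above can be sketched as a small classifier. The thresholds mirror the ones in the description; the function name and signature are illustrative, not part of the skill's actual API.

```python
from datetime import timedelta
from typing import Optional

def classify_freshness(age: Optional[timedelta]) -> str:
    """Map the age of a table's newest timestamp to a freshness status."""
    if age is None:
        return "Unknown"  # no timestamp column was found
    if age < timedelta(hours=4):
        return "Fresh"
    if age <= timedelta(hours=24):
        return "Stale"
    return "Very Stale"

print(classify_freshness(timedelta(hours=2)))   # Fresh
print(classify_freshness(timedelta(hours=30)))  # Very Stale
```

In practice the `age` input would come from `now() - MAX(<timestamp column>)` in the warehouse, per the SQL templates the skill provides.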
cosmos-dbt-core
by astronomer
Convert dbt Core projects into Airflow DAGs or TaskGroups using Astronomer Cosmos. Supports three assembly patterns: standalone DbtDag, DbtTaskGroup within existing DAGs, and individual Cosmos operators for fine-grained control. Choose from eight execution modes (WATCHER, LOCAL, VIRTUALENV, KUBERNETES, AIRFLOW_ASYNC, and others) based on isolation and performance needs. Offers four parsing strategies (dbt_manifest, dbt_ls, dbt_ls_file, automatic) to balance speed and selector complexity...
cosmos-dbt-fusion
by astronomer
Configure Astronomer Cosmos for dbt Fusion projects on Snowflake, Databricks, BigQuery, or Redshift with local execution. Requires Cosmos 1.11.0+, the dbt Fusion binary installed separately in the Airflow runtime, and ExecutionMode.LOCAL with subprocess invocation. Supports three parsing strategies: dbt_manifest (fastest for large projects), dbt_ls (for complex selectors), or automatic (simple setups). Covers ProfileConfig setup for warehouse connections, ProjectConfig for dbt project paths, and...
creating-openlineage-extractors
by astronomer
Custom OpenLineage extractors for unsupported Airflow operators and complex lineage scenarios. Two approaches: add OpenLineage methods directly to operators you own (recommended), or create custom extractors for third-party operators you cannot modify. Extractors intercept operator execution at three points: before execution for static lineage, after success for runtime-determined outputs, and optionally after failure for partial lineage. Register extractors via airflow.cfg or environment...
dag-factory
by astronomer
Author Apache Airflow DAGs declaratively with dag-factory YAML configs. Use when creating dag-factory templates, composing DAGs from YAML for dag-factory,…
debugging-dags
by astronomer
Systematic root cause analysis and remediation for failed Airflow DAGs with structured investigation workflows. Guides through a four-step diagnosis process: identify the failure, extract error details, gather contextual information, and deliver actionable remediation steps. Categorizes failures into four types (data, code, infrastructure, dependency) to focus the investigation and suggest appropriate fixes. Provides ready-to-use CLI commands for log retrieval, run comparison, task clearing, and DAG...
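The four failure categories can be illustrated with a minimal keyword-triage sketch. The keyword lists and function name here are made-up assumptions for illustration, not the skill's actual heuristics.

```python
# Hypothetical triage: map an error message to one of the four failure
# categories named in the description (data, code, infrastructure, dependency).
CATEGORY_KEYWORDS = {
    "data": ["no such table", "null constraint", "schema mismatch"],
    "code": ["traceback", "importerror", "typeerror", "syntaxerror"],
    "infrastructure": ["connection refused", "pod evicted", "out of memory"],
    "dependency": ["upstream_failed", "sensor timed out", "dataset not updated"],
}

def categorize_failure(error_message: str) -> str:
    """Return the first category whose keywords appear in the message."""
    msg = error_message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in msg for kw in keywords):
            return category
    return "unknown"

print(categorize_failure("ImportError: cannot import name 'DAG'"))  # code
```

A real diagnosis step would pull the message from task logs first, then use the category to pick the remediation path.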
delegating-to-otto
by astronomer
Drives Astronomer's Otto agent (`astro otto`) as a delegated sub-agent for Airflow, dbt, and data-engineering work. Use when the user explicitly asks to "use…
deploying-airflow
by astronomer
Deploy Airflow DAGs and projects. Use when the user wants to deploy code, push DAGs, set up CI/CD, deploy to production, or asks about deployment strategies…
discovering-data
by astronomer
Discover and explore data for a concept or domain. Use when the user asks what data exists for a topic (e.g., "ARR", "customers", "orders"), wants to find…
init
by astronomer
Initialize warehouse schema discovery. Generates .astro/warehouse.md with all table metadata for instant lookups. Run once per project, refresh when schema…
initializing-warehouse
by astronomer
Initialize warehouse schema discovery. Generates .astro/warehouse.md with all table metadata for instant lookups. Run once per project, refresh when schema…
managing-astro-local-env
by astronomer
Manage a local Airflow development environment with Astro CLI commands. Start, stop, restart, and kill local Airflow containers; default credentials are admin/admin with the webserver at http://localhost:8080. View logs for all components or specific services (scheduler, webserver) with a real-time follow option. Access container shells and run Airflow CLI commands directly via astro dev bash and astro dev run. Troubleshoot common issues including port conflicts, startup failures, package errors, and...
migrating-ai-sdk-to-common-ai
by astronomer
Migrates Airflow projects from airflow-ai-sdk to apache-airflow-providers-common-ai 0.1.0+. Use this skill when the user wants to replace airflow-ai-sdk with…
migrating-airflow-2-to-3
by astronomer
Automated detection and code migration for upgrading Apache Airflow 2.x DAGs to Airflow 3.x. Provides Ruff-based auto-fix rules (AIR30/AIR301/AIR302/AIR31/AIR311/AIR312) to detect and resolve breaking changes in imports, operators, hooks, and context variables. Covers critical architecture shifts: workers no longer access the metadata DB directly; use the Airflow Python client or REST API instead of ORM session queries. Includes a manual migration checklist for issues Ruff cannot auto-fix: cron...
profiling-tables
by astronomer
Comprehensive statistical and quality analysis of database tables with structured profiling output. Generates column-level statistics tailored to data type: min/max/percentiles for numeric columns, length metrics for strings, date ranges for timestamps. Performs cardinality analysis to identify categorical vs. high-cardinality columns and detect skewed distributions. Assesses data quality across five dimensions: completeness (NULL rates), uniqueness (duplicates), freshness (update timestamps),...
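A minimal sketch of the kind of column-level profiling described, run against an in-memory SQLite table; the table, column names, and the particular metrics chosen are illustrative, not the skill's output format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "paid"), (2, 25.5, "paid"), (3, None, "refunded"), (4, 40.0, None)],
)

# Completeness (NULL rate), cardinality, and numeric min/max for one column.
total, non_null, distinct, lo, hi = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(amount),
           COUNT(DISTINCT amount),
           MIN(amount),
           MAX(amount)
    FROM orders
    """
).fetchone()

profile = {"null_rate": 1 - non_null / total, "distinct": distinct, "min": lo, "max": hi}
print(profile)  # {'null_rate': 0.25, 'distinct': 3, 'min': 10.0, 'max': 40.0}
```

The same aggregate pattern extends to string length metrics (`LENGTH(col)`) and timestamp ranges (`MIN`/`MAX` on a date column).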
setting-up-astro-project
by astronomer
Initialize and configure Astro/Airflow projects with dependencies, connections, and environment setup. Scaffolds a complete project structure with astro dev init, including directories for DAGs, plugins, tests, and configuration files. Manages Python and OS-level dependencies via requirements.txt and packages.txt, with custom Dockerfile support for complex setups. Configures connections, variables, and pools declaratively in airflow_settings.yaml, with export/import commands for environment...
testing-dags
by astronomer
Iterative test-debug-fix cycles for Airflow DAGs with comprehensive failure diagnosis. Start with af runs trigger-wait <dag_id> to run a DAG and wait for completion; no pre-flight checks needed. On failure, use af runs diagnose for a comprehensive failure summary and af tasks logs to inspect error details from specific tasks. Supports custom configuration, timeouts, and retry attempts; handles success, failure, and timeout scenarios with clear response interpretation. Quick validation available...
tracing-downstream-lineage
by astronomer
Trace downstream data lineage to assess change impact before modifying tables or DAGs. Identifies direct consumers of a target table or DAG through source code search, view dependencies, and BI tool connections. Builds a full dependency tree mapping all downstream impacts, from tables to dashboards to ML models. Categorizes dependencies by criticality (critical, high, medium, low) to prioritize stakeholder communication and testing. Generates an impact report with risk assessment, affected...
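The downstream dependency tree amounts to a graph traversal over consumer edges. The example graph below is made up for illustration; in the skill the edges would come from source-code search and view dependencies.

```python
from collections import deque

# Hypothetical consumer graph: table/DAG -> its direct downstream consumers.
CONSUMERS = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.revenue", "mart.churn"],
    "mart.revenue": ["dashboard.exec_kpis"],
}

def downstream_impacts(target: str) -> list[str]:
    """Breadth-first walk collecting every transitive downstream consumer."""
    seen, queue = [], deque(CONSUMERS.get(target, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.append(node)
            queue.extend(CONSUMERS.get(node, []))
    return seen

print(downstream_impacts("raw.orders"))
# ['staging.orders', 'mart.revenue', 'mart.churn', 'dashboard.exec_kpis']
```

Each node in the result would then be tagged with a criticality level (critical/high/medium/low) for the impact report.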
tracing-upstream-lineage
by astronomer
Trace upstream data lineage to identify sources, DAGs, and dependencies feeding a table or column. Supports tracing three target types: tables, columns, and DAGs; uses Airflow DAG source code and task inspection to find producing pipelines. Handles SQL sources (FROM clauses), external systems (S3, Postgres, Salesforce, HTTP APIs), and file-based sources; recursively traces upstream chains. Includes column-level tracing through direct mappings, transformations, and aggregations in DAG code...
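One piece of the upstream tracing described, pulling SQL sources out of FROM and JOIN clauses, can be sketched with a simple regex. Robust SQL needs a real parser, so treat this as an illustration only; the function name is not part of the skill.

```python
import re

def extract_sources(sql: str) -> list[str]:
    """Naively pull table names from FROM and JOIN clauses."""
    pattern = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)
    seen = []
    for name in pattern.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen

sql = """
SELECT o.id, c.name
FROM analytics.orders o
JOIN analytics.customers c ON c.id = o.customer_id
"""
print(extract_sources(sql))  # ['analytics.orders', 'analytics.customers']
```

Each extracted source would then be traced recursively to build the full upstream chain.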
warehouse-init
by astronomer
Initialize warehouse schema discovery. Generates .astro/warehouse.md with all table metadata for instant lookups. Run once per project, refresh when schema…
