setting-up-astro-project

作者: astronomer

使用依赖项、连接和环境设置初始化并配置Astro/Airflow项目。通过astro dev init搭建完整的项目结构,包括DAGs、插件、测试和配置文件的目录。通过requirements.txt和packages.txt管理Python和操作系统级依赖项,支持自定义Dockerfile以应对复杂配置。在airflow_settings.yaml中以声明方式配置连接、变量和连接池,并提供用于环境导出/导入的命令...

npx skills add https://github.com/astronomer/agents --skill setting-up-astro-project

Astro Project Setup

This skill helps you initialize and configure Airflow projects using the Astro CLI.

To run the local environment, see the managing-astro-local-env skill. To write DAGs, see the authoring-dags skill. Open-source alternative: If the user isn't on Astro, guide them to Apache Airflow's Docker Compose quickstart for local dev and the Helm chart for production. For deployment strategies, use the deploying-airflow skill.


Initialize a New Project

astro dev init

Don't pass --airflow-version or --runtime-version unless the user explicitly asks for a specific pin. Plain astro dev init resolves to the latest Astro Runtime — that's the right default. Specifying a version risks pinning to a stale value from training data. If the user wants to know what was installed, read the generated Dockerfile afterward instead of guessing.

Creates this structure:

project/
├── dags/                # DAG files
├── include/             # SQL, configs, supporting files
├── plugins/             # Custom Airflow plugins
├── tests/               # Unit tests
├── Dockerfile           # Image customization
├── packages.txt         # OS-level packages
├── requirements.txt     # Python packages
└── airflow_settings.yaml # Connections, variables, pools

Adding Dependencies

Python Packages (requirements.txt)

apache-airflow-providers-snowflake==5.3.0
pandas==2.1.0
requests>=2.28.0

OS Packages (packages.txt)

gcc
libpq-dev

Custom Dockerfile

For complex setups (private PyPI, custom scripts):

FROM quay.io/astronomer/astro-runtime:12.4.0

RUN pip install --extra-index-url https://pypi.example.com/simple my-package

After modifying dependencies: Run astro dev restart


Configuring Connections & Variables

airflow_settings.yaml

Loaded automatically on environment start:

airflow:
  connections:
    - conn_id: my_postgres
      conn_type: postgres
      host: host.docker.internal
      port: 5432
      login: user
      password: pass
      schema: mydb

  variables:
    - variable_name: env
      variable_value: dev

  pools:
    - pool_name: limited_pool
      pool_slot: 5

Export/Import

# Export from running environment
astro dev object export --connections --file connections.yaml

# Import to environment
astro dev object import --connections --file connections.yaml

Validate Before Running

Parse DAGs to catch errors without starting the full environment:

astro dev parse

Related Skills

  • managing-astro-local-env: Start, stop, and troubleshoot the local environment
  • authoring-dags: Write and validate DAGs (uses MCP tools)
  • testing-dags: Test DAGs (uses MCP tools)
  • deploying-airflow: Deploy DAGs to production (Astro, Docker Compose, Kubernetes)

来自 astronomer 的更多技能

airflow
astronomer
查询、管理和排查Apache Airflow的DAG、运行记录、任务及系统配置。支持30多种命令,涵盖DAG检查、运行管理、任务日志、配置查询及直接REST API访问。通过持久化配置管理多个Airflow实例;自动发现本地和Astro部署。同步(等待完成)或异步触发DAG运行,诊断故障,清除运行记录以重试,并通过重试/映射索引过滤访问任务日志。输出...
official
airflow-hitl
astronomer
在Airflow DAG中使用可延迟操作符实现人工审批关卡、表单输入和分支。四种操作符类型:用于批准/拒绝决策的ApprovalOperator、带表单的多选项选择HITLOperator、人工驱动的任务路由HITLBranchOperator,以及表单数据收集HITLEntryOperator。所有操作符均为可延迟设计,在通过Airflow UI的"必需操作"标签页或REST API等待人工响应时释放工作槽位。支持包括自定义在内的可选功能...
official
airflow-plugins
astronomer
构建嵌入FastAPI应用、自定义UI页面、React组件、中间件、宏和操作符链接的Airflow 3.1+插件,直接集成到Airflow UI中。使用…
official
analyzing-data
astronomer
查询数据仓库,利用缓存的模式和概念映射来回答业务问题。支持对重复问题类型进行模式查找和缓存,并通过记录结果来改进后续查询。包含概念到表的映射缓存,以及通过INFORMATION_SCHEMA或代码库grep进行表结构发现。提供run_sql()和run_sql_pandas()内核函数,返回Polars或Pandas DataFrame用于分析。提供CLI命令用于管理概念、模式和表缓存,以及...
official
annotating-task-lineage
astronomer
使用入口和出口为Airflow任务标注数据血缘。支持使用OpenLineage Dataset对象、Airflow Assets和Airflow Datasets定义跨数据库、数据仓库及云存储的输入输出。当运算符缺少内置OpenLineage提取器时作为备用方案;遵循四级优先级系统,其中自定义提取器和OpenLineage方法优先。包含针对Snowflake、BigQuery、S3和PostgreSQL的数据集命名辅助工具,以确保一致性...
official
authoring-dags
astronomer
创建Apache Airflow DAG的引导式工作流,集成验证与测试。采用六阶段结构化方法:发现环境与现有模式、规划DAG结构、遵循最佳实践实现、通过af CLI命令验证、经用户同意测试、迭代修复。用于发现(af config connections、af config providers、af dags list)和验证(af dags errors、af dags get、af dags explore)的CLI命令可提供DAG的即时反馈...
official
blueprint
astronomer
使用Pydantic验证定义可复用的Airflow任务组模板,并从YAML组合DAG。适用于创建blueprint模板、从YAML组合DAG等场景。
official
checking-freshness
astronomer
通过检查表时间戳和更新模式与陈旧度标尺对比,验证数据新鲜度。使用常见ETL命名模式(如_loaded_at、_updated_at、created_at等)识别时间戳列,并查询其最大值以确定数据时效。将数据分为四种新鲜度状态:新鲜(<4小时)、陈旧(4–24小时)、非常陈旧(>24小时)或未知(未找到时间戳)。提供SQL模板,用于检查最近几天的最后更新时间及行数变化趋势。
official