tracing-downstream-lineage

作者： astronomer

追踪下游数据血缘，在修改表或DAG前评估变更影响。通过源代码搜索、视图依赖和BI工具连接识别目标表或DAG的直接消费者，构建完整的依赖树，映射从表到仪表盘再到机器学习模型的所有下游影响。按关键性（关键、高、中、低）对依赖进行分类，以优先安排利益相关者沟通和测试。生成包含风险评估、受影响...的影响报告。

npx skills add https://github.com/astronomer/agents --skill tracing-downstream-lineage

下载 ZIP GitHub

395

Downstream Lineage: Impacts

Answer the critical question: "What breaks if I change this?"

Use this BEFORE making changes to understand the blast radius.

Impact Analysis

Step 1: Identify Direct Consumers

Find everything that reads from this target:

For Tables:

Search DAG source code: Look for DAGs that SELECT from this table
- Use af dags list to get all DAGs
- Use af dags source <dag_id> to search for table references
- Look for: FROM target_table, JOIN target_table

Check for dependent views:

-- Snowflake
SELECT * FROM information_schema.view_table_usage
WHERE table_name = '<target_table>'

-- Or check SHOW VIEWS and search definitions

Look for BI tool connections:
- Dashboards often query tables directly
- Check for common BI patterns in table naming (rpt_, dashboard_)

On Astro

If you're running on Astro, the Lineage tab in the Astro UI provides visual dependency graphs across DAGs and datasets, making downstream impact analysis faster. It shows which DAGs consume a given dataset and their current status, reducing the need for manual source code searches.

For DAGs:

Check what the DAG produces: Use af dags source <dag_id> to find output tables
Then trace those tables' consumers (recursive)

Step 2: Build Dependency Tree

Map the full downstream impact:

SOURCE: fct.orders
    |
    +-- TABLE: agg.daily_sales --> Dashboard: Executive KPIs
    |       |
    |       +-- TABLE: rpt.monthly_summary --> Email: Monthly Report
    |
    +-- TABLE: ml.order_features --> Model: Demand Forecasting
    |
    +-- DIRECT: Looker Dashboard "Sales Overview"

Step 3: Categorize by Criticality

Critical (breaks production):

Production dashboards
Customer-facing applications
Automated reports to executives
ML models in production
Regulatory/compliance reports

High (causes significant issues):

Internal operational dashboards
Analyst workflows
Data science experiments
Downstream ETL jobs

Medium (inconvenient):

Ad-hoc analysis tables
Development/staging copies
Historical archives

Low (minimal impact):

Deprecated tables
Unused datasets
Test data

Step 4: Assess Change Risk

For the proposed change, evaluate:

Schema Changes (adding/removing/renaming columns):

Which downstream queries will break?
Are there SELECT * patterns that will pick up new columns?
Which transformations reference the changing columns?

Data Changes (values, volumes, timing):

Will downstream aggregations still be valid?
Are there NULL handling assumptions that will break?
Will timing changes affect SLAs?

Deletion/Deprecation:

Full dependency tree must be migrated first
Communication needed for all stakeholders

Step 5: Find Stakeholders

Identify who owns downstream assets:

DAG owners: Check owners field in DAG definitions
Dashboard owners: Usually in BI tool metadata
Team ownership: Look for team naming patterns or documentation

Output: Impact Report

Summary

"Changing fct.orders will impact X tables, Y DAGs, and Z dashboards"

Impact Diagram

                    +--> [agg.daily_sales] --> [Executive Dashboard]
                    |
[fct.orders] -------+--> [rpt.order_details] --> [Ops Team Email]
                    |
                    +--> [ml.features] --> [Demand Model]

Detailed Impacts

Downstream	Type	Criticality	Owner	Notes
agg.daily_sales	Table	Critical	data-eng	Updated hourly
Executive Dashboard	Dashboard	Critical	analytics	CEO views daily
ml.order_features	Table	High	ml-team	Retraining weekly

Risk Assessment

Change Type	Risk Level	Mitigation
Add column	Low	No action needed
Rename column	High	Update 3 DAGs, 2 dashboards
Delete column	Critical	Full migration plan required
Change data type	Medium	Test downstream aggregations