tracing-downstream-lineage

Author: astronomer

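Trace downstream data lineage to assess the impact of a change before modifying a table or DAG. Identify the direct consumers of the target table or DAG through source code search, view dependencies, and BI tool connections. Build a full dependency tree that maps every downstream impact, from tables through dashboards and ML models. Classify dependencies by criticality (Critical, High, Medium, Low) to prioritize stakeholder communication and testing. Generate an impact report with a risk assessment and the affected items...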

npx skills add https://github.com/astronomer/agents --skill tracing-downstream-lineage

Downstream Lineage: Impacts

Answer the critical question: "What breaks if I change this?"

Use this BEFORE making changes to understand the blast radius.

Impact Analysis

Step 1: Identify Direct Consumers

Find everything that reads from this target:

For Tables:

Search DAG source code: Look for DAGs that SELECT from this table
- Use af dags list to get all DAGs
- Use af dags source <dag_id> to search for table references
- Look for: FROM target_table, JOIN target_table
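The FROM/JOIN search above can be approximated with a regular expression once the DAG source is available as text. The sketch below is one possible approach, not part of the skill itself: `find_table_references` is a hypothetical helper, and it assumes you have already captured the output of `af dags source <dag_id>` into a string. It only matches the table name exactly as written in the SQL (for example `fct.orders`), so aliases, templated names, and dynamically built SQL will be missed.

```python
import re


def find_table_references(dag_source: str, table: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs where `table` follows FROM or JOIN.

    Hypothetical helper: `dag_source` is assumed to be the text of one DAG file.
    """
    # FROM/JOIN, whitespace, then the exact table name. The trailing \b stops
    # "fct.orders" from matching "fct.orders_archive"; (?!\.) stops a schema
    # name from matching the start of a longer qualified name.
    pattern = re.compile(
        r"\b(?:FROM|JOIN)\s+" + re.escape(table) + r"\b(?!\.)",
        re.IGNORECASE,
    )
    hits = []
    for lineno, line in enumerate(dag_source.splitlines(), start=1):
        if pattern.search(line):
            hits.append((lineno, line.strip()))
    return hits
```

Treat the hits as leads to inspect by hand rather than a complete list of consumers.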

Check for dependent views:

-- Snowflake
SELECT * FROM information_schema.view_table_usage
WHERE table_name = '<target_table>'

-- Or check SHOW VIEWS and search definitions

Look for BI tool connections:
- Dashboards often query tables directly
- Check for common BI patterns in table naming (rpt_, dashboard_)
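As a rough illustration of the naming check, the sketch below flags tables whose names start with one of the prefixes mentioned above. `looks_bi_facing` and `BI_PREFIXES` are hypothetical names, and the prefix list is an assumption to adapt to your own conventions. A match is only a hint that a BI tool may read the table, not proof.

```python
# Prefixes taken from the examples above; extend for your own conventions.
BI_PREFIXES = ("rpt_", "dashboard_")


def looks_bi_facing(table_name: str, prefixes: tuple[str, ...] = BI_PREFIXES) -> bool:
    """Return True if the unqualified table name starts with a BI-style prefix."""
    # Drop any database/schema qualifier and compare case-insensitively.
    base = table_name.rsplit(".", 1)[-1].lower()
    return base.startswith(prefixes)
```

Note that this checks the table name only, so a table such as `rpt.monthly_summary`, where `rpt` is the schema, would need a separate schema check.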

On Astro

If you're running on Astro, the Lineage tab in the Astro UI provides visual dependency graphs across DAGs and datasets, making downstream impact analysis faster. It shows which DAGs consume a given dataset and their current status, reducing the need for manual source code searches.

For DAGs:

Check what the DAG produces: Use af dags source <dag_id> to find output tables
Then trace those tables' consumers recursively, repeating Step 1 for each output table
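The recursive trace can be sketched as a breadth-first walk over a mapping of each asset to its direct consumers. This is a minimal sketch under assumptions: the `consumers` dictionary is something you build yourself from the Step 1 findings, and `trace_downstream` is a hypothetical helper name. A visited set keeps the walk from looping if two assets feed each other.

```python
from collections import deque


def trace_downstream(consumers: dict[str, list[str]], target: str) -> list[str]:
    """Return every asset reachable downstream of `target`, nearest first."""
    seen = {target}          # guards against cycles and duplicate visits
    order: list[str] = []    # downstream assets in breadth-first order
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for child in consumers.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Breadth-first order lists direct consumers before second-hop consumers, which tends to match the order in which you will want to review them.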

Step 2: Build Dependency Tree

Map the full downstream impact:

SOURCE: fct.orders
    |
    +-- TABLE: agg.daily_sales --> Dashboard: Executive KPIs
    |       |
    |       +-- TABLE: rpt.monthly_summary --> Email: Monthly Report
    |
    +-- TABLE: ml.order_features --> Model: Demand Forecasting
    |
    +-- DIRECT: Looker Dashboard "Sales Overview"
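A tree like the one above can be generated from the same kind of consumer mapping rather than drawn by hand. The sketch below is illustrative only: `render_tree` is a hypothetical helper, the mapping is assumed to come from your own Step 1 findings, and the output uses a simplified indentation style rather than reproducing the exact layout shown above.

```python
def render_tree(consumers: dict[str, list[str]], root: str) -> str:
    """Render the downstream dependencies of `root` as an indented text tree."""
    lines = [f"SOURCE: {root}"]

    def walk(node: str, depth: int, path: frozenset[str]) -> None:
        for child in consumers.get(node, []):
            indent = "    " * depth
            if child in path:
                # The child is already an ancestor on this branch: stop here.
                lines.append(f"{indent}+-- {child} (cycle, not expanded)")
                continue
            lines.append(f"{indent}+-- {child}")
            walk(child, depth + 1, path | {child})

    walk(root, 0, frozenset({root}))
    return "\n".join(lines)
```

An asset that has several upstream parents appears once under each parent, which makes shared dependencies easy to spot.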

Step 3: Categorize by Criticality

Critical (breaks production):

Production dashboards
Customer-facing applications
Automated reports to executives
ML models in production
Regulatory/compliance reports

High (causes significant issues):

Internal operational dashboards
Analyst workflows
Data science experiments
Downstream ETL jobs

Medium (inconvenient):

Ad-hoc analysis tables
Development/staging copies
Historical archives

Low (minimal impact):

Deprecated tables
Unused datasets
Test data
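The four tiers above can be encoded so that a list of downstream assets sorts by urgency. This is a sketch under assumptions: the `kind` labels in `KIND_TO_CRITICALITY` are hypothetical names invented for this example, and defaulting unknown kinds to High is a design choice of the sketch (so unclassified assets get reviewed rather than ignored), not something the tiers above prescribe.

```python
from enum import IntEnum


class Criticality(IntEnum):
    CRITICAL = 0  # breaks production
    HIGH = 1      # causes significant issues
    MEDIUM = 2    # inconvenient
    LOW = 3       # minimal impact


# Hypothetical kind labels mapped to the tiers listed above.
KIND_TO_CRITICALITY = {
    "production_dashboard": Criticality.CRITICAL,
    "customer_facing_app": Criticality.CRITICAL,
    "executive_report": Criticality.CRITICAL,
    "production_ml_model": Criticality.CRITICAL,
    "compliance_report": Criticality.CRITICAL,
    "operational_dashboard": Criticality.HIGH,
    "analyst_workflow": Criticality.HIGH,
    "ds_experiment": Criticality.HIGH,
    "downstream_etl": Criticality.HIGH,
    "adhoc_table": Criticality.MEDIUM,
    "staging_copy": Criticality.MEDIUM,
    "historical_archive": Criticality.MEDIUM,
    "deprecated_table": Criticality.LOW,
    "unused_dataset": Criticality.LOW,
    "test_data": Criticality.LOW,
}


def prioritize(assets: list[tuple[str, str]]) -> list[tuple[str, Criticality]]:
    """Take (name, kind) pairs and return (name, criticality), most critical first."""
    ranked = [
        # Assumption: unknown kinds are treated as HIGH until someone classifies them.
        (name, KIND_TO_CRITICALITY.get(kind, Criticality.HIGH))
        for name, kind in assets
    ]
    return sorted(ranked, key=lambda item: item[1])
```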

Step 4: Assess Change Risk

For the proposed change, evaluate:

Schema Changes (adding/removing/renaming columns):

Which downstream queries will break?
Are there SELECT * patterns that will pick up new columns?
Which transformations reference the changing columns?
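The schema-change questions above lend themselves to a quick heuristic scan of downstream SQL. The sketch below assumes you have collected the downstream queries into a dictionary keyed by a name of your choosing; `schema_change_hits` is a hypothetical helper. It relies on plain regular expressions rather than SQL parsing, so it can miss cases such as `SELECT DISTINCT *` and can report false positives when a column name appears in a comment or string.

```python
import re


def schema_change_hits(queries: dict[str, str], column: str) -> dict[str, list[str]]:
    """Map each query name to the reasons it may be affected by changing `column`."""
    # Matches "SELECT *" and "SELECT alias.*", but not "SELECT COUNT(*)".
    select_star = re.compile(r"\bSELECT\s+(?:\w+\.)?\*", re.IGNORECASE)
    # Whole-word match on the column name, case-insensitive.
    column_ref = re.compile(r"\b" + re.escape(column) + r"\b", re.IGNORECASE)

    report: dict[str, list[str]] = {}
    for name, sql in queries.items():
        reasons = []
        if select_star.search(sql):
            reasons.append("uses SELECT *")
        if column_ref.search(sql):
            reasons.append(f"references {column}")
        if reasons:
            report[name] = reasons
    return report
```

Queries flagged with "uses SELECT *" matter most when adding or removing columns, while "references ..." hits matter most for renames and type changes.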

Data Changes (values, volumes, timing):

Will downstream aggregations still be valid?
Are there NULL handling assumptions that will break?
Will timing changes affect SLAs?

Deletion/Deprecation:

Full dependency tree must be migrated first
Communication needed for all stakeholders

Step 5: Find Stakeholders

Identify who owns downstream assets:

DAG owners: Check owners field in DAG definitions
Dashboard owners: Usually in BI tool metadata
Team ownership: Look for team naming patterns or documentation
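Once owners are known, grouping the impacted assets by owner gives one notification list per team. The sketch below is illustrative: `group_by_owner` is a hypothetical helper, and the `name` and `owner` keys are an assumed record shape, not a format produced by any tool mentioned here. Assets with no recorded owner are collected under "unknown" so they are not silently dropped.

```python
from collections import defaultdict


def group_by_owner(impacts: list[dict]) -> dict[str, list[str]]:
    """Group impacted asset names by owner; missing owners go under 'unknown'."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in impacts:
        owner = item.get("owner") or "unknown"
        grouped[owner].append(item["name"])
    return dict(grouped)
```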

Output: Impact Report

Summary

"Changing fct.orders will impact X tables, Y DAGs, and Z dashboards"

Impact Diagram

                    +--> [agg.daily_sales] --> [Executive Dashboard]
                    |
[fct.orders] -------+--> [rpt.order_details] --> [Ops Team Email]
                    |
                    +--> [ml.order_features] --> [Demand Model]

Detailed Impacts

Downstream	Type	Criticality	Owner	Notes
agg.daily_sales	Table	Critical	data-eng	Updated hourly
Executive Dashboard	Dashboard	Critical	analytics	CEO views daily
ml.order_features	Table	High	ml-team	Retraining weekly

Risk Assessment

Change Type	Risk Level	Mitigation
Add column	Low	Usually none; check SELECT * consumers
Rename column	High	Update 3 DAGs, 2 dashboards
Delete column	Critical	Full migration plan required
Change data type	Medium	Test downstream aggregations