analyzing-data
데이터 웨어하우스에 질의하여 캐시된 패턴과 개념 매핑을 통해 비즈니스 질문에 답변합니다. 반복되는 질문 유형에 대한 패턴 조회 및 캐싱을 지원하며, 결과 기록을 통해 향후 질의를 개선합니다. 개념-테이블 매핑 캐시와 INFORMATION_SCHEMA 또는 코드베이스 grep을 통한 테이블 스키마 탐색을 포함합니다. 분석을 위해 Polars 또는 Pandas DataFrame을 반환하는 run_sql() 및 run_sql_pandas() 커널 함수를 제공합니다. 개념, 패턴 및 테이블 캐시를 관리하기 위한 CLI 명령어와 추가 기능을 포함합니다.
npx skills add https://github.com/astronomer/agents --skill analyzing-dataData Analysis
Answer business questions by querying the data warehouse. The kernel auto-starts on first exec call.
All CLI commands below are relative to this skill's directory. Before running any scripts/cli.py command, cd to the directory containing this file.
Workflow
-
Pattern lookup — Check for a cached query strategy:
uv run scripts/cli.py pattern lookup "<user's question>"If a pattern exists, follow its strategy. Record the outcome after executing:
uv run scripts/cli.py pattern record <name> --success # or --failure -
Concept lookup — Find known table mappings:
uv run scripts/cli.py concept lookup <concept> -
Table discovery — If cache misses, search the codebase (
Grep pattern="<concept>" glob="**/*.sql") or queryINFORMATION_SCHEMA. See reference/discovery-warehouse.md. -
Execute query:
uv run scripts/cli.py exec "df = run_sql('SELECT ...')" uv run scripts/cli.py exec "print(df)" -
Cache learnings — Always cache before presenting results:
# Cache concept → table mapping uv run scripts/cli.py concept learn <concept> <TABLE> -k <KEY_COL> # Cache query strategy (if discovery was needed) uv run scripts/cli.py pattern learn <name> -q "question" -s "step" -t "TABLE" -g "gotcha" -
Present findings to user.
Kernel Functions
| Function | Returns |
|---|---|
run_sql(query, limit=100) | Polars DataFrame |
run_sql_pandas(query, limit=100) | Pandas DataFrame |
run_sql_many(queries, limit=100) | List of Polars DataFrames (one per query) |
pl (Polars) and pd (Pandas) are pre-imported.
Run independent queries together with run_sql_many — they execute concurrently (Snowflake async / connection-pool fan-out) instead of one at a time:
uv run scripts/cli.py exec "dfs = run_sql_many(['SELECT ...', 'SELECT ...']); print(dfs[0])"
run_sql_many is fail-fast: if any query errors, the call raises and the results of the queries that succeeded are discarded. Use separate run_sql calls if you need partial results.
Timeouts: exec waits up to 120s by default, then interrupts the query and returns a "client stopped waiting" message (the query may still finish server-side). Raise it for known long-running queries: uv run scripts/cli.py exec "..." -t 600.
Idle kernel: the kernel self-terminates after 2h idle (preserving state until then). Override with ASTRO_KERNEL_IDLE_TIMEOUT (seconds; 0 disables).
CLI Reference
Kernel
uv run scripts/cli.py warehouse list # List warehouses
uv run scripts/cli.py start [-w name] # Start kernel (with optional warehouse)
uv run scripts/cli.py exec "..." # Execute Python code
uv run scripts/cli.py status # Kernel status
uv run scripts/cli.py restart # Restart kernel
uv run scripts/cli.py stop # Stop kernel
uv run scripts/cli.py install <pkg> # Install package
Concept Cache
uv run scripts/cli.py concept lookup <name> # Look up
uv run scripts/cli.py concept learn <name> <TABLE> -k <KEY_COL> # Learn
uv run scripts/cli.py concept list # List all
uv run scripts/cli.py concept import -p /path/to/warehouse.md # Bulk import
Pattern Cache
uv run scripts/cli.py pattern lookup "question" # Look up
uv run scripts/cli.py pattern learn <name> -q "..." -s "..." -t "TABLE" -g "gotcha" # Learn
uv run scripts/cli.py pattern record <name> --success # Record outcome
uv run scripts/cli.py pattern list # List all
uv run scripts/cli.py pattern delete <name> # Delete
Table Schema Cache
uv run scripts/cli.py table lookup <TABLE> # Look up schema
uv run scripts/cli.py table cache <TABLE> -c '[...]' # Cache schema
uv run scripts/cli.py table list # List cached
uv run scripts/cli.py table delete <TABLE> # Delete
Cache Management
uv run scripts/cli.py cache status # Stats
uv run scripts/cli.py cache clear [--stale-only] # Clear
References
- reference/discovery-warehouse.md — Large table handling, warehouse exploration, INFORMATION_SCHEMA queries
- reference/common-patterns.md — SQL templates for trends, comparisons, top-N, distributions, cohorts