find-traces

Author: axiomhq

Analyze OpenTelemetry distributed traces in Axiom. Use when investigating a trace ID, finding traces by criteria (errors, latency, service), or debugging…

npx skills add https://github.com/axiomhq/cli --skill find-traces

Trace Analysis

Analyze OpenTelemetry distributed traces to identify errors, latency issues, and root causes.

Arguments

When invoked with a trace ID (e.g., /find-traces abc123...), it's available as $ARGUMENTS.
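Before substituting $ARGUMENTS into the queries below, it can help to sanity-check it. The following is a minimal sketch of such a check. It assumes the argument should be a 32-character hex trace ID, matching the field reference further down, and that the dataset stores IDs in lowercase. `normalize_trace_id` is a hypothetical helper, not part of the axiom CLI.

```python
import re

# Assumption: trace IDs are 32 lowercase hex characters (16 bytes in hex).
_TRACE_ID = re.compile(r"[0-9a-f]{32}")


def normalize_trace_id(arg: str) -> str:
    """Trim and lowercase a trace ID argument, rejecting malformed values."""
    trace_id = arg.strip().lower()
    if not _TRACE_ID.fullmatch(trace_id):
        raise ValueError(f"expected a 32-char hex trace ID, got {arg!r}")
    if trace_id == "0" * 32:
        # W3C Trace Context treats the all-zero trace ID as invalid.
        raise ValueError("all-zero trace ID is invalid")
    return trace_id
```

If the check fails, the argument is probably a span ID (16 chars) or a truncated ID. Asking for the full ID is cheaper than running an empty query.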

Trace Dataset Discovery

First, find trace datasets:

axiom dataset list -f json

Look for datasets containing trace data (often named *traces*, *spans*, or otel-*).
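The name heuristic above can be applied mechanically to the parsed CLI output. This sketch assumes that `axiom dataset list -f json` returns a JSON array of objects, each with a `name` key. That shape is an assumption, not a documented schema. `find_trace_datasets` is a hypothetical helper.

```python
import fnmatch

# Name patterns taken from the heuristic above; extend for your own naming scheme.
TRACE_PATTERNS = ("*traces*", "*spans*", "otel-*")


def find_trace_datasets(datasets):
    """Return dataset names that match any trace-like name pattern.

    Assumes each entry is a dict with a "name" key, as parsed from
    `axiom dataset list -f json` output (shape assumed, not verified).
    """
    return [
        d["name"]
        for d in datasets
        if any(fnmatch.fnmatchcase(d["name"], p) for p in TRACE_PATTERNS)
    ]
```

The input would be `json.loads(...)` of the CLI's stdout. A name match is only a hint, so confirm each candidate with the getschema query below.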

Schema Discovery

Always verify field names first:

axiom query "['<trace-dataset>'] | getschema" --start-time -1h

Common Operations

Get Trace by ID

axiom query "['<dataset>']
| where trace_id == '<TRACE_ID>'
| sort by _time asc
| limit 100" --start-time -1h -f json

Find Error Traces

axiom query "['<dataset>']
| where _time >= ago(1h)
| extend error = coalesce(ensure_field(\"error\", typeof(bool)), false)
| summarize
    start_time = min(_time),
    total_duration = max(duration),
    span_count = count(),
    error_count = countif(error),
    services = make_set(['service.name']),
    root_operation = arg_min(_time, name)
  by trace_id
| where error_count > 0
| sort by start_time desc
| limit 20" --start-time -1h -f json
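The following is a local sketch of the per-trace rollup behind this query. It is useful for checking `-f json` results or reasoning about what each column means. It aggregates over all spans of a trace and keeps only traces with at least one error span. It assumes each row has `_time` (ISO-8601 strings in a single timezone, so they sort as text), `trace_id`, `name`, `duration` (ns), `service.name` and an optional boolean `error`. `summarize_error_traces` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_error_traces(spans, limit=20):
    """Roll spans up per trace_id and keep traces that contain an error."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span["trace_id"]].append(span)

    rows = []
    for trace_id, group in by_trace.items():
        # A missing or null error flag counts as False, like coalesce(..., false).
        error_count = sum(1 for s in group if bool(s.get("error")))
        if error_count == 0:
            continue
        earliest = min(group, key=lambda s: s["_time"])
        rows.append({
            "trace_id": trace_id,
            "start_time": earliest["_time"],
            "total_duration": max(s["duration"] for s in group),
            "span_count": len(group),
            "error_count": error_count,
            "services": sorted({s["service.name"] for s in group}),
            "root_operation": earliest["name"],  # earliest span, usually the root
        })

    rows.sort(key=lambda r: r["start_time"], reverse=True)
    return rows[:limit]
```

Like the query, this uses the longest span as `total_duration`. That approximates trace duration only when the root span covers the whole request.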

Find Slow Traces

axiom query "['<dataset>']
| where _time >= ago(1h)
| where duration >= 1000000000
| summarize
    start_time = min(_time),
    total_duration = max(duration),
    span_count = count(),
    services = make_set(['service.name'])
  by trace_id
| sort by total_duration desc
| limit 20" --start-time -1h -f json

Find Traces by Service

axiom query "['<dataset>']
| where _time >= ago(1h)
| where ['service.name'] == '<SERVICE>'
| summarize
    start_time = min(_time),
    total_duration = max(duration),
    span_count = count(),
    error_count = countif(error == true)
  by trace_id
| sort by start_time desc
| limit 20" --start-time -1h -f json

Error Spans in Trace

axiom query "['<dataset>']
| where trace_id == '<TRACE_ID>'
| where error == true
| project _time, ['service.name'], name, duration, ['status.message']" --start-time -1h -f json

Critical Path Analysis

axiom query "['<dataset>']
| where trace_id == '<TRACE_ID>'
| project span_id, parent_span_id, ['service.name'], name, duration, error
| sort by duration desc" --start-time -1h -f json
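Sorting by duration ranks the expensive spans, but it does not show how they nest. This sketch takes the rows returned above and walks the parent/child tree from the root, always following the longest child. It is a greedy heuristic, not a full critical-path algorithm. It ignores start times, so it does not model concurrent children. It assumes that the root span has an empty or null `parent_span_id`. `critical_path` is a hypothetical helper.

```python
def critical_path(spans):
    """Greedy sketch: from the root span, repeatedly follow the longest child.

    Each span is a dict with span_id, parent_span_id ("" or None for the
    root), name and duration (ns). Returns [] when no root span is present,
    e.g. for a partially ingested trace.
    """
    children = {}
    root = None
    for s in spans:
        parent = s.get("parent_span_id") or ""
        if parent:
            children.setdefault(parent, []).append(s)
        else:
            root = s
    if root is None:
        return []

    path = [root]
    seen = {root["span_id"]}  # guards against malformed, cyclic parent links
    while True:
        kids = [k for k in children.get(path[-1]["span_id"], []) if k["span_id"] not in seen]
        if not kids:
            break
        longest = max(kids, key=lambda k: k["duration"])
        seen.add(longest["span_id"])
        path.append(longest)
    return path
```

The last span on the path is usually the best first candidate for the "Root Cause" section of the report. Check its siblings too if they ran concurrently.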

OTel Field Reference

| Field | Needs brackets? | Description |
|---|---|---|
| trace_id | No | 32-char trace identifier |
| span_id | No | 16-char span identifier |
| parent_span_id | No | Parent span (empty for root) |
| name | No | Operation name |
| duration | No | Duration in nanoseconds |
| kind | No | CLIENT, SERVER, INTERNAL, PRODUCER, CONSUMER |
| error | No | Boolean error flag |
| ['service.name'] | Yes | Service identifier |
| ['status.code'] | Yes | OK, ERROR, or nil |
| ['status.message'] | Yes | Error description |
| ['scope.name'] | Yes | Instrumentation library |
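The pattern in this table is that plain identifiers are referenced bare, while dotted OTel names need `['...']` bracket notation. The sketch below encodes that rule for generating queries. The exact identifier rule used here (letters, digits and underscore, not starting with a digit) is an assumption, and the table above remains the authority. `apl_field` is a hypothetical helper.

```python
import re

# Assumed bare-identifier rule: letters, digits, underscore, no leading digit.
_BARE_FIELD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def apl_field(name: str) -> str:
    """Return an APL field reference, bracket-quoting names such as service.name."""
    if _BARE_FIELD.fullmatch(name):
        return name
    if "'" in name:
        # Quote escaping inside ['...'] is not handled; reject rather than guess.
        raise ValueError(f"field name contains a quote: {name!r}")
    return f"['{name}']"
```

For example, `f"| where {apl_field('service.name')} == 'checkout'"` yields `| where ['service.name'] == 'checkout'`.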

Duration Conversion

OTel durations are in nanoseconds:

| Human | Nanoseconds | Filter |
|---|---|---|
| 1 ms | 1,000,000 | duration >= 1000000 |
| 100 ms | 100,000,000 | duration >= 100000000 |
| 1 s | 1,000,000,000 | duration >= 1000000000 |

Convert for display:

| extend duration_ms = duration / 1000000.0
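The same arithmetic can be scripted so that thresholds never carry a miscounted zero. This is a small sketch, assuming durations are stored as integer nanoseconds as stated above. The function names are hypothetical.

```python
# Nanoseconds per unit; OTel span durations are stored in nanoseconds.
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_nanoseconds(value: float, unit: str) -> int:
    """Convert a human duration to integer nanoseconds (rounded, not truncated)."""
    return round(value * NS_PER_UNIT[unit])


def duration_filter(value: float, unit: str) -> str:
    """Build an APL filter clause such as 'duration >= 100000000'."""
    return f"duration >= {to_nanoseconds(value, unit)}"


def ns_to_ms(duration_ns: int) -> float:
    """Same conversion as `extend duration_ms = duration / 1000000.0`."""
    return duration_ns / 1_000_000
```

Rounding instead of truncating avoids off-by-one thresholds from floating-point products such as `0.29 * 1000`.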

Custom Attributes

Non-standard span attributes are stored in the attributes.custom map:

// Filter by custom attribute
| where ['attributes.custom']['user_id'] == "123"

// Aggregation requires explicit cast
| summarize count() by tostring(['attributes.custom']['tenant'])

Without tostring(), aggregations fail with "grouping by field of type unknown".

Codebase Correlation

When working in a repository that matches the traced service, correlate trace data with source code to identify root causes.

Mapping Trace Data to Code

  1. Extract package/module path from ['scope.name']

    • Contains the instrumentation library or package path
    • Strip the module prefix to get the local path
    • Example: github.com/org/repo/pkg/auth → pkg/auth
  2. Find code from operation name

    • The name field often contains function names or HTTP routes
    • Search the codebase for matching handlers, functions, or endpoints
  3. Trace the call chain

    • Follow parent-child span relationships
    • Map each span to its corresponding code location
    • Identify where errors originate and propagate
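Step 1 can be sketched for a Go repository like the one in the example. The sketch reads the module path from the `module` directive in `go.mod` and strips it from `scope.name`. It assumes that the scope name is the package import path. That holds when code creates tracers named after their package, but not universally, and other ecosystems need a different prefix source. Both helpers are hypothetical.

```python
from typing import Optional


def go_module_path(go_mod_text: str) -> Optional[str]:
    """Read the module path from go.mod text (the `module ...` directive)."""
    for line in go_mod_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "module":
            return parts[1].strip('"')
    return None


def scope_to_local_path(scope_name: str, module_prefix: str) -> Optional[str]:
    """Strip the module prefix from a scope name to get a repo-relative path.

    Returns None when the scope is outside this module, for example a
    third-party instrumentation library.
    """
    prefix = module_prefix.rstrip("/") + "/"
    if scope_name.startswith(prefix):
        return scope_name[len(prefix):]
    return None
```

A `None` result is a useful signal in its own right. It usually means the span came from library instrumentation, so the caller's span (its parent) is the better place to look in the codebase.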

Note: Codebase correlation is optional. Proceed with trace-only analysis if code is unavailable or doesn't match the traced services.

Output Format

When analyzing a trace, provide:

## Trace Summary
- **Trace ID:** <id>
- **Duration:** <human-readable>
- **Services:** <list>
- **Outcome:** success/failure

## Sequence of Events
1. <Service> - <operation> (<duration>)
2. <Service> - <operation> (<duration>) ⚠️ ERROR
...

## Error Analysis
<What failed, when, why>

## Root Cause
<Deepest error and explanation>

## Codebase Locations (if applicable)
- **Service:** <service.name>
- **Package:** <scope.name>
- **Files:** <specific files to investigate>

## Recommended Actions
1. <Specific action>
2. <What to investigate next>
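The "Sequence of Events" part of the report can be generated directly from the "Get Trace by ID" rows rather than typed by hand. This sketch assumes each row has a sortable `_time` string, `service.name`, `name`, `duration` in nanoseconds and an optional boolean `error`. It renders durations in milliseconds. `format_sequence` is a hypothetical helper.

```python
def format_sequence(spans):
    """Render spans as numbered '<service> - <operation> (<duration>)' lines."""
    lines = []
    ordered = sorted(spans, key=lambda sp: sp["_time"])
    for i, s in enumerate(ordered, start=1):
        ms = s["duration"] / 1_000_000  # nanoseconds -> milliseconds
        line = f"{i}. {s['service.name']} - {s['name']} ({ms:.1f} ms)"
        if s.get("error"):
            line += " ⚠️ ERROR"
        lines.append(line)
    return "\n".join(lines)
```

For large traces, consider listing only error spans and the critical path, since a flat list of hundreds of spans buries the failure.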

When NOT to Use

  • Metrics analysis: Traces are for request flow; use logs/metrics skills for aggregated performance data
  • Non-OTel data: This skill assumes OpenTelemetry field conventions (trace_id, span_id, etc.)
  • Known trace structure: If you already have the query, run it directly without invoking this skill
  • Alerting on trace patterns: Use Axiom Monitors for continuous alerting

APL Reference

For query syntax, invoke the axiom-apl skill which provides trace analysis patterns and duration unit guidance.
