find-traces by axiomhq

Analyze OpenTelemetry distributed traces from Axiom. Use when investigating a trace ID, finding traces by criteria (errors, latency, service), or debugging…

npx skills add https://github.com/axiomhq/cli --skill find-traces

Trace Analysis

Analyze OpenTelemetry distributed traces to identify errors, latency issues, and root causes.

Arguments

When the skill is invoked with a trace ID (e.g., /find-traces abc123...), the ID is available as $ARGUMENTS.

Trace Dataset Discovery

First, find trace datasets:

axiom dataset list -f json

Look for datasets containing trace data (often named *traces*, *spans*, or otel-*).
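The name check above can be sketched in Python once the JSON output has been parsed. This is a minimal sketch: it assumes the parsed output is a list of objects with a `name` key, which this document does not confirm, so inspect the real output and adjust the key if needed. The helper name `likely_trace_datasets` is hypothetical.

```python
from fnmatch import fnmatch

# Name patterns taken from the naming hint above.
TRACE_PATTERNS = ("*traces*", "*spans*", "otel-*")


def likely_trace_datasets(datasets):
    """Return dataset names that match a common trace naming pattern.

    `datasets` is assumed to be a list of dicts with a "name" key
    (an assumed shape for the parsed `axiom dataset list -f json` output).
    """
    names = [d.get("name", "") for d in datasets]
    return [
        n for n in names
        if any(fnmatch(n.lower(), p) for p in TRACE_PATTERNS)
    ]


# Illustrative data only, not real dataset names.
sample = [{"name": "otel-demo"}, {"name": "http-logs"}, {"name": "prod-traces"}]
print(likely_trace_datasets(sample))
```

A name match is only a hint; the schema check in the next section is what confirms that a dataset actually holds spans.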

Schema Discovery

Always verify field names first:

axiom query "['<trace-dataset>'] | getschema" --start-time -1h

Common Operations

Get Trace by ID

axiom query "['<dataset>']
| where trace_id == '<TRACE_ID>'
| sort by _time asc
| limit 100" --start-time -1h -f json

Find Error Traces

axiom query "['<dataset>']
| where _time >= ago(1h)
| extend error = coalesce(ensure_field(\"error\", typeof(bool)), false)
| where error == true
| summarize
    start_time = min(_time),
    total_duration = max(duration),
    span_count = count(),
    error_count = countif(error),
    services = make_set(['service.name']),
    root_operation = arg_min(_time, name)
  by trace_id
| sort by start_time desc
| limit 20" --start-time -1h -f json

Find Slow Traces

axiom query "['<dataset>']
| where _time >= ago(1h)
| where duration >= 1000000000
| summarize
    start_time = min(_time),
    total_duration = max(duration),
    span_count = count(),
    services = make_set(['service.name'])
  by trace_id
| sort by total_duration desc
| limit 20" --start-time -1h -f json

Find Traces by Service

axiom query "['<dataset>']
| where _time >= ago(1h)
| where ['service.name'] == '<SERVICE>'
| summarize
    start_time = min(_time),
    total_duration = max(duration),
    span_count = count(),
    error_count = countif(error == true)
  by trace_id
| sort by start_time desc
| limit 20" --start-time -1h -f json

Error Spans in Trace

axiom query "['<dataset>']
| where trace_id == '<TRACE_ID>'
| where error == true
| project _time, ['service.name'], name, duration, ['status.message']" --start-time -1h -f json

Critical Path Analysis

axiom query "['<dataset>']
| where trace_id == '<TRACE_ID>'
| project span_id, parent_span_id, ['service.name'], name, duration, error
| sort by duration desc" --start-time -1h -f json
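The query above only ranks spans by duration. To follow the slow path through the parent-child tree, the returned rows can be post-processed. The sketch below is a heuristic, not a true critical-path computation: it starts at the slowest root and repeatedly descends into the slowest child, ignoring overlap between sibling spans. It assumes each row is a dict with `span_id`, `parent_span_id`, `name`, and `duration` keys; the function name `critical_path` is hypothetical.

```python
def critical_path(spans):
    """Return the chain of spans from a root through the slowest child at each level.

    `spans` is assumed to be a list of dicts with span_id, parent_span_id,
    name, and duration (nanoseconds) keys.
    """
    ids = {s["span_id"] for s in spans}
    children = {}
    roots = []
    for s in spans:
        parent = s.get("parent_span_id") or ""
        if parent in ids:
            children.setdefault(parent, []).append(s)
        else:
            # Empty or unknown parent: treat as a root. The real parent may
            # fall outside the query window or the row limit.
            roots.append(s)
    if not roots:
        return []

    path = []
    seen = set()
    node = max(roots, key=lambda s: s["duration"])
    while node is not None and node["span_id"] not in seen:
        seen.add(node["span_id"])  # guard against duplicate span IDs
        path.append(node)
        kids = children.get(node["span_id"], [])
        node = max(kids, key=lambda s: s["duration"]) if kids else None
    return path


# Illustrative spans only.
example_spans = [
    {"span_id": "a", "parent_span_id": "", "name": "GET /checkout", "duration": 900},
    {"span_id": "b", "parent_span_id": "a", "name": "auth", "duration": 100},
    {"span_id": "c", "parent_span_id": "a", "name": "charge", "duration": 700},
    {"span_id": "d", "parent_span_id": "c", "name": "db.query", "duration": 650},
]
print([s["name"] for s in critical_path(example_spans)])
```

The last span in the returned chain is usually the best place to start looking for the latency source.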

OTel Field Reference

| Field | Bracket? | Description |
| --- | --- | --- |
| trace_id | No | 32-char trace identifier |
| span_id | No | 16-char span identifier |
| parent_span_id | No | Parent span (empty for root) |
| name | No | Operation name |
| duration | No | Duration in nanoseconds |
| kind | No | CLIENT, SERVER, INTERNAL, PRODUCER, CONSUMER |
| error | No | Boolean error flag |
| ['service.name'] | Yes | Service identifier |
| ['status.code'] | Yes | OK, ERROR, or nil |
| ['status.message'] | Yes | Error description |
| ['scope.name'] | Yes | Instrumentation library |

Duration Conversion

OTel durations are in nanoseconds:

| Human | Nanoseconds | Filter |
| --- | --- | --- |
| 1 ms | 1,000,000 | duration >= 1000000 |
| 100 ms | 100,000,000 | duration >= 100000000 |
| 1 s | 1,000,000,000 | duration >= 1000000000 |

Convert for display:

| extend duration_ms = duration / 1000000.0
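When reporting durations outside the query, the same nanosecond arithmetic can be applied client-side. A minimal sketch, assuming `duration` arrives as an integer number of nanoseconds; the helper name `format_duration` is hypothetical.

```python
def format_duration(ns):
    """Format a nanosecond duration in the largest unit that keeps the value >= 1."""
    if ns >= 1_000_000_000:
        return f"{ns / 1_000_000_000:.2f} s"
    if ns >= 1_000_000:
        return f"{ns / 1_000_000:.2f} ms"
    if ns >= 1_000:
        return f"{ns / 1_000:.2f} us"
    return f"{ns} ns"


print(format_duration(100_000_000))    # 100 ms expressed in nanoseconds
print(format_duration(1_500_000_000))  # 1.5 s expressed in nanoseconds
```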

Custom Attributes

Non-standard span attributes are stored in the attributes.custom map:

// Filter by custom attribute
| where ['attributes.custom']['user_id'] == "123"

// Aggregation requires explicit cast
| summarize count() by tostring(['attributes.custom']['tenant'])

Without tostring(), aggregations fail with "grouping by field of type unknown".

Codebase Correlation

When working in a repository that matches the traced service, correlate trace data with source code to identify root causes.

Mapping Trace Data to Code

  1. Extract package/module path from ['scope.name']

    • Contains the instrumentation library or package path
    • Strip the module prefix to get the local path
    • Example: github.com/org/repo/pkg/auth → pkg/auth
  2. Find code from operation name

    • The name field often contains function names or HTTP routes
    • Search the codebase for matching handlers, functions, or endpoints
  3. Trace the call chain

    • Follow parent-child span relationships
    • Map each span to its corresponding code location
    • Identify where errors originate and propagate
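Step 1 above (stripping the module prefix from `['scope.name']`) can be sketched as a small helper. The module prefix is an assumption you supply yourself, for example from the repository's module declaration; the function name `local_package_path` is hypothetical.

```python
def local_package_path(scope_name, module_prefix):
    """Strip the module prefix from a scope name to get a repo-relative path.

    Returns the scope name unchanged when it does not start with the prefix,
    which usually means the span came from a third-party library.
    """
    prefix = module_prefix.rstrip("/") + "/"
    if scope_name.startswith(prefix):
        return scope_name[len(prefix):]
    return scope_name


print(local_package_path("github.com/org/repo/pkg/auth", "github.com/org/repo"))
```

An unchanged return value is a useful signal on its own: it suggests the span was not emitted by code in the current repository.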

Note: Codebase correlation is optional. Proceed with trace-only analysis if code is unavailable or doesn't match the traced services.

Output Format

When analyzing a trace, provide:

## Trace Summary
- **Trace ID:** <id>
- **Duration:** <human-readable>
- **Services:** <list>
- **Outcome:** success/failure

## Sequence of Events
1. <Service> - <operation> (<duration>)
2. <Service> - <operation> (<duration>) ⚠️ ERROR
...

## Error Analysis
<What failed, when, why>

## Root Cause
<Deepest error and explanation>

## Codebase Locations (if applicable)
- **Service:** <service.name>
- **Package:** <scope.name>
- **Files:** <specific files to investigate>

## Recommended Actions
1. <Specific action>
2. <What to investigate next>
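As an illustration of how the "Sequence of Events" section of this template could be filled from query results, here is a minimal sketch. It assumes each span row is a dict with `_time` (uniformly formatted ISO 8601 strings, so they sort lexicographically), `service.name`, `name`, `duration` in nanoseconds, and `error` keys. The function name `sequence_of_events` is hypothetical.

```python
def sequence_of_events(spans):
    """Render numbered 'Sequence of Events' lines, ordered by span start time."""
    lines = []
    ordered = sorted(spans, key=lambda s: s["_time"])
    for i, s in enumerate(ordered, start=1):
        ms = s["duration"] / 1_000_000  # nanoseconds to milliseconds
        line = f"{i}. {s['service.name']} - {s['name']} ({ms:.1f} ms)"
        if s.get("error"):
            line += " ⚠️ ERROR"
        lines.append(line)
    return lines


# Illustrative rows only.
rows = [
    {"_time": "2024-01-01T00:00:00.200Z", "service.name": "payments",
     "name": "charge", "duration": 700_000_000, "error": True},
    {"_time": "2024-01-01T00:00:00.000Z", "service.name": "gateway",
     "name": "GET /checkout", "duration": 900_000_000, "error": False},
]
print("\n".join(sequence_of_events(rows)))
```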

When NOT to Use

  • Metrics analysis: Traces are for request flow; use logs/metrics skills for aggregated performance data
  • Non-OTel data: This skill assumes OpenTelemetry field conventions (trace_id, span_id, etc.)
  • Known trace structure: If you already have the query, run it directly without invoking this skill
  • Alerting on trace patterns: Use Axiom Monitors for continuous alerting

APL Reference

For query syntax, invoke the axiom-apl skill, which provides trace analysis patterns and duration unit guidance.
