controlling-costs

Analyse les modèles de requêtes Axiom pour trouver les données inutilisées, puis crée des tableaux de bord et des moniteurs pour optimiser les coûts. À utiliser lorsqu'on vous demande de réduire les coûts Axiom, de trouver des données inutilisées…

npx skills add https://github.com/axiomhq/skills --skill controlling-costs

Axiom Cost Control

Dashboards, monitors, and waste identification for Axiom usage optimization.

Before You Start

  1. Load required skills:

    skill: axiom-sre
    skill: building-dashboards
    

    Building-dashboards provides: dashboard-list, dashboard-get, dashboard-create, dashboard-update, dashboard-delete

  2. Find the audit dataset. Try axiom-audit first:

    ['axiom-audit']
    | where _time > ago(1h)
    | summarize count() by action
    | where action in ('usageCalculated', 'runAPLQueryCost')
    
    • If not found → ask user. Common names: axiom-audit-logs-view, audit-logs
    • If found but no usageCalculated events → wrong dataset, ask user
  3. Verify axiom-history access (required for Phase 4):

    ['axiom-history'] | where _time > ago(1h) | take 1
    

    If not found, Phase 4 optimization will not work.

  4. Confirm with user:

    • Deployment name?
    • Audit dataset name?
    • Contract limit in TB/day? (required for Phase 3 monitors)
  5. Replace <deployment> and <audit-dataset> in all commands below.

Tips:

  • Run any script with -h for full usage
  • Do NOT pipe script output to head or tail — causes SIGPIPE errors
  • Requires jq for JSON parsing
  • Use axiom-sre's axiom-query for ad-hoc APL, not direct CLI

Which Phases to Run

User requestRun these phases
"reduce costs" / "find waste"0 → 1 → 4
"set up cost control"0 → 1 → 2 → 3
"deploy dashboard"0 → 2
"create monitors"0 → 3
"check for drift"0 only

Phase 0: Check Existing Setup

# Existing dashboard?
dashboard-list <deployment> | grep -i cost

# Existing monitors?
axiom-api <deployment> GET "/v2/monitors" | jq -r '.[] | select(.name | startswith("Cost Control:")) | "\(.id)\t\(.name)"'

If found, fetch with dashboard-get and compare to templates/dashboard.json for drift.


Phase 1: Discovery

scripts/baseline-stats -d <deployment> -a <audit-dataset>

Captures daily ingest stats and produces the Analysis Queue (needed for Phase 4).


Phase 2: Dashboard

scripts/deploy-dashboard -d <deployment> -a <audit-dataset>

Creates dashboard with: ingest trends, burn rate, projections, waste candidates, top users. See reference/dashboard-panels.md for details.


Phase 3: Monitors

Contract is required. You must have the contract limit from preflight step 4.

Step 1: List available notifiers

scripts/list-notifiers -d <deployment>

Present the list to the user and ask which notifier they want for cost alerts. If they don't want notifications, proceed without -n.

Step 2: Create monitors

scripts/create-monitors -d <deployment> -a <audit-dataset> -c <contract_tb> [-n <notifier_id>]

Creates 3 monitors:

  1. Total Ingest Guard — alerts when daily ingest >1.2x contract OR 7-day avg grows >15% vs baseline
  2. Per-Dataset Spike — robust z-score detection, alerts per dataset with attribution
  3. Query Cost Spike — hardened z-score with 30d baseline, 5d exclusion gap, persistence-based gating (median_z > 3, p25_z > 2.5)

The spike monitors use notifyByGroup: true so each dataset triggers a separate alert.

See reference/monitor-strategy.md for threshold derivation.


Phase 4: Optimization

Get the Analysis Queue

Run scripts/baseline-stats if not already done. It outputs a prioritized list:

PriorityMeaning
P0⛔Top 3 by ingest OR >10% of total — MANDATORY
P1Never queried — strong drop candidate
P2Rarely queried (Work/GB < 100) — likely waste

Work/GB = query cost (GB·ms) / ingest (GB). Lower = less value from data.

Analyze datasets in order

Work top-to-bottom. For each dataset:

Step 1: Column analysis

scripts/analyze-query-coverage -d <deployment> -D <dataset> -a <audit-dataset>

If 0 queries → recommend DROP, move to next.

Step 2: Field value analysis

Pick a field from suggested list (usually app, service, or kubernetes.labels.app):

scripts/analyze-query-coverage -d <deployment> -D <dataset> -a <audit-dataset> -f <field>

Note values with high volume but never queried (⚠️ markers).

Step 3: Handle empty values

If (empty) has >5% volume, you MUST drill down with alternative field (e.g., kubernetes.namespace_name).

Step 4: Record recommendation

For each dataset, note: name, ingest volume, Work/GB, top unqueried values, action (DROP/SAMPLE/KEEP), estimated savings.

Done when

All P0⛔ and P1 datasets analyzed. Then compile report using reference/analysis-report-template.md.



Cleanup

# Delete monitors
axiom-api <deployment> GET "/v2/monitors" | jq -r '.[] | select(.name | startswith("Cost Control:")) | "\(.id)\t\(.name)"'
axiom-api <deployment> DELETE "/v2/monitors/<id>"

# Delete dashboard
dashboard-list <deployment> | grep -i cost
dashboard-delete <deployment> <id>

Note: Running create-monitors twice creates duplicates. Delete existing monitors first if re-deploying.


Reference

Audit Dataset Fields

FieldDescription
actionusageCalculated or runAPLQueryCost
properties.hourly_ingest_bytesHourly ingest in bytes
properties.hourly_billable_query_gbmsHourly query cost
properties.datasetDataset name
resource.idOrg ID
actor.emailUser email

Common Fields for Value Analysis

Dataset typePrimary fieldAlternatives
Kubernetes logskubernetes.labels.appkubernetes.namespace_name, kubernetes.container_name
Application logsapp or servicelevel, logger, component
Infrastructurehostregion, instance
Tracesservice.namespan.kind, http.route

Units & Conversions

  • Scripts use TB/day
  • Dashboard filter uses GB/month
ContractTB/dayGB/month
5 PB/month1675,000,000
10 PB/month33310,000,000
15 PB/month50015,000,000

Optimization Actions

SignalAction
Work/GB = 0Drop or stop ingesting
High-volume unqueried valuesSample or reduce log level
Empty values from system namespacesFilter at ingest or accept
WoW spikeCheck recent deploys

Plus de skills de axiomhq

axiom-apl
axiomhq
Référence du langage de requête APL pour Axiom. Fournit les opérateurs, fonctions, motifs et utilisation en ligne de commande. Invoqué automatiquement par les compétences Axiom spécialisées lors de l'écriture ou…
official
detect-anomalies
axiomhq
Détecter les anomalies dans les jeux de données Axiom à l'aide d'une analyse statistique. À utiliser lors de la recherche de motifs inhabituels, de pics de volume, de valeurs aberrantes ou de nouveaux types d'erreurs dans…
official
explore-dataset
axiomhq
Explorer un jeu de données Axiom pour comprendre son schéma, ses champs, son volume et ses motifs. À utiliser lors de la découverte d’un nouveau jeu de données, de l’investigation de sa structure, ou…
official
find-traces
axiomhq
Analyser les traces distribuées OpenTelemetry provenant d'Axiom. Utiliser lors de l'investigation d'un ID de trace, de la recherche de traces par critères (erreurs, latence, service), ou du débogage…
official
gilfoyle
axiomhq
Agent SRE qui fait ce que vous ne pouvez pas. Interroge votre pile d'observabilité. Trouve les causes racines. Ne panique pas. Ne devine pas. Ne se soucie pas de vos sentiments. Utilisez…
official
axiom-sre
axiomhq
Expert SRE enquêteur pour incidents et débogage. Utilise une méthodologie basée sur des hypothèses et un triage systématique. Peut interroger Axiom observability lorsqu'il est disponible.…
official
building-dashboards
axiomhq
Conçoit et construit des tableaux de bord Axiom via l'API. Couvre les types de graphiques, les modèles de requêtes APL et metrics/MPL, les SmartFilters, la mise en page et les options de configuration. Utilisez lorsque…
official
query-metrics
axiomhq
Exécute des requêtes de métriques sur Axiom MetricsDB via des scripts. Découvre les métriques, tags et valeurs de tags disponibles. Utilisez lorsque l'on vous demande d'interroger des métriques, d'explorer des métriques…
official