golang-benchmark

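Author: samber

Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpreting CPU/memory/trace profiles, analyzing results with benchstat, setting up CI benchmark regression detection, or investigating production performance with Prometheus runtime metrics. Also use when a developer needs deep analysis of a specific performance indicator; this skill provides the measurement methodology, while...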

npx skills add https://github.com/samber/cc-skills-golang --skill golang-benchmark

Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run — statistical rigor and controlled conditions are prerequisites before any optimization decision.

Thinking mode: Use ultrathink for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.

Dependencies:

  • benchstat: go install golang.org/x/perf/cmd/benchstat@latest

Go Benchmarking & Performance Measurement

Performance improvement does not exist without measurement: if you can measure it, you can improve it.

This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, → See samber/cc-skills-golang@golang-performance skill. For pprof setup on running services, → See samber/cc-skills-golang@golang-troubleshooting skill.

Writing Benchmarks

b.Loop() (Go 1.24+) — preferred

For Go 1.24+, prefer b.Loop() for new benchmarks. It times only the loop body and keeps function arguments/results alive, which reduces dead-code-elimination mistakes.

func BenchmarkParse(b *testing.B) {
    data := loadFixture("large.json") // setup — excluded from timing
    for b.Loop() {
        Parse(data)  // compiler cannot eliminate this call
    }
}

Legacy b.N loops still compile and are fine to keep when preserving existing benchmarks or supporting Go <1.24. They are easier to get wrong: setup may need b.ResetTimer(), and results may need a sink if the compiler can eliminate the work. Go 1.26 fixed an earlier b.Loop() inlining limitation — benchmarks on 1.24–1.25 already benefit from b.Loop() but may miss inlining optimizations that 1.26 delivers.

Memory tracking

func BenchmarkAlloc(b *testing.B) {
    b.ReportAllocs() // or run with -benchmem flag
    var sink []byte
    for b.Loop() {
        sink = make([]byte, 1024)
    }
    _ = sink
}

b.ReportMetric() adds custom metrics (e.g., throughput):

b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s") // call after the loop; b.Elapsed() (Go 1.20+) returns the measured benchmark time

Sub-benchmarks and table-driven

func BenchmarkEncode(b *testing.B) {
    for _, size := range []int{64, 256, 4096} {
        b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
            data := make([]byte, size)
            for b.Loop() {
                Encode(data)
            }
        })
    }
}

Running Benchmarks

go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt

Flag                  Purpose
-bench=.              Run all benchmarks (regexp filter)
-benchmem             Report allocations (B/op, allocs/op)
-count=10             Run 10 times for statistical significance
-benchtime=3s         Minimum time per benchmark (default 1s)
-cpu=1,2,4            Run with different GOMAXPROCS values
-cpuprofile=cpu.prof  Write CPU profile
-memprofile=mem.prof  Write memory profile
-trace=trace.out      Write execution trace

Output format: BenchmarkEncode/size=64-8 5000000 230.5 ns/op 128 B/op 2 allocs/op — the -8 suffix is GOMAXPROCS, ns/op is time per operation, B/op is bytes allocated per op, allocs/op is heap allocation count per op.

Documenting Results in Commits

Paste benchstat output in the commit body when the change has a measurable performance impact. This documents why an optimization was made, prevents future readers from reverting it, and lets reviewers verify the claim without re-running benchmarks.

Commit format:

perf(parser): reduce Parse allocations 50% with sync.Pool

Replace per-call []byte allocation with a pooled buffer.

goos: linux / goarch: amd64 / cpu: AMD Ryzen 9 5950X
          │     old     │                 new                 │
          │   sec/op    │   sec/op     vs base                │
Parse-32    4.592µ ± 2%   3.041µ ± 1%  -33.78% (p=0.000 n=10)

          │     old      │                 new                  │
          │     B/op     │     B/op      vs base                │
Parse-32    1.024Ki ± 0%   0.512Ki ± 0%  -50.00% (p=0.000 n=10)

          │    old     │                new                 │
          │ allocs/op  │ allocs/op   vs base                │
Parse-32    12.00 ± 0%   6.000 ± 0%  -50.00% (p=0.000 n=10)

Rules:

  • Only include benchmarks directly affected by the change — strip unrelated rows
  • Never paste results with ~ (no statistical significance) — the improvement cannot be claimed
  • Include the hardware context line (goos/goarch/cpu) so results are reproducible
  • Use perf(scope): commit type for performance-only changes

Profiling from Benchmarks

Generate profiles directly from benchmark runs — no HTTP server needed:

# CPU profile
go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser
go tool pprof cpu.prof

# Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)
go test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser
go tool pprof -alloc_objects mem.prof

# Execution trace
go test -bench=BenchmarkParse -trace=trace.out ./pkg/parser
go tool trace trace.out

For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see pprof Reference. For execution trace interpretation, see Trace Reference. For statistical comparison, see benchstat Reference.

Reference Files

  • pprof Reference — Interactive and non-interactive analysis of CPU, memory, and goroutine profiles. Full CLI commands, profile types (CPU vs alloc_objects vs inuse_space), web UI navigation, and interpretation patterns. Use this to dive deep into where time and memory are being spent in your code.

  • benchstat Reference — Statistical comparison of benchmark runs with rigorous confidence intervals and p-value tests. Covers output reading, filtering old benchmarks, interleaving results for visual clarity, and regression detection. Use this when you need to prove a change made a meaningful performance difference, not just a lucky run.

  • Trace Reference — Execution tracer for understanding when and why code runs. Visualizes goroutine scheduling, garbage collection phases, network blocking, and custom span annotations. Use this when pprof (which shows where CPU goes) isn't enough — you need to see the timeline of what happened.

  • Diagnostic Tools — Quick reference for ancillary tools: fieldalignment (struct padding waste), GODEBUG (runtime logging flags), fgprof (wall-clock profiles covering both on-CPU and off-CPU time), race detector (concurrency bugs), and others. Use this when you have a specific symptom and need a focused diagnostic; don't reach for pprof if a simpler tool already answers your question.

  • Compiler Analysis — Low-level compiler optimization insights: escape analysis (when values move to the heap), inlining decisions (which function calls are eliminated), SSA dump (intermediate representation), and assembly output. Use this when benchmarks show allocations you didn't expect, or when you want to verify the compiler did what you intended.

  • CI Regression Detection — Automated performance regression gating in CI pipelines. Covers three tools (benchdiff for quick PR comparisons, cob for strict threshold-based gating, gobenchdata for long-term trend dashboards), noisy neighbor mitigation strategies (why cloud CI benchmarks vary 5-10% even on quiet machines), and self-hosted runner tuning to make benchmarks reproducible. Use this when you want to ensure pull requests don't silently slow down your codebase — detecting regressions early prevents shipping performance debt.

  • Investigation Session — Production performance troubleshooting workflow combining Prometheus runtime metrics (heap size, GC frequency, goroutine counts), PromQL queries to correlate metrics with code changes, runtime configuration flags (GODEBUG env vars to enable GC logging), and cost warnings (when you're hitting performance tax). Use this when benchmarks look good but real production traffic behaves differently.

  • Prometheus Go Metrics Reference — Complete listing of Go runtime metrics actually exposed as Prometheus metrics by prometheus/client_golang. Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between runtime/metrics (Go internal data) and Prometheus metrics (what you scrape from /metrics). Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.

Cross-References

  • → See samber/cc-skills-golang@golang-performance skill for optimization patterns to apply after measuring ("if X bottleneck, apply Y")
  • → See samber/cc-skills-golang@golang-troubleshooting skill for pprof setup on running services (enable, secure, capture), Delve debugger, GODEBUG flags, root cause methodology
  • → See samber/cc-skills-golang@golang-observability skill for everyday always-on monitoring, continuous profiling (Pyroscope), distributed tracing (OpenTelemetry)
  • → See samber/cc-skills-golang@golang-testing skill for general testing practices
  • → See samber/cc-skills@promql-cli skill for querying Prometheus runtime metrics in production to validate benchmark findings
