heap-snapshot-analysis

作者: microsoft

分析 V8 堆積快照以調查記憶體洩漏和保留問題。在提供 .heapsnapshot 檔案、要求比較前後快照、要求…時使用。

npx skills add https://github.com/microsoft/vscode --skill heap-snapshot-analysis

Heap Snapshot Analysis

Investigate memory leaks from V8 heap snapshots (.heapsnapshot files). This skill starts when snapshots already exist: either the user provided them, DevTools exported them, or another workflow produced them. Use the helpers here to compare snapshots, group object deltas, and trace retainer paths.

IGNORE Prior Investigations

Start every investigation fresh. Do NOT read, consult, or be influenced by prior investigations found in:

  • /memories/ (user, session, or repo memory)
  • .github/skills/heap-snapshot-analysis/scratchpad/ (previous dated subfolders and their findings.md files)
  • Any other notes from earlier sessions

Previous findings can bias the analysis toward suspects that are no longer relevant, or cause the agent to skip steps and jump to conclusions. Let the current snapshots speak for themselves. Only reference prior work if the user explicitly asks you to.

When to Use

  • User provides .heapsnapshot files (before/after a workflow)
  • User has heap snapshots captured by another skill or script
  • Need to find what retains disposed objects (retainer path analysis)
  • Comparing object counts/sizes between two snapshots
  • Investigating why particular objects survive GC

Workflow

If the user needs the agent to launch VS Code, drive a scenario, and capture snapshots first, use the VS Code performance workflow skill before returning here for low-level snapshot analysis.

1. Parse Snapshots

Use the helpers in parseSnapshot.ts to load snapshots. The files are often >500MB and too large for JSON.parse as a string — the helpers use Buffer-based extraction. In scratchpad scripts, import helpers from ../helpers/*.ts.

For very large snapshots, the helper may still be too eager. Node cannot create a Buffer larger than roughly 2 GiB, so snapshots above that size can fail with ERR_FS_FILE_TOO_LARGE even before parsing. In that case, do not try to raise --max-old-space-size and retry the same full-file read. Switch to a streaming script.

import { parseSnapshot, buildGraph } from '../helpers/parseSnapshot.ts';

const data = parseSnapshot('/path/to/snapshot.heapsnapshot');
const graph = buildGraph(data);

Snapshots Larger Than 2 GiB

When a snapshot is too large to load into a single Buffer, write scratchpad scripts that scan and parse only the sections needed for the question. Use streamSnapshot.mjs for the common streaming primitives instead of copying them between scratch scripts.

Useful tricks:

  • Find top-level section offsets first. Scan the file as bytes for markers like "nodes":, "edges":, "strings":, and "trace_function_infos":. This lets follow-up scripts jump directly to the large arrays instead of searching the whole file repeatedly.
  • Parse snapshot.meta separately from the small header at the start of the file. Use meta.node_fields, meta.node_types, meta.edge_fields, and meta.edge_types to avoid hard-coding tuple widths.
  • Stream numeric arrays in chunks. For nodes and edges, keep a small carryover string between chunks, split on commas, and process complete numeric tokens as they arrive.
  • Avoid materializing the full strings table unless the investigation truly needs it. If you only need suspicious names, collect string indexes from matching nodes/edges first, then resolve only those indexes in a second streaming pass.
  • If you do need many strings, store only short previews and category counters. Full source strings, ref-listing strings, and prompt payloads can dominate memory and make the analyzer become the leak.
  • Write intermediate outputs to files in the scratchpad. Large heap analysis is iterative and slow; cached node ids, offsets, and retainer traces save repeated multi-minute passes.
  • Prefer self-size attribution and field-level ownership for huge graphs. Full retained-size walks can wildly overcount shared services, roots, maps, and singleton caches.
  • When quantifying a suspected owner, count obvious owned fields separately: wrapper object, key arrays, array elements, direct strings, and parent strings of sliced/concatenated strings. This often gives a better lower-bound than a single direct string bucket.
  • Be explicit about approximation boundaries. A field-level subtotal usually undercounts listeners/watchers/back-references but avoids the much worse problem of attributing the whole runtime to one object.

Example large-snapshot workflow:

import { findArrayStart, findTokenOffsets, parseMeta, streamNumberTuples } from '../../helpers/streamSnapshot.mjs';

const { size, offsets } = findTokenOffsets(snapshotPath);
const meta = parseMeta(snapshotPath);
const nodeFieldCount = meta.node_fields.length;
const nodesStart = findArrayStart(snapshotPath, offsets.get('"nodes"'));

streamNumberTuples(snapshotPath, nodesStart, offsets.get('"edges"'), nodeFieldCount, (node, nodeIndex) => {
    // node is reused for speed; copy it before storing.
});
cd .github/skills/heap-snapshot-analysis
node --max-old-space-size=24576 scratchpad/YYYY-MM-DD-topic/findOffsets.mjs /path/to/Heap.heapsnapshot
node --max-old-space-size=24576 scratchpad/YYYY-MM-DD-topic/streamAnalyze.mjs /path/to/Heap.heapsnapshot > scratchpad/YYYY-MM-DD-topic/streamAnalyze.out
node --max-old-space-size=24576 scratchpad/YYYY-MM-DD-topic/traceNodes.mjs /path/to/Heap.heapsnapshot 12345 67890 > scratchpad/YYYY-MM-DD-topic/traceNodes.out

2. Compare Before/After

Use compareSnapshots.ts to diff two snapshots:

import { compareSnapshots } from '../helpers/compareSnapshots.ts';

const result = compareSnapshots('/path/to/before.heapsnapshot', '/path/to/after.heapsnapshot');
// result.topBySize, result.topByCount, result.newObjectGroups, result.summary

3. Find Retainer Paths

Use findRetainers.ts to trace why an object is alive:

import { findRetainerPaths } from '../helpers/findRetainers.ts';

// Find what keeps ChatModel instances alive (skipping weak edges)
findRetainerPaths(graph, 'ChatModel', { maxPaths: 5, maxDepth: 25, maxAttempts: 200 });

4. Write Investigation Scripts

Write investigation-specific scripts in the scratchpad directory. This folder is gitignored — use it freely for one-off analysis.

Organize scratchpad work into dated subfolders named YYYY-MM-DD-short-description/ (e.g., 2026-04-09-chat-model-retainers/). Each subfolder should contain:

  • The analysis scripts (.mjs, .mts, etc.)
  • A findings.md file documenting the full investigation: all ideas considered, which ones led to changes and which were rejected (and why), before/after measurements, and a summary of the outcome. This lets the user review the agent's reasoning, decide which changes to keep, and follow up on deferred ideas.

Scripts can import the helpers:

cd .github/skills/heap-snapshot-analysis
node --max-old-space-size=16384 scratchpad/2026-04-09-chat-model-retainers/analyze.mjs

Key Concepts

V8 Heap Snapshot Format

The .heapsnapshot file is JSON with these key sections:

  • snapshot.meta: Field definitions for nodes and edges
  • nodes: Flat array, every N values = one node (N = meta.node_fields.length, typically 6: type, name, id, self_size, edge_count, detachedness)
  • edges: Flat array, every M values = one edge (M = meta.edge_fields.length, typically 3: type, name_or_index, to_node)
  • strings: String table indexed by name fields in nodes/edges

Edge Types That Matter

TypeMeaningPrevents GC?
propertyNamed JS propertyYes
elementArray indexYes
contextClosure variableYes
internalV8 internal referenceYes
hiddenV8 hidden referenceYes
weakWeakRef/WeakMap keyNo
shortcutConvenience linkDepends

Always skip weak edges when tracing retainer paths. WeakMap entries show up as edges from key → backing array, but they don't prevent collection — they're red herrings.

Common VS Code Retention Patterns

  1. RowCache templates: ListView's RowCache stores template rows. Templates have currentElement pointing to old viewmodel items. If not cleared on session switch, retains entire model chains.

  2. Resource pools: pool.clear() only disposes idle items. If _onDidUpdateViewModel.fire() runs AFTER pool.clear(), released items re-enter the empty pool and are never disposed. Fire event first, then clear.

  3. autorunIterableDelta lastValues: The closure captures a Map of previous iteration values. Values stay until the autorun re-runs. Async disposal delays keep models in observable stores longer than expected.

  4. HoverService._delayedHovers: Global singleton Map retaining disposed objects via show closure → resolveHoverOptions closure → this. If hover cleanup disposable doesn't fire, the entire object tree is retained.

  5. ObjectMutationLog._previous: The incremental serializer keeps a full snapshot of the last-serialized state. Every loaded ChatModel holds 2x its data: live + _previous.

  6. _previousModelRef pattern: MutableDisposable setter disposes the old value. Reading .value and storing it elsewhere, then setting .value = undefined, disposes the stored reference. Use clearAndLeak() to extract without disposing.

Defensive Nulling

Null heavy fields in dispose() to break retention chains even when something retains the disposed object:

override dispose() {
    super.dispose();
    this._requests.length = 0;      // conversation data
    this.dataSerializer = undefined;  // serialization snapshot
    this._editingSession = undefined; // editing session + TextModels
    this._session = undefined!;       // back-reference cycles
}

Caveat: Don't null fields on viewmodel items (ChatResponseViewModel._model). The tree's diffIdentityProvider accesses them after the parent viewmodel is disposed but before setChildren replaces them.

False Retainers to Watch For

  • DevTools debugger global handles: If the snapshot was captured after opening DevTools, large source strings, compiled scripts, preview data, inspected objects, or debugger bookkeeping can be retained by paths like DevTools debugger(internal)synthetic::(Global handles) → GC roots. Treat these as debugger-induced until proven otherwise. They may not exist in the app before DevTools opens, and they should not be confused with application-owned leaks.
  • DevToolsLogger._aliveInstances (Map): Enabled by VSCODE_DEV_DEBUG_OBSERVABLES env var. Retains ALL observed observables. Check if this is active before investigating observable-rooted paths.
  • GCBasedDisposableTracker (FinalizationRegistry): If register(target, held, target) is used (target === unregister token), creates a strong self-reference preventing GC. Currently commented out in production.
  • WeakMap backing arrays: Show up in retainer paths but don't prevent collection.

Running Analysis

All helper scripts use ESM and need Node with extra memory:

node --max-old-space-size=16384 scratchpad/analyze.mjs

Typical analysis takes 30-120 seconds per snapshot depending on size.

來自 microsoft 的更多技能

oss-growth
microsoft
開源增長駭客角色
official
microsoft-foundry
microsoft
端到端部署、評估與管理 Foundry 代理:Docker 建置、ACR 推送、託管/提示代理建立、容器啟動、批次評估、持續評估、提示最佳化工作流程、agent.yaml、從追蹤資料集整理。用途:將代理部署至 Foundry、託管代理、建立代理、調用代理、評估代理、執行批次評估、持續評估、持續監控、持續評估狀態、最佳化提示、改善提示、提示最佳化器、最佳化代理指令、改善代理...
officialdevelopmentdevops
azure-ai
microsoft
用於 Azure AI:搜尋、語音、OpenAI、文件智慧。協助搜尋、向量/混合搜尋、語音轉文字、文字轉語音、轉錄、OCR。適用情境:AI 搜尋、查詢搜尋、向量搜尋、混合搜尋、語意搜尋、語音轉文字、文字轉語音、轉錄、OCR、將文字轉換為語音。
officialdevelopmentapi
azure-deploy
microsoft
對已準備好的應用程式執行 Azure 部署,這些應用程式需具備現有的 .azure/deployment-plan.md 與基礎架構檔案。當使用者要求建立新應用程式時,請勿使用此技能——應改用 azure-prepare。此技能會執行 azd up、azd deploy、terraform apply 及 az deployment 命令,並內建錯誤復原機制。需具備來自 azure-prepare 的 .azure/deployment-plan.md,以及來自 azure-validate 的驗證狀態。適用時機:「執行 azd up」、「執行 azd deploy」、「執行部署」……
officialdevopsaws
azure-storage
microsoft
Azure Storage Services 包括 Blob 儲存體、檔案共用、佇列儲存體、表格儲存體和 Data Lake。回答關於儲存存取層(熱、冷、凍結、封存)、各層使用時機及層級比較的問題。提供物件儲存、SMB 檔案共用、非同步訊息、NoSQL 鍵值及大數據分析。包含生命週期管理。用於:blob 儲存體、檔案共用、佇列儲存體、表格儲存體、data lake、上傳檔案、下載 blob、儲存帳戶、存取層...
officialdevelopmentdatabase
azure-diagnostics
microsoft
在 Azure 上使用 AppLens、Azure Monitor、資源健康狀態和安全分類來偵錯 Azure 生產問題。適用時機:偵錯生產問題、疑難排解應用程式服務、應用程式服務高 CPU、應用程式服務部署失敗、疑難排解容器應用程式、疑難排解函數、疑難排解 AKS、kubectl 無法連線、kube-system/CoreDNS 失敗、Pod 擱置、CrashLoop、節點未就緒、升級失敗、分析記錄、KQL、深入解析、映像提取失敗、冷啟動問題、健康狀態探查失敗...
officialdevopsdevelopment
azure-prepare
microsoft
準備 Azure 應用程式以進行部署(基礎架構 Bicep/Terraform、azure.yaml、Dockerfile)。用於建立/現代化或建立+部署;不適用於跨雲端遷移(請使用 azure-cloud-migrate)。請勿用於:copilot-sdk 應用程式(請使用 azure-hosted-copilot-sdk)。適用時機:「建立應用程式」、「建置 Web 應用程式」、「建立 API」、「建立無伺服器 HTTP API」、「建立前端」、「建立後端」、「建置服務」、「現代化應用程式」、「更新應用程式」、「新增驗證」、「新增快取」、「託管於 Azure」、「建立並...」
officialdevelopmentdevops
azure-validate
microsoft
部署前驗證 Azure 就緒狀態。對設定、基礎架構(Bicep 或 Terraform)、RBAC 角色指派、受控身分權限及先決條件進行深度檢查,再進行部署。適用時機:驗證我的應用程式、檢查部署就緒狀態、執行預檢檢查、驗證設定、確認是否可部署、驗證 azure.yaml、驗證 Bicep、部署前測試、疑難排解部署錯誤、驗證 Azure Functions、驗證函式應用程式、驗證無伺服器...
officialdevopstesting