otelpor microsoft

OpenTelemetry instrumentation for the Copilot Chat extension — covers the four agent execution paths, the IOTelService abstraction, span/metric/event…

npx skills add https://github.com/microsoft/vscode --skill otel

OpenTelemetry Instrumentation Skill

When adding, changing, or reviewing OTel telemetry in the Copilot Chat extension, always read the two source-of-truth docs first and always keep them in sync with the code you change.

1. Authoritative Documents

The extensions/copilot/docs/monitoring/ directory contains the two specs that define the OTel contract for the extension. Treat them like the layout / layer specs in vs/sessions.

DocumentPathAudienceCovers
User-facingextensions/copilot/docs/monitoring/agent_monitoring.mdExtension usersQuick start, settings, env vars, exported spans/metrics/events, backend setup guides
Architectureextensions/copilot/docs/monitoring/agent_monitoring_arch.mdDevelopersMulti-agent strategies, span hierarchies, file structure, instrumentation points, IOTelService, configuration channels
Visual flowextensions/copilot/docs/monitoring/otel-data-flow.htmlDevelopersRenders the bridge data flow for the in-process Copilot CLI agent

If the implementation changes, you must update the relevant doc in the same PR. The arch doc is the most likely to drift; treat divergence as a bug.

2. Architecture at a Glance

The extension has four agent execution paths, each with a different OTel strategy:

AgentProcess ModelStrategyDebug Panel Source
Foreground (toolCallingLoop)Extension hostDirect IOTelService spansExtension spans
Copilot CLI in-processExtension host (same process)Bridge SpanProcessor — SDK creates spans natively; bridge forwards to debug panelSDK native spans via bridge
Copilot CLI terminalSeparate terminal processForward OTel env varsN/A (separate process)
Claude CodeChild process (Node fork)Synthesized from SDK messages — extension intercepts the Claude SDK message stream in claudeMessageDispatch.ts and emits GenAI spans; LLM calls are proxied through claudeLanguageModelServer.ts (which calls chatMLFetcher, producing standard chat spans).Extension spans

Why asymmetric? The CLI SDK runs in-process with full trace hierarchy (subagents, permissions, hooks). A bridge captures this directly. Claude runs as a separate process — internal spans are inaccessible, so the extension synthesizes spans by translating SDK messages and proxying the model API.

3. Where Things Live (canonical map)

extensions/copilot/src/platform/otel/
├── common/
│   ├── otelService.ts          # IOTelService interface + ISpanHandle + injectCompletedSpan
│   ├── otelConfig.ts           # Config resolution (env → settings → defaults), enabledVia, dbSpanExporter
│   ├── noopOtelService.ts      # Zero-cost no-op (used by chatLib / tests)
│   ├── inMemoryOTelService.ts  # ← actually under node/, see below
│   ├── agentOTelEnv.ts         # deriveCopilotCliOTelEnv / deriveClaudeOTelEnv
│   ├── genAiAttributes.ts      # ⚠ Single source of truth for attribute keys & enums
│   ├── genAiEvents.ts          # Event emitter helpers (emit*Event)
│   ├── genAiMetrics.ts         # GenAiMetrics class
│   ├── messageFormatters.ts    # truncateForOTel, normalizeProviderMessages, toSystemInstructions, …
│   ├── workspaceOTelMetadata.ts
│   ├── sessionUtils.ts
│   └── index.ts                # ⚠ Public barrel — re-export new helpers/constants here
└── node/
    ├── otelServiceImpl.ts      # NodeOTelService + DiagnosticSpanExporter + FilteredSpanExporter + EXPORTABLE_OPERATION_NAMES
    ├── inMemoryOTelService.ts  # InMemoryOTelService (used when OTel is disabled — feeds debug panel only)
    ├── fileExporters.ts        # File-based span/log/metric exporters
    └── sqlite/                 # OTelSqliteStore + SqliteSpanExporter (dbSpanExporter pipeline)

extensions/copilot/src/extension/
├── chatSessions/
│   ├── copilotcli/node/
│   │   ├── copilotCliBridgeSpanProcessor.ts  # Bridge: SDK spans → IOTelService (+ hook span enrichment)
│   │   ├── copilotcliSession.ts              # Root invoke_agent copilotcli span + traceparent + hook stash
│   │   └── copilotcliSessionService.ts       # Bridge installation + env var setup
│   └── claude/
│       ├── common/claudeMessageDispatch.ts   # execute_tool / execute_hook spans + subagent context wiring
│       └── node/
│           ├── claudeOTelTracker.ts          # invoke_agent claude span + per-session token/cost rollup
│           └── claudeLanguageModelServer.ts  # Local HTTP proxy → chatMLFetcher (chat spans)
├── chat/vscode-node/
│   └── chatHookService.ts                    # execute_hook spans for foreground agent hooks
├── intents/node/toolCallingLoop.ts           # invoke_agent spans for foreground agent
├── tools/vscode-node/toolsService.ts         # execute_tool spans for foreground tools
├── prompt/node/chatMLFetcher.ts              # chat spans for all LLM calls
├── byok/vscode-node/                         # BYOK provider chat spans (anthropicProvider, geminiNativeProvider, …)
└── trajectory/vscode-node/
    ├── otelChatDebugLogProvider.ts           # Debug panel data provider
    ├── otelSpanToChatDebugEvent.ts           # Span → ChatDebugEvent conversion
    └── otlpFormatConversion.ts               # OTLP ↔ in-memory span format

4. Service Layer & Selection

IOTelService (otelService.ts) is the only abstraction consumers should depend on — never import the OTel SDK directly outside node/otelServiceImpl.ts. Three implementations:

ClassWhen Used
NoopOTelServicechatLib and tests where no telemetry pipeline is needed — zero cost
NodeOTelServiceOTel enabled — full SDK, OTLP/file/console export, optional SQLite span exporter
InMemoryOTelServiceRegistered when OTel is disabled — no SDK is loaded, but spans/metrics/logs are still captured in-memory so the Agent Debug Log panel keeps working

Selection happens in src/extension/extension/vscode-node/services.ts: exactly one of NodeOTelService or InMemoryOTelService is bound to IOTelService per extension host based on resolveOTelConfig().enabled.

5. Span / Metric / Event Conventions

Follow the OTel GenAI semantic conventions. Always use the constants from genAiAttributes.ts — never raw string literals.

OperationSpan NameKindConstant
Agent orchestrationinvoke_agent {agent_name}INTERNALGenAiOperationName.INVOKE_AGENT
LLM API callchat {model}CLIENTGenAiOperationName.CHAT
Tool executionexecute_tool {tool_name}INTERNALGenAiOperationName.EXECUTE_TOOL
Hook executionexecute_hook {hook_type}INTERNALGenAiOperationName.EXECUTE_HOOK

Attribute namespaces:

NamespaceConstant moduleExamples
gen_ai.*GenAiAttrgen_ai.operation.name, gen_ai.usage.input_tokens
copilot_chat.*CopilotChatAttrcopilot_chat.session_id, copilot_chat.chat_session_id, copilot_chat.hook_*
github.copilot.*CopilotCliSdkAttrSDK-emitted hook attributes (read-only — bridge & debug panel)
claude_code.*(raw)Claude subprocess SDK attributes — only ever observed in OTLP, not produced by the extension

Standard span pattern

return this._otelService.startActiveSpan(
    `execute_tool ${name}`,
    {
        kind: SpanKind.INTERNAL,
        attributes: {
            [GenAiAttr.OPERATION_NAME]: GenAiOperationName.EXECUTE_TOOL,
            [GenAiAttr.TOOL_NAME]: name,
            // …
        },
    },
    async (span) => {
        try {
            const result = await this._actualWork();
            span.setStatus(SpanStatusCode.OK);
            return result;
        } catch (err) {
            span.setStatus(SpanStatusCode.ERROR, err instanceof Error ? err.message : String(err));
            span.setAttribute(StdAttr.ERROR_TYPE, err instanceof Error ? err.constructor.name : 'Error');
            throw err;
        }
    },
);

Cross-boundary trace propagation

// Parent: store context keyed by something the child knows
const ctx = this._otelService.getActiveTraceContext();
if (ctx) { this._otelService.storeTraceContext(`subagent:invocation:${id}`, ctx); }

// Child: retrieve and use as parent
const parentCtx = this._otelService.getStoredTraceContext(`subagent:invocation:${id}`);
return this._otelService.startActiveSpan('invoke_agent child', { parentTraceContext: parentCtx, … }, fn);

Content capture

The extension uses two conventions side-by-side; pick the right one for the attribute you're adding.

  1. Always emit (truncated) — used for inputs/outputs that the Agent Debug Log panel needs to be useful even when OTel export is off (e.g. gen_ai.tool.call.arguments in toolsService.ts, and copilot_chat.hook_input / hook_output in chatHookService.ts). The attribute is captured unconditionally but always passed through truncateForOTel. Use this for moderate-sized, generally-non-secret arguments / results.
  2. Gate on config.captureContent — used for full prompt / response / system-instruction bodies (e.g. gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions, gen_ai.tool.definitions in chatMLFetcher.ts and the BYOK providers). These are larger and more likely to contain user secrets.
// Pattern 1 — always emit, always truncate
span.setAttribute(GenAiAttr.TOOL_CALL_ARGUMENTS, truncateForOTel(JSON.stringify(args)));

// Pattern 2 — gated on captureContent
if (this._otelService.config.captureContent) {
    span.setAttribute(GenAiAttr.INPUT_MESSAGES, truncateForOTel(JSON.stringify(messages)));
}

Debug panel vs OTLP isolation

Spans whose gen_ai.operation.name is not in EXPORTABLE_OPERATION_NAMES (defined in otelServiceImpl.ts) are visible to the debug panel via onDidCompleteSpan but excluded from OTLP and SQLite exporters by DiagnosticSpanExporter and FilteredSpanExporter. Currently exportable: chat, invoke_agent, execute_tool, embeddings, execute_hook. If you add a new operation name that should reach the user's collector, update EXPORTABLE_OPERATION_NAMES and document it in agent_monitoring.md.

6. Configuration Surface (must stay in sync)

When you add or change a setting/env var/command, update all three of:

  1. The setting/command registration in extensions/copilot/package.json (search for github.copilot.chat.otel).
  2. resolveOTelConfig in otelConfig.ts — if the setting affects runtime config — and the enabledVia channel if it can implicitly enable OTel.
  3. agent_monitoring.md ("VS Code Settings", "Environment Variables", "Activation", "Commands" tables) and agent_monitoring_arch.md ("Activation Channels", "Agent-Specific Env Var Translation" tables).

For sub-process env vars, also update:

  • deriveCopilotCliOTelEnv / deriveClaudeOTelEnv in agentOTelEnv.ts.
  • The corresponding tests in src/platform/otel/common/test/agentOTelEnv.spec.ts.

7. Procedure Checklists

When adding a new span / attribute

  1. Add the attribute key as a constant to genAiAttributes.ts (under GenAiAttr, CopilotChatAttr, or a new domain group). Never inline a raw 'copilot_chat.foo' literal.
  2. Add it to the public barrel in index.ts if it lives in a new group.
  3. Use IOTelService.startActiveSpan (preferred) or startSpan — never BasicTracerProvider / getTracer directly.
  4. Pass the value through truncateForOTel (mandatory for any free-form content attribute — prevents OTLP batch failures). Decide whether the attribute should be always-emitted (debug-panel-essential, e.g. tool args, hook input/output) or gated on config.captureContent (large prompt/response bodies, system instructions); follow the existing convention for similar data.
  5. If the new operation should reach OTLP, add its op-name to EXPORTABLE_OPERATION_NAMES in otelServiceImpl.ts.
  6. Document the new attribute in agent_monitoring.md (under the relevant span table) and add a test in src/platform/otel/common/test/.

When adding a new metric / event

  1. Add the helper to genAiMetrics.ts or genAiEvents.ts (mirror existing static / functional patterns).
  2. Re-export it from index.ts.
  3. Add the metric/event row to agent_monitoring.md ("Metrics" / "Events" sections) with all attributes documented.
  4. Add a unit test in src/platform/otel/common/test/genAiMetrics.spec.ts or genAiEvents.spec.ts (assert the exact name + attribute keys).

When instrumenting a new agent surface

  1. Pick a strategy: direct spans (foreground-style), bridge processor (CLI-style), or message-stream synthesis (Claude-style).
  2. Add the new emit site to the Instrumentation Points table in agent_monitoring_arch.md and the Span Hierarchies diagrams.
  3. If you forward OTel env vars to a child process, do it via a new derive*OTelEnv helper in agentOTelEnv.ts and add a row to the Agent-Specific Env Var Translation table.
  4. Wire trace propagation explicitly with storeTraceContext / parentTraceContext for any subagent or async boundary; do not rely on global active context across processes.

When changing the Copilot CLI bridge

The bridge (copilotCliBridgeSpanProcessor.ts) reaches into _delegate._activeSpanProcessor._spanProcessors — internal OTel SDK v2 state. This is documented as a known risk. If you touch it:

  • Keep the runtime guard that degrades gracefully if the internal shape changes.
  • Update the ⚠ SDK Internal Access Warning block in agent_monitoring_arch.md if the access pattern changes.
  • Add a unit test in copilotCliBridgeSpanProcessor.spec.ts.

8. Validation

Before sending a PR that touches OTel code:

# From extensions/copilot/
npx tsc --noEmit --project tsconfig.json

# OTel + Bridge unit tests
npm test -- --grep "OTel\|Bridge"

Manual sanity checks:

  • The Aspire Dashboard quick-start in agent_monitoring.md still works end-to-end (one agent message → invoke_agent + chat + execute_tool spans visible at http://localhost:18888).
  • The Agent Debug Log panel in VS Code still shows the full span tree for foreground, Copilot CLI, and Claude sessions.

9. Known Risks & Limitations

These are documented in agent_monitoring_arch.md — preserve them:

  • SDK _spanProcessors internal access (graceful runtime guard).
  • Two TracerProviders in the same process when CLI SDK is active.
  • process.env mutation for the CLI SDK (only OTel-specific vars, set before LocalSessionManager ctor).
  • Single captureContent flag for the CLI SDK applies to both debug panel and OTLP — document any user-visible change clearly.
  • Claude SDK has no file exporter, and the CLI runtime only supports otlp-http.

10. Anti-Patterns to Reject

  • ❌ Importing @opentelemetry/api (or any @opentelemetry/* package) from anywhere other than node/otelServiceImpl.ts, fileExporters.ts, or the CLI bridge processor type imports.
  • ❌ Hard-coded attribute keys: 'copilot_chat.hook_type' instead of CopilotChatAttr.HOOK_TYPE.
  • ❌ Hard-coded provider strings: 'github' / 'anthropic' / 'gemini' instead of GenAiProviderName.*.
  • ❌ Magic SpanStatusCode numbers (code: 1, code: 2) — use the enum.
  • ❌ Emitting any free-form content attribute without passing it through truncateForOTel — OTLP batches will silently drop or fail.
  • ❌ Logging full prompt / response / system-instruction bodies without config.captureContent gating (these are pattern 2 above).
  • ❌ Adding a span operation name without deciding whether it's exportable (EXPORTABLE_OPERATION_NAMES).
  • ❌ Updating instrumentation without updating agent_monitoring.md / agent_monitoring_arch.md in the same change.

NotebookLM Web Importer

Importe páginas da web e vídeos do YouTube para o NotebookLM com um clique. Confiado por mais de 200.000 usuários.

Instalar extensão do Chrome