otel

OpenTelemetry-Instrumentierung für die Copilot Chat-Erweiterung – umfasst die vier Agenten-Ausführungspfade, die IOTelService-Abstraktion, Span/Metric/Event…

npx skills add https://github.com/microsoft/vscode --skill otel

OpenTelemetry Instrumentation Skill

When adding, changing, or reviewing OTel telemetry in the Copilot Chat extension, always read the two source-of-truth docs first and always keep them in sync with the code you change.

1. Authoritative Documents

The extensions/copilot/docs/monitoring/ directory contains the two specs that define the OTel contract for the extension. Treat them like the layout / layer specs in vs/sessions.

DocumentPathAudienceCovers
User-facingextensions/copilot/docs/monitoring/agent_monitoring.mdExtension usersQuick start, settings, env vars, exported spans/metrics/events, backend setup guides
Architectureextensions/copilot/docs/monitoring/agent_monitoring_arch.mdDevelopersMulti-agent strategies, span hierarchies, file structure, instrumentation points, IOTelService, configuration channels
Visual flowextensions/copilot/docs/monitoring/otel-data-flow.htmlDevelopersRenders the bridge data flow for the in-process Copilot CLI agent

If the implementation changes, you must update the relevant doc in the same PR. The arch doc is the most likely to drift; treat divergence as a bug.

2. Architecture at a Glance

The extension has four agent execution paths, each with a different OTel strategy:

AgentProcess ModelStrategyDebug Panel Source
Foreground (toolCallingLoop)Extension hostDirect IOTelService spansExtension spans
Copilot CLI in-processExtension host (same process)Bridge SpanProcessor — SDK creates spans natively; bridge forwards to debug panelSDK native spans via bridge
Copilot CLI terminalSeparate terminal processForward OTel env varsN/A (separate process)
Claude CodeChild process (Node fork)Synthesized from SDK messages — extension intercepts the Claude SDK message stream in claudeMessageDispatch.ts and emits GenAI spans; LLM calls are proxied through claudeLanguageModelServer.ts (which calls chatMLFetcher, producing standard chat spans).Extension spans

Why asymmetric? The CLI SDK runs in-process with full trace hierarchy (subagents, permissions, hooks). A bridge captures this directly. Claude runs as a separate process — internal spans are inaccessible, so the extension synthesizes spans by translating SDK messages and proxying the model API.

3. Where Things Live (canonical map)

extensions/copilot/src/platform/otel/
├── common/
│   ├── otelService.ts          # IOTelService interface + ISpanHandle + injectCompletedSpan
│   ├── otelConfig.ts           # Config resolution (env → settings → defaults), enabledVia, dbSpanExporter
│   ├── noopOtelService.ts      # Zero-cost no-op (used by chatLib / tests)
│   ├── inMemoryOTelService.ts  # ← actually under node/, see below
│   ├── agentOTelEnv.ts         # deriveCopilotCliOTelEnv / deriveClaudeOTelEnv
│   ├── genAiAttributes.ts      # ⚠ Single source of truth for attribute keys & enums
│   ├── genAiEvents.ts          # Event emitter helpers (emit*Event)
│   ├── genAiMetrics.ts         # GenAiMetrics class
│   ├── messageFormatters.ts    # truncateForOTel, normalizeProviderMessages, toSystemInstructions, …
│   ├── workspaceOTelMetadata.ts
│   ├── sessionUtils.ts
│   └── index.ts                # ⚠ Public barrel — re-export new helpers/constants here
└── node/
    ├── otelServiceImpl.ts      # NodeOTelService + DiagnosticSpanExporter + FilteredSpanExporter + EXPORTABLE_OPERATION_NAMES
    ├── inMemoryOTelService.ts  # InMemoryOTelService (used when OTel is disabled — feeds debug panel only)
    ├── fileExporters.ts        # File-based span/log/metric exporters
    └── sqlite/                 # OTelSqliteStore + SqliteSpanExporter (dbSpanExporter pipeline)

extensions/copilot/src/extension/
├── chatSessions/
│   ├── copilotcli/node/
│   │   ├── copilotCliBridgeSpanProcessor.ts  # Bridge: SDK spans → IOTelService (+ hook span enrichment)
│   │   ├── copilotcliSession.ts              # Root invoke_agent copilotcli span + traceparent + hook stash
│   │   └── copilotcliSessionService.ts       # Bridge installation + env var setup
│   └── claude/
│       ├── common/claudeMessageDispatch.ts   # execute_tool / execute_hook spans + subagent context wiring
│       └── node/
│           ├── claudeOTelTracker.ts          # invoke_agent claude span + per-session token/cost rollup
│           └── claudeLanguageModelServer.ts  # Local HTTP proxy → chatMLFetcher (chat spans)
├── chat/vscode-node/
│   └── chatHookService.ts                    # execute_hook spans for foreground agent hooks
├── intents/node/toolCallingLoop.ts           # invoke_agent spans for foreground agent
├── tools/vscode-node/toolsService.ts         # execute_tool spans for foreground tools
├── prompt/node/chatMLFetcher.ts              # chat spans for all LLM calls
├── byok/vscode-node/                         # BYOK provider chat spans (anthropicProvider, geminiNativeProvider, …)
└── trajectory/vscode-node/
    ├── otelChatDebugLogProvider.ts           # Debug panel data provider
    ├── otelSpanToChatDebugEvent.ts           # Span → ChatDebugEvent conversion
    └── otlpFormatConversion.ts               # OTLP ↔ in-memory span format

3a. Attribute namespaces & dual-emit policy

Three namespaces coexist on extension-emitted spans:

NamespacePurposeStatus
gen_ai.*OTel GenAI Semantic Conventions. Use whenever a standard key exists.Canonical
github.copilot.*Copilot-specific vendor namespace.Preferred — new attributes go here.
copilot_chat.*Original VS Code-only namespace. Several keys remain for backwards compatibility.Legacy — keep emitting; do not add new keys here.

Dual-emit rules

  • When adding a new attribute that belongs to Copilot's vendor namespace, emit it under github.copilot.* only — do not introduce a copilot_chat.* twin.
  • When renaming an existing copilot_chat.* attribute to its github.copilot.* equivalent (e.g., copilot_chat.repo.*github.copilot.git.*, gen_ai.usage.reasoning_tokensgen_ai.usage.reasoning.output_tokens), dual-emit both keys indefinitely. Downstream readers (Agent Debug Log, Chronicle, SQLite span store, OTLP collectors) may depend on the legacy key.
  • Mark the legacy row in agent_monitoring.md with Legacy in the "Requirement" column and a pointer to the preferred key. No sunset date — legacy keys live on indefinitely.
  • Hash sensitive identifiers (e.g., MCP server names) with hashTelemetryValue from util/node/crypto.ts. Emit hashes unconditionally; raw values only when captureContent is enabled.

4. Service Layer & Selection

IOTelService (otelService.ts) is the only abstraction consumers should depend on — never import the OTel SDK directly outside node/otelServiceImpl.ts. Three implementations:

ClassWhen Used
NoopOTelServicechatLib and tests where no telemetry pipeline is needed — zero cost
NodeOTelServiceOTel enabled — full SDK, OTLP/file/console export, optional SQLite span exporter
InMemoryOTelServiceRegistered when OTel is disabled — no SDK is loaded, but spans/metrics/logs are still captured in-memory so the Agent Debug Log panel keeps working

Selection happens in src/extension/extension/vscode-node/services.ts: exactly one of NodeOTelService or InMemoryOTelService is bound to IOTelService per extension host based on resolveOTelConfig().enabled.

5. Span / Metric / Event Conventions

Follow the OTel GenAI semantic conventions. Always use the constants from genAiAttributes.ts — never raw string literals.

OperationSpan NameKindConstant
Agent orchestrationinvoke_agent {agent_name}INTERNALGenAiOperationName.INVOKE_AGENT
LLM API callchat {model}CLIENTGenAiOperationName.CHAT
Tool executionexecute_tool {tool_name}INTERNALGenAiOperationName.EXECUTE_TOOL
Hook executionexecute_hook {hook_type}INTERNALGenAiOperationName.EXECUTE_HOOK

Attribute namespaces:

NamespaceConstant moduleExamples
gen_ai.*GenAiAttrgen_ai.operation.name, gen_ai.usage.input_tokens
copilot_chat.*CopilotChatAttrcopilot_chat.session_id, copilot_chat.chat_session_id, copilot_chat.hook_*
github.copilot.*CopilotCliSdkAttrSDK-emitted hook attributes (read-only — bridge & debug panel)
claude_code.*(raw)Claude subprocess SDK attributes — only ever observed in OTLP, not produced by the extension

Standard span pattern

return this._otelService.startActiveSpan(
    `execute_tool ${name}`,
    {
        kind: SpanKind.INTERNAL,
        attributes: {
            [GenAiAttr.OPERATION_NAME]: GenAiOperationName.EXECUTE_TOOL,
            [GenAiAttr.TOOL_NAME]: name,
            // …
        },
    },
    async (span) => {
        try {
            const result = await this._actualWork();
            span.setStatus(SpanStatusCode.OK);
            return result;
        } catch (err) {
            span.setStatus(SpanStatusCode.ERROR, err instanceof Error ? err.message : String(err));
            span.setAttribute(StdAttr.ERROR_TYPE, err instanceof Error ? err.constructor.name : 'Error');
            throw err;
        }
    },
);

Cross-boundary trace propagation

// Parent: store context keyed by something the child knows
const ctx = this._otelService.getActiveTraceContext();
if (ctx) { this._otelService.storeTraceContext(`subagent:invocation:${id}`, ctx); }

// Child: retrieve and use as parent
const parentCtx = this._otelService.getStoredTraceContext(`subagent:invocation:${id}`);
return this._otelService.startActiveSpan('invoke_agent child', { parentTraceContext: parentCtx, … }, fn);

Content capture

The extension uses two conventions side-by-side; pick the right one for the attribute you're adding.

  1. Always emit (truncated) — used for inputs/outputs that the Agent Debug Log panel needs to be useful even when OTel export is off (e.g. gen_ai.tool.call.arguments in toolsService.ts, and copilot_chat.hook_input / hook_output in chatHookService.ts). The attribute is captured unconditionally but always passed through truncateForOTel. Use this for moderate-sized, generally-non-secret arguments / results.
  2. Gate on config.captureContent — used for full prompt / response / system-instruction bodies (e.g. gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions, gen_ai.tool.definitions in chatMLFetcher.ts and the BYOK providers). These are larger and more likely to contain user secrets.
// Pattern 1 — always emit, always truncate
span.setAttribute(GenAiAttr.TOOL_CALL_ARGUMENTS, truncateForOTel(JSON.stringify(args)));

// Pattern 2 — gated on captureContent
if (this._otelService.config.captureContent) {
    span.setAttribute(GenAiAttr.INPUT_MESSAGES, truncateForOTel(JSON.stringify(messages)));
}

Debug panel vs OTLP isolation

Spans whose gen_ai.operation.name is not in EXPORTABLE_OPERATION_NAMES (defined in otelServiceImpl.ts) are visible to the debug panel via onDidCompleteSpan but excluded from OTLP and SQLite exporters by DiagnosticSpanExporter and FilteredSpanExporter. Currently exportable: chat, invoke_agent, execute_tool, embeddings, execute_hook. If you add a new operation name that should reach the user's collector, update EXPORTABLE_OPERATION_NAMES and document it in agent_monitoring.md.

6. Configuration Surface (must stay in sync)

When you add or change a setting/env var/command, update all three of:

  1. The setting/command registration in extensions/copilot/package.json (search for github.copilot.chat.otel).
  2. resolveOTelConfig in otelConfig.ts — if the setting affects runtime config — and the enabledVia channel if it can implicitly enable OTel.
  3. agent_monitoring.md ("VS Code Settings", "Environment Variables", "Activation", "Commands" tables) and agent_monitoring_arch.md ("Activation Channels", "Agent-Specific Env Var Translation" tables).

For sub-process env vars, also update:

  • deriveCopilotCliOTelEnv / deriveClaudeOTelEnv in agentOTelEnv.ts.
  • The corresponding tests in src/platform/otel/common/test/agentOTelEnv.spec.ts.

7. Procedure Checklists

When adding a new span / attribute

  1. Add the attribute key as a constant to genAiAttributes.ts (under GenAiAttr, CopilotChatAttr, or a new domain group). Never inline a raw 'copilot_chat.foo' literal.
  2. Add it to the public barrel in index.ts if it lives in a new group.
  3. Use IOTelService.startActiveSpan (preferred) or startSpan — never BasicTracerProvider / getTracer directly.
  4. Pass the value through truncateForOTel (mandatory for any free-form content attribute — prevents OTLP batch failures). Decide whether the attribute should be always-emitted (debug-panel-essential, e.g. tool args, hook input/output) or gated on config.captureContent (large prompt/response bodies, system instructions); follow the existing convention for similar data.
  5. If the new operation should reach OTLP, add its op-name to EXPORTABLE_OPERATION_NAMES in otelServiceImpl.ts.
  6. Document the new attribute in agent_monitoring.md (under the relevant span table) and add a test in src/platform/otel/common/test/.

When adding a new metric / event

  1. Add the helper to genAiMetrics.ts or genAiEvents.ts (mirror existing static / functional patterns).
  2. Re-export it from index.ts.
  3. Add the metric/event row to agent_monitoring.md ("Metrics" / "Events" sections) with all attributes documented.
  4. Add a unit test in src/platform/otel/common/test/genAiMetrics.spec.ts or genAiEvents.spec.ts (assert the exact name + attribute keys).

When instrumenting a new agent surface

  1. Pick a strategy: direct spans (foreground-style), bridge processor (CLI-style), or message-stream synthesis (Claude-style).
  2. Add the new emit site to the Instrumentation Points table in agent_monitoring_arch.md and the Span Hierarchies diagrams.
  3. If you forward OTel env vars to a child process, do it via a new derive*OTelEnv helper in agentOTelEnv.ts and add a row to the Agent-Specific Env Var Translation table.
  4. Wire trace propagation explicitly with storeTraceContext / parentTraceContext for any subagent or async boundary; do not rely on global active context across processes.

When changing the Copilot CLI bridge

The bridge (copilotCliBridgeSpanProcessor.ts) reaches into _delegate._activeSpanProcessor._spanProcessors — internal OTel SDK v2 state. This is documented as a known risk. If you touch it:

  • Keep the runtime guard that degrades gracefully if the internal shape changes.
  • Update the ⚠ SDK Internal Access Warning block in agent_monitoring_arch.md if the access pattern changes.
  • Add a unit test in copilotCliBridgeSpanProcessor.spec.ts.

8. Validation

Before sending a PR that touches OTel code:

# From extensions/copilot/
npx tsc --noEmit --project tsconfig.json

# OTel + Bridge unit tests
npm test -- --grep "OTel\|Bridge"

Manual sanity checks:

  • The Aspire Dashboard quick-start in agent_monitoring.md still works end-to-end (one agent message → invoke_agent + chat + execute_tool spans visible at http://localhost:18888).
  • The Agent Debug Log panel in VS Code still shows the full span tree for foreground, Copilot CLI, and Claude sessions.

9. Known Risks & Limitations

These are documented in agent_monitoring_arch.md — preserve them:

  • SDK _spanProcessors internal access (graceful runtime guard).
  • Two TracerProviders in the same process when CLI SDK is active.
  • process.env mutation for the CLI SDK (only OTel-specific vars, set before LocalSessionManager ctor).
  • Single captureContent flag for the CLI SDK applies to both debug panel and OTLP — document any user-visible change clearly.
  • Claude SDK has no file exporter, and the CLI runtime only supports otlp-http.

10. Anti-Patterns to Reject

  • ❌ Importing @opentelemetry/api (or any @opentelemetry/* package) from anywhere other than node/otelServiceImpl.ts, fileExporters.ts, or the CLI bridge processor type imports.
  • ❌ Hard-coded attribute keys: 'copilot_chat.hook_type' instead of CopilotChatAttr.HOOK_TYPE.
  • ❌ Hard-coded provider strings: 'github' / 'anthropic' / 'gemini' instead of GenAiProviderName.*.
  • ❌ Magic SpanStatusCode numbers (code: 1, code: 2) — use the enum.
  • ❌ Emitting any free-form content attribute without passing it through truncateForOTel — OTLP batches will silently drop or fail.
  • ❌ Logging full prompt / response / system-instruction bodies without config.captureContent gating (these are pattern 2 above).
  • ❌ Adding a span operation name without deciding whether it's exportable (EXPORTABLE_OPERATION_NAMES).
  • ❌ Updating instrumentation without updating agent_monitoring.md / agent_monitoring_arch.md in the same change.

Mehr Skills von microsoft

oss-growth
microsoft
OSS-Wachstums-Hacker-Persona
official
microsoft-foundry
microsoft
Foundry-Agenten end-to-end bereitstellen, evaluieren und verwalten: Docker-Build, ACR-Push, gehostete/Prompt-Agenten erstellen, Container starten, Batch-Evaluierung, kontinuierliche Evaluierung, Prompt-Optimizer-Workflows, agent.yaml, Datensatzkuration aus Traces. VERWENDUNG FÜR: Agent in Foundry bereitstellen, gehosteten Agenten, Agenten erstellen, Agenten aufrufen, Agenten evaluieren, Batch-Evaluierung ausführen, kontinuierliche Evaluierung, kontinuierliches Monitoring, Status der kontinuierlichen Evaluierung, Prompt optimieren, Prompt verbessern, Prompt-Optimizer, Agentenanweisungen optimieren, Agenten verbessern...
officialdevelopmentdevops
azure-ai
microsoft
Verwendung für Azure AI: Suche, Sprache, OpenAI, Dokumentenintelligenz. Hilft bei Suche, Vektor-/Hybridsuche, Sprach-zu-Text, Text-zu-Sprache, Transkription, OCR. WANN: KI-Suche, Abfragesuche, Vektorsuche, Hybridsuche, semantische Suche, Sprach-zu-Text, Text-zu-Sprache, Transkribieren, OCR, Text in Sprache umwandeln.
officialdevelopmentapi
azure-deploy
microsoft
Führen Sie Azure-Bereitstellungen für BEREITS VORBEREITETE Anwendungen aus, die vorhandene .azure/deployment-plan.md- und Infrastrukturdateien haben. Verwenden Sie diese Fähigkeit NICHT, wenn der Benutzer darum bittet, eine neue Anwendung zu ERSTELLEN – verwenden Sie stattdessen azure-prepare. Diese Fähigkeit führt azd up, azd deploy, terraform apply und az deployment-Befehle mit integrierter Fehlerbehebung aus. Erfordert .azure/deployment-plan.md von azure-prepare und validierten Status von azure-validate. WANN: "run azd up", "run azd deploy", "execute deployment",...
officialdevopsaws
azure-storage
microsoft
Azure Storage-Dienste, darunter Blob Storage, Dateifreigaben, Queue Storage, Table Storage und Data Lake. Beantwortet Fragen zu Speicherzugriffsebenen (heiß, kühl, kalt, Archiv), wann welche Ebene verwendet werden sollte, und zum Vergleich der Ebenen. Bietet Objektspeicher, SMB-Dateifreigaben, asynchrone Nachrichtenübermittlung, NoSQL-Schlüssel-Wert und Big-Data-Analysen. Beinhaltet Lebenszyklusverwaltung. VERWENDUNG FÜR: Blob-Speicher, Dateifreigaben, Queue-Speicher, Table-Speicher, Data Lake, Dateien hochladen, Blobs herunterladen, Speicherkonten, Zugriffsebenen,...
officialdevelopmentdatabase
azure-diagnostics
microsoft
Debuggen von Azure-Produktionsproblemen mit AppLens, Azure Monitor, Ressourcenintegrität und sicherer Triage. WANN: Debuggen von Produktionsproblemen, Fehlerbehebung bei App Service, hohe CPU-Auslastung im App Service, Fehler bei der App Service-Bereitstellung, Fehlerbehebung bei Container-Apps, Fehlerbehebung bei Functions, Fehlerbehebung bei AKS, kubectl kann keine Verbindung herstellen, kube-system/CoreDNS-Fehler, ausstehende Pods, Crashloop, Knoten nicht bereit, Upgrade-Fehler, Analyse von Protokollen, KQL, Einblicke, Fehler beim Image-Pull, Probleme mit Kaltstarts, Fehler bei Integritätsprüfungen,...
officialdevopsdevelopment
azure-prepare
microsoft
Bereiten Sie Azure-Apps für die Bereitstellung vor (Infra Bicep/Terraform, azure.yaml, Dockerfiles). Verwenden Sie für Erstellen/Modernisieren oder Erstellen+Bereitstellen; nicht für Cross-Cloud-Migration (verwenden Sie azure-cloud-migrate). NICHT VERWENDEN FÜR: Copilot-SDK-Apps (verwenden Sie azure-hosted-copilot-sdk). WANN: "App erstellen", "Web-App erstellen", "API erstellen", "serverlose HTTP-API erstellen", "Frontend erstellen", "Backend erstellen", "Dienst erstellen", "Anwendung modernisieren", "Anwendung aktualisieren", "Authentifizierung hinzufügen", "Caching hinzufügen", "auf Azure hosten", "erstellen und...
officialdevelopmentdevops
azure-validate
microsoft
Vor der Bereitstellung durchgeführte Validierung der Azure-Bereitschaft. Führen Sie umfassende Prüfungen der Konfiguration, Infrastruktur (Bicep oder Terraform), RBAC-Rollenzuweisungen, verwalteten Identitätsberechtigungen und Voraussetzungen durch, bevor Sie bereitstellen. WANN: meine App validieren, Bereitstellungsbereitschaft prüfen, Preflight-Prüfungen durchführen, Konfiguration verifizieren, prüfen, ob bereit zur Bereitstellung, azure.yaml validieren, Bicep validieren, vor der Bereitstellung testen, Bereitstellungsfehler beheben, Azure Functions validieren, Funktionen-App validieren, serverlos validieren...
officialdevopstesting