google-agents-cli-observability

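Author: google

This skill should be used when the user wants to "set up tracing", "monitor my ADK agent", "configure logging", "add observability", "debug production traffic", or needs guidance on monitoring deployed ADK (Agent Development Kit) agents. Covers Cloud Trace, prompt-response logging, BigQuery Agent Analytics, third-party integrations (AgentOps, Phoenix, MLflow, etc.), and troubleshooting. Part of the Google ADK (Agent Development Kit) skills suite. Do NOT use for deployment setup (use...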

npx skills add https://github.com/google/agents-cli --skill google-agents-cli-observability

ADK Observability Guide

Cloud Trace works out of the box — no infrastructure needed. Prompt-response logging and BigQuery Agent Analytics require Terraform-provisioned infrastructure (service account, GCS bucket, BigQuery dataset). Run agents-cli infra single-project --project PROJECT_ID to provision these resources. See references/cloud-trace-and-logging.md for details, env vars, and verification commands. If your project isn't scaffolded yet, see /google-agents-cli-scaffold first.

Order of operations for agent_runtime deployments

For deployment_target = agent_runtime, run agents-cli infra single-project before the first agents-cli deploy. The Terraform module owns the entire Reasoning Engine resource (display_name, service account, deployment spec, env vars), so applying it after an SDK-based deploy creates a state mismatch: Terraform has no record of the SDK-deployed instance and cannot layer env vars onto it without taking ownership of the whole resource.

If you have already run agents-cli deploy, you have two options:

  1. Switch to Terraform-managed. Delete the SDK-deployed Reasoning Engine, then run agents-cli infra single-project followed by agents-cli deploy. Sessions and any in-flight state on the previous instance are lost.
  2. Keep the SDK-deployed instance. Skip infra single-project and set the observability env vars on the running instance directly via the vertexai client update API. You will also need to grant the instance's service account the IAM permissions required to emit telemetry — writing to the logs GCS bucket, BigQuery dataset access, log writer, etc. See deployment/terraform/single-project/iam.tf and telemetry.tf in your scaffolded project for the full set of bindings the Terraform module would otherwise provision. Terraform-managed env vars are not available in this mode.

Reference Files

| File | Contents |
| --- | --- |
| references/cloud-trace-and-logging.md | Scaffolded project details: Terraform-provisioned resources, environment variables, verification commands, enabling/disabling locally |
| references/bigquery-agent-analytics.md | BQ Agent Analytics plugin: enabling, key features, GCS offloading, tool provenance |

Observability Tiers

Choose the right level of observability based on your needs:

| Tier | What It Does | Scope | Default State | Best For |
| --- | --- | --- | --- | --- |
| Cloud Trace | Distributed tracing: execution flow, latency, errors via OpenTelemetry spans | All templates, all environments | Always enabled | Debugging latency, understanding agent execution flow |
| Prompt-Response Logging | GenAI interactions exported to GCS, BigQuery, and Cloud Logging | ADK agents only | Disabled locally, enabled when deployed | Auditing LLM interactions, compliance |
| BigQuery Agent Analytics | Structured agent events (LLM calls, tool use, outcomes) to BigQuery | ADK agents with plugin enabled | Opt-in (--bq-analytics at scaffold time) | Conversational analytics, custom dashboards, LLM-as-judge evals |
| Third-Party Integrations | External observability platforms (AgentOps, Phoenix, MLflow, etc.) | Any ADK agent | Opt-in, per-provider setup | Team collaboration, specialized visualization, prompt management |

Ask the user which tier(s) they need — they can be combined. Cloud Trace is always on; the others are additive.


Cloud Trace

ADK uses OpenTelemetry to emit distributed traces. Every agent invocation produces spans that track the full execution flow.

Span Hierarchy

invocation
  └── agent_run (one per agent in the chain)
        ├── call_llm (model request/response)
        └── execute_tool (tool execution)
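
When reading a trace, it can help to rebuild this hierarchy from a flat list of spans and total the time per span name. The following is a minimal, standard-library sketch of that idea. The tuple layout, span ids, and durations are hypothetical sample data, not the format Cloud Trace or ADK exports; only the span names come from the hierarchy above.

```python
from collections import defaultdict

# Hypothetical flattened trace: (span_id, parent_id, name, duration_ms).
# Span names follow the hierarchy above; ids and durations are made up.
SPANS = [
    ("s1", None, "invocation", 1200),
    ("s2", "s1", "agent_run", 1150),
    ("s3", "s2", "call_llm", 700),
    ("s4", "s2", "execute_tool", 300),
    ("s5", "s2", "call_llm", 100),
]


def render_tree(spans):
    """Return the span hierarchy as indented text lines, parents before children."""
    children = defaultdict(list)
    for span_id, parent_id, name, duration_ms in spans:
        children[parent_id].append((span_id, name, duration_ms))

    lines = []

    def walk(parent_id, depth):
        # Visit each child of parent_id, then recurse into its own children.
        for span_id, name, duration_ms in children[parent_id]:
            lines.append(f"{'  ' * depth}{name} ({duration_ms} ms)")
            walk(span_id, depth + 1)

    walk(None, 0)
    return lines


def total_by_name(spans):
    """Sum durations per span name, e.g. to compare LLM time with tool time."""
    totals = defaultdict(int)
    for _, _, name, duration_ms in spans:
        totals[name] += duration_ms
    return dict(totals)


if __name__ == "__main__":
    print("\n".join(render_tree(SPANS)))
    print(total_by_name(SPANS))
```

With the sample data, the totals show more time spent in call_llm than in execute_tool, which is the kind of comparison the trace explorer makes visually.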

Setup by Deployment Type

| Deployment | Setup |
| --- | --- |
| Agent Runtime | Automatic: traces are exported to Cloud Trace by default |
| Cloud Run (scaffolded) | Automatic: otel_to_cloud=True in the FastAPI app |
| GKE (scaffolded) | Automatic: otel_to_cloud=True in the FastAPI app |
| Cloud Run / GKE (manual) | Configure OpenTelemetry exporter in your app |
| Local dev | Works with agents-cli playground; traces visible in Cloud Console |

View traces: Cloud Console → Trace → Trace explorer

For detailed setup instructions (Agent Runtime CLI/SDK, Cloud Run, custom deployments), fetch https://adk.dev/integrations/cloud-trace/index.md.


Prompt-Response Logging

Captures GenAI interactions (model name, tokens, timing) and exports to GCS (JSONL) and BigQuery (via direct log sinks and external tables). Privacy-preserving by default — only metadata is logged unless explicitly configured otherwise.

Key env var: OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT — set to NO_CONTENT (metadata only, default in deployed envs), true (full content), or false (disabled). Logging is disabled locally unless LOGS_BUCKET_NAME is set.
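
The values described above can be summarized as a small lookup. The sketch below is illustrative only: describe_capture_mode is a hypothetical helper that mirrors this guide's description, not the logic inside ADK or the scaffolded app. Treating a missing LOGS_BUCKET_NAME as "disabled" follows the local behavior stated here and is an assumption for any other environment.

```python
import os

CAPTURE_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def describe_capture_mode(env=None):
    """Describe the prompt-response logging mode implied by the environment.

    Hypothetical helper: it restates the documented values, nothing more.
    """
    env = os.environ if env is None else env

    # Per this guide, logging is off locally unless a logs bucket is configured.
    if not env.get("LOGS_BUCKET_NAME"):
        return "disabled: LOGS_BUCKET_NAME is not set"

    raw = env.get(CAPTURE_VAR)
    if raw is None:
        return "unknown: variable is not set"

    value = raw.strip()
    if value == "NO_CONTENT":
        return "metadata only"
    if value.lower() == "true":
        return "full content"
    if value.lower() == "false":
        return "disabled"
    return f"unknown: unrecognized value {raw!r}"


if __name__ == "__main__":
    print(describe_capture_mode())
```

Passing a plain dict instead of reading os.environ makes it easy to try different combinations without changing the shell environment.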

For scaffolded project details (Terraform resources, env vars, privacy modes, enabling/disabling, verification commands), see references/cloud-trace-and-logging.md.

For ADK logging docs (log levels, configuration, debugging), fetch https://adk.dev/observability/logging/index.md.


BigQuery Agent Analytics Plugin

Optional plugin that logs structured agent events to BigQuery. Enable with --bq-analytics at scaffold time. See references/bigquery-agent-analytics.md for details.


Third-Party Integrations

ADK supports several third-party observability platforms. Each uses OpenTelemetry or custom instrumentation to capture agent behavior.

| Platform | Key Differentiator | Setup Complexity | Self-Hosted Option |
| --- | --- | --- | --- |
| AgentOps | Session replays, 2-line setup, replaces native telemetry | Minimal | No (SaaS) |
| Arize AX | Commercial platform, production monitoring, evaluation dashboards | Low | No (SaaS) |
| Phoenix | Open-source, custom evaluators, experiment testing | Low | Yes |
| MLflow | OTel traces to MLflow Tracking Server, span tree visualization | Medium (needs SQL backend) | Yes |
| Monocle | 1-call setup, VS Code Gantt chart visualizer | Minimal | Yes (local files) |
| Weave | W&B platform, team collaboration, timeline views | Low | No (SaaS) |
| Freeplay | Prompt management + evals + observability in one platform | Low | No (SaaS) |

Ask the user which platform they prefer — present the trade-offs and let them choose. For setup details, fetch the relevant ADK docs page from the Deep Dive table below.


Troubleshooting

| Issue | Solution |
| --- | --- |
| No traces in Cloud Trace | Verify otel_to_cloud=True in the FastAPI app; check that the service account has the cloudtrace.agent role |
| Prompt-response data not appearing | Check that LOGS_BUCKET_NAME is set; verify the SA has storage.objectCreator on the bucket; check app logs for telemetry setup warnings |
| Privacy mode misconfigured | Check the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT value: use NO_CONTENT for metadata-only, false to disable |
| BigQuery Analytics not logging | Verify the plugin is configured in app/agent.py; check that the BQ_ANALYTICS_DATASET_ID env var is set |
| Third-party integration not capturing spans | Check provider-specific env vars (API keys, endpoints); some providers (AgentOps) replace native telemetry |
| Traces missing tool spans | Tool execution spans appear under execute_tool; check trace explorer filters |
| High telemetry costs | Switch to NO_CONTENT mode; reduce BigQuery retention; disable unused tiers |
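
Several of the checks above come down to environment variables, so they can be run as a quick preflight before digging into IAM or logs. This is a rough sketch under stated assumptions: preflight and its expect_bq_analytics parameter are hypothetical names, the env var names are the ones listed in this guide, and IAM role checks are out of scope because they need Google Cloud APIs.

```python
import os

CAPTURE_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def preflight(env=None, expect_bq_analytics=False):
    """Return human-readable findings for the env-var checks in the table.

    Hypothetical helper; an empty list means no env-var problem was found.
    """
    env = os.environ if env is None else env
    findings = []

    # Row: "Prompt-response data not appearing"
    if not env.get("LOGS_BUCKET_NAME"):
        findings.append("LOGS_BUCKET_NAME is not set; prompt-response data may not appear")

    # Rows: "Privacy mode misconfigured" and "High telemetry costs"
    capture = env.get(CAPTURE_VAR)
    if capture is None:
        findings.append(f"{CAPTURE_VAR} is not set")
    elif capture not in ("NO_CONTENT", "true", "false"):
        findings.append(f"{CAPTURE_VAR} has unexpected value {capture!r}")
    elif capture == "true":
        findings.append(f"{CAPTURE_VAR}=true logs full content; NO_CONTENT is the metadata-only mode")

    # Row: "BigQuery Analytics not logging" (only relevant if the plugin is in use)
    if expect_bq_analytics and not env.get("BQ_ANALYTICS_DATASET_ID"):
        findings.append("BQ_ANALYTICS_DATASET_ID is not set; BigQuery Agent Analytics may not log")

    return findings


if __name__ == "__main__":
    for finding in preflight(expect_bq_analytics=True):
        print("-", finding)
```

An empty result does not mean telemetry is healthy; it only rules out the env-var rows, leaving the IAM and provider-specific rows to check by hand.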

Deep Dive: ADK Docs (WebFetch URLs)

For detailed documentation beyond what this skill covers, fetch these pages:

| Topic | URL |
| --- | --- |
| Observability overview | https://adk.dev/observability/index.md |
| Agent activity logging | https://adk.dev/observability/logging/index.md |
| Cloud Trace integration | https://adk.dev/integrations/cloud-trace/index.md |
| BigQuery Agent Analytics | https://adk.dev/integrations/bigquery-agent-analytics/index.md |
| AgentOps | https://adk.dev/integrations/agentops/index.md |
| Arize AX | https://adk.dev/integrations/arize-ax/index.md |
| Phoenix (Arize) | https://adk.dev/integrations/phoenix/index.md |
| MLflow tracing | https://adk.dev/integrations/mlflow-tracing/index.md |
| Monocle | https://adk.dev/integrations/monocle/index.md |
| W&B Weave | https://adk.dev/integrations/weave/index.md |
| Freeplay | https://adk.dev/integrations/freeplay/index.md |

Related Skills

  • /google-agents-cli-deploy — Deployment targets, CI/CD pipelines, and production workflows
  • /google-agents-cli-workflow — Development workflow, coding guidelines, and operational rules
  • /google-agents-cli-adk-code — ADK Python API quick reference for writing agent code
