spark-consumption-cli

npx skills add https://github.com/microsoft/skills-for-fabric --skill spark-consumption-cli

Update Check: ONCE PER SESSION (mandatory)

The first time this skill is used in a session, run the check-updates skill before proceeding.

  • GitHub Copilot CLI / VS Code: invoke the check-updates skill.
  • Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.
  • Skip if the check was already performed earlier in this session.
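
For clients that compare versions by hand, the comparison can be sketched as below. This is a minimal sketch, not the check-updates skill itself: the helper names `package_version` and `version_is_older` are hypothetical, fetching the remote package.json is left out, and `sort -V` assumes GNU coreutils or a compatible sort.

```shell
# Hypothetical helpers; not part of the check-updates skill.

# Print the first "version" value found in a package.json file ($1).
package_version() {
    sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed (exit 0) when version $1 sorts strictly before version $2.
version_is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage (both paths are placeholders):
# localVersion=$(package_version ./package.json)
# remoteVersion=$(package_version /tmp/remote-package.json)
# if version_is_older "$localVersion" "$remoteVersion"; then
#     echo "Update available: $localVersion -> $remoteVersion"
# fi
```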

CRITICAL NOTES

  1. To find workspace details (including its ID) from a workspace name: list all workspaces, then filter the result with JMESPath.
  2. To find item details (including its ID) from a workspace ID, item type, and item name: list all items of that type in that workspace, then filter the result with JMESPath.
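
The two lookups above can be sketched as small shell functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, `$FABRIC_RESOURCE_SCOPE` and `$FABRIC_API_URL` are assumed to be set as described under Environment Setup, display names containing a single quote are not handled, and only the first page of results is searched (see the Pagination references for larger tenants).

```shell
# Hypothetical helpers; the endpoints mirror the az rest calls used in this skill.

# Build a JMESPath query returning the id of the first entry
# whose displayName equals $1.
id_by_name_query() {
    printf "value[?displayName=='%s'] | [0].id" "$1"
}

# Resolve a workspace ID from its display name ($1).
find_workspace_id() {
    az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
        --url "$FABRIC_API_URL/workspaces" \
        --query "$(id_by_name_query "$1")" --output tsv
}

# Resolve an item ID from workspace ID ($1), item type ($2), and item name ($3).
find_item_id() {
    az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
        --url "$FABRIC_API_URL/workspaces/$1/items?type=$2" \
        --query "$(id_by_name_query "$3")" --output tsv
}

# Usage (names are placeholders):
# workspaceId=$(find_workspace_id "My Workspace")
# lakehouseId=$(find_item_id "$workspaceId" Lakehouse "my_lakehouse")
```

The `| [0]` pipe matters: without it, the index would be applied to each matching object rather than to the filtered list.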

Data Engineering Consumption — CLI Skill

Table of Contents

| Task | Reference | Notes |
|---|---|---|
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts | |
| Environment URLs | COMMON-CORE.md § Environment URLs | |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs | |
| Pagination | COMMON-CORE.md § Pagination | |
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) | |
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling | |
| OneLake Data Access | COMMON-CORE.md § OneLake Data Access | Requires storage.azure.com token, not Fabric token |
| Job Execution | COMMON-CORE.md § Job Execution | |
| Capacity Management | COMMON-CORE.md § Capacity Management | |
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting | |
| Best Practices | COMMON-CORE.md § Best Practices | |
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale | |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory: READ link first [needed for finding workspace id by its name or item id by its name, item type, and workspace id] |
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | az login flows and token acquisition |
| Fabric Control-Plane API via az rest | COMMON-CLI.md § Fabric Control-Plane API via az rest | Always pass --resource https://api.fabric.microsoft.com or az rest fails |
| Pagination Pattern | COMMON-CLI.md § Pagination Pattern | |
| Long-Running Operations (LRO) Pattern | COMMON-CLI.md § Long-Running Operations (LRO) Pattern | |
| OneLake Data Access via curl | COMMON-CLI.md § OneLake Data Access via curl | Use curl not az rest (different token audience) |
| SQL / TDS Data-Plane Access | COMMON-CLI.md § SQL / TDS Data-Plane Access | sqlcmd (Go) connect, query, CSV export |
| Job Execution (CLI) | COMMON-CLI.md § Job Execution | |
| OneLake Shortcuts | COMMON-CLI.md § OneLake Shortcuts | |
| Capacity Management (CLI) | COMMON-CLI.md § Capacity Management | |
| Composite Recipes | COMMON-CLI.md § Composite Recipes | |
| Gotchas & Troubleshooting (CLI-Specific) | COMMON-CLI.md § Gotchas & Troubleshooting (CLI-Specific) | az rest audience, shell escaping, token expiry |
| Quick Reference: az rest Template | COMMON-CLI.md § Quick Reference: az rest Template | |
| Quick Reference: Token Audience / CLI Tool Matrix | COMMON-CLI.md § Quick Reference: Token Audience ↔ CLI Tool Matrix | Which --resource + tool for each service |
| Relationship to SPARK-AUTHORING-CORE.md | SPARK-CONSUMPTION-CORE.md § Relationship to SPARK-AUTHORING-CORE.md | |
| Data Engineering Consumption Capability Matrix | SPARK-CONSUMPTION-CORE.md § Data Engineering Consumption Capability Matrix | |
| OneLake Table APIs (Schema-enabled Lakehouses) | SPARK-CONSUMPTION-CORE.md § OneLake Table APIs (Schema-enabled Lakehouses) | Unity Catalog-compatible metadata; requires storage.azure.com token |
| Lakehouse Livy Session Management | SPARK-CONSUMPTION-CORE.md § Livy Session Management | Lakehouse Livy API: session creation, states, lifecycle, termination |
| Interactive Data Exploration | SPARK-CONSUMPTION-CORE.md § Interactive Data Exploration | Statement execution, output retrieval, data discovery |
| PySpark Analytics Patterns | SPARK-CONSUMPTION-CORE.md § PySpark Analytics Patterns | Cross-lakehouse 3-part naming, performance optimization |
| Must/Prefer/Avoid | SKILL.md § Must/Prefer/Avoid | MUST DO / AVOID / PREFER checklists |
| Quick Start | SKILL.md § Quick Start | CLI-specific Lakehouse Livy session setup and data exploration |
| Key Fabric Patterns | SKILL.md § Key Fabric Patterns | Spark pattern quick-reference table |
| Session Cleanup | SKILL.md § Session Cleanup | Clean up idle Lakehouse Livy sessions via CLI |

Must/Prefer/Avoid

MUST DO

  • Check for existing idle sessions before creating new ones
  • Use dynamic workspace/lakehouse discovery
  • Follow API patterns from COMMON-CLI.md

PREFER

  • sqldw-consumption-cli for simple lakehouse queries — row counts, SELECT, schema exploration, filtering, and aggregation on lakehouse Delta tables should use the SQL Endpoint via sqlcmd, not Spark. Only use this skill when the user explicitly requests PySpark, DataFrames, or Spark-specific features.
  • SQL Endpoint for Delta tables
  • Livy for unstructured/JSON data or complex Python analytics
  • Session reuse over creation

AVOID

  • Hardcoded workspace IDs
  • Creating unnecessary sessions
  • Large result sets without LIMIT
  • Confusing Lakehouse Livy sessions with Notebook Spark sessions — This skill covers Lakehouse Livy sessions (the public Livy API at /lakehouses/{lhId}/livyapi/.../sessions). Notebook Spark sessions are created internally when running a notebook via the Jobs API (RunNotebook) and are NOT managed through the Livy API. To run a notebook as a job, see SPARK-AUTHORING-CORE.md § Notebook Execution & Job Management
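
For the "large result sets" item above, one way to keep output bounded is to put an explicit LIMIT in the code sent to the session. The sketch below builds a statement body without hand-escaping the quotes. It is a minimal sketch: the helper names are hypothetical, `your_table` is a placeholder, and only backslashes, double quotes, and newlines are escaped.

```shell
# Hypothetical helpers for building a Livy statement body.

# Escape backslashes, double quotes, and newlines for use inside a JSON string.
# Tabs and other control characters are not handled in this sketch.
json_escape() {
    printf '%s' "$1" |
        sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
        awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Print a pyspark statement body for the given code ($1).
livy_statement_body() {
    printf '{"code":"%s","kind":"pyspark"}\n' "$(json_escape "$1")"
}

# Usage: preview at most 20 rows instead of the whole table.
# livy_statement_body 'spark.sql("SELECT * FROM your_table LIMIT 20").show()' > /tmp/body.json
```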

Quick Start

Environment Setup

Apply environment detection from COMMON-CORE.md Environment Detection Pattern to set:

  • $FABRIC_API_BASE and $FABRIC_RESOURCE_SCOPE
  • $FABRIC_API_URL and $LIVY_API_PATH for Livy operations

Authentication: Use token acquisition from COMMON-CLI.md Environment Detection and API Configuration
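
Because every later call depends on these variables, a small guard can catch a missing one before any request is sent. This is a minimal sketch, assuming the four variable names listed above; the function name `require_fabric_env` is hypothetical, and the values themselves must still come from the environment detection pattern in COMMON-CORE.md.

```shell
# Hypothetical guard: fail early when a variable this skill relies on is unset or empty.
require_fabric_env() {
    local missing=0 name value
    for name in FABRIC_API_BASE FABRIC_RESOURCE_SCOPE FABRIC_API_URL LIVY_API_PATH; do
        # Indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# require_fabric_env || exit 1
```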

Workspace & Item Discovery

Preferred: Use the item discovery patterns in COMMON-CLI.md § Finding Workspaces and Items in Fabric to find workspaces and items by name.

Fallback (list workspaces and lakehouses, then enter the IDs interactively):

# List workspaces, then enter the ID of the one to use
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces" --query "value[].{name:displayName, id:id}" --output table
read -r -p "Workspace ID: " workspaceId

# List lakehouses in that workspace, then enter the ID of the one to use
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/items?type=Lakehouse" --query "value[].{name:displayName, id:id}" --output table
read -r -p "Lakehouse ID: " lakehouseId

Lakehouse Livy Session Management

Two types of Spark sessions in Fabric — This skill manages Lakehouse Livy sessions, created via the public Livy API endpoint (/lakehouses/{lhId}/livyapi/.../sessions). These are ad-hoc interactive sessions for remote clients. Notebook Spark sessions are a separate mechanism — they are created internally when a Fabric Notebook is executed (via portal or Jobs API RunNotebook), and are managed through the notebook lifecycle, not the Livy API.

# Check for existing idle Lakehouse Livy session (avoid resource waste)
# The pipe ends the filter projection so that [0] picks the first idle session
sessionId=$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" --query "sessions[?state=='idle'] | [0].id" --output tsv)

# Create if none available - FORCE STARTER POOL USAGE
if [[ -z "$sessionId" ]]; then
    cat > /tmp/body.json << 'EOF'
{
    "name":"analysis",
    "driverMemory":"56g",
    "driverCores":8,
    "executorMemory":"56g",
    "executorCores":8,
    "conf": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.fabric.pool.name": "Starter Pool"
    }
}
EOF
    sessionId=$(az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" --body @/tmp/body.json --query "id" --output tsv)
    
    echo "⏳ Waiting for starter pool session to be ready..." 
    # With starter pools, the session should become idle within 3-5 seconds
    timeout=30  # seconds to wait before giving up
    while [ $timeout -gt 0 ]; do
        state=$(az rest --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId" --query "state" --output tsv)
        if [[ "$state" == "idle" ]]; then
            echo "✅ Session ready in starter pool!"
            break
        fi
        echo "   Session state: $state (${timeout}s remaining)"
        sleep 3
        timeout=$((timeout - 3))
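    done
    # Report a timeout instead of continuing silently with an unready session
    if [[ "$state" != "idle" ]]; then
        echo "Session $sessionId did not become idle in time (last state: $state)" >&2
    fi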
fi

Data Exploration (Fabric-Specific Patterns)

# Execute statement (LLM knows Python/Spark syntax)
cat > /tmp/body.json << 'EOF'
{
  "code": "spark.sql(\"SHOW TABLES\").show(); df = spark.table(\"your_table\"); df.describe().show()",
  "kind": "pyspark"
}
EOF
az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements" --body @/tmp/body.json
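
The POST above only submits the statement; its output has to be fetched separately once the statement finishes. The sketch below polls for it. It is a minimal sketch under stated assumptions: the function name `wait_for_statement` is hypothetical, the statement states (`available`, `error`, `cancelled`) and the `output.data."text/plain"` field follow Apache Livy conventions and should be checked against SPARK-CONSUMPTION-CORE.md § Interactive Data Exploration, and the session variables are assumed to be set as in the earlier steps.

```shell
# Hypothetical helper: wait for a statement to finish, then print its text output.
# $1 = statement ID, $2 = max polls (default 20), $3 = seconds between polls (default 3)
wait_for_statement() {
    local statementId="$1" tries="${2:-20}" delay="${3:-3}" state
    local url="$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements/$statementId"
    while [ "$tries" -gt 0 ]; do
        state=$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$url" --query "state" --output tsv)
        case "$state" in
            available)
                # Statement finished: print the plain-text output
                az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$url" \
                    --query 'output.data."text/plain"' --output tsv
                return 0 ;;
            error|cancelled)
                echo "Statement $statementId ended in state: $state" >&2
                return 1 ;;
        esac
        sleep "$delay"
        tries=$((tries - 1))
    done
    echo "Timed out waiting for statement $statementId (last state: $state)" >&2
    return 1
}

# Usage: capture the statement ID from the POST response, then wait for it.
# statementId=$(az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" \
#     --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements" \
#     --body @/tmp/body.json --query "id" --output tsv)
# wait_for_statement "$statementId"
```

A statement that reaches `available` may still carry a Python error in its output, so the printed text is worth inspecting before relying on it.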

Key Fabric Patterns

| Pattern | Code | Use Case |
|---|---|---|
| Table Discovery | spark.sql("SHOW TABLES") | List available tables |
| Cross-Lakehouse | spark.sql("SELECT * FROM other_workspace.table") | Query across workspaces |
| Delta Features | spark.sql("DESCRIBE HISTORY your_table"), spark.sql("SELECT * FROM your_table VERSION AS OF 1") | Time travel, versioning |
| Schema Evolution | df.printSchema() | Understand structure |

Lakehouse Livy Session Cleanup

# Clean up idle Lakehouse Livy sessions (optional)
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" --query "sessions[?state=='idle'].id" --output tsv | xargs -I {} az rest --method delete --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/{}"
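
The one-liner above removes every idle session in the lakehouse. Where only the session created by the current script should go, a narrower variant can be sketched as follows. This is a minimal sketch: the function name `delete_livy_session` is hypothetical, and the environment and ID variables are assumed to be set as in the earlier steps.

```shell
# Hypothetical helper: delete one Lakehouse Livy session by ID ($1).
delete_livy_session() {
    if [ -z "$1" ]; then
        echo "No session ID given; nothing to delete" >&2
        return 1
    fi
    az rest --method delete --resource "$FABRIC_RESOURCE_SCOPE" \
        --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$1"
}

# Usage: remove the session this script created when the script exits.
# trap 'delete_livy_session "$sessionId"' EXIT
```

Skip the trap when the session is meant to be reused later, since reuse is preferred over creation.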

Focus: This skill provides Fabric-specific REST API patterns. It assumes the LLM already knows Python/Spark syntax and concentrates on Fabric integration, session management, and API endpoints.
