spark-authoring-cli

Author: microsoft

npx skills add https://github.com/microsoft/skills-for-fabric --skill spark-authoring-cli

Update Check — ONCE PER SESSION (mandatory)

The first time this skill is used in a session, run the check-updates skill before proceeding.

  • GitHub Copilot CLI / VS Code: invoke the check-updates skill.
  • Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.
  • Skip if the check was already performed earlier in this session.

CRITICAL NOTES

  1. To find a workspace's details (including its ID) from its name: list all workspaces, then filter the result with JMESPath.
  2. To find an item's details (including its ID) from the workspace ID, item type, and item name: list all items of that type in that workspace, then filter the result with JMESPath.
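The list-then-filter lookup described above can be sketched offline in Python. This is a minimal illustration, not the skill's own implementation: the response shape (a `value` array whose entries carry `id`, `displayName`, and optionally `type`), the sample IDs, and the helper name `find_id_by_name` are all assumptions. With `az rest`, a roughly equivalent JMESPath filter would be `--query "value[?displayName=='DataEng-Dev'].id | [0]"`.

```python
import json

# Hypothetical sample of a "list workspaces" response; the "value" array with
# "id" and "displayName" fields is an assumption about the response shape.
SAMPLE_RESPONSE = json.loads("""
{"value": [
  {"id": "11111111-1111-1111-1111-111111111111", "displayName": "DataEng-Dev"},
  {"id": "22222222-2222-2222-2222-222222222222", "displayName": "DataEng-Prod"}
]}
""")


def find_id_by_name(response, display_name, item_type=None):
    """Return the ID of the single entry whose displayName matches, else raise."""
    matches = [
        entry for entry in response.get("value", [])
        if entry.get("displayName") == display_name
        and (item_type is None or entry.get("type") == item_type)
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected 1 match for {display_name!r}, found {len(matches)}"
        )
    return matches[0]["id"]


workspace_id = find_id_by_name(SAMPLE_RESPONSE, "DataEng-Dev")
print(workspace_id)
```

Raising on zero or multiple matches, rather than silently taking the first hit, keeps a misspelled or ambiguous name from resolving to the wrong resource.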

Spark Authoring — CLI Skill

This skill covers two complementary areas: (1) managing Fabric Spark artifacts via REST APIs (workspaces, lakehouses, notebooks, jobs, pipelines) and (2) writing code inside Fabric Notebook cells (PySpark, Scala, SparkR, SQL with correct lakehouse access, notebookutils, and Spark configuration). For notebook code authoring fundamentals and shared modules, MUST see SPARK-NOTEBOOK-AUTHORING-CORE.md.

Table of Contents

| Task | Reference | Notes |
|---|---|---|
| RULES — Read these first, follow them always | SKILL.md § RULES | MUST read — 4 rules for this skill |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory: READ link first [needed for finding workspace id by its name or item id by its name, item type, and workspace id] |
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts | |
| Environment URLs | COMMON-CORE.md § Environment URLs | |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs | |
| Pagination | COMMON-CORE.md § Pagination | |
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) | |
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling | |
| OneLake Data Access | COMMON-CORE.md § OneLake Data Access | Requires storage.azure.com token, not Fabric token |
| Definition Envelope | ITEM-DEFINITIONS-CORE.md § Definition Envelope | Definition payload structure |
| Per-Item-Type Definitions | ITEM-DEFINITIONS-CORE.md § Per-Item-Type Definitions | Support matrix, decoded content, part paths — REST specs, CLI recipes |
| Job Execution | COMMON-CORE.md § Job Execution | |
| Capacity Management | COMMON-CORE.md § Capacity Management | |
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting | |
| Best Practices | COMMON-CORE.md § Best Practices | |
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale | |
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | az login flows and token acquisition |
| Fabric Control-Plane API via az rest | COMMON-CLI.md § Fabric Control-Plane API via az rest | Always pass --resource https://api.fabric.microsoft.com or az rest fails |
| Pagination Pattern | COMMON-CLI.md § Pagination Pattern | |
| Long-Running Operations (LRO) Pattern | COMMON-CLI.md § Long-Running Operations (LRO) Pattern | |
| OneLake Data Access via curl | COMMON-CLI.md § OneLake Data Access via curl | Use curl not az rest (different token audience) |
| SQL / TDS Data-Plane Access | COMMON-CLI.md § SQL / TDS Data-Plane Access | |
| Job Execution (CLI) | COMMON-CLI.md § Job Execution | |
| Job Scheduling | COMMON-CLI.md § Job Scheduling | URL is /jobs/{jobType}/schedules; endDateTime required |
| OneLake Shortcuts | COMMON-CLI.md § OneLake Shortcuts | |
| Capacity Management (CLI) | COMMON-CLI.md § Capacity Management | |
| Composite Recipes | COMMON-CLI.md § Composite Recipes | |
| Gotchas & Troubleshooting (CLI-Specific) | COMMON-CLI.md § Gotchas & Troubleshooting (CLI-Specific) | az rest audience, shell escaping, token expiry |
| Quick Reference: az rest Template | COMMON-CLI.md § Quick Reference: az rest Template | |
| Quick Reference: Token Audience / CLI Tool Matrix | COMMON-CLI.md § Quick Reference: Token Audience ↔ CLI Tool Matrix | Which --resource + tool for each service |
| Relationship to SPARK-CONSUMPTION-CORE.md | SPARK-AUTHORING-CORE.md § Relationship to SPARK-CONSUMPTION-CORE.md | |
| Data Engineering Authoring Capability Matrix | SPARK-AUTHORING-CORE.md § Data Engineering Authoring Capability Matrix | |
| Lakehouse Management | SPARK-AUTHORING-CORE.md § Lakehouse Management | |
| Notebook Management | SPARK-AUTHORING-CORE.md § Notebook Management | |
| Notebook Execution & Job Management | SPARK-AUTHORING-CORE.md § Notebook Execution & Job Management | |
| CI/CD & Automation Patterns | SPARK-AUTHORING-CORE.md § CI/CD & Automation Patterns | |
| Infrastructure-as-Code | SPARK-AUTHORING-CORE.md § Infrastructure-as-Code | |
| Performance Optimization & Resource Management | SPARK-AUTHORING-CORE.md § Performance Optimization & Resource Management | |
| Authoring Gotchas and Troubleshooting | SPARK-AUTHORING-CORE.md § Authoring Gotchas and Troubleshooting | |
| Quick Reference: Authoring Decision Guide | SPARK-AUTHORING-CORE.md § Quick Reference: Authoring Decision Guide | |
| Recommended Patterns (Data Engineering) | data-engineering-patterns.md § Recommended patterns | |
| Data Ingestion Principles | data-engineering-patterns.md § Data Ingestion Principles | |
| Transformation Patterns | data-engineering-patterns.md § Transformation Patterns | |
| Delta Lake Best Practices | data-engineering-patterns.md § Delta Lake Best Practices | |
| Quality Assurance Strategies | data-engineering-patterns.md § Quality Assurance Strategies | |
| Recommended Patterns (Development Workflow) | development-workflow.md § Recommended patterns | |
| Notebook Lifecycle | development-workflow.md § Notebook Lifecycle | |
| Parameterization Patterns | development-workflow.md § Parameterization Patterns | |
| Variable Library (notebook + pipeline usage) | development-workflow.md § Method 4: Variable Library | getLibrary() + dot notation in notebooks; libraryVariables + @pipeline().libraryVariables in pipelines |
| Variable Library Definition | ITEM-DEFINITIONS-CORE.md § VariableLibrary | Definition parts, decoded content, types, pipeline mappings, gotchas |
| Local Testing Strategy | development-workflow.md § Local Testing Strategy | |
| Debugging Patterns | development-workflow.md § Debugging Patterns | |
| Recommended Patterns (Infrastructure) | infrastructure-orchestration.md § Recommended patterns | |
| Materialized Lake View patterns | materialized-lake-view-patterns.md § Recommended patterns | Spark Lakehouse authoring guidance for MLV design (when to use MLVs, layering patterns) |
| MLV incremental refresh patterns | mlv-incremental-refresh-patterns.md § IR-friendly syntax guide | Use for refresh-readiness review and safe non-breaking rewrites |
| MLV schedule & job management | mlv-operations-cli | Route here when user asks to schedule, trigger, monitor, or cancel MLV refreshes (not authoring) |
| Workspace Provisioning Principles | infrastructure-orchestration.md § Workspace Provisioning Principles | |
| Lakehouse Configuration Guidance | infrastructure-orchestration.md § Lakehouse Configuration Guidance | |
| Pipeline Design Patterns | infrastructure-orchestration.md § Pipeline Design Patterns | |
| CI/CD Integration Strategy | infrastructure-orchestration.md § CI/CD Integration Strategy | |
| Notebook API — Which Endpoint to Use | notebook-api-operations.md § Quick Decision | Start here for remote notebook edits — getDefinition vs updateDefinition |
| Notebook Modification Workflow | notebook-api-operations.md § Workflow | Five-step flow: retrieve, decode, modify, encode, upload |
| Notebook API Error Reference | notebook-api-operations.md § Error Reference | 411, 400 (updateMetadata), 401, 403 explained |
| Notebook API Gotchas | notebook-api-operations.md § Gotchas | /result suffix, empty body, \n per-line rule, format=ipynb |
| Default Lakehouse Binding | notebook-api-operations.md § Default Lakehouse Binding | .ipynb metadata vs .py # METADATA block; discover IDs dynamically |
| Public URL Data Ingestion | notebook-api-operations.md § Public URL Data Ingestion | Use real source URL, stage into Files/, then read with Spark |
| getDefinition (read notebook content) | notebook-api-operations.md § Step 1 — Retrieve Notebook Content | LRO flow, ?format=ipynb, empty body (--body '{}') requirement |
| Decode Base64 Notebook Payload | notebook-api-operations.md § Step 2 — Decode the Notebook Content | Extract payload, base64 decode, ipynb JSON structure |
| Modify Notebook Cells | notebook-api-operations.md § Step 3 — Modify the Notebook Content | Find cell, insert/replace lines, \n per-line rule |
| updateDefinition (write notebook content) | notebook-api-operations.md § Step 4 — Re-encode and Upload | Re-encode, upload, LRO poll, updateMetadata flag pitfall |
| Verify Notebook Update (Optional) | notebook-api-operations.md § Step 5 — Verify the Update | Skip unless you suspect a silent failure — Succeeded from updateDefinition is sufficient (see Rule 2) |
| Notebook API End-to-End Script | notebook-api-operations.md § Complete End-to-End Script | Full bash: get → decode → modify → encode → update → verify |
| Quick Start Examples | SKILL.md § Quick Start Examples | Minimal examples for common operations |
| — Notebook Code Authoring (shared modules) — | | |
| Notebook Authoring Core | SPARK-NOTEBOOK-AUTHORING-CORE.md | READ FIRST for notebook code tasks — fundamentals, code gen approach, module index |

Must/Prefer/Avoid

MUST DO

  • Check for recent jobs BEFORE creating new notebook runs — Query job instances from the last 5 minutes; if a recent job exists, monitor it instead of creating a duplicate
  • Capture job instance ID immediately after POST — Store job ID before any other operations to enable proper monitoring
  • Verify workspace capacity assignment before operations — Workspace must have capacity assigned and active
  • When user provides a public data URL, follow the Public URL Data Ingestion policy — keep detailed behavior in the linked resource section to avoid drift/duplication
  • Format notebook cells correctly — Each line in cell source array MUST end with \n to prevent code merging
  • Use correct Lakehouse Livy session body format — Send a FLAT JSON with name, driverMemory, driverCores, executorMemory, executorCores. Do NOT wrap in {"payload": ...} or send only {"kind": "pyspark"} — that causes HTTP 500. Use valid memory values (28g, 56g, 112g, 224g). See Create Lakehouse Livy Session example below and SPARK-CONSUMPTION-CORE.md.
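The cell-formatting rule above (every entry in a cell's source array ends with `\n`) can be illustrated with a small helper. This is a sketch under assumptions: the helper name `normalize_cell_source` and the sample lines are hypothetical, and the snippet only manipulates strings rather than calling any Fabric API.

```python
def normalize_cell_source(lines):
    """Return a copy of `lines` in which every entry ends with a newline."""
    return [line if line.endswith("\n") else line + "\n" for line in lines]


# Hypothetical cell source that is missing its trailing newlines.
broken = ["df = spark.read.table('orders')", "df.show()"]
fixed = normalize_cell_source(broken)

# Joining the entries shows what the notebook cell would actually contain.
print(repr("".join(broken)))  # the two statements run together on one line
print(repr("".join(fixed)))   # each statement stays on its own line
```

Because entries that already end with a newline are left alone, the helper is safe to apply repeatedly before re-encoding a notebook.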

PREFER

  • Poll job status with proper intervals — 10-30 seconds between polls; timeout after reasonable duration (e.g., 30 minutes)
  • Check job history when POST response is unreadable — If POST returns "No Content" or unreadable response, query recent jobs (last 1 minute) before retrying
  • Use Starter Pool for development — Development/testing workloads should use useStarterPool: true
  • Use Workspace Pool for production — Production workloads need consistent performance with useWorkspacePool: true
  • Enable lakehouse schemas during creation — Set creationPayload.enableSchemas: true for better table organization
  • Implement idempotency checks — Prevent duplicate operations by checking existing state first
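The polling guidance above (10 to 30 seconds between polls, with an overall timeout) might look like the following sketch. The terminal status names, the default interval, and the `get_status` callable are assumptions for illustration; in practice `get_status` would wrap a GET on the job instance. The sleep and clock functions are injectable so the loop can be exercised without waiting.

```python
import time

# Assumed terminal status names; confirm against the job-instance API you call.
TERMINAL_STATUSES = {"Completed", "Failed", "Cancelled"}


def poll_job(get_status, interval_s=15, timeout_s=30 * 60,
             sleep=time.sleep, clock=time.monotonic):
    """Call get_status() every interval_s seconds until a terminal status or timeout."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Stop before sleeping if the next poll would land past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Demo with a fake status source and no real sleeping.
fake_statuses = iter(["NotStarted", "InProgress", "InProgress", "Completed"])
result = poll_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)
```

Note that the loop only ever reads status; it never resubmits the job, which matches the "never retry POST" guidance elsewhere in this skill.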

AVOID

  • Never retry POST with same parameters — If you have a job ID, only use GET to check status; don't create duplicate job instances
  • Don't skip capacity verification — Operations will fail if workspace capacity is paused or unassigned
  • Avoid immediate POST retries on failures — Check for existing/active jobs first to prevent duplicates
  • Don't create new runs while monitoring an existing job — One job at a time; wait for completion before submitting new runs
  • Don't hardcode workspace/lakehouse IDs — Discover dynamically via item listing or catalog search APIs
  • Do NOT use Lakehouse Livy sessions to run a Fabric notebook — Lakehouse Livy sessions (the public Livy API) are for ad-hoc interactive Spark code execution. To run a notebook as a job, use the Jobs API (RunNotebook) which creates a Notebook Spark session internally. See SPARK-AUTHORING-CORE.md § Notebook Execution & Job Management
  • Do NOT schedule MLV refreshes from notebooks — If the user asks to "schedule MLV refresh", route to mlv-operations-cli which uses the REST API. Notebook-based REFRESH MATERIALIZED LAKE VIEW ... FULL is for one-time manual refresh only, not recurring schedules.

RULES — Read these first, follow them always

Rule 1 — Validate prerequisites before operations. Verify workspace has capacity assigned (see COMMON-CORE.md Create Workspace and Capacity Management) and resource IDs exist before attempting operations.

Rule 2 — Trust updateDefinition success. A Succeeded poll result from updateDefinition is sufficient confirmation that content and lakehouse bindings persisted. Do NOT call getDefinition after every upload — it is an async LRO that adds significant latency. Only use getDefinition for its intended purpose: reading current notebook content before making modifications.
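For the read-before-modify case that Rule 2 reserves getDefinition for, the decode, modify, and re-encode steps can be sketched offline. This is a minimal illustration under assumptions: the part's field names (`path`, `payload`, `payloadType`), the path value, and the helper names are hypothetical stand-ins, and no REST call is made here. See notebook-api-operations.md for the actual workflow.

```python
import base64
import json

# Hypothetical definition part, shaped like one entry of a getDefinition result.
# The field names and the path value are assumptions for illustration.
notebook = {"cells": [{"cell_type": "code", "source": ["print('hello')\n"]}]}
part = {
    "path": "notebook-content.ipynb",
    "payload": base64.b64encode(json.dumps(notebook).encode("utf-8")).decode("ascii"),
    "payloadType": "InlineBase64",
}


def decode_part(part):
    """Decode a Base64 payload into the notebook's JSON structure."""
    return json.loads(base64.b64decode(part["payload"]).decode("utf-8"))


def encode_part(part, content):
    """Return a copy of `part` whose payload is the re-encoded notebook JSON."""
    payload = base64.b64encode(json.dumps(content).encode("utf-8")).decode("ascii")
    return {**part, "payload": payload}


content = decode_part(part)
# Every appended source line ends with \n, per the cell-formatting rule.
content["cells"][0]["source"].append("print('added line')\n")
updated_part = encode_part(part, content)
print(decode_part(updated_part)["cells"][0]["source"])
```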

Rule 3 — Prevent duplicate jobs and monitor execution properly. Before submitting a new notebook run, ALWAYS check for recent job instances first (last 5 minutes). If a recent job exists, monitor it instead of creating a duplicate. After submission, capture the job instance ID immediately and poll its status; never retry the POST. See SPARK-AUTHORING-CORE.md Job Monitoring for patterns.
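The duplicate-job check in Rule 3 can be sketched as a pure function over an already-fetched list of job instances. The field names (`id`, `status`, `startTimeUtc`), the sample data, and the helper name `find_recent_job` are assumptions for illustration; the real response shape should be taken from the job-instance listing you actually call.

```python
from datetime import datetime, timedelta, timezone


def find_recent_job(instances, now, window=timedelta(minutes=5)):
    """Return the newest job instance started within `window` of `now`, else None."""
    recent = []
    for job in instances:
        started = datetime.fromisoformat(job["startTimeUtc"])
        if timedelta(0) <= now - started <= window:
            recent.append((started, job))
    if not recent:
        return None
    return max(recent, key=lambda pair: pair[0])[1]


# Hypothetical job instances, as if parsed from a listing response.
instances = [
    {"id": "job-a", "status": "Completed", "startTimeUtc": "2024-05-01T11:00:00+00:00"},
    {"id": "job-b", "status": "InProgress", "startTimeUtc": "2024-05-01T12:01:00+00:00"},
]
now = datetime(2024, 5, 1, 12, 3, tzinfo=timezone.utc)

recent_job = find_recent_job(instances, now)
if recent_job is not None:
    print(f"monitor existing job {recent_job['id']} instead of submitting a new run")
else:
    print("no recent job found; safe to submit a new run")
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.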

Rule 4 — For notebook code authoring, MUST follow SPARK-NOTEBOOK-AUTHORING-CORE.md. When writing code inside notebook cells, MUST read SPARK-NOTEBOOK-AUTHORING-CORE.md first — it defines the code generation approach, rules, and a Module Index linking to detailed guides (lakehouse paths, connections, context, orchestration, etc.). Use the Spark-specific resources in this skill (data-engineering-patterns.md, development-workflow.md) for Spark-only implementation details. When the task is about Materialized Lake Views, read materialized-lake-view-patterns.md for authoring/design guidance and mlv-incremental-refresh-patterns.md for refresh-readiness analysis.


Quick Start Examples

For detailed patterns, authentication, and comprehensive API usage, see:

  • COMMON-CORE.md — Fabric REST API patterns, authentication, item discovery
  • COMMON-CLI.md — az rest usage, environment detection, token acquisition
  • SPARK-AUTHORING-CORE.md — Notebook deployment, lakehouse creation, job execution

Below are minimal quick-start examples. Always reference the COMMON-* files for production use.

Create Workspace & Lakehouse

# See COMMON-CORE.md Environment URLs and SPARK-AUTHORING-CORE.md for full patterns
cat > /tmp/body.json << 'EOF'
{"displayName": "DataEng-Dev"}
EOF
workspace_id=$(az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces" \
  --body @/tmp/body.json --query "id" --output tsv)

cat > /tmp/body.json << 'EOF'
{"displayName": "DevLakehouse", "type": "Lakehouse", "creationPayload": {"enableSchemas": true}}
EOF
lakehouse_id=$(az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/items" \
  --body @/tmp/body.json --query "id" --output tsv)

Organize Lakehouse Tables with Schemas

# See SPARK-AUTHORING-CORE.md Lakehouse Schema Organization for table organization patterns
# Create schemas for medallion architecture
spark.sql("CREATE SCHEMA IF NOT EXISTS bronze")
spark.sql("CREATE SCHEMA IF NOT EXISTS silver")
spark.sql("CREATE SCHEMA IF NOT EXISTS gold")

Create and Refresh a Materialized Lake View (MLV)

-- See resources/materialized-lake-view-patterns.md for design guidance
-- and resources/mlv-incremental-refresh-patterns.md for refresh-readiness review.

-- Bronze/Silver/Gold schemas in a Lakehouse with schemas enabled
CREATE SCHEMA IF NOT EXISTS bronze;
CREATE SCHEMA IF NOT EXISTS silver;
CREATE SCHEMA IF NOT EXISTS gold;

-- A simple Silver MLV with data quality constraints
--
-- Prerequisite for incremental refresh: enable Change Data Feed (CDF) on every
-- source table the MLV reads from. Without CDF, optimal refresh can only choose
-- between no refresh (sources unchanged) and full refresh — never incremental.
-- See resources/mlv-incremental-refresh-patterns.md.
ALTER TABLE bronze.orders_raw SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

CREATE OR REPLACE MATERIALIZED LAKE VIEW silver.orders_clean
(
    CONSTRAINT valid_order_id CHECK (order_id IS NOT NULL) ON MISMATCH DROP
)
AS
SELECT
  order_id,
  customer_id,
  CAST(order_ts AS TIMESTAMP) AS order_ts,
  amount
FROM bronze.orders_raw;

-- Routine refresh is handled by the lakehouse Materialized lake views → Manage
-- schedule/lineage view; don't orchestrate from notebooks. The SQL form below is
-- documented only for forcing a one-time FULL recompute (troubleshooting / after
-- a correction). There is no documented SQL form for triggering incremental refresh.
REFRESH MATERIALIZED LAKE VIEW silver.orders_clean FULL;

Create Lakehouse Livy Session

# See SPARK-CONSUMPTION-CORE.md for Lakehouse Livy session configuration and management
# IMPORTANT: Body MUST be flat JSON with memory/cores — do NOT wrap in {"payload": ...}
cat > /tmp/body.json << 'EOF'
{"name": "dev-session", "driverMemory": "56g", "driverCores": 8, "executorMemory": "56g", "executorCores": 8, "conf": {"spark.dynamicAllocation.enabled": "true", "spark.fabric.pool.name": "Starter Pool"}}
EOF
az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/lakehouses/$lakehouse_id/livyapi/versions/2023-12-01/sessions" \
  --body @/tmp/body.json

Lakehouse Livy Session Body — Common Mistakes

  • {"payload": {"kind": "pyspark"}} → HTTP 500 (wrong wrapper, missing required fields)
  • {"kind": "pyspark"} → HTTP 500 (missing driverMemory, executorMemory, etc.)
  • ✅ Flat JSON with name, driverMemory, driverCores, executorMemory, executorCores (and optionally conf with Starter Pool)
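The mistakes listed above can be caught before sending the request with a small pre-flight check. This is a sketch, not part of any Fabric SDK: the helper name `check_livy_body` is hypothetical, while the required field names and the allowed memory values (28g, 56g, 112g, 224g) are taken from this skill's own guidance.

```python
REQUIRED_FIELDS = ("name", "driverMemory", "driverCores", "executorMemory", "executorCores")
VALID_MEMORY = {"28g", "56g", "112g", "224g"}


def check_livy_body(body):
    """Return a list of problems found in a Lakehouse Livy session body (empty if none)."""
    problems = []
    if "payload" in body:
        problems.append("body is wrapped in 'payload'; send flat JSON instead")
    for field in REQUIRED_FIELDS:
        if field not in body:
            problems.append(f"missing field: {field}")
    for field in ("driverMemory", "executorMemory"):
        if field in body and body[field] not in VALID_MEMORY:
            problems.append(f"{field}={body[field]!r} is not one of {sorted(VALID_MEMORY)}")
    return problems


good_body = {
    "name": "dev-session",
    "driverMemory": "56g", "driverCores": 8,
    "executorMemory": "56g", "executorCores": 8,
}
bad_body = {"payload": {"kind": "pyspark"}}

print(check_livy_body(good_body))
for problem in check_livy_body(bad_body):
    print(problem)
```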

Spark Performance Configs

For detailed workload-specific configurations, see data-engineering-patterns.md Delta Lake Best Practices.

Quick reference:

# Write-heavy (Bronze): Disable V-Order, enable autoCompact
# Balanced (Silver): Enable V-Order, adaptive execution  
# Read-heavy (Gold): Vectorized reads, optimal parallelism
# See data-engineering-patterns.md for complete config tables

Focus: Essential CLI patterns for Spark/data engineering development and notebook code authoring, with intelligent routing to specialized resources. For comprehensive patterns, always reference COMMON-* files and resource documents.
