fabric-lakehouse

Author: github

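Microsoft Fabric Lakehouse storage for unified tabular and non-tabular data, with Delta Lake, SQL analytics, and fine-grained security. It combines the flexibility of a data lake with the management capabilities of a data warehouse through the Delta Lake format, ACID transactions, versioning, and a SQL endpoint for T-SQL queries. Data is organized with schemas (folders under Tables), shortcuts (virtual links to internal and external sources), and materialized views that optimize query performance. Supports multiple data formats: Delta...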

npx skills add https://github.com/github/awesome-copilot --skill fabric-lakehouse

When to Use This Skill

Use this skill when you need to:

  • Generate a document or explanation that includes a definition of Fabric Lakehouse and context about its capabilities.
  • Design, build, and optimize Lakehouse solutions using best practices.
  • Understand the core concepts and components of a Lakehouse in Microsoft Fabric.
  • Learn how to manage tabular and non-tabular data within a Lakehouse.

Fabric Lakehouse

Core Concepts

What is a Lakehouse?

A Lakehouse in Microsoft Fabric is an item that gives users a place to store both tabular data (tables) and non-tabular data (files). It combines the flexibility of a data lake with the management capabilities of a data warehouse. It provides:

  • Unified storage in OneLake for structured and unstructured data
  • Delta Lake format for ACID transactions, versioning, and time travel
  • SQL analytics endpoint for T-SQL queries
  • Semantic model for Power BI integration
  • Support for other table formats, such as CSV and Parquet
  • Support for any file format
  • Tools for table optimization and data management
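The pure-Python model below makes "ACID transactions, versioning, and time travel" concrete by imitating a Delta-style transaction log. Each commit is one atomic, numbered list of add/remove file actions. The table as of version N is rebuilt by replaying commits 0 through N. This is a conceptual sketch only. `ToyDeltaLog`, `commit`, and `files_at` are hypothetical names, not the Delta Lake API. In a Fabric notebook you would read an older version through the Delta reader option `versionAsOf` instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyDeltaLog:
    """Hypothetical, simplified model of a Delta transaction log.

    Each commit is one list of actions: ("add", path) or ("remove", path).
    The table at version N is the result of replaying commits 0..N in order.
    """
    commits: list[list[tuple[str, str]]] = field(default_factory=list)

    def commit(self, actions: list[tuple[str, str]]) -> int:
        # Validate every action before appending, so a bad commit changes nothing.
        for op, _ in actions:
            if op not in ("add", "remove"):
                raise ValueError(f"unknown action: {op}")
        self.commits.append(list(actions))
        return len(self.commits) - 1  # the new version number

    def files_at(self, version: int) -> set[str]:
        # Time travel: replay the log up to and including `version`.
        if not 0 <= version < len(self.commits):
            raise ValueError(f"version {version} does not exist")
        live: set[str] = set()
        for actions in self.commits[: version + 1]:
            for op, path in actions:
                if op == "add":
                    live.add(path)
                else:
                    live.discard(path)
        return live


log = ToyDeltaLog()
v0 = log.commit([("add", "part-000.parquet")])
v1 = log.commit([("add", "part-001.parquet")])
# An update rewrites a file: remove the old one and add its replacement in one commit.
v2 = log.commit([("remove", "part-000.parquet"), ("add", "part-002.parquet")])

print(sorted(log.files_at(v0)))  # ['part-000.parquet']
print(sorted(log.files_at(v2)))  # ['part-001.parquet', 'part-002.parquet']
```

A "remove" action only drops a file from the log's view of the table. It does not delete the file. That is why older versions stay readable until a cleanup such as VACUUM physically deletes the unreferenced files.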

Key Components

  • Delta Tables: Managed tables with ACID compliance and schema enforcement
  • Files: Unstructured/semi-structured data in the Files section
  • SQL Endpoint: Auto-generated read-only SQL interface for querying
  • Shortcuts: Virtual links to external/internal data without copying
  • Fabric Materialized Views: Pre-computed tables for fast query performance

Tabular data in a Lakehouse

Tabular data is stored as tables under the "Tables" folder. Delta is the main table format in a Lakehouse. A Lakehouse can also store tables in other formats, such as CSV or Parquet, but those tables can be queried only from Spark. A table is internal when its data is stored under the "Tables" folder. It is external when only a reference to the table is stored under the "Tables" folder and the data itself lives in the referenced location. External tables are referenced through shortcuts. A shortcut can be internal (pointing to another location in Fabric) or external (pointing to data stored outside of Fabric).

Schemas for tables in a Lakehouse

When creating a lakehouse, users can choose to enable schemas, which organize Lakehouse tables. Each schema is implemented as a folder under the "Tables" folder, and that schema's tables are stored inside it. The default schema is "dbo", and it can't be deleted or renamed. All other schemas are optional and can be created, renamed, or deleted. Users can reference a schema located in another lakehouse with a schema shortcut, which exposes all tables in the target schema through a single shortcut.

Files in a Lakehouse

Files are stored under the "Files" folder. Users can create folders and subfolders to organize their files, and a Lakehouse can store files in any format.
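The Tables/schema/Files layout above maps directly onto OneLake paths. The helper below is a minimal sketch that assumes the ABFS form `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/...`. The function names and the workspace and lakehouse names in the example ("Sales", "Bronze") are hypothetical. A path built this way can be handed to a Spark reader, for example `spark.read.format("delta").load(path)`.

```python
from typing import Optional

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"
DEFAULT_SCHEMA = "dbo"  # the schema that can't be deleted or renamed


def lakehouse_root(workspace: str, lakehouse: str) -> str:
    """Root ABFS path of a Lakehouse item in OneLake."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse"


def table_path(workspace: str, lakehouse: str, table: str,
               schema: Optional[str] = DEFAULT_SCHEMA) -> str:
    """Path to a table folder: Tables/<schema>/<table> when schemas are enabled.

    Pass schema=None for a lakehouse created without schemas (Tables/<table>).
    """
    tables = f"{lakehouse_root(workspace, lakehouse)}/Tables"
    return f"{tables}/{schema}/{table}" if schema else f"{tables}/{table}"


def file_path(workspace: str, lakehouse: str, relative: str) -> str:
    """Path to a file or folder under the Files section."""
    return f"{lakehouse_root(workspace, lakehouse)}/Files/{relative.lstrip('/')}"


print(table_path("Sales", "Bronze", "orders"))
# abfss://Sales@onelake.dfs.fabric.microsoft.com/Bronze.Lakehouse/Tables/dbo/orders
print(file_path("Sales", "Bronze", "raw/2024/orders.csv"))
# abfss://Sales@onelake.dfs.fabric.microsoft.com/Bronze.Lakehouse/Files/raw/2024/orders.csv
```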

Fabric Materialized Views

Fabric materialized views are pre-computed tables that are refreshed automatically on a schedule. They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL and stored in an associated notebook.
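The sketch below shows what "pre-computed and refreshed on a schedule" means in practice. It uses Python's built-in sqlite3 as a stand-in for Spark SQL. SQLite has no materialized views, so the materialized result is emulated as a snapshot table that a hypothetical `refresh()` function rebuilds. The table and column names are invented. The key behavior is that reads are served from the stored result and pick up new data only after a refresh.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 10.0), ("EU", 5.0), ("US", 7.0)])


def refresh(conn: sqlite3.Connection) -> None:
    """Hypothetical refresh step: recompute and store the aggregate."""
    conn.execute("DROP TABLE IF EXISTS mv_sales_by_region")
    conn.execute(
        "CREATE TABLE mv_sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )


def read_mv(conn: sqlite3.Connection) -> dict[str, float]:
    """Read the stored result; no aggregation happens at query time."""
    rows = conn.execute(
        "SELECT region, total FROM mv_sales_by_region ORDER BY region")
    return dict(rows.fetchall())


refresh(conn)
print(read_mv(conn))  # {'EU': 15.0, 'US': 7.0}

conn.execute("INSERT INTO sales VALUES ('US', 3.0)")
print(read_mv(conn))  # {'EU': 15.0, 'US': 7.0}  (stale until the next refresh)

refresh(conn)
print(read_mv(conn))  # {'EU': 15.0, 'US': 10.0}
```

The trade-off is freshness for speed. Queries skip the aggregation entirely, but results lag behind the source tables by up to one refresh interval.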

Spark Views

Spark views are logical tables defined by a SQL query. They store no data and instead provide a virtual layer for querying. Views are defined using Spark SQL and are stored in the Lakehouse alongside tables.
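Because a view stores only its query, every read re-evaluates it against the current data. The sketch below demonstrates this with Python's built-in sqlite3 as a stand-in for Spark SQL. The basic `CREATE VIEW name AS SELECT ...` form is the same in both. In a Fabric notebook the statement would be run through `spark.sql(...)`. The table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("EU", 10.0), ("US", 7.0)])

# The view stores only the query text; no rows are copied.
conn.execute(
    "CREATE VIEW big_sales AS SELECT region, amount FROM sales WHERE amount >= 8"
)


def big_sales(conn: sqlite3.Connection) -> list:
    """Query the view; the underlying SELECT runs on every call."""
    return conn.execute(
        "SELECT region, amount FROM big_sales ORDER BY amount").fetchall()


print(big_sales(conn))  # [('EU', 10.0)]

conn.execute("INSERT INTO sales VALUES ('US', 12.0)")
print(big_sales(conn))  # [('EU', 10.0), ('US', 12.0)]  (new row visible at once)
```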

Security

Item access or control plane security

Users can hold workspace roles (Admin, Member, Contributor, Viewer) that grant different levels of access to the Lakehouse and its contents. Users can also be granted access through the Lakehouse's sharing capabilities.

Data access or OneLake Security

Data access is governed by the OneLake security model, which is based on Microsoft Entra ID (formerly Azure Active Directory) and role-based access control (RBAC). Because Lakehouse data is stored in OneLake, access to that data is controlled through OneLake permissions. In addition to object-level permissions, a Lakehouse supports column-level and row-level security for tables, giving fine-grained control over who can see specific columns or rows.
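Row-level and column-level security are configured in Fabric rather than written as code. Their effect can still be pictured as a per-role predicate on rows plus an allow-list of columns. The sketch below is a conceptual model only. The `Role` type, the role definitions, and `secure_read` are hypothetical and are not the OneLake security API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Role:
    name: str
    row_filter: Callable[[Row], bool]  # row-level security: which rows are visible
    visible_columns: frozenset         # column-level security: which columns are visible


def secure_read(rows: List[Row], role: Role) -> List[Row]:
    """Return only the rows and columns that `role` is allowed to see."""
    return [
        {col: val for col, val in row.items() if col in role.visible_columns}
        for row in rows
        if role.row_filter(row)
    ]


employees = [
    {"name": "Ana", "region": "EU", "salary": 5000},
    {"name": "Bo", "region": "US", "salary": 6000},
]
# Hypothetical role: sees only EU rows and never the salary column.
eu_analyst = Role("eu_analyst", lambda r: r["region"] == "EU",
                  frozenset({"name", "region"}))
print(secure_read(employees, eu_analyst))  # [{'name': 'Ana', 'region': 'EU'}]
```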

Lakehouse Shortcuts

Shortcuts create virtual links to data without copying:

Types of Shortcuts

  • Internal: links to other Fabric lakehouses or tables, enabling cross-workspace data sharing
  • ADLS Gen2: links to ADLS Gen2 containers in Azure
  • Amazon S3: links to AWS S3 buckets, enabling cross-cloud data access
  • Dataverse: links to Microsoft Dataverse business application data
  • Google Cloud Storage: links to GCS buckets, enabling cross-cloud data access
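A shortcut behaves like a mount point: a path inside the Lakehouse is redirected to the target location, and no data is copied. The sketch below models that redirection as a prefix lookup. The shortcut names, storage account, buckets, and the `resolve` function are hypothetical. This is a conceptual model of what a shortcut does, not a description of how OneLake implements it.

```python
from __future__ import annotations

# Hypothetical shortcuts: Lakehouse-relative path -> target location.
SHORTCUTS = {
    # External (ADLS Gen2) table shortcut
    "Tables/dbo/customers": "abfss://crm@contosostorage.dfs.core.windows.net/customers",
    # External (Amazon S3) folder shortcut
    "Files/clickstream": "s3://contoso-clickstream/events",
    # Internal shortcut to another lakehouse in a different workspace
    "Files/landing": "abfss://Ingest@onelake.dfs.fabric.microsoft.com/Landing.Lakehouse/Files",
}


def resolve(path: str, shortcuts: dict[str, str] = SHORTCUTS) -> str:
    """Map a Lakehouse path to where its data physically lives.

    Paths that are not under any shortcut are stored in the Lakehouse
    itself and are returned unchanged.
    """
    path = path.strip("/")
    for prefix, target in shortcuts.items():
        if path == prefix or path.startswith(prefix + "/"):
            return target + path[len(prefix):]
    return path


print(resolve("Files/clickstream/2024/01/events.json"))
# s3://contoso-clickstream/events/2024/01/events.json
print(resolve("Files/local/notes.txt"))  # Files/local/notes.txt
```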

Performance Optimization

V-Order Optimization

For faster reads from Power BI semantic models, enable V-Order optimization on Delta tables. V-Order sorts and organizes data at write time in a way that improves query performance for common access patterns.

Table Optimization

Tables can also be optimized with the OPTIMIZE command. It compacts small files into larger ones and can also apply Z-ordering to improve query performance on specific columns. Running optimization regularly maintains performance as data is ingested and updated over time. The VACUUM command deletes data files that the table no longer references and that are older than the retention threshold. This frees storage space, especially after updates and deletes. Once those files are gone, time travel to the versions that depended on them is no longer possible.
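The ideas behind these commands can be sketched in plain Python. Compaction is essentially bin packing: group small files into batches whose total size approaches a target. Z-ordering maps several column values to one sort key by interleaving their bits (a Morton code), so rows that are close in all chosen columns tend to land in the same files. VACUUM selects files that the current version no longer references and that fall outside the retention window, which defaults to 7 days in Delta Lake. The function names, the 1 GiB default target, and the greedy strategy are illustrative assumptions, not the exact algorithms Delta Lake or Fabric use. For example, Delta Lake maps column values to range IDs before interleaving, whereas this sketch interleaves raw integers.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def plan_compaction(sizes: list[int], target: int = 1 << 30) -> list[list[int]]:
    """Greedy bin packing: group files smaller than `target` into batches
    whose combined size stays at or below `target` (1 GiB by default)."""
    batches: list[list[int]] = []
    current: list[int] = []
    total = 0
    for size in sorted(s for s in sizes if s < target):
        if current and total + size > target:
            batches.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        batches.append(current)
    # A batch holding a single file gains nothing from being rewritten.
    return [batch for batch in batches if len(batch) > 1]


def z_order_key(values: tuple[int, ...], bits: int = 16) -> int:
    """Interleave the low `bits` bits of non-negative ints (a Morton code)."""
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    key = 0
    dims = len(values)
    for bit in range(bits):
        for dim, value in enumerate(values):
            key |= ((value >> bit) & 1) << (bit * dims + dim)
    return key


def vacuum_candidates(files: dict[str, datetime], live: set[str], now: datetime,
                      retention: timedelta = timedelta(days=7)) -> list[str]:
    """Simplified: unreferenced files last modified before the retention cutoff."""
    cutoff = now - retention
    return sorted(path for path, modified in files.items()
                  if path not in live and modified < cutoff)


# Sizes in MB with a 100 MB target, to keep the numbers readable.
print(plan_compaction([10, 20, 30, 40, 90, 200], target=100))  # [[10, 20, 30, 40]]

# Points close in both columns get close keys, so sorting by the key clusters them.
points = [(3, 3), (0, 0), (2, 0), (0, 1), (1, 0)]
print(sorted(points, key=z_order_key))  # [(0, 0), (1, 0), (0, 1), (2, 0), (3, 3)]

now = datetime(2024, 1, 31)
files = {"a.parquet": datetime(2024, 1, 1), "b.parquet": datetime(2024, 1, 30),
         "c.parquet": datetime(2024, 1, 1)}
print(vacuum_candidates(files, live={"c.parquet"}, now=now))  # ['a.parquet']
```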

Lineage

The Lakehouse item supports lineage, which allows users to track the origin and transformations of data. Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies.
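Lineage is essentially a directed graph from sources to destinations. The sketch below uses a hypothetical edge list of (upstream, downstream) pairs to show the kind of question lineage answers: which items ultimately feed a given table. It is not an API for reading Fabric lineage. The item names and the `upstream` function are invented for illustration.

```python
from __future__ import annotations

from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream item, downstream item).
EDGES = [
    ("s3://contoso-clickstream/events", "Files/clickstream"),
    ("Files/clickstream", "Tables/dbo/events_clean"),
    ("Tables/dbo/customers", "Tables/dbo/customer_activity"),
    ("Tables/dbo/events_clean", "Tables/dbo/customer_activity"),
]


def upstream(item: str, edges: list[tuple[str, str]] = EDGES) -> set[str]:
    """Breadth-first walk against the edge direction to find every ancestor."""
    parents: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)
    seen: set[str] = set()
    queue = deque([item])
    while queue:
        for parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("Tables/dbo/customer_activity")))
```

Walking the same graph in the other direction (downstream) answers the complementary question of which items are affected when a source changes.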

PySpark Code Examples

See PySpark code for details.

Getting data into Lakehouse

See Get data for details.

More skills from github

console-rendering
github
Instructions for using a struct-tag-based console rendering system in Go
official
acquire-codebase-knowledge
github
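Use this skill when the user explicitly asks to map, document, or get onboarded to an existing codebase. Triggered by prompts such as "map this codebase", "document…", and similar.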
official
acreadiness-assess
github
Run the AgentRC readiness assessment on the current repository and produce a static HTML dashboard at reports/index.html. Wraps `npx github:microsoft/agentrc…
official
acreadiness-generate-instructions
github
Generates tailored AI agent instruction files through the AgentRC instructions command. Produces .github/copilot-instructions.md (the default, recommended for Copilot in VS Code…
official
acreadiness-policy
github
Helps users select, write, or apply AgentRC policies. Policies customize the readiness score by disabling irrelevant checks, overriding impact/level, setting…
official
add-educational-comments
github
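Adds educational comments to code files, turning them into effective learning resources. Adjusts explanation depth and tone to three configurable knowledge levels (beginner, intermediate, advanced). If no file is provided, it asks for one and presents a numbered list for quick selection. Expands files by at most 125% through educational comments only (hard cap: 400 new comment lines; 300 lines for files over 1,000 lines). Preserves file encoding, indentation style, syntactic correctness, and……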
official
adobe-illustrator-scripting
github
Write, debug, and optimize Adobe Illustrator automation scripts with ExtendScript (JavaScript/JSX). Use when creating or modifying scripts that manipulate…
official
agent-governance
github
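Declarative policies, intent classification, and audit trails for controlling AI agent tool access and behavior. Composable governance policies define allowed/blocked tools, content filters, rate limits, and approval requirements, stored as configuration rather than code. Semantic intent classification uses pattern-based signals to detect dangerous prompts (data exfiltration, privilege escalation, prompt injection) before tools execute. Tool-level governance decorators enforce policies at the function level……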
official