📘 Fabric Prerequisites

  • Based on: Microsoft Fabric documentation (Microsoft Learn)
  • 📁 ← Back to Home

Background knowledge you need before diving into DP-600 exam topics. This page covers the foundational Fabric concepts that every Analytics Engineer question assumes you already understand.

Table of contents

🏗️ Microsoft Fabric Architecture Overview

Microsoft Fabric is a SaaS analytics platform that unifies data engineering, data warehousing, data science, real-time intelligence, data integration, and business intelligence into a single product. Instead of stitching together separate Azure services, Fabric provides one integrated experience built on a shared foundation.

📌 Workloads at a Glance

Workload Purpose
Data Engineering Spark notebooks, lakehouses, data pipelines
Data Warehouse Full T-SQL warehouse with DML/DDL
Data Science ML models, experiments, notebooks with MLflow
Real-Time Intelligence KQL databases, eventstreams, Real-Time dashboards
Data Factory Dataflows Gen2, data pipelines (orchestration)
Power BI Semantic models, reports, dashboards, paginated reports
graph TD
    Fabric["Microsoft Fabric (SaaS)"]
    Fabric --> DE["Data Engineering"]
    Fabric --> DW["Data Warehouse"]
    Fabric --> DS["Data Science"]
    Fabric --> RTI["Real-Time Intelligence"]
    Fabric --> DF["Data Factory"]
    Fabric --> PBI["Power BI"]
    DE --> OL["OneLake (ADLS Gen2)"]
    DW --> OL
    DS --> OL
    RTI --> OL
    DF --> OL
    PBI --> OL

All workloads read from and write to OneLake, which is the key architectural differentiator. There is no data duplication across workloads — one copy of data serves all engines.

Exam Tip: Questions often test whether you know which workload owns a given item. For example, a KQL database belongs to Real-Time Intelligence, not Data Engineering.


🗄️ OneLake

OneLake is Fabric’s single, unified data lake for the entire tenant. Think of it as “OneDrive for data.” Under the hood it runs on Azure Data Lake Storage Gen2 with a hierarchical namespace enabled.

💎 Key Characteristics

  • One copy of data — all Fabric workloads access the same physical files, eliminating silos and redundant copies.
  • Delta Parquet format — tables are stored as Delta Lake (Parquet files plus a _delta_log transaction log).
  • Shortcuts — virtual pointers to data in external locations (other OneLake paths, ADLS Gen2, Amazon S3, Google Cloud Storage) without copying.
  • Hierarchical namespace — organizes data as onelake://<workspace>/<item>/<path>.
  • Automatic discovery — every Fabric item that produces data lands in OneLake automatically.

Exam Caveat: Shortcuts do not copy data. They create a reference. If the source is deleted or access is revoked, the shortcut breaks. Expect scenario questions about this.


📂 Workspaces

A workspace is the primary container for all Fabric items (lakehouses, warehouses, notebooks, reports, pipelines, etc.). Every workspace is assigned to exactly one capacity.

👤 Workspace Roles

Role Permissions
Admin Full control — manage membership, delete the workspace, manage all items
Member Create, edit, delete items; share items; cannot manage workspace settings or membership
Contributor Create, edit, delete items; cannot share items or manage membership
Viewer View items only; cannot edit, create, or delete

Exam Tip: The difference between Member and Contributor is sharing. Members can share items with other users; Contributors cannot. This distinction appears frequently in permission-related questions.

🔗 Capacity Assignment

Every workspace must be backed by a capacity (or trial/PPU for Power BI-only workloads). Workspace admins assign the workspace to a specific capacity, which determines available compute and the SKU feature set.


🏠 Lakehouses

A lakehouse combines the flexibility of a data lake with the query power of a warehouse. It is the central storage item for Data Engineering workloads.

🧩 Anatomy of a Lakehouse

  • Tables section — managed Delta Lake tables queryable via Spark and the SQL analytics endpoint.
  • Files section — unstructured or semi-structured files (CSV, JSON, images) stored in OneLake but not registered as tables.
  • SQL analytics endpoint — an auto-generated, read-only T-SQL endpoint that lets you query Delta tables using SQL without Spark. Supports SELECT statements, views, and SQL functions but no INSERT, UPDATE, DELETE, or DDL.

Exam Caveat: The SQL analytics endpoint is read-only. You cannot run DML against a lakehouse via T-SQL. If a question asks you to perform T-SQL writes, the answer involves a warehouse, not a lakehouse endpoint.

✅ When to Use a Lakehouse

  • Unstructured or semi-structured data that needs flexible exploration.
  • Spark-based data engineering transformations (PySpark, Spark SQL).
  • Data science workloads that read Delta tables via notebooks.
  • Scenarios where you want one storage layer serving both engineers and analysts.

🏢 Warehouses

A Fabric warehouse is a full relational data warehouse with complete T-SQL DDL and DML support. Data is still stored as Delta Parquet in OneLake, but the engine provides a traditional SQL experience.

🔧 Key Capabilities

  • Full DML — INSERT, UPDATE, DELETE, MERGE, COPY INTO.
  • Full DDL — CREATE TABLE, ALTER TABLE, CREATE VIEW, CREATE SCHEMA, stored procedures.
  • Cross-database queries — query tables across warehouses and lakehouse SQL endpoints in the same workspace using three-part naming.
  • Security — column-level security, row-level security, dynamic data masking, object-level permissions.
graph TD
    Q["Need to write data via T-SQL?"]
    Q -->|Yes| WH["Use Warehouse"]
    Q -->|No| Q2["Primary engine is Spark?"]
    Q2 -->|Yes| LH["Use Lakehouse"]
    Q2 -->|No| Q3["Need fine-grained SQL security?"]
    Q3 -->|Yes| WH
    Q3 -->|No| LH

⚖️ Lakehouse vs Warehouse Comparison

Feature Lakehouse Warehouse
Storage format Delta Parquet in OneLake Delta Parquet in OneLake
Write via T-SQL No (read-only SQL endpoint) Yes (full DML/DDL)
Write via Spark Yes No
Cross-database queries As a source (read) Full support
Row/column-level security Not via SQL endpoint Yes
Best for Engineers, data scientists Analysts, SQL-centric teams

Exam Tip: Both lakehouse and warehouse store data as Delta Parquet in OneLake. The difference is the engine and write path, not the storage format.


💰 Capacities and Licensing

Fabric compute is metered through Capacity Units (CUs). Every operation — Spark jobs, SQL queries, dataflows, Power BI refreshes — consumes CUs from the assigned capacity.

💲 SKU Overview

SKU Type CUs Typical Use
F2 Fabric (Azure) 2 Dev/test
F4 Fabric (Azure) 4 Small workloads
F8 Fabric (Azure) 8 Small team
F16 Fabric (Azure) 16 Departmental
F32 Fabric (Azure) 32 Departmental
F64 Fabric (Azure) 64 Enterprise
F128+ Fabric (Azure) 128+ Large enterprise
P1 Power BI Premium 8 CU equiv. Power BI + limited Fabric
P2 Power BI Premium 16 CU equiv. Power BI + limited Fabric
  • F SKUs — purchased through Azure, pay-as-you-go or reserved (1-year / 3-year). Support all Fabric workloads. Can be paused to stop billing.
  • P SKUs — legacy Power BI Premium per-capacity. P1 maps roughly to F8. Microsoft is encouraging migration to F SKUs.
  • Trial capacity — free 60-day trial with F64-equivalent capacity; one trial per user.

Exam Caveat: F SKUs can be paused (stopping all compute billing), but P SKUs cannot. Also, only F64 and above (or P1 and above) support Fabric features beyond Power BI. Lower F SKUs still run all workloads but with less throughput.


🔺 Delta Lake Fundamentals

Delta Lake is the default table format across all of Fabric. Understanding it is non-negotiable for the DP-600.

⚙️ Core Properties

  • ACID transactions — every write is atomic; readers never see partial writes. The _delta_log folder contains JSON commit files that track each transaction.
  • Time travel — query previous versions of a table using VERSION AS OF or TIMESTAMP AS OF in Spark SQL. Useful for auditing and rollback.
  • Schema enforcement — writes that do not match the table schema are rejected by default, preventing corrupt data.
  • Schema evolution — optionally allow new columns by setting .option("mergeSchema", "true") on a write operation.

🚀 Fabric-Specific Optimizations

  • V-Order — a Fabric-specific write-time optimization that reorders Parquet row groups for faster reads across all engines (Spark, SQL, Power BI Direct Lake). V-Order is enabled by default.
  • Optimize Write — dynamically coalesces small files into larger ones during write to reduce the small-file problem. Also enabled by default in Fabric.

Exam Tip: V-Order and Optimize Write are enabled by default in Fabric Spark. You do not need to turn them on. Expect questions that try to trick you into manually enabling settings that are already active.


📊 Power BI in Fabric

Power BI is both a standalone workload and the analytics layer that sits on top of every other Fabric item.

🧱 Key Items

  • Semantic model (dataset) — the data model that defines tables, relationships, measures, and calculations. This is what reports query against.
  • Report — interactive visualizations built on a semantic model.
  • Dashboard — a single canvas of pinned tiles from one or more reports.
  • Paginated report — pixel-perfect, print-ready reports (SSRS-style) for invoices, statements, and operational documents.

🔌 Integration with Fabric Items

  • Direct Lake mode — a new storage mode where Power BI reads Delta Parquet files directly from OneLake without import or DirectQuery. Combines import-speed performance with DirectQuery-level freshness.
  • Default semantic model — every lakehouse and warehouse auto-generates a default semantic model that analysts can connect to immediately.
  • SQL analytics endpoint — Power BI can also use DirectQuery against a lakehouse SQL endpoint or warehouse.

Exam Caveat: Direct Lake is not the same as DirectQuery. Direct Lake reads columnar data from Parquet files in memory — it does not issue SQL queries to a source. If Direct Lake cannot load data (e.g., unsupported column type), it falls back to DirectQuery automatically.


🔑 Key Concepts for Analytics Engineers

📐 Star Schema Basics

The star schema is the recommended modeling pattern for analytical workloads in Fabric and Power BI.

  • Fact tables — store measurable events (sales transactions, page views, shipments). Contain foreign keys and numeric measures. Typically large and narrow.
  • Dimension tables — store descriptive attributes (customer name, product category, date). Contain surrogate keys and text/category columns. Typically small and wide.
  • Relationships — fact tables reference dimension tables via foreign keys, forming a star shape.

🔄 Slowly Changing Dimensions (SCD)

Type Behavior Implementation
SCD Type 1 Overwrite old value UPDATE the dimension row in place
SCD Type 2 Keep full history Add new row with effective/expiry dates and a current flag
SCD Type 3 Track limited history Add a “previous value” column alongside the current value

Exam Tip: SCD Type 2 is the most commonly tested. Know that it requires surrogate keys (not natural keys) because the same business entity will have multiple rows.

✅ Data Modeling Best Practices

  • Prefer star schemas over flat/wide tables for Power BI performance.
  • Avoid bi-directional relationships unless absolutely necessary.
  • Use integer surrogate keys for joins instead of large text business keys.
  • Keep fact tables narrow — push descriptive attributes into dimensions.
  • Define explicit date dimensions rather than relying on auto date/time.

📋 Scenario Quick Reference

Scenario Recommended Item Why
Ingest raw CSV/JSON and transform with Spark Lakehouse Spark-native, files section for raw landing
SQL analysts need to write stored procedures Warehouse Full T-SQL DDL/DML support
Data scientists need to train ML models on tabular data Lakehouse Spark notebooks read Delta tables natively
Build a star schema with row-level security via T-SQL Warehouse RLS and fine-grained SQL security
Create a Power BI report with near-real-time data from OneLake Lakehouse + Direct Lake Direct Lake reads Delta files with no import lag
Reference external S3 data without copying Lakehouse shortcut Shortcuts point to external storage without data movement
Orchestrate a multi-step ELT pipeline Data Factory pipeline Pipelines coordinate activities across items
Stream IoT events and query with KQL Real-Time Intelligence Eventstreams and KQL databases handle streaming data
Need cross-database SQL joins across items Warehouse Cross-database queries with three-part naming
Store unstructured images alongside tabular data Lakehouse (Files section) Files section holds any format; tables hold Delta only