👤 Who is this for?

Data Engineer, Data Architect: This section covers the foundational architecture of Microsoft Fabric, including OneLake, workspaces, capacities, and the medallion (Bronze/Silver/Gold) data organization pattern.

Section 01

Core Architecture

Understanding the foundational building blocks of Microsoft Fabric.

Architecture Overview

Microsoft Fabric is organized around three key concepts: Capacities (the compute engine), Workspaces (the organizational unit), and Experiences (the specialized tools for different data roles).

Microsoft Fabric Architecture (layers, top to bottom)

Fabric Experiences: Data Factory, Data Engineering, Data Science, Data Warehouse, Real-Time Intelligence, Power BI
↓
Compute & Processing: Apache Spark, SQL Engine, KQL Engine, Analysis Services, Data Pipelines
↓
OneLake (Unified Storage): Delta / Parquet, Shortcuts, ADLS Gen2, Multi-cloud
↓
Governance & Security: Microsoft Purview, Entra ID, Sensitivity Labels, Managed VNets

Key Concepts

Capacities

A capacity is a dedicated set of compute resources. All Fabric workloads (Spark, SQL, Power BI, etc.) share the same capacity pool, measured in Capacity Units (CUs). You choose a SKU (F2, F4, F8, … F2048) based on your workload needs.
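As a rough illustration of SKU selection, the sketch below picks the smallest F SKU that covers an estimated CU requirement. It assumes the SKU ladder doubles at each step and that the number in the SKU name is its CU count (F64 = 64 CUs). The function name `smallest_sku` and the workload estimates are hypothetical, and real sizing also depends on bursting, smoothing, and throttling behavior that a single number does not capture.

```python
# F SKU sizes, assuming successive doublings from F2 to F2048.
# The number in the SKU name is taken to be its Capacity Unit (CU) count.
F_SKUS = [2 ** n for n in range(1, 12)]  # 2, 4, 8, ..., 2048


def smallest_sku(required_cus: float) -> str:
    """Return the smallest F SKU whose CU count covers required_cus."""
    if required_cus <= 0:
        raise ValueError("required_cus must be positive")
    for cus in F_SKUS:
        if cus >= required_cus:
            return f"F{cus}"
    raise ValueError(f"{required_cus} CUs exceeds the largest SKU (F{F_SKUS[-1]})")


if __name__ == "__main__":
    # Hypothetical peak estimates: Spark ~20 CUs, Power BI ~9 CUs.
    print(smallest_sku(20 + 9))  # F32
```

Because all workloads draw from the same pool, the estimates are summed before choosing a SKU rather than sized per workload.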

Workspaces

Workspaces are the primary organizational and security boundary in Fabric. Think of them as folders that contain your artifacts (lakehouses, warehouses, notebooks, pipelines, reports). Each workspace is mapped to a capacity and has its own access control.

OneLake

OneLake is Fabric's built-in data lake: a single, unified storage layer for the entire organization. It is built on Azure Data Lake Storage Gen2 and stores tabular data in the open Delta Lake format. Every Fabric tenant gets exactly one OneLake, and every workspace automatically gets a folder in OneLake.
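Because each workspace is a folder in the one tenant-wide lake, a table can be addressed with an ADLS-style URI. The sketch below builds such a URI for a lakehouse table. The helper name `onelake_table_uri` and the workspace name "Sales" are hypothetical; the URI layout follows the commonly documented OneLake pattern and should be checked against your tenant (workspace and item GUIDs can be used in place of names).

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build an abfss:// URI for a Delta table stored in a lakehouse."""
    for part in (workspace, lakehouse, table):
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    # Layout: abfss://<workspace>@<host>/<item>.<ItemType>/Tables/<table>
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"


if __name__ == "__main__":
    print(onelake_table_uri("Sales", "lh_bronze", "bronze_crm_customers"))
```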

✅ Best Practice

Use Shortcuts to reference data in external storage (AWS S3, Google Cloud Storage, other ADLS accounts) without copying it. This lets you unify your data view without moving data.
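A shortcut is essentially a named pointer from a OneLake path to an external location. The sketch below only assembles the URL and JSON body for creating an S3 shortcut and sends nothing. The endpoint path and field names (`target`, `amazonS3`, `location`, `subpath`, `connectionId`) reflect my understanding of the Fabric REST API's shortcut creation call and are assumptions to verify against the current reference; all IDs and the bucket URL are placeholders.

```python
import json

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def s3_shortcut_request(workspace_id, item_id, name, path,
                        bucket_url, subpath, connection_id):
    """Return (url, body) for a shortcut-creation call; field names are assumed."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{item_id}/shortcuts"
    body = {
        "name": name,   # shortcut name as it appears in the lakehouse
        "path": path,   # parent folder inside the item, e.g. "Files"
        "target": {
            "amazonS3": {
                "location": bucket_url,
                "subpath": subpath,
                "connectionId": connection_id,
            }
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = s3_shortcut_request(
        "<workspace-id>", "<lakehouse-id>", "partner_orders", "Files",
        "https://example-bucket.s3.us-west-2.amazonaws.com", "/orders",
        "<connection-id>",
    )
    print(url)
    print(json.dumps(body, indent=2))
```

The data stays in the bucket; only this pointer is created in OneLake.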

Experiences

Experience | Purpose | Key Artifacts
Data Factory | Data ingestion and orchestration | Pipelines, Dataflows Gen2
Data Engineering | Big data transformation with Spark | Lakehouse, Notebooks, Spark Jobs
Data Science | Machine learning and experimentation | Notebooks, Experiments, Models
Data Warehouse | Enterprise data warehousing with T-SQL | Warehouse, SQL Queries
Real-Time Intelligence | Streaming and time-series analytics | Eventhouse, KQL Queryset
Power BI | Business intelligence and reporting | Semantic Models, Reports, Dashboards

Section 02

Medallion Architecture

The proven data organization pattern for building reliable and scalable data lakehouses.

What is the Medallion Architecture?

The medallion architecture (also known as the "multi-hop" architecture) is a data design pattern that organizes data into three logical layers: Bronze, Silver, and Gold. Each layer represents an increasing level of data quality and business readiness.

Medallion Architecture Flow: Bronze → Silver → Gold

Bronze (Raw Data): Raw ingestion from sources. Append-only, immutable, exact copy of source data.
Silver (Cleansed): Validated, deduplicated, enriched. Conformed data models and business rules applied.
Gold (Business-Ready): Aggregated, curated for reporting. Star schemas, KPIs, and consumption-ready datasets.
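The three hops can be sketched in plain Python to show what each layer is responsible for. This is a toy model under stated assumptions, not Fabric code: the sample rows, the field names, and the helpers `to_silver` and `to_gold` are all hypothetical, and in practice each step would typically be a Spark notebook or pipeline writing Delta tables.

```python
from collections import defaultdict

# Bronze: raw rows exactly as received (hypothetical sample data, all strings).
bronze = [
    {"order_id": "1", "customer": " alice ", "amount": "10.50", "date": "2024-01-01"},
    {"order_id": "1", "customer": " alice ", "amount": "10.50", "date": "2024-01-01"},  # duplicate
    {"order_id": "2", "customer": "bob", "amount": "n/a", "date": "2024-01-01"},  # bad amount
    {"order_id": "3", "customer": "Bob", "amount": "4.50", "date": "2024-01-02"},
]


def to_silver(rows):
    """Validate, deduplicate on order_id, and normalize types and casing."""
    seen, out = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows that fail validation
        if row["order_id"] in seen:
            continue  # drop duplicates
        seen.add(row["order_id"])
        out.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": amount,
            "date": row["date"],
        })
    return out


def to_gold(rows):
    """Aggregate silver rows into daily sales totals."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["date"]] += row["amount"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    silver = to_silver(bronze)
    print(to_gold(silver))  # {'2024-01-01': 10.5, '2024-01-02': 4.5}
```

Bronze is never modified: Silver and Gold are derived from it, so either can be rebuilt if a rule changes.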

Layer Details

🥉 Bronze Layer (Raw)

🥈 Silver Layer (Cleansed & Conformed)

🥇 Gold Layer (Business-Ready)

Naming Conventions

Recommended naming pattern
Lakehouses:
  lh_bronze          - Raw ingestion lakehouse
  lh_silver          - Cleansed and conformed data
  lh_gold            - Business-ready consumption layer

Tables (Bronze):
  bronze_crm_customers          - Source: CRM, Table: customers
  bronze_erp_sales_orders       - Source: ERP, Table: sales_orders

Tables (Silver):
  silver_dim_customer           - Conformed customer dimension
  silver_fact_sales             - Conformed sales fact

Tables (Gold):
  gold_sales_summary_daily      - Daily sales aggregation
  gold_customer_360             - Customer 360 view
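A convention like this is easier to keep when it is checked automatically. The sketch below encodes one regular expression per layer and reports which layer a table name conforms to. The patterns are inferred from the examples above and are assumptions, in particular the rule that Silver tables must start with `dim_` or `fact_`; `layer_of` is a hypothetical helper.

```python
import re

# One pattern per layer, inferred from the example names above (assumptions).
TABLE_PATTERNS = {
    "bronze": re.compile(r"bronze_[a-z0-9]+_[a-z0-9_]+"),  # bronze_<source>_<table>
    "silver": re.compile(r"silver_(dim|fact)_[a-z0-9_]+"),  # silver_<dim|fact>_<entity>
    "gold": re.compile(r"gold_[a-z0-9_]+"),                 # gold_<subject>
}


def layer_of(table_name: str):
    """Return the layer whose pattern the name fully matches, or None."""
    for layer, pattern in TABLE_PATTERNS.items():
        if pattern.fullmatch(table_name):
            return layer
    return None


if __name__ == "__main__":
    for name in ["bronze_crm_customers", "silver_dim_customer", "Customers"]:
        print(name, "->", layer_of(name))
```

A check like this could run in a deployment step to reject tables that do not follow the pattern.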

⚠️ Common Pitfall

Don't skip the Silver layer. Going directly from Bronze to Gold creates brittle pipelines and makes it harder to add new consumers later. The Silver layer provides a stable contract between producers and consumers.

When to Use Medallion vs. Other Patterns

Pattern | Best For | Considerations
Medallion (Bronze/Silver/Gold) | Most Fabric implementations; batch and micro-batch workloads | Clear separation of concerns; well-understood pattern
Data Mesh | Large orgs with domain-oriented teams | Combine with medallion within each domain
Lambda / Kappa | Dual batch + real-time pipelines | Use Real-Time Intelligence alongside medallion

Section 03

Real-Time Intelligence

Streaming analytics, event processing, and time-series workloads in Microsoft Fabric, from ingestion to live dashboards in seconds.

💡 When to use Real-Time Intelligence

Use Real-Time Intelligence when you need low-latency analytics on streaming data, with events queryable within seconds of arrival: IoT telemetry, clickstreams, fraud detection, operational monitoring, or log analytics. For batch and micro-batch workloads, the Lakehouse + medallion pattern is more appropriate.

Core Components

📡 Eventstreams

No-code event ingestion from 30+ sources, including Azure Event Hubs, Kafka, IoT Hub, custom apps, and CDC streams. Transform events in-flight with built-in processors (filter, aggregate, union).
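To make "in-flight" concrete, the sketch below models two of the processors named above, filter and union, as lazy Python generators: events flow through one at a time and nothing is stored. It is a conceptual analogy only, since Eventstreams itself is configured in a visual editor. The event shape (`ts`, `temp`) and the function names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator

Event = dict  # hypothetical event shape: {"ts": int, "temp": float}


def filter_events(events: Iterable[Event], min_temp: float) -> Iterator[Event]:
    """Pass through only events whose temperature is at or above min_temp."""
    return (e for e in events if e["temp"] >= min_temp)


def union_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge several time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda e: e["ts"])


if __name__ == "__main__":
    site_a = [{"ts": 1, "temp": 20.0}, {"ts": 4, "temp": 31.5}]
    site_b = [{"ts": 2, "temp": 35.0}, {"ts": 3, "temp": 18.0}]
    for event in filter_events(union_streams(site_a, site_b), min_temp=30.0):
        print(event)
```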

🏠 Eventhouse

The primary database for real-time data. Built on the Azure Data Explorer (Kusto) engine and optimized for append-heavy, time-series workloads with automatic indexing and compression.

🔍 KQL Queryset

A workspace for running Kusto Query Language (KQL) queries against streaming data. KQL is purpose-built for time-series: summarize, make-series, render timechart, anomaly detection, and pattern matching.
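The core time-series idiom in KQL is aggregating by time bucket, for example `summarize avg(value) by bin(ts, 1m)`. The sketch below reproduces that logic in plain Python so the semantics are visible. The column names `ts` and `value`, the sample readings, and the helpers `bin_time` and `avg_by_bin` are hypothetical; this illustrates the technique and is not how the Kusto engine executes it.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bin_time(ts: datetime, width: timedelta) -> datetime:
    """Round ts down to the start of its bucket, similar to KQL's bin()."""
    return ts - ((ts - EPOCH) % width)


def avg_by_bin(readings, width: timedelta) -> dict:
    """Average (timestamp, value) readings per time bucket."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[bin_time(ts, width)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    readings = [
        (datetime(2024, 1, 1, 12, 0, 10), 10.0),
        (datetime(2024, 1, 1, 12, 0, 50), 20.0),
        (datetime(2024, 1, 1, 12, 1, 5), 30.0),
    ]
    for start, avg in avg_by_bin(readings, timedelta(minutes=1)).items():
        print(start.isoformat(), avg)
```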

📊 Real-Time Dashboards

Live dashboards with auto-refresh down to 1-second intervals. Pinned KQL visuals, parameters, and cross-filtering, with no import or refresh schedule needed.

🔔 Activator (Data Activator)

No-code trigger engine: monitor streaming data and fire actions (emails, Teams messages, Power Automate flows) when conditions are met. Think "alerts as a service."
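The "condition met, action fired" idea can be sketched as a small rule object. The version below fires only when the condition flips from false to true, so a sustained breach produces one alert rather than one per event. That edge-triggered behavior is a design assumption for this illustration, not a description of Activator's internals, and the `Rule` class and freezer example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]
    _active: bool = field(default=False, init=False)

    def evaluate(self, event: dict) -> None:
        """Fire the action each time the condition flips from false to true."""
        matched = self.condition(event)
        if matched and not self._active:
            self.action(event)
        self._active = matched


if __name__ == "__main__":
    alerts = []
    rule = Rule("freezer-too-warm", lambda e: e["temp"] > -10, alerts.append)
    for temp in [-18, -8, -5, -15, -2]:
        rule.evaluate({"temp": temp})
    print([a["temp"] for a in alerts])  # [-8, -2]
```

In place of `alerts.append`, the action would be whatever notification the rule should send, such as an email or a Teams message.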

🌐 Real-Time Hub

Centralized catalog of all streaming data in your organization. Discover, subscribe to, and share real-time event streams across workspaces and domains.

Architecture Pattern: Event-Driven Pipeline

Event Sources: IoT · Kafka · CDC · App Events · Logs
Eventstreams: Ingest · Transform · Filter · Route
Eventhouse (KQL Database): Store · Index · Query
RT Dashboards: Live visuals · Auto-refresh
Data Activator: Triggers · Alerts · Actions
OneLake: Delta mirroring · Gold layer sync

Eventhouse vs. Lakehouse: When to Use What

Aspect | Eventhouse (KQL) | Lakehouse (Spark/SQL)
Data pattern | Append-heavy, time-series, streaming | Batch, micro-batch, full reloads
Latency | Seconds or less from ingestion to query | Minutes (Spark jobs) to seconds (Direct Lake)
Query language | KQL (Kusto Query Language) | Spark SQL, PySpark, T-SQL
Best for | Logs, IoT, clickstream, monitoring, fraud | Data warehousing, ML feature stores, reports
Retention | Hot/warm caching with auto-purge policies | Persistent Delta tables in OneLake
Integration | Eventstreams, Real-Time Hub, Activator | Data Factory, notebooks, Power BI Direct Lake
OneLake | Can mirror data to OneLake as Delta for cross-engine access | Native OneLake storage

🎯 Architecture tip

Use both together: Eventstreams routes hot data to Eventhouse for real-time dashboards and alerts, while simultaneously landing the same events into a Lakehouse Bronze layer for historical analytics and ML. This "lambda-like" pattern is natively supported in Fabric.
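The dual-destination routing described above amounts to a fan-out: each event is delivered to every sink. The sketch below models it with a bounded `deque` standing in for the hot store (short retention) and a plain list standing in for the append-only Bronze table. Both stand-ins and the `fan_out` helper are hypothetical; in Fabric this routing is configured on the eventstream rather than coded.

```python
from collections import deque


def fan_out(events, *sinks) -> int:
    """Deliver every event to every sink and return the number of events routed."""
    count = 0
    for event in events:
        for sink in sinks:
            sink(event)
        count += 1
    return count


if __name__ == "__main__":
    hot_store = deque(maxlen=2)  # stand-in for Eventhouse with a short retention window
    bronze_log = []              # stand-in for an append-only Bronze table
    events = [{"id": i} for i in range(4)]
    fan_out(events, hot_store.append, bronze_log.append)
    print(list(hot_store))   # only the most recent events stay hot
    print(len(bronze_log))   # the full history is retained
```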