This section covers the foundational architecture of Microsoft Fabric, including OneLake, workspaces, capacities, and the medallion (Bronze/Silver/Gold) data organization pattern.
Core Architecture
Understanding the foundational building blocks of Microsoft Fabric.
Architecture Overview
Microsoft Fabric is organized around three key concepts: Capacities (the compute engine), Workspaces (the organizational unit), and Experiences (the specialized tools for different data roles).
Key Concepts
Capacities
A capacity is a dedicated set of compute resources. All Fabric workloads (Spark, SQL, Power BI, etc.) share the same capacity pool, measured in Capacity Units (CUs). You choose a SKU (F2, F4, F8, … F2048) based on your workload needs.
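The F SKU name encodes its size: the number after the F is the capacity's CU count, and the Spark runtime currently maps one CU to two Spark VCores. A minimal Python sketch of that arithmetic (the 2-VCore ratio is an assumption worth verifying against current Fabric documentation):

```python
def capacity_units(sku: str) -> int:
    """Return the Capacity Units for a Fabric F SKU, e.g. 'F64' -> 64.

    For F SKUs the CU count is encoded directly in the SKU name.
    """
    if not sku.upper().startswith("F") or not sku[1:].isdigit():
        raise ValueError(f"not a Fabric F SKU: {sku!r}")
    return int(sku[1:])


def spark_vcores(sku: str) -> int:
    # Assumption: 1 CU = 2 Spark VCores (check current Fabric docs).
    return capacity_units(sku) * 2
```

So an F64 capacity provides 64 CUs, or roughly 128 Spark VCores under that mapping.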
Workspaces
Workspaces are the primary organizational and security boundary in Fabric. Think of them as folders that contain your artifacts (lakehouses, warehouses, notebooks, pipelines, reports). Each workspace is mapped to a capacity and has its own access control.
OneLake
OneLake is Fabric's built-in data lake: a single, unified storage layer for the entire organization. It's built on Azure Data Lake Storage Gen2 and uses the Delta Lake open format. Every Fabric tenant gets exactly one OneLake, and every workspace automatically gets a folder in OneLake.
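Because OneLake exposes an ADLS Gen2-compatible endpoint (`onelake.dfs.fabric.microsoft.com`), any tool that speaks ABFS can address workspace items by path. A small sketch of how such a path is composed (the workspace and item names are illustrative):

```python
def onelake_table_path(workspace: str, lakehouse: str, table: str) -> str:
    """Build the ABFS URI for a Delta table in a OneLake lakehouse.

    OneLake is exposed through a single ADLS Gen2-compatible endpoint;
    items are addressed as <item>.<ItemType> folders inside the workspace.
    """
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )


# Illustrative names, not real defaults.
path = onelake_table_path("sales_ws", "lh_bronze", "bronze_crm_customers")
```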
Use Shortcuts to reference data in external storage (AWS S3, Google Cloud Storage, other ADLS accounts) without copying it. This lets you unify your data view without moving data.
Experiences
| Experience | Purpose | Key Artifacts |
|---|---|---|
| Data Factory | Data ingestion and orchestration | Pipelines, Dataflows Gen2 |
| Data Engineering | Big data transformation with Spark | Lakehouse, Notebooks, Spark Jobs |
| Data Science | Machine learning and experimentation | Notebooks, Experiments, Models |
| Data Warehouse | Enterprise data warehousing with T-SQL | Warehouse, SQL Queries |
| Real-Time Intelligence | Streaming and time-series analytics | Eventhouse, KQL Queryset |
| Power BI | Business intelligence and reporting | Semantic Models, Reports, Dashboards |
Medallion Architecture
The proven data organization pattern for building reliable and scalable data lakehouses.
What is the Medallion Architecture?
The medallion architecture (also known as the "multi-hop" architecture) is a data design pattern that organizes data into three logical layers: Bronze, Silver, and Gold. Each layer represents an increasing level of data quality and business readiness.
Layer Details
Bronze Layer (Raw)
- Stores data exactly as received from source systems
- Append-only ingestion: never modify or delete raw records
- Include metadata columns: `_ingestion_timestamp`, `_source_system`, `_batch_id`
- Store in Delta format for time travel and ACID transactions
- Retain raw data for compliance, auditing, and reprocessing
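In a Fabric notebook these metadata columns would typically be added with PySpark `withColumn` calls; the plain-Python sketch below shows the same idea without a Spark dependency (the function name is invented, the column names follow the bullets above):

```python
import uuid
from datetime import datetime, timezone


def stamp_bronze_metadata(records, source_system):
    """Bronze ingestion sketch: append lineage columns, never mutate the payload."""
    batch_id = str(uuid.uuid4())                 # one id per ingestion batch
    ts = datetime.now(timezone.utc).isoformat()  # when this batch landed
    return [
        {
            **record,                            # raw payload, untouched
            "_ingestion_timestamp": ts,
            "_source_system": source_system,
            "_batch_id": batch_id,
        }
        for record in records
    ]


rows = stamp_bronze_metadata([{"id": 1}, {"id": 2}], source_system="crm")
```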
Silver Layer (Cleansed & Conformed)
- Apply data quality rules: deduplication, null handling, type casting
- Standardize column names and data types across sources
- Join and enrich data from multiple Bronze tables
- Apply slowly changing dimensions (SCD Type 1/2) where needed
- This layer is the "single source of truth" for your organization
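As a concrete illustration of Silver-layer cleansing, the sketch below conforms two hypothetical source schemas, handles nulls, casts types, and deduplicates. In practice this would be PySpark or T-SQL; every field name here is invented:

```python
def to_silver(bronze_rows):
    """Silver cleansing sketch: conform schemas, cast types, drop nulls, dedupe."""
    seen, silver = set(), []
    for row in bronze_rows:
        raw_id = row.get("CustID", row.get("customer_id"))  # conform source variants
        if raw_id is None:                                  # null handling
            continue
        customer_id = int(raw_id)                           # type casting
        if customer_id in seen:                             # deduplication
            continue
        seen.add(customer_id)
        silver.append({
            "customer_id": customer_id,
            "customer_name": (row.get("Name") or row.get("name") or "").strip().title(),
        })
    return silver


silver = to_silver([
    {"CustID": "7", "Name": " ada lovelace "},  # CRM spelling
    {"customer_id": 7, "name": "duplicate"},    # same customer from ERP
    {"Name": "missing id"},                     # fails the quality rule
])
```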
Gold Layer (Business-Ready)
- Build star/snowflake schemas with facts and dimensions
- Pre-aggregate KPIs and business metrics
- Optimized for Direct Lake mode in Power BI
- Apply column-level and row-level security as needed
- This layer serves dashboards, reports, and ad-hoc analysis
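A Gold-layer build can be as simple as grouping Silver facts by day and summing a measure. A plain-Python sketch of a hypothetical daily sales summary (in Fabric this would normally be Spark SQL or T-SQL):

```python
from collections import defaultdict


def build_gold_sales_summary_daily(fact_sales):
    """Gold sketch: pre-aggregate a daily revenue KPI from Silver fact rows."""
    totals = defaultdict(float)
    for row in fact_sales:
        totals[row["order_date"]] += row["amount"]
    return [
        {"order_date": day, "total_revenue": revenue}
        for day, revenue in sorted(totals.items())
    ]


summary = build_gold_sales_summary_daily([
    {"order_date": "2024-01-01", "amount": 100.0},
    {"order_date": "2024-01-01", "amount": 50.0},
    {"order_date": "2024-01-02", "amount": 25.0},
])
```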
Naming Conventions
```
Lakehouses:
  lh_bronze   - Raw ingestion lakehouse
  lh_silver   - Cleansed and conformed data
  lh_gold     - Business-ready consumption layer

Tables (Bronze):
  bronze_crm_customers      - Source: CRM, Table: customers
  bronze_erp_sales_orders   - Source: ERP, Table: sales_orders

Tables (Silver):
  silver_dim_customer       - Conformed customer dimension
  silver_fact_sales         - Conformed sales fact

Tables (Gold):
  gold_sales_summary_daily  - Daily sales aggregation
  gold_customer_360         - Customer 360 view
```
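If you adopt these conventions, a small validator can enforce them in CI or in pipeline code. A sketch using regular expressions (the patterns mirror the examples above; adjust them to your own convention):

```python
import re

# Patterns derived from the naming examples above; tighten to taste.
LAYER_PATTERNS = {
    "bronze": re.compile(r"^bronze_[a-z0-9]+_[a-z0-9_]+$"),   # bronze_<source>_<table>
    "silver": re.compile(r"^silver_(dim|fact)_[a-z0-9_]+$"),  # silver_<dim|fact>_<name>
    "gold":   re.compile(r"^gold_[a-z0-9_]+$"),               # gold_<name>
}


def valid_table_name(layer: str, name: str) -> bool:
    """Check a table name against the medallion naming convention."""
    return bool(LAYER_PATTERNS[layer].match(name))
```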
Don't skip the Silver layer. Going directly from Bronze to Gold creates brittle pipelines and makes it harder to add new consumers later. The Silver layer provides a stable contract between producers and consumers.
When to Use Medallion vs. Other Patterns
| Pattern | Best For | Considerations |
|---|---|---|
| Medallion (Bronze/Silver/Gold) | Most Fabric implementations; batch and micro-batch workloads | Clear separation of concerns; well-understood pattern |
| Data Mesh | Large orgs with domain-oriented teams | Combine with medallion within each domain |
| Lambda / Kappa | Dual batch + real-time pipelines | Use Real-Time Intelligence alongside medallion |
Real-Time Intelligence
Streaming analytics, event processing, and time-series workloads in Microsoft Fabric, from ingestion to live dashboards in seconds.
Use Real-Time Intelligence when you need sub-second latency on streaming data: IoT telemetry, clickstreams, fraud detection, operational monitoring, or log analytics. For batch/micro-batch workloads, the Lakehouse + medallion pattern is more appropriate.
Core Components
Eventstreams
No-code event ingestion from 30+ sources: Azure Event Hubs, Kafka, IoT Hub, custom apps, and CDC streams. Transform events in-flight with built-in processors (filter, aggregate, union).
Eventhouse
The primary database for real-time data. Built on the Azure Data Explorer (Kusto) engine and optimized for append-heavy, time-series workloads with automatic indexing and compression.
KQL Queryset
Kusto Query Language for exploring streaming data. Purpose-built for time-series work: `summarize`, `make-series`, `render timechart`, anomaly detection, and pattern matching.
Real-Time Dashboards
Live dashboards with auto-refresh down to 1-second intervals. Pinned KQL visuals, parameters, and cross-filtering, with no import or refresh schedule needed.
Activator (Data Activator)
No-code trigger engine: monitor streaming data and fire actions (emails, Teams messages, Power Automate flows) when conditions are met. Think "alerts as a service."
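An Activator rule boils down to "watch a field, fire once when a condition becomes true, re-arm when it recovers." A toy Python version of that semantics; the names and the re-arm behavior are illustrative, not Activator's actual API:

```python
def make_threshold_trigger(field, limit, action):
    """Activator-style rule sketch: call `action` once per threshold crossing."""
    state = {"armed": True}

    def on_event(event):
        value = event[field]
        if state["armed"] and value > limit:
            state["armed"] = False      # fire once, suppress until recovery
            action(event)
        elif value <= limit:
            state["armed"] = True       # value recovered: re-arm the rule
    return on_event


alerts = []
trigger = make_threshold_trigger("temp_c", 80, alerts.append)
for reading in ({"temp_c": 75}, {"temp_c": 83}, {"temp_c": 85}, {"temp_c": 70}):
    trigger(reading)
# Only the first crossing (83) fires; 85 is suppressed until the value recovers.
```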
Real-Time Hub
Centralized catalog of all streaming data in your organization. Discover, subscribe to, and share real-time event streams across workspaces and domains.
Architecture Pattern: Event-Driven Pipeline
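The flow is typically Eventstream → Eventhouse → KQL query/dashboard, with Activator watching for alert conditions. The toy sketch below simulates that chain with plain Python objects; every class and method name is illustrative, not a Fabric API:

```python
class Eventhouse:
    """Toy stand-in for an Eventhouse table: append-only and queryable."""

    def __init__(self):
        self.rows = []

    def ingest(self, event):
        self.rows.append(event)

    def count_where(self, predicate):
        # Stand-in for a KQL `summarize count()` over filtered rows.
        return sum(1 for r in self.rows if predicate(r))


def eventstream(source_events, sinks):
    """Toy Eventstream: route every incoming event to all configured sinks."""
    for event in source_events:
        for sink in sinks:
            sink(event)


house = Eventhouse()
eventstream(
    [{"device": "d1", "temp_c": 71}, {"device": "d2", "temp_c": 94}],
    sinks=[house.ingest],
)
hot = house.count_where(lambda r: r["temp_c"] > 90)  # dashboard-style query
```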
Eventhouse vs. Lakehouse: When to Use What
| Aspect | Eventhouse (KQL) | Lakehouse (Spark/SQL) |
|---|---|---|
| Data pattern | Append-heavy, time-series, streaming | Batch, micro-batch, full reloads |
| Latency | Sub-second ingestion to query | Minutes (Spark jobs) to seconds (Direct Lake) |
| Query language | KQL (Kusto Query Language) | Spark SQL, PySpark, T-SQL |
| Best for | Logs, IoT, clickstream, monitoring, fraud | Data warehousing, ML feature stores, reports |
| Retention | Hot/warm caching with auto-purge policies | Persistent Delta tables in OneLake |
| Integration | Eventstreams, Real-Time Hub, Activator | Data Factory, notebooks, Power BI Direct Lake |
| OneLake | Can mirror data to OneLake as Delta for cross-engine access | Native OneLake storage |
Use both together: Eventstreams routes hot data to Eventhouse for real-time dashboards and alerts, while simultaneously landing the same events into a Lakehouse Bronze layer for historical analytics and ML. This "lambda-like" pattern is natively supported in Fabric.
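The dual-path routing described above can be sketched in a few lines; `hot_sink` and `cold_sink` are hypothetical stand-ins for an Eventhouse ingest and a Bronze Delta append:

```python
def route_events(events, hot_sink, cold_sink):
    """Lambda-like routing sketch: every event goes to BOTH paths."""
    for event in events:
        hot_sink(event)    # hot path: Eventhouse for dashboards and alerts
        cold_sink(event)   # cold path: Lakehouse Bronze for history and ML


hot, cold = [], []
route_events([{"id": 1}, {"id": 2}], hot.append, cold.append)
```

The key property is that neither path filters the other: the Bronze layer keeps the full history even when the hot path only cares about anomalies.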