👤 Who is this for?

IT Admins · Business Leaders · Data Architects — This section covers capacity SKU selection, cost optimization strategies, CU smoothing and throttling, capacity sizing for new workloads and migrations, and a TCO/ROI calculator.


Capacity & Cost Management

Understanding Fabric's capacity model and optimizing your spend.

Capacity Units (CUs)

Microsoft Fabric uses a universal compute unit called a Capacity Unit (CU). All workloads — Spark, SQL, Power BI, Data Factory — consume from the same CU pool. This simplifies capacity planning compared to provisioning separate services.

SKU Options

| SKU | Capacity Units | Use Case | Pay-As-You-Go / month* | 1-Year Reservation / month* | Savings |
|---|---|---|---|---|---|
| F2 | 2 CUs | POC / Learning | ~$262 | ~$156 | ~40% |
| F4 | 4 CUs | Small team dev | ~$525 | ~$313 | ~40% |
| F8 | 8 CUs | Small production | ~$1,050 | ~$625 | ~40% |
| F16 | 16 CUs | Medium workloads | ~$2,100 | ~$1,251 | ~40% |
| F32 | 32 CUs | Department-level | ~$4,200 | ~$2,501 | ~40% |
| F64 | 64 CUs | Large workloads | ~$8,400 | ~$5,003 | ~40% |
| F128 | 128 CUs | Enterprise | ~$16,800 | ~$10,005 | ~40% |
| F256+ | 256+ CUs | Large enterprise | ~$33,600+ | ~$20,011+ | ~40% |
💡 Compute ≠ Storage — They Are Billed Separately

The prices above cover compute capacity only (CU processing power for Spark, SQL, Power BI, Data Factory, etc.). OneLake storage is billed separately at standard Azure storage rates (~$0.023/GB/month for hot tier). This means you continue paying for storage even while the capacity is paused, as long as data remains stored. Budget for both compute and storage when planning your Fabric costs.

*Approximate pricing (USD, East US region). Check the official pricing page for current rates. 3-year reservations offer even deeper discounts. F64+ includes free Power BI viewer access (no Pro license needed).
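
As a sanity check, the table's math can be sketched in a few lines. The per-CU-hour rate, the reservation discount, and the storage rate below are approximations taken from the figures above, not authoritative pricing; verify against the official pricing page.

```python
HOURS_PER_MONTH = 730           # Azure's billing convention for one month
PAYG_RATE_PER_CU_HOUR = 0.18    # assumed approximate rate, USD, East US
RESERVATION_DISCOUNT = 0.405    # assumed ~40% discount for a 1-year reservation
STORAGE_RATE_PER_GB = 0.023     # assumed OneLake hot-tier rate, USD/GB/month

def monthly_compute_cost(cus: int, reserved: bool = False) -> float:
    """Approximate monthly compute cost for an F-SKU with `cus` capacity units."""
    cost = cus * PAYG_RATE_PER_CU_HOUR * HOURS_PER_MONTH
    return cost * (1 - RESERVATION_DISCOUNT) if reserved else cost

def monthly_storage_cost(gb: float) -> float:
    """OneLake storage is billed separately from compute."""
    return gb * STORAGE_RATE_PER_GB

print(f"F64 pay-as-you-go: ~${monthly_compute_cost(64):,.0f}/mo")
print(f"F64 1-yr reserved: ~${monthly_compute_cost(64, reserved=True):,.0f}/mo")
print(f"1 TB OneLake:      ~${monthly_storage_cost(1024):,.2f}/mo")
```

Running this reproduces the F64 row of the table (~$8,400 and ~$5,003) to within rounding.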

Pricing Models

Pay-As-You-Go

Billed per second of compute used. Ideal for variable workloads. Pause capacity when not needed to stop billing.

Reserved Capacity

1-year or 3-year commitment with up to 40% discount. Best for predictable, always-on production workloads.

Fabric Trial

Free 60-day trial with F64 capacity. Great for evaluation and proof-of-concept before committing.

🧮 Capacity Cost Calculator

Estimate your Fabric capacity needs based on your expected workload mix. Adjust the sliders to see a real-time SKU recommendation.


Cost Optimization Strategies

How Capacity Works: Smoothing, Bursting & Throttling

Fabric doesn't enforce CU limits on a per-second basis. Instead, it uses a smoothing mechanism that spreads your CU consumption over time windows, allowing temporary bursts above your SKU limit. Understanding these mechanics is essential for right-sizing and avoiding unexpected throttling.

📈 Smoothing (24-Hour Window)

Every Fabric operation consumes CU-seconds. Instead of evaluating consumption instantly, Fabric smooths it over a rolling 24-hour window. This means a heavy job at 2 AM can be offset by idle time at 3 AM — your effective utilization is the average, not the peak.
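
A toy sketch of this mechanic, using hourly samples and a simple rolling mean (a simplification: Fabric's real accounting is per-operation and far more granular):

```python
from collections import deque

def smoothed(usage_by_hour, window=24):
    """Rolling mean over the last `window` hourly samples."""
    recent = deque(maxlen=window)
    averages = []
    for cu in usage_by_hour:
        recent.append(cu)
        averages.append(sum(recent) / len(recent))
    return averages

# A spiky day: quiet overnight, a heavy morning burst, moderate afternoons.
day = [5] * 6 + [120] * 3 + [50] * 3 + [30] * 3 + [55] * 3 + [10] * 6
print(max(day))                      # peak instantaneous usage: 120 CU
print(round(smoothed(day)[-1], 1))   # 24-hour smoothed average: 35.6 CU
```

The peak is nearly double an F64's 64 CU limit, but the smoothed value Fabric evaluates stays well under it.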

CU Smoothing — Peaks Are Averaged Over 24 Hours
[Chart: actual CU usage (spiky), the smoothed average that Fabric evaluates, and the F64 SKU limit, plotted across a 24-hour timeline.]

As long as the green smoothed line stays below the blue SKU limit, your capacity is healthy — even if individual spikes (orange) exceed the limit temporarily.

⚡ Bursting

Fabric allows your workloads to burst above your SKU's CU allocation for short periods. This is not extra capacity you pay for — it's borrowed from your future idle time. Bursting is automatic and requires no configuration.

Bursting — Borrow Now, Pay Back with Idle Time Later
[Chart: a heavy ETL job bursts above the SKU limit (CUs borrowed), then a low-activity period repays the debt with idle time.]
✅ Key Insight

Bursting lets you run a heavy Spark job at 150% of your SKU for 2 hours without throttling — as long as the rest of your 24-hour window is quiet enough to bring the smoothed average back under the limit.

🚫 Throttling & Rejection

When your smoothed CU consumption exceeds the SKU limit for too long, Fabric begins throttling. Enforcement escalates through stages:

Throttling Stages — From Healthy to Rejected
✅ Healthy (smoothed CU < SKU limit) → ⚠️ Throttled (interactive jobs delayed 20s+) → 🔶 Heavy Throttle (background jobs also delayed) → 🛑 Rejected (new requests fail)

| Stage | Trigger | Impact | What to Do |
|---|---|---|---|
| ✅ Healthy | Smoothed CU < 100% of SKU | No impact — all jobs run normally | Keep monitoring |
| ⚠️ Throttled | Smoothed CU exceeds 100% (10-min window overage) | Interactive jobs (queries, reports) delayed 20+ seconds | Reduce concurrent jobs or wait for usage to drop |
| 🔶 Heavy Throttle | Sustained overage across multiple windows | Both interactive & background jobs delayed significantly | Scale up SKU or pause non-critical workloads |
| 🛑 Rejected | Extreme sustained overage (24-hour carry-forward full) | New job submissions fail with errors | Immediately scale up or cancel heavy jobs |

🧮 Worked Example: F64 Capacity — A Day in the Life

Here's a concrete scenario for a team running on F64 (64 CUs) to illustrate how smoothing and bursting work together:

F64 Example — 24-Hour CU Usage Timeline
[Chart: hourly CU usage over the day, with the 6–9 AM ETL burst peaking at ~120 CU above the 64 CU limit, and a 24-hour smoothed average of ~36 CU.]

| Time Block | Activity | CU Usage | Duration | CU-Hours |
|---|---|---|---|---|
| 12–6 AM | Idle + scheduled refresh | ~5 CU | 6 hours | 30 |
| 6–9 AM | Morning ETL pipelines (Spark + Data Factory) | ~120 CU ⚡ | 3 hours | 360 |
| 9 AM–12 PM | BI queries + Spark notebooks | ~50 CU | 3 hours | 150 |
| 12–3 PM | Light BI usage, lunch break | ~30 CU | 3 hours | 90 |
| 3–6 PM | Reports + ad-hoc Spark | ~55 CU | 3 hours | 165 |
| 6 PM–12 AM | Minimal activity | ~10 CU | 6 hours | 60 |
| Total | | | 24 hours | 855 CU-hours |
| 24h Average | = 855 ÷ 24 | ~35.6 CU ✅ | | |
✅ Why This Works

Even though the 6–9 AM ETL burst consumed 120 CU (nearly 2× the F64 limit), the 24-hour smoothed average is only ~36 CU — well under the 64 CU limit. The long idle hours from 12–6 AM and 6 PM–12 AM "pay back" the burst. No throttling occurs.

โš ๏ธ When This Breaks

If this same team also ran a second heavy ETL job (100+ CU) from 3โ€“6 PM, the 24-hour average would jump to ~52 CU. Add concurrent Spark notebooks and it could exceed 64 CU โ€” triggering throttling. The fix: either stagger heavy jobs, optimize them, or scale to F128.
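
The arithmetic can be replayed directly from the table's (average CU, hours) blocks:

```python
# (average CU, hours) blocks from the F64 worked example above
blocks = [(5, 6), (120, 3), (50, 3), (30, 3), (55, 3), (10, 6)]
cu_hours = sum(cu * h for cu, h in blocks)
avg = cu_hours / 24
print(cu_hours, round(avg, 1))   # 855 CU-hours, ~35.6 CU: under the 64 CU limit

# The failure case: a second 100 CU job layered onto 3-6 PM adds 300 CU-hours
avg_broken = (cu_hours + 100 * 3) / 24
print(avg_broken)                # 48.125 CU: far less headroom before throttling
```

With the extra job the average is still under 64 CU, but any further concurrent load could push the smoothed line over the limit.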

โš ๏ธ Watch Out

If your capacity is throttled (CU usage exceeds your allocation), jobs will be queued or rejected. Monitor the overages dashboard in the Capacity Metrics app and consider scaling up or optimizing heavy jobs.


⚡ Surge Protection

Proactively prevent capacity overload by rejecting background jobs before they cause deep throttling — at both the capacity and workspace level.

Standard throttling is reactive — Fabric delays or rejects jobs after the capacity is already overloaded, requiring a long recovery window. Surge protection is proactive: it rejects new background jobs before the capacity reaches critical levels, keeping interactive workloads (reports, queries) responsive.

💡 Why It Matters

Without surge protection, background jobs (scheduled refreshes, pipeline runs, Spark jobs) keep being accepted until the capacity hits 100% — then everything gets throttled, including interactive queries and Power BI reports. With surge protection, background jobs are rejected early so interactive workloads remain unaffected.

How Surge Protection Works

Admins configure two thresholds that control when background jobs are rejected and when they're allowed to resume:

Surge Protection Thresholds — Background Rejection & Recovery
[Chart: background CU usage over 24 hours crossing the admin-set rejection threshold (background jobs rejected) and later falling below the recovery threshold (background jobs resume).]

| Threshold | What Happens | Default |
|---|---|---|
| 🛑 Background Rejection | When the 24-hour rolling average of background CU usage hits this %, new background jobs are rejected | Not set (disabled) |
| ✅ Background Recovery | When background CU usage falls below this %, rejected background jobs are allowed to resume | Not set (disabled) |
✅ Key Insight

Only background operations (scheduled refreshes, pipeline runs, Spark batch jobs) are rejected by surge protection. Interactive workloads (Power BI reports, ad-hoc queries, notebook exploration) continue running normally. This ensures end users aren't impacted while the system recovers.
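
The two thresholds reduce to a small hysteresis check. A sketch with hypothetical threshold values (in the real product both are admin-set and disabled by default):

```python
REJECTION_PCT = 80   # assumed: reject new background jobs at 80% background CU
RECOVERY_PCT = 60    # assumed: let background jobs resume below 60%

def accept_background_job(bg_cu_pct_24h: float, currently_rejecting: bool) -> bool:
    """Decide whether a new background job is accepted.

    bg_cu_pct_24h: 24-hour rolling average of background CU usage as a percent
    of capacity. Interactive jobs are never rejected by this check.
    """
    if currently_rejecting:
        # Once rejecting, stay blocked until usage drops below recovery.
        return bg_cu_pct_24h < RECOVERY_PCT
    return bg_cu_pct_24h < REJECTION_PCT

print(accept_background_job(85, currently_rejecting=False))  # False
print(accept_background_job(70, currently_rejecting=True))   # False
print(accept_background_job(50, currently_rejecting=True))   # True
```

The gap between the two thresholds prevents flapping: jobs don't resume the moment usage dips just below the rejection line.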

Workspace-Level Surge Protection

Beyond capacity-wide thresholds, admins can set per-workspace CU limits for even more granular control. This prevents a single runaway workspace from monopolizing the entire capacity.

📊 CU Spend Limit

Set a maximum CU consumption per workspace as a percentage of total capacity within a rolling 24-hour window. If a workspace exceeds this, it's automatically blocked from new operations.

๐Ÿฅ Mission Critical Mode

Tag business-critical workspaces as Mission Critical to exempt them from surge protection rules. These workspaces always get resources โ€” even when other workspaces are being blocked.

🔒 Manual Blocking

Admins can manually block or unblock specific workspaces at any time — useful for emergency situations or during maintenance windows.

📈 Monitoring

Track surge protection triggers, workspace block/unblock events, and CU usage in the Capacity Metrics app and Real-Time Hub for full visibility.

How to Enable Surge Protection

Configuration Steps — Admin Portal

1. ⚙️ Admin Portal — Open Fabric Admin Portal → Capacity settings
2. 🎚️ Set Thresholds — Configure rejection & recovery % for background jobs
3. 🏢 Workspace Limits — Set per-workspace CU spend % and Mission Critical tags
4. 📊 Monitor — Track events in Capacity Metrics app & Real-Time Hub

Surge Protection Best Practices

โš ๏ธ Error to Watch For

When surge protection rejects a job, users see the error CapacityLimitExceededSurgeProtection. If this happens frequently, either increase your rejection threshold, optimize the heavy background jobs, or scale up your capacity SKU.

Without vs. With Surge Protection

| Aspect | Without Surge Protection | With Surge Protection |
|---|---|---|
| Background job acceptance | Accepted until 100% capacity → everything throttled | Rejected at admin-set threshold → interactive jobs safe |
| Interactive workload impact | Delayed or blocked alongside background jobs | Remain responsive — only background jobs are restricted |
| Recovery time | Long — entire capacity must wind down | Short — fewer jobs in the queue, faster recovery |
| Workspace isolation | None — any workspace can consume all CUs | Per-workspace CU limits prevent monopolization |
| Critical workload priority | No differentiation | Mission Critical workspaces exempt from limits |

Capacity Sizing Guide

How to right-size your Fabric capacity for new workloads and migrations — avoid over-provisioning and throttling.

Sizing Approach

Capacity sizing is not a one-time exercise — it's an iterative process. Start with an estimate, deploy, monitor, and adjust. The key principle: size for your typical load, not your peak. Fabric's bursting and smoothing mechanisms handle short spikes automatically.

Capacity Sizing Lifecycle

1. 📐 Estimate — Use the SKU Estimator and workload analysis
2. 🧪 Pilot — Run representative workloads on trial capacity
3. 📊 Monitor — Track CU consumption with Capacity Metrics app
4. ⚖️ Adjust — Scale up/down based on actual usage patterns

Sizing for New Workloads

When starting fresh with Fabric, follow this framework:

Step 1: Inventory Your Workloads

Catalog what you plan to run and the expected scale:

| Workload Type | Key Sizing Factors | CU Impact |
|---|---|---|
| Data Ingestion (Data Factory) | Data volume, number of sources, refresh frequency | Low–Medium |
| Spark Notebooks | Data volume, transformation complexity, cluster size, concurrency | High |
| Data Warehouse (T-SQL) | Query complexity, concurrent users, data volume | Medium–High |
| Power BI (Direct Lake / Import) | Dataset size, concurrent report viewers, refresh rate | Low–Medium |
| Real-Time Intelligence | Event ingestion rate, query concurrency, retention period | Medium–High |
| Data Science / ML | Model training size, experiment frequency, serving load | High |

Step 2: Use the SKU Estimator

Microsoft provides an official Fabric SKU Estimator tool. Input your expected user count, data volumes, refresh rates, and workload mix to get a recommended starting SKU.

Step 3: Start with Trial or Dev Capacity

| Scenario | Recommended Starting SKU | Rationale |
|---|---|---|
| POC / Learning (1-3 users) | F2 or Free Trial (F64) | Minimal cost; trial gives 60 days of F64 for free |
| Small team (5-10 users, <50 tables) | F4 – F8 | Enough for light Spark jobs + Power BI |
| Department (10-50 users, multiple pipelines) | F16 – F32 | Concurrent Spark + SQL + BI workloads |
| Enterprise (50+ users, cross-domain) | F64 – F128 | F64+ enables free Power BI viewing; handles concurrency |
| Large enterprise (multi-region, mission-critical) | F256+ | High concurrency, multiple domains, always-on workloads |
✅ Rule of Thumb

Add 10-15% headroom above your average expected CU usage to account for growth and unexpected spikes. It's better to start one SKU lower and scale up than to over-provision — Fabric lets you resize capacity at any time.
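
That rule of thumb is easy to encode: take the average CU usage, add headroom, and pick the smallest SKU that covers it. A sketch with an abbreviated SKU list and the 15% upper bound as the default:

```python
SKUS = [2, 4, 8, 16, 32, 64, 128, 256]   # F-SKU capacity units (abbreviated)

def recommend_sku(avg_cu: float, headroom: float = 0.15) -> str:
    """Smallest F-SKU covering average usage plus 10-15% headroom."""
    target = avg_cu * (1 + headroom)
    for cus in SKUS:
        if cus >= target:
            return f"F{cus}"
    return "F256+"

print(recommend_sku(36))    # F64: 36 CU average + 15% headroom is ~41 CU
print(recommend_sku(60))    # F128: 60 * 1.15 = 69 CU, just over F64's limit
```

Note the second case: an average that sits close to a SKU boundary tips the recommendation up a tier once headroom is added.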

Sizing for Migrations

When migrating from an existing platform, you have historical usage data to guide your sizing:

From Power BI Premium (P-SKU)

| Power BI Premium SKU | Equivalent Fabric SKU | Capacity Units |
|---|---|---|
| P1 | F64 | 64 CUs |
| P2 | F128 | 128 CUs |
| P3 | F256 | 256 CUs |
| P4 | F512 | 512 CUs |
| P5 | F1024 | 1024 CUs |

You can enable Fabric on your existing P-SKU capacity — no need to purchase a new one. The same capacity pool now supports all Fabric workloads in addition to Power BI.
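
The mapping in the table follows a simple doubling pattern starting at P1 = F64, which a one-line helper captures:

```python
def p_to_f(p_sku_number: int) -> str:
    """Map a Power BI Premium P-SKU number to its equivalent Fabric F-SKU."""
    return f"F{64 * 2 ** (p_sku_number - 1)}"

print([p_to_f(p) for p in range(1, 6)])  # ['F64', 'F128', 'F256', 'F512', 'F1024']
```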

From Azure Synapse / Databricks

From On-Premises (SQL Server / SSIS)

Ongoing Optimization

📊 Monitor with Capacity Metrics

Install the Capacity Metrics app from day one. Track CU consumption by workload type, identify throttling events, and spot optimization opportunities.

โธ๏ธ Pause Non-Production

Pause dev/test capacities outside business hours and on weekends. This alone can save 60-70% on non-production capacity costs.
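
The 60-70% figure falls straight out of the calendar. For example, assuming a dev capacity that runs only 10 hours a day on weekdays (a hypothetical schedule; adjust for yours):

```python
hours_per_week = 7 * 24        # 168 hours in a week
running_hours = 5 * 10         # assumed: 10 h/day, Monday-Friday
paused_fraction = 1 - running_hours / hours_per_week
print(f"~{paused_fraction:.0%} of pay-as-you-go compute saved")
```

Since pay-as-you-go billing stops while a capacity is paused, the paused fraction of the week translates directly into compute savings (storage continues to bill).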

🔄 Leverage Bursting & Smoothing

Fabric smooths CU consumption over 24-hour windows and allows temporary bursts above your SKU limit. Size for your average load, not peak — bursting handles spikes.

🎯 Optimize Before Scaling

Before upgrading your SKU, optimize: V-Order on tables, efficient Spark sessions, proper partitioning, and well-designed DAX. Fix the bottleneck, not the capacity.

โš ๏ธ Common Sizing Mistakes

1. Over-provisioning from day one โ€” start small, monitor, scale up. 2. Ignoring concurrency โ€” multiple users running Spark jobs simultaneously consume far more CUs than sequential runs. 3. Not using the Capacity Metrics app โ€” flying blind leads to either overspending or throttling. 4. Forgetting that F64 is the minimum for free Power BI viewer access โ€” factor this into your licensing decision.


💰 TCO / ROI Calculator

Compare your current data platform spend against Microsoft Fabric to estimate savings and build a business case.

Enter your current monthly costs for each category below. The calculator estimates the equivalent Fabric cost based on published pricing and typical migration benchmarks, then shows potential savings and a recommended SKU.
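
The underlying math is a straightforward comparison. A directional sketch with purely illustrative inputs (the spend figures, the F64 choice, and the rates are all assumptions, not recommendations or benchmarks):

```python
# Hypothetical current platform spend, USD/month
current_monthly = 8000 + 5000 + 4000 + 2000        # BI + warehouse + ETL + other

# Candidate Fabric footprint: F64 on a 1-year reservation plus 10 TB of OneLake
fabric_compute = 64 * 0.18 * 730 * (1 - 0.405)     # assumed rate and ~40% discount
fabric_storage = 10 * 1024 * 0.023                 # assumed ~$0.023/GB/month
fabric_monthly = fabric_compute + fabric_storage

savings_pct = 1 - fabric_monthly / current_monthly
print(round(fabric_monthly), f"{savings_pct:.0%}")
```

Real savings hinge on whether the candidate SKU actually absorbs the migrated workloads, which is why a pilot on trial capacity should precede any commitment.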

💡 About These Estimates

This calculator provides directional estimates based on published Fabric pricing and typical migration savings reported in Microsoft case studies. Actual savings depend on workload complexity, optimization, and usage patterns. For precise estimates, use the official SKU Estimator and run a pilot.