👤 Who is this for?

IT Admins · Business Leaders · Data Architects — This section covers capacity SKU selection, cost optimization strategies, CU smoothing and throttling, capacity sizing for new workloads and migrations, and a TCO/ROI calculator.


Capacity & Cost Management

Understanding Fabric's capacity model and optimizing your spend.

Capacity Units (CUs)

Microsoft Fabric uses a universal compute unit called a Capacity Unit (CU). All workloads — Spark, SQL, Power BI, Data Factory — consume from the same CU pool. This simplifies capacity planning compared to provisioning separate services.

SKU Options

| SKU | Capacity Units | Use Case | Pay-As-You-Go / month* | 1-Year Reservation / month* | Savings |
|---|---|---|---|---|---|
| F2 | 2 CUs | POC / Learning | ~$262 | ~$156 | ~40% |
| F4 | 4 CUs | Small team dev | ~$525 | ~$313 | ~40% |
| F8 | 8 CUs | Small production | ~$1,050 | ~$625 | ~40% |
| F16 | 16 CUs | Medium workloads | ~$2,100 | ~$1,251 | ~40% |
| F32 | 32 CUs | Department-level | ~$4,200 | ~$2,501 | ~40% |
| F64 | 64 CUs | Large workloads | ~$8,400 | ~$5,003 | ~40% |
| F128 | 128 CUs | Enterprise | ~$16,800 | ~$10,005 | ~40% |
| F256+ | 256+ CUs | Large enterprise | ~$33,600+ | ~$20,011+ | ~40% |
💡 Compute ≠ Storage — They Are Billed Separately

The prices above cover compute capacity only (CU processing power for Spark, SQL, Power BI, Data Factory, etc.). OneLake storage is billed separately at standard Azure storage rates (~$0.023/GB/month for hot tier). This means you continue paying for storage even while the capacity is paused, as long as data remains stored. Budget for both compute and storage when planning your Fabric costs.

*Approximate pricing (USD, East US region). Check the official pricing page for current rates. 3-year reservations offer even deeper discounts. F64+ includes free Power BI viewer access (no Pro license needed).
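
As a sanity check, the table's math can be sketched in a few lines. The per-CU-hour rate, the reservation discount, and the storage rate below are approximations taken from the figures above, not authoritative pricing; verify against the official pricing page.

```python
HOURS_PER_MONTH = 730           # Azure's billing convention for one month
PAYG_RATE_PER_CU_HOUR = 0.18    # assumed approximate rate, USD, East US
RESERVATION_DISCOUNT = 0.405    # assumed ~40% discount for a 1-year reservation
STORAGE_RATE_PER_GB = 0.023     # assumed OneLake hot-tier rate, USD/GB/month

def monthly_compute_cost(cus: int, reserved: bool = False) -> float:
    """Approximate monthly compute cost for an F-SKU with `cus` capacity units."""
    cost = cus * PAYG_RATE_PER_CU_HOUR * HOURS_PER_MONTH
    return cost * (1 - RESERVATION_DISCOUNT) if reserved else cost

def monthly_storage_cost(gb: float) -> float:
    """OneLake storage is billed separately from compute."""
    return gb * STORAGE_RATE_PER_GB

print(f"F64 pay-as-you-go: ~${monthly_compute_cost(64):,.0f}/mo")
print(f"F64 1-yr reserved: ~${monthly_compute_cost(64, reserved=True):,.0f}/mo")
print(f"1 TB OneLake:      ~${monthly_storage_cost(1024):,.2f}/mo")
```

Running this reproduces the F64 row of the table (~$8,400 and ~$5,003) to within rounding.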

Pricing Models

Pay-As-You-Go

Billed per second of compute used. Ideal for variable workloads. Pause capacity when not needed to stop billing.

Reserved Capacity

1-year or 3-year commitment with up to 40% discount. Best for predictable, always-on production workloads.

Fabric Trial

Free 60-day trial with F64 capacity. Great for evaluation and proof-of-concept before committing.

🧮 Capacity Cost Calculator

Estimate your Fabric capacity needs based on your expected workload mix. Adjust the sliders to see a real-time SKU recommendation.


Cost Optimization Strategies

How Capacity Works: Smoothing, Bursting & Throttling

Fabric doesn't enforce CU limits on a per-second basis. Instead, it uses a smoothing mechanism that spreads your CU consumption over time windows, allowing temporary bursts above your SKU limit. Understanding these mechanics is essential for right-sizing and avoiding unexpected throttling.

📈 Smoothing (24-Hour Window)

Every Fabric operation consumes CU-seconds. Instead of evaluating consumption instantly, Fabric smooths it over a rolling 24-hour window. This means a heavy job at 2 AM can be offset by idle time at 3 AM — your effective utilization is the average, not the peak.
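
A toy sketch of this mechanic, using hourly samples and a simple rolling mean (a simplification: Fabric's real accounting is per-operation and far more granular):

```python
from collections import deque

def smoothed(usage_by_hour, window=24):
    """Rolling mean over the last `window` hourly samples."""
    recent = deque(maxlen=window)
    averages = []
    for cu in usage_by_hour:
        recent.append(cu)
        averages.append(sum(recent) / len(recent))
    return averages

# A spiky day: quiet overnight, a heavy morning burst, moderate afternoons.
day = [5] * 6 + [120] * 3 + [50] * 3 + [30] * 3 + [55] * 3 + [10] * 6
print(max(day))                      # peak instantaneous usage: 120 CU
print(round(smoothed(day)[-1], 1))   # 24-hour smoothed average: 35.6 CU
```

The peak is nearly double an F64's 64 CU limit, but the smoothed value Fabric evaluates stays well under it.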

CU Smoothing — Peaks Are Averaged Over 24 Hours
[Chart: actual CU usage (spiky), the smoothed average that Fabric evaluates, and the F64 SKU limit, plotted across a 24-hour timeline.]

As long as the green smoothed line stays below the blue SKU limit, your capacity is healthy — even if individual spikes (orange) exceed the limit temporarily.

⚡ Bursting

Fabric allows your workloads to burst above your SKU's CU allocation for short periods. This is not extra capacity you pay for — it's borrowed from your future idle time. Bursting is automatic and requires no configuration.

Bursting — Borrow Now, Pay Back with Idle Time Later
[Chart: a heavy ETL job bursts above the SKU limit (CUs borrowed), then a low-activity period repays the debt with idle time.]
✅ Key Insight

Bursting lets you run a heavy Spark job at 150% of your SKU for 2 hours without throttling — as long as the rest of your 24-hour window is quiet enough to bring the smoothed average back under the limit.

🚫 Throttling & Rejection

When your smoothed CU consumption exceeds the SKU limit for too long, Fabric begins throttling. Enforcement escalates through stages:

Throttling Stages — From Healthy to Rejected
✅ Healthy (smoothed CU < SKU limit) → ⚠️ Throttled (interactive jobs delayed 20s+) → 🔶 Heavy Throttle (background jobs also delayed) → 🛑 Rejected (new requests fail)

| Stage | Trigger | Impact | What to Do |
|---|---|---|---|
| ✅ Healthy | Smoothed CU < 100% of SKU | No impact — all jobs run normally | Keep monitoring |
| ⚠️ Throttled | Smoothed CU exceeds 100% (10-min window overage) | Interactive jobs (queries, reports) delayed 20+ seconds | Reduce concurrent jobs or wait for usage to drop |
| 🔶 Heavy Throttle | Sustained overage across multiple windows | Both interactive & background jobs delayed significantly | Scale up SKU or pause non-critical workloads |
| 🛑 Rejected | Extreme sustained overage (24-hour carry-forward full) | New job submissions fail with errors | Immediately scale up or cancel heavy jobs |

🧮 Worked Example: F64 Capacity — A Day in the Life

Here's a concrete scenario for a team running on F64 (64 CUs) to illustrate how smoothing and bursting work together:

F64 Example — 24-Hour CU Usage Timeline
[Chart: hourly CU usage over the day, with the 6–9 AM ETL burst peaking at ~120 CU above the 64 CU limit, and a 24-hour smoothed average of ~36 CU.]

| Time Block | Activity | CU Usage | Duration | CU-Hours |
|---|---|---|---|---|
| 12–6 AM | Idle + scheduled refresh | ~5 CU | 6 hours | 30 |
| 6–9 AM | Morning ETL pipelines (Spark + Data Factory) | ~120 CU ⚡ | 3 hours | 360 |
| 9 AM–12 PM | BI queries + Spark notebooks | ~50 CU | 3 hours | 150 |
| 12–3 PM | Light BI usage, lunch break | ~30 CU | 3 hours | 90 |
| 3–6 PM | Reports + ad-hoc Spark | ~55 CU | 3 hours | 165 |
| 6 PM–12 AM | Minimal activity | ~10 CU | 6 hours | 60 |
| Total | | | 24 hours | 855 CU-hours |
| 24h Average | = 855 ÷ 24 | ~35.6 CU ✅ | | |
✅ Why This Works

Even though the 6–9 AM ETL burst consumed 120 CU (nearly 2× the F64 limit), the 24-hour smoothed average is only ~36 CU — well under the 64 CU limit. The long idle hours from 12–6 AM and 6 PM–12 AM "pay back" the burst. No throttling occurs.

โš ๏ธ When This Breaks

If this same team also ran a second heavy ETL job (100+ CU) from 3โ€“6 PM, the 24-hour average would jump to ~52 CU. Add concurrent Spark notebooks and it could exceed 64 CU โ€” triggering throttling. The fix: either stagger heavy jobs, optimize them, or scale to F128.
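
The arithmetic can be replayed directly from the table's (average CU, hours) blocks:

```python
# (average CU, hours) blocks from the F64 worked example above
blocks = [(5, 6), (120, 3), (50, 3), (30, 3), (55, 3), (10, 6)]
cu_hours = sum(cu * h for cu, h in blocks)
avg = cu_hours / 24
print(cu_hours, round(avg, 1))   # 855 CU-hours, ~35.6 CU: under the 64 CU limit

# The failure case: a second 100 CU job layered onto 3-6 PM adds 300 CU-hours
avg_broken = (cu_hours + 100 * 3) / 24
print(avg_broken)                # 48.125 CU: far less headroom before throttling
```

With the extra job the average is still under 64 CU, but any further concurrent load could push the smoothed line over the limit.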

โš ๏ธ Watch Out

If your capacity is throttled (CU usage exceeds your allocation), jobs will be queued or rejected. Monitor the overages dashboard in the Capacity Metrics app and consider scaling up or optimizing heavy jobs.


⚡ Surge Protection

Proactively prevent capacity overload by rejecting background jobs before they cause deep throttling — at both the capacity and workspace level.

Standard throttling is reactive — Fabric delays or rejects jobs after the capacity is already overloaded, requiring a long recovery window. Surge protection is proactive: it rejects new background jobs before the capacity reaches critical levels, keeping interactive workloads (reports, queries) responsive.

💡 Why It Matters

Without surge protection, background jobs (scheduled refreshes, pipeline runs, Spark jobs) keep being accepted until the capacity hits 100% — then everything gets throttled, including interactive queries and Power BI reports. With surge protection, background jobs are rejected early so interactive workloads remain unaffected.

How Surge Protection Works

Admins configure two thresholds that control when background jobs are rejected and when they're allowed to resume:

Surge Protection Thresholds — Background Rejection & Recovery
[Chart: background CU usage over 24 hours crossing the admin-set rejection threshold (background jobs rejected) and later falling below the recovery threshold (background jobs resume).]

| Threshold | What Happens | Default |
|---|---|---|
| 🛑 Background Rejection | When the 24-hour rolling average of background CU usage hits this %, new background jobs are rejected | Not set (disabled) |
| ✅ Background Recovery | When background CU usage falls below this %, rejected background jobs are allowed to resume | Not set (disabled) |
✅ Key Insight

Only background operations (scheduled refreshes, pipeline runs, Spark batch jobs) are rejected by surge protection. Interactive workloads (Power BI reports, ad-hoc queries, notebook exploration) continue running normally. This ensures end users aren't impacted while the system recovers.
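
The two thresholds reduce to a small hysteresis check. A sketch with hypothetical threshold values (in the real product both are admin-set and disabled by default):

```python
REJECTION_PCT = 80   # assumed: reject new background jobs at 80% background CU
RECOVERY_PCT = 60    # assumed: let background jobs resume below 60%

def accept_background_job(bg_cu_pct_24h: float, currently_rejecting: bool) -> bool:
    """Decide whether a new background job is accepted.

    bg_cu_pct_24h: 24-hour rolling average of background CU usage as a percent
    of capacity. Interactive jobs are never rejected by this check.
    """
    if currently_rejecting:
        # Once rejecting, stay blocked until usage drops below recovery.
        return bg_cu_pct_24h < RECOVERY_PCT
    return bg_cu_pct_24h < REJECTION_PCT

print(accept_background_job(85, currently_rejecting=False))  # False
print(accept_background_job(70, currently_rejecting=True))   # False
print(accept_background_job(50, currently_rejecting=True))   # True
```

The gap between the two thresholds prevents flapping: jobs don't resume the moment usage dips just below the rejection line.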

Workspace-Level Surge Protection

Beyond capacity-wide thresholds, admins can set per-workspace CU limits for even more granular control. This prevents a single runaway workspace from monopolizing the entire capacity.

📊 CU Spend Limit

Set a maximum CU consumption per workspace as a percentage of total capacity within a rolling 24-hour window. If a workspace exceeds this, it's automatically blocked from new operations.

๐Ÿฅ Mission Critical Mode

Tag business-critical workspaces as Mission Critical to exempt them from surge protection rules. These workspaces always get resources โ€” even when other workspaces are being blocked.

🔒 Manual Blocking

Admins can manually block or unblock specific workspaces at any time — useful for emergency situations or during maintenance windows.

📈 Monitoring

Track surge protection triggers, workspace block/unblock events, and CU usage in the Capacity Metrics app and Real-Time Hub for full visibility.

How to Enable Surge Protection

Configuration Steps — Admin Portal

1. ⚙️ Admin Portal — Open Fabric Admin Portal → Capacity settings
2. 🎚️ Set Thresholds — Configure rejection & recovery % for background jobs
3. 🏢 Workspace Limits — Set per-workspace CU spend % and Mission Critical tags
4. 📊 Monitor — Track events in Capacity Metrics app & Real-Time Hub

Surge Protection Best Practices

โš ๏ธ Error to Watch For

When surge protection rejects a job, users see the error CapacityLimitExceededSurgeProtection. If this happens frequently, either increase your rejection threshold, optimize the heavy background jobs, or scale up your capacity SKU.

Without vs. With Surge Protection

| Aspect | Without Surge Protection | With Surge Protection |
|---|---|---|
| Background job acceptance | Accepted until 100% capacity → everything throttled | Rejected at admin-set threshold → interactive jobs safe |
| Interactive workload impact | Delayed or blocked alongside background jobs | Remain responsive — only background jobs are restricted |
| Recovery time | Long — entire capacity must wind down | Short — fewer jobs in the queue, faster recovery |
| Workspace isolation | None — any workspace can consume all CUs | Per-workspace CU limits prevent monopolization |
| Critical workload priority | No differentiation | Mission Critical workspaces exempt from limits |

Capacity Sizing Guide

How to right-size your Fabric capacity for new workloads and migrations — avoid over-provisioning and throttling.

Sizing Approach

Capacity sizing is not a one-time exercise — it's an iterative process. Start with an estimate, deploy, monitor, and adjust. The key principle: size for your typical load, not your peak. Fabric's bursting and smoothing mechanisms handle short spikes automatically.

Capacity Sizing Lifecycle

1. 📐 Estimate — Use the SKU Estimator and workload analysis
2. 🧪 Pilot — Run representative workloads on trial capacity
3. 📊 Monitor — Track CU consumption with Capacity Metrics app
4. ⚖️ Adjust — Scale up/down based on actual usage patterns

Sizing for New Workloads

When starting fresh with Fabric, follow this framework:

Step 1: Inventory Your Workloads

Catalog what you plan to run and the expected scale:

| Workload Type | Key Sizing Factors | CU Impact |
|---|---|---|
| Data Ingestion (Data Factory) | Data volume, number of sources, refresh frequency | Low–Medium |
| Spark Notebooks | Data volume, transformation complexity, cluster size, concurrency | High |
| Data Warehouse (T-SQL) | Query complexity, concurrent users, data volume | Medium–High |
| Power BI (Direct Lake / Import) | Dataset size, concurrent report viewers, refresh rate | Low–Medium |
| Real-Time Intelligence | Event ingestion rate, query concurrency, retention period | Medium–High |
| Data Science / ML | Model training size, experiment frequency, serving load | High |

Step 2: Use the SKU Estimator

Microsoft provides an official Fabric SKU Estimator tool. Input your expected user count, data volumes, refresh rates, and workload mix to get a recommended starting SKU.

Step 3: Start with Trial or Dev Capacity

| Scenario | Recommended Starting SKU | Rationale |
|---|---|---|
| POC / Learning (1-3 users) | F2 or Free Trial (F64) | Minimal cost; trial gives 60 days of F64 for free |
| Small team (5-10 users, <50 tables) | F4 – F8 | Enough for light Spark jobs + Power BI |
| Department (10-50 users, multiple pipelines) | F16 – F32 | Concurrent Spark + SQL + BI workloads |
| Enterprise (50+ users, cross-domain) | F64 – F128 | F64+ enables free Power BI viewing; handles concurrency |
| Large enterprise (multi-region, mission-critical) | F256+ | High concurrency, multiple domains, always-on workloads |
✅ Rule of Thumb

Add 10-15% headroom above your average expected CU usage to account for growth and unexpected spikes. It's better to start one SKU lower and scale up than to over-provision — Fabric lets you resize capacity at any time.
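
That rule of thumb is easy to encode: take the average CU usage, add headroom, and pick the smallest SKU that covers it. A sketch with an abbreviated SKU list and the 15% upper bound as the default:

```python
SKUS = [2, 4, 8, 16, 32, 64, 128, 256]   # F-SKU capacity units (abbreviated)

def recommend_sku(avg_cu: float, headroom: float = 0.15) -> str:
    """Smallest F-SKU covering average usage plus 10-15% headroom."""
    target = avg_cu * (1 + headroom)
    for cus in SKUS:
        if cus >= target:
            return f"F{cus}"
    return "F256+"

print(recommend_sku(36))    # F64: 36 CU average + 15% headroom is ~41 CU
print(recommend_sku(60))    # F128: 60 * 1.15 = 69 CU, just over F64's limit
```

Note the second case: an average that sits close to a SKU boundary tips the recommendation up a tier once headroom is added.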

Sizing for Migrations

When migrating from an existing platform, you have historical usage data to guide your sizing:

From Power BI Premium (P-SKU)

| Power BI Premium SKU | Equivalent Fabric SKU | Capacity Units |
|---|---|---|
| P1 | F64 | 64 CUs |
| P2 | F128 | 128 CUs |
| P3 | F256 | 256 CUs |
| P4 | F512 | 512 CUs |
| P5 | F1024 | 1024 CUs |

You can enable Fabric on your existing P-SKU capacity — no need to purchase a new one. The same capacity pool now supports all Fabric workloads in addition to Power BI.
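
The mapping in the table follows a simple doubling pattern starting at P1 = F64, which a one-line helper captures:

```python
def p_to_f(p_sku_number: int) -> str:
    """Map a Power BI Premium P-SKU number to its equivalent Fabric F-SKU."""
    return f"F{64 * 2 ** (p_sku_number - 1)}"

print([p_to_f(p) for p in range(1, 6)])  # ['F64', 'F128', 'F256', 'F512', 'F1024']
```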

From Azure Synapse / Databricks

From On-Premises (SQL Server / SSIS)

Ongoing Optimization

📊 Monitor with Capacity Metrics

Install the Capacity Metrics app from day one. Track CU consumption by workload type, identify throttling events, and spot optimization opportunities.

โธ๏ธ Pause Non-Production

Pause dev/test capacities outside business hours and on weekends. This alone can save 60-70% on non-production capacity costs.
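
The 60-70% figure falls straight out of the calendar. For example, assuming a dev capacity that runs only 10 hours a day on weekdays (a hypothetical schedule; adjust for yours):

```python
hours_per_week = 7 * 24        # 168 hours in a week
running_hours = 5 * 10         # assumed: 10 h/day, Monday-Friday
paused_fraction = 1 - running_hours / hours_per_week
print(f"~{paused_fraction:.0%} of pay-as-you-go compute saved")
```

Since pay-as-you-go billing stops while a capacity is paused, the paused fraction of the week translates directly into compute savings (storage continues to bill).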

🔄 Leverage Bursting & Smoothing

Fabric smooths CU consumption over 24-hour windows and allows temporary bursts above your SKU limit. Size for your average load, not peak — bursting handles spikes.

🎯 Optimize Before Scaling

Before upgrading your SKU, optimize: V-Order on tables, efficient Spark sessions, proper partitioning, and well-designed DAX. Fix the bottleneck, not the capacity.

โš ๏ธ Common Sizing Mistakes

1. Over-provisioning from day one โ€” start small, monitor, scale up. 2. Ignoring concurrency โ€” multiple users running Spark jobs simultaneously consume far more CUs than sequential runs. 3. Not using the Capacity Metrics app โ€” flying blind leads to either overspending or throttling. 4. Forgetting that F64 is the minimum for free Power BI viewer access โ€” factor this into your licensing decision.


💰 TCO / ROI Calculator

Compare your current data platform spend against Microsoft Fabric to estimate savings and build a business case.

Enter your current monthly costs for each category below. The calculator estimates the equivalent Fabric cost based on published pricing and typical migration benchmarks, then shows potential savings and a recommended SKU.
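
The underlying math is a straightforward comparison. A directional sketch with purely illustrative inputs (the spend figures, the F64 choice, and the rates are all assumptions, not recommendations or benchmarks):

```python
# Hypothetical current platform spend, USD/month
current_monthly = 8000 + 5000 + 4000 + 2000        # BI + warehouse + ETL + other

# Candidate Fabric footprint: F64 on a 1-year reservation plus 10 TB of OneLake
fabric_compute = 64 * 0.18 * 730 * (1 - 0.405)     # assumed rate and ~40% discount
fabric_storage = 10 * 1024 * 0.023                 # assumed ~$0.023/GB/month
fabric_monthly = fabric_compute + fabric_storage

savings_pct = 1 - fabric_monthly / current_monthly
print(round(fabric_monthly), f"{savings_pct:.0%}")
```

Real savings hinge on whether the candidate SKU actually absorbs the migrated workloads, which is why a pilot on trial capacity should precede any commitment.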

💡 About These Estimates

This calculator provides directional estimates based on published Fabric pricing and typical migration savings reported in Microsoft case studies. Actual savings depend on workload complexity, optimization, and usage patterns. For precise estimates, use the official SKU Estimator and run a pilot.