πŸ‘€ Who is this for?

Data Engineer Application Developer Data Architect DBA β€” This page covers all flavors of User Defined Functions available in Microsoft Fabric: T-SQL functions in Warehouse and SQL Database, serverless Python User Data Functions, and Spark UDFs in notebooks.

Overview

User Defined Functions in Fabric

Encapsulate reusable logic across warehouses, notebooks, pipelines, and serverless endpoints.

User Defined Functions (UDFs) let you package custom business logic so it can be reused, tested, and governed independently from the queries or pipelines that call it. Microsoft Fabric supports UDFs across three distinct engines β€” each with different maturity levels, syntax, and use cases.

πŸ“ T-SQL Functions

Inline Table-Valued Functions (GA) and Scalar UDFs (Preview) in Fabric Warehouse and the SQL analytics endpoint. Full T-SQL functions available in Fabric SQL Database.

⚑ User Data Functions

A new first-class Fabric item type β€” serverless Python functions callable via REST, from notebooks (notebookutils.udf), pipelines, Activator, and GraphQL.

πŸ”₯ Spark UDFs

PySpark and Scala UDFs for transforming data within notebooks and Spark jobs. Session-scoped by default, shareable via Environments and libraries.

🎯 Why it matters

UDFs reduce duplication, enforce consistent business rules, simplify testing, and let platform teams provide reusable building blocks to downstream consumers.

⚠️ Maturity varies by engine

Not all UDF types are GA. User Data Functions and Scalar UDFs in Warehouse are currently in Preview. Plan accordingly for production workloads β€” see the status table below.

Feature Maturity Matrix

UDF TypeEngineStatusKey Limitation
Inline Table-Valued FunctionsFabric Warehouse / SQL Endpointβœ… GASingle SELECT only
Scalar UDFsFabric Warehouse / SQL Endpoint⚠️ PreviewMust be inlineable; max 10 per query
Full T-SQL FunctionsFabric SQL Databaseβœ… GANone (Azure SQL DB parity)
User Data FunctionsData Engineering (Python serverless)⚠️ PreviewOwner-only editing; 240s timeout
notebookutils.udfSpark Notebooks⚠️ PreviewDepends on User Data Functions item
Spark UDFs (PySpark/Scala)Notebooks / Spark Jobsβœ… GASession-scoped; no shared registry
Pandas UDFs (vectorized)Notebooks / Spark Jobsβœ… GAApache Arrow serialization required
Multi-Statement TVFsFabric Warehouse❌ Not SupportedUse iTVF or Spark instead
T-SQL

T-SQL Functions in Warehouse

CREATE FUNCTION for Fabric Data Warehouse and the Lakehouse SQL analytics endpoint.

Inline Table-Valued Functions (GA)

Inline TVFs are the most performant and fully supported function type in Fabric Warehouse. They return a table result and consist of a single SELECT statement β€” the optimizer can inline them directly into the calling query plan.

Use Case: Regional Sales Filter
-- A retail team needs to filter sales by region across multiple reports
-- Instead of repeating the JOIN + WHERE in every query, wrap it in an iTVF

CREATE FUNCTION dbo.GetRegionalSales(@region NVARCHAR(50), @startDate DATE)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
    SELECT 
        o.OrderID,
        o.CustomerID,
        o.TotalAmount,
        o.OrderDate,
        s.StoreName,
        s.Region
    FROM dbo.Orders AS o
    INNER JOIN dbo.Stores AS s ON o.StoreID = s.StoreID
    WHERE s.Region = @region
      AND o.OrderDate >= @startDate
);

-- Usage: analysts can now query regionally without knowing the join logic
SELECT * FROM dbo.GetRegionalSales('West', '2025-01-01')
WHERE TotalAmount > 500
ORDER BY OrderDate DESC;

Scalar UDFs (Preview β€” Inlineable Only)

Scalar UDFs return a single value. In Fabric Warehouse, they only work when the query optimizer can inline them β€” meaning the function body is transformed into a subquery at compile time. Non-inlineable scalar UDFs will fail at execution.

Use Case: Dynamic Pricing Calculation
-- E-commerce team applies tiered discounts based on customer loyalty level
-- This logic is reused across order processing, reporting, and forecasting

CREATE FUNCTION dbo.CalculateDiscount(
    @basePrice DECIMAL(12,2),
    @loyaltyTier INT
)
RETURNS DECIMAL(12,2)
AS
BEGIN
    RETURN @basePrice * (
        CASE 
            WHEN @loyaltyTier >= 4 THEN 0.80  -- 20% off for platinum
            WHEN @loyaltyTier = 3  THEN 0.85  -- 15% off for gold
            WHEN @loyaltyTier = 2  THEN 0.90  -- 10% off for silver
            ELSE 1.00                          -- no discount
        END
    );
END;

-- Usage in a report query
SELECT 
    ProductName,
    ListPrice,
    dbo.CalculateDiscount(ListPrice, c.LoyaltyTier) AS FinalPrice
FROM dbo.Products p
INNER JOIN dbo.Customers c ON p.LastBuyerID = c.CustomerID;
πŸ’‘ Check inlineability

Verify your scalar UDF can be inlined before relying on it in production:

Check if a UDF is inlineable
SELECT 
    b.name AS function_name, 
    a.is_inlineable
FROM sys.sql_modules a
INNER JOIN sys.objects b ON a.object_id = b.object_id
WHERE b.type = 'FN';

-- If is_inlineable = 0, the UDF will FAIL when called on user tables

What Breaks Inlineability

❌ WHILE loops

Any iterative logic makes the function non-inlineable. Use CASE expressions or mathematical formulas instead.

❌ Nondeterministic calls

GETDATE(), RAND(), NEWID() prevent inlining. Pass these as parameters from the calling query.

❌ CTEs and subqueries

Common table expressions inside scalar UDFs break the inlining engine. Keep the body as a single expression.

❌ Table variable access

Any reference to table variables or temp tables in a scalar function blocks inlining.

Key Constraints

Fabric SQL Database β€” Full T-SQL Parity

If you're using SQL Database in Fabric (the OLTP offering, not Warehouse), you get full Azure SQL Database parity β€” including multi-statement TVFs, non-inlineable scalar functions, and all T-SQL constructs. This is because SQL Database in Fabric runs on the full SQL Server engine.

Use Case: Customer Health Score (Fabric SQL Database)
-- SaaS application calculates customer health score from multiple signals
-- Multi-statement TVF aggregates across tables β€” only works in Fabric SQL Database

CREATE FUNCTION dbo.GetCustomerHealthScores(@tenantId INT)
RETURNS @results TABLE (
    CustomerID INT,
    HealthScore DECIMAL(5,2),
    RiskLevel NVARCHAR(20),
    LastActivity DATETIME2
)
AS
BEGIN
    INSERT INTO @results
    SELECT 
        c.CustomerID,
        -- Weighted score from engagement, support, and usage signals
        (c.EngagementScore * 0.4) + (c.NPS * 0.3) + (c.UsageIndex * 0.3) AS HealthScore,
        CASE 
            WHEN (c.EngagementScore * 0.4) + (c.NPS * 0.3) + (c.UsageIndex * 0.3) > 80 THEN 'Healthy'
            WHEN (c.EngagementScore * 0.4) + (c.NPS * 0.3) + (c.UsageIndex * 0.3) > 50 THEN 'At Risk'
            ELSE 'Critical'
        END,
        c.LastActivityDate
    FROM dbo.CustomerMetrics c
    WHERE c.TenantID = @tenantId
      AND c.IsActive = 1;
    RETURN;
END;

-- Used by the app's customer success dashboard
SELECT * FROM dbo.GetCustomerHealthScores(42) WHERE RiskLevel = 'Critical';
Python Serverless

User Data Functions

A first-class Fabric item: managed, serverless Python functions accessible across workloads.

⚠️ Preview Feature

User Data Functions are in Preview as of mid-2025 (announced at Ignite 2025). Expect limitations around ownership, regional availability, and service principal support. Not recommended for mission-critical production workloads yet.

What Are User Data Functions?

User Data Functions are a new Fabric item type under Data Engineering that hosts managed Python functions. Unlike Spark UDFs (which are session-scoped), these functions become shareable, versioned, and callable from any Fabric surface β€” notebooks, pipelines, Activator rules, GraphQL APIs, and external apps via REST.

🌐 REST-accessible

Each function gets an HTTP endpoint, making it callable from any application β€” Power Apps, Logic Apps, external microservices, or frontend apps.

πŸ“¦ Shareable across workspaces

Functions are Fabric items β€” they can be shared, governed, and discovered via the OneLake catalog just like any other artifact.

πŸ”Œ Native integrations

Call from notebooks via notebookutils.udf, trigger from Activator rules, chain in data pipelines, or use as GraphQL authorization hooks.

βš™οΈ Serverless execution

No Spark cluster needed. Functions execute in a lightweight Python runtime (3.11) with a 240-second timeout and up to 30 MB response size.

Programming Model

Use Case: Product Categorization for a Retail Catalog
import fabric.functions as fn

udf = fn.UserDataFunctions()

@udf.function()
def standardize_category(productName: str, rawCategory: str) -> dict:
    """
    Retail team receives product data from multiple suppliers with inconsistent
    categories. This function normalizes them into the company's standard taxonomy.
    Called from data pipelines during ingestion and from Power Apps for manual review.
    """
    category_mapping = {
        "electronics": ["tech", "devices", "gadgets", "computing", "hardware"],
        "clothing": ["apparel", "fashion", "garments", "wear", "textiles"],
        "home": ["furniture", "household", "kitchen", "decor", "garden"],
        "food": ["grocery", "beverages", "snacks", "organic", "fresh"]
    }
    
    raw_lower = rawCategory.lower().strip()
    standardized = next(
        (standard for standard, aliases in category_mapping.items() 
         if raw_lower in aliases),
        "other"
    )
    
    return {
        "product_name": productName,
        "original_category": rawCategory,
        "standardized_category": standardized,
        "confidence": "high" if standardized != "other" else "needs_review"
    }
Use Case: Anomaly Flagging for IoT Sensor Data
import fabric.functions as fn
from datetime import datetime

udf = fn.UserDataFunctions()

@udf.function()
def detect_sensor_anomaly(
    sensorId: str, 
    temperature: float, 
    pressure: float, 
    humidity: float
) -> dict:
    """
    Manufacturing plant monitors equipment via IoT sensors. This function
    is triggered by Fabric Activator when new readings arrive, flagging
    anomalies that need operator attention before equipment damage occurs.
    """
    alerts = []
    severity = "normal"
    
    # Temperature thresholds for industrial equipment
    if temperature > 85.0:
        alerts.append(f"CRITICAL: Temperature {temperature}Β°C exceeds safe limit (85Β°C)")
        severity = "critical"
    elif temperature > 70.0:
        alerts.append(f"WARNING: Temperature {temperature}Β°C approaching limit")
        severity = "warning"
    
    # Pressure differential check
    if pressure < 20.0 or pressure > 150.0:
        alerts.append(f"ALERT: Pressure {pressure} PSI outside normal range (20-150)")
        severity = "critical" if severity != "critical" else severity
    
    # Humidity correlation (high humidity + high temp = corrosion risk)
    if humidity > 80.0 and temperature > 60.0:
        alerts.append("WARNING: High humidity + temperature β€” corrosion risk")
        severity = max(severity, "warning", key=lambda x: ["normal","warning","critical"].index(x))
    
    return {
        "sensor_id": sensorId,
        "timestamp": datetime.utcnow().isoformat(),
        "severity": severity,
        "alerts": alerts,
        "requires_action": severity in ("warning", "critical")
    }

Calling User Data Functions

From a Notebook (notebookutils.udf)
# Call the function from any Fabric notebook β€” no Spark cluster needed for the UDF itself
myFunctions = notebookutils.udf.getFunctions("ProductCategorizer")

# Single invocation
result = myFunctions.standardize_category(
    productName="Wireless Earbuds Pro",
    rawCategory="gadgets"
)
print(result)
# {'product_name': 'Wireless Earbuds Pro', 'standardized_category': 'electronics', 'confidence': 'high'}

# Cross-workspace access (e.g., shared library workspace)
shared = notebookutils.udf.getFunctions("ProductCategorizer", "shared-workspace-id")
result = shared.standardize_category(productName="Desk Lamp", rawCategory="decor")
From a Data Pipeline (Functions Activity)
Pipeline Design:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Copy Activity  │────▢│  Functions Activity   │────▢│  Write to Lake   β”‚
β”‚  (Raw supplier  β”‚     β”‚  standardize_category β”‚     β”‚  (Curated layer) β”‚
β”‚   catalog CSV)  β”‚     β”‚  for each row         β”‚     β”‚                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The Functions Activity calls your User Data Function as a pipeline step,
allowing you to embed custom logic without spinning up a Spark job.

Connections and Secrets

Use Case: Enrich Data from External API with Secrets
import fabric.functions as fn
import requests

udf = fn.UserDataFunctions()

@udf.function()
@udf.connection(alias="weather-api", argName="apiConnection")
@udf.generic_connection(argName="keyVault", audienceType="KeyVault")
def enrich_with_weather(
    city: str, 
    date: str, 
    apiConnection: fn.ConnectionInfo,
    keyVault: fn.ConnectionInfo
) -> dict:
    """
    Supply chain team enriches delivery forecasts with weather data.
    API key stored in Azure Key Vault, accessed via managed connection.
    """
    api_key = apiConnection.get_secret("api-key")
    response = requests.get(
        f"https://api.weather.example.com/v1/forecast",
        params={"city": city, "date": date},
        headers={"Authorization": f"Bearer {api_key}"}
    )
    data = response.json()
    
    return {
        "city": city,
        "date": date,
        "condition": data.get("condition", "unknown"),
        "risk_level": "high" if data.get("severe_weather") else "normal"
    }

Service Limits

LimitValue
Max request payload4 MB
Max execution timeout240 seconds
Max response size30 MB
Log retention30 days
Max private library size (.whl)28.6 MB
Python version (Runtime)3.11
Publish cooldown2 minutes between publishes

Preview Limitations

Spark

Spark UDFs in Notebooks

PySpark, Scala, and Pandas UDFs for in-session data transformations.

Spark UDFs are the traditional way to extend transformation logic in Fabric notebooks. They execute within your Spark session, operate row-by-row (or vectorized with Pandas UDFs), and are well-suited for data engineering pipelines at scale.

Standard PySpark UDF

Use Case: PII Masking in a Data Pipeline
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import hashlib

@udf(returnType=StringType())
def mask_email(email):
    """
    Healthcare company masks patient emails in the bronze-to-silver layer.
    The masked value is deterministic (same input = same hash) so downstream 
    joins still work, but the original email cannot be recovered.
    """
    if email is None:
        return None
    local, domain = email.split("@")
    hashed = hashlib.sha256(local.encode()).hexdigest()[:8]
    return f"{hashed}@{domain}"

# Apply during bronze β†’ silver transformation
df_patients = spark.read.format("delta").load("Tables/bronze_patients")
df_masked = df_patients.withColumn("email", mask_email("email"))
df_masked.write.format("delta").mode("overwrite").save("Tables/silver_patients")

Pandas UDF (Vectorized) β€” High Performance

Standard Python UDFs process one row at a time with heavy serialization overhead. Pandas UDFs use Apache Arrow to process batches of rows as pandas Series, delivering 10–100Γ— better performance for compute-heavy logic.

Use Case: Real-Time Feature Engineering for ML
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

@pandas_udf(DoubleType())
def rolling_avg_spend(amounts: pd.Series) -> pd.Series:
    """
    Financial services company calculates rolling average transaction amounts
    for fraud detection features. Pandas UDF makes this 50x faster than 
    row-by-row processing on 100M+ transaction records.
    """
    return amounts.rolling(window=7, min_periods=1).mean()

# Apply to transaction history for ML feature store
df_transactions = spark.read.format("delta").load("Tables/transactions")
df_features = df_transactions.withColumn(
    "rolling_avg_7d", 
    rolling_avg_spend("amount")
)
df_features.write.format("delta").mode("overwrite").save("Tables/ml_features")
Use Case: Text Normalization at Scale
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType
import re

@pandas_udf(StringType())
def normalize_address(addresses: pd.Series) -> pd.Series:
    """
    Logistics company normalizes millions of shipping addresses to reduce 
    duplicate deliveries. Vectorized processing handles 50M rows in minutes.
    """
    def clean(addr):
        if addr is None:
            return None
        addr = addr.upper().strip()
        addr = re.sub(r'\bSTREET\b', 'ST', addr)
        addr = re.sub(r'\bAVENUE\b', 'AVE', addr)
        addr = re.sub(r'\bBOULEVARD\b', 'BLVD', addr)
        addr = re.sub(r'\bAPARTMENT\b', 'APT', addr)
        addr = re.sub(r'\s+', ' ', addr)
        return addr
    
    return addresses.apply(clean)

df_shipments = spark.read.format("delta").load("Tables/raw_shipments")
df_clean = df_shipments.withColumn("normalized_address", normalize_address("shipping_address"))

Sharing Spark UDFs Across Notebooks

Spark UDFs are session-scoped β€” they don't persist across notebooks by default. Here are the recommended patterns for reuse:

πŸ“¦ Fabric Environments

Package UDFs in a Python wheel (.whl) and install via a Fabric Environment. All notebooks attached to that environment can import your functions.

πŸ““ %run magic

Use %run /path/to/shared_udfs to execute a shared notebook that registers UDFs in the current session. Simple but creates a dependency chain.

πŸ”— notebookutils.udf (Preview)

The new cross-notebook pattern: call User Data Functions items from any notebook via notebookutils.udf.getFunctions(). No Spark cluster needed for the function itself.

πŸ—οΈ Workspace Libraries

Upload .whl or .jar files directly to the workspace default environment. Quick mode publishes in ~5 seconds.

Packaging UDFs as a Wheel for Environment Install
# Project structure for a shared UDF library
my_fabric_udfs/
β”œβ”€β”€ setup.py
β”œβ”€β”€ my_udfs/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ masking.py        # mask_email, mask_phone, mask_ssn
β”‚   β”œβ”€β”€ validation.py     # validate_email, validate_phone
β”‚   └── enrichment.py     # geocode_address, currency_convert
└── tests/
    β”œβ”€β”€ test_masking.py
    └── test_validation.py

# Build the wheel
python setup.py bdist_wheel

# Upload the .whl to your Fabric Environment β†’ Custom Libraries
# All notebooks using that environment can then:
from my_udfs.masking import mask_email
from my_udfs.validation import validate_email
Scenarios

Real-World Use Cases

End-to-end examples showing when and how to apply each UDF type in practice.

πŸ₯ Healthcare: Patient Data Standardization Pipeline

Scenario

A hospital system receives patient records from 12 different clinics. Each uses different formats for phone numbers, dates, and diagnosis codes. The pipeline must standardize everything before loading into the analytics lakehouse.

StepUDF Type UsedWhy
Phone normalization (50M records)Pandas UDFVectorized processing for raw volume; 100x faster than row-by-row
ICD-10 code validationUser Data FunctionCalls external medical coding API; needs secret management; shared across teams
Patient risk scoring (SQL reports)T-SQL Scalar UDFAnalysts query Warehouse directly; formula embedded in familiar SQL

🏭 Manufacturing: Equipment Predictive Maintenance

Scenario

A factory floor has 500 IoT sensors streaming data every 30 seconds. The team needs real-time anomaly detection plus historical trend analysis in the warehouse for monthly reviews.

StepUDF Type UsedWhy
Real-time anomaly detectionUser Data Function + ActivatorActivator triggers the UDF on each batch; sub-second response; alerts operator
Rolling averages for ML featuresPandas UDFNightly batch over 500M sensor readings needs vectorized performance
Monthly downtime cost calculationT-SQL iTVFAnalysts run ad-hoc queries in Warehouse; function encapsulates cost formula

πŸ›’ Retail: Omnichannel Order Processing

Scenario

A retailer processes orders from website, mobile app, and in-store POS. Each channel sends data in different formats. Business logic for pricing, tax, and loyalty points must be consistent everywhere.

StepUDF Type UsedWhy
Loyalty discount calculationT-SQL Scalar UDF (SQL DB) or User Data FunctionSame logic used by the app backend (SQL DB) and the analytics pipeline (UDF via REST)
Product category standardizationUser Data FunctionCalled from pipeline during ingestion AND from Power Apps for supplier onboarding
Address normalization (batch)Pandas UDFMillions of shipping records normalized nightly for delivery optimization
Tax calculation by jurisdictionT-SQL iTVF (Fabric SQL Database)Full MSTVF support in SQL DB; complex multi-state tax rules with table output

πŸ’° Financial Services: Anti-Fraud Pipeline

Scenario

A bank processes 2M card transactions daily and needs sub-minute fraud scoring for real-time alerts, plus batch analysis for model training.

StepUDF Type UsedWhy
Real-time velocity checkUser Data Function + ActivatorFires on each transaction event; checks 5 rules in 240ms; flags suspicious activity
Feature engineering (batch)Pandas UDFComputes rolling stats, geo-distance, time patterns across 2M daily rows
Account risk summaryT-SQL iTVFCompliance analysts query Warehouse with account-level risk view; parameterized by date range
Model scoring endpointUser Data FunctionWraps ML model inference; accessible from pipeline, notebook, or external fraud API
Decision Guide

Which UDF Type Should You Use?

Choose the right function approach based on your workload, team skills, and integration needs.

Requirement T-SQL iTVF T-SQL Scalar User Data Function Spark UDF Pandas UDF
Production-ready (GA)βœ…βš οΈ Preview⚠️ Previewβœ…βœ…
Best for SQL teamsβœ…βœ…βŒβŒβŒ
Best for Python teamsβŒβŒβœ…βœ…βœ…
Cross-workload sharing❌ Same DB❌ Same DBβœ… REST + notebook❌ Session-scoped❌ Session-scoped
External API callsβŒβŒβœ… (with connections)βœ… (with libraries)βœ… (with libraries)
High-volume batch (10M+ rows)βœ… (set-based)⚠️ Max 10/query❌ (per-call model)⚠️ Slowβœ… (vectorized)
Real-time event responseβŒβŒβœ… (Activator)❌❌
No Spark cluster neededβœ…βœ…βœ…βŒβŒ
Secret/connection mgmtβŒβŒβœ… Key VaultManualManual
Git integration / CI/CDβœ… (DDL in Git)βœ… (DDL in Git)βœ… (Fabric Git)Via EnvironmentVia Environment

Quick Decision Flowchart

Decision Logic
START: What's your primary need?
β”‚
β”œβ”€ "Transform data inside a SQL query" 
β”‚   β”œβ”€ Returns a table? ──────────────▢ T-SQL Inline TVF (GA)
β”‚   └─ Returns a single value? ───────▢ T-SQL Scalar UDF (Preview, inlineable only)
β”‚
β”œβ”€ "Share reusable logic across multiple Fabric items"
β”‚   β”œβ”€ Need REST endpoint? ───────────▢ User Data Function (Preview)
β”‚   β”œβ”€ Need Activator trigger? ───────▢ User Data Function (Preview)
β”‚   └─ Only notebooks? ──────────────▢ Environment + .whl package (GA)
β”‚
β”œβ”€ "Process millions of rows in a Spark pipeline"
β”‚   β”œβ”€ Simple row logic? ─────────────▢ Standard PySpark UDF (GA)
β”‚   └─ Vectorizable computation? ─────▢ Pandas UDF (GA, 10-100x faster)
β”‚
└─ "Full T-SQL function support (MSTVF, triggers, etc.)"
    └─ ───────────────────────────────▢ Fabric SQL Database (GA, full Azure SQL parity)
πŸ’‘ Practical guidance

If you're starting fresh today and need production reliability, use T-SQL iTVFs for SQL workloads and Pandas UDFs for Spark pipelines. If you need cross-workload sharing and are willing to adopt Preview features, User Data Functions is the strategic direction Microsoft is investing in.

Guidance

Best Practices

Design, test, and govern UDFs for reliability and performance across all Fabric engines.

Performance

Testing

Governance & Naming

Security

Anti-Patterns to Avoid

❌ Scalar UDF in WHERE clause

Calling a scalar UDF in a filter forces row-by-row evaluation. Move the logic into the SELECT or use a CTE with pre-computation.

❌ Python UDF for simple string ops

Using a Python UDF for upper(), trim(), or concat() is 100x slower than native Spark functions. Check pyspark.sql.functions first.

❌ Large closures in Spark UDFs

Capturing large objects (DataFrames, ML models, big dictionaries) in UDF closures causes massive serialization overhead per task. Use broadcast variables instead.

❌ UDF as a service replacement

User Data Functions have a 240s timeout and 30 MB response. Don't use them as a full microservice β€” they're for stateless transformations, not long-running processes.

Resources

Resources

Official documentation and reference material for UDFs in Microsoft Fabric.

πŸ“Œ Preview features evolve rapidly

User Data Functions and Scalar UDFs in Warehouse are actively evolving. Check the official documentation and Fabric What's New page before finalizing production designs that depend on Preview capabilities.