User Defined Functions — Microsoft Fabric Guide

👤 Who is this for?

Data Engineer Application Developer Data Architect DBA — This page covers all flavors of User Defined Functions available in Microsoft Fabric: T-SQL functions in Warehouse and SQL Database, serverless Python User Data Functions, and Spark UDFs in notebooks.

Overview

User Defined Functions in Fabric

Encapsulate reusable logic across warehouses, notebooks, pipelines, and serverless endpoints.

User Defined Functions (UDFs) let you package custom business logic so it can be reused, tested, and governed independently from the queries or pipelines that call it. Microsoft Fabric supports UDFs across three distinct engines — each with different maturity levels, syntax, and use cases.

📐 T-SQL Functions

Inline Table-Valued Functions (GA) and Scalar UDFs (Preview) in Fabric Warehouse and the SQL analytics endpoint. Full T-SQL functions available in Fabric SQL Database.

⚡ User Data Functions

A new first-class Fabric item type — serverless Python functions callable via REST, from notebooks (notebookutils.udf), pipelines, Activator, and GraphQL.

🔥 Spark UDFs

PySpark and Scala UDFs for transforming data within notebooks and Spark jobs. Session-scoped by default, shareable via Environments and libraries.

🎯 Why it matters

UDFs reduce duplication, enforce consistent business rules, simplify testing, and let platform teams provide reusable building blocks to downstream consumers.

⚠️ Maturity varies by engine

Not all UDF types are GA. User Data Functions and Scalar UDFs in Warehouse are currently in Preview. Plan accordingly for production workloads — see the status table below.

Feature Maturity Matrix

UDF Type	Engine	Status	Key Limitation
Inline Table-Valued Functions	Fabric Warehouse / SQL Endpoint	✅ GA	Single SELECT only
Scalar UDFs	Fabric Warehouse / SQL Endpoint	⚠️ Preview	Must be inlineable; max 10 per query
Full T-SQL Functions	Fabric SQL Database	✅ GA	None (Azure SQL DB parity)
User Data Functions	Data Engineering (Python serverless)	⚠️ Preview	Owner-only editing; 240s timeout
notebookutils.udf	Spark Notebooks	⚠️ Preview	Depends on User Data Functions item
Spark UDFs (PySpark/Scala)	Notebooks / Spark Jobs	✅ GA	Session-scoped; no shared registry
Pandas UDFs (vectorized)	Notebooks / Spark Jobs	✅ GA	Apache Arrow serialization required
Multi-Statement TVFs	Fabric Warehouse	❌ Not Supported	Use iTVF or Spark instead

T-SQL

T-SQL Functions in Warehouse

CREATE FUNCTION for Fabric Data Warehouse and the Lakehouse SQL analytics endpoint.

Inline Table-Valued Functions (GA)

Inline TVFs are the most performant and fully supported function type in Fabric Warehouse. They return a table result and consist of a single SELECT statement — the optimizer can inline them directly into the calling query plan.

Use Case: Regional Sales Filter

-- A retail team needs to filter sales by region across multiple reports
-- Instead of repeating the JOIN + WHERE in every query, wrap it in an iTVF

CREATE FUNCTION dbo.GetRegionalSales(@region NVARCHAR(50), @startDate DATE)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
    SELECT 
        o.OrderID,
        o.CustomerID,
        o.TotalAmount,
        o.OrderDate,
        s.StoreName,
        s.Region
    FROM dbo.Orders AS o
    INNER JOIN dbo.Stores AS s ON o.StoreID = s.StoreID
    WHERE s.Region = @region
      AND o.OrderDate >= @startDate
);

-- Usage: analysts can now query regionally without knowing the join logic
SELECT * FROM dbo.GetRegionalSales('West', '2025-01-01')
WHERE TotalAmount > 500
ORDER BY OrderDate DESC;

Scalar UDFs (Preview — Inlineable Only)

Scalar UDFs return a single value. In Fabric Warehouse, they only work when the query optimizer can inline them — meaning the function body is transformed into a subquery at compile time. Non-inlineable scalar UDFs will fail at execution.

Use Case: Dynamic Pricing Calculation

-- E-commerce team applies tiered discounts based on customer loyalty level
-- This logic is reused across order processing, reporting, and forecasting

CREATE FUNCTION dbo.CalculateDiscount(
    @basePrice DECIMAL(12,2),
    @loyaltyTier INT
)
RETURNS DECIMAL(12,2)
AS
BEGIN
    RETURN @basePrice * (
        CASE 
            WHEN @loyaltyTier >= 4 THEN 0.80  -- 20% off for platinum
            WHEN @loyaltyTier = 3  THEN 0.85  -- 15% off for gold
            WHEN @loyaltyTier = 2  THEN 0.90  -- 10% off for silver
            ELSE 1.00                          -- no discount
        END
    );
END;

-- Usage in a report query
SELECT 
    ProductName,
    ListPrice,
    dbo.CalculateDiscount(ListPrice, c.LoyaltyTier) AS FinalPrice
FROM dbo.Products p
INNER JOIN dbo.Customers c ON p.LastBuyerID = c.CustomerID;

💡 Check inlineability

Verify your scalar UDF can be inlined before relying on it in production:

Check if a UDF is inlineable

SELECT 
    b.name AS function_name, 
    a.is_inlineable
FROM sys.sql_modules a
INNER JOIN sys.objects b ON a.object_id = b.object_id
WHERE b.type = 'FN';

-- If is_inlineable = 0, the UDF will FAIL when called on user tables

What Breaks Inlineability

❌ WHILE loops

Any iterative logic makes the function non-inlineable. Use CASE expressions or mathematical formulas instead.

❌ Nondeterministic calls

GETDATE(), RAND(), NEWID() prevent inlining. Pass these as parameters from the calling query.

❌ CTEs and subqueries

Common table expressions inside scalar UDFs break the inlining engine. Keep the body as a single expression.

❌ Table variable access

Any reference to table variables or temp tables in a scalar function blocks inlining.

Key Constraints

Max 10 UDF calls per single query
Max 4 nesting levels for table-referencing UDFs (32 for pure scalar)
ORDER BY, GROUP BY, and CTEs in the calling query may cause failures when invoking scalar UDFs
ENCRYPTION and EXECUTE AS keywords are not supported
Permissions: Workspace Admin, Member, or Contributor roles required to create functions

Fabric SQL Database — Full T-SQL Parity

If you're using SQL Database in Fabric (the OLTP offering, not Warehouse), you get full Azure SQL Database parity — including multi-statement TVFs, non-inlineable scalar functions, and all T-SQL constructs. This is because SQL Database in Fabric runs on the full SQL Server engine.

Use Case: Customer Health Score (Fabric SQL Database)

-- SaaS application calculates customer health score from multiple signals
-- Multi-statement TVF aggregates across tables — only works in Fabric SQL Database

CREATE FUNCTION dbo.GetCustomerHealthScores(@tenantId INT)
RETURNS @results TABLE (
    CustomerID INT,
    HealthScore DECIMAL(5,2),
    RiskLevel NVARCHAR(20),
    LastActivity DATETIME2
)
AS
BEGIN
    INSERT INTO @results
    SELECT 
        c.CustomerID,
        -- Weighted score from engagement, support, and usage signals
        (c.EngagementScore * 0.4) + (c.NPS * 0.3) + (c.UsageIndex * 0.3) AS HealthScore,
        CASE 
            WHEN (c.EngagementScore * 0.4) + (c.NPS * 0.3) + (c.UsageIndex * 0.3) > 80 THEN 'Healthy'
            WHEN (c.EngagementScore * 0.4) + (c.NPS * 0.3) + (c.UsageIndex * 0.3) > 50 THEN 'At Risk'
            ELSE 'Critical'
        END,
        c.LastActivityDate
    FROM dbo.CustomerMetrics c
    WHERE c.TenantID = @tenantId
      AND c.IsActive = 1;
    RETURN;
END;

-- Used by the app's customer success dashboard
SELECT * FROM dbo.GetCustomerHealthScores(42) WHERE RiskLevel = 'Critical';

Python Serverless

User Data Functions

A first-class Fabric item: managed, serverless Python functions accessible across workloads.

⚠️ Preview Feature

User Data Functions are in Preview as of mid-2025 (announced at Ignite 2025). Expect limitations around ownership, regional availability, and service principal support. Not recommended for mission-critical production workloads yet.

What Are User Data Functions?

User Data Functions are a new Fabric item type under Data Engineering that hosts managed Python functions. Unlike Spark UDFs (which are session-scoped), these functions become shareable, versioned, and callable from any Fabric surface — notebooks, pipelines, Activator rules, GraphQL APIs, and external apps via REST.

🌐 REST-accessible

Each function gets an HTTP endpoint, making it callable from any application — Power Apps, Logic Apps, external microservices, or frontend apps.

📦 Shareable across workspaces

Functions are Fabric items — they can be shared, governed, and discovered via the OneLake catalog just like any other artifact.

🔌 Native integrations

Call from notebooks via notebookutils.udf, trigger from Activator rules, chain in data pipelines, or use as GraphQL authorization hooks.

⚙️ Serverless execution

No Spark cluster needed. Functions execute in a lightweight Python runtime (3.11) with a 240-second timeout and up to 30 MB response size.

Programming Model

Use Case: Product Categorization for a Retail Catalog

import fabric.functions as fn

udf = fn.UserDataFunctions()

@udf.function()
def standardize_category(productName: str, rawCategory: str) -> dict:
    """
    Retail team receives product data from multiple suppliers with inconsistent
    categories. This function normalizes them into the company's standard taxonomy.
    Called from data pipelines during ingestion and from Power Apps for manual review.
    """
    category_mapping = {
        "electronics": ["tech", "devices", "gadgets", "computing", "hardware"],
        "clothing": ["apparel", "fashion", "garments", "wear", "textiles"],
        "home": ["furniture", "household", "kitchen", "decor", "garden"],
        "food": ["grocery", "beverages", "snacks", "organic", "fresh"]
    }
    
    raw_lower = rawCategory.lower().strip()
    standardized = next(
        (standard for standard, aliases in category_mapping.items() 
         if raw_lower in aliases),
        "other"
    )
    
    return {
        "product_name": productName,
        "original_category": rawCategory,
        "standardized_category": standardized,
        "confidence": "high" if standardized != "other" else "needs_review"
    }

Use Case: Anomaly Flagging for IoT Sensor Data

import fabric.functions as fn
from datetime import datetime

udf = fn.UserDataFunctions()

@udf.function()
def detect_sensor_anomaly(
    sensorId: str, 
    temperature: float, 
    pressure: float, 
    humidity: float
) -> dict:
    """
    Manufacturing plant monitors equipment via IoT sensors. This function
    is triggered by Fabric Activator when new readings arrive, flagging
    anomalies that need operator attention before equipment damage occurs.
    """
    alerts = []
    severity = "normal"
    
    # Temperature thresholds for industrial equipment
    if temperature > 85.0:
        alerts.append(f"CRITICAL: Temperature {temperature}°C exceeds safe limit (85°C)")
        severity = "critical"
    elif temperature > 70.0:
        alerts.append(f"WARNING: Temperature {temperature}°C approaching limit")
        severity = "warning"
    
    # Pressure differential check
    if pressure < 20.0 or pressure > 150.0:
        alerts.append(f"ALERT: Pressure {pressure} PSI outside normal range (20-150)")
        severity = "critical" if severity != "critical" else severity
    
    # Humidity correlation (high humidity + high temp = corrosion risk)
    if humidity > 80.0 and temperature > 60.0:
        alerts.append("WARNING: High humidity + temperature — corrosion risk")
        severity = max(severity, "warning", key=lambda x: ["normal","warning","critical"].index(x))
    
    return {
        "sensor_id": sensorId,
        "timestamp": datetime.utcnow().isoformat(),
        "severity": severity,
        "alerts": alerts,
        "requires_action": severity in ("warning", "critical")
    }

Calling User Data Functions

From a Notebook (notebookutils.udf)

# Call the function from any Fabric notebook — no Spark cluster needed for the UDF itself
myFunctions = notebookutils.udf.getFunctions("ProductCategorizer")

# Single invocation
result = myFunctions.standardize_category(
    productName="Wireless Earbuds Pro",
    rawCategory="gadgets"
)
print(result)
# {'product_name': 'Wireless Earbuds Pro', 'standardized_category': 'electronics', 'confidence': 'high'}

# Cross-workspace access (e.g., shared library workspace)
shared = notebookutils.udf.getFunctions("ProductCategorizer", "shared-workspace-id")
result = shared.standardize_category(productName="Desk Lamp", rawCategory="decor")

From a Data Pipeline (Functions Activity)

Pipeline Design:
┌─────────────────┐     ┌──────────────────────┐     ┌──────────────────┐
│  Copy Activity  │────▶│  Functions Activity   │────▶│  Write to Lake   │
│  (Raw supplier  │     │  standardize_category │     │  (Curated layer) │
│   catalog CSV)  │     │  for each row         │     │                  │
└─────────────────┘     └──────────────────────┘     └──────────────────┘

The Functions Activity calls your User Data Function as a pipeline step,
allowing you to embed custom logic without spinning up a Spark job.

Connections and Secrets

Use Case: Enrich Data from External API with Secrets

import fabric.functions as fn
import requests

udf = fn.UserDataFunctions()

@udf.function()
@udf.connection(alias="weather-api", argName="apiConnection")
@udf.generic_connection(argName="keyVault", audienceType="KeyVault")
def enrich_with_weather(
    city: str, 
    date: str, 
    apiConnection: fn.ConnectionInfo,
    keyVault: fn.ConnectionInfo
) -> dict:
    """
    Supply chain team enriches delivery forecasts with weather data.
    API key stored in Azure Key Vault, accessed via managed connection.
    """
    api_key = apiConnection.get_secret("api-key")
    response = requests.get(
        f"https://api.weather.example.com/v1/forecast",
        params={"city": city, "date": date},
        headers={"Authorization": f"Bearer {api_key}"}
    )
    data = response.json()
    
    return {
        "city": city,
        "date": date,
        "condition": data.get("condition", "unknown"),
        "risk_level": "high" if data.get("severe_weather") else "normal"
    }

Service Limits

Limit	Value
Max request payload	4 MB
Max execution timeout	240 seconds
Max response size	30 MB
Log retention	30 days
Max private library size (.whl)	28.6 MB
Python version (Runtime)	3.11
Publish cooldown	2 minutes between publishes

Preview Limitations

Owner-only editing: Only the item owner can modify and publish functions
No service principal support: Cannot use managed identity or workspace identity
Fabric-native connections only: External SQL Server and other non-Fabric sources not yet supported via "Manage connections"
Regional limitations: Not available in all Fabric regions
Testing feature: Unavailable in Brazil South, Israel Central, Mexico Central

Spark

Spark UDFs in Notebooks

PySpark, Scala, and Pandas UDFs for in-session data transformations.

Spark UDFs are the traditional way to extend transformation logic in Fabric notebooks. They execute within your Spark session, operate row-by-row (or vectorized with Pandas UDFs), and are well-suited for data engineering pipelines at scale.

Standard PySpark UDF

Use Case: PII Masking in a Data Pipeline

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import hashlib

@udf(returnType=StringType())
def mask_email(email):
    """
    Healthcare company masks patient emails in the bronze-to-silver layer.
    The masked value is deterministic (same input = same hash) so downstream 
    joins still work, but the original email cannot be recovered.
    """
    if email is None:
        return None
    local, domain = email.split("@")
    hashed = hashlib.sha256(local.encode()).hexdigest()[:8]
    return f"{hashed}@{domain}"

# Apply during bronze → silver transformation
df_patients = spark.read.format("delta").load("Tables/bronze_patients")
df_masked = df_patients.withColumn("email", mask_email("email"))
df_masked.write.format("delta").mode("overwrite").save("Tables/silver_patients")

Pandas UDF (Vectorized) — High Performance

Standard Python UDFs process one row at a time with heavy serialization overhead. Pandas UDFs use Apache Arrow to process batches of rows as pandas Series, delivering 10–100× better performance for compute-heavy logic.

Use Case: Real-Time Feature Engineering for ML

import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

@pandas_udf(DoubleType())
def rolling_avg_spend(amounts: pd.Series) -> pd.Series:
    """
    Financial services company calculates rolling average transaction amounts
    for fraud detection features. Pandas UDF makes this 50x faster than 
    row-by-row processing on 100M+ transaction records.
    """
    return amounts.rolling(window=7, min_periods=1).mean()

# Apply to transaction history for ML feature store
df_transactions = spark.read.format("delta").load("Tables/transactions")
df_features = df_transactions.withColumn(
    "rolling_avg_7d", 
    rolling_avg_spend("amount")
)
df_features.write.format("delta").mode("overwrite").save("Tables/ml_features")

Use Case: Text Normalization at Scale

import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType
import re

@pandas_udf(StringType())
def normalize_address(addresses: pd.Series) -> pd.Series:
    """
    Logistics company normalizes millions of shipping addresses to reduce 
    duplicate deliveries. Vectorized processing handles 50M rows in minutes.
    """
    def clean(addr):
        if addr is None:
            return None
        addr = addr.upper().strip()
        addr = re.sub(r'\bSTREET\b', 'ST', addr)
        addr = re.sub(r'\bAVENUE\b', 'AVE', addr)
        addr = re.sub(r'\bBOULEVARD\b', 'BLVD', addr)
        addr = re.sub(r'\bAPARTMENT\b', 'APT', addr)
        addr = re.sub(r'\s+', ' ', addr)
        return addr
    
    return addresses.apply(clean)

df_shipments = spark.read.format("delta").load("Tables/raw_shipments")
df_clean = df_shipments.withColumn("normalized_address", normalize_address("shipping_address"))

Sharing Spark UDFs Across Notebooks

Spark UDFs are session-scoped — they don't persist across notebooks by default. Here are the recommended patterns for reuse:

📦 Fabric Environments

Package UDFs in a Python wheel (.whl) and install via a Fabric Environment. All notebooks attached to that environment can import your functions.

📓 %run magic

Use %run /path/to/shared_udfs to execute a shared notebook that registers UDFs in the current session. Simple but creates a dependency chain.

🔗 notebookutils.udf (Preview)

The new cross-notebook pattern: call User Data Functions items from any notebook via notebookutils.udf.getFunctions(). No Spark cluster needed for the function itself.

🏗️ Workspace Libraries

Upload .whl or .jar files directly to the workspace default environment. Quick mode publishes in ~5 seconds.

Packaging UDFs as a Wheel for Environment Install

# Project structure for a shared UDF library
my_fabric_udfs/
├── setup.py
├── my_udfs/
│   ├── __init__.py
│   ├── masking.py        # mask_email, mask_phone, mask_ssn
│   ├── validation.py     # validate_email, validate_phone
│   └── enrichment.py     # geocode_address, currency_convert
└── tests/
    ├── test_masking.py
    └── test_validation.py

# Build the wheel
python setup.py bdist_wheel

# Upload the .whl to your Fabric Environment → Custom Libraries
# All notebooks using that environment can then:
from my_udfs.masking import mask_email
from my_udfs.validation import validate_email

Scenarios

Real-World Use Cases

End-to-end examples showing when and how to apply each UDF type in practice.

🏥 Healthcare: Patient Data Standardization Pipeline

Scenario

A hospital system receives patient records from 12 different clinics. Each uses different formats for phone numbers, dates, and diagnosis codes. The pipeline must standardize everything before loading into the analytics lakehouse.

Step	UDF Type Used	Why
Phone normalization (50M records)	Pandas UDF	Vectorized processing for raw volume; 100x faster than row-by-row
ICD-10 code validation	User Data Function	Calls external medical coding API; needs secret management; shared across teams
Patient risk scoring (SQL reports)	T-SQL Scalar UDF	Analysts query Warehouse directly; formula embedded in familiar SQL

🏭 Manufacturing: Equipment Predictive Maintenance

Scenario

A factory floor has 500 IoT sensors streaming data every 30 seconds. The team needs real-time anomaly detection plus historical trend analysis in the warehouse for monthly reviews.

Step	UDF Type Used	Why
Real-time anomaly detection	User Data Function + Activator	Activator triggers the UDF on each batch; sub-second response; alerts operator
Rolling averages for ML features	Pandas UDF	Nightly batch over 500M sensor readings needs vectorized performance
Monthly downtime cost calculation	T-SQL iTVF	Analysts run ad-hoc queries in Warehouse; function encapsulates cost formula

🛒 Retail: Omnichannel Order Processing

Scenario

A retailer processes orders from website, mobile app, and in-store POS. Each channel sends data in different formats. Business logic for pricing, tax, and loyalty points must be consistent everywhere.

Step	UDF Type Used	Why
Loyalty discount calculation	T-SQL Scalar UDF (SQL DB) or User Data Function	Same logic used by the app backend (SQL DB) and the analytics pipeline (UDF via REST)
Product category standardization	User Data Function	Called from pipeline during ingestion AND from Power Apps for supplier onboarding
Address normalization (batch)	Pandas UDF	Millions of shipping records normalized nightly for delivery optimization
Tax calculation by jurisdiction	T-SQL iTVF (Fabric SQL Database)	Full MSTVF support in SQL DB; complex multi-state tax rules with table output

💰 Financial Services: Anti-Fraud Pipeline

Scenario

A bank processes 2M card transactions daily and needs sub-minute fraud scoring for real-time alerts, plus batch analysis for model training.

Step	UDF Type Used	Why
Real-time velocity check	User Data Function + Activator	Fires on each transaction event; checks 5 rules in 240ms; flags suspicious activity
Feature engineering (batch)	Pandas UDF	Computes rolling stats, geo-distance, time patterns across 2M daily rows
Account risk summary	T-SQL iTVF	Compliance analysts query Warehouse with account-level risk view; parameterized by date range
Model scoring endpoint	User Data Function	Wraps ML model inference; accessible from pipeline, notebook, or external fraud API

Decision Guide

Which UDF Type Should You Use?

Choose the right function approach based on your workload, team skills, and integration needs.

Requirement	T-SQL iTVF	T-SQL Scalar	User Data Function	Spark UDF	Pandas UDF
Production-ready (GA)	✅	⚠️ Preview	⚠️ Preview	✅	✅
Best for SQL teams	✅	✅	❌	❌	❌
Best for Python teams	❌	❌	✅	✅	✅
Cross-workload sharing	❌ Same DB	❌ Same DB	✅ REST + notebook	❌ Session-scoped	❌ Session-scoped
External API calls	❌	❌	✅ (with connections)	✅ (with libraries)	✅ (with libraries)
High-volume batch (10M+ rows)	✅ (set-based)	⚠️ Max 10/query	❌ (per-call model)	⚠️ Slow	✅ (vectorized)
Real-time event response	❌	❌	✅ (Activator)	❌	❌
No Spark cluster needed	✅	✅	✅	❌	❌
Secret/connection mgmt	❌	❌	✅ Key Vault	Manual	Manual
Git integration / CI/CD	✅ (DDL in Git)	✅ (DDL in Git)	✅ (Fabric Git)	Via Environment	Via Environment

Quick Decision Flowchart

Decision Logic

START: What's your primary need?
│
├─ "Transform data inside a SQL query" 
│   ├─ Returns a table? ──────────────▶ T-SQL Inline TVF (GA)
│   └─ Returns a single value? ───────▶ T-SQL Scalar UDF (Preview, inlineable only)
│
├─ "Share reusable logic across multiple Fabric items"
│   ├─ Need REST endpoint? ───────────▶ User Data Function (Preview)
│   ├─ Need Activator trigger? ───────▶ User Data Function (Preview)
│   └─ Only notebooks? ──────────────▶ Environment + .whl package (GA)
│
├─ "Process millions of rows in a Spark pipeline"
│   ├─ Simple row logic? ─────────────▶ Standard PySpark UDF (GA)
│   └─ Vectorizable computation? ─────▶ Pandas UDF (GA, 10-100x faster)
│
└─ "Full T-SQL function support (MSTVF, triggers, etc.)"
    └─ ───────────────────────────────▶ Fabric SQL Database (GA, full Azure SQL parity)

💡 Practical guidance

If you're starting fresh today and need production reliability, use T-SQL iTVFs for SQL workloads and Pandas UDFs for Spark pipelines. If you need cross-workload sharing and are willing to adopt Preview features, User Data Functions is the strategic direction Microsoft is investing in.

Guidance

Best Practices

Design, test, and govern UDFs for reliability and performance across all Fabric engines.

Performance

Prefer set-based over row-by-row: In SQL, an iTVF that processes data in bulk will always outperform calling a scalar UDF 10M times. In Spark, use Pandas UDFs over standard UDFs for 10–100× improvement.
Keep scalar UDFs simple: Avoid loops, CTEs, and nondeterministic calls. The simpler the body, the more likely it can be inlined by the Warehouse optimizer.
Batch calls to User Data Functions: If processing many items, design the function to accept batches (list input) rather than calling once per row. The 4 MB request payload allows significant batch sizes.
Avoid Python UDFs on large DataFrames when built-in functions exist: Before writing a UDF, check if pyspark.sql.functions already has what you need — built-ins avoid serialization overhead entirely.

Testing

T-SQL functions: Test with known inputs in a dev workspace before promoting. Use Fabric deployment pipelines (Dev → Test → Prod) to ensure consistency.
User Data Functions: Use the built-in Test pane in the Fabric portal or VS Code extension. Design functions to be pure (deterministic output for given input) for easy unit testing.
Spark UDFs: Package UDFs in a wheel with pytest tests. Run tests locally before uploading to the Fabric Environment. Test with edge cases: nulls, empty strings, Unicode, extreme values.

Governance & Naming

Naming conventions: Use clear, descriptive names — dbo.CalculateShippingCost not dbo.fn1. For User Data Functions, the item name is the entry point, so make it self-documenting.
Document parameters: Add docstrings (Python) or comments (T-SQL) explaining what each parameter represents, valid ranges, and expected output format.
Version control: T-SQL functions should be in Git via Fabric Git integration. Python UDFs belong in wheel packages with semantic versioning. User Data Functions support Fabric Git and deployment pipelines.
Centralize shared functions: Create a dedicated "Shared Functions" workspace for organization-wide UDFs. Grant read access broadly, write access narrowly.

Security

Principle of least privilege: Only Workspace Admin/Member/Contributor should create T-SQL functions. Use workspace roles to control who can publish UDFs.
Never hardcode secrets: Use Azure Key Vault connections in User Data Functions. In Spark, use notebookutils.credentials or linked services.
Validate inputs: Treat all UDF parameters as untrusted — validate types, ranges, and lengths before processing to prevent injection or unexpected behavior.

Anti-Patterns to Avoid

❌ Scalar UDF in WHERE clause

Calling a scalar UDF in a filter forces row-by-row evaluation. Move the logic into the SELECT or use a CTE with pre-computation.

❌ Python UDF for simple string ops

Using a Python UDF for upper(), trim(), or concat() is 100x slower than native Spark functions. Check pyspark.sql.functions first.

❌ Large closures in Spark UDFs

Capturing large objects (DataFrames, ML models, big dictionaries) in UDF closures causes massive serialization overhead per task. Use broadcast variables instead.

❌ UDF as a service replacement

User Data Functions have a 240s timeout and 30 MB response. Don't use them as a full microservice — they're for stateless transformations, not long-running processes.

Resources

Official documentation and reference material for UDFs in Microsoft Fabric.

📌 Preview features evolve rapidly

User Data Functions and Scalar UDFs in Warehouse are actively evolving. Check the official documentation and Fabric What's New page before finalizing production designs that depend on Preview capabilities.

User Defined Functions in Fabric

📐 T-SQL Functions

⚡ User Data Functions

🔥 Spark UDFs

🎯 Why it matters

Feature Maturity Matrix

T-SQL Functions in Warehouse

Inline Table-Valued Functions (GA)

Scalar UDFs (Preview — Inlineable Only)

What Breaks Inlineability

❌ WHILE loops

❌ Nondeterministic calls

❌ CTEs and subqueries

❌ Table variable access

Key Constraints

Fabric SQL Database — Full T-SQL Parity

User Data Functions

What Are User Data Functions?

🌐 REST-accessible

📦 Shareable across workspaces

🔌 Native integrations

⚙️ Serverless execution

Programming Model

Calling User Data Functions

Connections and Secrets

Service Limits

Preview Limitations

Spark UDFs in Notebooks

Standard PySpark UDF

Pandas UDF (Vectorized) — High Performance

Sharing Spark UDFs Across Notebooks

📦 Fabric Environments

📓 %run magic

🔗 notebookutils.udf (Preview)

🏗️ Workspace Libraries

Real-World Use Cases

🏥 Healthcare: Patient Data Standardization Pipeline

🏭 Manufacturing: Equipment Predictive Maintenance

🛒 Retail: Omnichannel Order Processing

💰 Financial Services: Anti-Fraud Pipeline

Which UDF Type Should You Use?

Quick Decision Flowchart

Best Practices

Performance

Testing

Governance & Naming

Security

Anti-Patterns to Avoid

❌ Scalar UDF in WHERE clause

❌ Python UDF for simple string ops

❌ Large closures in Spark UDFs

❌ UDF as a service replacement

Resources

📚 T-SQL Functions

⚡ User Data Functions

🔥 Spark UDFs

📎 Related Pages in This Guide