Data Engineer Application Developer Data Architect DBA β This page covers all flavors of User Defined Functions available in Microsoft Fabric: T-SQL functions in Warehouse and SQL Database, serverless Python User Data Functions, and Spark UDFs in notebooks.
User Defined Functions in Fabric
Encapsulate reusable logic across warehouses, notebooks, pipelines, and serverless endpoints.
User Defined Functions (UDFs) let you package custom business logic so it can be reused, tested, and governed independently from the queries or pipelines that call it. Microsoft Fabric supports UDFs across three distinct engines β each with different maturity levels, syntax, and use cases.
π T-SQL Functions
Inline Table-Valued Functions (GA) and Scalar UDFs (Preview) in Fabric Warehouse and the SQL analytics endpoint. Full T-SQL functions available in Fabric SQL Database.
β‘ User Data Functions
A new first-class Fabric item type β serverless Python functions callable via REST, from notebooks (notebookutils.udf), pipelines, Activator, and GraphQL.
π₯ Spark UDFs
PySpark and Scala UDFs for transforming data within notebooks and Spark jobs. Session-scoped by default, shareable via Environments and libraries.
π― Why it matters
UDFs reduce duplication, enforce consistent business rules, simplify testing, and let platform teams provide reusable building blocks to downstream consumers.
Not all UDF types are GA. User Data Functions and Scalar UDFs in Warehouse are currently in Preview. Plan accordingly for production workloads β see the status table below.
Feature Maturity Matrix
| UDF Type | Engine | Status | Key Limitation |
|---|---|---|---|
| Inline Table-Valued Functions | Fabric Warehouse / SQL Endpoint | β GA | Single SELECT only |
| Scalar UDFs | Fabric Warehouse / SQL Endpoint | β οΈ Preview | Must be inlineable; max 10 per query |
| Full T-SQL Functions | Fabric SQL Database | β GA | None (Azure SQL DB parity) |
| User Data Functions | Data Engineering (Python serverless) | β οΈ Preview | Owner-only editing; 240s timeout |
| notebookutils.udf | Spark Notebooks | β οΈ Preview | Depends on User Data Functions item |
| Spark UDFs (PySpark/Scala) | Notebooks / Spark Jobs | β GA | Session-scoped; no shared registry |
| Pandas UDFs (vectorized) | Notebooks / Spark Jobs | β GA | Apache Arrow serialization required |
| Multi-Statement TVFs | Fabric Warehouse | β Not Supported | Use iTVF or Spark instead |
T-SQL Functions in Warehouse
CREATE FUNCTION for Fabric Data Warehouse and the Lakehouse SQL analytics endpoint.
Inline Table-Valued Functions (GA)
Inline TVFs are the most performant and fully supported function type in Fabric Warehouse. They return a table result and consist of a single SELECT statement β the optimizer can inline them directly into the calling query plan.
-- A retail team needs to filter sales by region across multiple reports
-- Instead of repeating the JOIN + WHERE in every query, wrap it in an iTVF
CREATE FUNCTION dbo.GetRegionalSales(@region NVARCHAR(50), @startDate DATE)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
SELECT
o.OrderID,
o.CustomerID,
o.TotalAmount,
o.OrderDate,
s.StoreName,
s.Region
FROM dbo.Orders AS o
INNER JOIN dbo.Stores AS s ON o.StoreID = s.StoreID
WHERE s.Region = @region
AND o.OrderDate >= @startDate
);
-- Usage: analysts can now query regionally without knowing the join logic
SELECT * FROM dbo.GetRegionalSales('West', '2025-01-01')
WHERE TotalAmount > 500
ORDER BY OrderDate DESC;
Scalar UDFs (Preview β Inlineable Only)
Scalar UDFs return a single value. In Fabric Warehouse, they only work when the query optimizer can inline them β meaning the function body is transformed into a subquery at compile time. Non-inlineable scalar UDFs will fail at execution.
-- E-commerce team applies tiered discounts based on customer loyalty level
-- This logic is reused across order processing, reporting, and forecasting
CREATE FUNCTION dbo.CalculateDiscount(
@basePrice DECIMAL(12,2),
@loyaltyTier INT
)
RETURNS DECIMAL(12,2)
AS
BEGIN
RETURN @basePrice * (
CASE
WHEN @loyaltyTier >= 4 THEN 0.80 -- 20% off for platinum
WHEN @loyaltyTier = 3 THEN 0.85 -- 15% off for gold
WHEN @loyaltyTier = 2 THEN 0.90 -- 10% off for silver
ELSE 1.00 -- no discount
END
);
END;
-- Usage in a report query
SELECT
ProductName,
ListPrice,
dbo.CalculateDiscount(ListPrice, c.LoyaltyTier) AS FinalPrice
FROM dbo.Products p
INNER JOIN dbo.Customers c ON p.LastBuyerID = c.CustomerID;
Verify your scalar UDF can be inlined before relying on it in production:
SELECT
b.name AS function_name,
a.is_inlineable
FROM sys.sql_modules a
INNER JOIN sys.objects b ON a.object_id = b.object_id
WHERE b.type = 'FN';
-- If is_inlineable = 0, the UDF will FAIL when called on user tables
What Breaks Inlineability
β WHILE loops
Any iterative logic makes the function non-inlineable. Use CASE expressions or mathematical formulas instead.
β Nondeterministic calls
GETDATE(), RAND(), NEWID() prevent inlining. Pass these as parameters from the calling query.
β CTEs and subqueries
Common table expressions inside scalar UDFs break the inlining engine. Keep the body as a single expression.
β Table variable access
Any reference to table variables or temp tables in a scalar function blocks inlining.
Key Constraints
- Max 10 UDF calls per single query
- Max 4 nesting levels for table-referencing UDFs (32 for pure scalar)
ORDER BY,GROUP BY, and CTEs in the calling query may cause failures when invoking scalar UDFsENCRYPTIONandEXECUTE ASkeywords are not supported- Permissions: Workspace Admin, Member, or Contributor roles required to create functions
Fabric SQL Database β Full T-SQL Parity
If you're using SQL Database in Fabric (the OLTP offering, not Warehouse), you get full Azure SQL Database parity β including multi-statement TVFs, non-inlineable scalar functions, and all T-SQL constructs. This is because SQL Database in Fabric runs on the full SQL Server engine.
-- SaaS application calculates customer health score from multiple signals
-- Multi-statement TVF aggregates across tables β only works in Fabric SQL Database
CREATE FUNCTION dbo.GetCustomerHealthScores(@tenantId INT)
RETURNS @results TABLE (
CustomerID INT,
HealthScore DECIMAL(5,2),
RiskLevel NVARCHAR(20),
LastActivity DATETIME2
)
AS
BEGIN
INSERT INTO @results
SELECT
c.CustomerID,
-- Weighted score from engagement, support, and usage signals
(c.EngagementScore * 0.4) + (c.NPS * 0.3) + (c.UsageIndex * 0.3) AS HealthScore,
CASE
WHEN (c.EngagementScore * 0.4) + (c.NPS * 0.3) + (c.UsageIndex * 0.3) > 80 THEN 'Healthy'
WHEN (c.EngagementScore * 0.4) + (c.NPS * 0.3) + (c.UsageIndex * 0.3) > 50 THEN 'At Risk'
ELSE 'Critical'
END,
c.LastActivityDate
FROM dbo.CustomerMetrics c
WHERE c.TenantID = @tenantId
AND c.IsActive = 1;
RETURN;
END;
-- Used by the app's customer success dashboard
SELECT * FROM dbo.GetCustomerHealthScores(42) WHERE RiskLevel = 'Critical';
User Data Functions
A first-class Fabric item: managed, serverless Python functions accessible across workloads.
User Data Functions are in Preview as of mid-2025 (announced at Ignite 2025). Expect limitations around ownership, regional availability, and service principal support. Not recommended for mission-critical production workloads yet.
What Are User Data Functions?
User Data Functions are a new Fabric item type under Data Engineering that hosts managed Python functions. Unlike Spark UDFs (which are session-scoped), these functions become shareable, versioned, and callable from any Fabric surface β notebooks, pipelines, Activator rules, GraphQL APIs, and external apps via REST.
π REST-accessible
Each function gets an HTTP endpoint, making it callable from any application β Power Apps, Logic Apps, external microservices, or frontend apps.
π¦ Shareable across workspaces
Functions are Fabric items β they can be shared, governed, and discovered via the OneLake catalog just like any other artifact.
π Native integrations
Call from notebooks via notebookutils.udf, trigger from Activator rules, chain in data pipelines, or use as GraphQL authorization hooks.
βοΈ Serverless execution
No Spark cluster needed. Functions execute in a lightweight Python runtime (3.11) with a 240-second timeout and up to 30 MB response size.
Programming Model
import fabric.functions as fn
udf = fn.UserDataFunctions()
@udf.function()
def standardize_category(productName: str, rawCategory: str) -> dict:
"""
Retail team receives product data from multiple suppliers with inconsistent
categories. This function normalizes them into the company's standard taxonomy.
Called from data pipelines during ingestion and from Power Apps for manual review.
"""
category_mapping = {
"electronics": ["tech", "devices", "gadgets", "computing", "hardware"],
"clothing": ["apparel", "fashion", "garments", "wear", "textiles"],
"home": ["furniture", "household", "kitchen", "decor", "garden"],
"food": ["grocery", "beverages", "snacks", "organic", "fresh"]
}
raw_lower = rawCategory.lower().strip()
standardized = next(
(standard for standard, aliases in category_mapping.items()
if raw_lower in aliases),
"other"
)
return {
"product_name": productName,
"original_category": rawCategory,
"standardized_category": standardized,
"confidence": "high" if standardized != "other" else "needs_review"
}
import fabric.functions as fn
from datetime import datetime
udf = fn.UserDataFunctions()
@udf.function()
def detect_sensor_anomaly(
sensorId: str,
temperature: float,
pressure: float,
humidity: float
) -> dict:
"""
Manufacturing plant monitors equipment via IoT sensors. This function
is triggered by Fabric Activator when new readings arrive, flagging
anomalies that need operator attention before equipment damage occurs.
"""
alerts = []
severity = "normal"
# Temperature thresholds for industrial equipment
if temperature > 85.0:
alerts.append(f"CRITICAL: Temperature {temperature}Β°C exceeds safe limit (85Β°C)")
severity = "critical"
elif temperature > 70.0:
alerts.append(f"WARNING: Temperature {temperature}Β°C approaching limit")
severity = "warning"
# Pressure differential check
if pressure < 20.0 or pressure > 150.0:
alerts.append(f"ALERT: Pressure {pressure} PSI outside normal range (20-150)")
severity = "critical" if severity != "critical" else severity
# Humidity correlation (high humidity + high temp = corrosion risk)
if humidity > 80.0 and temperature > 60.0:
alerts.append("WARNING: High humidity + temperature β corrosion risk")
severity = max(severity, "warning", key=lambda x: ["normal","warning","critical"].index(x))
return {
"sensor_id": sensorId,
"timestamp": datetime.utcnow().isoformat(),
"severity": severity,
"alerts": alerts,
"requires_action": severity in ("warning", "critical")
}
Calling User Data Functions
# Call the function from any Fabric notebook β no Spark cluster needed for the UDF itself
myFunctions = notebookutils.udf.getFunctions("ProductCategorizer")
# Single invocation
result = myFunctions.standardize_category(
productName="Wireless Earbuds Pro",
rawCategory="gadgets"
)
print(result)
# {'product_name': 'Wireless Earbuds Pro', 'standardized_category': 'electronics', 'confidence': 'high'}
# Cross-workspace access (e.g., shared library workspace)
shared = notebookutils.udf.getFunctions("ProductCategorizer", "shared-workspace-id")
result = shared.standardize_category(productName="Desk Lamp", rawCategory="decor")
Pipeline Design: βββββββββββββββββββ ββββββββββββββββββββββββ ββββββββββββββββββββ β Copy Activity ββββββΆβ Functions Activity ββββββΆβ Write to Lake β β (Raw supplier β β standardize_category β β (Curated layer) β β catalog CSV) β β for each row β β β βββββββββββββββββββ ββββββββββββββββββββββββ ββββββββββββββββββββ The Functions Activity calls your User Data Function as a pipeline step, allowing you to embed custom logic without spinning up a Spark job.
Connections and Secrets
import fabric.functions as fn
import requests
udf = fn.UserDataFunctions()
@udf.function()
@udf.connection(alias="weather-api", argName="apiConnection")
@udf.generic_connection(argName="keyVault", audienceType="KeyVault")
def enrich_with_weather(
city: str,
date: str,
apiConnection: fn.ConnectionInfo,
keyVault: fn.ConnectionInfo
) -> dict:
"""
Supply chain team enriches delivery forecasts with weather data.
API key stored in Azure Key Vault, accessed via managed connection.
"""
api_key = apiConnection.get_secret("api-key")
response = requests.get(
f"https://api.weather.example.com/v1/forecast",
params={"city": city, "date": date},
headers={"Authorization": f"Bearer {api_key}"}
)
data = response.json()
return {
"city": city,
"date": date,
"condition": data.get("condition", "unknown"),
"risk_level": "high" if data.get("severe_weather") else "normal"
}
Service Limits
| Limit | Value |
|---|---|
| Max request payload | 4 MB |
| Max execution timeout | 240 seconds |
| Max response size | 30 MB |
| Log retention | 30 days |
| Max private library size (.whl) | 28.6 MB |
| Python version (Runtime) | 3.11 |
| Publish cooldown | 2 minutes between publishes |
Preview Limitations
- Owner-only editing: Only the item owner can modify and publish functions
- No service principal support: Cannot use managed identity or workspace identity
- Fabric-native connections only: External SQL Server and other non-Fabric sources not yet supported via "Manage connections"
- Regional limitations: Not available in all Fabric regions
- Testing feature: Unavailable in Brazil South, Israel Central, Mexico Central
Spark UDFs in Notebooks
PySpark, Scala, and Pandas UDFs for in-session data transformations.
Spark UDFs are the traditional way to extend transformation logic in Fabric notebooks. They execute within your Spark session, operate row-by-row (or vectorized with Pandas UDFs), and are well-suited for data engineering pipelines at scale.
Standard PySpark UDF
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import hashlib
@udf(returnType=StringType())
def mask_email(email):
"""
Healthcare company masks patient emails in the bronze-to-silver layer.
The masked value is deterministic (same input = same hash) so downstream
joins still work, but the original email cannot be recovered.
"""
if email is None:
return None
local, domain = email.split("@")
hashed = hashlib.sha256(local.encode()).hexdigest()[:8]
return f"{hashed}@{domain}"
# Apply during bronze β silver transformation
df_patients = spark.read.format("delta").load("Tables/bronze_patients")
df_masked = df_patients.withColumn("email", mask_email("email"))
df_masked.write.format("delta").mode("overwrite").save("Tables/silver_patients")
Pandas UDF (Vectorized) β High Performance
Standard Python UDFs process one row at a time with heavy serialization overhead. Pandas UDFs use Apache Arrow to process batches of rows as pandas Series, delivering 10β100Γ better performance for compute-heavy logic.
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType
@pandas_udf(DoubleType())
def rolling_avg_spend(amounts: pd.Series) -> pd.Series:
"""
Financial services company calculates rolling average transaction amounts
for fraud detection features. Pandas UDF makes this 50x faster than
row-by-row processing on 100M+ transaction records.
"""
return amounts.rolling(window=7, min_periods=1).mean()
# Apply to transaction history for ML feature store
df_transactions = spark.read.format("delta").load("Tables/transactions")
df_features = df_transactions.withColumn(
"rolling_avg_7d",
rolling_avg_spend("amount")
)
df_features.write.format("delta").mode("overwrite").save("Tables/ml_features")
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType
import re
@pandas_udf(StringType())
def normalize_address(addresses: pd.Series) -> pd.Series:
"""
Logistics company normalizes millions of shipping addresses to reduce
duplicate deliveries. Vectorized processing handles 50M rows in minutes.
"""
def clean(addr):
if addr is None:
return None
addr = addr.upper().strip()
addr = re.sub(r'\bSTREET\b', 'ST', addr)
addr = re.sub(r'\bAVENUE\b', 'AVE', addr)
addr = re.sub(r'\bBOULEVARD\b', 'BLVD', addr)
addr = re.sub(r'\bAPARTMENT\b', 'APT', addr)
addr = re.sub(r'\s+', ' ', addr)
return addr
return addresses.apply(clean)
df_shipments = spark.read.format("delta").load("Tables/raw_shipments")
df_clean = df_shipments.withColumn("normalized_address", normalize_address("shipping_address"))
Sharing Spark UDFs Across Notebooks
Spark UDFs are session-scoped β they don't persist across notebooks by default. Here are the recommended patterns for reuse:
π¦ Fabric Environments
Package UDFs in a Python wheel (.whl) and install via a Fabric Environment. All notebooks attached to that environment can import your functions.
π %run magic
Use %run /path/to/shared_udfs to execute a shared notebook that registers UDFs in the current session. Simple but creates a dependency chain.
π notebookutils.udf (Preview)
The new cross-notebook pattern: call User Data Functions items from any notebook via notebookutils.udf.getFunctions(). No Spark cluster needed for the function itself.
ποΈ Workspace Libraries
Upload .whl or .jar files directly to the workspace default environment. Quick mode publishes in ~5 seconds.
# Project structure for a shared UDF library
my_fabric_udfs/
βββ setup.py
βββ my_udfs/
β βββ __init__.py
β βββ masking.py # mask_email, mask_phone, mask_ssn
β βββ validation.py # validate_email, validate_phone
β βββ enrichment.py # geocode_address, currency_convert
βββ tests/
βββ test_masking.py
βββ test_validation.py
# Build the wheel
python setup.py bdist_wheel
# Upload the .whl to your Fabric Environment β Custom Libraries
# All notebooks using that environment can then:
from my_udfs.masking import mask_email
from my_udfs.validation import validate_email
Real-World Use Cases
End-to-end examples showing when and how to apply each UDF type in practice.
π₯ Healthcare: Patient Data Standardization Pipeline
A hospital system receives patient records from 12 different clinics. Each uses different formats for phone numbers, dates, and diagnosis codes. The pipeline must standardize everything before loading into the analytics lakehouse.
| Step | UDF Type Used | Why |
|---|---|---|
| Phone normalization (50M records) | Pandas UDF | Vectorized processing for raw volume; 100x faster than row-by-row |
| ICD-10 code validation | User Data Function | Calls external medical coding API; needs secret management; shared across teams |
| Patient risk scoring (SQL reports) | T-SQL Scalar UDF | Analysts query Warehouse directly; formula embedded in familiar SQL |
π Manufacturing: Equipment Predictive Maintenance
A factory floor has 500 IoT sensors streaming data every 30 seconds. The team needs real-time anomaly detection plus historical trend analysis in the warehouse for monthly reviews.
| Step | UDF Type Used | Why |
|---|---|---|
| Real-time anomaly detection | User Data Function + Activator | Activator triggers the UDF on each batch; sub-second response; alerts operator |
| Rolling averages for ML features | Pandas UDF | Nightly batch over 500M sensor readings needs vectorized performance |
| Monthly downtime cost calculation | T-SQL iTVF | Analysts run ad-hoc queries in Warehouse; function encapsulates cost formula |
π Retail: Omnichannel Order Processing
A retailer processes orders from website, mobile app, and in-store POS. Each channel sends data in different formats. Business logic for pricing, tax, and loyalty points must be consistent everywhere.
| Step | UDF Type Used | Why |
|---|---|---|
| Loyalty discount calculation | T-SQL Scalar UDF (SQL DB) or User Data Function | Same logic used by the app backend (SQL DB) and the analytics pipeline (UDF via REST) |
| Product category standardization | User Data Function | Called from pipeline during ingestion AND from Power Apps for supplier onboarding |
| Address normalization (batch) | Pandas UDF | Millions of shipping records normalized nightly for delivery optimization |
| Tax calculation by jurisdiction | T-SQL iTVF (Fabric SQL Database) | Full MSTVF support in SQL DB; complex multi-state tax rules with table output |
π° Financial Services: Anti-Fraud Pipeline
A bank processes 2M card transactions daily and needs sub-minute fraud scoring for real-time alerts, plus batch analysis for model training.
| Step | UDF Type Used | Why |
|---|---|---|
| Real-time velocity check | User Data Function + Activator | Fires on each transaction event; checks 5 rules in 240ms; flags suspicious activity |
| Feature engineering (batch) | Pandas UDF | Computes rolling stats, geo-distance, time patterns across 2M daily rows |
| Account risk summary | T-SQL iTVF | Compliance analysts query Warehouse with account-level risk view; parameterized by date range |
| Model scoring endpoint | User Data Function | Wraps ML model inference; accessible from pipeline, notebook, or external fraud API |
Which UDF Type Should You Use?
Choose the right function approach based on your workload, team skills, and integration needs.
| Requirement | T-SQL iTVF | T-SQL Scalar | User Data Function | Spark UDF | Pandas UDF |
|---|---|---|---|---|---|
| Production-ready (GA) | β | β οΈ Preview | β οΈ Preview | β | β |
| Best for SQL teams | β | β | β | β | β |
| Best for Python teams | β | β | β | β | β |
| Cross-workload sharing | β Same DB | β Same DB | β REST + notebook | β Session-scoped | β Session-scoped |
| External API calls | β | β | β (with connections) | β (with libraries) | β (with libraries) |
| High-volume batch (10M+ rows) | β (set-based) | β οΈ Max 10/query | β (per-call model) | β οΈ Slow | β (vectorized) |
| Real-time event response | β | β | β (Activator) | β | β |
| No Spark cluster needed | β | β | β | β | β |
| Secret/connection mgmt | β | β | β Key Vault | Manual | Manual |
| Git integration / CI/CD | β (DDL in Git) | β (DDL in Git) | β (Fabric Git) | Via Environment | Via Environment |
Quick Decision Flowchart
START: What's your primary need?
β
ββ "Transform data inside a SQL query"
β ββ Returns a table? βββββββββββββββΆ T-SQL Inline TVF (GA)
β ββ Returns a single value? ββββββββΆ T-SQL Scalar UDF (Preview, inlineable only)
β
ββ "Share reusable logic across multiple Fabric items"
β ββ Need REST endpoint? ββββββββββββΆ User Data Function (Preview)
β ββ Need Activator trigger? ββββββββΆ User Data Function (Preview)
β ββ Only notebooks? βββββββββββββββΆ Environment + .whl package (GA)
β
ββ "Process millions of rows in a Spark pipeline"
β ββ Simple row logic? ββββββββββββββΆ Standard PySpark UDF (GA)
β ββ Vectorizable computation? ββββββΆ Pandas UDF (GA, 10-100x faster)
β
ββ "Full T-SQL function support (MSTVF, triggers, etc.)"
ββ ββββββββββββββββββββββββββββββββΆ Fabric SQL Database (GA, full Azure SQL parity)
If you're starting fresh today and need production reliability, use T-SQL iTVFs for SQL workloads and Pandas UDFs for Spark pipelines. If you need cross-workload sharing and are willing to adopt Preview features, User Data Functions is the strategic direction Microsoft is investing in.
Best Practices
Design, test, and govern UDFs for reliability and performance across all Fabric engines.
Performance
- Prefer set-based over row-by-row: In SQL, an iTVF that processes data in bulk will always outperform calling a scalar UDF 10M times. In Spark, use Pandas UDFs over standard UDFs for 10β100Γ improvement.
- Keep scalar UDFs simple: Avoid loops, CTEs, and nondeterministic calls. The simpler the body, the more likely it can be inlined by the Warehouse optimizer.
- Batch calls to User Data Functions: If processing many items, design the function to accept batches (list input) rather than calling once per row. The 4 MB request payload allows significant batch sizes.
- Avoid Python UDFs on large DataFrames when built-in functions exist: Before writing a UDF, check if
pyspark.sql.functionsalready has what you need β built-ins avoid serialization overhead entirely.
Testing
- T-SQL functions: Test with known inputs in a dev workspace before promoting. Use Fabric deployment pipelines (Dev β Test β Prod) to ensure consistency.
- User Data Functions: Use the built-in Test pane in the Fabric portal or VS Code extension. Design functions to be pure (deterministic output for given input) for easy unit testing.
- Spark UDFs: Package UDFs in a wheel with
pytesttests. Run tests locally before uploading to the Fabric Environment. Test with edge cases: nulls, empty strings, Unicode, extreme values.
Governance & Naming
- Naming conventions: Use clear, descriptive names β
dbo.CalculateShippingCostnotdbo.fn1. For User Data Functions, the item name is the entry point, so make it self-documenting. - Document parameters: Add docstrings (Python) or comments (T-SQL) explaining what each parameter represents, valid ranges, and expected output format.
- Version control: T-SQL functions should be in Git via Fabric Git integration. Python UDFs belong in wheel packages with semantic versioning. User Data Functions support Fabric Git and deployment pipelines.
- Centralize shared functions: Create a dedicated "Shared Functions" workspace for organization-wide UDFs. Grant read access broadly, write access narrowly.
Security
- Principle of least privilege: Only Workspace Admin/Member/Contributor should create T-SQL functions. Use workspace roles to control who can publish UDFs.
- Never hardcode secrets: Use Azure Key Vault connections in User Data Functions. In Spark, use
notebookutils.credentialsor linked services. - Validate inputs: Treat all UDF parameters as untrusted β validate types, ranges, and lengths before processing to prevent injection or unexpected behavior.
Anti-Patterns to Avoid
β Scalar UDF in WHERE clause
Calling a scalar UDF in a filter forces row-by-row evaluation. Move the logic into the SELECT or use a CTE with pre-computation.
β Python UDF for simple string ops
Using a Python UDF for upper(), trim(), or concat() is 100x slower than native Spark functions. Check pyspark.sql.functions first.
β Large closures in Spark UDFs
Capturing large objects (DataFrames, ML models, big dictionaries) in UDF closures causes massive serialization overhead per task. Use broadcast variables instead.
β UDF as a service replacement
User Data Functions have a 240s timeout and 30 MB response. Don't use them as a full microservice β they're for stateless transformations, not long-running processes.
Resources
Official documentation and reference material for UDFs in Microsoft Fabric.
π T-SQL Functions
CREATE FUNCTION for Fabric Data Warehouse β T-SQL Surface Area in Fabric β Blog: Scalar UDFs in Warehouse (Preview) ββ‘ User Data Functions
User Data Functions Overview β Python Programming Model β Create UDFs in Portal β Create UDFs in VS Code β notebookutils.udf Reference β Blog: User Data Functions β Ignite 2025 βπ₯ Spark UDFs
Library Management in Fabric β Pandas UDF API Reference (Apache Spark) βπ Related Pages in This Guide
Fabric Databases β full T-SQL function support β Notebook & Spark Optimization β Data Pipelines β orchestration patterns β AI & Copilot β Copilot for SQL and Notebooks βUser Data Functions and Scalar UDFs in Warehouse are actively evolving. Check the official documentation and Fabric What's New page before finalizing production designs that depend on Preview capabilities.