Keeping DRY in SQL Server With Scalar User‑Defined Functions

No one likes to keep repeating themselves. As software engineers, we're often encouraged to follow the DRY principle: Don't Repeat Yourself. We look to refactor repeated blocks of code out into reusable units that encapsulate that logic, and then call that from everywhere it's needed.

This offers a number of benefits. If we need to change that logic in the future, we now have just one place to update and everywhere it's used will automatically benefit. Only needing to make the change in one place not only saves development time, but reduces risk (with duplication, we could miss a place in code to update as well as increasing the surface area for the number of places we need to go in and change). So, it's no wonder that DRY principles form a staple part of code reviews.

But Beware! 

When it comes to the world of SQL Server database development, that instinct to not repeat yourself could end up biting you if you're not careful.  

User defined functions (UDFs) were first introduced in SQL Server 2000 and are a natural concept to non-database developers—a unit of code that encapsulates some logic, and returns some output.  There are three core types of user defined functions in SQL Server: 

  • Scalar—Returns a single value 
  • Multi-Statement Table Valued—Returns a tabular result-set of data, where the body of the function can contain multiple statements 
  • Inline Table Valued—Returns a tabular result-set of data, where the body of function contains a single SELECT statement that directly returns the result-set 

In this post, we'll going to focus on the first of those three core types—the scalar UDF.  

Let's run through what might seem like an obvious approach to encapsulate some logic in SQL Server using the  WorldWideImporters sample OLTP database.  

Important : this applies when running the database at a compatibility level below 150 (SQL Server 2019), as will become clearer in the conclusion. The examples here were run on SQL Server 2019 with the database set to compatibility level SQL Server 2017 (140).

Scenario 

Take the following query that calculates the total value of a particular order:

SELECT SUM(Quantity * UnitPrice) AS OrderTotal  
FROM Sales.OrderLines ol  
WHERE ol.OrderID = 1 

You might decide to encapsulate this into a scalar UDF:

CREATE FUNCTION Sales.CalculateOrderLineTotal ( 
@OrderId INTEGER 
) 
RETURNS DECIMAL(18,2) 
AS 
BEGIN 
DECLARE @Total DECIMAL(18,2) 
 
SELECT @Total = SUM(Quantity * UnitPrice) FROM Sales.OrderLines WHERE OrderId = @OrderId 
 
RETURN @Total 
END 

And then call it like this:

SELECT Sales.CalculateOrderLineTotal(1)

We can now reuse this function anywhere we need to retrieve the value of an order. Seems useful, right?  

There Be Dragons 

Suppose you have the following query which is used as part of an export process to extract all orders from the system:

SELECT o.OrderId, o.OrderDate 
FROM Sales.Orders o 
ORDER BY o.OrderDate ASC

You have a requirement to change it to also include the value of each order. Great—you have a function for that! So, you slip in a call to the scalar UDF like so: 

SELECT o.OrderId, o.OrderDate, Sales.CalculateOrderLineTotal(o.OrderId) AS OrderTotal 
FROM Sales.Orders o 
ORDER BY o.OrderDate ASC

Job done….? 

Well, no. You soon hear back that the export process is performing poorly after this change. Let's take a look at the impact that change has had on the resources needed to fulfil the query and return the 73,595 results.  

Using Plan Explorer, we get the following execution details:

  (A) Original Query (without order total) (B) Query with scalar UDF
Duration (ms) 51 112,913
CPU 108 104,608
Reads 725 12,564,915

Ouch. There's that bite I was saying about. In fact, less of a bite and more of a three-course meal. Huge increases in execution time, CPU utilization and reads. You might ask, "What if that's just the cost of getting that extra Order Total value returned with each row?" Let's check—here is the same query, except we've pulled the query from the scalar UDF out as a correlated subquery into the main query:

SELECT o.OrderId, o.OrderDate,  
(SELECT SUM(Quantity * UnitPrice) FROM Sales.OrderLines ol WHERE ol.OrderId = o.OrderId) AS OrderTotal 
FROM Sales.Orders o 
ORDER BY o.OrderDate

Again running through Plan Explorer to grab the stats, this time we see (C): 

  (A) Original Query (without order total) (B) Query with scalar UDF (C) Query with correlated subquery
Duration (ms) 51 112,913 153
CPU 108 104,608 144
Reads 725 12,564,915 1,093

What a difference! Slightly higher than the original query (A), but that's to be expected given we've introduced a lookup on another table to return the extra data.  

But, they're doing the same thing to get the Order Total aren't they? Let's check the execution plans.

Scalar UDF approach [B]: 

Image1_ScalarUDF_Query_PreCompat150

Plan Explorer shows us we have some warnings. If we hover over the Compute Scalar operation, we see the following warning about the use of the UDF:

Image2_ScalarUDF_Query_ComputeScalar

Inlined correlated subquery approach [C]: 

Image3_ScalarUDF_Query_InlineSubquery

With the scalar UDF approach, there is no mention of the Sales.OrderLines table despite the fact we are now querying that table. Instead, what happens inside the scalar function is hidden behind the single "Compute Scalar" operation—as shown above, we can see something is of concern via the warnings. No matter what you have going on inside the function, there's no detail exposed in the execution plan—just a whole lot of cost. Whereas with the inlined subquery, we have full visibility of exactly what it's doing to get the extra data from Sales.OrderLines. 

Scalar functions can often cause poor performance, just like in this case, for a number of reasons. The scalar function is called iteratively, once per row in the result-set. In this case, that's 73,595 executions, resulting in the poor performance we're seeing here. During the query optimization process, SQL Server does not apply any costings to the scalar function call, whereas clearly we're seeing it has a huge cost. 

Inlining Scalar UDFs in SQL Server 2019

SQL Server 2019 introduced improvements that potentially allow scalar UDFs to be automatically inlined into the calling query, effectively doing some of the work for us that we did for query C in the tests above. I say potentially because:

  • Not all scalar UDFs can be inlined (we’ll cover some of the common reasons why later on).
  • Even if a scalar UDF can be inlined, it doesn’t mean SQL Server will inline it in a given query. And if it does, it still doesn’t mean you’ll end up with the same execution plan and performance as we got with the manual inlining we did.

If you’re using SQL Server 2019 (or later, to those of you in the future), you’ll need to ensure the compatibility level for your database is set to 150 or above.

Do You Even Inline, Bro?

How can you tell if a scalar UDF can be inlined? First, let’s start with some common examples of when they cannot be inlined:

  • When it uses time-based functions like GETDATE() and SYSUTCDATETIME()
  • When it uses a table variable, table valued parameter, or Common Table Expression (CTE)
  • When the scalar UDF is used in an ORDER BY clause

The best way to check is by querying the sys.sql_modules system table which has a bit column (is_inlineable). If that returns 0, then the scalar UDF cannot be inlined.

Using our scenario as an example, the following will check whether we can benefit from automatic inlining in SQL Server 2019 for our UDF:

SELECT * 
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('Sales.CalculateOrderLineTotal')

Great, it returns 1, so it could benefit! Running the query and grabbing the stats for the execution adds D to our previous tests:

  (A) Original Query (without order total) (B) Query with scalar UDF (C) Query with correlated subquery (D) Query with scalar UDF (SQL 2019 auto inlining)
Duration (ms) 51 112,913 153 4,099
CPU 108 104,608 144 7,675
Reads 725 12,564,915 1,093 855,167

We’re now running the exact same query as we did when we originally introduced the scalar UDF into the query (B) that resulted in a huge performance hit. This time, however, performance is noticeably better in terms of Duration, CPU, and Reads. It’s still not as good in this instance when compared to the manual approach to the inline scalar UDF in (C), where all metrics are much lower still. But since the only thing we changed was to switch the database to compatibility level 150 in SQL Server 2019, that’s not too shabby for an out-of-the-box improvement.

If we look at the execution plan now for D, we see:

This looks much more similar to the manually inlined approach. Importantly, we do now have much better visibility of the work the scalar UDF is doing, which we didn’t have at the very start prior to SQL Server 2019/compatibility level 150. But there are some differences compared to the manual approach—we now have a parallel execution plan and a Nested Loop (Inner Join) operation instead of a Hash Match (Right Outer Join) operation for example.

Conclusion

These improvements in SQL Server 2019 could yield some performance improvements when you migrate your database—possibly unexpectedly if you didn’t think you had an issue with scalar UDFs before!

We can never take our eye off the ball when it comes to checking the performance of our queries. It is often the case that there is no silver bullet, and the inlining performed by SQL Server 2019 is very much a case in point. However, as shown above, it’s a welcomed improvement.