T-SQL Tuesday: Shiny New Toys

T-SQL Tuesday

This month's T-SQL Tuesday - hosted by Matt Gordon (@sqlatspeed) - is called "Fixing Old Problems with Shiny New Toys." Matt wants us to write about how we've solved old problems in new ways afforded by enhancements to SQL Server (or new features anywhere in the stack, I would guess).

Over the years I have helped a lot of people solve problems with either splitting strings apart or putting them back together. Both require an inordinate amount of code in older versions of SQL Server, but modern versions make this a cinch with new built-in functions. As an added bonus, these new built-ins perform better than the older kludges, at least in all cases I've tested.

Splitting Strings

The function STRING_SPLIT(), introduced in SQL Server 2016, allows you to pass in a string and a delimiter, and outputs a row for each string in the set.

 There are a few limitations right now: you can only use single-character delimiters, and you can't control the output order. While in practice you will generally see the values come out in the order they appear in the original string, this is not guaranteed.

Concatenating Strings

The function STRING_AGG(), which just missed the cut for SQL Server 2016, will be available in SQL Server v.Next - hopefully later this year. This allows the opposite of STRING_SPLIT(): you can "join" values from multiple rows into a single string, using whatever delimiter you like.

This function is a little more fully-baked: The one-character restriction for the delimiter is not present, and you can dictate the order of concatenation using the WITHIN GROUP clause.

A Quick Example

In a post last year, I talked about Four Practical Use Cases for Grouped Concatenation. One of the examples - where I show how to remove duplicates from a comma-separated string, and then re-assemble the string without the duplicates - was quite messy. It used an expensive string-splitting function (and most are quite expensive), along with the quirky FOR XML PATH syntax many of us use to join strings together:

DECLARE @names nvarchar(4000) = N'bob,sandy,tracy,bob,tracy,frank,tracy';
 
;WITH x AS
(
  SELECT Item -- rather expensive UDF call
    FROM dbo.SplitStrings(@names, N',') 
    GROUP BY Item
)
SELECT STUFF((SELECT N',' + Item
  FROM x
  ORDER BY Item
  FOR XML PATH(''), TYPE
).value(N'.[1]',N'nvarchar(max)'),1,1,'');

This would produce:

bob,frank,sandy,tracy

With the new built-in functions, I can now simplify this substantially, and get the exact same results:

DECLARE @names nvarchar(4000) = N'bob,sandy,tracy,bob,tracy,frank,tracy';
 
SELECT STRING_AGG(Value, N',') FROM 
 (
   SELECT Value FROM STRING_SPLIT(@names, N',') GROUP BY Value
 ) AS x;

That may not look like a massive simplification, but don't forget about all the logic buried behind the table-valued function in the first example. And if you're like several shops I know, if you look across your codebase and see all the messy uses you have for either of these methods, the benefits should be even more clear - and testing should bear that the performance savings compared to traditional, expensive methods are the sweetest part of the deal.

Other Resources

I've written about both of these functions - and the kludges they finally replace - in a lot more detail. Here are a few posts:

String Splitting

String Concatenation

Don't worry, I have no intentions of going back and re-writing the "Old Way" posts to take advantage of the newer functions. But if you have a use case where you're splitting or concatenating now, and would like some help converting your code to work with the new functions, let me know in the comments below or at abertrand@sentryone.com - I'll be happy to help, if I can.

Thwack - Symbolize TM, R, and C