Just before 2015 drew to a close, I challenged myself - in response to a tweet from Tim Ford - to publish an entry level post on the first Wednesday of every month.
Well, today's July 6th, and I apologize for coming in at the 11th hour, but I still wanted to talk briefly about a new set of functions introduced in SQL Server 2012:
We'll focus on just DATEFROMPARTS()
for now, to keep things simple.
The point of these functions is to make it easier to construct a date, or datetime, or datetime2 variable, when you know the individual parts. DATEFROMPARTS()
takes three arguments: year
, month
, and day
, and returns a date
value. So, for example, SELECT DATEFROMPARTS(2016,7,6);
would yield the date 2016-07-06
.
When would you use this? One example is something I often see in reporting stored procedures, where they accept the year and month as individual parameters:
This is often because a web form or application has a dropdown for the year and a separate dropdown for the month. Which is okay, it just leads to passing those integers in directly, and then the query has to fumble around with turning those into a usable range query.
So then what? Well, I see all kinds of fun ways people use these input parameters in the where clause:
To see a more elaborate script you can copy to demonstrate the output of these expressions, click here.
Now, these all get the right answer, so what's the problem? They're ugly. And, more importantly, non-sargable. Which means they won't be able to use an index on DateColumn
or take advantage of partition elimination; SQL Server will have to scan the entire table or index and apply those calculations to every single row in order to perform any comparisons.
The way around this is to use an open-ended date range (see this post to understand why, and this post to see why we don't use BETWEEN
).
But how do we do that with parameters like @Year
and @Month
? Before SQL Server 2012, you'd have a mix of things from above. The following is sargable, but still has that ugly feeling:
And you could do it a little more cleanly (my preference) by using a local variable, so you only have the ugly code once:
But that's still pretty ugly. The beauty of functions like DATEFROMPARTS()
is that I can turn this pair of parameters into a valid date value (the first of the given month) in a very simple and clean way, and still get the same performance as the above method:
(And this works similarly if I only accept the year as a parameter, or all three parts, or the parts that make up the input parameters to one of the other functions listed above.)
The Pudding
Now, to see why NOT being sargable is bad, I threw together this small repro, which only requires the presence of any copy of AdventureWorks, and then manually determining the date ranges present in Sales.SalesOrderHeader
:
-- create a simple table with an index
-- on a date column:
CREATE TABLE dbo.DateEntries
(
id int IDENTITY(1,1) PRIMARY KEY,
DateColumn date
);
CREATE INDEX SeekMe ON dbo.DateEntries(DateColumn);
-- insert 31,645 rows (most copies of AdventureWorks):
INSERT dbo.DateEntries(DateColumn)
SELECT OrderDate
FROM AdventureWorks.Sales.SalesOrderHeader;
GO
-- now create a stored procedure that will
-- attempt a reporting query using the various
-- methods presented above:
CREATE PROCEDURE dbo.MonthlyReport
@Year int,
@Month int
AS
BEGIN
SET NOCOUNT ON;
DECLARE @Start_Convert date = CONVERT(char(4), @year)
+ RIGHT('0' + CONVERT(varchar(2), @Month), 2) + '01';
DECLARE @Start_DateFromParts date = DATEFROMPARTS(@Year, @Month, 1);
SELECT /* year/month */ DateColumn
FROM dbo.DateEntries
WHERE YEAR(DateColumn) = @Year
AND MONTH(DateColumn) = @Month;
SELECT /* datepart(year/month) */ DateColumn
FROM dbo.DateEntries
WHERE DATEPART(YEAR, DateColumn) = @Year
AND DATEPART(MONTH, DateColumn) = @Month;
SELECT /* both sides to string */ DateColumn
FROM dbo.DateEntries
WHERE CONVERT(char(6), DateColumn, 112)
= CONVERT(char(4), @Year) + RIGHT('0'
+ CONVERT(varchar(2), @Month), 2);
SELECT /* more elaborate both string */ DateColumn
FROM dbo.DateEntries
WHERE CONVERT(char(6), DateColumn, 112) + '01'
= CONVERT(char(4), @Year) + RIGHT('0'
+ CONVERT(varchar(2), @Month), 2) + '01';
SELECT /* even more elaborate */ DateColumn
FROM dbo.DateEntries
WHERE CONVERT(date, CONVERT(char(6), DateColumn, 112) + '01')
= CONVERT(date, CONVERT(char(4), @Year)
+ RIGHT('0' + CONVERT(varchar(2), @Month), 2) + '01');
SELECT /* to int then string */ DateColumn
FROM dbo.DateEntries
WHERE CONVERT(char(6), DateColumn, 112)
= CONVERT(char(6), @Year * 100 + @Month);
SELECT /* funky math */ DateColumn
FROM dbo.DateEntries
WHERE CONVERT(date, CONVERT(char(6), DateColumn, 112) + '01')
= DATEADD(MONTH, @Month-1, DATEADD(YEAR, @Year-1900, 0));
SELECT /* using @Start_Convert */ DateColumn
FROM dbo.DateEntries
WHERE DateColumn >= @Start_Convert
AND DateColumn < DATEADD(MONTH, 1, @Start_Convert);
SELECT /* using @Start_DateFromParts */ DateColumn
FROM dbo.DateEntries
WHERE DateColumn >= @Start_DateFromParts
AND DateColumn < DATEADD(MONTH, 1, @Start_DateFromParts);
END
GO
-- now, to figure out what ranges we have
-- note that your range may be different
-- depending on which copy of AdventureWorks
SELECT
MIN(DateColumn), -- 2011-05-31
MAX(DateColumn) -- 2014-06-30
FROM dbo.DateEntries;
GO
-- let's pick a random month in that range (I'll
-- go with July 2012), and examine the output
-- remember, the range you need may be different
EXEC dbo.MonthlyReport @Year = 2012, @Month = 7;
We can run this last stored procedure call on its own in SQL Sentry Plan Explorer, and look at the results grid and the graphical plans. In the results grid, we can see how all of the approaches except for the final two resulted in an index scan and a much higher number of reads:
Results grid, showing scans vs. seeks
And in the graphical plan, we can see the seek on the left (representing the two open-ended range queries), and the scan on the right (representing the other approaches). Note also the warning you get about residual I/O, just a further indicator that you performed a scan when you didn't have to - and this meant plenty of wasted I/O:
Graphical plan, showing more expensive scan and residual warnings on the right
If you're currently performing any of these dirty tricks to get valid date ranges out of individual numeric or string date components, consider using the *FROMPARTS()
family instead. It will make your code a little cleaner, and hopefully will encourage more open-ended range queries, too.
Aaron (@AaronBertrand) is a Data Platform MVP with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com.
Aaron's blog focuses on T-SQL bad habits and best practices, as well as coverage of updates and new features in Plan Explorer, SentryOne, and SQL Server.
Comments