BackToBasics: Why I Use Lower Case for Data Type Names (Now)

This is a continuation of my #BackToBasics series, where I pledged to write an entry-level post at the beginning of each month. I promised the first Wednesday but, well, you'll see.

For the longest time, I always considered a data type name to be a T-SQL keyword of sorts. I would write code like this where data type names would be forced to upper case to match my chosen convention of upper-case keywords:

DECLARE @x VARCHAR(32), @y INT;
SELECT @x = CONVERT(INT, '32');

I have literally thousands of code snippets throughout my blog posts, tips, and forum answers peppered all over the Internet - spread over ten years - where every single instance of a data type name will have been forced to upper case, simply because that had always been my habit. I'm not going to go back and fix those - I mean, I have a life to live, and stuff. But for all my code samples going forward, you'll see a stark difference - all the data type names will be in lower case. Here is an example why - another example of the type of code I would write in the past would use upper case for system data type names when expressed as literals, too. For example:

SELECT system_type_id 
  FROM sys.types 
  WHERE [name] IN (N'INT', N'VARCHAR');

In systems where setup was click click next, this worked no problem, returning 56 and 167 as expected. Then I started dealing with more and more case-sensitive and binary collations, which are sometimes implemented in systems outside of your control. Take this database, for example:

CREATE DATABASE floobmort COLLATE Latin1_General_BIN2; 
GO
USE floobmort;

Now, run that same query against sys.types. What happens? Empty result set. That comparison tries to compare N'INT' to what's stored in the table, N'int', and because the comparison is done using a binary collation, it returns false. There are other cases the collation difference will cause unexpected errors, for example when we try to inspect the collation used for sys.types.name:

SELECT db = N'floobmort', coll = collation_name
  FROM floobmort.sys.all_columns 
  WHERE object_id = OBJECT_ID(N'sys.types')
  AND name = N'name'
UNION ALL
SELECT 'tempdb', collation_name
  FROM tempdb.sys.all_columns
  WHERE object_id = OBJECT_ID(N'sys.types') 
  AND name = N'name';

Msg 451, Level 16, State 1
Cannot resolve collation conflict between "SQL_Latin1_General_CP1_CI_AS"
and "Latin1_General_BIN2" in UNION ALL operator occurring in SELECT statement column 2.

I'm not trying be meta about metadata ("meta meta"?) or give you nostalgia about the movie Inception, but the way collation name is stored using database-specific collation illustrates precisely why relying on it can be problematic.

We can fix this easily in this case by applying a COLLATE clause to the first SELECT (and any collation will do, since we just want readable output; we don't need to worry about changing actual Unicode data or sorting rules as a result):

SELECT db = N'floobmort', coll = collation_name COLLATE SQL_Latin1_General_CP1_CI_AS
  FROM floobmort.sys.all_columns 
  WHERE object_id = OBJECT_ID(N'sys.types')
  AND name = N'name'
UNION ALL
SELECT 'tempdb', collation_name
  FROM tempdb.sys.all_columns
  WHERE object_id = OBJECT_ID(N'sys.types') 
  AND name = N'name';
 
/*  RESULTS:
    floobmort      Latin1_General_BIN2
    tempdb         SQL_Latin1_General_CP1_CI_AS
*/

Now we see that they are stored differently, and it gives a better idea why the query above for type names returned an empty result. You will of course want to use a more specific collation clause if you are using a collation and have objects or entities named using characters from those character sets (otherwise they won't look right if that target collation can't display them the same).

Now, admittedly…

...there aren't many real-world scenarios where this can actually lead to problems. Even on a SQL Server instance installed with a binary collation, variable and parameter data types are case insensitive, so you can do this just fine:

CREATE PROCEDURE dbo.what
   @i INT
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @foo DATETime2(7);
  SELECT TRY_CONVERT(CHar(21),'blat');
  CREATE TABLE #f(a INTeger);
END
GO
EXEC dbo.what @i = 4;

There are other, less obvious places where this can break, though. For example, with the new CLR types like geography, you need to be case sensitive if the database or instance is case sensitive. Try this:

SELECT geography::STGeomFromText('LINESTRING(-10 22, -12 19)', 4326);
GO
SELECT GEOGRAPHY::STGeomFromText('LINESTRING(-10 22, -12 19)', 4326);

The latter fails on a case-sensitive database or instance with:

Msg 243, Level 16, State 4
Type GEOGRAPHY is not a defined system type.

And this is not exclusive to the new CLR types, but rather generic to CLR-related syntax; for example, I can make an old type like char fail in the same way, if I try to invoke invalid methods:

SELECT CHAR::what;

If the database or instance is case sensitive, you get this error message:

Msg 243, Level 16, State 4
Type CHAR is not a defined system type.

On a case insensitive database or instance, it still fails, but with a different message:

Msg 258, Level 15, State 1
Cannot call methods on char.

You have no business trying to call methods on that data type, but the point of that example is just to illustrate that the problem is about checking the name of the type, and that happens before any of the other syntax around it is validated.

Also, this can happen if you're not careful about case sensitivity when dealing with alias types (not that you should use these). For example:

CREATE TYPE dbo.mytype FROM VARCHAR(20);
GO
 
DECLARE @x dbo.MyType;

Again, this works fine in a case-insensitive database or instance, but if you happen to have a case-sensitive or binary collation in play, you'll see this:

Msg 2715, Level 16, State 3
Column, parameter, or variable #1: Cannot find data type dbo.MyType.
Parameter or variable '@x' has an invalid data type.

While you can probably enumerate the individual scenarios where this is riskier, the underlying argument I'm trying to make here is one about consistency. I'm a big fan of always spelling out date parts like DAYMONTH, and MINUTE, instead of using their shorthand, even though there are some cases where ambiguity is not possible. Similarly, here, I feel that striving to always match the exact case that is in sys.types - even in situations where it can't break - leads to better habits that can avoid the above problems.

Summary

You should always code defensively and match the case sensitivity of all objects, variables, and parameters. You really never know when code you've written (or that has helped someone else) will get implemented in a case-sensitive or binary collation. I've explained above why I am very careful to match what's in sys.types, but this extends to other scenarios, too.

For example, try this in a database using a case-sensitive or binary collation, where you have to be accurate with object names (not just for your own objects, but for system objects, too):

SELECT * FROM SYS.Types;

Result:

Msg 208, Level 16, State 1
Invalid object name 'SYS.Types'.

And even worse, the problem extends to parameter and variable names on a case-sensitive or binary instance (and probably if you've somehow managed to change master to be that way, though I'll confess I haven't been motivated enough to confirm). On such an instance, this code:

DECLARE @foo int;
SELECT @FOO;

...fails with this error message:

Msg 137, Level 15, State 2
Must declare the scalar variable "@FOO".

This can lead to quite a surprising troubleshooting session if you develop your solution on a case-insensitive instance and then deploy to one that isn't. One thing I've started doing is developing on a BIN2 collation - yes, if I'm sloppy, it may be painful to deal with those error messages during development, but that's way better, IMHO, than hitting them during or after deployment, or when your app or database gets migrated to a different server. And IntelliSense should negate any sloppiness there anyway...