[See an index of all bad habits / best practices posts]
As a frequent reviewer of queries, and a regular contributor to our answers site and dba.stackexchange.com, I come across a lot of code that seems to have been written, shall we say, quickly. There are a lot of transgressions of course, but one that I notice frequently is inconsistency in case. People will have a table like this:
And then a query like this:
The code compiled and the query returned results, so party time, right?
This can cause problems.
For one, being inconsistent about case can lead to situations like this, where another user executes the exact same logical query, but uses different case:
There are a few subtle differences you might spot right away, and if you look at the plan cache, you'll see that you actually stored two copies of the plan (since the query_hash is case- and whitespace-sensitive). I talked about this a bit more thoroughly and showed an example late last year:
More importantly, though: neither query uses identifiers that exactly match the base table or its columns. And while the query may work on your machine, it will break if it gets deployed to a case-sensitive database or to a server using a case-sensitive collation.
PROOF.
I set up two SQL Server instances, one with the default SQL_Latin1_General_CP1_CI_AS collation, and one with its case-sensitive cousin, SQL_Latin1_General_CP1_CS_AS. On each instance, I set up two databases, one using each collation. What I wanted to do was simulate what would happen if I wrote code - sloppily, on purpose - in the best case scenario (case insensitive collation on a case insensitive instance), then tried to deploy it and run it in the other environments.
So, on that initial CI/CI instance, I created the following objects - a table and a matching TVP, along with a stored procedure that intentionally uses the wrong case for the table and column names, and even uses the wrong case for its own input parameter:
First, I tried deploying the same objects to the case sensitive database on the case insensitive server. I can't get very far, because as soon as I try to create the procedure, I get all kinds of problems:
I next tried the case insensitive database on the case sensitive server. There was only one error this time; interesting to note, though, that the error message uses the lower-case version of the table variable name (which comes from the procedure body), unlike above where it used the upper-case version (which comes from the parameter list):
The case sensitive database on the case sensitive server returned to the more verbose error messages above. So then I tried partially correcting the procedure in that environment, making the parameter casing match and the type name correct:
I was expecting this to actually get created, perhaps with that warning about deferred name resolution, because it shouldn't have been able to find dbo.customers
in the system metadata. But no, it matched the table regardless of case, but noticed that the column names didn't match:
So, this made me investigate another avenue: fixing the column names, but not the table name. Surely then the procedure wouldn't get created?
Lo and behold:
Okay, so what if we call it (with code that doesn't get any of the case wrong, of course)?
The engine must look in a different place now, because this yields:
This is an even bigger problem, of course, because the code was deployed successfully, and now the error might not happen until an end user tries to add (or become) a customer...
A LITTLE SIMPLER.
Here's an even simpler set of code that I wanted to try on each database/instance combination:
Here's how they worked out:
Database / Instance |
variable |
column |
object |
database |
CI / CI |
OK |
OK |
OK |
OK |
CS / CI |
OK |
Invalid column |
Invalid object |
OK |
CI / CS |
Must declare variable |
OK |
OK |
Database does not exist |
CS / CS |
Must declare variable |
Invalid column |
Invalid object |
Database does not exist |
The reasons for these patterns quickly become fairly evident: variables use instance collation, object and column names use the collation of sys.objects
etc. (which use database collation), and database names use sys.databases
(which uses instance collation). If we look at those in each case:
This is what they look like:
Database / Instance |
sys.all_objects |
sys.all_columns |
CI / CI |
SQL_Latin1_General_CP1_CI_AS |
SQL_Latin1_General_CP1_CI_AS |
CS / CI |
SQL_Latin1_General_CP1_CS_AS |
SQL_Latin1_General_CP1_CI_AS |
CI / CS |
SQL_Latin1_General_CP1_CI_AS |
SQL_Latin1_General_CP1_CS_AS |
CS / CS |
SQL_Latin1_General_CP1_CS_AS |
SQL_Latin1_General_CP1_CS_AS |
There are some peculiarities, though. For example, with system objects, you can get the case of a column name wrong no problem, even in a case sensitive database on a case sensitive instance. This works no problem:
However if you get the case wrong for the view itself instead of the column:
Still an error message:
So it looks like different rules are used to determine whether case sensitivity matters for system objects, depending on whether you are looking at the object itself or its columns. Even more reason for you to just strive to have code that uses correct, matching case from the start, in all cases.
SUMMARY.
Long story short, a lot of things can go wrong when you develop in a case insensitive collation but code might ultimately get deployed to a case sensitive database or instance (or both). In most cases the errors will be caught at deployment time, but in a few, the objects could be created successfully anyway, and problems would not be detected until testing/QA time (or, worse, even later). This is one of those things that is fairly easy to get right from the start, and a lot more cumbersome to fix later.
PREVENTION.
There are some code analysis tools that can check for things like this; when I pinged George Mastros (@gmmastros), he agreed that it was a good idea, and is going to try to put it in an upcoming release of SQLCop.
But you can prevent these problems far earlier than that, if you're willing to do it. I have long been a proponent of developing - at least for code that does not cross databases or servers - on a case sensitive collation. This makes sure these errors are caught up front, and not six months from now when a database is migrated or consolidated. "But won't that slow me down?" Initially, maybe, a little. But in all honesty, IntelliSense (or tools like Redgate SQL Prompt) should prevent you from ever having to type this stuff correctly on your own anyway.
[See an index of all bad habits / best practices posts]
Aaron (@AaronBertrand) is a Data Platform MVP with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com.
Aaron's blog focuses on T-SQL bad habits and best practices, as well as coverage of updates and new features in Plan Explorer, SentryOne, and SQL Server.
Comments