Did you know that poor data type choices can have a significant impact on your database design and performance? Developers and DBAs can improve database performance by understanding the data types supported by SQL Server and the implications of choosing different ones. A best practice is to “right size” data types by asking business questions and determining the data types that are best suited to the organization’s and application’s needs.
Right-sizing can result in tremendous storage savings, which can lead to faster database performance. Other considerations to keep in mind are SQL Server’s 8KB data page size limit and the conditions that can lead to page splitting. Also watch out for implicit conversions, an unfortunate by-product of mismatching data types. Taking steps to avoid mismatches and page splitting can greatly enhance performance.
Following are some guidelines for right-sizing your data types—and boosting database performance.
First, let's cover some key points about SQL Server data types.
The following chart shows a handy summary of the storage requirements of fixed-width and variable-length data types:
In contrast, for variable-width columns, the size of each value varies from record to record.
When variable-width columns are used, SQL Server uses pointers to help it jump around and find data. If a table is composed of entirely fixed-width data types, one record in that table will always require a specific amount of storage.
It is important to ask business questions about the organization’s future direction. The goal of right-sizing is to determine whether the data type is appropriate for the application or business.
Following is a table that summarizes storage requirements and value ranges for various data types:
And below are some examples of right-sizing data types for various situations:
In the example below, the minimum size of the right-sized employee table is 104 bytes, compared to 1304 bytes in the employee table that hasn’t been right-sized.
Here's the right-sized employee table:
There are three types of data pages: 1) in-row data, 2) row-overflow data, and 3) LOB data. All pages have a fixed size of 8KB or 8,192 bytes.
Data pages are composed of three components: 1) a 96-byte page header, 2) data records, and 3) a record or slot array that consumes 2 bytes per record.
Developers and DBAs must be aware of the following information:
Remember that SQL Server works within 8KB data page limits because it has real consequences when you are performance tuning.
Another consideration to keep in mind is Transaction Log activity. SQL Server writes to the Transaction Log every time there is an INSERT, UPDATE, or DELETE statement. The amount of Transaction Log written is proportional to the amount of work done by SQL Server.
The more work done on the back end with Transaction Logging, the longer it will take for queries to complete. The Transaction Log volume also affects backup/restore, replication, log shipping, mirroring, and availability groups. On highly transactional systems, Transaction Logging introduces more competition for resources. It's important to be mindful that all I/O translates into more work for SQL Server.
If a SQL Server table is clustered and has a clustered index, page splits can occur. If a new record is inserted into a data page that is already full, SQL Server allocates another page somewhere else. It takes about half of the data records from the original page, moves them to a new page, and tries to insert the new record again. If the new record still doesn’t fit, SQL Server will create another page split. Cascading page splits can be very expensive from a storage perspective.
While page splits typically occur only on clustered tables and only on INSERT operations, there are situations that can inadvertently create page splits. For example:
I recommend creating clustering keys that are:
Auto-incrementing integers and auto-incrementing big integers meet these four criteria.
Some teams are tempted to use NEWSQUENTIALID(). This results in an auto-incrementing GUID. When the organization patches its SQL Servers, it must reboot the underlying Windows Server. The starting point of the NEWSEQUENTIALID() is then reset to a different value. This is potentially dangerous and very risky. If a team really wants to use GUIDs, a best practice is to segregate the clustering key and the primary key.
In SQL Server, if two fields have different data types, their values aren’t considered the same, even if they appear identical to an outside observer. Value conversions in SQL Server follow preset precedence rules. Small data types are always up-converted to larger data types. Then SQL Server can compare the values. This has performance implications for T-SQL code.
Here's a table outlining the precedence rules:
Following are some additional points to keep in mind about implicit conversions:
Developers and DBAs can boost database performance by starting with right-sizing data types. Choosing the appropriate data type can have a significant effect in storage savings. When developers have a good working knowledge of the storage engine, it enables them to write more efficient code and squeeze extra horsepower out of their code.
Andy Yun is a SentryOne Senior Solutions Engineer and a Microsoft MVP. He has been working with SQL Server for 15 years as both a Database Developer and Administrator. Leveraging knowledge of SQL Server Internals and extensive experience in highly transactional environments, he strives to make T-SQL leaner and meaner. Andy is extremely passionate about passing knowledge onto others, regularly speaking at User Groups, SQL Saturdays, and PASS Summit. Andy is a co-founder of the Chicago SQL Association, co-Chapter Leader of the Chicago Suburban User Group, and part of the Chicago SQL Saturday Organizing Committee. This paper was adapted from a presentation Andy delivered at SQLBits XVI.