How Your Data Type Choices Can Affect SQL Server Database Performance

Why Your Data Type Choices Matter

Andy Yun

 

Twitter: @kekline | Blog: kekline.com

Did you know that poor data type choices can have a significant impact on your database design and performance? Developers and DBAs can improve database performance by understanding the data types supported by SQL Server and the implications of choosing different ones. A best practice is to “right size” data types by asking business questions and determining the data types that are best suited to the organization’s and application’s needs.

Right-sizing can result in tremendous storage savings, which can lead to faster database performance. Other considerations to keep in mind are SQL Server’s 8KB data page size limit and the conditions that can lead to page splitting. Also watch out for implicit conversions, an unfortunate by-product of mismatching data types. Taking steps to avoid mismatches and page splitting can greatly enhance performance.

Following are some guidelines for right-sizing your data types—and boosting database performance.

 

Understanding SQL Server internals helps developers and DBAs write more efficient code

First, let's cover some key points about SQL Server data types.

1. In SQL Server, fixed and variable-length data types have different storage requirements.

  • Fixed-width data types always require the same amount of storage, regardless of the value stored in those columns or variables.
  • Variable-width data types always have two extra bytes of overhead.
  • With data types starting with N (NCHAR and NVARCHAR), the N denotes National (Unicode) character data. Because NCHAR and NVARCHAR store Unicode, they require two bytes of storage per character, so a column declared with a length of N consumes N x 2 bytes.
  • If you are storing a string value that will always be exactly one character, it is better from a storage perspective to use CHAR(1) instead of VARCHAR(1). Even when a VARCHAR(1) field is empty, it still carries the two bytes of length overhead.
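As a quick illustration of these points, T-SQL's DATALENGTH function reports the number of bytes a value occupies (the variable names below are hypothetical):

```sql
-- DATALENGTH returns bytes of data; note that for variables it does not show
-- the 2-byte length overhead a variable-width column carries inside a table row.
DECLARE @c CHAR(1)     = 'Y',
        @v VARCHAR(1)  = 'Y',
        @n NVARCHAR(1) = N'Y';

SELECT DATALENGTH(@c) AS char_bytes,     -- 1 byte
       DATALENGTH(@v) AS varchar_bytes,  -- 1 byte of data (+2 bytes overhead in a row)
       DATALENGTH(@n) AS nvarchar_bytes; -- 2 bytes: Unicode stores 2 bytes per character
```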

 

The following chart shows a handy summary of the storage requirements of fixed-width and variable-length data types: 

[Figure: storage requirements of fixed-width and variable-length data types]

 

2. The FIXVAR format is the actual physical format SQL Server uses to store a record: fixed-width column data is stored first, followed by variable-width column data.

[Figure: the FIXVAR record format]

 

3. With fixed-width columns, it's easy to find data in records because the fixed-data (F-Data) size is easily calculated.

[Figure: fixed-width columns]

 

In contrast, for variable-width columns, the size of each value varies from record to record.

When variable-width columns are used, SQL Server stores offset pointers so it can jump to and locate each value. By contrast, if a table is composed entirely of fixed-width data types, every record in that table requires the same, predictable amount of storage.

 

[Figure: variable-width columns]

 

4. When right-sizing data types, analyze whether each data type is the appropriate container for the values it will store. It is important to ask business questions about the organization's future direction; the goal of right-sizing is to choose data types that fit the application and the business.

Following is a table that summarizes storage requirements and value ranges for various data types:

[Figure: storage requirements and value ranges for common data types]

And below are some examples of right-sizing data types for various situations:

[Figure: right-sizing examples]
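To make this concrete, here is a hypothetical table (the names and columns are illustrative, not taken from the figures) with the right-sizing reasoning captured in the comments:

```sql
-- Hypothetical example: each choice answers a business question.
CREATE TABLE dbo.OfficeLocation
(
    OfficeID    SMALLINT     NOT NULL, -- "Will we ever have more than 32,767 offices?"
                                       -- If not, 2 bytes beats INT's 4 or BIGINT's 8.
    IsActive    BIT          NOT NULL, -- a true/false flag, not CHAR(5) 'true'/'false'
    OpenedOn    DATE         NOT NULL, -- 3 bytes; DATETIME (8 bytes) wastes space if
                                       -- time-of-day is never used
    CountryCode CHAR(2)      NOT NULL, -- ISO country codes are always 2 characters:
                                       -- fixed width fits
    OfficeName  VARCHAR(100) NOT NULL  -- names vary in length: variable width fits
);
```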

 

5. Right-sizing a table can result in tremendous storage savings. In the example below, the minimum size of the right-sized employee table is 104 bytes, compared to 1304 bytes in the employee table that hasn’t been right-sized.

[Figure: employee table before right-sizing]

Here's the right-sized employee table:

[Figure: right-sized employee table]

 

6. SQL Server uses its own container called a data page to store records on disk. There are three types of data pages: 1) in-row data, 2) row-overflow data, and 3) LOB data. All pages have a fixed size of 8KB or 8,192 bytes.

Data pages are composed of three components: 1) a 96-byte page header, 2) data records, and 3) a record or slot array that consumes 2 bytes per record.

Developers and DBAs must be aware of the following information:

  • The 8KB limit affects what can fit in a data page. Typically, data is stored in-row. However, a single record cannot span multiple in-row data pages, so a record approaching 8KB presents a problem. When that happens, SQL Server must use row-overflow or LOB data pages.
  • Large VARCHAR(n) and NVARCHAR(n) values (up to VARCHAR(8000) and NVARCHAR(4000)) can be pushed to row-overflow data pages.
  • VARCHAR(MAX) and NVARCHAR(MAX) values are stored on LOB data pages.
  • If a value is too large for a single 8KB data page, SQL Server builds a chain of pages. This incurs additional storage demands and additional I/O overhead.
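A small demonstration (assuming a scratch database where you can create throwaway tables) of how oversized values land on row-overflow and LOB pages:

```sql
-- Sketch only: shows which allocation-unit types a table ends up using.
CREATE TABLE dbo.PageDemo
(
    ID       INT IDENTITY PRIMARY KEY,
    BigText  VARCHAR(8000),  -- pushed to row-overflow pages when the row exceeds the in-row limit
    HugeText VARCHAR(MAX)    -- large values go to LOB pages
);

INSERT dbo.PageDemo (BigText, HugeText)
VALUES (REPLICATE('x', 8000),
        REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 100000));

-- Inspect the page types the table now uses:
-- expect IN_ROW_DATA, ROW_OVERFLOW_DATA, and LOB_DATA rows.
SELECT alloc_unit_type_desc, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.PageDemo'),
                                    NULL, NULL, 'DETAILED');
```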

     

Remember that SQL Server works within the 8KB data page limit; it has real consequences when you are performance tuning.


Transaction Logging activity can create a performance drag for SQL Server.

Another consideration to keep in mind is Transaction Log activity. SQL Server writes to the Transaction Log every time there is an INSERT, UPDATE, or DELETE statement. The amount of Transaction Log written is proportional to the amount of work done by SQL Server.

The more work done on the back end with Transaction Logging, the longer it will take for queries to complete. The Transaction Log volume also affects backup/restore, replication, log shipping, mirroring, and availability groups. On highly transactional systems, Transaction Logging introduces more competition for resources. It's important to be mindful that all I/O translates into more work for SQL Server.
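One way to observe how much Transaction Log a test workload generates (on SQL Server 2012 and later; older versions can use DBCC SQLPERF(LOGSPACE) instead) is the log-space DMV:

```sql
-- Reports log usage for the current database; run before and after a workload
-- to see how much log it generated.
SELECT total_log_size_in_bytes / 1048576.0 AS log_size_mb,
       used_log_space_in_bytes / 1048576.0 AS used_mb,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;
```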

 

Page splitting is expensive from a storage perspective and is best avoided when possible.

If a table has a clustered index, page splits can occur. When a new record is inserted into a data page that is already full, SQL Server allocates another page somewhere else, moves roughly half of the data records from the original page to the new page, and tries the insert again. If the new record still doesn't fit, SQL Server splits the page again. Cascading page splits can be very expensive from a storage perspective.

While page splits typically occur only on clustered tables and only on INSERT operations, there are situations that can inadvertently create page splits. For example:

  • Updates to a variable-length column. If the updated record no longer fits in place, or elsewhere on the same page, the UPDATE operation becomes a DELETE plus an INSERT, and that INSERT can trigger a page split.
  • Using GUIDs for clustering keys. If there are a lot of singleton INSERTS and a GUID is used for a clustering key, it is very likely that page splitting will occur on almost every INSERT.
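To gauge whether page splits are happening, one option (a sketch, not a complete monitoring solution) is the index operational-stats DMV, whose leaf_allocation_count includes page allocations caused by leaf-level splits:

```sql
-- Indexes with high leaf-level allocation counts are candidates for split investigation.
SELECT OBJECT_NAME(ios.object_id) AS table_name,
       i.name                     AS index_name,
       ios.leaf_allocation_count  -- leaf-page allocations, including those from splits
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
  ON i.object_id = ios.object_id
 AND i.index_id  = ios.index_id
ORDER BY ios.leaf_allocation_count DESC;
```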

 

I recommend creating clustering keys that are:

  1. Narrow. This saves on space.
  2. Unique. A clustering key shouldn’t repeat itself.
  3. Static. The key for a given record should always remain the same.
  4. Ever-increasing. If a clustering key is ever-increasing, the hot spot is at the end of the clustering key sequence so page splits don’t occur in the middle of the sequence. 

Auto-incrementing integers and auto-incrementing big integers meet these four criteria.

Some teams are tempted to use NEWSEQUENTIALID(), which generates auto-incrementing GUIDs. However, when the organization patches its SQL Servers and reboots the underlying Windows Server, NEWSEQUENTIALID() can restart from a different range of values. This is potentially dangerous and very risky. If a team really wants to use GUIDs, a best practice is to separate the clustering key from the primary key.
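One way to apply that practice, sketched here with a hypothetical Customer table: cluster on a narrow, ever-increasing IDENTITY column while the GUID remains the (nonclustered) primary key the application uses:

```sql
-- The IDENTITY column satisfies narrow/unique/static/ever-increasing;
-- the GUID stays available for application lookups without driving page splits.
CREATE TABLE dbo.Customer
(
    CustomerKey  INT IDENTITY(1,1) NOT NULL,
    CustomerGUID UNIQUEIDENTIFIER  NOT NULL DEFAULT NEWID(),
    CustomerName VARCHAR(100)      NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (CustomerGUID)
);

CREATE UNIQUE CLUSTERED INDEX CX_Customer ON dbo.Customer (CustomerKey);
```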

 

Implicit conversions in SQL Server can result in performance degradation. 

In SQL Server, if two fields have different data types, their values aren't considered the same, even if they appear identical to an outside observer. Value conversions in SQL Server follow preset precedence rules: the value with the lower-precedence data type is converted to the higher-precedence type before SQL Server compares the two. This has performance implications for T-SQL code.
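For example, assuming a hypothetical dbo.Account table with an indexed VARCHAR(20) AccountNumber column: NVARCHAR has higher precedence than VARCHAR, so comparing the column to an N'...' literal converts the column itself, which can prevent an index seek:

```sql
-- The NVARCHAR literal forces CONVERT_IMPLICIT on the column,
-- which can turn an index seek into a scan (especially under SQL collations).
SELECT AccountNumber
FROM dbo.Account
WHERE AccountNumber = N'A-1001';

-- Matching the literal's type to the column lets the optimizer seek.
SELECT AccountNumber
FROM dbo.Account
WHERE AccountNumber = 'A-1001';
```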

Here's a table outlining the precedence rules:

 

[Figure: data type precedence for conversions]

 

Following are some additional points to keep in mind about implicit conversions:

  • If you mismatch your data types, you will be unable to seek on your indexes. With mismatched data types, it takes much more work for SQL Server to fulfill queries. If you change a few data types and match them up, you might be able to improve performance significantly.
  • Certain conversions are free. These include conversions between CHAR and VARCHAR, as well as conversions between NCHAR and NVARCHAR.
  • To Unicode or not to Unicode is all about tradeoffs. If you default to VARCHAR, international characters can be lost and you might need to refactor later. If you use NVARCHAR, the storage and I/O requirements are doubled.
  • Using NVARCHAR(500) can bloat the cost of SQL Server execution plans. When SQL Server creates an execution plan, it uses costs and estimations, and one of those estimations is the average row size. With a variable-width data type, SQL Server doesn't sample the data to learn the true average size; it estimates from the declared data type definition. Declaring NVARCHAR(500) for values that are actually short can therefore inflate those estimates and bloat the execution plan.
  • Tools exist to find implicit conversions. Jonathan Kehayias has written code to scan your execution plan cache and find code that is suffering from implicit conversions. While this tool is helpful, it is not advisable to run it against a production database.
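A minimal sketch in the same spirit (not Jonathan Kehayias's script) that searches the plan cache for implicit conversions; plan-cache queries are expensive, so run it against a test server, not production:

```sql
-- Finds recently cached plans whose XML contains CONVERT_IMPLICIT.
-- Open the query_plan XML to see which column is being converted.
SELECT TOP (20)
       st.text       AS query_text,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%CONVERT_IMPLICIT%';
```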

 

Fast Database Performance Starts with the Right Data Type Choice

Developers and DBAs can boost database performance by starting with right-sized data types. Choosing the appropriate data type can yield significant storage savings, and a good working knowledge of the storage engine enables developers to write more efficient code and squeeze extra horsepower out of it.

 

Andy Yun is a SentryOne Senior Solutions Engineer and a Microsoft MVP. He has been working with SQL Server for 15 years as both a Database Developer and Administrator. Leveraging knowledge of SQL Server internals and extensive experience in highly transactional environments, he strives to make T-SQL leaner and meaner. Andy is extremely passionate about passing knowledge on to others, regularly speaking at User Groups, SQL Saturdays, and PASS Summit. Andy is a co-founder of the Chicago SQL Association, co-Chapter Leader of the Chicago Suburban User Group, and part of the Chicago SQL Saturday Organizing Committee. This paper was adapted from a presentation Andy delivered at SQLBits XVI.