SQL Server Storage Best Practices: Choosing Storage Options

Jonathan Kehayias

Published On: September 29, 2021

Categories: 0

Storage is one of the most critical components for any relational database management system, and getting the right storage configuration affects reliability, availability, and performance. When it comes to SQL Server storage best practices, choosing between storage hardware options has changed significantly over the last decade, but that doesn’t necessarily make choosing the correct storage options for SQL Server any easier.

When I started out as a consultant with SQLskills.com in 2011, flash storage was barely being used by most businesses, fiber channel SAN connections were limited at 8Gb/sec, iSCSI 10Gb/sec SANs were the exception, cloud-based solutions were in their infancy (AWS was barely five years old and Azure was just reaching its first year live), and NVMe wasn’t even a thing. Today, hard drives (HDDs) have been replaced by solid-state drives (SSDs), cloud providers have become the go-to solution for many businesses, and the volumes of data have increased exponentially. This post will look at how to best configure storage for SQL Server based on current technology and trends as of its publication.

Understanding SQL Server I/O Workload Patterns

One of the important aspects of choosing the best storage option for SQL Server is understanding the I/O workload patterns SQL Server performs. One of the worst things I’ve encountered over the last twenty years working with SQL Server has been dealing with storage configurations driven solely off of how much space was needed, with no understanding of how the space would actually be used and performance requirements. Sadly, this still occurs today, and the hard reality is getting more performance after storage is in use usually requires provisioning a new configuration and migrating to it, even in the cloud. (See the Living in a Virtual World section below.)

From a SQL Server standpoint, I’m going to break the I/O workload patterns down into five specific areas:

  1. Data files – typically random reads and writes 8KB – 512KB in size
  2. Log files – typically sequential writes 512 bytes – 60KB in size
  3. tempdb – may or may not be used but typically write intensive for data files
  4. Backups – typically sequential reads up to 2MB in size but can also be random reads from data files for log and differential backups, plus sequential writes to backup destination
  5. Availability Group replicas – may have any combination of the above I/O patterns for synchronization, backups, and offloading read-only operations

Traditional best practices for SQL Server would separate 1-4 above to different physical storage devices, which may have different underlying RAID configurations based on the workload. On HDD storage, the rotational latency of the drive and seek latency for the read/write head often became a problem for mixing random access and sequential access workloads. This was why separation of the I/O workloads by type was such an important factor for high throughput SQL Server performance. What’s funny however, is even where log files were separated from data files in the configuration, the best practice wasn’t actually being followed in most environments, since multiple transaction log files placed on the same disk together results in a random I/O access pattern that negatively impacts sequential I/O throughput.

SQL Server Storage Best Practices for Modern Times

One of the challenges with all these changes in such a short period of time has been that published “best practices” for SQL Server haven’t been updated by Microsoft to reflect the changes in technology available, or where newer best practices have been written, they conflict with others published and available online. Here’s the thing with best practices:

  1. They’re not eternal, so stop using best practices from SQL Server 2000 and 2005 timeframes for hardware systems today, except where they’re tested to be providing benefit.
  2. Separate drives historically meant different physical hardware, not separate drive letters. This one gets confusing and tricky on virtual machines, SAN-based storage, and cloud configurations where you don’t control the underlying physical location that backs your storage completely. In some scenarios, multiple disks may still be required for performance, sizing capacity wise, but rarely for isolation of the workload today as was originally intended by the recommended best practice.
  3. Applying one “best practice” while ignoring another can result in sub-optimal configuration and slower performance. I’ve seen this countless times where a SAN vendors “best practice” configuration for SQL Server is ignored to follow the old I/O Best Practices for SQL Server guidance from Microsoft, resulting in an over-complicated configuration that doesn’t perform as well as the SAN vendor’s tried-and-proven recommendations.
  4. Some “best practices” are targeted at edge-case workloads. While these may still be a “best practice,” for smaller workloads for SQL Server, they might not be necessary.

My general recommendation for SQL Server storage best practices is to use the most recently published, vendor/platform/issue-specific best practices for SQL Server available. These are generally available through a basic Google/Bing search for whatever the platform name is followed by “SQL Server Best Practices,” and are geared towards achieving the best overall performance based on thorough testing by the vendor. For example:

While separation of data files and log files physically in storage today would still be recommended, the reality is performance impacts on SSD, NVMe, and even most SAN-based storage configurations today, from mixing random and sequential I/O’s together, don’t exist. I’m a huge advocate of separating log files to RAID 1 or 10 storage for higher redundancy when possible, and I also advocate having completely different storage for backups and maintaining a secondary copy of the backup files (e.g., in the cloud, on a storage appliance, another server) to ensure you can always access the backups in an emergency. In 2021 alone, SQLskills has received countless requests for consulting from businesses impacted by ransomware attacks that encrypted not only their database files but also the only copies of their backups as a result of them only being stored on the same system.

Living in a Virtual World

When I started working with SQL Server, virtualization was in its infancy, and few businesses would consider virtualizing mission-critical workloads like SQL Server due to the limitations and performance requirements. However, today, virtualization is the norm for most operations, and dedicated physical hardware implementations are becoming more of the exception rather than the rule for SQL Servers. While virtualized SQL Servers allow better utilization of hardware resources and much more dynamic scaling, it’s also not without its own challenges, especially where storage is considered. While SQL Server behaves the same whether it is virtualized on-premises or in the cloud, there are very distinct differences and considerations around storage configuration between the two that will greatly affect the performance. Here are some best practices for virtualizing SQL Server.

On-Premises Virtual Machines

With on-premises virtualization of SQL Server, there’s a lot more control over the resources allocated to SQL Server as well as the underlying I/O configuration. One of my favorite things to point out to DBAs when doing a review of a virtualized SQL Server is different drive letters doesn’t necessarily mean separation of the I/O. This is true regardless of the underlying hypervisor, and unless the configuration has been fully configured from the ground up with separation of I/O as a design consideration, at one or multiple points there’s likely not going to be separation of the I/O workloads.

A virtual disk in most cases is simply a large file on the storage available to the host hypervisor on which the virtual machine is running on. In most cases, multiple virtual disk files will be created in a single pool of storage provided by a SAN, and the underlying disks backing the storage will be the same. This doesn’t mean performance problems are going to occur, but it means DBAs need to be educated on the full configuration to know whether a benefit is being realized by having separate virtual disks for data files, log files, and backups. Often, this isn’t the case in most environments I’ve seen over the last 10+ years of consulting. In some cases, virtual machines have multiple virtual disks that “appear” to be following the age-old best practices of I/O separation, but in reality, there’s zero separation or benefit to having multiple disks.

Whether you run SQL Server virtualized on VMware, Hyper-V, or some other hypervisor supported under the SVVP by Microsoft, it’s important to follow the best practices recommendations for that platform when configuring the I/O subsystem for SQL Server. Most VMware virtual machines I look at running SQL Server DO NOT follow VMware recommended configuration best practices for SQL Server. Specifically, page 69-70 of the best practices guide for SQL Server points out that the PVSCSI Controller Queue Depth in Windows is limited and Windows Server is not aware of the increased I/O capabilities by default.

Using the PVSCSI virtual storage controller, Windows Server is not aware of the increased I/O capabilities supported. The queue depth can be adjusted for PVSCSI in Windows Server to 254 for maximum performance. This is achieved by adding the following key in the Windows Server registry “HKLM\SYSTEM\CurrentControlSet\ services\pvscsi\Parameters\Device /v DriverParameter /t REG_SZ /d “RequestRingPa ges=32,MaxQueueDepth=254”

Additionally, most VMs have a single SCSI controller configured, and all the disks for the VM are assigned to that one controller. This can become a I/O bottleneck since the controller has maximum I/O limits shared by all the virtual disks configured for the machine. Page 58 of the best practices guide by VMware recommends using multiple controllers and assigning different virtual disks to different controllers, thereby increasing the I/O capabilities of the virtual machine itself. These two simple changes can have a significant impact to the overall performance of SQL Server in an on-premises implementation on VMware.

Cloud Virtual Machines

While multiple cloud solutions exist, the most common for running SQL Server tend to be Amazon AWS EC2, Azure Virtual Machines, and Google Cloud Platform (GCP) Compute Engine. While each of these platforms also offers fully managed SQL Server solutions, I’m specifically talking about self-managed virtual machines running on the cloud platform. Each of the cloud providers offers template-based deployment of virtual machines of different sizes of vCPU, memory, and with different I/O throughput limitations. I/O configurations in the cloud are significantly more complex than on-premises since multiple factors have to be considered to maximize the performance of the I/O subsystem. The instance type will determine the maximum amount of I/O the instance itself can perform.

Then each of the cloud providers offers various types of virtual disks which support different levels of throughput.

Then for each of the virtual disk types, the size of the disk determines the maximum throughput capability of the disk.

In some cases, you won’t be able to reach the maximum throughput of the instance size/type with a single disk, even when using the most expensive/highest throughput disk type available. This is where Windows features like Storage Spaces need to be considered to create storage pools using multiple cloud disks to increase the throughput through striping and creating Storage Spaces virtual disks with the column count equal to the number of disks in the pool. Redundancy, at least to a certain degree, is built into the cloud storage platform, so we’re only striping for the performance benefits. Where costs are a consideration, you can maximize performance and minimize costs by using multiple lower cost “standard” disks configured to their maximum size for throughput and stripe those using storage spaces to get larger virtual disks with higher throughput as well.

Many of the cloud VM templates also offer non-persistent storage configurations known as ephemeral storage, which provides high throughput/low latency storage temporary to the virtual machine. This can be great for things like SQL Server tempdb, buffer pool extension files, scratch file storage, etc., but it also has the additional requirement of being reconfigured anytime the OS is shut down and then restarted. The drive letter assignments for the ephemeral storage, as well as any file system folder structures or permissions, won’t exist when the system first boots. This generally requires running a script on Windows startup to partition, format, and assign a drive letter to the device, and then creating any folders and permissions required for SQL Server before the SQL Server service is started or it will fail. There are plenty of free example scripts to handle this task and then start the service available online for the different cloud platforms, but this is an additional configuration consideration.

Summary

As you have hopefully figured out by this point, there are a lot of considerations for SQL Server storage best practices today, and as the pace of technology changes increases, they’re only going to get more complicated. The best practices from years ago, while still somewhat applicable today, shouldn’t be considered the only way to configure storage for SQL Server. In some cases, the underlying implementation may be completely outside of the DBA’s control, and constraints and limitations may dictate deviations from the older “best practices” in lieu of newer configuration designs that are now more vendor-specific best practice configurations.

Jonathan Kehayias is a Principal Consultant with SQLskills. He is the youngest person ever to achieve the Microsoft Certified Master – SQL Server certification and has been a SQL Server MVP for many years. Jonathan is heavily involved in the SQL community, including being a top answerer on the MSDN Database Engine forum, writing deep technical content on his blog, and answering questions as @SQLPoolBoy on Twitter.


Comments

Fighting Downtime and Slowdowns