T-SQL Tuesday #66: Babysitting a Slow Backup or Restore

Aaron Bertrand

Published On: May 12, 2015

Categories: T SQL Tuesday, Backups, Monitoring 1

T-SQL Tuesday #66

This month's T-SQL Tuesday event is on the topic of monitoring, and is hosted by Catherine Wilhelmsen (@cathrinew).

If you've ever wondered what is going on after a backup or restore operation reaches 100%, but the database is still not available, you're not alone. When people are running backups interactively, they often add the WITH STATS option, which adds a notification as each specified % of the data movement process is completed. But that doesn't add a whole lot of diagnostics, and it can often represent a very small portion of the overall process (which can include things like undo/redo operations and updates to tables in msdb).

Let's say we have a very simple database like this:

CREATE DATABASE TSQLTuesday;
 

Now, if I want to backup and restore this database, I can run code like this following:

BACKUP DATABASE TSQLTuesday TO DISK = 'c:\temp\tst.bak' 
  WITH INIT, COMPRESSION, STATS = 10;
 
RESTORE DATABASE TSQLTuesday FROM DISK = 'c:\temp\tst.bak' 
  WITH REPLACE, RECOVERY, STATS = 10;
 

The output will look something like this (the system *tries* to report at exact % intervals, but it will sometimes be off by a bit):

11 percent processed.
22 percent processed.
30 percent processed.
41 percent processed.
52 percent processed.
60 percent processed.
71 percent processed.
82 percent processed.
90 percent processed.
100 percent processed.
Processed 288 pages for database 'TSQLTuesday', file 'TSQLTuesday' on file 1.
Processed 3 pages for database 'TSQLTuesday', file 'TSQLTuesday_log' on file 1.
BACKUP DATABASE successfully processed 291 pages in 0.072 seconds (31.521 MB/sec).
11 percent processed.
22 percent processed.
30 percent processed.
41 percent processed.
52 percent processed.
60 percent processed.
71 percent processed.
82 percent processed.
90 percent processed.
100 percent processed.
Processed 288 pages for database 'TSQLTuesday', file 'TSQLTuesday' on file 1.
Processed 3 pages for database 'TSQLTuesday', file 'TSQLTuesday_log' on file 1.
RESTORE DATABASE successfully processed 291 pages in 0.027 seconds (84.056 MB/sec).
 

 

There are a couple of undocumented trace flags you can add to get a lot more diagnostic information, either in the messages pane of Management Studio (using trace flag 3604), or in SQL Server's error log (using trace flag 3605).

Trace Flag 3014

The first trace flag for exposing more diagnostic information about the backup or restore operation is 3014. Let's run the backup again, and see the output in the messages pane:

DBCC TRACEON(3604, 3605, 3014);
 
BACKUP DATABASE TSQLTuesday TO DISK = 'c:\temp\tst3014.bak'
  WITH INIT, COMPRESSION, STATS = 10;
 

The output:

Backup(TSQLTuesday): BACKUP DATABASE started
Backup(TSQLTuesday): Opening the database with S lock
Backup(TSQLTuesday): Acquiring bulk-op lock on the database
Backup(TSQLTuesday): Synchronizing with other operations on the database is complete
Backup(TSQLTuesday): Opening the backup media set
Backup(TSQLTuesday): The backup media set is open
Backup/Restore buffer configuration parameters
Memory limit: 1535 MB
BufferCount: 7
Sets Of Buffers: 3
MaxTransferSize: 64 KB
Min MaxTransferSize: 64 KB
Total buffer space: 1 MB
Tabular data device count: 1
Fulltext data device count: 0
Filestream device count: 0
TXF device count: 0
Filesystem i/o alignment: 512
Media Buffer count: 7
Media Buffer size: 64 KB
Encode Buffer count: 7
Backup(TSQLTuesday): Preparing the media set for writing
Backup(TSQLTuesday): The media set is ready for backup
Backup(TSQLTuesday): Effective options: Checksum=0, Compression=1, Encryption=0, BufferCount=7, MaxTransferSize=64 KB
Backup(TSQLTuesday): Clearing differential bitmaps
Backup(TSQLTuesday): Differential bitmaps are cleared
Backup(TSQLTuesday): Writing a checkpoint
Backup(TSQLTuesday): Checkpoint is complete (elapsed = 4 ms)
Backup(TSQLTuesday): Start LSN: 38:240:37, SERepl LSN: 0:0:0
Backup(TSQLTuesday): Scanning allocation bitmaps
Backup(TSQLTuesday): Scanning allocation bitmaps is complete
Backup(TSQLTuesday): Calculating expected size of total data
Backup(TSQLTuesday): FID=1, ExpectedExtents=36, IsDifferentialMapAccurate=0
Backup(TSQLTuesday): TotalSize=2359296 bytes
Backup(TSQLTuesday): Estimated total size = 2375680 bytes (data size = 2359296 bytes, log size = 16384 bytes)
Backup(TSQLTuesday): Work estimation is complete
Backup(TSQLTuesday): Last LSN: 38:272:1
Backup(TSQLTuesday): Writing the leading metadata
Shared Backup BufferQ count: 7
Backup(TSQLTuesday): Calculating expected size of total data
Backup(TSQLTuesday): FID=1, ExpectedExtents=36, IsDifferentialMapAccurate=0
Backup(TSQLTuesday): TotalSize=2359296 bytes
Backup(TSQLTuesday): Copying data files
Backup(TSQLTuesday): Number of data file readers = 1
11 percent processed.
22 percent processed.
30 percent processed.
41 percent processed.
52 percent processed.
60 percent processed.
71 percent processed.
80 percent processed.
91 percent processed.
Backup(TSQLTuesday): InitialExpectedSize=2359296 bytes, FinalSize=2359296 bytes, ExcessMode=0
Backup(TSQLTuesday): Calculating expected size of total data
Backup(TSQLTuesday): FID=1, ExpectedExtents=36, IsDifferentialMapAccurate=0
Backup(TSQLTuesday): TotalSize=2359296 bytes
Backup(TSQLTuesday): Last LSN: 38:272:1
Backup(TSQLTuesday): Copying data files is complete
Processed 288 pages for database 'TSQLTuesday', file 'TSQLTuesday' on file 1.
Backup(TSQLTuesday): Copying transaction log
Backup(TSQLTuesday): MediaFamily(0): FID=2, VLFID=38, DataStreamSize=65536 bytes
100 percent processed.
Processed 2 pages for database 'TSQLTuesday', file 'TSQLTuesday_log' on file 1.
Backup(TSQLTuesday): Copying transaction log is complete
Backup(TSQLTuesday): Writing the trailing metadata
Backup(TSQLTuesday): Writing the end of backup set
Backup(TSQLTuesday): Writing history records
Backup(TSQLTuesday): Writing history records is complete (elapsed = 19 ms)
BACKUP DATABASE successfully processed 290 pages in 0.053 seconds (42.747 MB/sec).
Backup(TSQLTuesday): BACKUP DATABASE finished
 

For a restore operation:

DBCC TRACEON(3604, 3605, 3014);
 
RESTORE DATABASE TSQLTuesday FROM DISK = 'c:\temp\tst3014.bak'
  WITH REPLACE, RECOVERY, STATS = 10;
 

The output is:

Restore(TSQLTuesday): Acquiring U lock on the database
Restore(TSQLTuesday): RESTORE DATABASE started
Restore(TSQLTuesday): Acquiring bulk-op lock on the database
Restore(TSQLTuesday): Opening the backup set
Restore(TSQLTuesday): Processing the leading metadata
Restore(TSQLTuesday): Planning begins
Restore(TSQLTuesday): Halting Fulltext crawls
Restore(TSQLTuesday): Acquiring X lock on the database
Restore(TSQLTuesday): Acquired X lock on the database
Backup/Restore buffer configuration parameters
Memory limit: 1023 MB
BufferCount: 6
Sets Of Buffers: 2
MaxTransferSize: 64 KB
Min MaxTransferSize: 64 KB
Total buffer space: 0 MB
Tabular data device count: 1
Fulltext data device count: 0
Filestream device count: 0
TXF device count: 0
Filesystem i/o alignment: 512
Media Buffer count: 6
Media Buffer size: 64 KB
Encode Buffer count: 6
Restore(TSQLTuesday): Effective options: Checksum=0, Compression=1, Encryption=0, BufferCount=6, MaxTransferSize=64 KB
Restore(TSQLTuesday): Planning is complete
Restore(TSQLTuesday): Beginning OFFLINE restore
Restore(TSQLTuesday): Preparing containers
Restore(TSQLTuesday): Containers are ready
Restore(TSQLTuesday): Restoring the backup set
Restore(TSQLTuesday): Estimated total size to transfer = 2375680 bytes
Restore(TSQLTuesday): Transferring data
11 percent processed.
22 percent processed.
30 percent processed.
41 percent processed.
52 percent processed.
60 percent processed.
71 percent processed.
80 percent processed.
91 percent processed.
100 percent processed.
Restore(TSQLTuesday): Data transfer is complete
Processed 288 pages for database 'TSQLTuesday', file 'TSQLTuesday' on file 1.
Processed 2 pages for database 'TSQLTuesday', file 'TSQLTuesday_log' on file 1.
Restore(TSQLTuesday): Backup set is restored
Restore(TSQLTuesday): Offline roll-forward begins
Restore(TSQLTuesday): Processing 3 VLF headers
Restore(TSQLTuesday): Processing VLF headers is complete
Restore(TSQLTuesday): First LSN: 38:240:37, Last LSN: 38:272:1
Restore(TSQLTuesday): Stop LSN: 38:272:1
Restore(TSQLTuesday): Offline roll-forward is complete
Restore(TSQLTuesday): Database fixup is complete
Restore(TSQLTuesday): Transitioning database to ONLINE
Restore(TSQLTuesday): Restarting database for ONLINE
Restore(TSQLTuesday): Database is restarted
Restore(TSQLTuesday): Resuming any halted Fulltext crawls
Restore(TSQLTuesday): Writing history records
Restore(TSQLTuesday): Writing history records is complete (elapsed = 21 ms)
Restore(TSQLTuesday): MSDB maintenance is complete
RESTORE DATABASE successfully processed 290 pages in 0.029 seconds (78.125 MB/sec).
Restore(TSQLTuesday): RESTORE DATABASE finished
 

There are no timestamps on the output to the messages pane, so if you're using just 3604, you kind of have to watch the process and pay attention to when it "gets stuck" - and hope that it isn't a typical SSMS output buffer problem. But if you use 3605, you can see the timestamp for each step in the event log, so after the fact it should be easy to analyze and see where the time was spent.

Trace Flag 3004

Trace flag 3004 adds information to the output about file preparation, bitmaps, and instant file initialization (instant file initialization, which avoids the costly operation about zeroing out files, is only relevant for restore operations, and only for restoring data files).

For a backup:

DBCC TRACEON(3604, 3605, 3004);
 
BACKUP DATABASE TSQLTuesday TO DISK = 'c:\temp\tst3004.bak'
  WITH INIT, COMPRESSION, STATS = 10;
 

Output:

BackupDatabase: Database TSQLTuesday
Backup: Media open
Backup: Media ready to backup
Backup: Clearing differential bitmaps
Backup: Bitmaps cleared
BackupDatabase: Checkpoint done
BackupManager::SyncWithLog: SEReplLSN : 0 (0x00000000:00000000:0000)
BackupManager::SyncWithLog: Start LSN : 44000000012000065 (0x0000002c:00000078:0041)
Backup: Scanning allocation bitmaps
Backup: Done with allocation bitmaps
Calculating expected total data to write
Fid(1) ExpectedExt(38) DiffMapAccurate(0)
Total (2490368)
BackupDatabase: Work estimates done
Backup: Leading metadata section done
Calculating expected total data to write
Fid(1) ExpectedExt(38) DiffMapAccurate(0)
Total (2490368)
Backup:Copying data
Backup: DBReaderCount = 1
10 percent processed.
20 percent processed.
31 percent processed.
41 percent processed.
52 percent processed.
62 percent processed.
70 percent processed.
80 percent processed.
91 percent processed.
Backup data streams: Initial=2490368 Final=2490368 ExcessMode=0
Calculating expected total data to write
Fid(1) ExpectedExt(38) DiffMapAccurate(0)
Total (2490368)
BackupDatabase: Database files done
Processed 304 pages for database 'TSQLTuesday', file 'TSQLTuesday' on file 1.
Backup:Copying log
BackupLog: Family(0) StreamSize=0x10000
100 percent processed.
BackupLog: Family(0) StreamSize=0x0
BackupLog: Family(0) StreamSize=0x0
Processed 3 pages for database 'TSQLTuesday', file 'TSQLTuesday_log' on file 1.
BackupDatabase : Log files done
Backup: Trailing config done
Backup: MBC done
BackupDatabase: Writing history records
Writing backup history records
BACKUP DATABASE successfully processed 307 pages in 0.044 seconds (54.509 MB/sec).
BackupDatabase: Finished
 

For a restore, the additional information that is included about instant file initialization under 3004 is only directed to the error log, not to the messages pane. For this operation:

DBCC TRACEON(3604, 3605, 3004);
 
RESTORE DATABASE TSQLTuesday FROM DISK = 'c:\temp\tst3004.bak'
  WITH REPLACE, RECOVERY, STATS = 10;
 

In the messages pane, we see this:

RestoreDatabase: Database TSQLTuesday
Opening backup set
Restore: Configuration section loaded
Restore: Backup set is open
Restore: Planning begins
Restore: Planning complete
Restore: BeginRestore (offline) on TSQLTuesday
Restore: Attached database TSQLTuesday as DBID=12
Restore: PreparingContainers
Restore: Containers are ready
Restore: Restoring backup set
Restore: Transferring data to TSQLTuesday
10 percent processed.
20 percent processed.
31 percent processed.
41 percent processed.
52 percent processed.
62 percent processed.
70 percent processed.
80 percent processed.
91 percent processed.
100 percent processed.
FileHandleCache: 0 files opened. CacheSize: 12
Restore: Data transfer complete on TSQLTuesday
Processed 304 pages for database 'TSQLTuesday', file 'TSQLTuesday' on file 1.
Processed 3 pages for database 'TSQLTuesday', file 'TSQLTuesday_log' on file 1.
Restore: Backup set restored
Restore-Redo begins on database TSQLTuesday
Restore VLF headers for 3 headers on database TSQLTuesday
Restore VLF headers completed on database TSQLTuesday
RunOfflineRedo logIter.Init(): FirstLsn(PruId: 0): 0x2c:0x78:0x41
RunOfflineRedo logIter.Init(): LastLsn(PruId: 0): 0x2c:0xa8:0x1
OfflineRollforward: StopLsn/LastLsn(PruId: 0): 0x2c:0xa8:0x1
Rollforward complete on database TSQLTuesday
Restore: Done with fixups
Restore: Transitioning database to ONLINE
Restore: Restarting database for ONLINE
Restore: Database is restarted
Resuming any halted fulltext crawls
Restore: Writing history records
Writing backup history records
Restore: Done with MSDB maintenance
RESTORE DATABASE successfully processed 307 pages in 0.036 seconds (66.623 MB/sec).
RestoreDatabase: Finished
 

If you go to the error log, though, you'll see this information:

Zeroing C:\…\TSQLTuesday_log.ldf from page 1 to 102 (0x2000 to 0xcc000)
Zeroing completed on C:\…\TSQLTuesday_log.ldf
 

 

(Microsoft's Bob Dorr has written about TF 3004 before, but notes that it is undocumented and unsupported.)
If instant file initialization is not enabled, you'll see additional zeroing data for the data files (not necessarily in this order):

Zeroing C:\…\TSQLTuesday.mdf from page 1 to 408 (0x2000 to 0x330000)
Zeroing completed on C:\…\TSQLTuesday.mdf
Zeroing C:\…\TSQLTuesday_log.ldf from page 1 to 102 (0x2000 to 0xcc000)
Zeroing completed on C:\…\TSQLTuesday_log.ldf
 

 

There are some scenarios that can prevent instant file initialization from helping the restore process, such as:

  • the privilege not being granted to the SQL Server service account;
  • the use of Transparent Data Encryption (TDE);
  • the use of trace flag 1806, which disables instant file initialization;

For more details on instant file initialization, see these posts:

Summary

If you are trying to monitor a sluggish backup or restore process, these trace flags should be able to help you determine where exactly the slowdown is occurring - it could be processing undo/redo operations, VLFs, Full-text, or writing to msdb. It could be something as simple as the fact that you expected instant file initialization to be enabled, and it wasn't. There could be other things that affect backup or restore performance but that won't become apparent under these trace flags; for example, if you are using backup compression, but trace flag 3042 is enabled, this flag changes the way the backup file is allocated during the process (documented here).

I don't recommend running these trace flags globally. In fact, I highly recommend running 3226 in general, which does something like the opposite - it prevents all of the successful backup message noise from polluting the error log. Note that 3004 and 3014 are not completely trumped by 3226. Having 3226 enabled globally will not prevent you from using these flags for troubleshooting purposes; the output will just be missing these lines:

BACKUP DATABASE successfully processed 290 pages in 0.053 seconds (42.747 MB/sec).
RESTORE DATABASE successfully processed 290 pages in 0.029 seconds (78.125 MB/sec).
 
For more best practices for restoring databases, you can check out this updated guide on how to restore databases from SQL Server.

Aaron (@AaronBertrand) is a Data Platform MVP with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com. Aaron's blog focuses on T-SQL bad habits and best practices, as well as coverage of updates and new features in Plan Explorer, SentryOne, and SQL Server.


Comments