Introduction

I recently saw an issue where a large number of databases were consolidated onto a single server. Though they were backed up on a proper full -> differential -> log cycle, and the starting day for each cycle was roughly spread evenly over the week, the large variance of characteristics between each database resulted in backups running far too long on some days and terminating too early on others.

But when you have a large list of databases how can you evenly distribute the load over the week without manual calculation and/or guesswork? Wouldn't it be nice if there were a simple solution?

I searched for an algorithm and found it was a known and mathematically complicated problem (which makes it too advanced for me) but that luckily there was a simple approximation at a solution. Interestingly it was based on how children pick players for their sports teams, and it goes like this:

  • Pick a measure for the databases (cost).
  • Construct a table of databases ordered from most cost to least cost.
  • Construct a table of the backup days, with slots for database name and cost.
  • Iterate the database table.
  • Find the next slot in the table of backup days with the least cumulative cost.
  • Move the database into that slot.
  • Repeat until there are no databases left.
  • You are left with a list backup days, databases, and costs that balance out surprisingly well.

You can tweak the algorithm if you feel the need to. For example I did not experiment with this but I read that it may be more efficient to order the input databases from largest to median cost descending, then from smallest to median cost ascending.

The reason I keep repeating cost instead of being specific is that how you balance your backups depends on what aspect of the databases you prefer to optimize for. Some examples of possible costs include:

  • The count of databases that will be backed up (the simplest metric I wanted to avoid).
  • The total size of the data files - sounds reasonable (maybe an IO read metric).
  • The total size of the data files plus the log files - might be reasonable (maybe an IO read metric)
  • The total size of the last full backup that was generated with or without compression (post-processing metrics).

My problem was specifically to do with limited disk space to store, CPU to compress, and bandwidth to transfer the backups to an offsite location. So I chose the last cost, full backup size, or the data file size if this data wasn't available.

Select	sd.name As Database_Name,
		-- Sum(mf.Size / 128.0) As Data_File_Size,
		-- A.Compressed_Backup_Size / 1048576 As Compressed_Database_Backup_Size,
	  	Coalesce(A.Compressed_Backup_Size / 1048576, Sum(mf.Size / 128.0) As Database_Cost
Into	#Database_List
From	sys.sysdatabases sd
Outer Apply (
		Select	Top 1 bs.Compressed_Backup_Size
		From	msdb.dbo.backupset bs
		Where	sd.name = bs.Database_Name
		And		bs.type = 'D' -- Full backup
		Order By bs.Backup_Start_Date Desc
		) A
Join	sys.master_files mf
On		mf.Database_Id = sd.DbId
And     mf.Type_Desc = 'rows' -- Data files only
Where   sd.name  'tempdb' -- This doesn't get backed up ...
Group By sd.name, A.Compressed_Backup_Size

To test it out I created a system with 30 databases of wildly varying costs. Here is how to reconstruct the temporary table so we can get started:

Set NoCount On

If Object_Id('tempdb..#Database_List') Is Not Null
        Drop Table #Database_List
Go

Create Table #Database_List (
	Database_Name sysname Null,
	Database_Cost int Null
)
Go

Insert #Database_List
Select 'T01' As Database_Name, 683 As Database_Cost Union
Select 'T02' As Database_Name, 247 As Database_Cost Union
Select 'T03' As Database_Name, 210 As Database_Cost Union
Select 'T04' As Database_Name, 405 As Database_Cost Union
Select 'T05' As Database_Name, 279 As Database_Cost Union
Select 'T06' As Database_Name, 617 As Database_Cost Union
Select 'T07' As Database_Name, 644 As Database_Cost Union
Select 'T08' As Database_Name, 881 As Database_Cost Union
Select 'T09' As Database_Name, 160 As Database_Cost Union
Select 'T10' As Database_Name, 341 As Database_Cost Union
Select 'T11' As Database_Name,  66 As Database_Cost Union
Select 'T12' As Database_Name, 839 As Database_Cost Union
Select 'T13' As Database_Name,  67 As Database_Cost Union
Select 'T14' As Database_Name, 336 As Database_Cost Union
Select 'T15' As Database_Name, 250 As Database_Cost Union
Select 'T16' As Database_Name, 602 As Database_Cost Union
Select 'T17' As Database_Name, 295 As Database_Cost Union
Select 'T18' As Database_Name, 392 As Database_Cost Union
Select 'T19' As Database_Name, 858 As Database_Cost Union
Select 'T20' As Database_Name, 799 As Database_Cost Union
Select 'T21' As Database_Name, 680 As Database_Cost Union
Select 'T22' As Database_Name, 986 As Database_Cost Union
Select 'T23' As Database_Name, 598 As Database_Cost Union
Select 'T24' As Database_Name, 248 As Database_Cost Union
Select 'T25' As Database_Name, 198 As Database_Cost Union
Select 'T26' As Database_Name, 259 As Database_Cost Union
Select 'T27' As Database_Name, 358 As Database_Cost Union
Select 'T28' As Database_Name, 183 As Database_Cost Union
Select 'T29' As Database_Name, 719 As Database_Cost Union
Select 'T30' As Database_Name, 265 As Database_Cost
Go

As you can imagine it would be very difficult to manually distribute backups across the week on a set of data like this. Until now. Create a second table that will hold our results when they have been sorted into days of the week.

If	Object_id('tempdb..#Backup_List') Is Not Null
	Drop Table #Backup_List
Go

Create Table #Backup_List (
	Backup_Day int,
	Database_Name sysname NULL,
	Database_Cost int
)

Insert	#Backup_List
Select	0, Null, 0 Union
Select	1, Null, 0 Union
Select	2, Null, 0 Union
Select	3, Null, 0 Union
Select	4, Null, 0 Union
Select	5, Null, 0 Union
Select	6, Null, 0

This is the workhorse. You could use a cursor. I didn't want to.

Declare @Backup_Day int
Declare @Database_Name sysname
Declare @Database_Cost int

While (1 = 1)
Begin
	Set		@Database_Name = Null

	Select	Top 1
			@Database_Name = Database_Name,
			@Database_Cost = Database_Cost
	From	#Database_List
	Order By Database_Cost Desc, Database_Name Desc

	If	@Database_Name Is Null
		Break

	Delete
	From	#Database_List
	Where	Database_Name = @Database_Name

	Insert	#Backup_List
	Select	Backup_Day, @Database_Name, @Database_Cost
	From	(
			Select	Top 1 Backup_Day, Sum(Database_Cost) As Database_Cost
			From	#Backup_List
			Group By Backup_Day
			Order	By Sum(Database_Cost) Asc, Backup_Day Asc
	) X
End

We are left with a backup load that is distributed over the days:

Select	'Backup Day Distribution' As Information,
		DateName(dw, Backup_Day) As Backup_Day,
		Count(Database_Name) As Database_Count,
		Sum(Database_Cost) As Total_Cost
From	#Backup_List
Where	Database_Name Is Not Null
Group By Backup_Day
Order By #Backup_List.Backup_Day
Information Backup_Day Database_Count Total_Cost
Backup Day Distribution Monday 4 1921
Backup Day Distribution Tuesday 4 1894
Backup Day Distribution Wednesday 5 1942
Backup Day Distribution Thursday 5 1955
Backup Day Distribution Friday 4 1942
Backup Day Distribution Saturday 4 1902
Backup Day Distribution Sunday 4 1909

And the databases for each day:

Select	'Backup Day Detail' As Information,
		DateName(dw, Backup_Day) As Backup_Day,
		Database_Name,
		Database_Cost
From	#Backup_List
Where	Database_Name Is Not Null
Order By #Backup_List.Backup_Day, Database_Name
Information Backup_Day Database_Name Database_Cost
Backup Day Detail Monday T17 295
Backup Day Detail Monday T18 392
Backup Day Detail Monday T22 986
Backup Day Detail Monday T24 248
Backup Day Detail Tuesday T04 405
Backup Day Detail Tuesday T08 881
Backup Day Detail Tuesday T15 250
Backup Day Detail Tuesday T27 358
Backup Day Detail Wednesday T09 160
Backup Day Detail Wednesday T13 67
Backup Day Detail Wednesday T19 858
Backup Day Detail Wednesday T23 598
Backup Day Detail Wednesday T26 259
Backup Day Detail Thursday T11 66
Backup Day Detail Thursday T12 839
Backup Day Detail Thursday T16 602
Backup Day Detail Thursday T28 183
Backup Day Detail Thursday T30 265
Backup Day Detail Friday T02 247
Backup Day Detail Friday T05 279
Backup Day Detail Friday T06 617
Backup Day Detail Friday T20 799
Backup Day Detail Saturday T07 644
Backup Day Detail Saturday T10 341
Backup Day Detail Saturday T25 198
Backup Day Detail Saturday T29 719
Backup Day Detail Sunday T01 683
Backup Day Detail Sunday T03 210
Backup Day Detail Sunday T14 336
Backup Day Detail Sunday T21 680

Pretty cool. Now you just need to create all of those backup schedules