I recently saw an issue where a large number of databases were consolidated onto a single server. Though they were backed up on a proper full -> differential -> log cycle, and the starting day for each cycle was roughly spread evenly over the week, the large variance of characteristics between each database resulted in backups running far too long on some days and terminating too early on others.
But when you have a large list of databases how can you evenly distribute the load over the week without manual calculation and/or guesswork? Wouldn't it be nice if there were a simple solution?
I searched for an algorithm and found it was a known and mathematically complicated problem (which makes it too advanced for me) but that luckily there was a simple approximation at a solution. Interestingly it was based on how children pick players for their sports teams, and it goes like this:
- Pick a measure for the databases (cost).
- Construct a table of databases ordered from most cost to least cost.
- Construct a table of the backup days, with slots for database name and cost.
- Iterate the database table.
- Find the next slot in the table of backup days with the least cumulative cost.
- Move the database into that slot.
- Repeat until there are no databases left.
- You are left with a list backup days, databases, and costs that balance out surprisingly well.
You can tweak the algorithm if you feel the need to. For example I did not experiment with this but I read that it may be more efficient to order the input databases from largest to median cost descending, then from smallest to median cost ascending.
The reason I keep repeating cost instead of being specific is that how you balance your backups depends on what aspect of the databases you prefer to optimize for. Some examples of possible costs include:
- The count of databases that will be backed up (the simplest metric I wanted to avoid).
- The total size of the data files - sounds reasonable (maybe an IO read metric).
- The total size of the data files plus the log files - might be reasonable (maybe an IO read metric)
- The total size of the last full backup that was generated with or without compression (post-processing metrics).
My problem was specifically to do with limited disk space to store, CPU to compress, and bandwidth to transfer the backups to an offsite location. So I chose the last cost, full backup size, or the data file size if this data wasn't available.
Select sd.name As Database_Name, -- Sum(mf.Size / 128.0) As Data_File_Size, -- A.Compressed_Backup_Size / 1048576 As Compressed_Database_Backup_Size, Coalesce(A.Compressed_Backup_Size / 1048576, Sum(mf.Size / 128.0) As Database_Cost Into #Database_List From sys.sysdatabases sd Outer Apply ( Select Top 1 bs.Compressed_Backup_Size From msdb.dbo.backupset bs Where sd.name = bs.Database_Name And bs.type = 'D' -- Full backup Order By bs.Backup_Start_Date Desc ) A Join sys.master_files mf On mf.Database_Id = sd.DbId And mf.Type_Desc = 'rows' -- Data files only Where sd.name 'tempdb' -- This doesn't get backed up ... Group By sd.name, A.Compressed_Backup_Size
To test it out I created a system with 30 databases of wildly varying costs. Here is how to reconstruct the temporary table so we can get started:
Set NoCount On If Object_Id('tempdb..#Database_List') Is Not Null Drop Table #Database_List Go Create Table #Database_List ( Database_Name sysname Null, Database_Cost int Null ) Go Insert #Database_List Select 'T01' As Database_Name, 683 As Database_Cost Union Select 'T02' As Database_Name, 247 As Database_Cost Union Select 'T03' As Database_Name, 210 As Database_Cost Union Select 'T04' As Database_Name, 405 As Database_Cost Union Select 'T05' As Database_Name, 279 As Database_Cost Union Select 'T06' As Database_Name, 617 As Database_Cost Union Select 'T07' As Database_Name, 644 As Database_Cost Union Select 'T08' As Database_Name, 881 As Database_Cost Union Select 'T09' As Database_Name, 160 As Database_Cost Union Select 'T10' As Database_Name, 341 As Database_Cost Union Select 'T11' As Database_Name, 66 As Database_Cost Union Select 'T12' As Database_Name, 839 As Database_Cost Union Select 'T13' As Database_Name, 67 As Database_Cost Union Select 'T14' As Database_Name, 336 As Database_Cost Union Select 'T15' As Database_Name, 250 As Database_Cost Union Select 'T16' As Database_Name, 602 As Database_Cost Union Select 'T17' As Database_Name, 295 As Database_Cost Union Select 'T18' As Database_Name, 392 As Database_Cost Union Select 'T19' As Database_Name, 858 As Database_Cost Union Select 'T20' As Database_Name, 799 As Database_Cost Union Select 'T21' As Database_Name, 680 As Database_Cost Union Select 'T22' As Database_Name, 986 As Database_Cost Union Select 'T23' As Database_Name, 598 As Database_Cost Union Select 'T24' As Database_Name, 248 As Database_Cost Union Select 'T25' As Database_Name, 198 As Database_Cost Union Select 'T26' As Database_Name, 259 As Database_Cost Union Select 'T27' As Database_Name, 358 As Database_Cost Union Select 'T28' As Database_Name, 183 As Database_Cost Union Select 'T29' As Database_Name, 719 As Database_Cost Union Select 'T30' As Database_Name, 265 As Database_Cost Go
As you can imagine it would be very difficult to manually distribute backups across the week on a set of data like this. Until now. Create a second table that will hold our results when they have been sorted into days of the week.
If Object_id('tempdb..#Backup_List') Is Not Null Drop Table #Backup_List Go Create Table #Backup_List ( Backup_Day int, Database_Name sysname NULL, Database_Cost int ) Insert #Backup_List Select 0, Null, 0 Union Select 1, Null, 0 Union Select 2, Null, 0 Union Select 3, Null, 0 Union Select 4, Null, 0 Union Select 5, Null, 0 Union Select 6, Null, 0
This is the workhorse. You could use a cursor. I didn't want to.
Declare @Backup_Day int Declare @Database_Name sysname Declare @Database_Cost int While (1 = 1) Begin Set @Database_Name = Null Select Top 1 @Database_Name = Database_Name, @Database_Cost = Database_Cost From #Database_List Order By Database_Cost Desc, Database_Name Desc If @Database_Name Is Null Break Delete From #Database_List Where Database_Name = @Database_Name Insert #Backup_List Select Backup_Day, @Database_Name, @Database_Cost From ( Select Top 1 Backup_Day, Sum(Database_Cost) As Database_Cost From #Backup_List Group By Backup_Day Order By Sum(Database_Cost) Asc, Backup_Day Asc ) X End
We are left with a backup load that is distributed over the days:
Select 'Backup Day Distribution' As Information, DateName(dw, Backup_Day) As Backup_Day, Count(Database_Name) As Database_Count, Sum(Database_Cost) As Total_Cost From #Backup_List Where Database_Name Is Not Null Group By Backup_Day Order By #Backup_List.Backup_Day
|Backup Day Distribution||Monday||4||1921|
|Backup Day Distribution||Tuesday||4||1894|
|Backup Day Distribution||Wednesday||5||1942|
|Backup Day Distribution||Thursday||5||1955|
|Backup Day Distribution||Friday||4||1942|
|Backup Day Distribution||Saturday||4||1902|
|Backup Day Distribution||Sunday||4||1909|
And the databases for each day:
Select 'Backup Day Detail' As Information, DateName(dw, Backup_Day) As Backup_Day, Database_Name, Database_Cost From #Backup_List Where Database_Name Is Not Null Order By #Backup_List.Backup_Day, Database_Name
|Backup Day Detail||Monday||T17||295|
|Backup Day Detail||Monday||T18||392|
|Backup Day Detail||Monday||T22||986|
|Backup Day Detail||Monday||T24||248|
|Backup Day Detail||Tuesday||T04||405|
|Backup Day Detail||Tuesday||T08||881|
|Backup Day Detail||Tuesday||T15||250|
|Backup Day Detail||Tuesday||T27||358|
|Backup Day Detail||Wednesday||T09||160|
|Backup Day Detail||Wednesday||T13||67|
|Backup Day Detail||Wednesday||T19||858|
|Backup Day Detail||Wednesday||T23||598|
|Backup Day Detail||Wednesday||T26||259|
|Backup Day Detail||Thursday||T11||66|
|Backup Day Detail||Thursday||T12||839|
|Backup Day Detail||Thursday||T16||602|
|Backup Day Detail||Thursday||T28||183|
|Backup Day Detail||Thursday||T30||265|
|Backup Day Detail||Friday||T02||247|
|Backup Day Detail||Friday||T05||279|
|Backup Day Detail||Friday||T06||617|
|Backup Day Detail||Friday||T20||799|
|Backup Day Detail||Saturday||T07||644|
|Backup Day Detail||Saturday||T10||341|
|Backup Day Detail||Saturday||T25||198|
|Backup Day Detail||Saturday||T29||719|
|Backup Day Detail||Sunday||T01||683|
|Backup Day Detail||Sunday||T03||210|
|Backup Day Detail||Sunday||T14||336|
|Backup Day Detail||Sunday||T21||680|
Pretty cool. Now you just need to create all of those backup schedules