Automatic Availability Groups for Sharepoint on SQL Server

One of the downsides of AlwaysOn Availability Groups (AOAG) is how new databases are not automatically added to a group and protected; unlike traditional Failover Cluster Instances with shared disks where everything is automatically protected from outages and with no further setup required.

But it's not difficult to change this.

Sharepoint is particularly vulnerable without this because it is usually configured to create a new database whenever a new site is created. It adds additional workload or processes for Sharepoint admins to notify DBAs of new databases (or for DBAs to monitor for new databases), and leaves a significant gap where an AOAG failover can occur shortly after a site is created and the site then goes offline.

It's worth pointing out that Microsoft seems to have noticed this and now includes Sharepoint cmdlets so that Sharepoint admins with SQL Server sysadmin privileges can take ownership of this responsibility and run Add-DatabaseToAvailabilityGroup to manage this without further assistance from the DBAs.

But it still requires a bit of chutzpah to detect and automate this.

While those cmdlets aren't available outside of the servers running Sharepoint, I spent some time automating a similar solution on the SQL Server side. At first it didn't come to me because I avoid taking dependencies on SQL Server cmdlets due to working in heavily mixed environments; but with AOAG you can guarantee it's running PowerShell 2.0 and a set of basic cmdlets that will get the job done.

Requirements

AOAG up and running.
A Windows domain account with sysadmin permission on both servers.
A secure network fileshare accessible from each server by the SQL Server engine and the user.
An SQL Server credential for the user.
An SQL Server Agent PowerShell proxy for the user.

Example

I've created a basic two node cluster, a Sharepoint_Dummy database, then used the SQL Server AOAG Wizard to create the AOAG Sharepoint_Group. I've included the script for completeness but you won't need it.

I've also created a fileshare \\W08R2DC\FILESHARE\Adhoc\W08R2S14C1$AOAG_Group. This naming convention comes from Ola Hallengren's backup scripts, so even though they aren't used here, you can just point it at the root path you normally use and go.

I've granted access to the fileshare for the database engine service account and a new Windows user SAFESQL\AOAG_AUTOMATIC. On each node I create a corresponding login, add it to the sysadmin server role, and also create a credential and PowerShell proxy for use later.

Create Login [SAFESQL\AOAG_AUTOMATIC] From Windows
Alter Server Role sysadmin Add Member [SAFESQL\AOAG_AUTOMATIC]

Create Credential [SAFESQL\AOAG_AUTOMATIC] With Identity = 'SAFESQL\AOAG_AUTOMATIC', SECRET = 'My_Password123'

Exec	msdb.dbo.sp_add_proxy 
		@proxy_name=N'SAFESQL\AOAG_AUTOMATIC',
		@credential_name=N'SAFESQL\AOAG_AUTOMATIC', 
		@enabled=1
Exec	msdb.dbo.sp_grant_proxy_to_subsystem 
		@proxy_name=N'SAFESQL\AOAG_AUTOMATIC', 
		@subsystem_id=12

Now we're ready to go. Create an agent job (I name mine Automatic Availability Groups) with a PowerShell step running the below script under your proxy. Schedule it to run on whatever window you feel comfortable with, perhaps once an hour, and it's also useful to tick the checkbox that includes step output in job history to help with investigating and resolving errors.

# Be careful not to reference variables in strings using $ ( ) as this is 
# interpreted as a token in SQL Agent PowerShell jobs. Also on Windows
# Server 2008 R2 only PowerShell 2.0 will be available so remember to do
# $null checks before iterating arrays.
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"

Import-Module SQLPS -DisableNameChecking
Set-Location C:

# User options
$backupRoot = "\\W08R2DC\FILESHARE\Adhoc"
$removeBackupFile = $false

# Suffix for all backup files in this session
$fileSuffix = (Get-Date).ToString("_yyyy-MM-dd-hh-mm-ss")

# If we're called from an agent, only look for databases on that instance, otherwise do all available instances on this server
try {
    $caller = "$(ESCAPE_NONE(SRVR))"
    $instances = Get-Item (Join-Path "SQLSERVER:\SQL" $caller)
} catch {
    $instances = Get-ChildItem (Join-Path "SQLSERVER:\SQL" $env:COMPUTERNAME) | %{
        Get-Item $_.PSPath
    }
}

if (!$instances) {
	Write-Error "Abort: No instances found."
}

foreach ($instance in $instances) {
    $serverInstance = $instance.Name
    $clusterName = $instance.ClusterName
    Write-Output $serverInstance

    # Only required on instances with availability groups
    if ($instance.AvailabilityGroups) {
        # By default we will put everything in the first availability group
        $availabilityGroup = $instance.AvailabilityGroups | Sort Name | Select-Object -First 1
        $availabilityGroupName = $availabilityGroup.Name
        
        # Only continue processing if this is the primary
        if ($availabilityGroup.LocalReplicaRole -eq "Primary") {
            Write-Output "`tCluster $clusterName"
            Write-Output "`tAG      $availabilityGroupName"
                            
            # Construct a root path based on Ola Hallengren's standard
            $backupPath = "$backupRoot\$clusterName`$$availabilityGroupName"
            if (Test-Path $backupPath) {
                # Identify secondary replicas and databases which are not in any group
                $secondaries = $availabilityGroup.AvailabilityReplicas | Where { $_.Role -eq "Secondary" } | %{ $_.Refresh(); $_ } | Select -ExpandProperty Name
                $databases = Get-ChildItem (Join-Path $instance.PSPath "Databases") | %{ $_.Refresh(); $_ } | Where-Object { $_.AvailabilityGroupName -eq "" } | Select -ExpandProperty Name
                
                foreach ($database in ($databases | ?{$_})) {
                    # Ensure that the secondaries are free from other copies of this database, otherwise abort
                    $secondaryDatabasesExist = foreach ($secondary in $secondaries) {
                        if (Get-ChildItem "SQLSERVER:\SQL\$secondary\Databases" | %{ $_.Refresh(); $_ } | Where { $_.Name -eq $database -and $_.Status -ne "Restoring" }) {
                            $secondary
                        }
                    }

                    if (!$secondaryDatabasesExist) {
                        Write-Output "`tDatabase $database"

                        try {
                            # Database must be set to Full recovery model before adding it to the group (some search databases are created in Simple recovery model)
                            $databaseModel = Get-Item (Join-Path (Join-Path $instance.PSPath "Databases") $database)
                            if ($databaseModel.RecoveryModel -ne "Full") {
                                Write-Output "`t`tSwitching to Full recovery model"
                                $databaseModel.RecoveryModel = "Full"
                                $databaseModel.Alter()
                            }

                            # Backup the full and log to the path
                            $backupFile = "$backupPath\$database$fileSuffix.bak"
                            Write-Output "`t`tBacking up to $backupFile"

                            Backup-SqlDatabase -ServerInstance $serverInstance -Database $database -BackupFile $backupFile -BackupAction "Database"
                            Backup-SqlDatabase -ServerInstance $serverInstance -Database $database -BackupFile $backupFile -BackupAction "Log"

                            # Restore on all secondaries
                            foreach ($secondary in $secondaries) {
                                Write-Output "`t`tRestoring on $secondary"
                                Restore-SqlDatabase -ServerInstance $secondary -Database $database -BackupFile $backupFile -FileNumber 1 -NoRecovery
                                Restore-SqlDatabase -ServerInstance $secondary -Database $database -BackupFile $backupFile -FileNumber 2 -NoRecovery
                            }

                            if ($removeBackupFile) {
                                Remove-Item $backupFile
                            }

                            # Join the database to the group on the primary and secondaries
                            Write-Output "`t`tJoining $serverInstance to $availabilityGroupName"
                            Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$serverInstance\AvailabilityGroups\$availabilityGroupName" -Database $database

                            foreach ($secondary in $secondaries) {
                                Write-Output "`t`tJoining $secondary to $availabilityGroupName (Secondary)"
                                Add-SqlAvailabilityDatabase -Path "SQLSERVER:\SQL\$secondary\AvailabilityGroups\$availabilityGroupName" -Database $database
                            }
                        } catch {
                            Write-Error "`t`tAbort: $_"
                        }
                    } else {
                        Write-Error "`t`tAbort: Already exists on $secondaryDatabasesExist"
                    }
                }
            } else {
                Write-Error "`tAbort: Backups path does not exist, $backupPath"
            }
        } else {
            Write-Output "`tSkip: Not the primary."
        }
    } else {
        Write-Output "`tSkip: No availability groups."
    }
}

This is the logic used:

Set the root path for our fileshare. It throws an immediate failure if it's not available, whether it's needed or not, because this is the kind of thing you need to know before you need it.
SQL Server Agent will pass in the instance name so that the script will only run on that instance. However if nothing was passed in (for example if you want to try the script manually) it will iterate all instances on the server (or whatever is in the SQLSERVER: drive cache).
- Check the instance has an Availability Group and is the primary.
- Fetch a list of databases that are not in any existing groups.
  - Make sure the database is not in a non-recovery state on any secondaries.
  - Switch the database to full recovery model if required.
  - Backup the database (full and log) into one file.
  - Restore the database on each secondary.
  - Optionally remove the backup file.
  - Join the primary and secondaries to the group.

The reason I leave the backup file by default is because it may be the only backup available until your scheduled backups kick in. For this reason you would need to add in another cleanup job later.

If any errors occur the job will fail and exit immediately. You can modify it to make it continue and throw an error at the end but it seems safer to just abort so that it can be picked up by your monitoring or daily job checks.

One frequent cause for failures is if log backups occur while this job is running (i.e. after it has taken its initial backups but before it finishes restoring on the secondary and joining the database to the AG). This is more likely if you have log backups scheduled with a high frequency such as every minute. In this case you could add logic to the script to pickup the last log backup and restore that as well.

So what does the output look like after our initial setup? On one server it will tell you that it's not the primary, and exit safely. On the other it will report the instance, cluster, and group details but find no databases to add.

Now we'll test adding a new database to the primary node, as if Sharepoint had just created a site, wait for the job to run, then inspect the output.

Create Database Sharepoint_New

Ta-da.

You've now got Automatic Availability Groups.

It's worthwhile to point out that you still need to take care when making changes on any node to make the same change on other nodes (I'm talking about synchronising the logins and agent jobs). But these aren't normally added through Sharepoint.

Requirements

Example

Share on: