I've been hearing about round-robin read-only routing ever since SQL 2016 came out but whenever I tried to test if it's working it never seemed to be. But now I know exactly how it works and there's a few loopholes where it may not trigger, and they're not the documented ones you're thinking of.

To test the limits of it you're going to need:

  • PowerShell 5.1
  • Pester 4 (Install-Module Pester -Force)
  • DbData (Install-Module DbData -Force)

I'll explain any of the Pester and DbData bits along the way so don't worry. They're minor framework stuff.

Here's my test environment.

  • C1N1, C1N2, C1N3 readable replicas all running on port 1433.
  • C1N3 is the primary with a read-only routing list group of (C1N1, C1N2).
  • There is a listener AG1L on port 1433 with a database named Test.
  • I'm running my tests from a workstation W1.

Let's do a quick Pester test to confirm a single read-only routing occurs.

Describe "Read only routing" {
    $listener = "AG1L"
    $database = "Test"

    Context "SQL 2012 and SQL 2014 one time redirect" {
        It "Using application intent" {
            $readWrite = New-DbConnection $listener $database -ApplicationIntent ReadWrite | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnly = New-DbConnection $listener $database -ApplicationIntent ReadOnly | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            Write-Host "Connected to $readWrite then $readOnly"
            $readWrite | Should -Not -Be $readOnly
        }
        # ...
    }

    # ...
}

The Pester portions are these keywords:

  • Describe, Context, and It

    They're nested blocks with a natural language string that describes what you're testing. The blocks are mostly for visual purposes, but also create scopes to insulate variables and things from each other.

  • Should

    This is what triggers an It block to fail if whatever condition is not met.

The DbData portions handle SQL Server and follow a Connect | Command | Data format:

  • New-DbConnection takes a server name and database name to create a SQL connection.
  • New-DbCommand creates a command.
  • Get-DbData executes it and returns a result.

Here's the output:

It works. But for fun let's start also testing things that should not work. For example:

  • You must connect to a database in the AG, so it shouldn't work if you connect to master.
  • You must connect to the listener, so it shouldn't work if you connect to the instance.
        $wrongDatabase = "master"
        It "Not when using the wrong database" {
            $readWrite = New-DbConnection $listener $wrongDatabase -ApplicationIntent ReadWrite | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnly = New-DbConnection $listener $wrongDatabase -ApplicationIntent ReadOnly | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            Write-Host "Connected to $readWrite then $readOnly"
            $readWrite | Should -Be $readOnly
        }

        $wrongListener = "C1N3"
        It "Not when using a replica instead of a listener" {
            $readWrite = New-DbConnection $wrongListener $database -ApplicationIntent ReadWrite | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnly = New-DbConnection $wrongListener $database -ApplicationIntent ReadOnly | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            Write-Host "Connected to $readWrite then $readOnly"
            $readWrite | Should -Be $readOnly
        }

Excellent. To be thorough I created some other tests like connecting to the listener IP address and using a port number, but these all passed as if using the listener name, and so are boring and I won't reproduce them here

Let's experiment with round-robin read-only routing using a new Context section:

    Context "SQL 2016+ round robin redirect" {
        It "Test when using two connections" {
            $readWrite = New-DbConnection $listener $database -ApplicationIntent ReadWrite | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnly = New-DbConnection $listener $database -ApplicationIntent ReadOnly | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnly2 = New-DbConnection $listener $database -ApplicationIntent ReadOnly | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            Write-Host "Connected to $readWrite then $readOnly then $readOnly2"
            $readWrite | Should -Not -Be $readOnly
            $readOnly | Should -Not -Be $readOnly2
        }
        # ...
    }

Wah wah! It doesn't work. That's totally not what the blog posts would lead you to believe. Maybe the two reads need to happen over the same connection…?

        It "Test when re-using one connection" {
            $readWrite = New-DbConnection $listener $database -ApplicationIntent ReadWrite | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnlyConnection = New-DbConnection $listener $database -ApplicationIntent ReadOnly
            $readOnly = $readOnlyConnection | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnly2 = $readOnlyConnection | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            Write-Host "Connected to $readWrite then $readOnly then $readOnly2"
            $readWrite | Should -Not -Be $readOnly
            $readOnly | Should -Not -Be $readOnly2
        }

Wah wah! This is starting to look a bit shit so let's fall back to using sqlcmd the way they do in the Microsoft blog posts.

        It "Test when using sqlcmd" {
            $readWrite = &sqlcmd.exe -S $listener -d $database -E -W -h -1 -Q "SET NOCOUNT ON; SELECT @@SERVERNAME AS ServerName;"
            $readOnly = &sqlcmd.exe -S $listener -d $database -E -K readonly -W -h -1 -Q "SET NOCOUNT ON; SELECT @@SERVERNAME AS ServerName;"
            $readOnly2 = &sqlcmd.exe -S $listener -d $database -E -K readonly -W -h -1 -Q "SET NOCOUNT ON; SELECT @@SERVERNAME AS ServerName;"
            Write-Host "Connected to $readWrite then $readOnly then $readOnly2"
            $readWrite | Should -Not -Be $readOnly
            $readOnly | Should -Not -Be $readOnly2        
        }

If you're starting to scratch your head wondering what the hell is going on you're not the only one. But either way - most people who read the Microsoft documentation are going to be using either of the first two methods and not spawning sqlcmd sessions for every query they run.

Why does this happen? Well one more context section will show you.

    Context "SQL 2016+ round robin redirect with connection pooling disabled" {
        It "Test when using two connections" {
            $readWrite = New-DbConnection $listener $database -ApplicationIntent ReadWrite | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnly = New-DbConnection $listener $database -ApplicationIntent ReadOnly -Pooling $false | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnly2 = New-DbConnection $listener $database -ApplicationIntent ReadOnly -Pooling $false | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            Write-Host "Connected to $readWrite then $readOnly then $readOnly2"
            $readWrite | Should -Not -Be $readOnly
            $readOnly | Should -Not -Be $readOnly2
        }

        It "Test when re-using one connection" {
            $readWrite = New-DbConnection $listener $database -ApplicationIntent ReadWrite | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnlyConnection = New-DbConnection $listener $database -ApplicationIntent ReadOnly -Pooling $false 
            $readOnly = $readOnlyConnection | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            $readOnly2 = $readOnlyConnection | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar
            Write-Host "Connected to $readWrite then $readOnly then $readOnly2"
            $readWrite | Should -Not -Be $readOnly
            $readOnly | Should -Not -Be $readOnly2
        }
    }

Surprise! The year 2000 called to let us know ADO.NET connection pooling is still relevant.

So that's pretty much the caveat. If you're using connection pooling (which is on by default) then there's a possibility you are still only using one replica.

In the connection pool documentation it says that these are per-process per-application domain, per-user, and per-connection string. Different connection pools are good. That means we we get a separate connection which will go to a different server.

Let's start up a runspace with multiple concurrent threads using PoshRSJob (Install-Module PoshRSJob) and see if those get different connection pools and so connected to different replicas:

$jobs = 1..100 | Start-RSJob -Throttle 2 { 
    $listener = "AG1L"
    $database = "Test"
    New-DbConnection $listener $database -ApplicationIntent ReadOnly | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar }

$jobs | Wait-RSJob -ShowProgress | Receive-RSJob | Group-Object
$jobs | Remove-RSJob

It looks like it, right?

Well, I'm not so sure. I think that it might be using one connection pool but because there was thread contention for that connection object it spun up more of them. This becomes a little more obvious if you introduce a delay, reducing contention, and which suddenly skews the (seeming-but-not-really-because-it's-cached) replica routing.

$jobs = 1..100 | Start-RSJob -Throttle 2 { 
    $listener = "AG1L"
    $database = "Test"
    Start-Sleep -Milliseconds (Get-Random 1000)
    New-DbConnection $listener $database -ApplicationIntent ReadOnly | New-DbCommand "SELECT @@SERVERNAME AS ServerName;" | Get-DbData -OutputAs Scalar }

$jobs | Wait-RSJob -ShowProgress | Receive-RSJob | Group-Object
$jobs | Remove-RSJob

Try running that in a new PowerShell session, and then an old PowerShell session. You'll see some really weird figures between the two.

But there's one more caveat which also comes into play. To demonstrate this we need to enlist the help of another computer.

$test = { & "C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\sqlcmd.exe" -S AG1L -d Test -U "TestUser" -P "TestPassword123!" -K readonly -W -h -1 -Q "SET NOCOUNT ON; SELECT @@SERVERNAME AS ServerName;" }

"Local"
&$test
&$test
&$test

"Remote"
Invoke-Command C1N1 -ScriptBlock $test

"Local"
& $test

"Remote"
Invoke-Command C1N1 -ScriptBlock $test
Invoke-Command C1N1 -ScriptBlock $test
Invoke-Command C1N1 -ScriptBlock $test

The idea being… we should connect to the replicas in a round-robin predictable fashion if we use sqlcmd. And if we intersperse remote connections then those would probably also be predictable and round-robin right?

The situation is a little more simple than expected. SQL Server isn't keeping track of "what was the last replica you used, and what's the next one you will use" per computer. Instead it has one global pointer and every time any computer requests a connection it returns the next replica. So even though you have two computers requesting connections they are in sequence with each other (not in two separate sequences).

That's good. That means that:

  • If you only have one computer requesting connections, your code may be written in a way that it hits the same replica over and over.
  • But if you have multiple computers requesting connections, they will likely pick up different replicas, and then even if they get "stuck" on it you might consider that a kind of load balancing.

The overall situation is a little complex though and probably shouldn't be taken for granted. It's not magic and it's not really as straightforward as Microsoft makes it seem. Or at least you can't test it's working without knowing exactly what you're doing (or through trial and error like I did!)

Full Code