Recently I migrated from my own runspace module to Boe Prox's PoshRSJob, which is pretty much perfect. Today I want to share how to integrate PoshRSJob cleanly into your functions through a default -Parallel parameter, using a template.

You can very easily modify this for your own purposes, but it's even more awesome as-is if you run parallelised tests keyed on one major input (like a computer name) where additional information might also be passed in through object properties on the pipeline (I'll explain why you'd want to do that later in the post). Here's what it looks like:

<#

.SYNOPSIS

.DESCRIPTION

.PARAMETER

.INPUTS

.OUTPUTS

.NOTES

.EXAMPLE

#>

function Test-Something {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [Alias("ComputerName")]
        [string] $InputObject,

        [switch] $Parallel = $true,
        [int] $Throttle = $env:NUMBER_OF_PROCESSORS
    )

    begin {
        $batch = [System.Guid]::NewGuid().Guid
    }

    process {
        if (!$Parallel) {
            #region Single run
            Start-Sleep -Seconds 1
            "Did work on $InputObject"
            #endregion
        } else {
            #region Parallel run
            Write-Verbose "Submitting job for $InputObject"
            $jobArguments = @{
                Throttle = $Throttle
                Batch = $batch
                FunctionsToLoad = $PSCmdlet.MyInvocation.MyCommand.Name
                ScriptBlock = [scriptblock]::Create("`$_ | $($PSCmdlet.MyInvocation.MyCommand.Name) -Parallel:`$false")
            }
            @(if ($_ -and $_ -isnot [string]) { $_ } else { $InputObject }) | Start-RSJob @jobArguments | Out-Null
            #endregion
        }
    }
    
    end {
        #region Wait for results and return them
        if ($Parallel) {
            Get-RSJob -Batch $batch | Wait-RSJob | Out-Null
            Get-RSJob -Batch $batch | Receive-RSJob
            Get-RSJob -Batch $batch | Remove-RSJob
        }
        #endregion
    }
}

It looks fairly straightforward, but there's a lot hidden in here: every line of code has a very specific and not-so-obvious reason for being there. Let's step through it together.

function Test-Something {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [Alias("ComputerName")]
        [string] $InputObject,

There are a limited number of ways to pass an input argument here.

  • You can pipeline an object to the function, but only if it is a string or an object with a string property named ComputerName. If you pass in an array of strings, it will act as though you'd called the function once per string. Technically it will also accept an object with a string property named InputObject, but that's just a convenient generic name for the template - not something for normal use.

    "1" | Test-Something
    "1", "2" | Test-Something
    [PSCustomObject] @{ ComputerName = "1"; Owner = "Cody"; } | Test-Something
    

    You cannot however do this.

    @{ ComputerName = "1"; Owner = "Cody"; } | Test-Something
    

    The hashtable would be implicitly converted into a string and result in "Did work on System.Collections.Hashtable" being returned.

  • Or you can call the function directly with a string. You cannot pass in an array of strings.

    Test-Something "1"
    Test-Something ([PSCustomObject] @{ ComputerName = "1"; Owner = "Cody"; }) # No! It will become a string of "@{ComputerName=1; Owner=Cody}".
    Test-Something "1", "2" # No! Cannot convert value to type System.String.
    

    What some people do is allow an array as input (or simply not declare any type for InputObject, which means arrays are allowed) and then run a foreach loop in their process {} block to enumerate the input when it is an array (sketched below). I find this really messes up functions where I just want a string or my object and nothing in-between. It's best to keep things simple.
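
For reference, that pattern looks something like this. It's a sketch of the approach I avoid (Test-SomethingLoose is just an illustrative name), not part of the template:

function Test-SomethingLoose {
    [CmdletBinding()]
    param (
        # No type declared, so a string, an object, or a whole array can bind here
        [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
        $InputObject
    )
    process {
        # foreach enumerates an array and treats a single item as a one-element loop
        foreach ($item in $InputObject) {
            "Did work on $item"
        }
    }
}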

Generally the process block works like this: if an object is piped in, it's available in the $_ variable, and $InputObject is bound to just the ComputerName property of that object. If a string is piped in, $_ is simply that string; and if the function is called directly, $_ is $null. In both of those cases $InputObject holds the string. This information will be important shortly.
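
You can see these bindings for yourself with a stripped-down version of the parameter block (Show-Binding is just an illustrative name):

function Show-Binding {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [Alias("ComputerName")]
        [string] $InputObject
    )
    process {
        if ($_ -and $_ -isnot [string]) {
            "`$_ holds a $($_.GetType().Name); `$InputObject = '$InputObject'"
        } else {
            "`$_ is `$null or the string itself; `$InputObject = '$InputObject'"
        }
    }
}

"1" | Show-Binding                                      # $_ is the string itself
[PSCustomObject] @{ ComputerName = "1" } | Show-Binding # $_ is the whole object
Show-Binding "1"                                        # $_ is $null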

You might also be wondering why I care about the rest of an object if I'm willing to accept only a string.

A great example of this is a complex computer object describing a server; it might have multiple properties like a name, an owner, an operating system, a version, a rack number, and so on. By accepting both a computer name and a computer object:

  • I can run a quick test on a computer by name.
    • When I output debugging/test/failure information I can just use the computer name.
    • Or I can include more information but may have to look it up in a database or elsewhere.
  • Or I can run a test on a computer based on an object with all of its details.
    • When I output debugging/test/failure information I can include all of this in the messages.
    • I don't need to look it up again; I already have it.

It's the difference between, "Test failed on computer Blah", and, "Test failed on computer Blah, it sits on Rack 192D and you should call Hulk Hogan to give it a kick."
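
In practice that means a call like this can produce the richer message with no extra lookups (the property names here are just examples):

$server = [PSCustomObject] @{
    ComputerName    = "Blah"
    Owner           = "Hulk Hogan"
    OperatingSystem = "Windows Server 2016"
    Rack            = "192D"
}

# ComputerName binds to $InputObject while the whole object remains available in $_
$server | Test-Something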

        [switch] $Parallel = $true,
        [int] $Throttle = $env:NUMBER_OF_PROCESSORS
    )

You should generally optimise functions for their most common purpose. In this case we want to parallelise everything by default, so that users don't need to specify the -Parallel flag every time, and we want a job limit based on the number of processors on the computer. That's a sane default, but for functions which mostly wait on network delays (such as firing a query and waiting a long time for a response) you can often double it.
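
For example, a network-bound test might be called with double the default throttle; note the cast, because the environment variable is a string (servers.txt is just a stand-in for your list of computer names):

Get-Content .\servers.txt | Test-Something -Throttle ([int] $env:NUMBER_OF_PROCESSORS * 2)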

    begin {
        $batch = [System.Guid]::NewGuid().Guid
    }

PoshRSJob will spin up a new runspace for each batch number. We generate a GUID here so that everything from this overall invocation of the function goes into one batch; that helps with performance and with housekeeping afterwards, while keeping the amount of code down.

It does mean, however, that you should pipeline everything into your function rather than calling it directly in a foreach loop; otherwise things will perform really badly.

PoshRSJob itself has the same limitation and gets around it with the Batch parameter, which you'll see later. In our case we can follow a simple rule: call the function as a pipeline when we want performance, and directly when we want debugging (which I'll show you later in the post).
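
For example, assuming $computers holds your list of names:

# Good: one invocation, one batch; the end block waits once for all of the jobs
$computers | Test-Something

# Bad: every call is a separate invocation with its own batch, and each end
# block waits for its single job to finish before the next call even starts
foreach ($computer in $computers) {
    Test-Something $computer
}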

    process {
        if (!$Parallel) {
            #region Single run
            Start-Sleep -Seconds 1
            "Did work on $InputObject"
            #endregion
        } else {

Dummy code with dummy logic: your actual code would go in the middle; I've put a sleep here just for demonstration. Note that you can't Write-Host within a PoshRSJob, but you can Write-Verbose, though it's a little harder to access: you'd need to add -Verbose to the Receive-RSJob command later, and the stream gets polluted with PoshRSJob module verbose output (which will thankfully be removed in a future version of PoshRSJob).
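
If you do want that verbose output, the receive line in the end block would pick it up like this:

Get-RSJob -Batch $batch | Receive-RSJob -Verbose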

        } else {
            #region Parallel run
            Write-Verbose "Submitting job for $InputObject"
            $jobArguments = @{
                Throttle = $Throttle
                Batch = $batch
                FunctionsToLoad = $PSCmdlet.MyInvocation.MyCommand.Name
                ScriptBlock = [scriptblock]::Create("`$_ | $($PSCmdlet.MyInvocation.MyCommand.Name) -Parallel:`$false")
            }
            @(if ($_ -and $_ -isnot [string]) { $_ } else { $InputObject }) | Start-RSJob @jobArguments | Out-Null
            #endregion
        }
    }

This is where the real magic happens. During a parallelised run, we spawn a PoshRSJob that calls us back again, but as a non-parallelised run. Some notes on this:

  • The reason for using $PSCmdlet is to avoid having to rewrite this block for every single procedure I use it in. Instead the function dynamically determines its own name and runs accordingly. This is the same reason I used a generic $InputObject for my function instead of $ComputerName.

  • I have however used -FunctionsToLoad here, whereas in normal cases this function would likely be part of a module and you would use -ModulesToImport instead. I highly recommend using -ModulesToImport directly rather than attempting to run Import-Module within your script block. I've found severe concurrency issues with the SqlServer module when attempting that, and they seem to be buried within PowerShell itself (such that they can affect other modules too). But using -ModulesToImport seems entirely safe.

  • The reason for using the @() array is that anything inside it is evaluated before being passed on in the pipeline. It makes sure that if I was piped a complex object, I pass that complex object on to the PoshRSJob; and if I was called with a string, I just pass on the string (see the expanded call below).
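
Expanded by hand for this Test-Something template, the submitted job is equivalent to this (shown for the plain-string case; a piped complex object would travel down the pipeline in place of $InputObject):

@($InputObject) | Start-RSJob -Throttle $Throttle -Batch $batch -FunctionsToLoad 'Test-Something' -ScriptBlock {
    $_ | Test-Something -Parallel:$false
}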

    end {
        #region Wait for results and return them
        if ($Parallel) {
            Get-RSJob -Batch $batch | Wait-RSJob | Out-Null
            Get-RSJob -Batch $batch | Receive-RSJob
            Get-RSJob -Batch $batch | Remove-RSJob
        }
        #endregion
    }
}

Once all pipeline input has been processed, we simply need to wait for the PoshRSJobs to finish and return their output.

There are a few caveats here, however. One is that if an error occurs in one of our jobs and we have $ErrorActionPreference = "Stop", we might not return all of the output (because Receive-RSJob will stop on the first error it reads). I have a GitHub issue open to change that, and I've also changed it in my own copy of PoshRSJob.

It's also for that reason that I now wrap all of my single run regions in code which captures errors and returns everything in a specific object format. I'll go into that in a later post. It's really awesome for use with Jenkins.
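
As a teaser, here's a minimal sketch of the idea (my illustration only, not the exact format from that later post), replacing the single run region:

#region Single run
try {
    Start-Sleep -Seconds 1
    [PSCustomObject] @{ Target = $InputObject; Result = "Success"; Error = $null }
} catch {
    [PSCustomObject] @{ Target = $InputObject; Result = "Failed"; Error = $_.Exception.Message }
}
#endregion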

If you want to debug your procedure, you can always do so by calling it directly with -Parallel:$false, bypassing all of the PoshRSJob logic. This is a real benefit compared to other functions which use runspaces in such a way that they can't be debugged at all.
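
For example, you can set a breakpoint on the function and step through a single input with no runspaces involved:

Set-PSBreakpoint -Command Test-Something | Out-Null
Test-Something "1" -Parallel:$false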

Here are some examples.

Measure-Command { $result = 1..10 | Test-Something -Parallel:$false } | Select Seconds
$result
Measure-Command { $result = 1..10 | Test-Something } | Select Seconds 
$result
Seconds
-------
10

Did work on 1
Did work on 2
Did work on 3
Did work on 4
Did work on 5
Did work on 6
Did work on 7
Did work on 8
Did work on 9
Did work on 10

Seconds
-------
2

Did work on 1
Did work on 2
Did work on 3
Did work on 4
Did work on 5
Did work on 6
Did work on 7
Did work on 8
Did work on 9
Did work on 10

That's a single test with a 5x speedup on a two-core machine. Now imagine running this on a multi-core server against 500 targets and with dozens of tests. That's my day-to-day usage of PoshRSJob, and now it can be yours too. Give it a try.