In SQL Server there's a default SQL Agent job called syspolicy_purge_history. There's a good description of it here and another one here but it's also fairly well known for having a bunch of problems.

I came across a new issue a few days ago having implemented better SQL Server login auditing and security. At the time this job was scheduled to run I would sometimes see failed logons on other instances on the same host.

To test it I set up a VM named W08R2S08 with three SQL Server 2008 instances.

Instance Agent Service Account
(Default) Agent1
SQL2008 Agent2
SQL2008R2 Agent3

Here's what the job looks like (it varies a little from version to version of SQL Server, and may also be modified to overcome failures when located on clustered installs).

if ('$(ESCAPE_SQUOTE(INST))' -eq 'MSSQLSERVER') {$a = '\DEFAULT'} ELSE {$a = ''};
(Get-Item SQLSERVER:\SQLPolicy\$(ESCAPE_NONE(SRVR))$a).EraseSystemHealthPhantomRecords()

I cycled all 3 error logs and then triggered the syspolicy_purge_history job on each of the instances and recorded where failed logins occurred.

:connect W08R2S08
Exec	dbo.sp_readerrorlog
Go

:connect W08R2S08\SQL2008
Exec	dbo.sp_readerrorlog
Go

:connect W08R2S08\SQL2008R2
Exec	dbo.sp_readerrorlog
Go

Here's what the results were:

Source Instance (Default) SQL2008 SQL2008R2
(Default) Login succeeded
SQL2008 Login failed Login succeeded
SQL2008R2 Login failed Login failed Login succeeded

I took a screenshot of the logs from the last experiment so you can see the failures that are generated as the agent tries to login to instances where it has no authorisation to.

This only happens the first time a PowerShell session accesses the SQLSERVER:\SQLPolicy folder for a host, even when providing the name of the instance, and the enumeration stops at the point that instance is reached. There's a Connect item with a great workaround by Kyle Neier which is to replace the job step with code that calls the SMO object directly.

The only thing I didn't like about it is that it doesn't properly reproduce the Application Name in the connection.

So I made a modification to their code to calculate this and then something to deploy it on affected instances. It works in the cases I've encountered.

Exec	msdb.dbo.sp_update_jobstep @job_name = 'syspolicy_purge_history', @step_name = 'Erase Phantom System Health Records.', @step_id = 3, 
		@command=N'$applicationName = "SQLPS ($env:[email protected]$env:COMPUTERNAME)"
$SQLServerConnection = New-Object System.Data.SqlClient.SqlConnection
$SQLServerConnection.ConnectionString = "Data Source=$(ESCAPE_NONE(SRVR));Initial Catalog=master;Integrated Security=SSPI;Application Name=$applicationName"
$PolicyStoreConnection = New-Object Microsoft.SqlServer.Management.Sdk.Sfc.SqlStoreConnection($SQLServerConnection)
$PolicyStore = New-Object Microsoft.SqlServer.Management.Dmf.PolicyStore ($PolicyStoreConnection)
$PolicyStore.EraseSystemHealthPhantomRecords()
'
Go

Here's how to roll it back (by removing and recreating it as per the Microsoft article). Shout out to Change & Release!

Declare @jobId uniqueidentifier

-- Obtain the current job identifier that is associated with the PurgeHistory
Select	@jobId = Cast(current_value As Uniqueidentifier)
From	msdb.dbo.syspolicy_configuration_internal
Where	name = N'PurgeHistoryJobGuid'

-- Delete the job identifier association in the syspolicy configuration
Delete
From	msdb.dbo.syspolicy_configuration_internal
Where	name = N'PurgeHistoryJobGuid'

-- Delete the offending job
Exec	msdb.dbo.sp_delete_job @job_id = @jobId

-- Re-create the job and its association in the syspolicy configuration table
Exec	msdb.dbo.sp_syspolicy_create_purge_job