Note: This applies largely to ring buffers, not trace files.

The last few weeks I've been capturing application-level database timeouts and dipped my toe into Extended Events to do so. Knowing that the default ADO.NET query timeout is 30s I decided to capture queries taking >= 27s to run and that finished with an error.

Note: This was done on SQL Server 2012/2014, and I also read that the order of the Where clause is important to improve performance which is why I've put the result check first.

If Exists (
		Select	Top 1 0
		From	sys.dm_xe_sessions
		Where	name = 'Query_Timeouts')
Begin
		Alter Event Session [Query_Timeouts] On Server State = Stop
		Drop Event Session [Query_Timeouts] On Server
End
Go

CREATE EVENT SESSION [Query_Timeouts] ON SERVER 
ADD EVENT sqlserver.rpc_completed(
    ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_name,sqlserver.plan_handle,sqlserver.query_hash,sqlserver.query_plan_hash,sqlserver.server_principal_name,sqlserver.session_id)
    WHERE ([result]<>(0) AND [duration]>=(27000000))),
ADD EVENT sqlserver.sql_batch_completed(
    ACTION(sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_name,sqlserver.plan_handle,sqlserver.query_hash,sqlserver.query_plan_hash,sqlserver.server_principal_name,sqlserver.session_id)
    WHERE ([result]<>(0) AND [duration]>=(27000000))) 
ADD TARGET package0.ring_buffer(SET max_events_limit=(1000))
WITH (MAX_MEMORY=1024 KB,EVENT_RETENTION_MODE=ALLOW_MULTIPLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO

ALTER EVENT SESSION [Query_Timeouts] ON SERVER STATE = START
GO

I needed some way to test this would pick up the kind of timeouts I was looking for and so wrote a little PowerShell script which will populate a hundred or so in 5m.

for ($i = 0; $i -lt 10; $i++) {
    Start-Job {
        for ($i = 0; $i -lt 10; $i++) { 
            Invoke-SqlCmd -Query "Waitfor Delay '00:01:00'" -ServerInstance .\SQL2014 -QueryTimeout 30 -ErrorAction:Continue
        }
    }
}

while (Get-Job -State Running) {
    Start-Sleep -Seconds 1
}

Get-Job | Remove-Job

I don't pretend to have read the SQL Server XML books (who has the time…) but I can hack my way through and plenty of others have posted the code to shred an Extended Event. So I started shredding away and this is where things started going terribly wrong.

Note: I capture a little more in the event than I extract out just in case I need it later.

Select	Switchoffset(c.value('(@timestamp)[1]', 'datetimeoffset'), Datepart(Tzoffset, Sysdatetimeoffset())) As timestamp_adjusted,
		c.value('(@name)[1]', 'varchar(50)') As event_name,
		c.value('(@package)[1]', 'varchar(50)') As package_name,
		c.value('(action[@name = ''client_hostname'']/value)[1]', 'varchar(50)') As client_hostname,
		c.value('(action[@name = ''server_principal_name'']/value)[1]', 'varchar(50)') As server_principal_name,
		c.value('(action[@name = ''database_name'']/value)[1]', 'varchar(50)') As database_name,
		Coalesce(c.value('(data[@name = ''batch_text'']/value)[1]', 'varchar(max)'), c.value('(data[@name = ''statement'']/value)[1]', 'varchar(max)')) As query,
		c.value('(data[@name = ''result'']/text)[1]', 'varchar(50)') As result,
		c.value('(data[@name = ''duration'']/value)[1]', 'bigint') / 1000000.0 As duration_s
From	(
		Select	Cast(target_data As Xml) As target_data
		From	sys.dm_xe_sessions s
		Join	sys.dm_xe_session_targets st
		On		s.address = st.event_session_address
		Where	s.name = 'Query_Timeouts'
		And		st.target_name = 'ring_buffer'
		) a
Cross Apply 
		target_data.nodes('RingBufferTarget/event') As b(c)

Performance is abysmal and takes 6 minutes to run in my VM (on a real server with more cores it was still taking 2-3 minutes to run). Unacceptable!

After reading numerous complaints online about slow XML I started playing around with some of the suggestions:

  • Put the XML into a variable.
  • Shred the XML rows into a temp table.

I found that I only received performance benefits when I used both together. This is what I ended up using:

If Object_Id('tempdb..#Xml') Is Not Null
    Drop Table #Xml
 
Create Table #Xml (
    [Id] Int Identity(1, 1) Primary Key,
    [Event] Xml Not Null
)
 
Declare @Xml Xml
Select  @Xml = Cast(target_data As Xml)
From    sys.dm_xe_sessions s
Join    sys.dm_xe_session_targets st
On      s.address = st.event_session_address
Where   s.name = 'Query_Timeouts'
And     st.target_name = 'ring_buffer'
 
Insert  #Xml (Event)
Select  c.query('.') As Event
From    @Xml.nodes('RingBufferTarget/event') As b(c)
-- Option (Optimize For (@Xml = Null)) -- Allegedly useful for SQL 2008

Select  Switchoffset(Event.value('(event/@timestamp)[1]', 'datetimeoffset'), Datepart(Tzoffset, Sysdatetimeoffset())) As timestamp_adjusted,
        Event.value('(event/@name)[1]', 'nvarchar(50)') As event_name,
        Event.value('(event/action[@name = ''client_hostname'']/value)[1]', 'varchar(32)') As client_hostname,
        Event.value('(event/action[@name = ''server_principal_name'']/value)[1]', 'varchar(128)') As server_principal_name,
		Nullif(Event.value('(event/action[@name = ''plan_handle'']/value)[1]', 'varchar(88)'), Replicate('0', 88)) As plan_handle,
		query_hash, -- These need to be converted later
		query_plan_hash, -- These need to be converted later
        Event.value('(event/action[@name = ''database_name'']/value)[1]', 'nvarchar(128)') As database_name,
        Coalesce(Event.value('(event/data[@name = ''batch_text'']/value)[1]', 'nvarchar(max)'), Event.value('(event/data[@name = ''statement'']/value)[1]', 'nvarchar(max)')) As query,
        Event.value('(event/data[@name = ''result'']/text)[1]', 'nvarchar(50)') As result,
        Event.value('(event/data[@name = ''duration'']/value)[1]', 'bigint') / 1000000.0 As duration_s
From    #Xml
Outer Apply (
		Select	Nullif(Event.value('(event/action[@name = ''query_hash'']/value)[1]', 'decimal(20)'), 0) As query_hash,
		        Nullif(Event.value('(event/action[@name = ''query_plan_hash'']/value)[1]', 'decimal(20)'), 0) As query_plan_hash
		) a
Order By 1

That's an improvement of 6m down to 1s. I'd sure like to know how to improve performance of the first XML query without all the additional code though.