A few quick notes on real world database corruption

I came across a System Center Operations Manager (SCOM) database recently running on SQL 2012 SP1, which was known for data corruption issues.

FIX: Data corruption occurs in clustered index when you run online index rebuild in SQL Server 2012 or SQL Server 2014 @ TechNet
Online index rebuild corruption bug in SQL Server 2012 SP1 @ SQLskills

I was using Ola Hallengren's maintenance solution locked to do index reorganization only, which shouldn't trigger the bug, but it turns out SCOM is constantly evaluating its indexes and running its own rebuilds throughout the day which triggered the fault.

Along the way I came up with some tips I hadn't seen mentioned elsewhere.

What you probably know is that your first task (aside from double-checking backups, the ones you should already be checking daily anyway), is to determine whether the corruption is in clustered indexes and heaps, or nonclustered indexes.

The problem is that Checkdb returns so much text (and it's all indexes) that you'll lose a lot of time trying to correlate what's going on with the sys.indexes catalog view. You can save a lot of time running Dbcc with the Noindex option. If it returns a clean bill of health then you know you're lucky and can "just rebuild the broken indexes".

Yeah… I'll come back to that…
If the database is of any decent size try to avoid using Dbcc with the Repair_Rebuild option. Just like BOL says they are truly last resorts only.
- You can't repair a database without setting it to Single_User mode (or maybe Emergency mode), which means you've started a downtime clock ticking. This process can't run in the normal Multi_User mode.
- It prints about 4 lines of detail for each row in each index as it fixes it. In a data warehouse like SCOM you can easily end up with hundreds of millions of lines of output for just one table. Older versions of SSMS used to crash with just a few million rows so this isn't very safe.
- The entire process is done in a single transaction for all tables together. If you're an hour in and cancel you can expect a lengthy roll back. You may also need additional log space.
- All in all it's extremely slow. It is nothing like just running a complete rebuild of all your indexes… it's far far worse.
But you also can't simply Alter Index Index_Name On dbo.Table_Name Rebuild when an index is corrupted, the engine doesn't let you, or at least it didn't let me! I was given to lots of nonsensical error messages about bulk copy and check error constraints which didn't seem to apply to the tables at all!
Instead I wrote a quick script to script out the nonclustered indexes and Alter Index ... Disable and then Alter Index ... Rebuild them. This worked fine. I also scripted out foreign keys and check constraints, so I could Alter Table ... With Check Check Constraint ... to make sure everything is reset to trusted.

All in all I was able to repair everything and get it running with minimal fuss but next time with these tips in hand I'll have it done much faster and with more uptime.

Share on: