
May 17, 2018

My First Almost Corrupt Database

by kwagz91

Well, I thought the day had finally come. My first legit corrupt database.

One production system started firing a ton of alerts for databases suspending and disconnecting. So I log in to try and gather what's happening, and all my databases are missing…oh crap. Then I get an error: “Database msdb cannot be opened. It has been marked SUSPECT”…DOUBLE CRAP.

Well, the error says to check the SQL logs for information, so I’ll check out what’s going on. I get the error again. Hmm, ok. Well, I’ll run a CHECKDB and see if it will give me something. Still can’t interact with the db. Think, Kyle, think. I start to research how to fix a SUSPECT msdb and learn that there is a template version I can restore from. After a good hour or so of googling, I take a step back and think.
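
For anyone following along, this is roughly what I was poking at. Assuming the instance is still accepting connections at all, the state check at least tells you what SQL Server thinks of each database:

    -- What state does SQL Server think each database is in (ONLINE, SUSPECT, etc.)?
    SELECT name, state_desc
    FROM sys.databases;

    -- Consistency check on msdb; if it really is SUSPECT, this just errors out too
    DBCC CHECKDB (msdb) WITH NO_INFOMSGS, ALL_ERRORMSGS;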

Ok, so step 1 of any IT investigation I have ever done: what has changed since the issue started occurring? I had asked them to remove a USB drive attached to that Hyp a couple of days ago, and I got my tracking number today. Are my data files still all present? Did they remove the right drive?
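
Answering that first question is quick from SSMS, assuming master is still readable; something like this against sys.master_files lists every file the instance expects and where it expects it:

    -- Every data and log file the instance knows about, and whether SQL Server
    -- can still see it (state_desc) at the path it expects (physical_name)
    SELECT DB_NAME(database_id) AS database_name,
           name                 AS logical_name,
           physical_name,
           state_desc
    FROM sys.master_files
    ORDER BY database_id, file_id;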

Data files still existed. They removed the wrong USB but it didn’t really matter. Maybe the SAN had a hissy fit and we just need to reboot the server. *cue 8 year waiting period for this unhappy machine to reboot*.

Oh boy. The system came back up, but I can’t view 2 of the drives anymore. Both are on the same SAN. Now I’m getting really confused: what do I look at now? WINDOWS EVENT LOG! There I find a different error than I expected: “Operating System error 19 (Media is write protected) encountered”. Time to call the datacenter.
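
In hindsight, the same error was sitting in the SQL Server error log too. A quick search like the one below would have surfaced it without ever leaving SSMS; xp_readerrorlog is undocumented, so treat the parameters as a sketch rather than gospel:

    -- Search the current SQL Server error log for the write-protect error.
    -- Parameters: log number (0 = current), log type (1 = SQL Server), search string.
    EXEC sys.xp_readerrorlog 0, 1, N'write protected';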

So I recap everything I have checked to our contact there and ask him to give things a look and tell me what he’s seeing. He tells me I’m out of space, and then I tell him that I am not. I have 400 GB free between the 2 drives. It’s not great, but I am not out. What else do you think is wrong? Then I learned something VERY fun. They put a threshold in place to cap my SAN usage at 95%. So instead of running out of space when I expect to run out of space, I “run out of space” faster with no warning, since my disk reports aren’t configured to let me know that 400 GB free on the volume is actually 0 GB usable on the SAN.
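
For what it’s worth, SQL Server can see the real per-volume numbers. Here’s a sketch of the check I should have had in place, assuming sys.dm_os_volume_stats is available (SQL Server 2008 R2 SP1 and later); the 0.95 factor is my stand-in for the SAN’s 95% cap, not anything the DMV reports:

    -- Free space per volume as SQL Server sees it, plus what is actually usable
    -- once the SAN's 95% ceiling (my assumption; adjust to whatever your storage
    -- team enforces) is taken into account.
    SELECT DISTINCT
           vs.volume_mount_point,
           vs.total_bytes     / 1073741824.0 AS total_gb,
           vs.available_bytes / 1073741824.0 AS free_gb,
           (vs.total_bytes * 0.95 - (vs.total_bytes - vs.available_bytes))
               / 1073741824.0                AS usable_gb_under_cap
    FROM sys.master_files AS mf
    CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;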

Around and around we go trying to come up with solutions on how to get us functioning again. They were able to add a nice 10TB NAS device to hold some of our data. With a wave of the ole dbWonderKid wand, we moved some databases, combined those 2 drives into 1, and tadaaaa a functioning mirror once again existed.
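
The database moves themselves were the mundane part. A sketch of the basic pattern for a single, non-mirrored database, with hypothetical names and paths (the mirroring pieces take more ceremony than this):

    -- Point the database's files at the new location (names and paths are hypothetical)
    ALTER DATABASE SomeDb MODIFY FILE (NAME = SomeDb_Data, FILENAME = N'N:\Data\SomeDb.mdf');
    ALTER DATABASE SomeDb MODIFY FILE (NAME = SomeDb_Log,  FILENAME = N'N:\Logs\SomeDb.ldf');

    -- Take it offline, copy the .mdf/.ldf to the new drive at the OS level, then:
    ALTER DATABASE SomeDb SET OFFLINE;
    ALTER DATABASE SomeDb SET ONLINE;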

Lessons learned:

  • Not all SANs are created equal. Learn your limits and alert on them properly.
  • There are a few ways to fix a corrupt msdb (if it’s actually corrupt and you aren’t just out of space); the simplest is sketched after this list.
  • Inspect all the logs you have available to you when an issue is encountered. I lost so much time going down rabbit holes that didn’t matter. If I had jumped to the Windows logs immediately after I realized I couldn’t do anything in SSMS, I would have saved myself a good hour or so of spinning my tires.
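
For completeness, the simplest of those msdb fixes is just restoring it from a recent backup. A minimal sketch, assuming you actually have one, that SQL Server Agent is stopped so nothing is holding msdb open, and with a purely hypothetical backup path:

    -- Restore msdb from a known-good backup (path below is hypothetical).
    -- Stop SQL Server Agent first so msdb isn't in use.
    RESTORE DATABASE msdb
    FROM DISK = N'D:\Backups\msdb_full.bak'
    WITH REPLACE;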
