Well, I thought the day had finally come. My first legit corrupt database.
One production system started firing a ton of alerts for databases suspending and disconnecting. So I log into try and gather whats happening, and all my databases are missing…oh crap. Then I get an error “Database msdb cannot be opened. It has been marked SUSPECT”…DOUBLE CRAP.
Well the error says to check the SQL Logs for information, so I’ll check out what’s going on. Get the error again. Hmm, ok. Well I’ll run a CHECKDB and see if it will give me something. Still can’t interact with the db. Think Kyle, think. I start to research how to fix a SUSPECT msdb and learn that there is template version that I can restore from. After a good hour or so of googling, I take a step back and think.
Ok, so step 1 to any IT investigation I have ever done: What has changed since the issue started occurring? I asked them to remove a USB drive attached to that Hyp a couple days ago and it I got my tracking number today. Are my data files still all present? Did they remove the right drive?
Data files still existed. They removed the wrong USB but it didn’t really matter. Maybe the SAN had a hissy fit and we just need to reboot the server. *cue 8 year waiting period for this unhappy machine to reboot*.
Oh boy. System came back up, but I can’t view 2 of the drives anymore. Both are on the same SAN. Now I’m getting really confused, what do I look at now? WINDOWS EVENT LOG! There I find a different error than I expected, “Operating System error 19(Media is write protected) encountered”. Time to call the datacenter.
So I recap everything I have checked to our contact there and asked him to give things a look and tell me what he’s seeing. He tells me I’m out of space, and then I tell him that I am not. I have 400 GB between the 2 drives. It’s not great, but I am not out. What else do you think is wrong? Then I learned something VERY fun. They put a threshold in to cap my SAN usage at 95%. So instead of running out of space when I expect to run out of space, I “run out of space” faster with no warning since my disk reports aren’t configured to let me know 400GB is actually 0GB.
Around and around we go trying to come up with solutions on how to get us functioning again. They were able to add a nice 10TB NAS device to hold some of our data. With a wave of the ole dbWonderKid wand, we moved some databases, combined those 2 drives into 1, and tadaaaa a functioning mirror once again existed.
- Not all SANs are created equal. Learn your limits and alert on them properly.
- There are a few ways to fix a corrupt msdb (if it’s actually corrupt and you aren’t just out of space).
- BIG shout-out to John Grover @ MSSQLTips for the lesson if that day ever comes.
- Inspect all the logs you have available to you when an issue is encountered. I lost so much time going down rabbit holes that didn’t matter. If I would’ve jumped to Windows Logs immediately after I realized I couldn’t do anything in SSMS, I would have saved myself a good hour or so of spinning my tires.
**QUICK BRAG ALERT**
When I first started at my current company a couple years ago, I had a Junior title in front of DBA. I was given a lot of manual processes from the previous DBAs that had not yet been automated. One of the most painful tasks was always copying data from multiple production environment to a lab environment, because there were a handful of databases that needed to be moved for the application to run, required a lot of painful tracking and communication, and all was done one at a time.
This led me to my first piece of PowerShell I ever wrote. After weeks and weeks of googling between all the other normal chaos, I had something that ran and automated the majority of the work. All that was needed for me was a ticket requesting the data to be moved and I could plug some values in and hit a button. I called it the MagicDBButton, because the fact that it worked was pure magic to me.
Even this became cumbersome after a while because when development and customer support realized how much less painful it was to get the data they needed, requests came in even quicker. THERE HAS TO BE A BETTER WAY he says rocking back and forth in his cube under the desk.
MagicDBButton v2 has arrived. With the help of a member of the development team making an internal site for me, I was able to create some stored procedures, jobs, and refactor the way data gets entered into the script variables, which allows the system to automatically look for new requests every 15 minutes, email the user when the request starts and finishes, and provide errors it encounters in the email (until I find ways to automatically handle it differently). I am completely hands free for the foreseeable future. I can’t wait to hand it out to them tomorrow.
Quick brag complete. I’ll pull pieces of the code into my GitHub soon!