I have very recently (24 hours ago, to be exact) been involved in recovering from a fairly difficult situation. Having come out the other side, I wanted to share my experience and design.
I cover our physical design and layout in another article, but briefly: we run two Data Centers connected via a leased line – not fast enough for live vMotion, but adequate for data transfer and Veeam Backup & Replication. On Site A I have a virtualised vCenter server running SQL 2008 – not necessarily best practice, but that is my ‘Platinum VM’ (it has to be protected). Alongside it I have a virtualised Veeam Backup & Replication server with its repository on my Nimble SAN (again, the details are covered in another article).
What protection is in place?
- The SQL server makes daily backup copies of its databases onto a local disk F:
- Veeam runs a VSS-aware backup job of vCenter to a Veeam Repository at Site A
- Veeam also runs a Replication Job which registers vCenter with the VMware Cluster at Site B
- In addition, Veeam runs a VSS-aware backup job of the virtualised B&R server to Site A
- And again, a Replication Job registers the B&R server with the VMware Cluster at Site B
- Finally, I run a daily export of my Veeam configuration to the Repository at Site B
Too many Eggs in one Basket?
Possibly, but the way I justify this is that most data recovery needs to be fast and is file based. Hence, having backup copies of my data at Site A, with the Veeam Repository sat on my production SAN, means that recovery is very fast and efficient. Site B already has replicated copies of my core servers; they just need a few checks and tweaks and they can be up and running as replacements.
What happened and How did Veeam Help?
During my Veeam 8 upgrade I encountered an error which not only corrupted the SQL database, but also made ‘fixing’ the in-place Veeam install quite difficult. As this in effect took my B&R server offline, I had no way to recover from any Veeam backup files. As my SQL server also hosts vCenter, I couldn’t just recover using my ‘emergency snapshot’. Unfortunately, the automated SQL backup job had failed, so I had no online recovery points. This is where Site B came in very handy.
I already had an ‘Isolated Network’ set up within the Cluster. By attaching the replicated vCenter to this network, powering it up and connecting a USB key (via my vCenter client), I was able to manually back up the Veeam SQL databases (effectively yesterday’s copies) and copy them to the USB key. These were then uploaded to the live SQL server and the database was recovered.
Fortunately, before the upgrade started I did take a snapshot of my B&R server (another reason for running it as a VM). This allowed me to roll back to Veeam v7 and connect to the recovered SQL databases – all good, I thought.
Upon checking the Veeam job definitions I discovered that some were missing, and that the last run time was over a week ago – something had clearly gone wrong. Not a problem, though, as I automatically export the Veeam config to Site B – simply browsing to this folder allowed me to recover my full list of Backup & Replication job definitions.
All done: back up and fully working. Maybe not the most elegant process, but one which allowed me to recover from a potentially disastrous situation. I just need to troubleshoot the Veeam 8 upgrade and try again.
I must also add that during this process I received wonderful support from Kevin Ridings (UK Enterprise SE, Veeam), who walked through this scenario with me and has subsequently escalated the upgrade issue within Veeam. I am hopeful that a resolution will appear very soon.
Lessons learned
- So glad I run virtualised vCenter and Veeam installations, and that I replicate these servers, NOT JUST back them up.
- I must find a way to monitor the SQL automated backup jobs to find out why they stopped running.
- Document this scenario so that others can read it and either provide feedback or review their own installations.
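On the monitoring point above: here is a minimal sketch, in Python, of the kind of freshness check I have in mind. The folder paths and the 26-hour threshold are hypothetical examples of my own, not part of any Veeam or SQL Server tooling; the idea is simply to flag any backup folder whose newest file is missing or stale, which would have caught both the failed SQL jobs and the week-old job runs.

```python
import time
from pathlib import Path


def newest_file_age_hours(folder):
    """Return the age in hours of the most recently modified file in folder,
    or None if the folder is missing or contains no files."""
    folder = Path(folder)
    if not folder.is_dir():
        return None
    files = [p for p in folder.iterdir() if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    return (time.time() - newest) / 3600


def check_backups(folders, max_age_hours=26):
    """Check each backup folder; return a warning string for any folder
    whose newest file is missing or older than max_age_hours."""
    warnings = []
    for folder in folders:
        age = newest_file_age_hours(folder)
        if age is None:
            warnings.append(f"{folder}: no backup files found")
        elif age > max_age_hours:
            warnings.append(f"{folder}: newest file is {age:.1f}h old")
    return warnings


if __name__ == "__main__":
    # Hypothetical paths: the SQL dump folder on F: and the Site B config export share
    for w in check_backups([r"F:\SQLBackups", r"\\siteb\veeam\ConfigExport"]):
        print("WARNING:", w)
```

Scheduled daily (e.g. via Task Scheduler), the warnings could be emailed or written to the event log rather than printed, so a silently failing backup job never again goes unnoticed for a week.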