Disaster Recovery Testing - How to Run Faster

Are you really testing Disaster Recovery…or just checking a box? It’s time to have your DR Test Run Faster.

In the busy and chaotic IT world, finding the time to push and challenge your team beyond what is required takes energy and focus. It’s easy to just check the box and get a DR Test cycle completed using the same template over and over again. What the teams can learn from changing the variables of the test is immensely helpful to your organization. IT leaders should consider how to expand the scope of the testing in a meaningful way.

As we were building a DR program at a Fortune 300 company, we took the opportunity to follow the normal crawl, walk, run progression and then add a run faster phase. We designed each phase to test our teams, processes, and capabilities in ways that taught us something. We actually wanted parts of the test to fail so we could see where we needed to improve. The phases took on distinct characteristics.

  • Crawl: Test the recovery processes and procedures on the most critical application and a subset of supporting applications. Test one recovery from a secondary means. In our case we recovered from tape, because our primary means was mirroring at the DR site.
  • Walk: Expand the testing to all Priority 1 environments and a subset of Priority 2, and then recover from tape.
  • Run: Full data center recovery and test one large application recovery from tape.
  • Run Faster: Using the applications from the Walk phase, we picked a three-month window, randomly drew two dates from a hat, and then picked one of them. While we notified the team of the three-month window, they did not know the exact date until we called the DR scenario active at 8 a.m. on the day of the test. What we learned from this experiment was immensely valuable to our team.
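
If you would rather not use a literal hat, the date draw in the Run Faster phase can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling we used: the function name draw_test_date, the example window dates, and the choice to allow any calendar day in the window are all assumptions made for the example.

```python
import random
from datetime import date, timedelta


def draw_test_date(window_start, window_end, rng=None):
    """Draw two distinct candidate dates from the window, then pick one.

    Hypothetical helper: mirrors "draw two dates from a hat, then pick one".
    Any calendar day in the window is eligible (an assumption).
    """
    rng = rng or random.Random()
    # Number of days in the window, counting both endpoints.
    total_days = (window_end - window_start).days + 1
    if total_days < 2:
        raise ValueError("window must contain at least two days")
    # Step 1: draw two different day offsets from the "hat".
    offsets = rng.sample(range(total_days), 2)
    candidates = [window_start + timedelta(days=o) for o in offsets]
    # Step 2: pick one of the two candidates as the actual test date.
    chosen = rng.choice(candidates)
    return candidates, chosen


if __name__ == "__main__":
    # Example three-month window (dates are illustrative only).
    candidates, chosen = draw_test_date(date(2024, 4, 1), date(2024, 6, 30))
    print("Candidates:", [d.isoformat() for d in candidates])
    print("Test date (keep this to yourself):", chosen.isoformat())
```

Whoever runs the draw should be the only person who sees the output. The team still knows the window, just not the day.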

What were some of the key takeaways from the Run Faster approach?

  • Resources go to conferences and take vacation. Who fills the gap? This scenario was ripe for learning.
  • Because the teams could not prepare for a known date within the three-month window, they had to stay focused on proper change controls the entire time.
  • We created a pressure-based scenario and were able to gauge how the team responded. In this case, they grew more cohesive and collaborative than in other tests.
  • Failure was inevitable. Testing in this way identified areas where we could improve. While this resulted in a few small remediations, the test achieved its goal of finding weaknesses in our processes and technology.

As you begin to mature your DR program, truly consider how you are testing. Create scenarios where the teams are pushed a little bit, processes and controls are validated, and your teams are kept on their toes. Your team will learn a lot and will likely have a bit of fun in the process.

Don’t just check a box. Test your program.