RISK / SECURITY

High and Dry: Insurers Search for Disaster Recovery Plans

Michael P. Voelker | June 04, 2014

It was strangely appropriate—albeit very annoying—that in the midst of working on an article dealing with disaster recovery, I suffered a random computer crash that wreaked havoc with my email database.

Fortunately, having learned my lesson from a similar crash years ago, I have been using a two-tiered recovery strategy: an external hard drive in my office for near real-time backup, along with a cloud-based recovery solution that backs up data overnight. When the local backup failed to restore, I went to the cloud and, although I still lost a half-day's worth of email, that was better than losing years' worth of valuable information.

It would be a stretch to compare my single-computer crisis to the restoration efforts insurers face after a disaster, but there are some parallels. Just as I learned from past recovery efforts, insurers have refined their business continuity strategies based on their own experience with catastrophes, from terrorism to natural disasters and everything in between. One of the chief lessons they have learned is that customers judge them by their ability to resume normal operations as quickly as possible.

“What a lot of insurers have discovered as a result of a major disruptive event like Hurricane Sandy, or even problems within their own data center that caused a disruption, is that the reputational impact [of not restoring quickly] accounts for a lot more than it did five or 10 years ago,” says John P. Morency, research vice president at Gartner. “It is less acceptable to customers for their insurer to be down when everyone else is up.”

The need to manage reputational risk, especially when word of a company's failures spreads virally through social media, has become a driving force behind insurers' business continuity and disaster recovery strategies. In the minds of customers, acceptable downtime has dropped from days to hours, and in some cases to minutes.

“The biggest shift for disaster recovery has been the increased speed by which companies need to bring operations back up,” says Karen Furtado, partner at Strategy Meets Action.

“There has been a lot more investment among insurers in software technology that can do near real-time failover—Site Recovery Manager [VMware], Microsoft Always On, and so on,” Morency says. “For many companies, the new target is ‘failover when you can, recover when you must.’ Having a strategy of reactively responding to a disaster isn’t cutting it any more for a lot of organizations.”

Own It

Hot sites continue to be the strategy of choice for bigger companies, explains Morency. “For most large insurance organizations—ones that have a billion or greater in revenue—there is much less of a focus on subscription-based recovery services,” he says. “Many of those larger organizations maintain at least a data center configuration at a secondary site, although most don’t maintain a fully redundant infrastructure.” 

ACUITY, which writes just over $1 billion in premiums, is one company that does maintain a redundant infrastructure. Before 2004, the mutual insurer relied on a reciprocal services agreement with another insurance company; that year, it purchased its own hot site about 30 minutes from ACUITY's headquarters in Sheboygan, Wis.

In late 2013, ACUITY began a project to expand the data center at its hot site, with completion expected for July of this year. As part of a two-tier disaster recovery strategy, the insurer also contracts with an availability services provider for mainframe capacity and redundant data backup. 

With dedicated power and networking capability, and generator capacity that can be operated independently at the time of a disaster, the data center at ACUITY’s hot site will house exact duplicates of servers maintained at the company’s corporate headquarters. This redundancy is a large expense—ACUITY declines to specify exactly how much—but the cost is worth it, according to company president and CEO Ben Salzmann.

“Our mission is helping people rebuild shattered lives, and in order to fulfill that mission we have to have the most seamless response possible if our own headquarters suffers a disaster,” Salzmann says. “Maintaining a dedicated hot site provides us the best business continuity capability.”

Owning its own facility also ties into ACUITY’s longstanding strategy of developing and maintaining all of the company’s core processing systems.

“We have a tightly integrated application infrastructure, so in order to get the level of functionality out of them we expect, we need identical equipment,” says Marcus Knuth, ACUITY’s vice president of enterprise technology. “We didn’t want to get set in a situation where we are running one version of a server here, and the one at our recovery site is four years older. Our experience has shown that if it’s not 100 percent identical equipment, you can run into issues.”

Data is replicated between the production and hot site environments within five minutes of real time. “In the event of a disaster, we can restore from our replicated backup and in a matter of hours be back up and running with near-current data,” Knuth says.
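Knuth's five-minute figure is, in effect, ACUITY's tolerance for lost data, which disaster recovery planners usually call the recovery point objective. As a minimal sketch, assuming a monitoring job can read the timestamp of the last write replicated to the hot site, the check below compares that lag against a five-minute target. The function names and the timestamp source are hypothetical illustrations, not a description of ACUITY's systems.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical target based on the article: the hot-site copy should trail
# production by no more than five minutes.
MAX_REPLICATION_LAG = timedelta(minutes=5)


def replication_lag(last_replicated_at: datetime, now: datetime) -> timedelta:
    """Return how far the recovery copy trails production."""
    return now - last_replicated_at


def within_target(last_replicated_at: datetime, now: datetime,
                  target: timedelta = MAX_REPLICATION_LAG) -> bool:
    """True if the most recent replicated write is no older than the target."""
    return replication_lag(last_replicated_at, now) <= target


if __name__ == "__main__":
    now = datetime(2014, 6, 4, 12, 0, tzinfo=timezone.utc)
    print(within_target(now - timedelta(minutes=3), now))  # True: 3-minute lag
    print(within_target(now - timedelta(minutes=7), now))  # False: 7-minute lag
```

In this sketch, a lag of exactly five minutes still counts as meeting the target; a stricter policy would change `<=` to `<`.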

Outside the hot site’s data center, ACUITY maintains about 50 workstations that can be booted remotely so that systems are fully operational when workers arrive at the site. “We have enough equipment there to be fully functional, providing the same level of service our customers expect from day one, and we have arrangements in place to provision additional resources we need shortly thereafter,” Knuth says.

ACUITY's hot site can accommodate more than 400 people. Combined with the company's existing mobile access for field staff and some home-based employees, that capacity means almost all of ACUITY's 1,000-member staff can be back online quickly after a disaster. The company also regularly tests both partial and full-scale disaster scenarios.

To simulate a real-life event as closely as possible, employees selected for a mock disaster drill are notified only the evening before the test, which prevents them from taking extra supplies from their regular workstations. They are then bused to the hot site to perform the day's work.

“We make sure that one-third of the people in each test group have not gone through a recovery exercise before so that, over time, our entire staff has gone through the process and is comfortable with it,” says Laura Conklin, ACUITY’s vice president of business consulting.

Testing has allowed ACUITY to fine-tune both its systems strategy and recovery processes. “We have really sharpened our focus on zero-hour—the minute after a disaster occurs. What do you do? Where do you go? How can we service our agents and policyholders until our systems do come back online?” says Conklin.

For insurers that maintain their own redundant site and systems, the decision is often based on seeking to achieve the greatest level of control.

“Most of the carriers maintaining a recovery site have decided they are better off trying to marshal their own resources and control as much of the process as they can,” says Don H. Donaldson, president of LA Group in Montgomery, Texas.

“In the event of a disaster, or even a system interruption, it’s important that we have an exact replica of our infrastructure configured the way we want it to be, rather than just available computing capacity, which is why we have continually decided against subscription-based services for our server-based systems,” says Knuth.

Comfort in the Cloud

Because Wisconsin’s natural disasters are typically limited in geographic range, ACUITY is comfortable that the location of its hot site is a sufficient distance from its headquarters. However, for insurers in states subject to wide-reaching disasters such as hurricanes or floods, there is an additional consideration.

"After Hurricane Sandy, companies that had recovery sites in the same geographic area couldn't access those sites," Furtado says. "Insurers were suddenly realizing the benefit of cloud-based disaster recovery resources, more than the brick-and-mortar approach."

“Our cloud-based disaster recovery allows us to spin up capabilities quickly,” says Brian Flynn, CIO of Crawford & Company, which uses SunGard Availability Services as the foundation for the technology component of its recovery plan. “Combined with server virtualization in our data center and continuous replication between our environment and SunGard, we can restore in a fraction of the time compared to traditional methods.”

Compared with an owned-location strategy, the cloud offers potential advantages in cost and in the flexibility to scale with business growth. However, there is a potential downside.

“The Achilles heel of the cloud or subscription-based model is less control,” Morency says. “Even if you can perform regular tests at the disaster recovery provider’s site, there is an inherent weakness in terms of potential inconsistency between the production and recovery systems. Those differences can get bigger every day that goes past the last recovery test.” 

Still, interest in the cloud as a disaster recovery strategy continues to grow, particularly among insurers that also use the cloud in their day-to-day operations. At Hawaii-based Island Insurance, a private cloud built on VMware virtualization has allowed the company to consolidate 75 physical servers into just six, all connected to Island's storage area network.

“In addition to the reduction in hardware footprint, there is a definite benefit to failover,” says Jeff Fabry, senior vice president, CIO, and CSO at Island Insurance. “We can lose three out of our six machines without any impact on users.”

Running a natively cloud-based infrastructure also ties into Island Insurance's decision to use cloud-based disaster recovery services. "From a cost and staffing perspective, we don't have to worry about maintaining our own data center and infrastructure," Fabry says. "There were no additional skill requirements we needed in order to use a cloud-based recovery service because we already had the experience from using our own private cloud."

For several years, Island Insurance used a California-based recovery provider, but the arrangement ultimately proved too costly. “The communications cost from here to the mainland was getting too high because of the amount of data we had to feed over. We also needed to purchase more bandwidth to deal with latency issues,” Fabry says.

In 2013, the company switched to Hawaii-based cloud provider PACXA. "We're not just backing up data, we're backing up our whole virtual infrastructure, so there are no configuration issues," Fabry says. "We don't have to worry about having our infrastructure set up the same with the cloud provider; we just have to turn it on."

Recovery tests have shown that the insurer can restore its operations within one to two hours. Keeping both production and recovery systems in Hawaii does present a risk, but it is one that Island Insurance has assessed and is comfortable assuming.

“It came down to a cost-benefit analysis of the worst-case scenario,” Fabry says. “In the event of a building fire or other site disaster, we are well protected due to our arrangement with PACXA and our ability to share space with sister companies [within parent group Island Holdings]. The concern occurs with a hurricane or tidal wave, but we felt the risk of something destroying both locations was remote versus the ongoing cost of having our recovery service located on the mainland.”

People and Process

Recovering from a disaster involves more than bringing systems back online.

“Dealing with a disaster is not just about equipment or technology. It’s the fact that people may be having crises of their own when disaster happens. Even if it’s a crisis isolated to our headquarters, recovery is also being aware that life will not be the same here for quite a while—people won’t be working in the same comfortable location they are used to at our headquarters,” Conklin says.

Part of Crawford's business continuity plan involves proactively reaching out to both employees and contract workers anywhere in the world to check on their well-being. In 2011, the company automated its emergency contact process with the cloud-based Send Word Now service, which pushes out notifications and instructions via text, email, and social media, monitors employee responses, and alerts Crawford to any employees who do not respond.

“Within minutes, we can complete a quick check of where all our people are and focus our staff resources on locating people who don’t respond,” Flynn says. 
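The check-in loop described above, in which a notice goes out, replies are collected, and silence after a deadline triggers follow-up, can be sketched in a few lines of Python. This is a hypothetical illustration only and does not reflect Send Word Now's actual interface: the `Notification` class, the person IDs, and the ten-minute deadline are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Notification:
    """Hypothetical record of one emergency check-in broadcast."""
    sent_at: datetime
    recipients: set[str]
    # Maps each responding recipient's ID to the time of their first reply.
    responses: dict[str, datetime] = field(default_factory=dict)

    def record_response(self, person_id: str, at: datetime) -> None:
        # Ignore replies from IDs that were not on the recipient list,
        # and keep only the first reply from each person.
        if person_id in self.recipients:
            self.responses.setdefault(person_id, at)

    def non_responders(self, now: datetime, deadline: timedelta) -> set[str]:
        """Return recipients who have not replied once the deadline has passed.

        Before the deadline, nobody is flagged.
        """
        if now - self.sent_at < deadline:
            return set()
        return self.recipients.difference(self.responses)


if __name__ == "__main__":
    start = datetime(2014, 6, 4, 9, 0)
    alert = Notification(sent_at=start, recipients={"a01", "a02", "a03"})
    alert.record_response("a01", start + timedelta(minutes=2))
    alert.record_response("a03", start + timedelta(minutes=4))
    # Fifteen minutes after sending, with a ten-minute deadline:
    print(alert.non_responders(start + timedelta(minutes=15), timedelta(minutes=10)))
    # prints {'a02'}
```

Waiting for a deadline before flagging anyone is a design choice in this sketch: it keeps staff from chasing people who simply have not had time to reply.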

Crawford has also applied the experience gained from using push notifications in its own disaster recovery efforts to the claims management services it provides. An automated resource allocation service is important for Crawford in managing a dispersed, constantly mobile group of employees and contract workers.

“Whenever there is a catastrophe, not only are we impacted, but our clients are as well,” Flynn says. Previously, Crawford relied on dozens of call center staff to contact employees and contractors and identify who was available to respond to regional disasters. In 2013, Crawford automated that process through a proprietary system called Cat Connection, based on Appian’s BPM platform.

With Cat Connection, Crawford pushes notifications to employees' and contractors' mobile devices. Their responses allow Crawford to determine who is available, pinpoint each person's GPS location, and decide whom to deploy to affected areas. Once assignments are made, Cat Connection provides a Facebook-style interface that lets staff share information and photos and collaborate in helping individuals and businesses recover from the catastrophes they face.

Tried and Tested

Perhaps the most valuable lesson Crawford has learned from dealing with catastrophe is that it is impossible to over-prepare for a disaster.

“Through our own claims management services, we have worked with many companies that have had crises, disasters, and system failures. One thing that sticks out is that no matter how well rounded a plan seems to be, sometimes it’s one little part that can set everything tumbling. You have to consistently test your plan to try to uncover those things,” Flynn says.

Preparation not only helps uncover flaws in planning, but also benefits the most important part of any disaster recovery plan: people.

“Awareness is key, because the last thing you want in a crisis is panic,” Conklin says. “The more that people are aware of our recovery plan and capabilities, and the more people we can pass through a test, the more comfortable they can feel in the event that they do have to react to a crisis.”
