Analysis: Best prepared for failure

15 July 2011 | Kavit Majithia


As athletes train hard and fans scramble for tickets, London as a whole is preparing to welcome the world to the 2012 Olympic Games. AT&T, one of the largest global telecoms operators, is preparing for the event in a very different way.

The company held its first Network Disaster Recovery (NDR) exercise in the UK’s capital in mid-July 2011 to test how its network responds to disruptions caused by a range of factors, including physical, security, economic, political and social risks – all heightened by the Games arriving in London. AT&T holds such exercises at least four times a year around the world, underlining its commitment to ensuring customers face as little downtime as possible when the network is at risk.

“The exercise works by surrounding the network with proactive tooling to help us detect and resolve a lot of the problems anticipated ahead of time,” said Justin Williams, network disaster recovery, international, AT&T Network Operations. “This exercise covers natural hazards, technical hazards and human hazards which can range from a building collapsing, issues raised by latent software bugs and human intervention affecting fibre-optic cables. As with any big event, London 2012 will be an important focus for us.”

Covering 26.9 petabytes of data on a network spanning 162 PoP sites in 155 cities across more than 55 countries, AT&T has invested over $600 million in its NDR programme – a level of commitment not replicated by any other global telecoms operator. A comparable initiative, perhaps, comes from Télécoms Sans Frontières (TSF), an NGO that assists aid agencies by providing broadband internet access, voice communications and IT support in regions affected by war or natural disaster, in effect striving to “connect the stranded”. TSF has worked with AT&T since 2002 on numerous initiatives and was presented with a cheque for $150,000 by the telco at the exercise, to continue its work in over 60 countries and its support of over 600 humanitarian organisations.

“It’s a group we have a great deal of affinity with,” says Williams. “They focus on causes outside the US and through our financial support we can help them expand their programme and continue to save lives.”

AT&T’s disaster recovery team consists of 20 to 30 highly skilled engineers who work and travel continuously to reach areas where its network could face downtime. The group works directly with the company’s Global Network Operating Centre (GNOC) in New Jersey, which can anticipate a problem over 48 hours ahead of time, allowing the NDR team to act according to the level of threat.

“Our response is always the same,” said Kelly Morrison, senior network support at AT&T. “It doesn’t matter if we are faced with a natural disaster or if the problem is caused by man. It is the planning element that consumes the most time.”

As businesses become more cyber-reliant, the industry’s collective commitment to business continuity is becoming more apparent. AT&T’s business continuity survey of 100 IT executives found that 91% of businesses in the UK have a business continuity plan in place and 75% are concerned that the increasing use of devices could have a direct impact on security threats. Adding to those threats is growing investment in cloud computing, which 37% of companies said is part of their corporate infrastructure. In addition, over 70% of executives are concerned about the increasing use of social networking and its potential for security breaches.

While many of the risks outlined in the survey are fairly new, the threat of hurricanes, tornadoes, earthquakes, floods and wildfires has always needed a response. AT&T’s most recent deployment came in Chile in March 2011, after its GNOC indicated that the network was likely to be affected by a 5.3 magnitude earthquake that hit Santiago on March 16.

“When our GNOC saw the likely impact on the Friday evening we immediately put our plans in place,” reflects Williams. “We recreated the entire Santiago PoP in a car park and deployed our assets to connect the network and ready it for service. The funny thing was we didn’t need our equipment because the impact was limited, but it is a clear testament. We sent 35 tonnes of equipment to Chile to support a possible stricken PoP site, and left it there for eight weeks until geologists confirmed the threat had passed.”

“It is the people that are key,” adds Mark Francis, vice president, GNOC and NDR. “You need people with the right mental state to get a network back online.”