It’s the stupid software, stupid


If you’re running a network on software, you need to work in new ways. Alan Burkitt-Gray reports on the expired Ericsson certificate that brought O2’s and SoftBank’s mobile networks down.

The Ericsson software glitch that brought down networks in December 2018 showed just how much the industry has to change the thinking it carried over from the hardware era.

Telefónica’s O2 UK mobile network and SoftBank’s fixed and mobile networks in Japan were out of action for hours because of what Ericsson admitted was “an expired certificate in the software versions installed with these customers”. Ericsson told SoftBank that networks in 11 countries were affected. It wasn’t even old software: it had been installed just nine months earlier.

In the UK the outage hit 25 million O2 customers and millions more on virtual networks that run on the Telefónica infrastructure. But, noted William Webb, a consultant who used to run R&D at the UK regulator, Ofcom, it also affected internet of things (IoT) services – London’s bus and cycle-hire networks as well as electric car charging and smart metering.

“I suspect we’ll see this happen in future, possibly with more frequency,” said Webb. “Operators are clearly going to worry a lot about it – the reputational damage and the financial damage. With more IoT networks this would become more and more severe, and it could be life-threatening.”

What should operators do? The simple answer, he said, is: “Don’t let your certificates expire.” But in the age of virtualisation – or softwarisation – networks “are inherently complicated software machines with old stuff and new stuff, and with 2G, 3G and 4G networks operating in parallel from different vendors”.

Dan Pitt of MEF, the former Metro Ethernet Forum, said: “As you start to deliver software instead of hardware there’s a different licensing model. If the licence is out of date, the software stops working by design.”

Sue Rudd, director of service provider analysis at Strategy Analytics, groaned: “This is the kind of thing we’re bedevilled with.” The industry is moving to software, and in doing so it is adopting the habits of the enterprise IT business. That means nobody can afford to ignore alerts that licences are about to expire.

“This is not just an Ericsson problem, it’s an industry problem. Someone should have seen the warning message. Was it ignored? Even if it went to network operations, they probably ignored it too. It’s not OK to do just what the IT guys do. With telecoms, you must always provide users with service.”

Roberto Kompany, who leads the next-generation wireless programme at market research company Analysys Mason, said: “The ecosystem is changing: it’s becoming more softwarised, and more complex because there are more vendors.”

Pitt agreed: “In a disaggregated world, the hardware and multiple software programs might be procured from different sources, leading to interaction problems.”

Even with industry standards, “each vendor can find pieces to tweak”, noted Kompany.

The TM Forum is often regarded as the organisation that encourages the telecoms software industry to work together. Its chief architect, David Milham, warned that this is getting harder. “As we move to a software-based environment the velocity of change is speeding up.” Hardware changes are relatively infrequent, “every six months or so”, but “with a virtual environment, you could in principle change the software daily”.

Do operators need different staff in the software era? Pitt at MEF gave telcos “three main choices in evolving to the software-based world: hire new staff with the new skills; retrain current staff to acquire new skills; or outsource those parts of the business that require new skills to organisations that specialise in them”.

PCCW Global took the middle way, retraining its existing staff as it moved to a software approach, though it also brought in new skills through its acquisition of Console Connect.

CTO Paul Gampe said: “We were on-net in 100 days, an amazing achievement. But we didn’t change the people – there is immense value in institutional knowledge. It’s not the people, but how you develop large software projects.”

Gampe added: “Post-acquisition there has been substantial investment in resources and education in how to re-educate the existing staff. Arm them with tools to ensure they can do the job. You can’t dump a bunch of software people into a network. Networks are a living thing.”

But most of the people I spoke to agreed that the main lesson of the Ericsson software outage is clear. “You have to think how to handle error messages that are about a network outage. Make sure the operator sees the alert at high level,” said Rudd.

Pitt agreed: “If you license software you have a responsibility to make sure it works.”

Telcos have to agree with their software suppliers on what happens when a licence expires, and the two sides need to trust each other, said Milham. “Certificates and licences shouldn’t be unknowns. You need to build a software asset register.”