Recent, high-profile network outages have shown how the convergence of people, processes and systems, accelerated by the introduction of new technologies, has created the potential for systemic failures to proliferate across the telecommunications ecosystem, impacting multiple communications providers at the same time. Against this backdrop of increased risk and uncertainty, communications providers should ask: ‘Are our existing approaches to resilience sufficient to meet customer and stakeholder expectations for service delivery?’ Neil Bourke, Laura Schmuttermeier and Lucy Jones outline four areas that heads of resilience and executive team members may find helpful to consider within the context of their own communications providers’ approach to resilience.
Align risk management processes, external suppliers and other parties to help identify and manage emerging issues
Communications providers may outsource operational aspects of service delivery, relying on external facilities and procuring network services from other suppliers. However, they cannot outsource the risks associated with these. Communications providers need to understand – and, where possible, mitigate – the concentration risks from increased reliance on shared infrastructure services and common suppliers.
This may mean better aligning risk management systems, such as operational risk, quality management, cyber and business continuity, so that information about known risks can be shared quickly and easily. It could mean adopting a ‘business services’ view to understand what’s critical to customers and using this view to identify operational dependencies and concentration risks that need to be managed.
It could also mean working with external suppliers and other parties to ensure they have complementary approaches to resilience. These might include a shared understanding of delivery priorities and of the business activities that are critical for customers, joint testing and exercises, and the acceptance of alternative arrangements that may be used during a period of disruption.
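As a rough illustration of the ‘business services’ view described above, the sketch below maps each service to the suppliers and facilities it relies on, then flags any dependency shared by several services as a potential concentration risk. The service names, dependency names and the `threshold` parameter are hypothetical and chosen purely for illustration; a real mapping would draw on a communications provider’s own service and supplier inventories.

```python
from collections import defaultdict

# Hypothetical mapping of business services to the suppliers and
# facilities each one relies on (illustrative names only).
SERVICE_DEPENDENCIES = {
    "emergency calls": {"supplier_a_core", "exchange_north", "power_vendor_x"},
    "business broadband": {"supplier_a_core", "leased_line_b"},
    "mobile voice": {"supplier_a_core", "exchange_north", "tower_co_c"},
}


def concentration_risks(service_dependencies, threshold=2):
    """Return dependencies relied on by at least `threshold` services.

    The result maps each shared dependency to the sorted list of
    services that would be affected if that dependency failed.
    """
    services_by_dependency = defaultdict(set)
    for service, dependencies in service_dependencies.items():
        for dependency in dependencies:
            services_by_dependency[dependency].add(service)
    return {
        dependency: sorted(services)
        for dependency, services in services_by_dependency.items()
        if len(services) >= threshold
    }


if __name__ == "__main__":
    risks = concentration_risks(SERVICE_DEPENDENCIES)
    # List the most widely shared dependencies first.
    for dependency, services in sorted(
        risks.items(), key=lambda item: (-len(item[1]), item[0])
    ):
        print(f"{dependency}: shared by {', '.join(services)}")
```

In this made-up data set, a single core supplier underpins all three services, which is the kind of dependency a provider might prioritise for joint testing or alternative arrangements.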
Invest in diverse resilience solutions, but accept that disruptions will still happen
Although it’s preferable for communications providers to invest in measures designed to reduce the likelihood of a disruption occurring, it’s equally important to recognise that some disruptions will still happen. As such, it may be helpful for communications providers to build a layered approach to resilience, catering for disruptions of different types and severity.
This approach could include investing in redundancy where appropriate, diversifying technical solutions to reduce risks and technical ‘hotspots’, or enhancing processes for response, restoration and repair. It might also encompass developing resilience capabilities to address reasonable worst-case scenarios, such as systemic or common-mode failure, as well as customer redress actions.
Build resilience and crisis management capabilities (especially at Boardroom level)
Increased Government and regulatory focus on the telecommunications sector and other industries is placing more pressure on senior management and Boards to demonstrate commitment to – and accountability for – resilience. Board members may not be experts in operational resilience, but it’s important they have the knowledge to ask the right questions and make informed decisions at critical junctures.
Board-level involvement can be improved by:
*Asking the Board to review and approve the communications provider’s tolerance for disruption. For example, UK financial regulators have recently introduced the concept of ‘impact tolerance statements’ for severe but plausible events in a discussion paper on an approach to improve the operational resilience of firms and financial market infrastructures
*Asking the Board to participate in (or review the outcome of) reasonable worst-case scenario testing
*Ensuring Boards consider investment in operational resilience based on deficiencies in resilience arrangements identified through stress-testing. It’s worth noting that full testing of the network is nearly impossible. However, communications providers should use the ‘business services view’ to focus on testing highly critical components whose failure could have the most severe impact across the service
*Ensuring the Board is ‘crisis-ready’. This might include involving Board members in crisis management exercises to improve familiarity with the crisis management framework, exploring which decisions they may be consulted on or asked to approve, and preparing the chairman to deliver an external media response if required
Focus on operational enhancements to future-proof resilience
Customers may acquire services from separate communications providers believing these to be independent, but might not be aware that communications providers may share infrastructure or rely on the same third party to deliver the service. It can come as a surprise, then, if a leased line or shared infrastructure fails and the customer loses multiple services simultaneously.
To reduce single points of failure in the service and optimise the customer experience, communications providers need to focus on future-proofing resilience. This could include switching from reactive to proactive operational processes for managing faults and service degradation, for example by:
*Investing in technologies to improve diagnostic capabilities
*Sharing documented strategies, such as call-gapping and prioritisation techniques for managing network congestion, with other communications providers that have a legitimate interest
*Understanding response, restoration and repair times, while maintaining a focus on escalation procedures and timely response management
*Anticipating how critical services such as 999 and key customers will be prioritised during congestion periods
There are multiple sector-wide fora and protocols such as NEAT, TIDIE and ResilienceDirect that provide opportunities for communications providers to exchange information proactively about the resilience of the wider network. Where possible, communications providers might also consider participating in sector-wide exercises to identify hidden assumptions and network ‘pinch-points’ and also improve joint response capabilities.
Learning from past mistakes
Communications providers should treat past disruptions as an opportunity to enhance operational arrangements, inviting independent analysis where appropriate and routinely performing post-incident reviews to identify lessons learned and improvements.
The UK Government’s Electronic Communications Resilience and Response Group recently issued updated infrastructure resilience guidelines reflecting an increased interest in the resilience of the UK’s Critical National Infrastructure and the communications providers operating within an evolving and hyper-connected ecosystem.
The guidelines helpfully expand the discussion around what resilience is and distinguish it from traditional business continuity by advocating a more comprehensive framework for managing a broad range of disruptive risks. They also provide detail on how communications providers can build operational resilience within their own organisations and the wider sector, including helpful technical and operational guidance and the standards required to achieve this.
Neil Bourke (Director, Crisis and Resilience), Laura Schmuttermeier (Director) and Lucy Jones (Manager, Crisis and Resilience) are members of the Risk Advisory Practice at Deloitte
*This article has been reproduced with the kind permission of Deloitte