How to: Designing a mission-critical network

Air traffic control networks are one of many mission-critical networks that need special considerations (Archives New Zealand, Flickr)

Ask 10 network engineers to design a network and you'll most likely get 10 different designs, each as intricate as its designer's experience. However, ask them to design a mission-critical network and many will probably be left scratching their heads and searching Google for examples.

What is a mission-critical network?

A mission-critical network that many in the ICT industry will be familiar with is that of an Internet Service Provider (ISP). ISPs provide a critical service which, if interrupted, can have a significant impact on the survival and operation of businesses and society.

Other examples include power and water system networks, banking system networks, and aviation system networks, the last of which I work on and intend to shed light on in this post. In doing so, I hope to show how the same principles and logic behind aviation network design can be applied to other mission-critical systems.

Designing an aviation network 101

The first questions to ask, before drawing a network diagram or choosing a network equipment vendor and technology, are: who are the key users of the network, and how should we design it to enable future growth?

In answering these questions, ISPs will consider industry business demographics and growth as well as Internet traffic forecasts, which will dictate investment in additional backhaul bandwidth.
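The sketch below shows how a traffic forecast can turn into an upgrade date. It assumes constant compound growth and a hypothetical 70% peak-utilization planning threshold. The function name and the figures are illustrative and are not taken from any ISP's practice.

```python
import math

def years_until_upgrade(current_mbps: float, capacity_mbps: float,
                        annual_growth: float, threshold: float = 0.7) -> int:
    """Whole years until forecast peak traffic exceeds `threshold` of capacity.

    Assumes traffic grows at a constant compound rate, which is a simplification.
    """
    if current_mbps <= 0 or annual_growth <= 0:
        raise ValueError("need positive current traffic and growth rate")
    if current_mbps > threshold * capacity_mbps:
        return 0  # already over the planning threshold: upgrade now
    # Smallest integer n with current * (1 + g)**n > threshold * capacity.
    ratio = threshold * capacity_mbps / current_mbps
    return math.floor(math.log(ratio) / math.log(1 + annual_growth)) + 1

# Hypothetical backhaul: 2 Gbps peak on a 10 Gbps link, growing 30% per year.
print(years_until_upgrade(2_000, 10_000, 0.30))  # prints 5
```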

As you might expect, aviation networks have a different set of priorities and influencing factors. These networks interconnect all the aviation systems, including radar surveillance systems, which require very little bandwidth. However, low bandwidth doesn't mean low importance: the network must comply with the same high-availability rules as telecommunications networks, delivering radar and voice traffic on time and with as little packet loss as possible.

It is worth noting that high availability is not just a network concept; it is a design principle that spans entire systems, of which the network is just one part.

Take services (applications), for example. Every aviation system can be broken down into services, and these services working together make up the system. To protect a service from interruption or failure, a standby service is required, ideally a hot standby that is already running and ready to take over. Underneath each service are physical servers, which require active/standby redundancy as well.
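The active/hot-standby idea can be sketched as a heartbeat check. This is a simplified illustration, and several parts of it are assumptions: the `ServiceInstance` type, the 3-second timeout and the service names are all hypothetical. The hot standby is assumed to be already running and in sync, so it can take over at once.

```python
from dataclasses import dataclass

@dataclass
class ServiceInstance:
    name: str
    last_heartbeat: float  # seconds on a monotonic clock

def select_active(primary: ServiceInstance, standby: ServiceInstance,
                  now: float, timeout: float = 3.0) -> ServiceInstance:
    """Return the instance that should carry traffic.

    The primary stays active while its heartbeat is fresh; otherwise the hot
    standby, which is assumed to be running and in sync, takes over.
    """
    if now - primary.last_heartbeat <= timeout:
        return primary
    if now - standby.last_heartbeat <= timeout:
        return standby
    raise RuntimeError("both primary and standby are unresponsive")

primary = ServiceInstance("radar-feed-a", last_heartbeat=100.0)
standby = ServiceInstance("radar-feed-b", last_heartbeat=104.5)
print(select_active(primary, standby, now=105.0).name)  # radar-feed-b
```

A production design would also need to prevent split-brain, where both instances believe they are active. This is typically done through quorum or fencing, which the sketch omits.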

In the past, one service generally occupied a single physical server. Now, with hypervisor-based virtualization (for example, VMware, KVM, Hyper-V) and container-based virtualization and orchestration (for example, Docker, Kubernetes), compute resource utilization has improved greatly.

Finally, pools of physical servers are distributed across different locations to achieve geographic high availability.

At this point, the service is fully protected by three layers of redundancy: the service layer, the physical server layer, and the location layer (Figure 1).
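The three layers can also be expressed as a simple placement check. The sketch uses a hypothetical `Placement` record and `redundancy_gaps` function. It flags a service whose active and standby instances do not span separate servers and separate locations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    role: str      # "active" or "standby"
    server: str    # physical host
    location: str  # site or data centre

def redundancy_gaps(placements: list[Placement]) -> list[str]:
    """Report which of the three redundancy layers a service is missing."""
    gaps = []
    if not {"active", "standby"} <= {p.role for p in placements}:
        gaps.append("service layer: no active/standby pair")
    if len({p.server for p in placements}) < 2:
        gaps.append("server layer: all instances share one physical server")
    if len({p.location for p in placements}) < 2:
        gaps.append("location layer: all instances are in one location")
    return gaps

same_site = [Placement("active", "srv1", "site-a"),
             Placement("standby", "srv2", "site-a")]
print(redundancy_gaps(same_site))
```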

Figure 1 — An example of a simple aviation network design showing the three layers of redundancy: service layer, physical server layer, and location layer.

Now it is time to glue the components together (Figure 2).

Figure 2 — Having interlocation networks set up and connected provides redundancy.

To provide resilience and rapid failover reaction for each service, the network has to be rock solid as well as agile.

Let's start with the physical layer. To make the network resilient, multiple redundant links are commonly employed. Within a campus or data centre, fibre is cheap, flexible and extendable. Interlocation links are another story.

A standard enterprise network may have two or more links from one location to another, typically multiple single-mode fibres serving as backhaul. Whether this provides suitable redundancy depends on the routes: two interlocation fibres installed along the same path can both be cut by a single event.
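A common way to handle the shared-path problem is to tag each link with the physical segments it traverses, such as ducts, trenches and bridges. Links that share a segment form what is usually called a shared risk link group (SRLG). This minimal sketch assumes such a segment inventory exists; the link and segment names are hypothetical.

```python
from itertools import combinations

def shared_risks(links: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return every pair of links that share at least one physical segment.

    `links` maps a link name to the set of segments it runs through; a shared
    segment is a single point where one event can cut both links.
    """
    risks = {}
    for (a, path_a), (b, path_b) in combinations(sorted(links.items()), 2):
        common = path_a & path_b
        if common:
            risks[(a, b)] = common
    return risks

links = {
    "fibre-1": {"duct-12", "bridge-3", "duct-14"},
    "fibre-2": {"duct-12", "duct-20"},
    "microwave-1": {"tower-a", "tower-b"},
}
print(shared_risks(links))  # {('fibre-1', 'fibre-2'): {'duct-12'}}
```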

This is why aviation networks use alternative links, including microwave and satellite, alongside fibre.
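The standard availability arithmetic for parallel links shows why diverse media help: the connection is down only when every link is down. The sketch assumes link failures are independent, and the per-link availability figures are hypothetical.

```python
from math import prod

def parallel_availability(availabilities: list[float]) -> float:
    """Availability of a connection that stays up while at least one link is up.

    Assumes link failures are independent of each other.
    """
    return 1 - prod(1 - a for a in availabilities)

# Hypothetical per-link availabilities for fibre, microwave and satellite.
print(parallel_availability([0.999, 0.995, 0.99]))
```

Two fibres in the same trench violate the independence assumption. A single event takes both down, so they behave more like one link than two. Microwave and satellite paths tend to fail for different reasons, which makes the assumption more reasonable for a mixed-media design.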

Figure 3 — Interlocation networks need to be connected by a variety of links, including satellite, microwave and fibre.

Above the physical layer is the soul of the modern network: the Ethernet/IP model.

Although it has been a great success in the network industry, the Ethernet/IP model does cause some headaches for aviation network design. By its nature, it does not guarantee delivery of packets/frames, unlike Fibre Channel, which uses credit-based flow control to avoid dropping frames. To overcome this inherent limitation, many protocols have been developed to protect data, and mission-critical networks rely on them just like any other network.

To provide agility within a data centre, a BGP-based IP fabric on a spine/leaf architecture brings flexibility and good resilience to the servers and services.
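The sketch below plans eBGP sessions for a two-tier spine/leaf fabric, as an illustration of how such fabrics are often addressed. All spines share one private ASN and each leaf gets its own, similar to the approach described in RFC 7938. The ASN values and device names are hypothetical.

```python
from itertools import product

SPINE_ASN = 65000        # hypothetical: all spines share one private ASN
LEAF_ASN_BASE = 65001    # hypothetical: leaves get unique private ASNs from here
PRIVATE_ASN_MAX = 65534  # top of the 16-bit private ASN range (64512-65534)

def ebgp_sessions(spines: list[str], leaves: list[str]) -> list[tuple[str, int, str, int]]:
    """Return (leaf, leaf_asn, spine, spine_asn) for every leaf-to-spine session."""
    if LEAF_ASN_BASE + len(leaves) - 1 > PRIVATE_ASN_MAX:
        raise ValueError("not enough 16-bit private ASNs for this many leaves")
    leaf_asn = {leaf: LEAF_ASN_BASE + i for i, leaf in enumerate(leaves)}
    # Every leaf peers with every spine, so each leaf has one equal-cost path
    # per spine and losing a spine removes only one of those paths.
    return [(leaf, leaf_asn[leaf], spine, SPINE_ASN)
            for leaf, spine in product(leaves, spines)]

for session in ebgp_sessions(["spine1", "spine2"], ["leaf1", "leaf2"]):
    print(session)
```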

MPLS takes care of failover between data centres, with features including fast reroute (link or node protection), auto-bandwidth, and active/hot-standby paths.
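Fast reroute works because the detour is computed before anything fails. The router next to a failed link can therefore switch traffic without waiting for routing to reconverge. The sketch below shows only that path-computation idea for link protection, using Dijkstra over a hypothetical topology. Real MPLS FRR (RFC 4090) signals bypass tunnels with RSVP-TE, which is not modelled here.

```python
import heapq

def shortest_path(graph, src, dst, excluded=frozenset()):
    """Dijkstra over graph (node -> {neighbour: cost}), skipping any link listed
    in `excluded` as frozenset({a, b}). Returns a list of nodes, or None."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nbr, weight in graph[node].items():
            if nbr not in visited and frozenset((node, nbr)) not in excluded:
                heapq.heappush(queue, (cost + weight, nbr, path + [nbr]))
    return None

def link_protection(graph, src, dst):
    """Primary path plus a precomputed detour around each of its links."""
    primary = shortest_path(graph, src, dst)
    if primary is None:
        return None, {}
    detours = {}
    for a, b in zip(primary, primary[1:]):
        detours[(a, b)] = shortest_path(graph, a, b, excluded={frozenset((a, b))})
    return primary, detours

# Hypothetical topology: two data centres joined via core nodes A and B.
graph = {
    "DC1": {"A": 1, "B": 5},
    "A": {"DC1": 1, "DC2": 1, "B": 1},
    "B": {"DC1": 5, "A": 1, "DC2": 5},
    "DC2": {"A": 1, "B": 5},
}
primary, detours = link_protection(graph, "DC1", "DC2")
print(primary)  # ['DC1', 'A', 'DC2']
print(detours)
```

Node protection follows the same pattern, but it excludes the next-hop node rather than only the link to it.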

Finally, quality of service (QoS) must be deployed correctly across the entire network. Because each type of link media has its own attributes, including link buffer size and speed, the QoS policy must be carefully planned and fully tested.
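Two quantities help explain why one QoS policy rarely fits every medium:

- The bandwidth-delay product indicates how much data a link holds in flight, and roughly how much buffering it needs.
- Serialization delay sets how long a small voice packet may wait behind a large data frame.

The link profiles below are hypothetical.

```python
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return round(bandwidth_mbps * 1_000_000 / 8 * rtt_ms / 1000)

def serialization_delay_ms(frame_bytes: int, bandwidth_mbps: float) -> float:
    """Time to clock one frame onto the wire (1 Mbps = 1000 bits per ms)."""
    return frame_bytes * 8 / (bandwidth_mbps * 1000)

# Hypothetical link profiles: (bandwidth in Mbps, round-trip time in ms).
links = {"fibre": (10_000, 2), "microwave": (400, 2), "satellite": (20, 600)}
for name, (mbps, rtt) in links.items():
    print(f"{name}: BDP {bdp_bytes(mbps, rtt):,} bytes, "
          f"1500-byte frame takes {serialization_delay_ms(1500, mbps):.3f} ms")
```

For example, a single 1500-byte frame takes 6 ms to serialize on a 2 Mbps link. This is why slow links commonly give voice traffic priority queuing.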

Figure 4 — MPLS takes care of failovers between data centres while IP fabric provides agility.

Integrating network security

Network security is a complex topic, regardless of the type of business, and every organization has its own information technology security framework. That said, there are some generic rules, listed below, that aviation networks also follow:

Developing enclosed systems: absolute physical separation provides greater security than any firewall or logical traffic separation (VLAN or VRF). Where enclosed systems must communicate, strict procedures should be applied to avoid human mistakes.
Hardening network devices and system services: unauthorized device management traffic and protocols should be blocked.
Using multi-factor authentication: a very common method of protecting access to network infrastructure, it adds an extra layer of identity verification.
Keeping network device operating systems up to date: once a software bug is widely known, it is already too late to start patching.
Enforcing security policy at network edges that connect to the Internet or interconnect with other organizations: as an extra measure, bidirectional enforcement is recommended, given that many security threats originate inside the organization.
Employing modular security designs: these prevent a compromise in one system from impacting others.
Recording all data flow activities inside the organization: this metadata can be very useful when investigating a security breach.
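The hardening rule about blocking unauthorized management traffic amounts to a default-deny allowlist. This sketch is a hypothetical policy check, not any vendor's ACL syntax. The management subnet and the permitted protocols are assumptions.

```python
from ipaddress import ip_address, ip_network

# Hypothetical policy values: substitute the real management subnets and services.
MGMT_NETWORKS = [ip_network("10.255.0.0/24")]
ALLOWED_MGMT = {("tcp", 22), ("udp", 161)}  # SSH and SNMP

def permit_management(src: str, proto: str, dst_port: int) -> bool:
    """Default deny: permit management-plane traffic only if both the source
    network and the (protocol, port) pair are explicitly allowed."""
    from_mgmt = any(ip_address(src) in net for net in MGMT_NETWORKS)
    return from_mgmt and (proto.lower(), dst_port) in ALLOWED_MGMT

print(permit_management("10.255.0.10", "tcp", 22))  # True
print(permit_management("10.255.0.10", "tcp", 23))  # False: Telnet is not allowed
```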

Network automation

As it does for ISP networks, network automation has many useful applications in mission-critical networks, including aviation networks.

Aside from protecting against human error, applications such as configuration template systems allow network changes to be generated automatically, reviewed by engineers, and then applied to specific devices.
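A configuration template system can be reduced to two steps. First, render per-device values into a template. Second, produce a diff against the running configuration for an engineer to review. This minimal sketch uses Python's `string.Template` and `difflib`. The template lines are generic IOS-style examples and the device values are hypothetical.

```python
import difflib
from string import Template

# Hypothetical base template; the IOS-style lines are illustrative only.
BASE_TEMPLATE = Template(
    "hostname $hostname\n"
    "ntp server $ntp_server\n"
    "snmp-server location $location\n"
)

def render(device: dict[str, str]) -> str:
    """Fill in per-device values; substitute() raises KeyError if one is missing."""
    return BASE_TEMPLATE.substitute(device)

def review_diff(running: str, proposed: str, name: str) -> str:
    """Unified diff of running vs proposed config, for an engineer to review."""
    return "".join(difflib.unified_diff(
        running.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{name}/running", tofile=f"{name}/proposed"))

running = "hostname core-1\nntp server 10.255.0.9\nsnmp-server location site-a\n"
proposed = render({"hostname": "core-1", "ntp_server": "10.255.0.1",
                   "location": "site-a"})
print(review_diff(running, proposed, "core-1"))
```

Using `substitute()` rather than `safe_substitute()` is deliberate. A missing value stops the render instead of leaving a `$placeholder` in a device configuration.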

It's worth noting, though, that automation is a double-edged sword: a misconfiguration can be amplified across the network in a very short time. As such, make sure automation tasks are peer-reviewed and thoroughly tested in a lab environment.
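A staged rollout is one common safeguard against amplification, complementary to lab testing and peer review. The change is pushed to a small batch of devices and their health is checked. If anything looks wrong, the rollout halts before the next batch. The sketch below is hypothetical: `push` and `healthy` stand in for whatever deployment and verification steps a real tool would use.

```python
from typing import Callable

class RolloutHalted(Exception):
    """Raised when a batch fails its health check; records what was changed."""
    def __init__(self, changed: list[str]):
        super().__init__(f"rollout halted; changed devices: {changed}")
        self.changed = changed

def staged_rollout(devices: list[str], push: Callable[[str], None],
                   healthy: Callable[[str], bool], batch_size: int = 2) -> list[str]:
    """Push a change batch by batch, stopping at the first unhealthy batch.

    A bad change therefore reaches at most one batch, not the whole network.
    """
    changed: list[str] = []
    for i in range(0, len(devices), batch_size):
        batch = devices[i:i + batch_size]
        for device in batch:
            push(device)
            changed.append(device)
        if not all(healthy(device) for device in batch):
            raise RolloutHalted(changed)
    return changed
```

A halted rollout would normally be followed by a rollback of the changed devices, which this sketch leaves out.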

Building robust, fully redundant and resilient networks

In summary, any network design, critical or non-critical, needs to consider the key priorities of the business, and the resources available now and into the future.

For the aviation industry, and most mission-critical systems, SAFETY is the key business priority. From a network design point of view, our priority is to build a robust, fully redundant and resilient network that protects systems from predictable outages and human mistakes, while remaining flexible enough to tolerate a degree of unforeseen issues.

Quincy Liao is the principal network design engineer at Airways NZ.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.

3 Comments

anna April 24, 2019 at 5:47 am

Interesting post.

Wondering what type of networking is deployed within an aircraft. Is it based on Ethernet/IP, Fibre Channel or some other proprietary technology…
Also, what type of redundancy would be used for internal aircraft networking?

Regan Hughes May 6, 2019 at 5:05 pm

Great post Quincy. It is comprehensive and well written.

Vikas Gupta June 19, 2019 at 4:12 pm

Very well explained. I would appreciate more articles like this in future. Thanks a ton!
