Don’t leave network blind spots

By Minzhao Lyu on 28 Jul 2023

Category: Tech matters


Enterprise asset management plays a vital role in ensuring the security and trustworthiness of devices connected to an enterprise network. Effective methods currently used by our IT community often involve the active configuration of a standard operating environment (SOE) on each network-connected host.

This approach works well in strictly managed networks where all devices are centrally regulated by the IT department. However, such active methods become less effective in networks that are operated in a federated manner, such as campus networks that let departments and organizations run their own networked assets and that accommodate bring-your-own-technology (BYOT) devices. In these settings, active methods can leave security blind spots that go unnoticed by IT departments.

In this post, I discuss recent research by Hassan Habibi Gharakheili, Vijay Sivaraman, and myself from the School of Electrical Engineering and Telecommunications, University of New South Wales, that presents a solution for monitoring enterprise hosts with fine-grained visibility into their network behavioural profiles.

This research passively analyses network traffic to and from all connected hosts in an enterprise network, regardless of their SOE configuration status. It builds a specialized network behavioural profile for each host and applies AI-based classification to give IT departments real-time inferences about host behavioural patterns and potential anomalies.

Further details of this work can be found in this paper (or the preprint version).

Diversity in functionalities and network behaviours of enterprise hosts

Before delving into our method, let’s look at the diversity in the functionalities and underlying network behaviours of hosts connected to a representative enterprise (campus) network. Table 1 lists ten popular types of enterprise hosts with their counts, sample DNS names, and a coarse overview of the network behaviours they commonly exhibit (the four rightmost columns).

Heuristically, host types differ in their communication patterns along four dimensions: the internal services (port numbers on the internal host) they open to the public, the external services (port numbers on external hosts) they access, the duration of their flows, and the volume of data exchanged per service (represented by packet sizes).

Type                    Number of hosts  Sample DNS name  Internal services  External services  Flow duration  Packet size
Website srv             61               -                Small, fixed       Large, random      Short          Medium
Authoritative name srv  15               -                Small, fixed       Large, random      Short          Small
VPN srv                 13               -                Small, fixed       Large, random      Long           Medium
Remote computing srv    16               -                Medium, fixed      Large, random      Long           Small
File storage srv        14               -                Small, fixed       Large, random      Medium         Large
Mail srv                18               -                Medium, fixed      Large, random      Short          Medium
DNS proxy               7                -                Large, random      Small, fixed       Short          Small
Web proxy               4                -                Large, random      Small, fixed       Short          Medium
NAT gateway             256              -                Large, random      Large, random      Medium         Medium
End-host                1,961            -                Medium, random     Small, random      Medium         Medium

Table 1 — Ten popular host types identified from DNS names and their coarse network behaviours.

Driven by the hypothesis that host types can be distinguished by these communication patterns, we developed and optimized a method that captures the important behavioural profiles of networked hosts through passive analysis of their network traffic.

Capturing (comprehensive) network behaviours of hosts

Figure 1 — The rooted graph data structure capturing the network behaviour of hosts.

The first design task was a data structure that holds enough information about a host’s network behavioural profile, as discussed above. Each enterprise host has its own instance, which is updated at runtime as relevant packets are passively analysed.

To this end, we designed a four-layer rooted graph, as depicted in Figure 1. From left to right, the layers represent the monitored enterprise host, its internal services open to the public, the external services being accessed, and the external hosts communicating with the enterprise host. Nodes in adjacent layers are connected by links that carry packet and flow metrics, including their counts, volumes, and directions.
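To make the four-layer structure concrete, here is a minimal Python sketch of one way such a per-host graph could be kept up to date from observed packets. It is an illustration under stated assumptions, not the implementation from the paper: the names `HostGraph`, `LinkStats`, and `add_packet` are hypothetical, a service is modelled simply as a (protocol, port) pair, and a flow is approximated by its protocol, peer address, and port pair.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LinkStats:
    """Counters carried by one link between nodes in adjacent layers."""
    pkts_in: int = 0     # packets towards the monitored host
    pkts_out: int = 0    # packets sent by the monitored host
    bytes_in: int = 0
    bytes_out: int = 0
    flows: set = field(default_factory=set)  # distinct flow keys seen on this link

    def update(self, outbound: bool, size: int, flow_key: tuple) -> None:
        if outbound:
            self.pkts_out += 1
            self.bytes_out += size
        else:
            self.pkts_in += 1
            self.bytes_in += size
        self.flows.add(flow_key)


class HostGraph:
    """Layers: host -> internal service -> external service -> external host."""

    def __init__(self, host_ip: str):
        self.host_ip = host_ip
        self.host_to_int = defaultdict(LinkStats)  # key: internal service
        self.int_to_ext = defaultdict(LinkStats)   # key: (internal svc, external svc)
        self.ext_to_peer = defaultdict(LinkStats)  # key: (external svc, peer IP)

    def add_packet(self, proto, src_ip, src_port, dst_ip, dst_port, size):
        """Update all three link layers for one passively observed packet."""
        if self.host_ip not in (src_ip, dst_ip):
            raise ValueError("packet does not involve the monitored host")
        outbound = src_ip == self.host_ip
        if outbound:
            int_svc, ext_svc, peer = (proto, src_port), (proto, dst_port), dst_ip
        else:
            int_svc, ext_svc, peer = (proto, dst_port), (proto, src_port), src_ip
        flow_key = (proto, peer, int_svc[1], ext_svc[1])
        for link in (self.host_to_int[int_svc],
                     self.int_to_ext[(int_svc, ext_svc)],
                     self.ext_to_peer[(ext_svc, peer)]):
            link.update(outbound, size, flow_key)


# Example: one request to, and one response from, a web server at 10.0.0.5.
graph = HostGraph("10.0.0.5")
graph.add_packet("tcp", "203.0.113.7", 51000, "10.0.0.5", 443, 80)
graph.add_packet("tcp", "10.0.0.5", 443, "203.0.113.7", 51000, 1400)
```

Keeping the counters on the links rather than on the nodes mirrors the description above, where links between adjacent layers carry the packet and flow metrics.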

The constructed graph of an enterprise host could be fed directly to specialized AI models designed to handle graphs. However, such graph-based models are still at an early stage of explainability, that is, of giving contextual reasoning for the inferences they make. Therefore, before applying any inference model, we systematically define attributes that can be extracted from the host graph and that carry contextual meaning for network operators. Broadly speaking, the attributes describe a host’s network behaviour from four aspects:

  • Aggregate host activity
  • Utilization of internal transport services
  • Utilization of external transport services
  • Top transport-layer services
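
As a rough illustration of these four aspects, the sketch below derives a handful of attributes from a list of per-flow records. The specific attributes, the `Flow` record, and the `extract_attributes` function are assumptions made for this example; the paper defines its own attribute set, which is not reproduced here.

```python
from collections import Counter
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical per-flow summary for one monitored host."""
    proto: str     # "tcp" or "udp"
    int_port: int  # port on the monitored (internal) host
    ext_port: int  # port on the external host
    peer_ip: str
    pkts: int
    nbytes: int


def extract_attributes(flows, top_k=3):
    """Summarize a host's flows along the four aspects listed above."""
    int_bytes = Counter()  # bytes per internal transport service
    ext_bytes = Counter()  # bytes per external transport service
    for f in flows:
        int_bytes[(f.proto, f.int_port)] += f.nbytes
        ext_bytes[(f.proto, f.ext_port)] += f.nbytes
    return {
        # 1. Aggregate host activity
        "flows": len(flows),
        "pkts": sum(f.pkts for f in flows),
        "bytes": sum(f.nbytes for f in flows),
        "peers": len({f.peer_ip for f in flows}),
        # 2. Utilization of internal transport services
        "int_services": len(int_bytes),
        # 3. Utilization of external transport services
        "ext_services": len(ext_bytes),
        # 4. Top transport-layer services, ranked by bytes
        "top_int": [svc for svc, _ in int_bytes.most_common(top_k)],
        "top_ext": [svc for svc, _ in ext_bytes.most_common(top_k)],
    }


# Example: a host serving HTTPS to three clients on random client ports.
flows = [
    Flow("tcp", 443, 51000, "203.0.113.7", 12, 9000),
    Flow("tcp", 443, 52344, "198.51.100.9", 8, 4000),
    Flow("tcp", 443, 40112, "192.0.2.33", 5, 1500),
]
attrs = extract_attributes(flows)
```

In this example the host uses a single internal service and many external ones, which matches the "small, fixed" internal and "large, random" external pattern that Table 1 associates with servers.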

Classifying hosts by their fine-grained functionalities or coarse-grained network behaviours

Figure 2 — Our dual-grained classification scheme.

We then developed a classification process that takes these attributes and infers the behavioural type of a given enterprise host. In practice, network operators often have a known list of host application types, such as website servers, email servers, and Wi-Fi routers, that are popular in their networks. Host types that are uncommon and unknown to network operators can also emerge.

We note that this open-ended set of application-level types can be grouped into six network-level behaviour types, namely TCP- or UDP-dominant servers, TCP- or UDP-dominant proxies, end-hosts, and NAT gateways. Our classification scheme is therefore dual-grained, as shown in Figure 2: an enterprise host is classified either into a fine-grained application type known to the network operator, or into a coarse-grained network type for further investigation.
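
The dual-grained decision can be sketched as a simple fallback rule. This is only one plausible reading, under loud assumptions: the post does not say how the choice between the two granularities is made, so the confidence threshold, the function name `classify_dual_grained`, and the label strings below are all hypothetical.

```python
def classify_dual_grained(fine_probs, coarse_probs, threshold=0.8):
    """Return (granularity, label, confidence) for one host.

    fine_probs:   confidence per operator-known application type
    coarse_probs: confidence per coarse network-level type
    threshold:    assumed cut-off below which the fine-grained result
                  is not trusted (not taken from the paper)
    """
    fine_label, fine_conf = max(fine_probs.items(), key=lambda kv: kv[1])
    if fine_conf >= threshold:
        return ("fine", fine_label, fine_conf)
    # No known application type fits well enough: report the coarse
    # network behaviour type so the operator can investigate further.
    coarse_label, coarse_conf = max(coarse_probs.items(), key=lambda kv: kv[1])
    return ("coarse", coarse_label, coarse_conf)


# Example: the host matches no known application type with high confidence.
result = classify_dual_grained(
    fine_probs={"website server": 0.35, "mail server": 0.30},
    coarse_probs={"TCP-dominant server": 0.90, "end-host": 0.10},
)
```

Here the call falls back to the coarse-grained type because neither fine-grained score reaches the assumed threshold.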

Figure 3 — Time-trace of model confidence per class for two host examples: (a) website server; and (b) NAT gateway.

To showcase our method, Figure 3 presents the real-time classification results and confidence levels for two representative hosts in our university network, a website server and a NAT gateway. Figure 3 (a) shows that the first host is consistently classified as a website server with 100% confidence.

It also behaves like an end-host with up to 40% confidence, particularly at night. The server operator confirmed that the host performs regular updates, which explains this atypical behaviour. In Figure 3 (b), the second host is classified as a NAT gateway with high confidence during the daytime on workdays and as an end-host otherwise, which reflects the busy hours of a Wi-Fi router.

Those are the key ideas of this work. The full details can be found in our paper published in the Computer Networks journal.

Minzhao Lyu is currently a postdoctoral research associate at the University of New South Wales, Sydney, NSW, Australia, where he received a B.Eng. degree (First Class Hons) in electrical engineering and a PhD degree in network security in 2017 and 2022, respectively. His research primarily focuses on making telecommunications data networks secure and performant using network traffic analysis, programmable networks, and machine learning techniques.

Contributors/co-authors of the original paper: Hassan Habibi Gharakheili and Vijay Sivaraman.


The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
