Don’t leave network blind spots

Enterprise asset management plays a vital role in ensuring the security and trustworthiness of devices connected to an enterprise network. Effective methods currently used by our IT community often involve the active configuration of a standard operating environment (SOE) on each network-connected host.

This approach works well in strictly managed networks where all devices are centrally regulated by the IT department. However, the effectiveness of such active methods diminishes in networks that are operated in a federated manner, like campus networks that give freedom to various departments/organizations running their own networked assets and accommodate bring-your-own-technology (BYOT) devices. Under such settings, these active methods can lead to security blind spots that go unnoticed by IT departments.

In this post, I discuss recent research by Hassan Habibi Gharakheili, Vijay Sivaraman, and me, from the School of Electrical Engineering and Telecommunications at the University of New South Wales, which presents a solution for monitoring enterprise hosts with fine-grained visibility into their network behavioural profiles.

This research passively analyses network traffic to and from all connected hosts in an enterprise network, regardless of their SOE configuration status. By constructing specialized network behavioural profiles and applying AI-based classification, it gives IT departments real-time inferences about hosts’ network behavioural patterns and potential anomalies.

Further details of this work can be found in this paper (or the preprint version).

Diversity in functionalities and network behaviours of enterprise hosts

Before delving into our method, let’s look at the diversity in the functionalities and underlying network behaviours of hosts connected to a representative enterprise (campus) network. Table 1 lists ten popular types of enterprise hosts with their counts, sample DNS names, and a coarse overview of the network behaviours they commonly exhibit (the four rightmost columns).

Heuristically, differences in communication patterns can be observed among various types of hosts based on their internal services (or port numbers on internal hosts) opened to the public, external services (or port numbers on external hosts) they access, the duration of services (flows), and the volume of data being exchanged for each service (represented by packet sizes).

Type	Number of hosts	Sample DNS name	Internal services	External services	Flow duration	Packet size
Website srv	61	www.unswlawjournal.unsw.edu.au	Small, fixed	Large, random	Short	Medium
Authoritative name srv	15	ns1.sdn.unsw.edu.au	Small, fixed	Large, random	Short	Small
VPN srv	13	securevpn.nida.edu.au	Small, fixed	Large, random	Long	Medium
Remote computing srv	16	analyticalcentre2.chem.unsw.edu.au	Medium, fixed	Large, random	Long	Small
File storage srv	14	files.be.unsw.edu.au	Small, fixed	Large, random	Medium	Large
Mail srv	18	smtp.garvan.unsw.edu.au	Medium, fixed	Large, random	Short	Medium
DNS proxy	7	ns6.unsw.edu.au	Large, random	Small, fixed	Short	Small
Web proxy	4	wwwproxy2.library.unsw.edu.au	Large, random	Small, fixed	Short	Medium
NAT gateway	256	uniwide-pat-pool-a-b-c-d.gw.unsw.edu.au	Large, random	Large, random	Medium	Medium
End-host	1,961	minzhaos-macbook-pro.ad.unsw.edu.au	Medium, random	Small, random	Medium	Medium

Table 1 — Ten popular host types identified from DNS names and their coarse network behaviours.

Driven by the hypothesis that different host types exhibit distinguishable communication patterns, we can develop and optimize a method that captures the important behavioural profiles of networked hosts through passive analysis of their network traffic.

Capturing (comprehensive) network behaviours of hosts

Figure 1: The rooted graph data structure capturing the network behaviour of hosts.

The first design task was a data structure that maintains sufficient information about a host’s network behavioural profile, as discussed above. Each enterprise host has its own instance of the data structure, which is updated at runtime as relevant packets are passively analysed.

Towards this objective, we designed a four-layer graph, as depicted in Figure 1. From left to right, the layers represent the monitored enterprise host, the internal services open to the public, the external services being accessed, and the external hosts communicating with the enterprise host. Nodes in adjacent layers are interconnected by links that carry packet and flow metrics, including their counts, volumes, and directions.
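
A minimal Python sketch of such a structure might look like the following. It is not the implementation from the paper: the class and method names (`LinkMetrics`, `HostGraph`, `observe`), the choice of (protocol, port) pairs as service nodes, and the sample addresses are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LinkMetrics:
    """Counters carried by one link between nodes in adjacent layers."""
    packets_in: int = 0
    packets_out: int = 0
    bytes_in: int = 0
    bytes_out: int = 0
    flows: set = field(default_factory=set)  # distinct flow identifiers seen

    def update(self, size, outbound, flow_id):
        # Direction is relative to the monitored enterprise host.
        if outbound:
            self.packets_out += 1
            self.bytes_out += size
        else:
            self.packets_in += 1
            self.bytes_in += size
        self.flows.add(flow_id)


class HostGraph:
    """Layers: host -> internal service -> external service -> external host."""

    def __init__(self, host_ip):
        self.host_ip = host_ip
        # One dictionary of links per pair of adjacent layers.
        self.host_to_internal = defaultdict(LinkMetrics)
        self.internal_to_external = defaultdict(LinkMetrics)
        self.external_to_peer = defaultdict(LinkMetrics)

    def observe(self, proto, host_port, peer_ip, peer_port, size, outbound):
        """Update the graph with one passively observed packet."""
        internal = (proto, host_port)   # service on the enterprise host
        external = (proto, peer_port)   # service on the external host
        flow_id = (proto, host_port, peer_ip, peer_port)
        self.host_to_internal[internal].update(size, outbound, flow_id)
        self.internal_to_external[(internal, external)].update(size, outbound, flow_id)
        self.external_to_peer[(external, peer_ip)].update(size, outbound, flow_id)


# Hypothetical packets for a host serving HTTPS to two external clients.
graph = HostGraph("10.0.0.5")
graph.observe("tcp", 443, "203.0.113.7", 51000, 1500, outbound=True)
graph.observe("tcp", 443, "203.0.113.7", 51000, 60, outbound=False)
graph.observe("tcp", 443, "198.51.100.9", 40222, 400, outbound=True)

https_link = graph.host_to_internal[("tcp", 443)]
print(https_link.packets_out, https_link.bytes_out, len(https_link.flows))
```

In this sketch a server-like host ends up with few internal service nodes and many external ones, which is the kind of asymmetry summarized in Table 1.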

A constructed graph of an enterprise host can be readily sent to specialized AI models designed to handle graphs. However, those graph-based AI models are still at an early stage of becoming explainable, that is, of giving contextual reasoning for the inferences they make. Therefore, before applying inference models, we systematically define attributes that can be extracted from the host graph to offer contextual meaning for network operators. Broadly speaking, the attributes describe host network behaviours from four aspects:

Aggregate host activity
Utilization of internal transport services
Utilization of external transport services
Top transport-layer services
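
To make the four aspects above more concrete, here is a rough Python sketch that derives a handful of attributes from a list of flow records. The record layout, the function name `extract_attributes`, and the specific attribute names are assumptions chosen for illustration; the actual attribute set is defined in the paper.

```python
from collections import Counter


def extract_attributes(flows, top_k=3):
    """Summarize one host's flows.

    Each flow is a tuple:
    (protocol, internal_port, external_port, packets, bytes).
    """
    flows = list(flows)
    total_packets = sum(f[3] for f in flows)
    total_bytes = sum(f[4] for f in flows)

    internal = Counter()  # packets per service on the enterprise host
    external = Counter()  # packets per service on external hosts
    for proto, in_port, ex_port, packets, _ in flows:
        internal[(proto, in_port)] += packets
        external[(proto, ex_port)] += packets

    return {
        # Aggregate host activity
        "flow_count": len(flows),
        "packet_count": total_packets,
        "mean_packet_size": total_bytes / total_packets if total_packets else 0.0,
        # Utilization of internal and external transport services
        "internal_service_count": len(internal),
        "external_service_count": len(external),
        # Top transport-layer services by packet count
        "top_internal_services": [svc for svc, _ in internal.most_common(top_k)],
    }


# Hypothetical flows for a host that looks like a website server.
sample_flows = [
    ("tcp", 443, 51000, 10, 8000),
    ("tcp", 443, 51001, 6, 4000),
    ("tcp", 80, 40222, 4, 2000),
]
attrs = extract_attributes(sample_flows)
print(attrs)
```

Because each attribute has a plain meaning (for example, few internal services but many external ones), an operator can read the values directly instead of relying on an opaque graph embedding.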

Classifying hosts by their fine-grained functionalities or coarse-grained network behaviours

Figure 2: Our dual-grained classification scheme.

We then developed a classification process that uses these attributes to infer the behavioural type of an enterprise host. In practice, network operators often have a known list of host application types, such as website servers, email servers, and Wi-Fi routers, that are popular in their networks. In addition, uncommon host types that network operators do not know about can also emerge.

We note that this open-ended set of application-level types can be categorized into six network-level behaviour types, namely TCP-dominant and UDP-dominant servers, TCP-dominant and UDP-dominant proxies, end hosts, and NAT gateways. Therefore, our classification scheme is dual-grained, as shown in Figure 2. An enterprise host is classified either into a fine-grained application type known by the network operator or into a coarse-grained network type for further investigation.
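
One way to picture the dual-grained decision is sketched below in Python. This post does not describe how the choice between the two granularities is made, so the confidence threshold used here, the function name `dual_grained_label`, and the example scores are all assumptions rather than the method from the paper.

```python
def dual_grained_label(fine_scores, coarse_scores, threshold=0.8):
    """Pick a fine-grained label if it is confident enough, else a coarse one.

    Both arguments map a class name to a confidence in [0, 1] and are
    assumed to be non-empty. The threshold rule is an illustrative
    assumption.
    """
    fine_label, fine_conf = max(fine_scores.items(), key=lambda kv: kv[1])
    if fine_conf >= threshold:
        return ("fine", fine_label, fine_conf)
    coarse_label, coarse_conf = max(coarse_scores.items(), key=lambda kv: kv[1])
    return ("coarse", coarse_label, coarse_conf)


# Hypothetical scores for a host matching a known application type.
known = dual_grained_label(
    {"website server": 0.95, "mail server": 0.05},
    {"TCP-dominant server": 0.90, "end host": 0.10},
)

# Hypothetical scores for a host that matches no known application type well.
unknown = dual_grained_label(
    {"website server": 0.40, "mail server": 0.35},
    {"TCP-dominant server": 0.90, "end host": 0.10},
)
print(known, unknown)
```

Under this assumed rule, a host of an unfamiliar application type still receives a coarse network-level label that an operator can use as a starting point for investigation.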

Figure 3 — Time-trace of model confidence per class for two host examples: (a) website server; and (b) NAT gateway.

As a showcase of our method, Figure 3 presents the real-time classification results and confidence levels of two representative hosts in our university network, a website server and a NAT gateway. From Figure 3 (a), we can see that this host is constantly classified as a website server with 100% confidence.

In addition, it also behaves like an end-host with confidence of up to 40%, particularly at night. After checking with the server operator, we learned that it undertakes regular updates and thus exhibits this atypical behaviour. In Figure 3 (b), the host is classified with high confidence as a NAT gateway during the daytime on workdays and as an end-host otherwise, which reflects the busy hours of a Wi-Fi router.

Those are the key ideas of this work. The full details can be found in our paper published in the Computer Networks journal.

Minzhao Lyu is currently a postdoctoral research associate at the University of New South Wales, Sydney, NSW, Australia, where he received a B.Eng. degree (First Class Hons) in electrical engineering and a PhD degree in network security in 2017 and 2022, respectively. His research primarily focuses on making telecommunications data networks secure and performant using network traffic analysis, programmable networks, and machine learning techniques.

Contributors/co-authors of the original paper: Hassan Habibi Gharakheili and Vijay Sivaraman.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
