Enterprise asset management plays a vital role in ensuring the security and trustworthiness of devices connected to an enterprise network. Effective methods currently used by our IT community often involve the active configuration of a standard operating environment (SOE) on each network-connected host.
This approach works well in strictly managed networks where all devices are centrally regulated by the IT department. However, the effectiveness of such active methods diminishes in networks that are operated in a federated manner, like campus networks that give freedom to various departments/organizations running their own networked assets and accommodate bring-your-own-technology (BYOT) devices. Under such settings, these active methods can lead to security blind spots that go unnoticed by IT departments.
In this post, I discuss recent research by Hassan Habibi Gharakheili, Vijay Sivaraman, and myself from the School of Electrical Engineering and Telecommunications, University of New South Wales, that presents a solution for monitoring enterprise hosts with fine-grained visibility into their network behavioural profiles.
This research passively analyses network traffic from and to all connected hosts in an enterprise network, regardless of their SOE configuration status. By constructing specialized network behavioural profiles and applying AI-based classification, it gives IT departments real-time inferences about the network behavioural patterns of hosts and about potential anomalies.
Further details of this work can be found in this paper (or the preprint version).
Diversity in functionalities and network behaviours of enterprise hosts
Before delving into our method, let’s take a glimpse at the diversity that exists in the functionalities and underlying network behaviours of hosts connected to a representative enterprise (campus) network. Table 1 lists ten popular types of enterprise hosts with their counts, sample DNS names, and a coarse overview of the network behaviours they commonly exhibit (the four rightmost columns).
Heuristically, differences in communication patterns can be observed among various types of hosts based on their internal services (or port numbers on internal hosts) opened to the public, external services (or port numbers on external hosts) they access, the duration of services (flows), and the volume of data being exchanged for each service (represented by packet sizes).
Type | Number of hosts | Sample DNS name | Internal services | External services | Flow duration | Packet size |
Website srv | 61 | www.unswlawjournal.unsw.edu.au | Small, fixed | Large, random | Short | Medium |
Authoritative name srv | 15 | ns1.sdn.unsw.edu.au | Small, fixed | Large, random | Short | Small |
VPN srv | 13 | securevpn.nida.edu.au | Small, fixed | Large, random | Long | Medium |
Remote computing srv | 16 | analyticalcentre2.chem.unsw.edu.au | Medium, fixed | Large, random | Long | Small |
File storage srv | 14 | files.be.unsw.edu.au | Small, fixed | Large, random | Medium | Large |
Mail srv | 18 | smtp.garvan.unsw.edu.au | Medium, fixed | Large, random | Short | Medium |
DNS proxy | 7 | ns6.unsw.edu.au | Large, random | Small, fixed | Short | Small |
Web proxy | 4 | wwwproxy2.library.unsw.edu.au | Large, random | Small, fixed | Short | Medium |
NAT gateway | 256 | uniwide-pat-pool-a-b-c-d.gw.unsw.edu.au | Large, random | Large, random | Medium | Medium |
End-host | 1,961 | minzhaos-macbook-pro.ad.unsw.edu.au | Medium, random | Small, random | Medium | Medium |
Table 1 — Ten popular host types identified from DNS names and their coarse network behaviours.
Driven by the hypothesis that these behavioural differences are enough to tell host types apart, we developed and optimized a method that captures the important behavioural profiles of networked hosts through passive analysis of their network traffic.
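As a quick sanity check on Table 1, the sketch below transcribes its four behaviour columns into Python and looks for host types that share an identical coarse profile. The names `TABLE_1`, `shared_profiles`, and `types_matching` are illustrative only and do not come from the paper.

```python
from collections import Counter

# Coarse profiles transcribed from Table 1, in column order:
# (internal services, external services, flow duration, packet size)
TABLE_1 = {
    "Website srv": ("Small, fixed", "Large, random", "Short", "Medium"),
    "Authoritative name srv": ("Small, fixed", "Large, random", "Short", "Small"),
    "VPN srv": ("Small, fixed", "Large, random", "Long", "Medium"),
    "Remote computing srv": ("Medium, fixed", "Large, random", "Long", "Small"),
    "File storage srv": ("Small, fixed", "Large, random", "Medium", "Large"),
    "Mail srv": ("Medium, fixed", "Large, random", "Short", "Medium"),
    "DNS proxy": ("Large, random", "Small, fixed", "Short", "Small"),
    "Web proxy": ("Large, random", "Small, fixed", "Short", "Medium"),
    "NAT gateway": ("Large, random", "Large, random", "Medium", "Medium"),
    "End-host": ("Medium, random", "Small, random", "Medium", "Medium"),
}


def shared_profiles(table):
    """Return the profiles that are used by more than one host type."""
    counts = Counter(table.values())
    return {profile: n for profile, n in counts.items() if n > 1}


def types_matching(table, profile):
    """Return every host type whose coarse profile equals `profile`."""
    return [name for name, p in table.items() if p == profile]
```

Calling `shared_profiles(TABLE_1)` returns an empty dictionary: even at this coarse level, no two of the ten host types share the same combination of the four columns, which is what makes the hypothesis plausible.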
Capturing (comprehensive) network behaviours of hosts
The first design task was a data structure that maintains sufficient information about a host’s network behavioural profile, as discussed above. One such structure is kept for each enterprise host and is updated at runtime as relevant packets are passively analysed.
Towards this objective, we designed a four-layer graph, as depicted in Figure 1. The layers, from left to right, represent the monitored enterprise host, the internal services it opens to the public, the external services being accessed, and the external hosts communicating with the enterprise host. Nodes in adjacent layers are interconnected by links that carry packet and flow metrics, including their counts, volumes, and directions.
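To make the structure more concrete, here is a minimal Python sketch of a per-host four-layer graph that is updated packet by packet. It is an illustration under stated assumptions, not the implementation from the paper: the class names `HostGraph` and `LinkStats`, the `update` signature, and the choice to identify a flow by protocol, ports, and peer address are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LinkStats:
    """Packet and flow metrics carried by one link between adjacent layers."""
    pkts_out: int = 0   # packets sent by the monitored host
    pkts_in: int = 0    # packets received by the monitored host
    bytes_out: int = 0
    bytes_in: int = 0
    flows: set = field(default_factory=set)  # distinct flow identifiers seen

    def add(self, outbound, size, flow_id):
        if outbound:
            self.pkts_out += 1
            self.bytes_out += size
        else:
            self.pkts_in += 1
            self.bytes_in += size
        self.flows.add(flow_id)


class HostGraph:
    """Host -> internal service -> external service -> external host."""

    def __init__(self, host_ip):
        self.host_ip = host_ip
        self.host_to_internal = defaultdict(LinkStats)      # key: internal service
        self.internal_to_external = defaultdict(LinkStats)  # key: (internal, external)
        self.external_to_peer = defaultdict(LinkStats)      # key: (external, peer IP)

    def update(self, proto, src_ip, src_port, dst_ip, dst_port, size):
        """Account for one observed packet; return False if it is not ours."""
        if src_ip == self.host_ip:
            outbound, int_port, peer_ip, ext_port = True, src_port, dst_ip, dst_port
        elif dst_ip == self.host_ip:
            outbound, int_port, peer_ip, ext_port = False, dst_port, src_ip, src_port
        else:
            return False
        internal = (proto, int_port)
        external = (proto, ext_port)
        flow_id = (proto, int_port, peer_ip, ext_port)  # assumed flow identity
        self.host_to_internal[internal].add(outbound, size, flow_id)
        self.internal_to_external[(internal, external)].add(outbound, size, flow_id)
        self.external_to_peer[(external, peer_ip)].add(outbound, size, flow_id)
        return True
```

A website server would typically end up with a handful of keys in `host_to_internal` (for example TCP 443) and a very large number of keys in `external_to_peer`, which mirrors the "small, fixed" versus "large, random" contrast in Table 1.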
A constructed graph of an enterprise host can be readily sent to specialized AI models designed to handle graphs. However, those graph-based AI models are still in the early stages of becoming explainable, that is, of giving contextual reasoning for the inferences they make. Therefore, before applying inference models, we systematically define attributes that can be extracted from the host graph and that carry contextual meaning for network operators. Broadly speaking, the attributes describe host network behaviours from four aspects:
- Aggregate host activity
- Utilization of internal transport services
- Utilization of external transport services
- Top transport-layer services
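The four aspects above can be sketched as a small extraction function. This is only an illustration: the input format (a list of per-flow dictionaries with the keys `proto`, `int_port`, `ext_port`, `peer_ip`, `pkts`, and `bytes`) and the specific attributes computed in each group are assumptions for the example, not the attribute set defined in the paper.

```python
from collections import Counter


def extract_attributes(flows, top_k=3):
    """Summarize one host's flows into four groups of attributes.

    Each flow is a dict with keys: proto, int_port, ext_port, peer_ip,
    pkts, bytes (a hypothetical record format chosen for this sketch).
    """
    flows = list(flows)
    total_pkts = sum(f["pkts"] for f in flows)
    total_bytes = sum(f["bytes"] for f in flows)

    internal = Counter()  # bytes per internal (proto, port)
    external = Counter()  # bytes per external (proto, port)
    for f in flows:
        internal[(f["proto"], f["int_port"])] += f["bytes"]
        external[(f["proto"], f["ext_port"])] += f["bytes"]

    return {
        # 1. Aggregate host activity
        "aggregate": {
            "flows": len(flows),
            "packets": total_pkts,
            "bytes": total_bytes,
            "peers": len({f["peer_ip"] for f in flows}),
            "mean_packet_size": total_bytes / total_pkts if total_pkts else 0.0,
        },
        # 2. Utilization of internal transport services
        "internal_services": {"distinct": len(internal)},
        # 3. Utilization of external transport services
        "external_services": {"distinct": len(external)},
        # 4. Top transport-layer services, ranked by byte volume
        "top_services": {
            "internal": [svc for svc, _ in internal.most_common(top_k)],
            "external": [svc for svc, _ in external.most_common(top_k)],
        },
    }
```

Because every value in the result has a plain-language meaning (number of peers, number of distinct internal ports, and so on), an operator can read off why a host looks like a server or a proxy, which is the kind of context a raw graph embedding would not provide.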
Classifying hosts by their fine-grained functionalities or coarse-grained network behaviours
We then developed a classification process that uses these attributes to infer the behavioural type of an enterprise host. In practice, network operators often have a known list of host application types that are popular in their networks, such as website servers, email servers, and Wi-Fi routers. In addition, uncommon host types that are not known to network operators can also emerge.
We note that this open-ended set of application-level types can be grouped into six types of network-level behaviour, namely TCP-dominant servers, UDP-dominant servers, TCP-dominant proxies, UDP-dominant proxies, end-hosts, and NAT gateways. Our classification scheme is therefore dual-grained, as shown in Figure 2. An enterprise host is classified either into a fine-grained application type known to the network operator or into a coarse-grained network type for further investigation.
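One simple way to picture the dual-grained decision is a confidence-based fallback, sketched below. This is an assumption made for illustration: the text above does not say how the scheme chooses between the two granularities, so the threshold rule, the default of 0.8, and the names `classify_dual_grained`, `fine_model`, and `coarse_model` are all hypothetical. Each model is taken to be any callable that maps a host's attributes to a dictionary of per-type confidences.

```python
# The six coarse-grained network types named in the text.
COARSE_TYPES = (
    "TCP-dominant server",
    "UDP-dominant server",
    "TCP-dominant proxy",
    "UDP-dominant proxy",
    "End-host",
    "NAT gateway",
)


def best(scores):
    """Return the (label, confidence) pair with the highest confidence."""
    return max(scores.items(), key=lambda item: item[1])


def classify_dual_grained(attributes, fine_model, coarse_model, threshold=0.8):
    """Prefer a known application type; otherwise fall back to a network type.

    The threshold-based fallback is an assumption of this sketch, not a
    rule taken from the paper.
    """
    label, confidence = best(fine_model(attributes))
    if confidence >= threshold:
        return ("fine", label, confidence)
    label, confidence = best(coarse_model(attributes))
    return ("coarse", label, confidence)
```

With this shape, a host that matches no application type known to the operator still receives one of the six network-level labels, so it can be flagged for further investigation instead of being left unclassified.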
As a showcase of our method, Figure 3 presents the real-time classification results and confidence levels of two representative hosts in our university network, a website server and a NAT gateway. From Figure 3 (a), we can see that the first host is consistently classified as a website server with 100% confidence.
In addition, it also behaves like an end-host with confidence up to 40%, particularly at night. After checking with the server operator, we learned that it undertakes regular updates, which explains this atypical behaviour. In Figure 3 (b), the second host is classified either as a NAT gateway with high confidence during the daytime on workdays or as an end-host otherwise, reflecting the busy hours of a Wi-Fi router.
Those are the key ideas of this work. The full details can be found in our paper published in the Computer Networks journal.
Minzhao Lyu is currently a postdoctoral research associate at the University of New South Wales, Sydney, NSW, Australia, where he received a B.Eng. degree (First Class Hons) in electrical engineering and a PhD degree in network security in 2017 and 2022, respectively. His research primarily focuses on making telecommunications data networks secure and performant using network traffic analysis, programmable networks, and machine learning techniques.
Contributors/co-authors of the original paper: Hassan Habibi Gharakheili and Vijay Sivaraman.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.