Suppose that your home router could decide for itself what each Internet of Things (IoT) device in your home is allowed to do.
It might decide, for example, that your smart fridge should only access its manufacturer’s server. To do that, your router would need to establish what type of device it is dealing with. It would also need to know which forms of network activity are appropriate for devices of that type, so that they work properly and remain secure.
Over the last year, various academic articles have been published describing the use of machine learning algorithms to determine device types. We at SIDN Labs therefore thought it would be useful to reflect on a topic that is central to such algorithms: periodic network traffic.
Recognizing IoT device types
Various types of IoT devices can be connected to a home network, including smart light bulbs, TVs and vacuum cleaners. Several universities have recently published methods for classifying IoT devices by type based on their network traffic patterns.
Being able to classify the device type is important because devices connected to a home network don’t always reveal their type themselves (even though it’s technically possible for them to do so, for example using Manufacturer Usage Description (MUD)). AuDI and N-BaIoT are two of the published methods; both use machine learning to identify characteristic patterns in the network traffic of IoT devices.
Although they use different algorithms, both methods assume that devices perform activities periodically. Your smart fridge might check for firmware updates every Sunday evening, for example. They also assume that devices of the same type undertake broadly similar periodic activities. TVs, for instance, would be expected to establish video streams and to be active mainly in the evening, while a smart thermometer might send a packet once every five minutes.
Extracting a time series
SPIN is an open-source system that we’ve developed to protect the Internet and its users against insecure smart devices in home networks. In recent months, we’ve been exploring whether the classification methods described above could be used to enhance SPIN.
The first practical step was to visualize periodic network traffic, which we did using the AuDI method of converting network traffic data into a time series.
First, network traffic has to be separated into distinct flows. Here, a flow is defined as a series of network packets sent by an IoT device using a given communication protocol (NTP, ARP, RTSP, and so forth). Unlike the usual definition of a flow, this one ignores the destination, because what matters is not whom a device is communicating with, but what it is doing.
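As a minimal sketch of this flow definition, the Python below groups packets by sending device and protocol label and deliberately leaves the destination out of the key. It assumes packets have already been captured and parsed into a simple record; the `Packet` type, its field names and protocol labels such as `"UDP/6666"` are hypothetical illustrations, not part of SPIN or AuDI.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record; field names are illustrative."""
    timestamp: float  # seconds since the start of the capture
    device: str       # sending IoT device, e.g. its MAC address
    protocol: str     # protocol label, e.g. "ARP", "NTP" or "UDP/6666"
    destination: str  # recorded, but deliberately not used as a flow key


def split_into_flows(packets):
    """Group packet timestamps into flows keyed by (device, protocol).

    The destination is left out of the key, so NTP packets sent to two
    different servers end up in the same flow.
    """
    flows = defaultdict(list)
    for pkt in packets:
        flows[(pkt.device, pkt.protocol)].append(pkt.timestamp)
    return dict(flows)


packets = [
    Packet(0.2, "aa:bb:cc:00:00:01", "NTP", "192.0.2.1"),
    Packet(1.7, "aa:bb:cc:00:00:01", "NTP", "198.51.100.7"),
    Packet(3.0, "aa:bb:cc:00:00:01", "ARP", "192.168.1.1"),
]
# The two NTP packets form one flow despite their different destinations.
print(split_into_flows(packets))
```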
Next, each flow has to be converted into a binary time series with a sample rate of one measurement per second. Each measured value is either 1, indicating that one or more packets were sent in that second, or 0, indicating that no packets were sent. The AuDI paper describes how the time series is then used for signal analysis and classification; here, we are concerned only with visualizing it.
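The conversion to a 1 Hz binary series can be sketched in a few lines, assuming each flow is a list of packet timestamps in seconds since the start of the capture. The function name `to_binary_series` is hypothetical; it only records whether any packet fell into each one-second bin.

```python
import math


def to_binary_series(timestamps, duration_s):
    """Convert one flow's packet timestamps into a 1 Hz binary time series.

    timestamps: packet send times in seconds since the start of the capture.
    duration_s: length of the capture in seconds.
    Returns a list where element i is 1 if at least one packet was sent
    during second [i, i + 1), and 0 otherwise.
    """
    series = [0] * math.ceil(duration_s)
    for t in timestamps:
        second = math.floor(t)
        if 0 <= second < len(series):  # ignore packets outside the capture
            series[second] = 1
    return series


# Three packets, two of them in the same second, over a 5-second capture.
print(to_binary_series([0.2, 0.9, 3.5], duration_s=5))  # [1, 0, 0, 1, 0]
```

Because only presence is recorded, a burst of many packets and a single packet in the same second both produce a 1; this loss of detail keeps the series compact and focuses it on when a flow is active.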
Case study: periodic network traffic from smart light bulbs
We used the approach described above to generate time series for four smart light bulbs of various brands. We connected the light bulbs to SPIN for 24 hours, gathered the time series and then visualized them with two questions in mind:
- Is network traffic from IoT devices periodic?
- Do devices of the same type undertake similar periodic activities?
Figure 1 shows the active flows from a Tuya light bulb in the first five minutes after it is switched on. On the left of the graph, there are many coloured circles, indicating numerous active flows. For example, there was DNS traffic (two light blue circles in quick succession, UDP port 53) and HTTP traffic (three orange circles, TCP port 80). Quite soon, however, the signal appears to stabilize. The red and blue flows (ARP and MQTT on TCP port 1883, respectively) are both active about once a minute, while the yellow flow (UDP port 6666) is active every three seconds.
If you download the interactive plot (see Table 1), you can zoom out and examine other periods. You will notice, for example, that there is DHCP traffic every 5 hours and 45 minutes (purple circles, UDP port 67). There is also some activity that is either non-periodic or whose periodicity cannot be discerned from a 24-hour monitoring period.
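The periods described here (three seconds, about a minute, 5 hours and 45 minutes) were read off the plots. One way to estimate such periods automatically from a binary series is a normalized autocorrelation: for each candidate lag, check how often an active second is followed by another active second exactly that many seconds later. The sketch below illustrates that technique under our own assumptions; it is not necessarily the signal analysis described in the AuDI paper, and the function names and the `threshold` value are arbitrary choices.

```python
def autocorrelation(series, lag):
    """Fraction of active seconds that are also active `lag` seconds later.

    This is the discrete autocorrelation sum(x[i] * x[i + lag]) of a binary
    series, normalized by the number of active samples considered.
    """
    n = len(series) - lag
    if n <= 0:
        return 0.0
    active = sum(series[:n])
    if active == 0:
        return 0.0
    matches = sum(1 for i in range(n) if series[i] and series[i + lag])
    return matches / active


def estimate_period(series, max_lag, threshold=0.8):
    """Smallest lag (in seconds) whose autocorrelation reaches `threshold`,
    or None if no lag up to `max_lag` qualifies."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) >= threshold:
            return lag
    return None


# A synthetic flow that is active every three seconds, like UDP port 6666.
every_three = [1 if i % 3 == 0 else 0 for i in range(60)]
print(estimate_period(every_three, max_lag=10))  # 3
```

Exact lag matching is brittle: a device whose timer drifts by a second will score poorly, so a more robust version might accept a small tolerance around each lag. For long lags such as the 20,700-second DHCP interval, this naive loop also becomes slow; an FFT-based autocorrelation would scale better.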
We can conclude that traffic from the Tuya light bulb is (semi-)periodic. Although our test group was small, it is sufficient to show that different makes of light bulb can exhibit almost identical periodic activity. That much is apparent from Figure 2, which shows the first two operational minutes of a Baixin light bulb. While some of the intervals between bursts of activity differ (see, for example, the orange circles, TCP port 80), the overall pattern is very similar.
The two bulbs also behaved similarly later in the time series: both generated ICMP traffic after about six hours, as well as activity on TCP port 56010. We therefore suspect that the two makes of bulb run the same firmware.
The periodic activities of the other two light bulbs were very different, however (see Table 1). On the Omeran light bulb, the ARP, DNS and TCP port 8805 flows are active every two minutes. The Mi Led, by contrast, is active on UDP port 8053 every 15 seconds and via ARP every 30 seconds. In other words, devices of the same type sometimes have similar activity patterns, but not always. Of course, the periodic activities of other smart light bulbs might resemble those of the Omeran and Mi Led; data from more IoT devices would be needed to find out.
Table 1: Links to interactive plot HTML files (first 5 minutes and full data set).
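The comparison of periodic activities between bulbs could be made more systematic by summarizing each device as a signature that maps a flow label to its period, then scoring two signatures by the fraction of flows that match. The sketch below uses a Jaccard-style score with a relative tolerance on periods; the scoring rule, the 10% tolerance, the labels and the signature format are all illustrative assumptions. The example signatures are approximated from the periods reported above (for instance, "about once a minute" is taken as 60 seconds, and 5 hours 45 minutes as 20,700 seconds).

```python
import math

# Approximate periodic signatures (flow label -> period in seconds), based
# on the periods described in the text; the labels are hypothetical.
TUYA = {"ARP": 60, "TCP/1883": 60, "UDP/6666": 3, "UDP/67": 20700}
OMERAN = {"ARP": 120, "DNS": 120, "TCP/8805": 120}
MI_LED = {"UDP/8053": 15, "ARP": 30}


def signature_similarity(a, b, rel_tol=0.1):
    """Jaccard-style similarity between two periodic signatures.

    Two flows match if they have the same label and their periods agree
    within rel_tol; the score is matches / number of distinct labels.
    """
    labels = set(a) | set(b)
    if not labels:
        return 0.0
    matches = sum(
        1
        for label in set(a) & set(b)
        if math.isclose(a[label], b[label], rel_tol=rel_tol)
    )
    return matches / len(labels)


print(signature_similarity(TUYA, OMERAN))  # 0.0
print(signature_similarity(TUYA, MI_LED))  # 0.0
print(signature_similarity(TUYA, TUYA))    # 1.0
```

With these values, the Tuya bulb shares only the ARP flow with the Omeran and Mi Led bulbs, and its ARP period (60 seconds) is too far from theirs (120 and 30 seconds) to match, so both scores are 0.0.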
Additional time series
In view of the findings outlined above, we want to gather a large number of additional time series.
To make that possible, we’ll be adding a feature to SPIN that lets users visualize their time series and upload them to us.
We’ll then apply the AuDI and N-BaIoT methods to the additional time series to build our own classification model. If the model performs well, device type classification may be added to SPIN in due course.
The compiled data may also be useful to researchers: methods are often evaluated using data from a controlled lab environment, whereas we want to compile time series from real users in typical home environments. Naturally, use of the planned data upload functionality will be optional, and the new features will be implemented with privacy and security in mind. We are open to suggestions on how best to do this responsibly.
We’d also like to hear whether you’d be interested in supporting this initiative by uploading time series and, if so, under what conditions. Please leave a comment below.
Adapted from the original post, which appeared on the SIDN Labs Blog.
Thymen Wabeke is a research engineer at SIDN Labs.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.