Every IPv4 address on the Internet gets scanned day-in-day-out, resulting in trillions of scan packets on a daily basis. Botnet zombies scan for new devices to infect, and attackers scan to identify vulnerable hosts — to exploit them or leverage them in subsequent amplification attacks.
But how can you find out if the scans you observe in your network are just artifacts of scanners targeting the entire IPv4 space, or if these scans specifically target your network and its hosts? How much scanning is there in today’s Internet anyway?
Internet-wide scans vs targeted scans
The classical way to determine current levels of scanning on the Internet is to leverage darknets — large prefixes of otherwise unused address space, which collect incoming traffic. Here, if a scanner naively targets IP addresses randomly, it will sooner or later target addresses in the darknet, making detection possible.
But what if a malicious actor scans specific prefix ranges, IP addresses, or networks? Such scans are unlikely to be visible in darknets because there is not much to find there.
As part of a collaborative effort between MIT and Akamai, we sought to tackle this question, the results of which we published and presented at the recent ACM Internet Measurement Conference.
Watch: Philipp Richter presenting at ACM IMC 2019
Instead of relying on a classical darknet, we leveraged traffic collected at the firewalls of some 89,000 machines of a major Content Delivery Network (CDN), which are located inside more than 1,300 Autonomous Systems, and are widely distributed geographically across the Internet (the considered machines are a subset of all machines of this CDN). This distributed network telescope enables the tracking of not only Internet-wide scans, but also scans that target only specific prefixes and networks.
3,000 daily packets per IPv4 address from Internet-wide scans
To study scanning activity, we leveraged unsolicited packets that were dropped and logged by the firewalls of the CDN machines. In particular, we looked at packets destined to ports other than those on which the CDN machines offer services (ports 80, 443, and a few others). These packets get dropped, and a portion of them get logged.
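The filtering step described above can be sketched as follows. This is a minimal illustration, not the CDN's actual firewall logic: the `Packet` record, the `SERVICE_PORTS` set, and the function names are assumptions, and only the two service ports named in the text are listed.

```python
from dataclasses import dataclass

# Assumption: the text names only ports 80 and 443; the real service set
# contains "a few others" that are not listed here.
SERVICE_PORTS = {80, 443}


@dataclass(frozen=True)
class Packet:
    """Hypothetical firewall log record."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str  # e.g. "tcp" or "udp"


def is_unsolicited(pkt, service_ports=SERVICE_PORTS):
    """A packet counts as unsolicited if it targets a port with no offered service."""
    return pkt.dst_port not in service_ports


def unsolicited(packets):
    """Keep only the packets that would be dropped as unsolicited."""
    return [p for p in packets if is_unsolicited(p)]


# Example addresses come from the documentation ranges, not from real logs.
packets = [
    Packet("198.51.100.7", "192.0.2.10", 443, "tcp"),
    Packet("198.51.100.7", "192.0.2.10", 23, "tcp"),
    Packet("203.0.113.9", "192.0.2.10", 8291, "tcp"),
]
dropped = unsolicited(packets)
print([p.dst_port for p in dropped])
```

In this toy input, the packet to port 443 is treated as regular service traffic, while the packets to ports 23 and 8291 are kept for the scan analysis.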
Studying unsolicited traffic for individual machines, we made a surprising finding: every IP address of every machine logged at least some 3,000 packets day-in-day-out. Here, it did not matter if the IP address was exposed to clients via DNS, or only used for internal communication.
We corroborated this finding with data from a large /8 darknet and obtained the same result: every IPv4 address received some 3,000 packets on a daily basis. However, while many machines received only this baseline radiation of 3,000 packets per day (see Figure 2, example 1), some machines logged significantly more unsolicited traffic (Figure 2, example 2).
We developed a method to identify individual scan events (see more details in paper) and then classified scans into Internet-wide scans and targeted scans. Internet-wide scans target the entire IPv4 space, or a random subset of it, while targeted scans only hit a narrow subset of the IPv4 space.
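To make the distinction concrete, here is a deliberately simplified sketch. It is not the method from the paper: it labels a scan source by how widely its probes are spread across the networks that host telescope machines. The function name, the `ip_to_network` mapping, and the `min_network_share` threshold are all hypothetical.

```python
from collections import defaultdict


def classify_sources(probes, ip_to_network, min_network_share=0.5):
    """Label each scan source as 'internet-wide' or 'targeted'.

    probes: iterable of (src_ip, dst_ip) pairs seen by the telescope.
    ip_to_network: maps each telescope IP to a network id (e.g. an AS number).
    min_network_share: assumed threshold, not taken from the paper.
    """
    all_networks = set(ip_to_network.values())
    networks_hit = defaultdict(set)
    for src, dst in probes:
        networks_hit[src].add(ip_to_network[dst])

    labels = {}
    for src, nets in networks_hit.items():
        share = len(nets) / len(all_networks)
        labels[src] = "internet-wide" if share >= min_network_share else "targeted"
    return labels


# Toy telescope: four machines in four different networks.
ip_to_network = {
    "192.0.2.1": "AS64500",
    "192.0.2.2": "AS64501",
    "192.0.2.3": "AS64502",
    "192.0.2.4": "AS64503",
}
probes = [
    ("198.51.100.1", "192.0.2.1"),
    ("198.51.100.1", "192.0.2.2"),
    ("198.51.100.1", "192.0.2.4"),
    ("203.0.113.5", "192.0.2.3"),
    ("203.0.113.5", "192.0.2.3"),
]
labels = classify_sources(probes, ip_to_network)
print(labels)
```

A real classifier would also have to account for scan rate and duration, since a slow scan of a random subset may reach only a few telescope networks in a given time window.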
Leveraging this classification, we found that some 90% of the baseline radiation of 3,000 packets per day per IPv4 address is the result of Internet-wide scans. This translates to more than 8 trillion scan packets per day on the Internet!
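As a back-of-the-envelope check, the three figures quoted above can be related to each other. The sketch below uses only numbers from the text and derives how many addresses they imply; the address count itself is not stated in this post, so it is computed rather than assumed.

```python
BASELINE_PKTS_PER_IP_PER_DAY = 3_000  # from the text
INTERNET_WIDE_SHARE = 0.90            # from the text ("some 90%")
TOTAL_DAILY_SCAN_PKTS = 8e12          # from the text ("more than 8 trillion")

# Packets per address per day that stem from Internet-wide scans.
scan_pkts_per_ip = BASELINE_PKTS_PER_IP_PER_DAY * INTERNET_WIDE_SHARE

# Number of scanned addresses that would yield the quoted daily total.
implied_addresses = TOTAL_DAILY_SCAN_PKTS / scan_pkts_per_ip

print(f"Internet-wide scan packets per address per day: {scan_pkts_per_ip:.0f}")
print(f"Addresses implied by the daily total: {implied_addresses / 1e9:.2f} billion")
print(f"Share of all 2**32 IPv4 addresses: {implied_addresses / 2**32:.0%}")
```

The implied figure of roughly 3 billion addresses is below the full 2^32 space, which is plausible given that part of the IPv4 space is reserved or not routed.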
As per the two example machines (Figure 2), we can see that the baseline radiation largely cancels out when removing all traffic that is the result of these Internet-wide scans.
This finding can be put to good use in practice: if the unsolicited traffic you log on your IP addresses is in line with current levels of baseline radiation, then your network is most likely just experiencing the overall level of scanning activity. If you log significantly more unsolicited scan traffic, you have a strong indicator that scanners target your infrastructure specifically.
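This rule of thumb could be applied to your own firewall logs along the following lines. It is a minimal sketch: the 3,000 packets per day baseline is the figure reported above, while `TARGETED_FACTOR` (how far above baseline counts as "significantly more") and the function name are assumptions you would need to tune.

```python
BASELINE_PKTS_PER_DAY = 3_000  # per-address baseline reported in the text
TARGETED_FACTOR = 2.0          # assumption: threshold multiplier, not from the paper


def classify_addresses(daily_counts, baseline=BASELINE_PKTS_PER_DAY,
                       factor=TARGETED_FACTOR):
    """Map each IP to 'baseline' or 'possibly targeted'.

    daily_counts: {ip: unsolicited packets logged in one day}.
    """
    result = {}
    for ip, count in daily_counts.items():
        if count > baseline * factor:
            result[ip] = "possibly targeted"
        else:
            result[ip] = "baseline"
    return result


# Made-up counts for illustration only.
daily_counts = {
    "192.0.2.10": 3_100,
    "192.0.2.11": 2_950,
    "192.0.2.12": 41_000,
}
for ip, label in classify_addresses(daily_counts).items():
    print(ip, label)
```

Because overall scanning levels change over time, the baseline value should be replaced with a current measurement before relying on the result.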
Scrutinizing scan target selection strategies
We identified and classified individual scanning campaigns in our data.
Internet-wide scans, which cover the full IPv4 space or random subsets of it, contribute some 67% of all the scan traffic. However, a third of all scan traffic comes from scanners that target specific IP addresses or prefixes. Some 4% of all scan traffic can be attributed to scanners targeting the CDN itself (for example, by scanning domain names hosted by the CDN), and the remaining 29% is the result of localized scans that target specific prefixes and networks in which the CDN simply happens to have machines located. For more details on how to tell these two cases apart, please see the paper.
Localized scans target specific prefixes and networks
Localized scans are scans that are directed at specific prefixes and networks. They make up some 29% of all the scan traffic inspected.
Notably, such scans cannot be detected in darknets, since the actors direct their traffic at specific networks as opposed to randomly probing the whole Internet.
We scrutinized these scans in great detail, finding that they are vastly different from scans that target the IPv4 space randomly.
One of the key differences of localized scans is that they target different port numbers when compared to Internet-wide scans.
While most of the scans that are also visible in darknets target 23/TCP (Telnet) and 445/TCP (SMB, Windows file sharing), we found that localized scans target different ports: 8291/TCP and 7547/TCP, which are related to critical vulnerabilities in home routers. These ports were scanned heavily in the past, for example, by the Hajime botnet, but have not appeared in the top ranks of public scan statistics in more recent times.
Indeed, 8291/TCP and 7547/TCP do not even appear in the top 50 targeted port numbers from Internet-wide scans! As a result, they are barely visible in darknets. Our results, however, show that 8291/TCP and 7547/TCP are indeed still heavily scanned, but exclusively in a targeted way!
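The comparison above boils down to checking which heavily scanned ports in localized scans are absent from the top ranks of Internet-wide scans. A minimal sketch of that check follows; the function names, the cut-off parameters, and the packet counts are made up for illustration and are not data from the study.

```python
from collections import Counter


def top_ports(port_counts, n):
    """Return the n most frequently targeted ports, most frequent first."""
    return [port for port, _ in Counter(port_counts).most_common(n)]


def ports_missing_from_wide_view(internet_wide_counts, localized_counts,
                                 top_n_wide=50, top_n_local=5):
    """Top localized-scan ports that do not rank in the Internet-wide top list."""
    wide_top = set(top_ports(internet_wide_counts, top_n_wide))
    return [p for p in top_ports(localized_counts, top_n_local)
            if p not in wide_top]


# Illustrative packet counts per destination port (not real measurements).
internet_wide = {23: 900, 445: 700, 22: 300, 8291: 5}
localized = {8291: 800, 7547: 600, 23: 50}

hidden = ports_missing_from_wide_view(internet_wide, localized,
                                      top_n_wide=3, top_n_local=2)
print(hidden)
```

With real logs, `top_n_wide=50` would mirror the top-50 comparison made in the text.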
Contributors: Arthur Berger
Philipp Richter is a senior research scientist with Akamai and a research affiliate at MIT. At the time this paper was published, he was a postdoctoral research associate at MIT.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC.