Internet-wide scanning — the process of connecting to every public IPv4 address on a targeted port — is a standard research technique for understanding real-world service configuration and deployment.
However, scanning studies often assume that services are hosted on their IANA-assigned ports (for example, HTTPS on TCP/443) and do not scan additional ports for unexpected services.
My colleagues and I at Stanford University recently found that a considerable fraction of TCP-responsive hosts never complete the expected L7 handshake (Figure 1) for two reasons:
- TCP-responsiveness is not an accurate indicator of service presence.
- Real TCP services are not necessarily running the expected L7 service (for example, 80/TLS).
We found that the vast majority of TCP-responsive hosts (that is, those that respond with a SYN-ACK) on less-common ports do not actually ‘speak’ TCP, as they never acknowledge data that is sent to them.
Where are Internet services deployed in practice?
We traced the underlying cause of services not accepting data to middlebox protections. Middleboxes will SYN-ACK on behalf of a service (IP, Port), and will only check if that service is actually present after sending the SYN-ACK. If a service is not present, middleboxes will end the TCP connection (for example, by sending a RST or timing out). Thus, middleboxes inflate the apparent number of services: many hosts that complete the TCP handshake have no real service behind them. To identify real service presence, one must verify that the service actually acknowledges the data it receives.
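The check described above can be sketched as a small classifier. This is a minimal illustration, not LZR's implementation: the `Verdict` names, the `classify_host` function, and the string encoding of TCP flags are all assumptions made for this example. It takes the ordered list of flags a host sent back and treats the host as a real service only if it acknowledged data after the SYN-ACK.

```python
from enum import Enum


class Verdict(Enum):
    NO_SYN_ACK = "never sent a SYN-ACK"
    UNACKED = "SYN-ACK only, data never acknowledged"
    REAL = "data acknowledged"


def classify_host(flags_seen):
    """Classify a host from the ordered TCP flags it sent back.

    flags_seen is a hypothetical encoding such as ["SYN-ACK", "ACK"],
    where "ACK" or "PSH-ACK" after the handshake means the host
    acknowledged the data payload of the probe.
    """
    if "SYN-ACK" not in flags_seen:
        return Verdict.NO_SYN_ACK
    # Only packets that arrive after the SYN-ACK say anything about data.
    after_handshake = flags_seen[flags_seen.index("SYN-ACK") + 1:]
    for flag in after_handshake:
        if flag == "RST":
            return Verdict.UNACKED  # connection torn down before any data ACK
        if flag in ("ACK", "PSH-ACK"):
            return Verdict.REAL
    # Nothing followed the SYN-ACK: treat as a timeout.
    return Verdict.UNACKED
```

A host that answers `["SYN-ACK", "RST"]` or just `["SYN-ACK"]` is counted as TCP-responsive by a Layer 4 scan, but this sketch would not count it as a real service.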
When filtering for real services, we found that while services on popular ports typically run the expected protocol (for example, 93% of real services on port 80 speak HTTP), the fraction of IPs that speak the expected service on unpopular ports approaches zero (for example, 99% of real services that acknowledge data on port 623 do not speak IPMI).
Across all scanned ports, HTTP and TLS are the most popular unexpected services, with 65% of unexpected services speaking HTTP and 30% speaking TLS (Figure 2).
Furthermore, as seen in Figure 3, protocols are diffuse across all 65k ports (measured by scanning 0.1% of the IPv4 address space across all 65k ports with 10 unique protocols). Only 3.0%, 5.5%, and 6.4% of HTTP, Telnet, and TLS services are served on ports 80, 23, and 443, respectively. Researchers must scan roughly 25k ports to achieve 90% coverage of all HTTP services.
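To make the coverage figure concrete, the following sketch shows how such a number can be computed from per-port service counts: sort ports by how many services they host, then count how many ports are needed before the cumulative share reaches the target. The function name `ports_needed` and the toy counts are assumptions for illustration only; they are not the measured data.

```python
def ports_needed(services_per_port, target=0.90):
    """Return how many ports, scanned most-populated first, reach `target` coverage.

    services_per_port maps a port number to the number of services
    of one protocol observed on that port.
    """
    total = sum(services_per_port.values())
    if total == 0:
        return 0
    covered = 0
    counts = sorted(services_per_port.values(), reverse=True)
    for n, count in enumerate(counts, start=1):
        covered += count
        if covered / total >= target:
            return n
    return len(counts)


# Toy counts (hypothetical, not from the study).
toy_http = {80: 30, 8080: 25, 7547: 20, 8000: 15, 30005: 10}
print(ports_needed(toy_http))
```

The flatter the distribution across ports, the more ports this count returns, which is why a protocol spread thinly over the port space needs so many ports for high coverage.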
What is the security posture of unexpected services?
We identified that 50% of unexpected TLS belongs to IoT devices. For example, 35% of TLS on port 8000 belongs to CCTV devices in Korea Telecom’s network.
Furthermore, we found that unexpected services are more vulnerable than assigned services: 23% of ports hosting unexpected TLS are more likely than 443/TLS to host shared public keys.
Over half of unexpected services scanned host a higher fraction of public-facing login pages than 80/HTTP and 443/HTTPS.
Ports hosting unexpected SSH are 2.4 times more likely to allow non-public-key authentication.
Identifying unexpected services more efficiently
Scanning for 30+ protocols on every IP and port is too intrusive a way to identify unexpected services; thus, we introduce LZR, an open-source scanner that accurately and efficiently identifies all Internet services.
LZR can be used with ZMap (a popular existing Layer 4 scanner) to quickly identify protocols running on a port, or as a shim between ZMap and an application-layer scanner like ZGrab, to instruct the scanner what follow-up handshake to perform.
LZR’s novelty and performance gain are primarily due to its ‘fail-fast’ approach to scanning and ‘fingerprint everything’ approach to identifying protocols. It builds on two main ideas:
- LZR ignores non-data-acknowledging hosts, and therefore does not reattempt to establish a connection with these fake services.
- LZR listens more: We discovered that in 8 of the 30 protocols we scanned, the server sends data first, and 10 additional protocols send fingerprintable data when sent an incorrect L7 handshake. Thus, LZR waits and then fingerprints invalid server responses, thereby identifying up to 16 of the 30 protocols by sending a single packet. By doing so, LZR finds an additional 1.3M unexpected services on port 443.
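The ‘listen, then fingerprint’ idea can be illustrated with a short sketch that guesses a protocol from the first bytes a server returns, whether it spoke first or replied to a mismatched handshake. This is an assumption-laden example, not LZR's fingerprint set: the `fingerprint` function and its three rules are hypothetical and rely only on well-known wire formats (the SSH identification string, the HTTP status line, and the TLS record header).

```python
def fingerprint(response: bytes) -> str:
    """Guess the L7 protocol from the first bytes a server sends."""
    # SSH servers open with an identification string such as "SSH-2.0-...".
    if response.startswith(b"SSH-"):
        return "ssh"
    # HTTP servers answer even malformed requests with a status line.
    if response.startswith(b"HTTP/"):
        return "http"
    # TLS records begin with a content type (0x15 alert, 0x16 handshake)
    # followed by the major version byte 0x03.
    if len(response) >= 2 and response[0] in (0x15, 0x16) and response[1] == 0x03:
        return "tls"
    return "unknown"


# A TLS alert sent in reply to a non-TLS probe is still recognisable as TLS.
print(fingerprint(bytes([0x15, 0x03, 0x03, 0x00, 0x02, 0x02, 0x46])))
```

Because error responses are often as distinctive as successful ones, a single probe can be enough to decide which follow-up handshake is worth attempting.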
LZR’s optimizations also make it much faster than current scanning tools: when scanning less common ports, it is nearly 55 times faster than ZGrab, the leading application-layer scanner.
LZR is open-sourced and can be found at: https://github.com/stanford-esrg/lzr.
Liz Izhikevich is a PhD Candidate/NSF Fellow/Stanford Graduate Fellow at the Computer Science Department at Stanford University.