Mapping the Internet’s topology improves our understanding of its interconnectivity, and thus its robustness and resilience.
One of the best ways to do this is to capture data at the router level, which gives the most detailed view of the Internet’s physical infrastructure. As there have been no tools or techniques to do this, we at Sorbonne Université and University of Oregon set about developing one. It uses a new alias resolution technique, Limited Ltd., which leverages Internet Control Message Protocol (ICMP) rate limiting, a feature that is available on all modern routers and mandatory in IPv6.
Idea and challenges
ICMP rate limiting is a feature that allows routers to limit their incoming/outgoing ICMP traffic.
The main idea behind using it for alias resolution is that routers often share a common rate limit between interfaces, meaning that if one interface is pushed into a rate-limited state, the others might be too.
To achieve this we needed to overcome three potential challenges:
- How do we trigger ICMP rate-limiting?
- How do we obtain and transform loss traces provoked by ICMP rate-limiting into a relevant signal for alias resolution?
- How do we ensure that Limited Ltd. does not impact the network?
Before delving into the details of our technique, the figure below shows examples of loss traces obtained from five different routers pushed into a rate-limiting state. A black stroke means a response, a white stroke means no response.
It is interesting to note that these routers share no common loss pattern. From our experiments and from reading the documentation (refer to our paper), we deduced that the ICMP rate-limiting implementation depends on the vendor and the configuration.
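To illustrate the idea of a shared rate limit, the following minimal Python sketch models a router's ICMP rate limiter as a token bucket. The token-bucket model, the class and function names, and the numbers (100 replies per second, a burst of 10, probing at 400 pps and 10 pps) are all assumptions made for illustration; real implementations vary by vendor and configuration. When the bucket is shared, pushing interface a into rate limiting also causes losses on interface b, even though b is probed slowly.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Toy ICMP rate limiter (an assumed model, not any vendor's implementation)."""
    rate: float          # replies allowed per second (assumed value)
    capacity: float      # maximum burst size (assumed value)
    tokens: float = 0.0
    last: float = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def make_probes(high_pps=400, low_pps=10, duration=1.0):
    """Probe interface 'a' at a high rate and interface 'b' at a low rate."""
    probes = [(i / high_pps, "a") for i in range(int(high_pps * duration))]
    probes += [(i / low_pps + 0.001, "b") for i in range(int(low_pps * duration))]
    return sorted(probes)


def loss_traces(probes, shared, rate=100.0, capacity=10.0):
    """Return {interface: [True if answered, False if dropped, ...]}.

    shared=True  -> both interfaces draw from one bucket (same router).
    shared=False -> each interface has its own bucket (different routers).
    """
    buckets, traces = {}, {}
    for t, iface in probes:
        key = "router" if shared else iface
        if key not in buckets:
            buckets[key] = TokenBucket(rate, capacity, tokens=capacity)
        traces.setdefault(iface, []).append(buckets[key].allow(t))
    return traces


if __name__ == "__main__":
    probes = make_probes()
    for shared in (True, False):
        traces = loss_traces(probes, shared)
        answered_b = sum(traces["b"])
        print(f"shared={shared}: b answered {answered_b}/{len(traces['b'])} probes")
```

In this toy model, b only loses probes when it shares a bucket with a, which is the kind of signal alias resolution can exploit.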
Overview of Limited Ltd. algorithm
Limited Ltd. takes a set of IP addresses as input and outputs a set of aliases, in the following manner:
1. Select one interface a from the input set.
2. Find the probing rate that triggers a loss rate λ within [α,β] on a (this relates to challenges 1 and 3).
3. Probe a at its triggering rate and all of the other interfaces at a low rate to generate loss traces (this relates to challenge 2).
4. Classify the loss traces (this also relates to challenge 2).
5. Pull out a and its aliases.
Below are more details on the key steps of the algorithm: 2, 3, and 4.
Find probing rate to trigger a loss rate λ within [α,β] on a
We used [α,β] = [0.05, 0.10] in all our experiments for two reasons:
- It is the minimum loss rate to trigger exploitable loss traces.
- It has minimal impact on the network.
Starting at 64 packets per second (pps), a lower bound at which we found that very few routers perform ICMP rate limiting, we sent ICMP echo request (ping) packets to a for five seconds and then waited five seconds. Each round has one of three outcomes:
- If λ is in [α,β]: stop and record the corresponding probing rate. This rate is called the triggering rate.
- If λ < α: double the probing rate.
- If λ > β: perform a binary search on the probing rate until λ falls in [α,β].
Note: we experimentally set the upper bound at 32k pps.
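The search described above can be sketched in Python as follows. This is a simplified illustration, not our actual implementation: `measure_loss` is a hypothetical callback standing in for a five-second probing round followed by a five-second pause, `toy_loss` is an invented router model used only to exercise the search, the 32k pps bound is read as 32,000, and `max_rounds` is an assumed safety cap.

```python
from typing import Callable, Optional

ALPHA, BETA = 0.05, 0.10   # target loss-rate interval from the text
START_PPS = 64.0           # starting probing rate from the text
MAX_PPS = 32_000.0         # upper bound ("32k pps", read here as 32,000)


def find_triggering_rate(measure_loss: Callable[[float], float],
                         max_rounds: int = 32) -> Optional[float]:
    """Return a probing rate whose loss rate falls in [ALPHA, BETA], or None.

    measure_loss(pps) is a hypothetical callback: probe at `pps` for five
    seconds, wait five seconds, and return the observed loss rate in [0, 1].
    """
    low, high = 0.0, None   # highest rate known too low, lowest known too high
    rate = START_PPS
    for _ in range(max_rounds):
        loss = measure_loss(rate)
        if ALPHA <= loss <= BETA:
            return rate                      # this is the triggering rate
        if loss < ALPHA:
            low = rate
            if high is None:
                if rate >= MAX_PPS:
                    return None              # no rate limiting seen up to the bound
                rate = min(rate * 2, MAX_PPS)   # doubling phase
            else:
                rate = (low + high) / 2      # binary search phase
        else:
            high = rate
            rate = (low + high) / 2          # overshoot: search between bounds
    return None


def toy_loss(pps: float, limit: float = 1000.0) -> float:
    """Invented model: the router answers at most `limit` probes per second."""
    return max(0.0, 1.0 - limit / pps)


if __name__ == "__main__":
    rate = find_triggering_rate(toy_loss)
    print(rate, toy_loss(rate))
```

With the toy model, the rate doubles from 64 pps until the loss rate overshoots β, and the binary search then narrows down to a rate whose loss falls inside the target interval.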
Probe a with its triggering rate, all of the others with a low rate, generate loss traces
The step title says it all: this step generates loss traces such as the ones in Figure 1.
Classify loss traces
As Figure 1 shows, routers on the Internet share no common pattern in their ICMP rate-limiting implementations. To transform these loss traces into sets of aliases, we therefore chose a solution based on machine learning.
Classification is an important step of Limited Ltd., and you can find further details in our paper.
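To give a feel for the input to this step, the sketch below turns loss traces into a loss rate per address and flags addresses that lose probes while a is probed at its triggering rate. This is a deliberately naive threshold heuristic and not the machine-learning classifier described in the paper; the function names, the example addresses (taken from documentation ranges), and the 0.05 threshold are assumptions made for illustration.

```python
from typing import Dict, List

Trace = List[bool]  # True = response received, False = no response


def loss_rate(trace: Trace) -> float:
    """Fraction of probes that went unanswered (0.0 for an empty trace)."""
    return 1.0 - sum(trace) / len(trace) if trace else 0.0


def naive_alias_candidates(traces: Dict[str, Trace],
                           min_loss: float = 0.05) -> List[str]:
    """Flag addresses whose low-rate trace shows losses (assumed threshold).

    `traces` maps each low-rate-probed address to its loss trace, recorded
    while the selected interface was probed at its triggering rate.
    """
    return sorted(addr for addr, trace in traces.items()
                  if loss_rate(trace) >= min_loss)


if __name__ == "__main__":
    example = {
        "192.0.2.1": [True] * 8 + [False] * 2,   # loses probes alongside a
        "198.51.100.7": [True] * 10,             # unaffected
    }
    print(naive_alias_candidates(example))
```

A fixed threshold like this cannot cope with the variety of loss patterns seen across vendors, which is why a learned classifier is the better fit.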
So, how does it affect the network?
First of all, we ran a web server on the machine that performed the probing, which allowed any network operator to contact us and opt out of our measurements.
We also tested Limited Ltd. in a controlled environment (lab experiment) and in the wild.
In our lab experiments, we tested Cisco and Juniper hardware that was 10 years old or older (Cisco model 3825, IOS 12.3 and Juniper model J4350, JunOS 8.0R2.8), on which we observed a maximum CPU increase of 40% during each of the five-second tests.
In our tests in the wild, we performed joint experiments with SURFnet and Switch operators. These involved running Limited Ltd. on their routers while they monitored the CPU usage. Each run lasted about one minute.
- In the SURFnet network, the CPU usage of two Juniper routers increased by 2-4% (Note: the overhead was not on the central routing engine CPU).
- In the Switch network, of the three Cisco routers we tested, CPU usage did not change on two, and on one (ASR-920-24SZ-M) it increased by almost 30% during the test.
These results confirm our belief that Limited Ltd. is unlikely to impact the control and data planes. Even for this particular Cisco router, which was a non-production lightweight router in the Switch network, the operators did not consider a 30% CPU increase for one minute to be impactful.
To learn more about our research and Limited Ltd., read our paper Alias Resolution Based on ICMP Rate Limiting, which we presented at PAM 2020.
Contributors: Burim Ljuma, Vamsi Addanki, Matthieu Gouel, Olivier Fourmaux, Timur Friedman (Sorbonne Université) and Reza Rejaie (University of Oregon).
Kévin Vermeulen is a PhD candidate in Computer Science at Sorbonne Université under the supervision of Olivier Fourmaux and Timur Friedman.