Understanding traceroute by reimplementing it in Rust

One way to understand how a program works is to re-implement it in a different programming language. That’s the approach that Vinay Keerthi took when he re-wrote the traceroute utility in Rust (rather than its original C).

Some advantages of Rust

Rust is fast becoming a language of choice for systems coding in the Linux community, having recently been adopted in Linux kernel development. It’s no surprise that this is the language Vinay chose to re-implement a userspace utility that ships with every Linux distribution.

Rust has two interesting — arguably linked — properties that improve on coding in C.

The first is strong type checking. In C, you can, for example, get the compiler to treat a pointer as a 32-bit number or a sequence of bytes as a string, a trick known as ‘casting’. In Rust, you have to declare up front how each piece of data will be used, and type conversions are more constrained and explicit. Casting can be useful, but it introduces the possibility of bugs that the compiler cannot catch. Strong typing requires more preliminary thinking, but results in more reliable code.
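To make the contrast with casting concrete, here is a minimal sketch of explicit conversions using only Rust’s standard library. The function names (`bytes_to_u32`, `bytes_to_text`, `to_ttl`) are made up for illustration and are not taken from Vinay’s code.

```rust
use std::convert::TryFrom;

/// Interpret exactly four bytes as a big-endian (network order) 32-bit number.
/// The array type `[u8; 4]` makes the expected length part of the signature.
fn bytes_to_u32(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

/// Interpret a run of bytes as text. Unlike a C cast, this checks that the
/// bytes are valid UTF-8 and returns an error if they are not.
fn bytes_to_text(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Narrow a 32-bit value to the 8-bit range of an IP TTL field.
/// Out-of-range values yield `None` instead of being silently truncated.
fn to_ttl(value: u32) -> Option<u8> {
    u8::try_from(value).ok()
}
```

Each conversion either cannot fail by construction or returns a value that forces the caller to handle the failure case, which is the kind of up-front thinking described above.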

Second, an outcome of this improved compile-time checking is that, even though Rust, like C, has no garbage collector, it has a more consistent model for tracking memory use. In C, the programmer must free memory by hand. In Rust, the compiler tracks which part of the program owns each value and releases that value automatically when its owner goes out of scope. While compiling Rust code can take longer than compiling C code, the resulting programs can be similarly efficient and frugal with memory, because the compiler has a clear sense of when objects have fallen out of use and can be released.
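The point at which a value is released can be observed directly. The following is a small sketch, assuming a made-up `Probe` type whose `Drop` implementation records a log entry when the value is freed; it is an illustration of scope-based cleanup, not code from the re-implementation.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that records when it is released.
struct Probe {
    id: u32,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Probe {
    // Called automatically at the moment the value is released.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("freed probe {}", self.id));
    }
}

/// Returns the sequence of events in the order they happened.
fn run_scopes() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _first = Probe { id: 1, log: Rc::clone(&log) };
        log.borrow_mut().push("inner scope ends".to_string());
    } // `_first` goes out of scope here and is released

    let second = Probe { id: 2, log: Rc::clone(&log) };
    let moved = second; // ownership moves; `second` can no longer be used
    drop(moved); // explicit early release of the new owner

    log.borrow_mut().push("function ends".to_string());
    let events = log.borrow().clone();
    events
}
```

Calling `run_scopes()` returns the entries ‘inner scope ends’, ‘freed probe 1’, ‘freed probe 2’, ‘function ends’, in that order: each value is released exactly once, at a point that is known at compile time.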

That means, for example, that NLnet Labs’ Resource Public Key Infrastructure (RPKI) validator ‘routinator’, and their RPKI server ‘krill’, which are both written in Rust, can run on memory-constrained hardware like a Raspberry Pi.

The workings of traceroute and the importance of careful interpretation

Vinay also shared his understanding of how traceroute works, so he’d have a way to assess whether his re-implementation was ‘feature-complete’.

Traceroute is a diagnostic tool that uses the time to live (TTL) property in IP packet headers and Internet Control Message Protocol (ICMP) ‘Time Exceeded’ responses to build a list of routers that packets traverse.
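As a rough sketch of how a tool might recognize these responses, the function below classifies an ICMP message by the type and code fields defined in RFC 792. It assumes the IP header has already been stripped from the buffer; the names `IcmpKind` and `classify_icmp` are hypothetical, and this article does not say how Vinay’s code parses replies.

```rust
/// The ICMP message kinds a traceroute-like tool typically cares about.
#[derive(Debug, PartialEq)]
enum IcmpKind {
    EchoReply,                          // type 0
    DestinationUnreachable { code: u8 }, // type 3 (code 3 = port unreachable)
    TimeExceeded { code: u8 },           // type 11 (code 0 = TTL exceeded in transit)
    Other { icmp_type: u8, code: u8 },
}

/// Classify an ICMP message. `message` is assumed to start at the ICMP
/// header (type, code, checksum, then four more header bytes).
fn classify_icmp(message: &[u8]) -> Option<IcmpKind> {
    // Every ICMP message defined in RFC 792 has at least an 8-byte header.
    if message.len() < 8 {
        return None;
    }
    let (icmp_type, code) = (message[0], message[1]);
    Some(match icmp_type {
        0 => IcmpKind::EchoReply,
        3 => IcmpKind::DestinationUnreachable { code },
        11 => IcmpKind::TimeExceeded { code },
        _ => IcmpKind::Other { icmp_type, code },
    })
}
```

A ‘Time Exceeded’ message identifies an intermediate router. An Echo Reply (for ICMP probes) or a Destination Unreachable with the port unreachable code (for UDP probes, the classic default on Linux) usually signals that the destination itself has answered.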

Normally, a default TTL value for IP packets is set by the operating system (64 in Linux). As a packet is forwarded, each router decrements this TTL value in the packet, and when it hits zero, the packet is discarded, and an ICMP ‘Time Exceeded’ response is sent back to the sender.
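The operating system default can be read, and overridden per socket, from user code. The sketch below uses the standard library’s `UdpSocket::ttl` and `UdpSocket::set_ttl`; the wrapper function `default_and_probe_ttl` is a made-up name, and binding to the loopback address is only an assumption to keep the example free of network traffic.

```rust
use std::io;
use std::net::UdpSocket;

/// Returns (the TTL the OS assigned by default, the TTL after overriding it).
fn default_and_probe_ttl(probe_ttl: u32) -> io::Result<(u32, u32)> {
    // Port 0 asks the OS to pick any free port.
    let socket = UdpSocket::bind("127.0.0.1:0")?;

    // Whatever the operating system uses for new sockets.
    let default_ttl = socket.ttl()?;

    // Packets sent on this socket from now on carry the chosen TTL.
    socket.set_ttl(probe_ttl)?;
    Ok((default_ttl, socket.ttl()?))
}
```

Calling `default_and_probe_ttl(1)` reports the system default alongside the overridden value of 1, which is the TTL a traceroute-style tool would use for its first probe.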

Traceroute uses this mechanism but follows a different model. The first packet sent by traceroute has a TTL of one, and in each sending round, the TTL is incremented until the destination is reached. Because each participant in the chain of forwarding decreases TTL by one until it reaches zero, starting with this low value ‘enumerates’ the chain, one by one, moving out from the sender as the TTL is increased. As you walk the chain with successive TTL values, you learn each successive ‘hop’ along the path.
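The enumeration can be sketched without touching the network. The simulation below assumes a fixed list of routers between sender and destination; `send_probe` plays the role of the network and `trace` is the traceroute loop. All names and addresses are illustrative, and a real tool would send packets and wait for ICMP replies instead.

```rust
/// What comes back for one probe in this simulation.
#[derive(Debug, PartialEq)]
enum Reply {
    TimeExceeded { from: String }, // a router discarded the probe
    Reached { from: String },      // the destination answered
}

/// Simulate forwarding one probe. `routers` lists the hops between the
/// sender and `destination`, in order.
fn send_probe(routers: &[&str], destination: &str, ttl: u8) -> Reply {
    let mut remaining = ttl;
    for router in routers {
        // Each router decrements the TTL before forwarding.
        remaining = remaining.saturating_sub(1);
        if remaining == 0 {
            // TTL exhausted: this router discards the probe and reports back.
            return Reply::TimeExceeded { from: (*router).to_string() };
        }
    }
    Reply::Reached { from: destination.to_string() }
}

/// Walk the path by sending probes with TTL 1, 2, 3, ... up to `max_hops`.
fn trace(routers: &[&str], destination: &str, max_hops: u8) -> Vec<String> {
    let mut path = Vec::new();
    for ttl in 1..=max_hops {
        match send_probe(routers, destination, ttl) {
            Reply::TimeExceeded { from } => path.push(from),
            Reply::Reached { from } => {
                path.push(from);
                break; // destination reached, stop probing
            }
        }
    }
    path
}
```

With routers `10.0.0.1` and `192.0.2.1` in front of destination `198.51.100.7`, `trace` returns the three addresses in order: a TTL of one expires at the first router, a TTL of two at the second, and a TTL of three survives to the destination.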

Traceroute also reports delay and packet loss, but these must be interpreted carefully. Traceroute really reveals only half of a packet’s journey: the way out. The replies can return by an asymmetric path specific to that point in the chain, so the figures for a hop may include delay and loss picked up on a return path that traffic to the end point never uses. Equally, delay and loss that affect a real packet to the end point might not show up in what traceroute returns.
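For reference, per-hop figures of this kind are typically simple summaries of a handful of probes. The sketch below assumes each probe yields either a round-trip time or nothing (a lost probe or a lost reply); `HopStats` and `summarize` are hypothetical names, not part of any real traceroute.

```rust
use std::time::Duration;

/// Summary of the probes sent to one hop.
struct HopStats {
    sent: usize,
    lost: usize,
    min: Option<Duration>,
    avg: Option<Duration>,
    max: Option<Duration>,
}

/// `samples` holds one entry per probe: `Some(rtt)` if a reply arrived,
/// `None` if it did not. Each rtt is a round-trip time, so it includes the
/// return path as well as the way out.
fn summarize(samples: &[Option<Duration>]) -> HopStats {
    let replies: Vec<Duration> = samples.iter().filter_map(|s| *s).collect();
    let avg = if replies.is_empty() {
        None
    } else {
        let total: Duration = replies.iter().sum();
        Some(total / replies.len() as u32)
    };
    HopStats {
        sent: samples.len(),
        lost: samples.len() - replies.len(),
        min: replies.iter().min().copied(),
        avg,
        max: replies.iter().max().copied(),
    }
}
```

Nothing in these numbers says where along the round trip the delay or loss occurred, which is why they need the careful reading described above.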

Additionally, routers don’t always apply the same logic to packets flowing through them. There is evidence that some routers de-prioritize ICMP traffic (which tools like traceroute and ping rely on) to manage their workload, calling into question delay and jitter measurements informed by ICMP. This is one of the reasons why many modern application stacks use in-band measurement to assess link quality and aid in route selection.

Vinay implemented four out of the nine features he’d identified in ‘real traceroute’, and learned enough doing so to call it ‘finished’ for his purposes.

Learning essential network diagnostic tools in public

The Hacker News discussion of Vinay’s post is also full of useful information, including pointers to other diagnostic tools such as My Traceroute (mtr), which combines traceroute and ping, and some detailed explanations of traceroute specifics.

By publicly interpreting the functions of a utility written in one programming language to implement it in another, Vinay developed his own understanding of traceroute and Rust. By sharing what he learned, Vinay shook loose some of the latent knowledge in the open source development and network operations communities.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
