At the recent LACNIC 31 meeting, Agustin Formoso (RIPE NCC), Michela Galante (RIPE NCC) and I organized the Internet Tools BoF. Our common goal was to better understand the problems network operators struggle with and determine whether we can assist them by developing new or improved tools. We also saw the BoF as an opportunity to share our knowledge of possible solutions using the tools we know.
In the BoF session, we introduced five main categories of problems:
- Interaction with other Autonomous Systems (identifying points of contact and communication)
- Routing decisions — peering
- Routing decisions — transit
- Network insights (for monitoring, troubleshooting, and so on)
- Access to and use of tools (language barrier, technical complexity)
We asked participants to raise their hands to indicate which category of problem frustrates them most, with the goal of dividing them up into small groups based on their response. The most popular category was the fourth one — Network insights, which had enough participants to form several small groups. Participants from the first three categories were combined into a single group, since there were only a few participants and the categories were similar. We had enough participants in the fifth category to form its own group.
With the groups formed, participants could share their problems and talk about what they could do to solve them. After about 15 minutes of group discussions, each group reported back with their main points and summaries. Here are some of the highlights:
Categories 1, 2 and 3: Interaction with other ASes, peering and transit
One issue raised was about how to reach people at other ASes. Group members mentioned that often whois contacts are not up to date. Participants were encouraged to enter contact information in PeeringDB.
In response to a question about how to select ASes for peering/transit, it was mentioned that the costs of establishing options for transit are high and that it is hard to have objective criteria for selecting peering providers, as there are no measurements or SLAs. It was proposed that peering could be done based on the type of content or proximity to an IXP. In either case (peering or transit), participants said it would be interesting to propose third party measurements to quantify the benefits of choosing a neighbour.
Category 4: Network insights
From the several groups discussing problems in this space, we identified five themes:
Traffic classification
We heard there is a need for tools that help classify traffic by protocol and originating prefix, in order to identify sources of resource consumption and make better decisions about traffic distribution and load balancing. Ideally, these decisions could be automated. Mobile networks, in particular, present a challenge: operators want to determine which access point carries the most traffic (and what type of traffic) so they can better distribute it among commercial plans.
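To make the idea of classifying traffic by protocol and originating prefix concrete, here is a minimal Python sketch. It is only an illustration: the flow records, the prefix list, and the function names `originating_prefix` and `classify` are all hypothetical, and a real deployment would read flow exports from a collector rather than a hard-coded list.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical flow records: (source IP, protocol name, bytes transferred).
FLOWS = [
    ("192.0.2.10", "tcp", 1200),
    ("192.0.2.77", "udp", 300),
    ("198.51.100.5", "tcp", 5000),
    ("2001:db8::1", "udp", 800),
]

# Hypothetical prefixes of interest (documentation address ranges).
PREFIXES = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.0/24"),
    ip_network("2001:db8::/32"),
]


def originating_prefix(address, prefixes):
    """Return the longest prefix that contains the address, or None."""
    addr = ip_address(address)
    matches = [p for p in prefixes if p.version == addr.version and addr in p]
    return max(matches, key=lambda p: p.prefixlen, default=None)


def classify(flows, prefixes):
    """Sum bytes per (originating prefix, protocol) pair."""
    totals = Counter()
    for src, proto, nbytes in flows:
        prefix = originating_prefix(src, prefixes)
        key = str(prefix) if prefix is not None else "unmatched"
        totals[(key, proto)] += nbytes
    return totals


if __name__ == "__main__":
    # Print the heaviest (prefix, protocol) pairs first.
    for (prefix, proto), nbytes in classify(FLOWS, PREFIXES).most_common():
        print(prefix, proto, nbytes)
```

Sorting the totals by volume gives a quick view of which prefixes and protocols consume the most resources, which is the kind of input an automated load balancing decision would need.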
Internal network and service status
We also heard about the need for better third party (vendor independent) tools for monitoring internal networks and services, for example, monitoring DNS status in order to optimize the network and deliver better services. Some particular cases were mentioned, such as wireless ISPs, which commonly serve rural areas and experience issues due to link quality degradation, and CDNs, which need to perform measurements in their backbones to calculate available capacity and automatically shift some customers.
An example of a current solution was mentioned: Packet Clearing House uses RIPE Atlas to analyse their next PoP placement, and check which nodes are being heavily used and from which source. They also use RIPE Atlas for service benchmarking, for example, against Cloudflare or Google. Some other suggested solutions were Elastiflow and NfSen.
Insights about traffic paths
In this topic, participants mentioned they would like to have access to tools that allow them to monitor traffic paths, to be able to resolve issues faster (for example, when the cause is not in the link between customer and provider, but hops away) and also to identify saturation points on the paths to CDNs.
Some examples of current solutions were mentioned. Verizon uses RIPE Atlas for A/B testing between peering and transit, to see which one offers the best performance. At Cloudflare, they capture flow connection. At NTT, they do pre-testing and post-testing of the network configuration, and they use a lot of BGP and BGP Monitoring Protocol (BMP) data and visualization to inform their manual decisions about traffic engineering.
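As a rough illustration of what an A/B comparison between two paths might look like, the sketch below compares the median round-trip time of two sets of samples. This is not the method used by any of the operators mentioned above; the function name `compare_paths` and the sample values are hypothetical, and in practice the samples would come from a measurement platform.

```python
from statistics import median


def compare_paths(rtt_a, rtt_b):
    """Compare two lists of RTT samples (in ms) by their medians."""
    med_a, med_b = median(rtt_a), median(rtt_b)
    if med_a < med_b:
        better = "a"
    elif med_b < med_a:
        better = "b"
    else:
        better = "tie"
    return {"median_a": med_a, "median_b": med_b, "better": better}


if __name__ == "__main__":
    # Hypothetical RTT samples (ms) towards the same destination.
    via_peering = [10.0, 12.0, 11.0]
    via_transit = [20.0, 18.0, 25.0]
    print(compare_paths(via_peering, via_transit))
```

The median is used here because it is less sensitive to occasional outliers than the mean. A real comparison would likely also consider packet loss and the spread of the samples.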
IPv6 issues and information
The main challenge mentioned regarding IPv6 is related to MTU issues and how hard it is to detect them. Participants also mentioned they would like to be able to monitor content available through IPv4 versus IPv6 and to have access to IPv6 statistics tools.
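One way to think about detecting MTU issues is as a search for the largest packet size that still gets through a path. The sketch below shows that search in Python under two stated assumptions: `probe` is a hypothetical callable that returns True when a packet of the given size is delivered, and delivery is monotonic (if a size works, all smaller sizes work too). The only fixed value is 1280 bytes, the minimum link MTU that IPv6 requires (RFC 8200).

```python
IPV6_MIN_MTU = 1280  # minimum link MTU required by IPv6 (RFC 8200)


def find_path_mtu(probe, low=IPV6_MIN_MTU, high=1500):
    """Binary search for the largest size in [low, high] that probe accepts.

    probe(size) is a hypothetical callable returning True when a packet of
    that size is delivered. Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Simulated path that silently drops anything larger than 1492 bytes.
    print(find_path_mtu(lambda size: size <= 1492))
```

A result below the expected interface MTU, or a None result, would point to a hop worth investigating.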
Detection and resolution of incidents
There was a strong theme around incident detection and resolution in this discussion. One participant proposed identifying and classifying patterns that could anticipate issues, which would allow automated responses. It was also mentioned that it would be useful to have access to network data as seen by end users, in particular when dealing with issues relating to DNS serial synchronization, incorrect routing and saturated links. Good visualizations for incidents were also mentioned as something that would be interesting.
As an example of current solutions, a participant mentioned they had used Routeviews to see how their prefixes were being announced when trying to troubleshoot a routing issue.
Category 5: Access to and use of tools
Last but not least, some participants chose to discuss access to and the use of existing tools. The main issue in the Latin America and Caribbean region seems to be the language barrier, as documents are usually written in English. The existing documentation is good for technical users, but it lacks a general overview of the available services and tools and a high-level explanation of them. Participants agreed that it would be nice to have a catalogue listing all tools and services and mapping them to use cases, for example, a list of the top ten use cases and the most used tools for each.
It was suggested that articles, blogs, use cases, online tutorials and webinars be translated into languages other than English. An interesting example provided was from a university in Yucatan, Mexico, where they wrote a user guide for RIPE Atlas in Spanish.
There was also some more specific feedback about the user interfaces of network monitoring tools, which could be more intuitive.
Overall, the Internet Tools BoF was quite successful, with a high level of participation and some very interesting outcomes. We will be following up with participants to understand their interest in receiving further information. For example, some solutions to the problems raised at the BoF already exist, so we will be sharing this information with interested participants. It is hoped that APNIC NetOX (which was developed in collaboration with the RIPE NCC) can help solve some of these problems.
An interesting approach to explore could be to host a knowledge base for network operators that could be curated and maintained by members from the technical community, making sure the information is relevant and up to date. Some topics that were identified during the BoF that may be a good fit for this knowledge base are listed below:
- Tools for traffic classification
- Metrics for measuring the benefits of peering with a specific AS or engaging a specific upstream provider
- Tools for monitoring the status of internal networks and services
- Tools to assist with troubleshooting of external routing issues
We want to hear from you!
What do you think about having a shared knowledge base for network operators?
Would you like to contribute to any of the topics listed above?
Are there any other topics you would like to add to that list?
Please get in touch to let us know!