Internet Edge IP SLA deep dive

By Daniel Dib on 5 Sep 2022

Tags: DNS, Guest Post, How to, HTTP, monitoring

It is a common design to have an Internet Edge router connected to two different Internet Service Providers (ISPs) to protect against the failure of one ISP bringing the office down. The topology may look something like this:

Figure 1: Internet Edge High Availability (HA) scenario.

The two ISPs are used in an active/standby fashion using static routes. This is normally implemented by using two default routes where one of the routes is a floating static route. It will look something like this:

ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY
ip route 0.0.0.0 0.0.0.0 203.0.113.9 200 name SECONDARY

With this configuration, if the interface to ISP1 goes down, the floating static route, which has an Administrative Distance (AD) of 200, will be installed and traffic will flow via ISP2.
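
The selection logic can be sketched in a few lines of Python. This is a simplified model rather than router code: the `Route` class and `best_route` function are hypothetical names, and the only rule modelled is that, among usable routes to the same prefix, the one with the lowest administrative distance is installed.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    next_hop: str
    distance: int        # administrative distance; a static route defaults to 1
    usable: bool = True  # False when the outgoing interface is down


def best_route(routes):
    """Return the usable route with the lowest administrative distance, or None."""
    candidates = [r for r in routes if r.usable]
    return min(candidates, key=lambda r: r.distance, default=None)


primary = Route("PRIMARY", "203.0.113.1", 1)
secondary = Route("SECONDARY", "203.0.113.9", 200)

print(best_route([primary, secondary]).name)   # PRIMARY
primary.usable = False                         # the interface to ISP1 goes down
print(best_route([primary, secondary]).name)   # SECONDARY
```

In this model, the floating route only takes over when the primary route stops being usable, which is the limitation discussed next.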

The drawback to this configuration is that it only works if the physical interface goes down. What happens if ISP1’s customer-premises equipment (CPE) has the interface towards the customer up but the interface towards the ISP core goes down? What happens if there is a failure in another part of the ISP’s network? What if all interfaces are up but the ISP is having Border Gateway Protocol (BGP) issues in its network?

In scenarios like these, since the customer’s interface towards ISP1 is still up, traffic would still be sent to ISP1 but would never reach its final destination. In other words, the traffic would be blackholed. To prevent failures like these, the Internet Protocol Service Level Agreement (IP SLA) feature can be used to track something of importance, such as a service provided by the ISP or one outside of the ISP network, so that the static route is only installed while that service is reachable. How do we select what to track, though?

Selecting what to track

What services are important to an ISP?

They normally provide a resolver service. If the resolvers are down, that prevents people from browsing the Internet, so the resolver service is very important to an ISP. What else?

The ISP has a webpage where you order products and interact with them, for example, verizon.com. If that site is down, they lose money, so it is also an important service to them.

Using Verizon as an example, let’s find out what IP addresses are interesting to track. We can do this using the dig command on a Linux host. First let’s see what verizon.com resolves to:

dig verizon.com

; <<>> DiG 9.16.1-Ubuntu <<>> verizon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52122
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;verizon.com.                   IN      A

;; ANSWER SECTION:
verizon.com.            600     IN      A       192.16.31.89

;; Query time: 40 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: sön aug 21 08:54:45 CEST 2022
;; MSG SIZE  rcvd: 56

The IP address of interest is 192.16.31.89. The resolver addresses will most likely vary per region and service, but we can check which nameservers Verizon uses for the verizon.com domain. These are bound to be important to the ISP as well.

daniel@devasc:~$ dig ns verizon.com

; <<>> DiG 9.16.1-Ubuntu <<>> ns verizon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55703
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;verizon.com.                   IN      NS

;; ANSWER SECTION:
verizon.com.            3600    IN      NS      s1ns1.verizon.com.
verizon.com.            3600    IN      NS      ns2.edgecastdns.net.
verizon.com.            3600    IN      NS      s3ns3.verizon.com.
verizon.com.            3600    IN      NS      s2ns2.verizon.com.
verizon.com.            3600    IN      NS      ns1.edgecastdns.net.
verizon.com.            3600    IN      NS      s4ns4.verizon.com.
verizon.com.            3600    IN      NS      ns3.edgecastdns.net.
verizon.com.            3600    IN      NS      ns4.edgecastdns.net.

;; Query time: 624 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: sön aug 21 09:00:26 CEST 2022
;; MSG SIZE  rcvd: 207

There are several servers here. Note that it seems Verizon is using a third-party DNS service to provide resiliency for their nameservers. Pick a server and check the IP:

dig s1ns1.verizon.com

; <<>> DiG 9.16.1-Ubuntu <<>> s1ns1.verizon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12917
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;s1ns1.verizon.com.             IN      A

;; ANSWER SECTION:
s1ns1.verizon.com.      3573    IN      A       192.16.16.5

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: sön aug 21 09:00:52 CEST 2022
;; MSG SIZE  rcvd: 62

The IP address 192.16.16.5 is what we are looking for here. We now have two IP addresses we can track. Conceptually, it looks like this:

Figure 2: Internet Edge ISP services.

This is a good first step. We have identified the important services for the ISP. However, this is not the best way of tracking the availability of your Internet service. Why?

The availability of these services does not guarantee the availability of the greater Internet.
These services may not respond to Internet Control Message Protocol (ICMP) Echo packets, or they may rate limit them.

While we did manage to measure availability beyond the ISP’s CPE, we want to make sure that we can reach services that are not local to the ISP. This is usually where people start tracking something like 8.8.8.8, which is Google’s well-known resolver service (Figure 3).

Figure 3: Internet Edge track Google resolver.

Compared to the previous scenario, the health of 8.8.8.8 should be more relevant as it’s not local to the ISP. However, as this is a resolver service, responding to ICMP Echo is not in the job description, meaning that ICMP may get rate limited. Let’s implement tracking of 8.8.8.8 and then describe some of the challenges and caveats.

Basic implementation and caveats

Let’s start with a standard implementation of IP SLA tracking. Here is the basic configuration:

interface GigabitEthernet1
 ip address 203.0.113.2 255.255.255.248
!
interface GigabitEthernet2
 ip address 203.0.113.10 255.255.255.248
!
ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 1
ip route 0.0.0.0 0.0.0.0 203.0.113.9 200 name SECONDARY
!
ip sla 1
 icmp-echo 8.8.8.8 source-ip 203.0.113.2
ip sla schedule 1 life forever start-time now
!
track 1 ip sla 1 reachability

GigabitEthernet1 is towards ISP1 and GigabitEthernet2 is towards ISP2. The following commands can be used to verify the IP SLA setup:

Edge#show ip sla sum
IPSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
All Stats are in milliseconds. Stats with u are in microseconds

ID           Type        Destination       Stats       Return      Last
                                                       Code        Run 
-----------------------------------------------------------------------
*1           icmp-echo   8.8.8.8           RTT=8       OK          44 seconds ago

Edge#show ip sla statistics 
IPSLAs Latest Operation Statistics

IPSLA operation id: 1
        Latest RTT: 7 milliseconds
Latest operation start time: 07:10:40 UTC Mon Aug 22 2022
Latest operation return code: OK
Number of successes: 14
Number of failures: 0
Operation time to live: Forever

Edge#show ip route track-table 
 ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 1 state is [up]

Edge#show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "static", distance 1, metric 0, candidate default path
  Routing Descriptor Blocks:
  * 203.0.113.1
      Route metric is 0, traffic share count is 1

This is behaving as expected. The IP SLA is up. The tracker is up. The default route towards ISP1 is installed. Now, let’s simulate a failure of ISP1. I will implement this in the background using an ACL in my lab to filter the ICMP Echo packets. Let’s check the logs:

Aug 22 07:37:22.171: %TRACK-6-STATE: 1 ip sla 1 reachability Down -> Up
Aug 22 07:37:32.171: %TRACK-6-STATE: 1 ip sla 1 reachability Up -> Down
Aug 22 07:37:42.171: %TRACK-6-STATE: 1 ip sla 1 reachability Down -> Up
Aug 22 07:37:52.171: %TRACK-6-STATE: 1 ip sla 1 reachability Up -> Down
Aug 22 07:38:02.171: %TRACK-6-STATE: 1 ip sla 1 reachability Down -> Up

This doesn’t look too good. Why is the reachability flapping? Let’s check some of the routes:

Edge#show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "static", distance 200, metric 0, candidate default path
  Routing Descriptor Blocks:
  * 203.0.113.9
      Route metric is 0, traffic share count is 1

Edge#show ip cef 8.8.8.8                         
0.0.0.0/0
  nexthop 203.0.113.9 GigabitEthernet2

Edge#show ip cef exact-route 203.0.113.2 8.8.8.8 
203.0.113.2 -> 8.8.8.8 =>IP adj out of GigabitEthernet2, addr 203.0.113.9

This looks as expected. The default route is now pointing towards 203.0.113.9. Notice what the next-hop is for 8.8.8.8? I’ll come back to this. But first, let’s check the routing a few seconds later:

Edge#show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "static", distance 1, metric 0, candidate default path
  Routing Descriptor Blocks:
  * 203.0.113.1
      Route metric is 0, traffic share count is 1

Edge#show ip cef 8.8.8.8
0.0.0.0/0
  nexthop 203.0.113.1 GigabitEthernet1

Edge#show ip cef exact-route 203.0.113.2 8.8.8.8 
203.0.113.2 -> 8.8.8.8 =>IP adj out of GigabitEthernet1, addr 203.0.113.1

The routing is flapping between 203.0.113.1 and 203.0.113.9. Why is this happening? This is because, initially, the SLA packets flow through GigabitEthernet1. This path then fails so the SLA packets are sent towards GigabitEthernet2 as the default route towards Gi1 is removed. When the SLA packets are sent towards Gi2, they succeed. Since the SLA is successful, the tracked default route gets installed again. And the process repeats.
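
The feedback loop can be reproduced with a toy model. The sketch below is an illustration under simplifying assumptions, not a description of how IOS schedules probes: `simulate` is a hypothetical function, each cycle sends exactly one probe, the tracker reacts immediately, and ISP2 is assumed to be healthy.

```python
def simulate(cycles, isp1_healthy=False, isp2_healthy=True):
    """Model the tracked default route when probes simply follow the default route."""
    primary_installed = True   # the tracked default route via Gi1 starts out installed
    history = []
    for _ in range(cycles):
        # The probe leaves through whichever default route is currently installed.
        egress = "Gi1" if primary_installed else "Gi2"
        probe_ok = isp1_healthy if egress == "Gi1" else isp2_healthy
        # The tracker installs or removes the primary route based on the probe result.
        primary_installed = probe_ok
        history.append((egress, "Up" if probe_ok else "Down"))
    return history


for egress, state in simulate(4):
    print(f"probe via {egress}: track {state}")
# probe via Gi1: track Down
# probe via Gi2: track Up
# probe via Gi1: track Down
# probe via Gi2: track Up
```

As long as a successful probe through Gi2 is allowed to bring the Gi1 route back, the model never settles, which matches the alternating log messages above.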

To prevent this from happening, we must ensure that SLA packets only get sent toward Gi1. How can we do this? And what happens if we set the source interface in our SLA configuration?

ip sla 1
 icmp-echo 8.8.8.8 source-interface GigabitEthernet1
  frequency 10
ip sla schedule 1 life forever start-time now

Unfortunately, the results are still the same. We still have flapping. The CLI help does describe source-interface as the ingress interface, not the egress interface:

Edge(config-ip-sla)#icmp-echo 8.8.8.8 ?
  source-interface  Source Interface (ingress icmp packet interface)
  source-ip         Source Address

We can verify that packets are using GigabitEthernet2 as the next hop:

Edge#show ip cef exact-route 203.0.113.2 8.8.8.8
203.0.113.2 -> 8.8.8.8 =>IP adj out of GigabitEthernet2, addr 203.0.113.9

Unfortunately, we can’t specify the egress interface of the SLA packets. So what can we do? The options we have are:

Create a static route for the destination of the SLA packets.
Use policy-based routing to force SLA packets out of GigabitEthernet1.

Let’s try using the static route approach first. A static route for 8.8.8.8 is added:

Edge(config)#ip route 8.8.8.8 255.255.255.255 203.0.113.1 name SLA

Packets to 8.8.8.8 should only be flowing via 203.0.113.1 now, right? Initially, this looks promising:

Edge#show ip cef exact-route 203.0.113.2 8.8.8.8
203.0.113.2 -> 8.8.8.8 =>IP adj out of GigabitEthernet1, addr 203.0.113.1
Edge#show ip cef exact-route 203.0.113.2 8.8.8.8
203.0.113.2 -> 8.8.8.8 =>IP adj out of GigabitEthernet1, addr 203.0.113.1
Edge#show ip cef exact-route 203.0.113.2 8.8.8.8
203.0.113.2 -> 8.8.8.8 =>IP adj out of GigabitEthernet1, addr 203.0.113.1
Edge#show ip cef exact-route 203.0.113.2 8.8.8.8
203.0.113.2 -> 8.8.8.8 =>IP adj out of GigabitEthernet1, addr 203.0.113.1

Packets are only being sent out of GigabitEthernet1. Not so fast, though! The current way of simulating a failure is by filtering packets. What happens if Gi1 goes down? I will shut down the interface to simulate failure where the interface towards ISP1 goes down. I added some debugs to show what goes on in the background:

Aug 22 08:18:05.209: RT: interface GigabitEthernet1 removed from routing table
Aug 22 08:18:05.209: RT: del 203.0.113.0 via 0.0.0.0, connected metric [0/0]
Aug 22 08:18:05.209: RT: delete subnet route to 203.0.113.0/29
Aug 22 08:18:05.209: CONN: delete conn route, idb: GigabitEthernet1, addr: 203.0.113.2, mask: 255.255.255.248
Aug 22 08:18:05.210: CONN(multicast): connected_route: FALSE
Aug 22 08:18:05.210: RT: interface GigabitEthernet1 topo state DOWN, afi 0
Aug 22 08:18:05.210: IP-ST-EV(default): queued adjust on GigabitEthernet1
Aug 22 08:18:05.210: RT: del 203.0.113.2 via 0.0.0.0, connected metric [0/0]
Aug 22 08:18:05.210: RT: delete subnet route to 203.0.113.2/32
Aug 22 08:18:05.221: RT: del 8.8.8.8 via 203.0.113.1, static metric [1/0]
Aug 22 08:18:05.221: RT: delete subnet route to 8.8.8.8/32
Aug 22 08:18:06.108: %SYS-5-CONFIG_I: Configured from console by daniel on vty0 (10.254.255.2)
Aug 22 08:18:07.202: %LINK-5-CHANGED: Interface GigabitEthernet1, changed state to administratively down
Aug 22 08:18:07.208: CONN: connected_route: FALSE
Aug 22 08:18:07.208: is_up: GigabitEthernet1 0 state: 6 sub state: 1 line: 0
Aug 22 08:18:08.203: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1, changed state to down
Aug 22 08:18:08.204: CONN: connected_route: FALSE
Aug 22 08:18:08.204: is_up: GigabitEthernet1 0 state: 6 sub state: 1 line: 0

From the debug above, we can see that when Gi1 goes down, the connected subnet 203.0.113.0/29 is removed, and so is the static route to 8.8.8.8. This means that the SLA packets are now flowing via Gi2:

Edge#show ip cef exact-route 8.8.8.8 203.0.113.2
8.8.8.8 -> 203.0.113.2 =>IP adj out of GigabitEthernet2, addr 203.0.113.9

The default route via Gi1 can’t be installed as the interface is down, but if we were tracking SLA statistics it would skew the data, as these packets are now going through even though ISP1 is down.

There is a more elegant solution, though. It is possible to add a permanent static route using the permanent keyword:

Edge(config)#ip route 8.8.8.8 255.255.255.255 203.0.113.1 permanent name SLA

Notice the permanent keyword in the output below:

Edge#show ip route 8.8.8.8
Routing entry for 8.8.8.8/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 203.0.113.1, permanent
      Route metric is 0, traffic share count is 1

When the interface is shut down, this route will remain:

Edge(config)#int gi1
Edge(config-if)#sh
Edge(config-if)#^Z
Edge#show ip route 8.8.8.8
Routing entry for 8.8.8.8/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 203.0.113.1, permanent
      Route metric is 0, traffic share count is 1

This ensures that SLA packets can only ever use Gi1. This works well.

Keep in mind one thing, though. What we’ve just configured will apply to ALL packets towards 8.8.8.8, not just the SLA packets. If people are using 8.8.8.8 as their resolver, and ISP1 is down or having issues, packets towards 8.8.8.8 will NOT be able to use ISP2. We have essentially made our secondary link unavailable for any packets towards 8.8.8.8. This is not that great. What if we could change the routing towards 8.8.8.8 for only the packets generated by the router itself? We can, but it means we’ll have to use our old friend/foe, Policy-Based Routing (PBR). Let’s configure it:

ip access-list extended G1-ICMP-TO-GOOGLE-DNS
 permit icmp host 203.0.113.2 host 8.8.8.8 echo
 !
route-map LOCAL-POLICY permit 10
 match ip address G1-ICMP-TO-GOOGLE-DNS
 set ip next-hop 203.0.113.1
!
ip local policy route-map LOCAL-POLICY

The policy is being used:

Edge#show ip local policy 
Local policy routing is enabled, using route map LOCAL-POLICY
route-map LOCAL-POLICY, permit, sequence 10
  Match clauses:
    ip address (access-lists): G1-ICMP-TO-GOOGLE-DNS 
  Set clauses:
    ip next-hop 203.0.113.1
  Policy routing matches: 2 packets, 128 bytes
Edge#show ip access-lists 
Extended IP access list G1-ICMP-TO-GOOGLE-DNS
    10 permit icmp host 203.0.113.2 host 8.8.8.8 echo (3 matches)

User traffic to 8.8.8.8 still follows the routing table, so it can use the secondary path:

Edge#show ip cef 8.8.8.8
0.0.0.0/0
  nexthop 203.0.113.9 GigabitEthernet2
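
The split between probe traffic and everything else can be summarized with a small decision function. This is a conceptual sketch only: `next_hop` is a hypothetical helper, the protocol is reduced to a string, and 192.0.2.10 is an assumed user host that does not appear in the lab.

```python
SLA_SOURCE = "203.0.113.2"
SLA_TARGET = "8.8.8.8"


def next_hop(src, dst, protocol, locally_generated, default_next_hop):
    """Pick a next hop for a packet under the local policy configured above."""
    matches_acl = (protocol == "icmp-echo" and src == SLA_SOURCE and dst == SLA_TARGET)
    if locally_generated and matches_acl:
        return "203.0.113.1"   # set ip next-hop from route-map LOCAL-POLICY
    return default_next_hop    # everything else follows the routing table


# The default route has failed over to ISP2 in both calls.
print(next_hop("203.0.113.2", "8.8.8.8", "icmp-echo", True, "203.0.113.9"))   # 203.0.113.1
print(next_hop("192.0.2.10", "8.8.8.8", "udp", False, "203.0.113.9"))         # 203.0.113.9
```

The local policy only applies to packets generated by the router itself, so transit traffic is never matched against the route map.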

This all looks great. We have pinned the SLA packets to Gi1 but user traffic to 8.8.8.8 can still use the secondary path. So far, we have only used ICMP Echo, which is not the best way of determining if a path is healthy. Let’s look into some more advanced options.

IP SLA using DNS

Instead of sending ICMP Echo packets to DNS servers, what if we just sent DNS queries instead? Wouldn’t this be better? It would indeed, as the job of a DNS server is to respond to DNS queries, not ICMP Echo packets.

It’s possible to configure IP SLA to send DNS queries. Define the name to be queried and the nameserver in the IP SLA configuration:

ip sla 1
 dns google.com name-server 8.8.8.8
 frequency 10
!
ip sla schedule 1 life forever start-time now

Edge#show ip sla statistics 
IPSLAs Latest Operation Statistics

IPSLA operation id: 1
        Latest RTT: 8 milliseconds
Latest operation start time: 13:08:24 SWE Mon Aug 22 2022
Latest operation return code: OK
Number of successes: 4
Number of failures: 0
Operation time to live: Forever


Edge#show ip route track-table 
 ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 1 state is [up]

The default route is now installed, which tells us that 8.8.8.8 responded to our query for the google.com name. This is a lot better! Consider this, though:

What happens if there is no response for google.com?
What happens if 8.8.8.8 does not respond at all?
What is the packet length of a DNS query?

Before answering the first two questions, why would we care about the packet size? What if DNS queries can go through but other user traffic can’t? Maybe the path does not allow for 1,500-byte packets due to some upstream issue?

First, let’s check what the size of the SLA packet is by using some packet capturing magic:

Edge#debug platform condition ipv4 8.8.8.8/32 both
Edge#debug platform packet-trace packet 256
 Please remember to turn on 'debug platform condition start' for packet-trace to work
Edge#debug platform condition start 
Edge#show platform packet-trace sum
Pkt   Input                     Output                    State  Reason
0     INJ.2                     Gi1                       FWD    
1     Gi1                       internal0/0/rp:0          PUNT   11  (For-us data)

Edge#show platform packet-trace packet 0
Packet: 0           CBUG ID: 0

IOSd Path Flow: 
  Feature: UDP
  Pkt Direction: OUTsrc=203.0.113.2(53883), dst=8.8.8.8(53), length=36

  Feature: UDP
  Pkt Direction: OUT
  FORWARDED 
        UDP: Packet Handoff to IP
        Source      : 203.0.113.2(53883)
        Destination : 8.8.8.8(53)


  Feature: IP
  Pkt Direction: OUTRoute out the generated packet.srcaddr: 203.0.113.2, dstaddr: 8.8.8.8
Summary
  Input     : INJ.2  
  Output    : GigabitEthernet1
  State     : FWD 
  Timestamp
    Start   : 97239727399057 ns (08/22/2022 11:10:03.807558 UTC)
    Stop    : 97239727755013 ns (08/22/2022 11:10:03.807914 UTC)
Path Trace
  Feature: IPV4(Input)
    Input       : internal0/0/rp:0
    Output      : <unknown>
    Source      : 203.0.113.2
    Destination : 8.8.8.8
    Protocol    : 17 (UDP)
      SrcPort   : 53883
      DstPort   : 53
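
The length=36 reported by the UDP feature in the trace above is consistent with a plain DNS query for google.com plus the UDP header. The arithmetic can be checked with a short sketch, assuming a single question and no EDNS options; `dns_query_length` is a hypothetical helper, not part of any library.

```python
def dns_query_length(name):
    """Length in bytes of a DNS query message with one question and no EDNS options."""
    header = 12                                            # fixed DNS header
    labels = name.rstrip(".").split(".")
    qname = sum(1 + len(label) for label in labels) + 1    # length-prefixed labels + root
    qtype_qclass = 4                                       # 2 bytes each
    return header + qname + qtype_qclass


UDP_HEADER = 8
IPV4_HEADER = 20   # without IP options

dns = dns_query_length("google.com")
print(dns)                              # 28
print(dns + UDP_HEADER)                 # 36
print(dns + UDP_HEADER + IPV4_HEADER)   # 56
```

Under these assumptions, the complete IPv4 packet is 56 bytes, far smaller than typical user traffic.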

This is quite a small packet, with a UDP length of only 36 bytes. Let’s get back to this later. For our first problem, how can we track something more than google.com and also send queries to more than one DNS server? That can be implemented using multiple IP SLA statements:

ip sla 1
 dns google.com name-server 8.8.8.8
  frequency 10
ip sla schedule 1 life forever start-time now
ip sla 2
 dns amazon.com name-server 208.67.220.220
  frequency 10
ip sla schedule 2 life forever start-time now
ip sla 3
 dns microsoft.com name-server 1.1.1.1
  frequency 10
ip sla schedule 3 life forever start-time now
!
track 2 ip sla 2 reachability
!
track 3 ip sla 3 reachability

Then, configure a track statement that uses Boolean logic for all of these SLA statements:

track 10 list boolean or
 object 1
 object 2
 object 3

Finally, update the default route to use the new tracker:

Edge(config)#no ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 1
Edge(config)#ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 10

Let’s verify:

Edge#show track 10
Track 10
  List boolean or
  Boolean OR is Up
    2 changes, last change 00:02:25
    object 1 Up
    object 2 Up
    object 3 Up
  Tracked by:
    Static IP Routing 0
Edge#show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "static", distance 1, metric 0, candidate default path
  Routing Descriptor Blocks:
  * 203.0.113.1
      Route metric is 0, traffic share count is 1
Edge#show ip route track-table 
 ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 10 state is [up]
Edge#show ip cef exact-route 203.0.113.2 8.8.8.8
203.0.113.2 -> 8.8.8.8 =>IP adj out of GigabitEthernet1, addr 203.0.113.1

This looks great! We now have three SLA statements, and because the list uses Boolean OR, the tracker only goes down when all three of them fail. One of them failing is not enough to move traffic to the secondary path. If just one of them fails, it’s more likely that there is a temporary issue with a DNS server or DNS zone.
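
The behaviour of a Boolean track list can be modelled as follows. This is a minimal sketch: `track_list` is a hypothetical function, object states are plain booleans, and only the `or` and `and` modes are covered.

```python
def track_list(mode, objects):
    """Evaluate a tracked list: 'or' is up if any object is up, 'and' only if all are."""
    states = objects.values()
    if mode == "or":
        return any(states)
    if mode == "and":
        return all(states)
    raise ValueError(f"unsupported mode: {mode}")


probes = {1: True, 2: False, 3: True}   # object 2 (one DNS probe) has failed

print(track_list("or", probes))                             # True
print(track_list("and", probes))                            # False
print(track_list("or", {1: False, 2: False, 3: False}))     # False
```

With OR, a single failed probe is tolerated. With AND, any single failure takes the tracked route down, so the choice depends on how sensitive the failover should be.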

Remember I mentioned packet sizes? Let’s discuss this in the next section.

IP SLA using HTTP

Tracking connectivity using DNS is more useful than a simple ICMP Echo. What if we could move even further up the stack? This can be achieved by using HTTP probes. Rather than just checking that we get responses to DNS queries, let’s try to connect to a website. The syntax is similar to that of the DNS probe:

ip sla 5
 http secure get https://amazon.com name-server 8.8.8.8 source-interface GigabitEthernet1
!
ip sla schedule 5 life forever start-time now
!
track 5 ip sla 5 reachability 
!
no ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 10
ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 5

Then let’s verify:

Edge#show ip sla statistics 5
IPSLAs Latest Operation Statistics

IPSLA operation id: 5
        Latest RTT: 1175 milliseconds
Latest operation start time: 13:57:12 SWE Mon Aug 22 2022
Latest operation return code: OK
Latest DNS RTT: 7 ms
Latest HTTP Transaction RTT: 1168 ms
Number of successes: 2
Number of failures: 0
Operation time to live: Forever


Edge#show ip route track-table
 ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 5 state is [up]

The router is now sending an HTTPS GET to amazon.com, after first resolving the name through a DNS query to 8.8.8.8. This tests more of the stack than a DNS query alone. What’s the size of our SLA packets now? Let’s use another packet capturing utility to check:

ip access-list extended CAP-HTTP
 10 permit tcp host 203.0.113.2 any
 20 permit tcp any host 203.0.113.2
!
Edge#monitor capture CAP interface GigabitEthernet1 both
Edge#monitor capture CAP access-list CAP-HTTP 
Edge#monitor capture CAP start
Started capture point : CAP
Edge#monitor capture CAP export flash:/CAP.pcap 
Exported Successfully
Edge#copy flash:/CAP.pcap ftp://redacted:redacted@10.254.255.2

Let’s have a look at the Packet Capture (PCAP):

Figure 4: PCAP of SLA HTTP probe.

The packets are definitely larger than when just using DNS. We see packets approaching 600 bytes in size. Not quite 1,500 bytes, though! For production use, we should of course track more than one web server. What else can we do when it comes to IP SLA? Time to get fancy!

Getting fancy with IP SLA

There’s a lot we can do with IP SLA. Let’s implement this:

Resolve google.com via 8.8.8.8
Resolve amazon.com via 208.67.220.220
HTTP GET to amazon.com
Ping sdwan.measure.office.com (O365 beacon service) with 1,500-byte ICMP Echo

If everything succeeds, that means the circuit is functioning as we can send ICMP, resolve DNS, and browse websites. This is the configuration:

track 1 ip sla 1 reachability
!
track 2 ip sla 2 reachability
!
track 5 ip sla 5 reachability
!
track 20 ip sla 20 reachability
!
track 100 list boolean and
 object 1
 object 2
 object 5
 object 20
!
ip sla 1
 dns google.com name-server 8.8.8.8
  frequency 10
ip sla schedule 1 life forever start-time now
ip sla 2
 dns amazon.com name-server 208.67.220.220
  frequency 10
ip sla schedule 2 life forever start-time now
ip sla 5
 http secure get https://amazon.com name-server 8.8.8.8 source-interface GigabitEthernet1
ip sla schedule 5 life forever start-time now
ip sla 20
 icmp-echo sdwan.measure.office.com source-interface GigabitEthernet1
  request-data-size 1450
  frequency 10
ip sla schedule 20 life forever start-time now

Note that the request data size is set to 1,450, which will generate 1,500-byte ICMP Echo packets. I’m not sure how the maths works here, but I verified it with a packet capture. When an ICMP Echo SLA is configured with a DNS name, the name is resolved to an IP address at configuration time, and that address is what is put into the running configuration. Let’s see if this all works:

Edge#show track 100
Track 100
  List boolean and
  Boolean AND is Up
    2 changes, last change 00:00:03
    object 1 Up
    object 2 Up
    object 5 Up
    object 20 Up
  Tracked by:
    Static IP Routing 0
Edge#show ip route track-table 
 ip route 0.0.0.0 0.0.0.0 203.0.113.1 name PRIMARY track 100 state is [up]
Edge#show ip cef exact-route 203.0.113.2 8.8.8.8
203.0.113.2 -> 8.8.8.8 =>IP adj out of GigabitEthernet1, addr 203.0.113.1
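
One possible explanation for the 1,450 versus 1,500 arithmetic mentioned earlier, offered as an assumption and not something verified in this lab: Cisco’s IP SLA documentation describes the default 28-byte request data size for ICMP Echo as producing a 64-byte IP packet, which implies 36 bytes of overhead on top of the configured value. If that holds, and if the capture reported the Ethernet frame size, the numbers line up. The constant and function names below are hypothetical.

```python
IP_SLA_ICMP_OVERHEAD = 36   # assumption: inferred from 28-byte default -> 64-byte IP packet
ETHERNET_HEADER = 14        # destination MAC, source MAC, EtherType (no VLAN tag, no FCS)


def icmp_probe_sizes(request_data_size):
    """Return (IP packet size, Ethernet frame size) under the assumptions above."""
    ip_packet = request_data_size + IP_SLA_ICMP_OVERHEAD
    return ip_packet, ip_packet + ETHERNET_HEADER


print(icmp_probe_sizes(28))     # (64, 78)
print(icmp_probe_sizes(1450))   # (1486, 1500)
```

If the goal is a 1,500-byte IP packet instead of a 1,500-byte frame, the same assumption would suggest a larger data size, but that should be confirmed with a capture on the platform in question.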

Pretty cool! Using only static routes and IP SLA we now have a pretty good mechanism for verifying connectivity. A lot better than the simple ICMP Echos we started with.

Conclusion

With IP SLA, you can make the tracking as simple or as complex as you want. It all comes down to your requirements. Finally, keep the following in mind:

Consider what you want to measure.
How do you ensure SLA packets are flowing towards the correct ISP?
How many things do you want to measure to ensure the path is good?

I hope this post has been informative and that you have learned some new IP SLA tricks as well as some good debugging and packet capture commands.

Adapted from the original post which appeared on Daniel’s Networking Blog.

Daniel Dib is a Senior Network Architect at Conscia.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.

One Comment

Dave Taht September 7, 2022 at 9:36 am

heh. Now try this with ipv6.
