[Image: SLA_Topology_New]

IP SLA explanation and examples of when to use it

An SLA (Service Level Agreement) is an agreement between the service provider and you, the customer, that you will get an agreed-upon level of uptime and bandwidth for your money. IP SLA configuration on a router is a mechanism to measure and verify that the service the carrier provides stays within that agreement.

This tracking can cover both the uptime of the service and its responsiveness, measured as the delay between sending a packet to a destination and receiving the reply.

The results can be sent to some sort of third-party server (Syslog, for example) and displayed in a nice graphical format for “Accounting” purposes to keep your ISP honest. However, IP SLA is also used by other Layer 3 features on the CLI that we as ROUTE candidates are interested in!

For example (and this is from the SWITCH exam material), HSRP (Hot Standby Router Protocol) runs on the LAN between a fail-over pair of routers that have matching configurations and sit in an Active / Standby role, so that if the Active router goes down the Standby can take over.

As HSRP runs on the LAN side (making it a SWITCH topic), you can tie it back to IP SLA configurations: if the Active router stops getting responses to the pings it sends toward its ISP gateway, it can lower its HSRP priority so that the Standby router takes over as Active.
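
As a rough sketch of that HSRP tie-in, assuming a track object numbered 5 that already follows an IP SLA operation (track objects are covered further down). The interface name, addresses, group number, and priority values here are made up for illustration and are not from this lab:

```
! Active router, LAN-facing interface (all values hypothetical)
! Priority drops by 20 (110 -> 90) while track object 5 is Down
interface FastEthernet0/0
 ip address 10.1.1.2 255.255.255.0
 standby 1 ip 10.1.1.1
 standby 1 priority 110
 standby 1 preempt
 standby 1 track 5 decrement 20
```

For this to cause a fail-over, the Standby router would also need “standby 1 preempt” and a priority above 90 (the default of 100 would do), so that it takes over once the Active router's priority drops.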

On a more ROUTE-related topic, you can also tie it into PBR. Policy routing by default does no polling of its own to verify that the next hop you configured is alive / reachable. This is where IP SLA can be configured and tied into the PBR configuration to verify the configured next hop.

IP SLA configuration concepts and IP SLA Terminology (what things are called)

The most common use for IP SLA is to generate traffic originating from the local router in the form of a ping to another IP address, to confirm it’s reachable. You can measure much more detailed information such as delay and jitter, but for ROUTE we will keep it to tracking some pings.

Also you can configure multiple “SLA Operations” to multiple different destinations, that generate different types of traffic, to track completely different types of data statistics at the same time.

When configuring an “Operation” of IP SLA, the local router sending the traffic is called the “Sender”, and the destination device is called the “Receiver” and can be a router or a host depending on what kind of traffic you are sending / requesting back.

For example, if an “Operation” is configured as a simple ping / response, the “Receiver” could be a laptop on a remote network, as that device could respond back to the ping.

If you want more detail, such as timestamps or how long the remote device took to respond (or values like jitter and delay), you will need a “Receiver” like a Cisco router that is capable of answering that “Operation” request, as a regular web server or laptop may not be able to send back the requested data.

Here is a quick list for exam day purposes of what types of traffic can be configured:

  • ICMP – Echo (pings), jitter
  • RTP – VoIP traffic
  • TCP – Connection setup (tcp-connect)
  • UDP – Echo, jitter
  • DNS
  • HTTP
  • FTP

For detailed data (anything beyond a ping response), though, you would need to configure the remote router to respond to the “Operation” correctly.
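
A minimal sketch of what that could look like for a UDP jitter operation, assuming R5 is the remote Cisco router. The operation number 2, the UDP port 16384, and the 30 second frequency are picked for illustration only and are not part of this lab:

```
! On the remote router (R5): answer IP SLA control and probe packets
ip sla responder
!
! On the sending router (R1): UDP jitter probes toward R5
ip sla 2
 udp-jitter 5.5.5.5 16384
 frequency 30
ip sla schedule 2 life forever start-time now
```

A plain icmp-echo operation needs none of this on the far end, which is part of why it is the easy one to lab.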

However, for the ROUTE exam, and really from what I’ve seen, the most common “Operation” to configure and troubleshoot is for pings / internet connectivity to the ISP, so that is what I will be configuring.

Enough theory and definitions, let’s get configuring:

R1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R1(config)#ip sla ?
  <1-2147483647>          Entry Number
  auto                    IP SLAs Auto Configuration
  enable                  Enable Event Notifications
  ethernet-monitor        IP SLAs Auto Ethernet Configuration
  group                   Group Configuration or Group Scheduling
  key-chain               Use MD5 Authentication for IP SLAs Control Messages
  logging                 Enable Syslog
  low-memory              Configure Low Water Memory Mark
  reaction-configuration  IP SLAs Reaction-Configuration
  reaction-trigger        IP SLAs Trigger Assignment
  reset                   IP SLAs Reset
  responder               Enable IP SLAs Responder
  restart                 Restart An Active Entry
  schedule                IP SLAs Entry Scheduling

R1(config)#ip sla 1 ?
  <cr>

R1(config)#ip sla 1
R1(config-ip-sla)#

To start, from Global configuration mode, type “ip sla #” and it will drop you into SLA configuration mode to configure the “Operation”. I will break this up into different segments of ? output, as there is quite a bit:

First looking at “Operation” options

R1(config-ip-sla)#?
IP SLAs entry configuration commands:
  dhcp         DHCP Operation
  dns          DNS Query Operation
  ethernet     Ethernet Operations
  exit         Exit Operation Configuration
  ftp          FTP Operation
  http         HTTP Operation
  icmp-echo    ICMP Echo Operation
  icmp-jitter  ICMP Jitter Operation
  path-echo    Path Discovered ICMP Echo Operation
  path-jitter  Path Discovered ICMP Jitter Operation
  tcp-connect  TCP Connect Operation
  udp-echo     UDP Echo Operation
  udp-jitter   UDP Jitter Operation
  voip         Voice Over IP Operation

R1(config-ip-sla)#icmp-echo ?
  Hostname or A.B.C.D  Destination IP address or hostname, broadcast disallowed

R1(config-ip-sla)#icmp-echo 5.5.5.5 ?
  source-interface  Source Interface (ingress icmp packet interface)
  source-ip         Source Address
  <cr>

R1(config-ip-sla)#icmp-echo 5.5.5.5 source-ip 1.1.1.1 ?
  <cr>

R1(config-ip-sla)#icmp-echo 5.5.5.5 source-ip 1.1.1.1
R1(config-ip-sla-echo)#

I could have ended it at the destination address, but for the heck of it I configured R1’s loopback as the source-ip for the operation, so if either loopback goes down I assume the operation bites the dust.

Also note that it dropped me into “echo” configuration mode (the prompt is now config-ip-sla-echo), so I will look at my options using ? again and go from there.

Configuration of the icmp-echo “Operation”:

R1(config-ip-sla-echo)#?
IP SLAs Icmp Echo Configuration Commands:
  default            Set a command to its defaults
  exit               Exit operation configuration
  frequency          Frequency of an operation
  history            History and Distribution Data
  no                 Negate a command or set its defaults
  owner              Owner of Entry
  request-data-size  Request data size
  tag                User defined tag
  threshold          Operation threshold in milliseconds
  timeout            Timeout of an operation
  tos                Type Of Service
  verify-data        Verify data
  vrf                Configure IP SLAs for a VPN Routing/Forwarding instance

R1(config-ip-sla-echo)#frequency ?
  <1-604800>  Frequency in seconds (default 60)

R1(config-ip-sla-echo)#frequency 10 ?
  <cr>

R1(config-ip-sla-echo)#frequency 10
R1(config-ip-sla-echo)#

So I set IP SLA operation # 1 to do an ICMP echo sourced from 1.1.1.1 to remote destination 5.5.5.5 every 10 seconds. Now to apply it and get this party started:

R1(config-ip-sla-echo)#exit
R1(config)#ip sla schedule 1 ?
  ageout      How long to keep this Entry when inactive
  life        Length of time to execute in seconds
  recurring   Probe to be scheduled automatically every day
  start-time  When to start this entry
  <cr>

R1(config)#ip sla schedule 1 life ?
  <0-2147483647>  Life seconds (default 3600)
  forever         continue running forever

R1(config)#ip sla schedule 1 life forever ?
  ageout      How long to keep this Entry when inactive
  recurring   Probe to be scheduled automatically every day
  start-time  When to start this entry
  <cr>

R1(config)#ip sla schedule 1 life forever start-time ?
  after     Start after a certain amount of time from now
  hh:mm     Start time (hh:mm)
  hh:mm:ss  Start time (hh:mm:ss)
  now       Start now
  pending   Start pending

R1(config)#ip sla schedule 1 life forever start-time now ?
  ageout     How long to keep this Entry when inactive
  recurring  Probe to be scheduled automatically every day
  <cr>

R1(config)#ip sla schedule 1 life forever start-time now
R1(config)#
ASR#5
[Resuming connection 5 to r5 … ]
[OK]

R5#debug ip packet
IP packet debugging is on

Let’s see some IP SLA traffic!
R5#
May 21 23:02:50.965: IP: tableid=0, s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5 (Loopback5), routed via RIB
May 21 23:02:50.965: IP: s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5, len 64, rcvd 4
May 21 23:02:50.965: IP: tableid=0, s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), routed via FIB
May 21 23:02:50.965: IP: s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), len 64, sending
R5#
May 21 23:03:00.963: IP: tableid=0, s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5 (Loopback5), routed via RIB
May 21 23:03:00.967: IP: s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5, len 64, rcvd 4
May 21 23:03:00.967: IP: tableid=0, s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), routed via FIB
May 21 23:03:00.967: IP: s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), len 64, sending
R5#
May 21 23:03:10.965: IP: tableid=0, s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5 (Loopback5), routed via RIB
May 21 23:03:10.965: IP: s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5, len 64, rcvd 4
May 21 23:03:10.965: IP: tableid=0, s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), routed via FIB
May 21 23:03:10.965: IP: s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), len 64, sending
R5#u all
All possible debugging has been turned off
R5#

As can be seen at the very top, exiting once from SLA echo configuration mode drops you all the way back into Global configuration mode.

From there, “ip sla schedule # …” is the command that starts the operation and defines how long / often it should run. I gave mine a “life” of “forever” so it runs continuously, and a start-time of “now” to turn it on immediately, though there are other options for how long it runs and when it starts!

As can be seen I hopped over to R5 quick and ran a “debug ip packet” and sure enough we are now seeing a ping sourced from 1.1.1.1 hitting 5.5.5.5 every 10 seconds on the dot!
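
Pulled together, the configuration entered on R1 above boils down to these few lines, roughly as they would appear in the running config:

```
ip sla 1
 icmp-echo 5.5.5.5 source-ip 1.1.1.1
 frequency 10
ip sla schedule 1 life forever start-time now
```

One thing to keep in mind, as far as I know: once an operation has been scheduled it cannot be edited in place, so changing it means removing it with “no ip sla 1” and entering it again.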

There are two verification commands, which give two very different outputs:

“show ip sla statistics”

R1#sh ip sla stat
IPSLAs Latest Operation Statistics

IPSLA operation id: 1
        Latest RTT: 1 milliseconds
Latest operation start time: 23:12:51.010 UTC Sun May 21 2017
Latest operation return code: OK
Number of successes: 63
Number of failures: 0
Operation time to live: Forever

R1#

I started to highlight the important parts, but it’s all important, look at that info!

Operation start time, operation ID, number of successes / failures, operation lifetime: this gives you the statistics of the operation at work, whereas the second command is more geared towards the configuration itself (as you might guess by the name):

“show ip sla configuration”

R1#sh ip sla config
IP SLAs Infrastructure Engine-III
Entry number: 1
Owner:
Tag:
Operation timeout (milliseconds): 5000
Type of operation to perform: icmp-echo
Target address/Source address: 5.5.5.5/1.1.1.1
Type Of Service parameter: 0x0
Request size (ARR data portion): 28
Verify data: No
Vrf Name:
Schedule:
   Operation frequency (seconds): 10  (not considered if randomly scheduled)
   Next Scheduled Start Time: Start Time already passed
   Group Scheduled : FALSE
   Randomly Scheduled : FALSE
   Life (seconds): Forever
   Entry Ageout (seconds): never
   Recurring (Starting Everyday): FALSE
   Status of entry (SNMP RowStatus): Active
Threshold (milliseconds): 5000
Distribution Statistics:
   Number of statistic hours kept: 2
   Number of statistic distribution buckets kept: 1
   Statistic distribution interval (milliseconds): 20
Enhanced History:
History Statistics:
   Number of history Lives kept: 0
   Number of history Buckets kept: 15
   History Filter Type: None

R1#

This also shows a more complete view of the operation’s life in terms of age-out time, plus all the other fields I didn’t configure. So this is really the command to verify the configuration of the “Operation”, whereas the “statistics” command shows whether the operation is succeeding or failing.

How SLA ties into PBR with something called “Enhanced Object Tracking”

I have created an addition to the Topology, a loopback interface, that will represent another route to the destination of 5.5.5.5:

[Image: SLA_Topology_New_2]

Enhanced Object Tracking is the configuring of “track objects” that detect when the SLA is starting to slip. For example, a track object can be configured with a delay value, so that if a ping is missed it will not report the target as unreachable until the delay timer hits zero.

Track objects are configured because Cisco IOS does not allow things like HSRP or PBR to refer directly to an IP SLA operation; however, they can refer to a “track object” that in turn refers to the SLA operation that is running.

So first things first: the track object references the SLA operation number locally, so the SLA operation has to live on the same router where PBR is being performed. That means I need to remove the configs from R1 and set them up on R4… so one sec here:

R5#debug ip packet
IP packet debugging is on
R5#
May 22 00:07:19.470: IP: tableid=0, s=4.4.4.4 (FastEthernet0/1), d=5.5.5.5 (Loopback5), routed via RIB
May 22 00:07:19.470: IP: s=4.4.4.4 (FastEthernet0/1), d=5.5.5.5, len 64, rcvd 4
May 22 00:07:19.474: IP: tableid=0, s=5.5.5.5 (local), d=4.4.4.4 (FastEthernet0/1), routed via FIB
May 22 00:07:19.474: IP: s=5.5.5.5 (local), d=4.4.4.4 (FastEthernet0/1), len 64, sending
R5#u all
All possible debugging has been turned off
R5#

So we have the same config sourced from 4.4.4.4 going to 5.5.5.5 now.

So the first task is to configure a track object:

R4(config)#track ?
  <1-1000>    Tracked object
  resolution  Tracking resolution parameters
  timer       Polling interval timers

R4(config)#track 5 ip sla 1 ?
  reachability  Reachability
  state         Return code state
  <cr>

R4(config)#track 5 ip sla 1 reachability ?
  <cr>

R4(config)#track 5 ip sla 1 reachability

Note here it drops me into configuration mode for this tracking object

R4(config-track)#?
Tracking instance configuration commands:
  default        Set a command to its defaults
  default-state  Default object state
  delay          Tracking delay
  exit           Exit from tracking configuration mode
  no             Negate a command or set its defaults

R4(config-track)#delay ?
  down  Delay down change notification
  up    Delay up change notification

R4(config-track)#delay down ?
  <0-180>  Seconds to delay

R4(config-track)#delay down 30 ?
  up  Delay up change notification
  <cr>

R4(config-track)#delay down 30
R4(config-track)#

So the SLA operation is polling for reachability every 10 seconds, but the track object will now wait 30 seconds after a failure before reporting Down / Unreachable. For this example we’ll need to tie this track number to an IP route configured for the destination address:

R4(config)#ip route 5.5.5.5 255.255.255.255 172.12.45.5 ?
  <1-255>    Distance metric for this route
  multicast  multicast route
  name       Specify name of the next hop
  permanent  permanent route
  tag        Set tag for this route
  track      Install route depending on tracked item
  <cr>

R4(config)#ip route 5.5.5.5 255.255.255.255 172.12.45.5 track 5
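
Stripped of the ? output, the tracking piece configured on R4 above comes down to the following (the IP SLA operation 1 it points at is the same icmp-echo, now sourced from 4.4.4.4):

```
track 5 ip sla 1 reachability
 delay down 30
!
ip route 5.5.5.5 255.255.255.255 172.12.45.5 track 5
```

With these values, I would expect the route to be pulled roughly 30 seconds after the first failed probe, provided the probes keep failing for that whole window.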

So now that route is being tracked. Let’s see what it looks like in a few different scenarios on its own, before we involve PBR at all:

“show track” to verify your tracked object

R4(config)#do show track
Track 5
  IP SLA 1 reachability
  Reachability is Up
    1 change, last change 00:07:17
  Delay down 30 secs
  Latest operation return code: OK
  Latest RTT (millisecs) 1
  Tracked by:
    STATIC-IP-ROUTING 0
R4(config)#

So it’s looking pretty good, so let’s shut down R5’s loopback and see how this goes:

R4(config)#do sh track
Track 5
  IP SLA 1 reachability
  Reachability is Up, delayed Down (23 secs remaining)
    1 change, last change 00:11:26
  Delay down 30 secs
  Latest operation return code: OK
  Latest RTT (millisecs) 3
  Tracked by:
    STATIC-IP-ROUTING 0
R4(config)#do sh ip sla stat
IPSLAs Latest Operation Statistics

IPSLA operation id: 1
        Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: 00:21:09 UTC Mon May 22 2017
Latest operation return code: Timeout
Number of successes: 82
Number of failures: 3
Operation time to live: Forever

R4(config)#

Before I could get back to the router to issue the command again, I got this console message:

R4(config)#
May 22 00:21:31.104: %TRACKING-5-STATE: 5 ip sla 1 reachability Up->Down
R4(config)#

So let’s look at a few things to see what this has changed, if anything:

IP Route Table:

R4#sh ip route

Gateway of last resort is not set

      1.0.0.0/32 is subnetted, 1 subnets
S        1.1.1.1 [1/0] via 172.12.14.1
      4.0.0.0/32 is subnetted, 1 subnets
C        4.4.4.4 is directly connected, Loopback4
      172.12.0.0/16 is variably subnetted, 6 subnets, 2 masks
C        172.12.14.0/24 is directly connected, FastEthernet0/1
L        172.12.14.4/32 is directly connected, FastEthernet0/1
C        172.12.45.0/24 is directly connected, FastEthernet0/0
L        172.12.45.4/32 is directly connected, FastEthernet0/0
C        172.12.55.0/24 is directly connected, Loopback55
L        172.12.55.4/32 is directly connected, Loopback55
R4#

I tried to look at the route in more detail with “sh ip route 5.5.5.5”, but it said no route exists in the routing table, so I checked CEF to see if the route is even being considered at all:

R4#sh ip cef
Prefix               Next Hop             Interface
0.0.0.0/0            no route
0.0.0.0/8            drop
0.0.0.0/32           receive
1.1.1.1/32           172.12.14.1          FastEthernet0/1
4.4.4.4/32           receive              Loopback4
127.0.0.0/8          drop
172.12.14.0/24       attached             FastEthernet0/1
172.12.14.0/32       receive              FastEthernet0/1
172.12.14.1/32       attached             FastEthernet0/1

No, it isn’t. If CEF doesn’t see you as a route, you are not a route. However, it is of course still in the running configuration:

R4#sh run | i 5.5.5.5
ip route 5.5.5.5 255.255.255.255 172.12.45.5 track 5
 icmp-echo 5.5.5.5 source-ip 4.4.4.4
access-list 5 permit 5.5.5.5
R4#

To round off the reaction of tying the SLA to the IP route: after the track went down, the route just seemed to disappear from the router everywhere except the running config, and IP SLA is still running even though it continues to fail:

R4#sh ip sla stat
IPSLAs Latest Operation Statistics

IPSLA operation id: 1
        Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: 00:45:19 UTC Mon May 22 2017
Latest operation return code: Timeout
Number of successes: 82
Number of failures: 148
Operation time to live: Forever

R4#

Now I did a “no shut” on Lo5 on R5, but the failures continue to increment in “sh ip sla stat”, and the route is not being brought back into the route table.

I’m not finding an answer to this one on the internet very easily, so I will need to research why this happens and revisit it later so I can keep moving.
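
One likely explanation, offered as an assumption since it was not confirmed in this lab: the probe targets 5.5.5.5, and the only route R4 has to 5.5.5.5 is the tracked static route. Once the track object goes Down and the route is pulled, R4 has no path left for the probe itself, so the operation can never succeed again and the route never comes back. A common way around this is to probe something that stays reachable without the tracked route, such as the directly connected next hop. A sketch, reusing the addresses from this topology:

```
! A scheduled operation cannot be edited, so remove it first.
! Then probe the directly connected next hop instead of R5's loopback.
no ip sla 1
ip sla 1
 icmp-echo 172.12.45.5 source-interface FastEthernet0/0
 frequency 10
ip sla schedule 1 life forever start-time now
```

The trade-off is that this only proves the next hop is alive, not that 5.5.5.5 itself is up.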

Configuring PBR with the Tracking Object

***One important note about Route-Maps and configuring tracking: if the same sequence also has a plain “set ip next-hop …” line, the router will evaluate the verify-availability line first, and once it sees it cannot verify reachability it will move on to that next set command and route traffic there anyway. So if you see any more set commands within the same sequence, that should be a huge red flag on exam day***
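
To picture that red flag, here is a sketch of a sequence with both kinds of set line. The next-hop addresses are just borrowed from the topology for illustration; this exact route-map was not configured in this lab:

```
route-map PBR permit 10
 match ip address 5
 set ip next-hop verify-availability 172.12.45.5 10 track 5
 set ip next-hop 172.12.14.1
```

If track 5 is Down, the verify-availability line is skipped, but the plain “set ip next-hop” line below it should still apply, so matching traffic would be policy routed to 172.12.14.1 rather than falling back to the normal routing table.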

So I had to actually completely remove and reconfigure the tracking object, as it would not let go of that Track being “Down”, but once I did I tied it to my Policy Route like this:

R4(config)#access-list 5 permit 5.5.5.5
R4(config)#route-map PBR permit 10
R4(config-route-map)#match ip add 5
R4(config-route-map)#set ip next-hop ?
  A.B.C.D              IP address of next hop
  dynamic              application dynamically sets next hop
  encapsulate          Encapsulation profile for VPN nexthop
  peer-address         Use peer address (for BGP only)
  recursive            Recursive next-hop
  self                 Use self address (for BGP only)
  verify-availability  Verify if nexthop is reachable

R4(config-route-map)#set ip next-hop verify-availability ?
  A.B.C.D  IP address of next hop
  <cr>

R4(config-route-map)#set ip next-hop verify-availability 5.5.5.5 ?
  <1-65535>  Sequence to insert into next-hop list

R4(config-route-map)#set ip next-hop verify-availability 5.5.5.5 10 ?
  track  set the next hop depending on the state of a tracked object

R4(config-route-map)#set ip next-hop verify-availability 5.5.5.5 10 track ?
  <1-1000>  tracked object number

R4(config-route-map)#set ip next-hop verify-availability 5.5.5.5 10 track 5
R4(config-route-map)#route-map PBR permit 20
R4(config-route-map)#exit
R4(config)#

That is a really long command, I am hoping I don’t have to configure that off the top of my head on exam day!

So now I am going to see how THIS reacts to the tracking when the interface is shut down, hopefully it doesn’t just yank the route and not return it again, that was a bit frustrating to not resolve (yet):

R4#sh track
Track 5
  IP SLA 1 state
  State is Up, delayed Down (5 secs remaining)
    1 change, last change 00:11:09
  Delay up 5 secs, down 30 secs
  Latest operation return code: OK
  Latest RTT (millisecs) 1
  Tracked by:
    ROUTE-MAP 0
R4#sh track
May 22 01:40:43.256: %TRACKING-5-STATE: 5 ip sla 1 state Up->Down

After 30 seconds of being down, the tracking set it to Down, so let’s see whether a “no shut” on the interface works any differently than it did with the static route:

R5(config-if)#no shut
R5(config-if)#
ASR#4
[Resuming connection 4 to r4 … ]

R4#sh ip route

Gateway of last resort is not set

      1.0.0.0/32 is subnetted, 1 subnets
S        1.1.1.1 [1/0] via 172.12.14.1
      4.0.0.0/32 is subnetted, 1 subnets
C        4.4.4.4 is directly connected, Loopback4
      5.0.0.0/32 is subnetted, 1 subnets
S        5.5.5.5 [1/0] via 172.12.45.5
      172.12.0.0/16 is variably subnetted, 6 subnets, 2 masks
C        172.12.14.0/24 is directly connected, FastEthernet0/1
L        172.12.14.4/32 is directly connected, FastEthernet0/1
C        172.12.45.0/24 is directly connected, FastEthernet0/0
L        172.12.45.4/32 is directly connected, FastEthernet0/0
 –More–
May 22 01:42:28.256: %TRACKING-5-STATE: 5 ip sla 1 state Down->Up
C        172.12.55.0/24 is directly connected, Loopback55
L        172.12.55.4/32 is directly connected, Loopback55
R4#

So I was able to verify that the 5.5.5.5 route was not removed even while the destination was unreachable to the SLA operation, and I also saw the track come back up within what I believe was the 5 second “Up” delay value I set on the track object.

I’m getting tired of beating this topic to death, so I’ll end it here. But if this route-map were applied to an interface and the SLA tracking went to a Down state, the verified next hop would be skipped and normal routing would take over.
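
For completeness, a sketch of what applying it might look like. This was not done in the lab above, so treat all of it as an assumption: FastEthernet0/1 is picked because it faces R1, extended ACL 105 is a hypothetical stand-in for ACL 5 (a standard ACL matches on source address, so catching traffic headed to 5.5.5.5 needs an extended ACL), and the next hop is R5’s directly connected address. It also assumes starting from a clean route-map:

```
! Hypothetical ACL: match traffic destined to 5.5.5.5
access-list 105 permit ip any host 5.5.5.5
!
route-map PBR permit 10
 match ip address 105
 set ip next-hop verify-availability 172.12.45.5 10 track 5
!
! Policy route packets arriving from R1's side
interface FastEthernet0/1
 ip policy route-map PBR
```

“show route-map PBR” and “show ip policy” would be the places to check whether packets are matching and which interface the policy is applied to.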

My brain is kind of melting at this point and I need to try to get one topic down, so I will update the static routing portion if I find a good answer, because I actually see that a lot at my work and the fact that it’s just not bringing routes back into the table is odd to me.