IP SLA explanation and examples of when to use it
An SLA (Service Level Agreement) is an agreement between the Service Provider and you, the customer, that you will get an agreed-upon level of uptime / bandwidth for your money. IP SLA configuration on a router is a mechanism to measure and verify that the service being provided by the carrier is within that agreed-upon SLA.
This tracking can include both the uptime of the service and the responsiveness of the link, measured by the delay between sending a packet to a destination and receiving a reply back.
The results can be collected by some sort of third-party monitoring server (via SNMP or Syslog) and displayed in a nice graphical format for “Accounting” purposes to keep your ISP honest. However, IP SLA is also used by other Layer 3 features on the CLI that we as ROUTE candidates are interested in!
For example (and this is from the SWITCH exam material), HSRP (Hot Standby Router Protocol) runs on the LAN between a fail-over pair of routers that have matching configurations and remain in an Active / Standby role, so that if the Active router goes down the Standby can take over.
As HSRP runs on the LAN side (making it a SWITCH topic), it cannot see a WAN failure on its own. You can tie it to an IP SLA operation so that if the Active router stops getting responses to pings sent to its ISP gateway, it lowers its HSRP priority and the Standby router (with preemption enabled) makes itself the Active router.
On a more ROUTE-relevant note, you can also tie it into PBR. Policy Routing does no polling of its own to verify that the next hop you configured is alive / reachable. This is where IP SLA can be configured and tied into the PBR configuration to verify the configured next hop.
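As a rough sketch of the HSRP tie-in described above: every address, interface name, group number and object number below is hypothetical, and the track object it relies on is explained further down. The idea is that the Active router probes its ISP gateway and gives up enough priority to lose the election when the probe fails.

```
! Hypothetical values throughout; adapt to your own topology
ip sla 10
 icmp-echo 203.0.113.1
 frequency 10
ip sla schedule 10 life forever start-time now
!
track 10 ip sla 10 reachability
!
interface GigabitEthernet0/0
 ip address 192.168.1.2 255.255.255.0
 standby 1 ip 192.168.1.1
 standby 1 priority 110
 standby 1 preempt
 ! Drop priority from 110 to 90 while track 10 is Down
 standby 1 track 10 decrement 20
```

For this to cause a fail-over, the Standby router would need “standby 1 preempt” and a priority between 90 and 110 (the default of 100 works).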
IP SLA configuration concepts and IP SLA Terminology (what things are called)
The most common use for IP SLA is to generate traffic originating from the local router, in the form of a ping to another IP address, to confirm it’s reachable. You can measure much more detailed information such as Delay and Jitter, but for ROUTE we will keep it to tracking some pings.
You can also configure multiple “SLA Operations” to multiple different destinations that generate different types of traffic, tracking completely different statistics at the same time.
When configuring an “Operation” of IP SLA, the local router sending the traffic is called the “Sender” and the destination device is called the “Receiver” (Cisco’s documentation calls them the source and the responder). The Receiver can be a router or a host, depending on what kind of traffic you are sending / requesting back.
For example, if an “Operation” is configured as a simple ping / response, the “Receiver” could be a laptop on a remote network, as that device can respond to the ping.
If you want more detail, like timestamps, how long the remote device took to respond, or Jitter / Delay values, you will need a “Receiver” like a Cisco router that is capable of answering that “Operation” request, as a regular web server or laptop may not be able to send back the requested data.
Here is a quick list for exam day purposes of what types of traffic can be configured:
- ICMP – Pings, jitter
- RTP – VOIP traffic
- TCP – Established Connections
- UDP – Echo, jitter
- DNS
- HTTP
- FTP
For detailed data (anything beyond a ping response), though, you would need to configure the remote router as an IP SLA responder so that it answers the “Operation” correctly.
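A minimal sketch of what that could look like for a UDP jitter operation. The operation number (2), the UDP port (16384) and the 30 second frequency are assumptions picked for illustration; 5.5.5.5 is simply the lab’s remote loopback reused as the target.

```
! On the remote Cisco router (the "Receiver"): answer IP SLA probes
ip sla responder
!
! On the local router (the "Sender"): hypothetical operation 2
ip sla 2
 udp-jitter 5.5.5.5 16384
 frequency 30
ip sla schedule 2 life forever start-time now
```

A plain icmp-echo operation needs none of this on the far end, which is why a laptop or web server is good enough as a target for simple pings.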
However, for the ROUTE exam, and really from what I’ve seen, the most common “Operation” to configure and troubleshoot is for pings / internet connectivity to the ISP, so that is what I will be configuring.
Enough theory and definitions, let’s get configuring:
R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#ip sla ?
<1-2147483647> Entry Number
auto IP SLAs Auto Configuration
enable Enable Event Notifications
ethernet-monitor IP SLAs Auto Ethernet Configuration
group Group Configuration or Group Scheduling
key-chain Use MD5 Authentication for IP SLAs Control Messages
logging Enable Syslog
low-memory Configure Low Water Memory Mark
reaction-configuration IP SLAs Reaction-Configuration
reaction-trigger IP SLAs Trigger Assignment
reset IP SLAs Reset
responder Enable IP SLAs Responder
restart Restart An Active Entry
schedule IP SLAs Entry Scheduling
R1(config)#ip sla 1 ?
<cr>
R1(config)#ip sla 1
R1(config-ip-sla)#
To start, from Global configuration mode type “ip sla #” and it will drop you into SLA configuration mode to configure the “operation”. I will break this up into different segments of ? output, as there is quite a bit.
First, looking at the “Operation” options:
R1(config-ip-sla)#?
IP SLAs entry configuration commands:
dhcp DHCP Operation
dns DNS Query Operation
ethernet Ethernet Operations
exit Exit Operation Configuration
ftp FTP Operation
http HTTP Operation
icmp-echo ICMP Echo Operation
icmp-jitter ICMP Jitter Operation
path-echo Path Discovered ICMP Echo Operation
path-jitter Path Discovered ICMP Jitter Operation
tcp-connect TCP Connect Operation
udp-echo UDP Echo Operation
udp-jitter UDP Jitter Operation
voip Voice Over IP Operation
R1(config-ip-sla)#icmp-echo ?
Hostname or A.B.C.D Destination IP address or hostname, broadcast disallowed
R1(config-ip-sla)#icmp-echo 5.5.5.5 ?
source-interface Source Interface (ingress icmp packet interface)
source-ip Source Address
<cr>
R1(config-ip-sla)#icmp-echo 5.5.5.5 source-ip 1.1.1.1 ?
<cr>
R1(config-ip-sla)#icmp-echo 5.5.5.5 source-ip 1.1.1.1
R1(config-ip-sla-echo)#
I could have ended it at the destination address, but for the heck of it I configured R1’s loopback as the source-ip for the operation, so if either loopback goes down I assume the operation bites the dust.
Also note it dropped me into “echo” configuration mode (the config-ip-sla-echo prompt), so I will look at my options using ? again and go from there.
Configuration of the icmp-echo “Operation”:
R1(config-ip-sla-echo)#?
IP SLAs Icmp Echo Configuration Commands:
default Set a command to its defaults
exit Exit operation configuration
frequency Frequency of an operation
history History and Distribution Data
no Negate a command or set its defaults
owner Owner of Entry
request-data-size Request data size
tag User defined tag
threshold Operation threshold in milliseconds
timeout Timeout of an operation
tos Type Of Service
verify-data Verify data
vrf Configure IP SLAs for a VPN Routing/Forwarding instance
R1(config-ip-sla-echo)#frequency ?
<1-604800> Frequency in seconds (default 60)
R1(config-ip-sla-echo)#frequency 10 ?
<cr>
R1(config-ip-sla-echo)#frequency 10
R1(config-ip-sla-echo)#
So I set IP SLA operation #1 to do an ICMP echo sourced from 1.1.1.1 to remote destination 5.5.5.5 every 10 seconds. Now to apply it and get this party started:
R1(config-ip-sla-echo)#exit
R1(config)#ip sla schedule 1 ?
ageout How long to keep this Entry when inactive
life Length of time to execute in seconds
recurring Probe to be scheduled automatically every day
start-time When to start this entry
<cr>
R1(config)#ip sla schedule 1 life ?
<0-2147483647> Life seconds (default 3600)
forever continue running forever
R1(config)#ip sla schedule 1 life forever ?
ageout How long to keep this Entry when inactive
recurring Probe to be scheduled automatically every day
start-time When to start this entry
<cr>
R1(config)#ip sla schedule 1 life forever start-time ?
after Start after a certain amount of time from now
hh:mm Start time (hh:mm)
hh:mm:ss Start time (hh:mm:ss)
now Start now
pending Start pending
R1(config)#ip sla schedule 1 life forever start-time now ?
ageout How long to keep this Entry when inactive
recurring Probe to be scheduled automatically every day
<cr>
R1(config)#ip sla schedule 1 life forever start-time now
R1(config)#
ASR#5
[Resuming connection 5 to r5 … ]
[OK]
R5#debug ip packet
IP packet debugging is on
Let’s see some IP SLA traffic!
R5#
May 21 23:02:50.965: IP: tableid=0, s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5 (Loopback5), routed via RIB
May 21 23:02:50.965: IP: s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5, len 64, rcvd 4
May 21 23:02:50.965: IP: tableid=0, s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), routed via FIB
May 21 23:02:50.965: IP: s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), len 64, sending
R5#
May 21 23:03:00.963: IP: tableid=0, s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5 (Loopback5), routed via RIB
May 21 23:03:00.967: IP: s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5, len 64, rcvd 4
May 21 23:03:00.967: IP: tableid=0, s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), routed via FIB
May 21 23:03:00.967: IP: s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), len 64, sending
R5#
May 21 23:03:10.965: IP: tableid=0, s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5 (Loopback5), routed via RIB
May 21 23:03:10.965: IP: s=1.1.1.1 (FastEthernet0/1), d=5.5.5.5, len 64, rcvd 4
May 21 23:03:10.965: IP: tableid=0, s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), routed via FIB
May 21 23:03:10.965: IP: s=5.5.5.5 (local), d=1.1.1.1 (FastEthernet0/1), len 64, sending
R5#u all
All possible debugging has been turned off
R5#
As can be seen at the very top, exiting once from all the way inside SLA configuration drops you straight back into Global configuration mode.
From there, “ip sla schedule # …” is the command that starts the operation and defines how long / how often it should run. I gave mine a “life” of “forever” so it runs continuously and a start-time of “now” to turn it on immediately, though there are other options for how long it runs and when it starts!
As can be seen, I hopped over to R5 quickly and ran a “debug ip packet”, and sure enough we are now seeing a ping sourced from 1.1.1.1 hitting 5.5.5.5 every 10 seconds on the dot!
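Stripped of the ? output, the whole operation on R1 comes down to a handful of lines. The threshold and timeout lines are not part of the configuration above; they are assumed values added only to show where those two options from the ? output would go (both are in milliseconds, and both default to 5000).

```
ip sla 1
 icmp-echo 5.5.5.5 source-ip 1.1.1.1
 frequency 10
 ! Assumed values, not from the lab: flag replies slower than 1 second,
 ! and give up on a probe after 2 seconds
 threshold 1000
 timeout 2000
ip sla schedule 1 life forever start-time now
```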
There are two verification commands that give two very different outputs:
“show ip sla statistics”
R1#sh ip sla stat
IPSLAs Latest Operation Statistics
IPSLA operation id: 1
Latest RTT: 1 milliseconds
Latest operation start time: 23:12:51.010 UTC Sun May 21 2017
Latest operation return code: OK
Number of successes: 63
Number of failures: 0
Operation time to live: Forever
R1#
I started to highlight the important parts, but it’s all important, look at that info!
Operation start time, operation ID, number of successes / failures, operation lifetime: this gives you the statistics of the operation at work, whereas the second command is geared more towards the configuration itself (as you might guess by the name):
“show ip sla configuration”
R1#sh ip sla config
IP SLAs Infrastructure Engine-III
Entry number: 1
Owner:
Tag:
Operation timeout (milliseconds): 5000
Type of operation to perform: icmp-echo
Target address/Source address: 5.5.5.5/1.1.1.1
Type Of Service parameter: 0x0
Request size (ARR data portion): 28
Verify data: No
Vrf Name:
Schedule:
Operation frequency (seconds): 10 (not considered if randomly scheduled)
Next Scheduled Start Time: Start Time already passed
Group Scheduled : FALSE
Randomly Scheduled : FALSE
Life (seconds): Forever
Entry Ageout (seconds): never
Recurring (Starting Everyday): FALSE
Status of entry (SNMP RowStatus): Active
Threshold (milliseconds): 5000
Distribution Statistics:
Number of statistic hours kept: 2
Number of statistic distribution buckets kept: 1
Statistic distribution interval (milliseconds): 20
Enhanced History:
History Statistics:
Number of history Lives kept: 0
Number of history Buckets kept: 15
History Filter Type: None
R1#
This also shows a more complete view of the operation’s life, including the age-out time and all the other fields I didn’t configure. So this is really the command to verify the configuration of the “Operation”, whereas the “statistics” verification command shows whether the operation is succeeding or failing.
How SLA ties into PBR with something called “Enhanced Object Tracking”
I have created an addition to the topology, a loopback interface, that will represent another route to the destination of 5.5.5.5.
Enhanced Object Tracking is the configuring of “track objects” that detect when the SLA is starting to slip. For example, a track object can be configured with a delay value so that when a ping is missed it will not report the target as unreachable until the delay timer hits zero.
Track objects are configured because Cisco IOS does not allow features like HSRP or PBR to refer directly to IP SLA. They can, however, refer to a “track object” that in turn refers to the SLA operation that is running.
So first things first: the track object must reference an SLA operation on the same router where PBR is being performed, so I will need to remove the configs from R1 and re-create them on R4… so one sec here:
R5#debug ip packet
IP packet debugging is on
R5#
May 22 00:07:19.470: IP: tableid=0, s=4.4.4.4 (FastEthernet0/1), d=5.5.5.5 (Loopback5), routed via RIB
May 22 00:07:19.470: IP: s=4.4.4.4 (FastEthernet0/1), d=5.5.5.5, len 64, rcvd 4
May 22 00:07:19.474: IP: tableid=0, s=5.5.5.5 (local), d=4.4.4.4 (FastEthernet0/1), routed via FIB
May 22 00:07:19.474: IP: s=5.5.5.5 (local), d=4.4.4.4 (FastEthernet0/1), len 64, sending
R5#u all
All possible debugging has been turned off
R5#
So we have the same config, now sourced from 4.4.4.4 going to 5.5.5.5.
So the first task is to configure a track object:
R4(config)#track ?
<1-1000> Tracked object
resolution Tracking resolution parameters
timer Polling interval timers
R4(config)#track 5 ip sla 1 ?
reachability Reachability
state Return code state
<cr>
R4(config)#track 5 ip sla 1 reachability ?
<cr>
R4(config)#track 5 ip sla 1 reachability
Note here it drops me into configuration mode for this tracking object
R4(config-track)#?
Tracking instance configuration commands:
default Set a command to its defaults
default-state Default object state
delay Tracking delay
exit Exit from tracking configuration mode
no Negate a command or set its defaults
R4(config-track)#delay ?
down Delay down change notification
up Delay up change notification
R4(config-track)#delay down ?
<0-180> Seconds to delay
R4(config-track)#delay down 30 ?
up Delay up change notification
<cr>
R4(config-track)#delay down 30
R4(config-track)#
So the SLA is polling for reachability every 10 seconds, but the target now has to stay unreachable for a full 30 seconds before this track object reports it as Down / Unreachable. For this example we’ll need to tie this track number to an IP route configured for the destination address:
R4(config)#ip route 5.5.5.5 255.255.255.255 172.12.45.5 ?
<1-255> Distance metric for this route
multicast multicast route
name Specify name of the next hop
permanent permanent route
tag Set tag for this route
track Install route depending on tracked item
<cr>
R4(config)#ip route 5.5.5.5 255.255.255.255 172.12.45.5 track 5
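Put together, the tracking configuration on R4 at this point looks roughly like the sketch below, assuming the same frequency and schedule that were used on R1. The last route is hypothetical and not part of the lab: it is a floating static route (administrative distance 10, via R1 at 172.12.14.1) added only to illustrate why you would track a route in the first place, since it gives traffic somewhere to go once the tracked route is withdrawn.

```
ip sla 1
 icmp-echo 5.5.5.5 source-ip 4.4.4.4
 frequency 10
ip sla schedule 1 life forever start-time now
!
track 5 ip sla 1 reachability
 delay down 30
!
! Installed only while track 5 is Up
ip route 5.5.5.5 255.255.255.255 172.12.45.5 track 5
! Hypothetical backup path, not configured in the original lab
ip route 5.5.5.5 255.255.255.255 172.12.14.1 10
```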
So now the route to 5.5.5.5 is being tracked. Let’s look at how it behaves in a few different scenarios on its own, before we involve PBR at all:
“show track” to verify your tracked object
R4(config)#do show track
Track 5
IP SLA 1 reachability
Reachability is Up
1 change, last change 00:07:17
Delay down 30 secs
Latest operation return code: OK
Latest RTT (millisecs) 1
Tracked by:
STATIC-IP-ROUTING 0
R4(config)#
So it’s looking pretty good, so let’s shut down R5’s loopback and see how this goes:
R4(config)#do sh track
Track 5
IP SLA 1 reachability
Reachability is Up, delayed Down (23 secs remaining)
1 change, last change 00:11:26
Delay down 30 secs
Latest operation return code: OK
Latest RTT (millisecs) 3
Tracked by:
STATIC-IP-ROUTING 0
R4(config)#do sh ip sla stat
IPSLAs Latest Operation Statistics
IPSLA operation id: 1
Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: 00:21:09 UTC Mon May 22 2017
Latest operation return code: Timeout
Number of successes: 82
Number of failures: 3
Operation time to live: Forever
R4(config)#
Before I could get back to the router to issue the command again, I got this console message:
R4(config)#
May 22 00:21:31.104: %TRACKING-5-STATE: 5 ip sla 1 reachability Up->Down
R4(config)#
So let’s look at a few things to see what this has changed, if anything:
IP Route Table:
R4#sh ip route
Gateway of last resort is not set
1.0.0.0/32 is subnetted, 1 subnets
S 1.1.1.1 [1/0] via 172.12.14.1
4.0.0.0/32 is subnetted, 1 subnets
C 4.4.4.4 is directly connected, Loopback4
172.12.0.0/16 is variably subnetted, 6 subnets, 2 masks
C 172.12.14.0/24 is directly connected, FastEthernet0/1
L 172.12.14.4/32 is directly connected, FastEthernet0/1
C 172.12.45.0/24 is directly connected, FastEthernet0/0
L 172.12.45.4/32 is directly connected, FastEthernet0/0
C 172.12.55.0/24 is directly connected, Loopback55
L 172.12.55.4/32 is directly connected, Loopback55
R4#
I tried to look at the route in more detail with “sh ip route 5.5.5.5”, but it said no route exists in the routing table, so I checked CEF to see if the route is even being considered at all:
R4#sh ip cef
Prefix Next Hop Interface
0.0.0.0/0 no route
0.0.0.0/8 drop
0.0.0.0/32 receive
1.1.1.1/32 172.12.14.1 FastEthernet0/1
4.4.4.4/32 receive Loopback4
127.0.0.0/8 drop
172.12.14.0/24 attached FastEthernet0/1
172.12.14.0/32 receive FastEthernet0/1
172.12.14.1/32 attached FastEthernet0/1
No, it isn’t. If CEF doesn’t see you as a route, you are not a route. It is, of course, still in the running configuration:
R4#sh run | i 5.5.5.5
ip route 5.5.5.5 255.255.255.255 172.12.45.5 track 5
icmp-echo 5.5.5.5 source-ip 4.4.4.4
access-list 5 permit 5.5.5.5
R4#
To round off how tying the SLA to the IP route reacts: after the track went down, the route simply disappeared from the router everywhere except the running config, while IP SLA itself keeps running even though it continues to fail:
R4#sh ip sla stat
IPSLAs Latest Operation Statistics
IPSLA operation id: 1
Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: 00:45:19 UTC Mon May 22 2017
Latest operation return code: Timeout
Number of successes: 82
Number of failures: 148
Operation time to live: Forever
R4#
Now I did a “no shut” on Lo5 on R5, but the failures continue to increment in “sh ip sla stat”, and the route is not being brought back into the route table.
I will have to revisit this, as I’m not easily finding an answer on the internet, so I will research why this happens later so I can keep moving.
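One possible explanation, offered as an unverified assumption rather than a confirmed answer: the probe targets 5.5.5.5, and the only route R4 has to 5.5.5.5 is the tracked static route itself. Once the track goes Down and that route is withdrawn, the route table above shows no other path (and no default route) to 5.5.5.5, so the probe has no way out of R4 and the track can never come back Up. A minimal sketch of one way around that is to probe the directly connected next hop instead, so the probe does not depend on the route it controls. Note that this changes what is being monitored (the neighbor at 172.12.45.5 rather than R5’s loopback), and that the operation is removed and recreated because a scheduled operation cannot be edited in place.

```
! Sketch only: probe the connected next hop, not the tracked destination
no ip sla 1
ip sla 1
 icmp-echo 172.12.45.5 source-interface FastEthernet0/0
 frequency 10
ip sla schedule 1 life forever start-time now
!
track 5 ip sla 1 reachability
 delay down 30
!
ip route 5.5.5.5 255.255.255.255 172.12.45.5 track 5
```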
Configuring PBR with the Tracking Object
***One important note about Route-Maps and configuring tracking: if the same sequence also contains a plain “set ip next-hop …” line, then once the router sees it cannot verify reachability on the verify-availability line it moves on to that next set command and routes traffic there anyway. So if you see any more set commands within the same sequence, that should be a huge red flag on exam day***
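To make that warning concrete, here is a sketch of the kind of sequence to watch for. The route-map name and both next-hop addresses are hypothetical, chosen only for illustration.

```
route-map RED-FLAG permit 10
 match ip address 5
 ! Used only while track 5 is Up
 set ip next-hop verify-availability 172.12.45.5 10 track 5
 ! Per the note above, traffic falls through to this line when track 5 is Down
 set ip next-hop 172.12.14.1
```

Whether that fall-through is wanted depends on the design: it can serve as a deliberate backup next hop, but it is a surprise if you expected the traffic to go back to normal routing.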
I actually had to completely remove and reconfigure the tracking object, as it would not let go of that Track being “Down”, but once I did I tied it to my Policy Route like this:
R4(config)#access-list 5 permit 5.5.5.5
R4(config)#route-map PBR permit 10
R4(config-route-map)#match ip add 5
R4(config-route-map)#set ip next-hop ?
A.B.C.D IP address of next hop
dynamic application dynamically sets next hop
encapsulate Encapsulation profile for VPN nexthop
peer-address Use peer address (for BGP only)
recursive Recursive next-hop
self Use self address (for BGP only)
verify-availability Verify if nexthop is reachable
R4(config-route-map)#set ip next-hop verify-availability ?
A.B.C.D IP address of next hop
<cr>
R4(config-route-map)#set ip next-hop verify-availability 5.5.5.5 ?
<1-65535> Sequence to insert into next-hop list
R4(config-route-map)#set ip next-hop verify-availability 5.5.5.5 10 ?
track set the next hop depending on the state of a tracked object
R4(config-route-map)#set ip next-hop verify-availability 5.5.5.5 10 track ?
<1-1000> tracked object number
R4(config-route-map)#set ip next-hop verify-availability 5.5.5.5 10 track 5
R4(config-route-map)#route-map PBR permit 20
R4(config-route-map)#exit
R4(config)#
That is a really long command, and I am hoping I don’t have to configure it off the top of my head on exam day!
So now I am going to see how THIS reacts to the tracking when the interface is shut down. Hopefully it doesn’t just yank the route and never return it, which was a bit frustrating to leave unresolved (for now):
R4#sh track
Track 5
IP SLA 1 state
State is Up, delayed Down (5 secs remaining)
1 change, last change 00:11:09
Delay up 5 secs, down 30 secs
Latest operation return code: OK
Latest RTT (millisecs) 1
Tracked by:
ROUTE-MAP 0
R4#sh track
May 22 01:40:43.256: %TRACKING-5-STATE: 5 ip sla 1 state Up->Down
After 30 seconds of being down, the tracking set it to Down, so let’s see whether a “no shut” on the interface works any differently than it did with the static route:
R5(config-if)#no shut
R5(config-if)#
ASR#4
[Resuming connection 4 to r4 … ]
R4#sh ip route
Gateway of last resort is not set
1.0.0.0/32 is subnetted, 1 subnets
S 1.1.1.1 [1/0] via 172.12.14.1
4.0.0.0/32 is subnetted, 1 subnets
C 4.4.4.4 is directly connected, Loopback4
5.0.0.0/32 is subnetted, 1 subnets
S 5.5.5.5 [1/0] via 172.12.45.5
172.12.0.0/16 is variably subnetted, 6 subnets, 2 masks
C 172.12.14.0/24 is directly connected, FastEthernet0/1
L 172.12.14.4/32 is directly connected, FastEthernet0/1
C 172.12.45.0/24 is directly connected, FastEthernet0/0
L 172.12.45.4/32 is directly connected, FastEthernet0/0
–More–
May 22 01:42:28.256: %TRACKING-5-STATE: 5 ip sla 1 state Down->Up
C 172.12.55.0/24 is directly connected, Loopback55
L 172.12.55.4/32 is directly connected, Loopback55
R4#
So not only was I able to verify that it didn’t remove the 5.5.5.5 route even while it was unreachable to the SLA operation, I also saw the track come back Up within what I believe was the 5 second delay I set on the track object as its “Up” delay value.
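The reconfigured track object is not shown above, but the “show track” output (“IP SLA 1 state”, “Delay up 5 secs, down 30 secs”) suggests it was recreated along these lines. This is a reconstruction from that output, not a copy of the commands actually typed.

```
track 5 ip sla 1 state
 delay down 30 up 5
```

The difference between the two keywords is small: “state” is Up only when the operation’s return code is OK, while “reachability” also counts an over-threshold reply as Up.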
I’m getting tired of beating this topic to death, so I’ll end it here, but if this route-map were applied to an interface and the SLA tracking went to a Down state, the policy’s next hop would stop being used and normal routing would take over.
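For completeness, a minimal sketch of that last step. The choice of FastEthernet0/1 (R4’s link toward R1) as the ingress interface is an assumption; PBR is applied on the interface where the traffic to be policy-routed arrives.

```
interface FastEthernet0/1
 ip policy route-map PBR
!
! Assumption: only needed if packets generated by R4 itself
! should follow the policy as well
ip local policy route-map PBR
```

Afterwards, “show ip policy” lists which interfaces have a route-map applied, and “show route-map PBR” shows the match counters for each sequence.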
My brain is kind of melting at this point and I need to try to get one topic down, so I will update the static routing portion if I find a good answer, because I actually see that a lot at my work and the fact it’s just not bringing routes back into the table is odd to me.