Thursday, October 27, 2016

Cisco Advanced Routing Lab: IP Service Level Agreements

So you've taken all of the pieces that we've discussed so far -- Multicast, QoS, etc. -- and built a solid network for your customers. How do you prove to your customers that they are getting the quality that they are paying for? Arguably more important, how do you keep tabs on what's happening on your network so that you can fix any problems before your customer's begin complaining about them? As I'm sure you've guessed, there is a solution that is built in to most reasonably modern versions of Cisco IOS, and that solution is called "ip sla" (service level agreements). To mock up this lab, I had to upgrade my virtual routers slightly by adding IOS images for Cisco 7200 routers, as the IOS versions I had for the 3640 and 2651 routers I've been modeling don't support the ip sla feature.

Basically, the ip sla feature sends traffic across the network to measure packet loss, latency, jitter, etc. What's nice is that ip sla's allow you to craft the packets (or datagrams) that will be sent to mimic whatever kind of traffic you want to monitor. For example, you can set the TOS markings to monitor priority traffic across the network. You can tell the IP SLA engine to send data inside the G729 codec to measure how voice traffic is affected by the network. You can specify how often to send this traffic across the network, and how many packets to send at a time. It's kind of like the ping tool on steroids ;)

Suppose you have the following network...:


Because we have clients connected to R5, and a streaming server connected through R3, we want to make sure that jitter and latency is within acceptable limits between these two routers. We could build a Smokeping server (and in fact, Smokeping is running on the server labeled "CentOS6"), but Smokeping just sends standard ICMP packets. What if we want to measure latency of UDP traffic with specific QoS markings? Smokeping can't do that. Fortunately, this is exactly the kind of problem that the ip sla feature was designed to address, so we will enable ip sla on R3 and R5:
R3#sho run | begin ip sla
ip sla responder
ip sla 10015
udp-jitter 10.254.254.5 32766 source-ip 10.254.254.3 num-packets 20
tos 184
frequency 300
ip sla schedule 10015 life forever start-time now
ip sla 10025
udp-echo 10.254.254.5 32764 source-ip 10.254.254.3
ip sla schedule 10025 life forever start-time now

...and on R5:
R5#sho run | begin ip sla
ip sla responder
ip sla 10013
udp-jitter 10.254.254.3 32766 source-ip 10.254.254.5 num-packets 20
tos 184
frequency 300
ip sla schedule 10013 life forever start-time now
ip sla 10023
udp-echo 10.254.254.3 32764 source-ip 10.254.254.5
ip sla schedule 10023 life forever start-time now

The first line, "ip sla responder" tells the router to respond to IP SLA traffic. Next, we create an IP SLA process to send traffic to remote destination. The 5-digit identifier is simply a unique identifier on that router; in my case, the first three digits were pretty much arbitrary, the fourth digit I used to separate the UDP jitter test from the UDP echo test, and the last digit was taken from the router number (i.e., xxxx3 was a query for R3, xxxx5 was a query for R5, etc.). This is perhaps overkill with the examples I've shown, but on R1, for example, I set up similar IP SLA probes for R2-R5. Next, I specified what kind of test I wanted to run ("udp-jitter, udp-echo -- there are many, many more options to choose from), the destination IP address, a random port number to probe on the destination, the source IP address to use in the outbound datagram, and the optional "num-packets" flag to specify how many packets (grrr...not to be pedantic, but UDP -> datagram, Cisco! In technical fields, precise terminology often is important </rant>). In the UDP jitter test, the next line specifies that we want to apply a TOS marking so that this packet receives the same kind of priority handling that VoIP traffic (DSCP EF) would receive. After that, I specified a frequency of one test every 5 minutes for the UDP jitter test (I left the UDP echo test at the default settings). Finally, I specified the schedule for the IP SLA probes, start now and run indefinitely. However, if you wanted to monitor performance over a weekend or for a short period during backups at night, you could specify different values:
R3(config)#ip sla schedule 10023 life ?
  <0-2147483647>  Life seconds (default 3600)
  forever         continue running forever

R3(config)#ip sla schedule 10023 life 18600 ?
  ageout      How long to keep this Entry when inactive
  recurring   Probe to be scheduled automatically every day
  start-time  When to start this entry
  <cr>

R3(config)#ip sla schedule 10023 life 18600 recurring ?
  ageout      How long to keep this Entry when inactive
  start-time  When to start this entry
  <cr>

R3(config)#ip sla schedule 10023 life 18600 recurring start-time ?
  after     Start after a certain amount of time from now
  hh:mm     Start time (hh:mm)
  hh:mm:ss  Start time (hh:mm:ss)
  now       Start now
  pending   Start pending

R3(config)#ip sla schedule 10023 life 18600 recurring start-time 23:00 ?
  <1-31>  Day of the month
  MONTH   Month of the year
  ageout  How long to keep this Entry when inactive
  <cr>

R3(config)#$dule 10023 life 18600 recurring start-time 23:00 November 1 ?
  ageout  How long to keep this Entry when inactive
  <cr>

Cool. So...what happens when you enable IP SLA monitoring on your routers?

R3#sho ip sla statistics
IPSLAs Latest Operation Statistics

IPSLA operation id: 10015
Type of operation: udp-jitter
        Latest RTT: 17 milliseconds
Latest operation start time: *00:20:17.115 AKDT Wed Oct 26 2016
Latest operation return code: OK
RTT Values:
        Number Of RTT: 19               RTT Min/Avg/Max: 1/17/51 milliseconds
Latency one-way time:
        Number of Latency one-way Samples: 0
        Source to Destination Latency one way Min/Avg/Max: 0/0/0 milliseconds
        Destination to Source Latency one way Min/Avg/Max: 0/0/0 milliseconds
Jitter Time:
        Number of SD Jitter Samples: 17
        Number of DS Jitter Samples: 18
        Source to Destination Jitter Min/Avg/Max: 0/11/39 milliseconds
        Destination to Source Jitter Min/Avg/Max: 0/8/22 milliseconds
Packet Loss Values:
        Loss Source to Destination: 1           Loss Destination to Source: 0
        Out Of Sequence: 0      Tail Drop: 0
        Packet Late Arrival: 0  Packet Skipped: 0
Voice Score Values:
        Calculated Planning Impairment Factor (ICPIF): 0
        Mean Opinion Score (MOS): 0
Number of successes: 6
Number of failures: 0
Operation time to live: Forever



IPSLA operation id: 10025
Type of operation: udp-echo
        Latest RTT: 8 milliseconds
Latest operation start time: *00:24:23.535 AKDT Wed Oct 26 2016
Latest operation return code: OK
Number of successes: 58
Number of failures: 0
Operation time to live: Forever


R3#

That is at once a lot of information and not nearly as much information as you might hope to receive. While there is a lot of output showing quite a bit of detail, this is only the results of the last test. If you want any kind of historical perspective, you will need to put in some additional work. For example, I believe that you can have the results of the IP SLA monitoring logged to a syslog server (Splunk, perhaps?), although I haven't tried that. Alternatively, you can use SNMP to pull the results, although finding the right OIDs to query is a bit of a chore.

For now, let's focus on the CLI output. First, the UDP jitter test: the executive summary of the IP SLA monitoring is presented at the very end of the output:
Voice Score Values:
        Calculated Planning Impairment Factor (ICPIF): 0
        Mean Opinion Score (MOS): 0
Number of successes: 6
Number of failures: 0

A quick Google search provides a little insight into this output. Both ICPIF and MOS attempt to provide a metric for evaluating how suitable a network is for Voice over IP telephony, with lower scores being better than higher scores.
ICPIF values numerically less than 20 are generally considered "adequate."

MOS, on the other hand, is rated on a scale from 1 to 5, with 1 being very poor quality and 5 being very good quality. Note: In my output, the given MOS score was zero. Per Cisco, this means that the "...MOS data could not be generated for the given operation."

That should be enough to get you started with IP SLA monitoring on IOS. It's definitely a feature worth checking out if you will be using Cisco routers on any kind of production network.

No comments:

Post a Comment