Cisco Intro to QoS and CoS, Part 2 -- A Deeper Dive into Marking Traffic

In our last lab, we set up a simple network, and showed how applying a QoS policy to prioritize certain traffic over other traffic could help high-priority, time-sensitive applications work properly, even during times of network congestion. But what did all of the access lists, class maps and policy maps really do? In this lab, we'll take a closer look at what the class maps and policy maps are really doing at the packet layer.

First, we'll start with a new network diagram:

As we stated last time, the first step in designing an effective QoS policy is identifying the traffic on your network, and deciding how you want to prioritize it. On this network, we will sort network traffic into the following classes:
  1. Core Control: OSPF, in this case (which is automatically identified and marked by the router itself);
  2. VoIP: We will be simulating voice traffic by streaming an MP4 video file from the host CentOS6 to hosts on the "voice" VLAN (VLAN 100);
  3. Call Signaling: We will also be simulating call signaling traffic with SSH from the "voice" VLAN (VLAN 100);
  4. Routine: Bulk data traffic from the "LAN" VLAN (VLAN 10);
  5. default: Anything not otherwise classified.

Here are the access lists that we will use to identify the traffic:
ip access-list extended CALLSIGNALING
 permit tcp any eq 22
 permit tcp any eq 22
 deny ip any any
ip access-list extended ROUTINE
 permit ip any
 permit ip any
 deny ip any any
ip access-list extended VOIP
 permit tcp any eq www
 permit tcp any eq www
 deny ip any any

As I said earlier, the router automatically identifies and marks routing protocol traffic, so we don't need to create a separate ACL for that.

After creating the ACLs to match the network traffic, we will create the class maps:
class-map match-any NETWORK_CONTROL
 match ip dscp cs6
class-map match-any VOIP
 match ip dscp ef
 match access-group name VOIP
class-map match-any CALLSIGNALING
 match ip dscp cs3
 match ip dscp af31
 match access-group name CALLSIGNALING
class-map match-any ROUTINE
 match ip dscp cs2
 match access-group name ROUTINE

Let's discuss the class map in a little more detail. First, notice that the first line of each class map contains the phrase "match-any." Take, for example, the class map for CALLSIGNALING. We have a line that states, "match ip dscp cs3," and then we have two other "match..." statements following that line. Because we created the class map using the "match-any" keyword, we are essentially using a logical OR to match either the DSCP marking CS3, the DSCP marking AF31, or the "CALLSIGNALING" access list. If we had used "match-all" instead, we would be performing a logical AND against all of the statements (which would never match, as the DSCP CS3 and DSCP AF31 markings are mutually exclusive...but it can be useful if you are matching on criteria other than DSCP markings).
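To make the OR/AND distinction concrete, here is a minimal Python sketch of the matching logic. It is only a model for illustration, not how IOS implements class maps; the packet dictionary and its keys ("dscp", "permitted_by_acl") are hypothetical stand-ins for what the router inspects.

```python
# Minimal model of class-map matching. The packet dict and its keys are
# hypothetical; they stand in for the DSCP field and the ACL lookup result.

def class_map_matches(packet, criteria, mode):
    """Return True if the packet satisfies the class map in the given mode."""
    results = [criterion(packet) for criterion in criteria]
    if mode == "match-any":
        return any(results)   # logical OR across the match statements
    if mode == "match-all":
        return all(results)   # logical AND across the match statements
    raise ValueError(f"unknown mode: {mode}")

# The three match statements of class-map CALLSIGNALING.
CALLSIGNALING = [
    lambda p: p["dscp"] == "cs3",
    lambda p: p["dscp"] == "af31",
    lambda p: p["permitted_by_acl"],
]

unmarked_ssh = {"dscp": "default", "permitted_by_acl": True}
premarked_af31 = {"dscp": "af31", "permitted_by_acl": False}

for packet in (unmarked_ssh, premarked_af31):
    print(class_map_matches(packet, CALLSIGNALING, "match-any"),
          class_map_matches(packet, CALLSIGNALING, "match-all"))
```

Both sample packets match under "match-any" and neither matches under "match-all," because no packet can carry CS3 and AF31 at the same time.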

We will explain why we are matching against either DSCP markings or an access list shortly, but for now, let's proceed to the policy-map:
policy-map EGRESS
 class VOIP
  priority percent 20
 class CALLSIGNALING
  bandwidth percent 5
 class ROUTINE
  bandwidth percent 50
  random-detect dscp-based
  bandwidth percent 5
 class class-default
  bandwidth percent 1
  random-detect dscp-based
policy-map EGRESS-BW
 class class-default
  shape average 10000000
  service-policy EGRESS
policy-map INGRESS
 class VOIP
  set dscp ef
 class CALLSIGNALING
  set dscp af31
 class ROUTINE
  set dscp cs2
 class class-default
  set dscp cs1

We have created three policy-maps in this example: EGRESS, EGRESS-BW, and INGRESS (creative names, no?).

EGRESS-BW is pretty simple: essentially, we are shaping all of the traffic in this policy to 10Mbps, then calling the "EGRESS" policy. This is a pretty common strategy, as it allows you to create a parent policy (EGRESS-BW, in this case) to define the shaper/policer, and then a child policy to break out the bandwidth to the individual traffic classes. This allows for a very flexible approach, since you can create multiple parent policies that call the same child policy, allowing you to quickly make changes to your traffic shaper.
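The arithmetic behind the parent/child arrangement can be sketched in a few lines of Python. This assumes, as the router output later in this lab suggests, that the child policy's percentages are calculated against the parent's shaped rate rather than the physical interface speed. The function names are made up for illustration.

```python
# Sketch: how the 10 Mbps shaper in EGRESS-BW is carved up by EGRESS.
SHAPE_RATE_BPS = 10_000_000  # "shape average 10000000"

# Percentages per class, as reported by "show policy-map interface".
CLASS_PERCENT = {
    "VOIP": 20,
    "CALLSIGNALING": 5,
    "ROUTINE": 50,
    "class-default": 1,
}

def class_rate_kbps(percent, parent_rate_bps=SHAPE_RATE_BPS):
    """Bandwidth for a class, in kbps, as a percentage of the shaped rate."""
    return parent_rate_bps * percent // 100 // 1000

def shaper_interval_ms(bc_bits, rate_bps=SHAPE_RATE_BPS):
    """Shaper interval (Tc) in ms: time to send one sustained burst (Bc)."""
    return bc_bits * 1000 / rate_bps

for name, percent in CLASS_PERCENT.items():
    print(f"{name}: {class_rate_kbps(percent)} kbps")

print(f"Tc = {shaper_interval_ms(250_000)} ms")  # 250000 sustain bits/int
```

The results (2000, 500, 5000 and 100 kbps, with a 25 ms interval) line up with the "Bandwidth ... (kbps)" and "Interval (ms)" values the router reports for this policy.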

The EGRESS policy matches traffic against the class-maps that we previously defined, then divvies the shaped bandwidth up among the individual traffic classes by percentage. Two details in the EGRESS policy are worth a little extra discussion.

First, notice that class VOIP uses "priority percent..." whereas all of the other classes use "bandwidth percent..." This places VOIP traffic on the priority (low-latency) queue. The other classes buffer traffic, placing it on a queue until the router gets a chance to transmit it on the interface; traffic in class VOIP, on the other hand, is transmitted ahead of everything else. This is because VoIP traffic is very sensitive to latency and jitter, and waiting in a queue behind other traffic causes both. However, be aware that during congestion the priority queue is policed to its configured rate (20 percent, in this case), and any VOIP traffic in excess of that rate will be DISCARDED! Therefore, you don't want to use the "priority" statement unless it is for a traffic class that can tolerate drops, but cannot tolerate jitter.

Second, notice that class "ROUTINE" and class "class-default" (a built-in traffic class on Cisco routers) use the statement "random-detect dscp-based". This statement enables weighted random early detection (WRED), which randomly discards packets as a queue begins to fill, using drop thresholds selected by each packet's DSCP marking. The thresholds at which WRED begins to discard traffic are configurable, but in this example we are using the default settings, as tuning them is beyond the scope of this tutorial. At first, randomly discarding traffic might sound like a bad idea, but the idea behind this strategy is that by randomly discarding the occasional packet before the queue is completely full, the hosts sending that traffic (TCP senders, in particular) will slow down the rate at which they transmit, thus delaying the onset of congestion. However, we do not want to enable this feature in class VOIP or class CALLSIGNALING, as this is priority traffic that you really don't want to drop.
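For readers who want to see the shape of the random early detection curve, here is a small Python sketch of the textbook calculation. It is a simplified model, not the router's actual code. The thresholds and mark probability used in the example (24, 40 and 1/10 for CS2) and the exponential weight of 9 are the defaults reported by "show policy-map interface" in this lab; the function names are hypothetical.

```python
# Textbook RED model: drop probability rises linearly between the minimum and
# maximum thresholds, based on a smoothed (average) queue depth.

def wred_drop_probability(avg_depth, min_thresh, max_thresh, mark_denominator):
    """Probability that an arriving packet is randomly dropped."""
    if avg_depth < min_thresh:
        return 0.0                      # queue is short: never drop
    if avg_depth > max_thresh:
        return 1.0                      # past the maximum: drop everything
    slope = (avg_depth - min_thresh) / (max_thresh - min_thresh)
    return slope / mark_denominator     # peaks at 1/denominator

def update_average(old_avg, current_depth, weight=9):
    """Exponentially weighted average queue depth (weight n uses 1/2**n)."""
    factor = 1 / 2 ** weight
    return old_avg * (1 - factor) + current_depth * factor

# CS2 defaults from this lab: min 24, max 40, mark probability 1/10.
for depth in (20, 24, 32, 40, 41):
    print(depth, wred_drop_probability(depth, 24, 40, 10))
```

With these defaults, CS2 traffic is never dropped while the average depth is below 24 packets, is dropped about 5 percent of the time at a depth of 32, and is always dropped once the average exceeds 40.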

Finally, the INGRESS policy is where the traffic inbound to the router from the network clients (Knoppix_Clone_1 and Knoppix_Clone_2) is identified and marked. The only action taken by the INGRESS policy is applying the DSCP markings to the traffic, based upon the access control lists. In truth, if an incoming packet already had a DSCP marking, the INGRESS policy would happily classify traffic based upon that, unless other configuration items were present in the router config, but for now, let's just assume that all incoming traffic has no DSCP markings, and therefore the only lines in the class-maps that will match are the "match access-group name..." statements.
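The marking decision can be modeled in Python as a first-match-wins walk through the classes. This is a sketch under several assumptions: that the two VLANs are /24 networks (192.168.100.x for voice, 192.168.1.x for the LAN), that the VOIP and CALLSIGNALING access lists select voice-VLAN traffic to ports 80 and 22 respectively, and that the classes are evaluated in the order VOIP, CALLSIGNALING, ROUTINE, class-default. The function name and arguments are hypothetical.

```python
import ipaddress

# Assumed subnets; the lab only describes them as 192.168.100.x and 192.168.1.x.
VOICE_VLAN = ipaddress.ip_network("192.168.100.0/24")
LAN_VLAN = ipaddress.ip_network("192.168.1.0/24")

def ingress_dscp(src_ip, dst_port, current_dscp="default"):
    """Return the DSCP name the INGRESS policy would set (first match wins)."""
    src = ipaddress.ip_address(src_ip)
    # class VOIP: already marked EF, or permitted by the VOIP access list
    if current_dscp == "ef" or (src in VOICE_VLAN and dst_port == 80):
        return "ef"
    # class CALLSIGNALING: CS3/AF31, or permitted by the CALLSIGNALING list
    if current_dscp in ("cs3", "af31") or (src in VOICE_VLAN and dst_port == 22):
        return "af31"
    # class ROUTINE: CS2, or permitted by the ROUTINE access list
    if current_dscp == "cs2" or src in LAN_VLAN:
        return "cs2"
    # class class-default: everything else
    return "cs1"

print(ingress_dscp("192.168.100.10", 22))       # unmarked SSH from the voice VLAN
print(ingress_dscp("10.0.0.1", 443, "cs3"))     # pre-marked CS3 is re-marked
```

Note the second case: because class CALLSIGNALING also matches on CS3, a packet that arrives pre-marked CS3 would leave marked AF31.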

So...this is great, in theory, but does it actually work? To find out, I ran tcpdump on the CentOS6 host to capture all incoming network traffic. If the router configuration is good, then...:
  1. Any traffic coming from 192.168.1.x should be marked with a DSCP value of CS2;
  2. Any traffic coming from 192.168.100.x outbound to any host on port 22 should be marked with a DSCP value of AF31;
  3. Any traffic coming from 192.168.100.x outbound to any host on port 80 should be marked with a DSCP value of EF;
  4. Any traffic originating on the router itself that is identified as OSPF should be marked with a DSCP value of CS6 [1].
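When checking these markings in a capture, it helps to know the numeric value behind each DSCP name. The Python sketch below derives them from the standard DSCP bit layout and builds filter strings for tcpdump and Wireshark; the helper names are made up for illustration, and the tcpdump expression needs to be quoted when typed in a shell.

```python
# DSCP occupies the upper six bits of the IP header's second byte (old ToS byte).

def class_selector(n):
    """CSn: the class number in the top three bits of the DSCP."""
    return n << 3

def assured_forwarding(af_class, drop_precedence):
    """AFxy: class x in the top three bits, drop precedence y in the next two."""
    return (af_class << 3) | (drop_precedence << 1)

DSCP = {
    "cs2": class_selector(2),            # 16
    "af31": assured_forwarding(3, 1),    # 26
    "ef": 46,                            # Expedited Forwarding
    "cs6": class_selector(6),            # 48
}

def tos_byte(dscp):
    """Value of the full byte when the two ECN bits are zero."""
    return dscp << 2

def tcpdump_filter(dscp):
    """Capture filter matching the DSCP bits of IPv4 header byte 1."""
    return f"(ip[1] & 0xfc) == 0x{tos_byte(dscp):02x}"

def wireshark_filter(dscp):
    """Display filter on Wireshark's DSCP field."""
    return f"ip.dsfield.dscp == {dscp}"

for name, value in DSCP.items():
    print(name, value, tcpdump_filter(value), wireshark_filter(value))
```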

After running several tests (streaming the MP4 file on CentOS6 to the Knoppix Clone on the VoIP VLAN, SSH'ing to CentOS6 from the Knoppix host on the VoIP VLAN, and connecting to the HTTP process on CentOS 6 from the Knoppix host on the LAN VLAN), I copied the capture file from CentOS6 to my local desktop, and opened up the file with Wireshark. Let's look at traffic originating on 192.168.1.x:

Yep, "Class Selector 2" is the value of the DSCP marking for traffic originating on 192.168.1.x, so it looks like our QoS policies are correctly identifying traffic on the LAN VLAN, and correctly marking it with a CS2 DSCP marking. Next, let's look for traffic originating on the Voice VLAN, outbound for CentOS6 on port 22 (SSH):

Perfect! This screen capture shows that traffic originating on the Voice VLAN (192.168.100.x), sent to CentOS6 on port 22, was correctly identified and marked with a DSCP value of AF31. Next, let's see if traffic from 192.168.100.x to CentOS6 on port 80 is properly marked as "EF":

Yep, this traffic has a DSCP marking of "EF," just as expected. Finally, let's verify that the OSPF multicast traffic from the router is properly marked as CS6:

That looks good, too.

Now that we have verified that traffic is being classified and marked appropriately, let's look at a very useful command for troubleshooting QoS and congestion on a Cisco router. Suppose that one of your users calls to report problems with their voice-over-IP telephones. They've already had the telephone tech look at the phone and the Call Manager, but because the phone tech found nothing wrong, they suspect it is a network problem. How can you tell if your QoS policy is dropping packets, due to excessive traffic in one of your queues?

If I had received a call such as this, I would start by looking at the QoS statistics for the outbound interface on the router -- in this case, FastEthernet1/0:
R1#sho policy-map int fa1/0  

  Service-policy output: EGRESS-BW

    Class-map: class-default (match-any)
      4298 packets, 316655 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
      Match: any
      Traffic Shaping
           Target/Average   Byte   Sustain   Excess    Interval  Increment
             Rate           Limit  bits/int  bits/int  (ms)      (bytes)  
         10000000/10000000  62500  250000    250000    25        31250    

        Adapt  Queue     Packets   Bytes     Packets   Bytes     Shaping
        Active Depth                         Delayed   Delayed   Active
        -      0         4298      316655    0         0         no

      Service-policy : EGRESS

        Class-map: VOIP (match-any)
          1556 packets, 103168 bytes
          5 minute offered rate 0 bps, drop rate 0 bps
          Match: ip dscp ef
            1556 packets, 103168 bytes
            5 minute rate 0 bps
          Match: access-group name VOIP
            0 packets, 0 bytes
            5 minute rate 0 bps
            Strict Priority
            Output Queue: Conversation 264
            Bandwidth 20 (%)
            Bandwidth 2000 (kbps) Burst 50000 (Bytes)
            (pkts matched/bytes matched) 0/0
            (total drops/bytes drops) 0/0

        Class-map: CALLSIGNALING (match-any)
          1035 packets, 79610 bytes
          5 minute offered rate 0 bps, drop rate 0 bps
          Match: ip dscp cs3
            0 packets, 0 bytes
            5 minute rate 0 bps
          Match: ip dscp af31
            1035 packets, 79610 bytes
            5 minute rate 0 bps
          Match: access-group name CALLSIGNALING
            0 packets, 0 bytes
            5 minute rate 0 bps
            Output Queue: Conversation 265
            Bandwidth 5 (%)
            Bandwidth 500 (kbps)
            (pkts matched/bytes matched) 0/0
        (depth/total drops/no-buffer drops) 0/0/0
             exponential weight: 9
             mean queue depth: 0

   dscp    Transmitted      Random drop      Tail drop    Minimum Maximum  Mark
           pkts/bytes       pkts/bytes       pkts/bytes    thresh  thresh  prob
   af11       0/0               0/0              0/0           32      40  1/10
   af12       0/0               0/0              0/0           28      40  1/10
   af13       0/0               0/0              0/0           24      40  1/10
   af21       0/0               0/0              0/0           32      40  1/10
   af22       0/0               0/0              0/0           28      40  1/10
   af23       0/0               0/0              0/0           24      40  1/10
   af31    1035/79610           0/0              0/0           32      40  1/10
   af32       0/0               0/0              0/0           28      40  1/10
   af33       0/0               0/0              0/0           24      40  1/10
   af41       0/0               0/0              0/0           32      40  1/10
   af42       0/0               0/0              0/0           28      40  1/10
   af43       0/0               0/0              0/0           24      40  1/10
    cs1       0/0               0/0              0/0           22      40  1/10
    cs2       0/0               0/0              0/0           24      40  1/10
    cs3       0/0               0/0              0/0           26      40  1/10
    cs4       0/0               0/0              0/0           28      40  1/10
    cs5       0/0               0/0              0/0           30      40  1/10
    cs6       0/0               0/0              0/0           32      40  1/10
    cs7       0/0               0/0              0/0           34      40  1/10
     ef       0/0               0/0              0/0           36      40  1/10
   rsvp       0/0               0/0              0/0           36      40  1/10
default       0/0               0/0              0/0           20      40  1/10

        Class-map: ROUTINE (match-any)
          284 packets, 23609 bytes
          5 minute offered rate 0 bps, drop rate 0 bps
          Match: ip dscp cs2
            284 packets, 23609 bytes
            5 minute rate 0 bps
          Match: access-group name ROUTINE
            0 packets, 0 bytes
            5 minute rate 0 bps
            Output Queue: Conversation 266
            Bandwidth 50 (%)
            Bandwidth 5000 (kbps)
            (pkts matched/bytes matched) 0/0
        (depth/total drops/no-buffer drops) 0/0/0
             exponential weight: 9
             mean queue depth: 0

   dscp    Transmitted      Random drop      Tail drop    Minimum Maximum  Mark
           pkts/bytes       pkts/bytes       pkts/bytes    thresh  thresh  prob
   af11       0/0               0/0              0/0           32      40  1/10
   af12       0/0               0/0              0/0           28      40  1/10
   af13       0/0               0/0              0/0           24      40  1/10
   af21       0/0               0/0              0/0           32      40  1/10
   af22       0/0               0/0              0/0           28      40  1/10
   af23       0/0               0/0              0/0           24      40  1/10
   af31       0/0               0/0              0/0           32      40  1/10
   af32       0/0               0/0              0/0           28      40  1/10
   af33       0/0               0/0              0/0           24      40  1/10
   af41       0/0               0/0              0/0           32      40  1/10
   af42       0/0               0/0              0/0           28      40  1/10
   af43       0/0               0/0              0/0           24      40  1/10
    cs1       0/0               0/0              0/0           22      40  1/10
    cs2     284/23609           0/0              0/0           24      40  1/10
    cs3       0/0               0/0              0/0           26      40  1/10
    cs4       0/0               0/0              0/0           28      40  1/10
    cs5       0/0               0/0              0/0           30      40  1/10
    cs6       0/0               0/0              0/0           32      40  1/10
    cs7       0/0               0/0              0/0           34      40  1/10
     ef       0/0               0/0              0/0           36      40  1/10
   rsvp       0/0               0/0              0/0           36      40  1/10
default       0/0               0/0              0/0           20      40  1/10

        Class-map: class-default (match-any)
          1423 packets, 110268 bytes
          5 minute offered rate 0 bps, drop rate 0 bps
          Match: any
            Output Queue: Conversation 267
            Bandwidth 1 (%)
            Bandwidth 100 (kbps) Max Threshold 64 (packets)
            (pkts matched/bytes matched) 0/0
        (depth/total drops/no-buffer drops) 0/0/0

cough...cough...That's a lot of output...what does it mean?

First, we need a little more information from the user. Were they having problems answering the phone, transferring the call to another number, or hanging up once the call was complete? If so, you would need to look into the call signaling queue. Was the call garbled, or was the audio cutting out during the call? In that case, you would need to look at the VOIP queue. Let's assume that the caller complained of call signaling symptoms. We would narrow our focus to the statistics relating to class CALLSIGNALING. I won't re-copy all of the output from class CALLSIGNALING, since it's so long, but scroll back up and look at the output starting with "Class-map: CALLSIGNALING (match-any)". The first thing I would check is how many drops you've had in this queue. See the line that says "(depth/total drops/no-buffer drops) 0/0/0"? The queue depth of 0 tells you that no packets are sitting in the buffer waiting to be transmitted, and the two drop counters of 0 tell you that no packets have been dropped since the last time the counters were cleared or since the service policy was applied (whichever was later). In other words, you have allocated adequate bandwidth for class CALLSIGNALING. Also, look at this part of the output:
   dscp    Transmitted      Random drop      Tail drop    Minimum Maximum  Mark
           pkts/bytes       pkts/bytes       pkts/bytes    thresh  thresh  prob
   af31    1035/79610           0/0              0/0           32      40  1/10

"Random drops" are packets that random early detection discarded before the queue was completely full. "Tail drops" are packets that arrived after the queue had already buffered all the packets it can hold, and were discarded because there was no room left on the queue to store them. In this case, both tail drops and random drops are zero, meaning that the QoS policy has not dropped any traffic with the AF31 marking (since that was all that we were marking with our INGRESS policy; typically, you would want to look at both CS3 and AF31 in a call signaling queue).

Since there are no dropped packets in the call signaling queue, you would need to look elsewhere for the problem (for example, look for dropped packets on the router on the other side of the circuit, or make sure that the router is recognizing traffic from the phone as belonging to class VOIP or class CALLSIGNALING, as appropriate). However, suppose you saw output like this:
        Class-map: CALLSIGNALING (match-any)
          2901880 packets, 192238996 bytes
          30 second offered rate 4000 bps, drop rate 0 bps
          Match: ip dscp cs3 (24) af31 (26)
            2901880 packets, 192238996 bytes
            30 second rate 4000 bps
          queue limit 64 packets
          (queue depth/total drops/no-buffer drops) 0/12155/0
          (pkts output/bytes output) 2889729/190911691
          bandwidth 320 kbps

In this case, you have dropped 12,155 packets(!) since the last time the counters were cleared (which was about four days ago, although you can't tell that from the output above). This means that class CALLSIGNALING is receiving more traffic than its allocated bandwidth can carry, and either 1) you need to allocate more bandwidth to handle the call volume, or 2) you have traffic on this circuit that is not being marked appropriately. On this router, I strongly suspect that there is traffic on the circuit that is being marked incorrectly, as we have already allocated 320 kbps to class CALLSIGNALING, which is a LOT of bandwidth for call signaling traffic.
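If you check these counters regularly, a few lines of Python can pull the drop figures out of the text and put them in proportion. This is only a sketch that handles the two output formats shown in this lab; other IOS versions may word these lines differently, and the function name is hypothetical.

```python
import re

# Abbreviated copy of the CALLSIGNALING output shown above.
SAMPLE = """\
        Class-map: CALLSIGNALING (match-any)
          2901880 packets, 192238996 bytes
          30 second offered rate 4000 bps, drop rate 0 bps
          queue limit 64 packets
          (queue depth/total drops/no-buffer drops) 0/12155/0
          (pkts output/bytes output) 2889729/190911691
"""

# Matches both "(depth/total drops/...)" and "(queue depth/total drops/...)".
DROPS_RE = re.compile(
    r"\((?:queue )?depth/total drops/no-buffer drops\)\s+(\d+)/(\d+)/(\d+)"
)
PACKETS_RE = re.compile(r"^\s*(\d+) packets, \d+ bytes", re.MULTILINE)

def drop_summary(class_output):
    """Extract packet and drop counters for one class and compute the drop rate."""
    packets = int(PACKETS_RE.search(class_output).group(1))
    depth, total_drops, no_buffer = map(int, DROPS_RE.search(class_output).groups())
    percent = 100 * total_drops / packets if packets else 0.0
    return {
        "packets": packets,
        "queue_depth": depth,
        "total_drops": total_drops,
        "no_buffer_drops": no_buffer,
        "drop_percent": percent,
    }

summary = drop_summary(SAMPLE)
print(f"{summary['total_drops']} drops, {summary['drop_percent']:.2f}% of packets")
```

For the sample above, this works out to roughly 0.42 percent of the packets offered to the class.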

[1] Yes, it is silly to run OSPF on this router, since there are no other routers with which it can share routes. The only reason I configured OSPF in this lab is to show that the router does, indeed, automatically mark OSPF traffic with the CS6 DSCP marking.

