Sunday, October 23, 2016

Lesson 16: VRRP

In the last lesson that I wrote while working on my CCNA certification, I introduced the concept of router redundancy via a Cisco proprietary protocol known as HSRP, or "Hot Standby Router Protocol." However, HSRP is not the only way to create a redundant data connection for your office. In this lab, we'll look at a second, similar protocol known as VRRP, or "Virtual Router Redundancy Protocol."

Disclaimer:
The configuration document I used to play with VRRP in this lab didn't work exactly as advertised on the routers I was emulating. In fairness, Cisco 3640 routers are decidedly, ummmm, "old-school" (read that: obsolete), so it's entirely possible that the syntax has changed on more modern platforms that are running more recent versions of IOS. However, what I present here should be close enough to get you started. Here (pdf) is the link to the Cisco document with the slightly different syntax.

As usual, we'll start with the network diagram:


We'll set up lo0 and fa1/0 on R1 and R2 as normal. R4 exists only to act as a DHCP server, and R3 serves as a destination network provider. We'll establish OSPF between R1, R2 and R3, using network statements for 100.64.1.0/30 and 100.64.2.0/30 and using "redistribute connected subnets." On our client, "Knoppix Clone 1," we'll set the default gateway to 100.64.0.1 (on the 100.64.0.0/29 client LAN). So far, nothing unexpected, right?

Just to recap, the problem we want to solve is this: what happens when fa0/0 goes down on our default gateway? If R2 did not exist in this network, R1 would be our single point of failure. If we lost R1, the clients on our LAN could no longer reach the servers hanging off of R3. To address this, we set up two routers in parallel so that we have a redundant path to R3. However, a client PC (or router or...) normally has just one default gateway configured, and it won't fail over to a second one on its own. HSRP and VRRP were designed to address this problem. In both scenarios, you configure a single default gateway on your client network, then use either HSRP or VRRP to shuffle that default gateway address between multiple routers. To set it up, you...:

  1. Enter configuration mode;
  2. Switch to the interface facing your client LAN;
  3. Add an IP address within the subnet of your client LAN;
  4. Configure a meaningful description of the VRRP group;
  5. Configure the client's default gateway address in the VRRP group;
  6. Set the VRRP priority for the router (a higher value takes priority over a lower value);
  7. Set the VRRP advertisement and preempt delay timers.


Here's how the configuration looks on R1...:
interface FastEthernet0/0
ip address 100.64.0.2 255.255.255.248
vrrp 10 description VRRP Group
vrrp 10 ip 100.64.0.1
vrrp 10 preempt delay minimum 3
vrrp 10 priority 254
end

...and on R2:
interface FastEthernet0/0
ip address 100.64.0.3 255.255.255.248
vrrp 10 description VRRP Group
vrrp 10 ip 100.64.0.1
vrrp 10 preempt delay minimum 3
vrrp 10 priority 128
end
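
To make the priority setting a little more concrete, here is a minimal Python sketch of how the master gets picked. It is only an illustration, not anything IOS actually runs: the function name elect_master and the tuple layout are my own invention. It assumes the usual VRRP rule that the highest priority wins, with a tie going to the higher interface IP address.

```python
from ipaddress import IPv4Address

def elect_master(routers):
    """Return the name of the router that should become VRRP master.

    routers: list of (name, priority, interface_ip) tuples for the
    routers that are currently up (hypothetical layout for this sketch).
    Highest priority wins; a tie goes to the higher interface IP.
    """
    name, _, _ = max(routers, key=lambda r: (r[1], IPv4Address(r[2])))
    return name

# Values taken from the R1 and R2 configurations above.
group_10 = [("R1", 254, "100.64.0.2"), ("R2", 128, "100.64.0.3")]
print(elect_master(group_10))      # R1 while both routers are up
print(elect_master(group_10[1:]))  # R2 once R1 drops out
```

With the priorities used in this lab, R1 should be the master whenever it is up, and R2 should only take over when R1 disappears.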


NOTE:
I also added the following line to the config...:
R1(config-if)#vrrp 10 timers advertise 1

...to set VRRP to send an "advertisement" every second. However, this is the default behaviour for VRRP, and therefore, it didn't show up in the config until I changed it for testing. Anyway, does it work?

Let's traceroute from R4 to R3:
R4#traceroute 10.254.254.3

Type escape sequence to abort.
Tracing the route to 10.254.254.3

  1 100.64.0.2 4 msec 4 msec 4 msec
  2 100.64.1.2 12 msec 8 msec 8 msec
R4#

Now, if I shut down fa0/0 on R1, I should see a short interruption in service, followed by R2 picking up the traffic:
R4#ping 10.254.254.3 repeat 100

Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 10.254.254.3, timeout is 2 seconds:
!!!!!!!!!!!!!..!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 97 percent (97/100), round-trip min/avg/max = 4/11/36 ms
R4#traceroute 10.254.254.3

Type escape sequence to abort.
Tracing the route to 10.254.254.3

  1 100.64.0.3 8 msec 4 msec 8 msec
  2 100.64.2.2 8 msec 12 msec 8 msec
R4#

Notice how the first hop originally was 100.64.0.2, but now it's 100.64.0.3? VRRP has failed over the virtual 100.64.0.1 IP address from R1 to R2, which is reflected in the traceroute output.
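
If you want to put a rough number on that "short interruption," a few lines of Python will tally the ping marks. This is just a sketch: summarize_ping is a made-up helper, and the marks string is rebuilt from the counts in the output above (97 replies, 3 timeouts), so the exact position of the losses is approximate.

```python
def summarize_ping(marks, timeout_s=2):
    """Tally IOS ping marks: '!' is a reply, '.' is a timeout.

    Returns (received, lost, seconds spent waiting on timeouts).
    """
    received = marks.count("!")
    lost = marks.count(".")
    return received, lost, lost * timeout_s

# Rebuilt from the ping output above: 100 probes, 3 of them lost.
marks = "!" * 13 + ".." + "!" + "." + "!" * 83
print(summarize_ping(marks))  # (97, 3, 6)
```

Three lost probes at a two-second timeout means R4 spent about six seconds waiting on replies that never came, which gives a ballpark for how long the failover took.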

With a little effort, we can see what's happening at the Ethernet level, too, and it's even more interesting. We'll start by verifying our configuration. R1 is currently the VRRP master (i.e., it's hosting the IP address 100.64.0.1), and R2 is the backup:
R1#sho vrrp brief
Interface          Grp Pri Time  Own Pre State   Master addr     Group addr
Fa0/0              10  254 3007       Y  Master  100.64.0.2      100.64.0.1    
R1#
----------------------------------------------------------------------------

R2#sho vrrp brief
Interface          Grp Pri Time  Own Pre State   Master addr     Group addr
Fa0/0              10  128 3500       Y  Backup  100.64.0.2      100.64.0.1    
R2#
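
A side note on the "Time" column: as far as I can tell, it is the master down interval in milliseconds, i.e., how long a backup waits without hearing an advertisement before it takes over. RFC 3768 defines it as three advertisement intervals plus a "skew time" of (256 - priority) / 256 seconds. Here is a quick Python sketch of that arithmetic (the function name is mine, and the assumption that IOS truncates to whole milliseconds is a guess based on the output above):

```python
def master_down_interval_ms(priority, advertise_s=1):
    """Master down interval in milliseconds, per the RFC 3768 formula.

    skew time   = (256 - priority) / 256 seconds
    master down = 3 * advertisement interval + skew time
    """
    skew_ms = (256 - priority) * 1000 // 256  # truncated to whole ms
    return 3 * advertise_s * 1000 + skew_ms

print(master_down_interval_ms(254))  # 3007, matches R1
print(master_down_interval_ms(128))  # 3500, matches R2
```

The skew is why the two routers show different values: a higher priority means a shorter wait, so the best backup is the first one to notice a dead master.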

First, we'll clear the arp table on Knoppix Clone 1:

Now, we'll ping 10.254.254.3 (lo0 on R3) from Knoppix Clone 1:

Next, we shut down fa0/0 on R1, and verify that R2 is now the VRRP master:
R1#sho vrrp brief
Interface          Grp Pri Time  Own Pre State   Master addr     Group addr
Fa0/0              10  254 3007       Y  Init    0.0.0.0         100.64.0.1    
R1#

----------------------------------------------------------------------------

R2#sho vrrp brief
Interface          Grp Pri Time  Own Pre State   Master addr     Group addr
Fa0/0              10  128 3500       Y  Master  100.64.0.3      100.64.0.1    
R2#

...and ping R3 again:

Hmmm...since the VRRP virtual MAC address follows the master role from router to router, the client's view looks the same before and after the failover, and that doesn't give us much insight into what was actually happening here. Fortunately, I was running tcpdump to capture the Ethernet frames while running this test. After opening the PCAP file in Wireshark, we can get a little better understanding of what happened.

Note:
To keep the Wireshark screen captures relevant, I filtered out some of the chatter. We configured VRRP to send advertisements every second, for example, so I filtered out the VRRP protocol data. These routers were also running CDP, so I filtered that as well.

At the very beginning of the capture, we can see Knoppix Clone 1 ("CadmusCo_d3:7c:8f") send an arp request for 100.64.0.1, and we can see R1 (cc:00:2a:af:00:00) send an arp reply, stating that the VRRP virtual MAC address "00:00:5e:00:01:0a" is associated with 100.64.0.1:

Then, we ping R3 through R1. As you can see, we sent the ICMP request to the VRRP virtual MAC...:

...but received the reply from the MAC of fa0/0 on R1 (cc:00:2a:af:00:00):

At this point, we shut down fa0/0 on R1, and allowed R2 to take ownership of 100.64.0.1. Since the virtual MAC moves to the new master along with the virtual IP, our next ping will still be sent to 00:00:5e:00:01:0a...:

...but this time, our reply has come from the MAC address of fa0/0 on R2 (cc:01:2a:af:00:00):

Because the MAC address doesn't change, we don't have to wait until the arp cache on connected devices times out for traffic to use the new path. Waiting on a stale arp entry can be a serious problem, in some cases. For example, if you are using a Cisco ASA to connect to a (non-VRRP) "highly-available" system, the default arp cache timeout period is FOUR HOURS, which means it can take up to four hours for your "highly-available" (cough) system to recover from a failover! This isn't just an academic, theoretical point, either. I am currently working a trouble ticket in my day job where this is exactly what's happening. Unfortunately, just shortening the arp cache timeout period can drive up CPU load and memory requirements on busy devices, so there is a balancing act to be found between automatic fail-over times and system resource utilization. VRRP neatly solves that problem by sidestepping the whole issue.
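
One last detail: the virtual MAC address isn't random. RFC 3768 reserves 00:00:5e:00:01:XX for VRRP over IPv4, where XX is the group number (the VRID) in hex. A tiny Python sketch, with a function name I made up for illustration, shows where the "0a" in this lab comes from:

```python
def vrrp_virtual_mac(group):
    """Build the IPv4 VRRP virtual MAC for a group number (VRID)."""
    if not 1 <= group <= 255:
        raise ValueError("VRRP group (VRID) must be between 1 and 255")
    return "00:00:5e:00:01:{:02x}".format(group)

print(vrrp_virtual_mac(10))  # 00:00:5e:00:01:0a
```

Since the address depends only on the group number, every router in group 10 answers for the same MAC, which is exactly why the client's arp entry stays valid across a failover.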
