Keepalived - A documentation nightmare

2018-03-06

I am writing this article with mixed feelings. Am I just having a bad day, with the Oracle at Google refusing to give me answers, or is the Keepalived documentation hopelessly outdated?

We are installing a new log receiver for our ELK infrastructure, and one of the components of the new system is a load balancer running HAProxy in a cluster configuration for redundancy. At first we considered pacemaker/corosync to make HAProxy highly available, but these products require an extra subscription on Red Hat Enterprise Linux Server 7.4, so we chose keepalived instead. All of them have been around for a long time and do their job, although they differ in how difficult they are to configure.

My odyssey began when I needed to find out how to configure keepalived with both IPv4 and IPv6 addresses. A task that is easy in theory turned into a couple of hours of tests, failures and reading forum entries in the most remote corners of the internet. Neither the official Keepalived documentation nor the Red Hat documentation of the product had the information I needed. What we wanted was:

Two HAProxies for TCP load balancing, one active and one standby ready to take over if the active one fails.
Keepalived using the “Virtual Router Redundancy Protocol (VRRP)” to perform failover between the two HAProxies.
IPv4 and IPv6 “Virtual IP addresses” (VIPs)
The standby HAProxy server should take over the VIP addresses if the active node goes completely down, or if the haproxy service goes down on the active node.

This was going to be a piece of cake after reading the “Red Hat Enterprise Linux - Load Balancer Administration” documentation. We could use “vrrp_script”, “vrrp_instance” and “track_script” to configure keepalived in the way we wanted and Keepalived would take care of moving our VIPs between the HAProxies if the haproxy service went down.

vrrp_script chk_haproxy {

  # Check if the haproxy process is running
  script "killall -0 haproxy"

  # Check every 2 seconds
  interval 2

  # Add 2 points to priority if OK
  weight 2
}


vrrp_instance VI_1 {

  # interface to monitor
  interface ens192

  virtual_router_id 1
  priority 101

  virtual_ipaddress {

    10.1.1.100 # VIP-1 ipv4
    10.1.1.101 # VIP-2 ipv4

    0:0:0:0:0:FFFF:0A01:0164 # VIP-3 ipv6
    0:0:0:0:0:FFFF:0A01:0165 # VIP-4 ipv6
  }

  track_script {
    chk_haproxy
  }
}
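
A side note on the check itself: “killall -0 haproxy” sends signal 0, which delivers nothing and only reports whether a matching process exists, so the script exits with 0 while haproxy is running. The same kind of probe can be sketched in Python with os.kill. The helper name process_alive is made up for this illustration, and it takes a PID rather than a process name.

```python
import os


def process_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (hypothetical helper)."""
    try:
        # Signal 0 only performs the existence and permission checks;
        # no signal is actually delivered to the target process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    # The interpreter's own PID is necessarily alive while this runs.
    print(process_alive(os.getpid()))
```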

First surprise: only the IPv4 VIPs were configured. There was no trace of the IPv6 addresses when we started keepalived.

[root@elk-lb01]# ip addr sh ens192
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:54:a8:b5:9b brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.10/24 brd 10.1.1.255 scope global ens192
       valid_lft forever preferred_lft forever
    inet 10.1.1.100/32 scope global ens192
       valid_lft forever preferred_lft forever
    inet 10.1.1.101/32 scope global ens192
       valid_lft forever preferred_lft forever

This simple configuration did not work with the version of keepalived (1.3.5-1) that we were using from the official Red Hat repository. The reason: you cannot have IPv4 and IPv6 addresses in the same virtual_ipaddress{} block. After some googling I found the explanation in the keepalived GitHub repository:

“…. It is not possible to configure both IPv4 and IPv6 addresses as virtual_ipaddresses in a single vrrp_instance; the reason is that the VRRP protocol doesn’t support it ….. Although earlier versions of keepalived didn’t complain if IPv4 and IPv6 addresses were both configured, it didn’t work properly. …..”

The same page had this tip:

“…. If you need to associate both IPv4 and IPv6 addresses with a single vrrp_instance, then configure the addresses of one family in a virtual_ipaddress_excluded block. Probably a better solution than using a virtual_ipaddress_excluded block is to configure two vrrp instances, one for IPv4 and one for IPv6 ….”
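
The family restriction quoted above is easy to check before touching a server. Here is a minimal sketch, using Python's standard ipaddress module, that groups a list of candidate VIPs by address family. The function name group_by_family is my own invention and has nothing to do with keepalived itself.

```python
import ipaddress
from collections import defaultdict


def group_by_family(addresses):
    """Group address strings by IP version, 4 or 6 (hypothetical helper)."""
    families = defaultdict(list)
    for text in addresses:
        ip = ipaddress.ip_address(text)  # raises ValueError on bad input
        families[ip.version].append(text)
    return dict(families)


# The four VIPs from the first configuration in this article.
vips = [
    "10.1.1.100",
    "10.1.1.101",
    "0:0:0:0:0:FFFF:0A01:0164",
    "0:0:0:0:0:FFFF:0A01:0165",
]

families = group_by_family(vips)
if len(families) > 1:
    # More than one family: these cannot share one virtual_ipaddress block.
    print("mixed address families:", families)
```

If the result has more than one key, the addresses have to be split, either across two instances or with a virtual_ipaddress_excluded block, as the quote suggests.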

How hard could it be? Two vrrp instances, one for IPv4 and one for IPv6:

vrrp_script chk_haproxy {
  # Check the haproxy process
  script "killall -0 haproxy"

  # Check every 2 seconds
  interval 2

  # Add 2 points to priority if OK
  weight 2
}


vrrp_instance VI_1 {

  # interface to monitor
  interface ens192

  virtual_router_id 1
  priority 101

  virtual_ipaddress {

    10.1.1.100 # VIP-1 ipv4
    10.1.1.101 # VIP-2 ipv4
  }

  track_script {
    chk_haproxy
  }
}

vrrp_instance VI_2 {

  # interface to monitor
  interface ens192

  virtual_router_id 2
  priority 101

  virtual_ipaddress {

    0:0:0:0:0:FFFF:0A01:0164 # VIP-3 ipv6
    0:0:0:0:0:FFFF:0A01:0165 # VIP-4 ipv6
  }

  track_script {
    chk_haproxy
  }
}

Well, this configured both the IPv4 and the IPv6 addresses, but keepalived brought up the two IPv6 addresses on both servers at the same time, so of course the log files started screaming about it:

ICMPv6: NA: someone advertises our address 0:0:0:0:0:FFFF:0A01:0164 on ens192!

This was probably my fault, because I had not used “vrrp_sync_group” to tie the two “vrrp_instance” blocks together. So I added this block to the configuration:

vrrp_sync_group VI_1_2 {
 group {
   VI_1
   VI_2
 }
}

The only problem was that once the instances belong to a “vrrp_sync_group”, a weighted “track_script” like ours inside a “vrrp_instance” is ignored, so the VIPs do not move to the standby node if haproxy stops working. By the way, the documentation says nothing about this. I only found out because of another message in the log file:

VRRP_Instance(VI_1) : ignoring tracked script with weights due to SYNC group
VRRP_Instance(VI_2) : ignoring tracked script with weights due to SYNC group

What if I moved the “track_script” from the “vrrp_instance” blocks to the “vrrp_sync_group” block? Well, that did not work either, because “vrrp_sync_group” in the version we were using did not support “track_script” blocks. So I was back to square one.

At this point I started losing confidence in myself. Was it me? What was I missing? How was it possible that software widely used in production for years did not have proper documentation? My last chance: what about trying the other tip I had found on GitHub?

“…. configure the addresses of one family in a virtual_ipaddress_excluded block ….”

It did not sound logical at first, if you ask me. But “virtual_ipaddress_excluded” contains a list of IP addresses that keepalived brings up and down on the server without including them in the VRRP packet itself. That is the meaning of the parameter: excluded from the VRRP packet, but moved anyway. Well, I tried:

global_defs {
 router_id elk1
}

vrrp_script chk_haproxy {
  # Check the haproxy process
  script "killall -0 haproxy"

  # Check every 2 seconds
  interval 2

  # Add 2 points to priority if OK
  weight 2
}


vrrp_instance VI_1 {

  # interface to monitor
  interface ens192

  virtual_router_id 1
  priority 101

  virtual_ipaddress {

    10.1.1.100 # VIP-1 ipv4
    10.1.1.101 # VIP-2 ipv4
  }

  virtual_ipaddress_excluded {

    0:0:0:0:0:FFFF:0A01:0164 # VIP-3 ipv6
    0:0:0:0:0:FFFF:0A01:0165 # VIP-4 ipv6
  }

  track_script {
    chk_haproxy
  }
}

Yes, it worked. Both the IPv4 and the IPv6 addresses moved to the standby server when the active server went down, or when haproxy or keepalived stopped working on the active server. I could not believe it, but everything was finally working properly.
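
Why a stopped haproxy triggers the failover comes down to priority arithmetic: with a positive weight, keepalived adds the weight to the instance priority while the tracked script succeeds, and the node with the highest effective priority holds the VIPs. Here is a minimal sketch, assuming the standby node is configured with priority 100 (its configuration is not shown in this article) and using a made-up helper called effective_priority:

```python
def effective_priority(base: int, weight: int, script_ok: bool) -> int:
    """Priority after applying a positive track_script weight (hypothetical helper)."""
    return base + weight if script_ok else base


WEIGHT = 2          # "weight 2" from vrrp_script chk_haproxy
ACTIVE_BASE = 101   # "priority 101" from the configuration in this article
STANDBY_BASE = 100  # assumption: the standby configuration is not shown

# Both haproxy processes running: 103 against 102, the active node keeps the VIPs.
active_ok = effective_priority(ACTIVE_BASE, WEIGHT, True)
standby_ok = effective_priority(STANDBY_BASE, WEIGHT, True)
print(active_ok, standby_ok, active_ok > standby_ok)

# haproxy stopped on the active node: 101 against 102, the standby takes over.
active_failed = effective_priority(ACTIVE_BASE, WEIGHT, False)
print(active_failed, standby_ok, active_failed < standby_ok)
```

Under these assumptions the weight has to be larger than the gap between the two base priorities, otherwise a failed check would never change which node wins.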

To finish, I want to thank the developers of keepalived for developing and maintaining software that works without problems once you configure it right. But the documentation on the project's official website really needs to be updated, to spare others this level of frustration when configuring it.

PS: I have since found an up-to-date documentation file in the GitHub repository that would really have made a big difference when I was trying to configure my setup: https://github.com/acassen/keepalived/blob/master/doc/keepalived.conf.SYNOPSIS

Enjoy it.
