Setup NFS Failover with DRBD and Heartbeat on AWS EC2
This tutorial shows how to configure NFS failover using DRBD and Heartbeat on AWS EC2. There are many tutorials covering DRBD and Heartbeat in general, but few that address the AWS EC2 specifics.
NOTE: Run the commands/scripts on BOTH SERVERS (Primary and Secondary) unless explicitly mentioned otherwise.
Requirements
- VPC with a public subnet and Internet gateway.
- IAM Role for EC2 (allowing the ec2:DescribeInstances and ec2:AssignPrivateIpAddresses actions)
- 2 x Ubuntu Instances (for NFS Primary and Secondary)
- 1 x Ubuntu Instance (for NFS Client; testing)
- 3 Elastic IPs (for Primary, Secondary and Virtual IP)
Create a VPC
Follow this guide: HERE
Create a Security Group
Follow this guide: HERE
Create an Amazon EC2 IAM Role with the following policy
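A minimal policy along the following lines should work; the exact statement below is an assumption based on the permissions listed in the requirements (ec2:DescribeInstances and ec2:AssignPrivateIpAddresses):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:AssignPrivateIpAddresses"
      ],
      "Resource": "*"
    }
  ]
}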
Launch Two Ubuntu EC2 instances into Your VPC’s Public Subnet
- Assign the EC2 IAM Role created above to the instances (do not skip or forget this)
- Assign Private IPs (Primary: 192.168.0.10, Secondary: 192.168.0.11, Virtual IP: 192.168.0.12)
- Assign the Security Group created above.
- Configure Elastic IP Addresses for Your Instances.
Create the vipup script
Make changes according to the server (don't just copy-paste blindly).
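A minimal sketch of what such a script could look like is shown below. It assumes the Virtual IP (192.168.0.12) is moved between nodes by reassigning it as a secondary private IP on the active node's network interface through the AWS CLI; the file location (/etc/ha.d/resource.d/vipup), the region value and the metadata-based ENI lookup are assumptions, not taken from the original:

$ vim /etc/ha.d/resource.d/vipup

#!/bin/bash
# vipup: Heartbeat resource script that moves the Virtual IP to this node.
# Requires the EC2 IAM Role created earlier and the awscli package.
VIP="192.168.0.12"
REGION="us-east-1"        # change to the region your instances run in

# Discover this instance's primary network interface (ENI) via instance metadata
MAC=$(curl -s http://169.254.169.254/latest/meta-data/mac)
ENI=$(curl -s "http://169.254.169.254/latest/meta-data/network/interfaces/macs/${MAC}/interface-id")

case "$1" in
  start)
    # Claim the VIP; --allow-reassignment takes it over from the failed peer
    aws ec2 assign-private-ip-addresses --region "$REGION" \
        --network-interface-id "$ENI" \
        --private-ip-addresses "$VIP" \
        --allow-reassignment
    # Bring up the alias interface configured later in this tutorial
    ifup eth0:0 2>/dev/null || true
    ;;
  stop|status)
    # Nothing to release; the surviving node claims the VIP on "start"
    exit 0
    ;;
esac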
Mark it executable:
$ chmod +x vipup
Additional steps to configure on the servers:
Add a routing table for the Virtual IP:
$ echo "2 eth1_rt" >> /etc/iproute2/rt_tables
Add an eth0:0 interface and copy the below configuration so that the Virtual IP is available on the AWS internal network:
$ vim /etc/network/interfaces.d/eth0:0.cfg
auto eth0:0
iface eth0:0 inet dhcp
up ip route add default via 192.168.0.1 dev eth0:0 table eth1_rt
up ip rule add from 192.168.0.12 lookup eth1_rt prio 1000
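To verify, once the interface has been brought up (at boot, or with ifup eth0:0 after the Virtual IP has been assigned to the instance), check that the policy-routing rule and route are in place:

$ ip rule show | grep eth1_rt
$ ip route show table eth1_rt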
Edit the hosts file to enter details about the Primary and Secondary servers:
$ vim /etc/hosts
192.168.0.10 primary.nfs.server
192.168.0.11 secondary.nfs.server
Edit the hostname on each server:
$ vim /etc/hostname
primary.nfs.server (on Primary Server)
secondary.nfs.server (on Secondary Server)
Set the hostname (then log out and log back in):
$ hostname -F /etc/hostname
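Both DRBD and Heartbeat identify nodes by hostname, so confirm that the running hostname matches the names used in the configuration files:

$ uname -n   # should print primary.nfs.server or secondary.nfs.server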
Update the repositories and install the AWS tools:
$ apt-add-repository ppa:awstools-dev/awstools
$ apt-get update
$ apt-get install ntpdate tzdata ec2-api-tools ec2-ami-tools iamcli rdscli moncli ascli elasticache awscli
Test:
$ ec2-describe-instances
or
$ aws ec2 describe-instances
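If the aws command complains that no region is configured, set a default region (the region below is only an example):

$ aws configure set region us-east-1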
Update the time for proper synchronization (very important):
$ ntpdate -u in.pool.ntp.org
Install DRBD:
Update the repositories and install the DRBD software (reboot required):
$ apt-get update
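Assuming Ubuntu's drbd8-utils package is used for the DRBD userland tools (the package name is an assumption; it provides drbdadm and the drbd init script), the install step looks like:

$ apt-get install drbd8-utils
$ reboot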
Setup DRBD for a particular device (Example: /dev/xvdb):
- To configure DRBD, edit /etc/drbd.conf and change global { usage-count yes; } to no (ignore if already changed).
- Create a resource file r0.res in /etc/drbd.d/:
$ vim /etc/drbd.d/r0.res (edit as required)
resource r0 {
net {
#on-congestion pull-ahead;
#congestion-fill 1G;
#congestion-extents 3000;
#sndbuf-size 1024k;
sndbuf-size 0;
max-buffers 8000;
max-epoch-size 8000;
}
disk {
#no-disk-barrier;
#no-disk-flushes;
no-md-flushes;
}
syncer {
c-plan-ahead 20;
c-fill-target 50k;
c-min-rate 10M;
al-extents 3833;
rate 100M;
use-rle;
}
startup { become-primary-on primary.nfs.server ; }
protocol C;
on primary.nfs.server {
device /dev/drbd0;
disk /dev/xvdb;
meta-disk internal;
address 192.168.0.10:7801;
}
on secondary.nfs.server {
device /dev/drbd0;
disk /dev/xvdb;
meta-disk internal;
address 192.168.0.11:7801;
}
}
The file r0.res should be the same on both servers.
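Optionally, drbdadm can parse and dump the configuration to catch syntax errors before creating the metadata:

$ drbdadm dump r0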
Now initialize the metadata storage using the drbdadm utility. On each server execute:
$ drbdadm create-md r0
Next, on both hosts, start the drbd daemon:
$ /etc/init.d/drbd start
On the Primary Server, run:
$ drbdadm -- --overwrite-data-of-peer primary all
After executing the above command, the data will start syncing with the Secondary Server. To watch the progress, enter the following on the Secondary Server:
$ watch -n1 cat /proc/drbd
Finally, create a filesystem on /dev/drbd0 and mount it. Run mkfs and mount on the Primary Server only; create the /drbd mount point on both servers:
$ mkfs.ext4 /dev/drbd0
$ mkdir /drbd
$ mount /dev/drbd0 /drbd
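Once the initial sync has finished, /proc/drbd on the Primary should show the resource in the Primary/Secondary role with both disks UpToDate:

$ cat /proc/drbd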
Install Heartbeat (for node failure detection)
Install heartbeat:
$ apt-get install heartbeat
Edit the ha.cf file under /etc/ha.d/:
$ vim ha.cf
# Give the cluster 60 seconds to start
initdead 60
# Keep alive packets every 1 second
keepalive 1
# Misc settings
traditional_compression off
deadtime 60
deadping 60
warntime 5
# Nodes in cluster
node primary.nfs.server secondary.nfs.server
# Use logd, configure /etc/logd.cf
use_logd on
# Don't move service back to preferred host when it comes up
auto_failback off
# Takeover if pings (above) fail
respawn hacluster /usr/lib/heartbeat/ipfail
##### Use unicast instead of default multicast so firewall rules are easier
# primary
ucast eth0 192.168.0.10
# secondary
ucast eth0 192.168.0.11
bcast eth0
Edit the haresources file for Heartbeat to use:
$ vim haresources
primary.nfs.server drbddisk::r0 Filesystem::/dev/drbd0::/drbd::ext4 vipup nfs-kernel-server
Edit the authkeys file (for authentication between nodes):
$ vim authkeys
# Automatically generated authkeys file
auth 1
1 sha1 1a8c3f11ca9e56497a1387c40ea95ce1
or, generate the file from the command below:
$ cat <<EOF > /etc/ha.d/authkeys
# Automatically generated authkeys file
auth 1
1 sha1 `dd if=/dev/urandom count=4 2>/dev/null | md5sum | cut -c1-32`
EOF
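Heartbeat refuses to start if authkeys is readable by anyone other than root, so restrict its permissions on both servers:

$ chmod 600 /etc/ha.d/authkeys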
Enable logging for Heartbeat:
$ vim /etc/logd.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
syslogprefix linux-ha
Create a softlink of nfs-kernel-server in the /etc/ha.d/resource.d/ folder:
$ ln -s /etc/init.d/nfs-kernel-server /etc/ha.d/resource.d/
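Optionally, since Heartbeat now starts and stops the NFS service as a cluster resource, nfs-kernel-server can be removed from the normal boot sequence:

$ update-rc.d -f nfs-kernel-server remove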
Add an fstab entry for the DRBD device (it is mounted by Heartbeat, hence the noauto option):
# DRBD, mounted by heartbeat
/dev/drbd0 /drbd ext4 noatime,noauto,nobarrier 0 0
Configure NFS Exports
Edit the exports file:
$ vim /etc/exports
/drbd 192.168.0.0/24(rw,async,no_subtree_check,fsid=0)
Export the configuration to the network:
$ exportfs -a
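If Heartbeat is not already running, start it on both servers (Primary first) once all of the configuration above is in place; assuming the standard sysvinit service:

$ /etc/init.d/heartbeat start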
NFS Client:
Install the NFS packages on the client so that exports can be mounted:
$ apt-get install nfs-common
Check for mounts available:
$ showmount -e 192.168.0.12
Mount the shared folder from the server:
$ mkdir /drbd
$ mount -vvv 192.168.0.12:/drbd/ -o nfsvers=3,rsize=32768,wsize=32768,hard,timeo=50,bg,actimeo=3,noatime,nodiratime,intr /drbd/
$ df -h
Start copying data (for testing):
$ rsync -av --progress --append --bwlimit=10240 /drbd/A_BIG_FILE /tmp/
Testing NFS Fail-over:
- From another system, mount the NFS share from the cluster
- Use rsync --progress -av to start copying a large file (1-2 GB) to the share.
- When the progress reaches 20%-30%, stop the heartbeat service on the Primary (or power off/reboot the instance).
- Rsync will lock up (as intended) due to NFS blocking.
- After 5-10 seconds, the file should continue transferring until finished with no errors.
- Do an md5 checksum comparison of the original file and the file on the NFS share (see the example after this list).
- Both files should be identical; if not, there was corruption of some kind.
- Try the test again by reading from NFS, rather than writing to it.
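For the checksum comparison, a quick example (the file name and source path are placeholders):

$ md5sum /path/to/A_BIG_FILE /drbd/A_BIG_FILE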
References:
- https://www.drbd.org/en/doc/users-guide-83/s-heartbeat-r1
- https://www.howtoforge.com/high_availability_heartbeat_centos#configuration
- https://www.howtoforge.com/high-availability-nfs-with-drbd-plus-heartbeat#heartbeat
- https://aws.amazon.com/articles/2127188135977316
- https://www.vivaolinux.com.br/artigo/Instalando-DRBD-+-Heartbeat-no-Debian-6?pagina=3
- http://linuxmanage.com/fast-failover-configuration-with-drbd-and-heartbeat-on-debian-squeeze.html