Wednesday, April 22, 2015

Reconfiguring multipath devices on a production system

About two weeks ago, I was working my way down a server health checklist during the 2 to 5 A.M. maintenance window early on a Tuesday morning. Among four application servers running RHEL 6.4, one showed an incorrect number of LUNs (Logical Unit Numbers) from the 3PAR SAN connected over Fibre Channel (FC).

Since there were 8 disks in the array on the SAN and 8 FC paths from SAN-to-switch and switch-to-server, 64 LUNs should have appeared when I invoked multipath -ll (or more precisely, multipath -ll | grep sd | wc -l), but only 38 showed up.
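The expected count is simple arithmetic; here is the sanity check, using the disk and path counts from this incident:

```shell
# Each of the 8 LUNs exported by the SAN is visible once per FC path,
# so the kernel should create 8 * 8 = 64 sd devices for the SAN.
luns=8
paths=8
echo $((luns * paths))    # prints 64

# On the live server, the actual count came from:
#   multipath -ll | grep sd | wc -l
```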

Inside /etc/multipath.conf there is a blacklist {} section containing WWIDs of the devices multipathd is supposed to ignore, such as local disks. In addition to blacklisting individual devices by WWID, you can also exclude whole classes of devices using the devnode keyword. Unfortunately, there was a problem with one particular devnode blacklist entry:

devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda)[0-9]*"

Can you find a problem with the above device blacklist statement?

Earlier I mentioned that an 8-disk array was connected to the Linux server by FC cables constituting 8 paths, which means 64 LUNs / multipath devices should appear on the server. Assuming the local disks are /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd, the 64 SAN multipath devices will take up the device names /dev/sde through /dev/sdbp.
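The device-name arithmetic can be double-checked with a quick bash brace expansion, since sd suffixes run a through z, then aa, ab, and so on:

```shell
#!/usr/bin/env bash
# Generate sd device-name suffixes in kernel order: a..z, then aa..zz.
names=$(printf '%s\n' {a..z} {a..z}{a..z})

# Local disks sda-sdd occupy positions 1-4, so the first SAN disk is
# the 5th name and the 64th (last) SAN disk is the 68th name.
echo "first SAN disk: /dev/sd$(echo "$names" | sed -n '5p')"    # /dev/sde
echo "last SAN disk:  /dev/sd$(echo "$names" | sed -n '68p')"   # /dev/sdbp
```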

The devnode blacklist above correctly excludes /dev/sda, which is a local disk, but because the pattern is not anchored at the end, it also matches /dev/sdaa, /dev/sdab, ... /dev/sdaz, which comes to 26 multipath devices. That is why only 64 - 26 = 38 LUNs were appearing.
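The faulty match is easy to reproduce with grep -E, which applies the same POSIX extended-regex semantics to the pattern that multipathd applies to devnode entries: with no trailing anchor, a match on a prefix of the device name is enough.

```shell
pattern='^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda)[0-9]*'

echo sda  | grep -qE "$pattern" && echo "sda  blacklisted (intended)"
echo sdaa | grep -qE "$pattern" && echo "sdaa blacklisted (the bug)"
echo sde  | grep -qE "$pattern" || echo "sde  not blacklisted (correct)"
```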

After removing 'sda' from the devnode blacklist statement, I had to notify our client that they would have to stop any applications running on the SAN multipath devices so that I could unmount the partitions on the SAN disks.
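An alternative to deleting 'sda' outright is to anchor it in its own entry so that it can no longer match sdaa through sdaz by prefix. This is only a sketch of what such a section could look like, not the exact configuration from this server; blacklisting local disks by their WWIDs is generally the safer approach, since sd device names are not guaranteed to be stable across reboots:

```
blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^sda[0-9]*$"
}
```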

After the dev team brought their applications down, I tried to umount the partitions, but umount complained that the devices were still busy. Apparently some leftover processes were still holding files open on the SAN devices. To find the PIDs of the processes with open files on the SAN disks, I used lsof /foo, which returns something like the following:

# lsof /foo
COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF      NODE NAME
bash    34266 user1  cwd    DIR 253,14     4096 376176642 /foo/my_app/data01
...

The PID appears in the second field of each output line. Simply kill -15 34266 (or kill -9 ... if that fails) and retry umount /foo.
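When many processes are holding files open, picking the PIDs out of the second column can be scripted; lsof's -t flag prints bare PIDs for exactly this purpose. A small sketch, using the sample output line from above (/foo is this post's example mount point):

```shell
# Extract the PID (field 2) from an lsof output line with awk.
line='bash    34266 user1  cwd    DIR 253,14     4096 376176642 /foo/my_app/data01'
echo "$line" | awk '{print $2}'    # prints 34266

# On the live system, lsof -t skips the parsing entirely:
#   kill -15 $(lsof -t /foo)
```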

Despite having unmounted all the mount points on the multipath devices, when I tried to flush all unused multipath device maps with multipath -F before reconfiguring the device mappings, I got the following:

# multipath -F
Apr 07 05:44:24 | mpathh: map in use
Apr 07 05:44:24 | mpathg: map in use
Apr 07 05:44:24 | mpathf: map in use
Apr 07 05:44:24 | mpathe: map in use
Apr 07 05:44:24 | mpathd: map in use
Apr 07 05:44:24 | mpathc: map in use
Apr 07 05:44:24 | mpathb: map in use
Apr 07 05:44:24 | mpatha: map in use

Although the mount points on the multipath devices are no longer in use, if there are LVM partitions on the SAN disks, the volume group and logical volumes are probably still active. You can deactivate each LV individually with lvchange -an /dev/VGname/LVname, but it is easier to deactivate the entire VG residing on the SAN disks with vgchange -an VGname.

Now I can flush the multipath maps with multipath -F, reload the changes in /etc/multipath.conf with service multipathd reload, and finally restart the multipath daemon with service multipathd restart.

Now I want to check whether all 64 LUNs show up. I run multipath -v2 to rebuild the device maps with verbose output, then verify with multipath -ll.

Now that all 64 LUN's appear over 8 paths, it is time to reactivate the Volume Group with vgchange -ay VGname and to remount the file systems.

*Note: My fellow engineers sometimes use dmsetup remove dm-name to flush the device-mapper cache when vgchange -an fails to deactivate an LVM Volume Group. Using dmsetup remove does not delete any data; all partitions remain intact on disk, but it lets you flush the device-mapper state. This is useful when multipathd complains that device maps are still in use on disks you want to reconfigure for multipathing.

Using this method, once you have fixed the multipath setup, you must recreate the PV, VG, and LVs exactly as they were originally, except you must not run mkfs on the "new" LVs, since the old data is still intact on disk. Simply remount the LVs and everything should be OK.