Moving to ZFS from Btrfs
Rule #4 states that failure will happen, so you should plan for that eventual reality. The Linux workstations I build and use (if I have any say about it) have at least two hard drives in some mirrored or otherwise redundant fashion. My current pattern is to build workstations with a small (120 GB or thereabouts) SSD as the boot drive that holds my OS install and swap space. /home, scratch, and possibly other areas are mounted from a two-disk mirrored array of spinning rust. I've spent years on Linux's software RAID (md) setups, and the last 5+ years on Btrfs.
My workstation at home had a 7-year-old disk paired with a 4+-year-old drive, mirrored with Btrfs as my primary storage filesystem. Good life expectancy for spinning rust is about 3 years, so I picked up a couple of new Seagate BarraCuda drives to replace the existing ones before reality had a chance to ruin my day.
It was during this upgrade that I knew I wanted to switch to ZFS. I've been running ZFS on other workstations and with a large-capacity Graphite cluster, with a lot of success, and just enough failures to know how ZFS handles losing a drive. With Red Hat dropping Btrfs support from RHEL, seemingly no one other than SuSE using it in production, Docker adopting ZFS as a storage backend, and the lack of progress on major Btrfs features (encryption, RAID 5/6), I've become more convinced that ZFS is the correct solution. In a way I'm saddened, because I believe the way to make Open Source solutions better is to, well, use them. But Btrfs, while still under very active development, just doesn't seem to be moving in the direction I need.
My upgrade plan was this:
- Replace one drive in the mirrored pair with a new HDD.
- Boot, create a zpool on the new drive, create ZFS filesystems for my users and scratch space.
- Use rsync to copy data from the Btrfs filesystem to the ZFS filesystems.
- Replace the last drive with a new HDD.
- Attach the last new drive to the zpool and convert it to a mirror.
No plan survives contact with the enemy, and Step #2 is where things went
awry. My Ubuntu Xenial machine dumped me into emergency mode (without a
password even!) as it could not mount /home or /srv from the Btrfs array.
I removed these from /etc/fstab and was able to boot into graphical mode
like normal, albeit without my home directory. Running mount /dev/sdc /mnt failed with no useful errors. However, dmesg reported the following:
BTRFS: failed to read the system array on sdc
BTRFS: open_ctree failed
However, btrfs filesystem show did appear to see a filesystem available on sdc. So I Googled. I discovered that there's a trove of mount options for Btrfs that don't seem to be well documented, or at least not documented in one place. At the suggestion of some old forum posts I tried:
# mount -odegraded /dev/sdc /mnt
This worked. (Although I should have mounted read-only.) This is also a prime example of why I'm moving away from Btrfs. How is this behavior an acceptable way to communicate to the administrator that Btrfs is refusing to mount a degraded array? ZFS would have told me exactly that, and how to override the safety feature, right in its error message.
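For the record, the read-only variant I should have used just adds ro to the option list; a minimal sketch:
# mount -o degraded,ro /dev/sdc /mnt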
With that solved, I moved on to making a fresh zpool on sdb, my new drive.
# zpool create -f ordinary \
/dev/disk/by-id/ata-ST2000DM006-2DM164_XXXXXXXX
I usually name the first zpool after the machine itself rather than going with Matrix references. I also always use the /dev/disk/by-id/ata- name of the device. These names don't reshuffle if the number or order of drives changes, and they include the drive's serial number, so I can match each device up with the physical label on the drive if I ever have to.
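If you haven't poked around in /dev/disk/by-id before, a plain ls shows how the ata- names map back to the kernel's sdX names (your listing will obviously differ):
# ls -l /dev/disk/by-id/ | grep ata-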
Next, create filesystems:
# zfs create ordinary/slack
# zfs create ordinary/slack/Music
# zfs create ordinary/scratch
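A quick sanity check (not in my original notes) confirms the new datasets exist and shows their default mountpoints under /ordinary, which is where they live until I set explicit mountpoints later:
# zfs list -r ordinary
# zfs get -r mountpoint ordinary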
Next, I used rsync to copy the data from /mnt to my ZFS filesystems. This was slower than I expected, but a few hours later I had my data safely in ZFS-land (roughly the invocation sketched below). I then shut down the machine, replaced sdc with the second new HDD, and rebooted. ZFS mounted with everything in place, and the new drive wasn't otherwise interfering with the system.
Next, attach the second disk to the zpool:
# zpool status
  pool: ordinary
 state: ONLINE
  scan: none requested
config:

        NAME                               STATE     READ WRITE CKSUM
        ordinary                           ONLINE       0     0     0
          ata-ST2000DM006-2DM164_XXXXXXXX  ONLINE       0     0     0

errors: No known data errors
# zpool attach ordinary \
ata-ST2000DM006-2DM164_XXXXXXXX \
/dev/disk/by-id/ata-ST2000DM006-2DM164_YYYYYYYY
This took a couple of tries to get the incantation correct, but ZFS's error messages were surprisingly helpful. The command attaches the new device YYYYYYYY to the existing device XXXXXXXX in the zpool ordinary, turning the single disk into a mirror, and resilvering began immediately.
# zpool status
  pool: ordinary
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jul 7 00:11:06 2018
        792M scanned out of 355G at 52.8M/s, 1h54m to go
        788M resilvered, 0.22% done
config:

        NAME                                 STATE     READ WRITE CKSUM
        ordinary                             ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            ata-ST2000DM006-2DM164_XXXXXXXX  ONLINE       0     0     0
            ata-ST2000DM006-2DM164_YYYYYYYY  ONLINE       0     0     0  (resilvering)

errors: No known data errors
In 1 hour and 2 minutes (faster than the rsync) the zpool had completed resilvering. Just to be safe I initiated a scrub operation.
# zpool scrub ordinary
...
# zpool status
  pool: ordinary
 state: ONLINE
  scan: scrub repaired 0 in 0h46m with 0 errors on Sun Jul 8 01:10:56 2018
config:

        NAME                                 STATE     READ WRITE CKSUM
        ordinary                             ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            ata-ST2000DM006-2DM164_XXXXXXXX  ONLINE       0     0     0
            ata-ST2000DM006-2DM164_YYYYYYYY  ONLINE       0     0     0

errors: No known data errors
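A scrub is also worth repeating on a schedule. Something like this cron.d entry would handle it; the timing and path are my choice for illustration, not part of this upgrade:
# /etc/cron.d/zfs-scrub -- scrub the pool at 02:00 on the first of each month
0 2 1 * * root /sbin/zpool scrub ordinary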
Finally, I set the mountpoints so everything landed where I wanted it, most importantly my home directory:
# zfs set mountpoint=/home/slack ordinary/slack
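Because child datasets inherit their parent's mountpoint, ordinary/slack/Music automatically ends up at /home/slack/Music. The scratch dataset gets its own mountpoint the same way; the path below is illustrative rather than the exact one I used:
# zfs set mountpoint=/srv/scratch ordinary/scratch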
Rebooting, I had a functional system and, most importantly, my home directory back! Next, I cloned zfs-auto-snapshot and made sure I had daily snapshots enabled. Hourly and daily snapshots of your home directory are really the best feature ever. They should have worked just as well on Btrfs, but I was always a little nervous to try them there. To complete the exercise, I ran my Borg backups by hand and confirmed they still worked.
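With zfs-auto-snapshot in place, listing what you have, or taking a one-off snapshot before doing something risky, is a one-liner each; the snapshot name here is just an example:
# zfs list -t snapshot -r ordinary/slack
# zfs snapshot ordinary/slack@before-something-risky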