Altering KVM Virtual Disk Images

I wanted to alter a file that was a disk image for a KVM virtual machine.  With a physical machine I use dd fairly often to save and alter the partition table and boot loader, and I wanted to do the same to a KVM image.  The problem is that when dd writes to a regular file it truncates that file by default (GNU dd only leaves the rest of the file alone if you pass conv=notrunc).  So I would lay down a new partition table and boot loader on my KVM image and find the image was now only a few kilobytes long.  It used to be 20 gigabytes in size!

One way around this is the loop device, which presents the image file as a block device:

# dd if=/dev/zero of=/var/lib/libvirt/images/foo.img bs=1024 count=20971520
# losetup /dev/loop0 /var/lib/libvirt/images/foo.img
# dd if=/tmp/bar.img of=/dev/loop0

Now you can use fdisk or other tools to examine the hard disk image on /dev/loop0. When you are done, tear down the loopback.

# losetup -d /dev/loop0

Now boot your KVM.
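
For scripted use, the same "write into the image without shrinking it" idea can be sketched in Python.  This is not the loop device approach from the commands above, just a minimal illustration: the helper name write_without_truncating, the 1 MiB stand-in image, and the fake boot sector are all made up for the example.

```python
import os
import tempfile


def write_without_truncating(image_path, payload, offset=0):
    """Overwrite bytes inside an existing image, leaving its length alone."""
    # "r+b" opens for reading and writing without truncating.  "wb" would
    # cut the file to zero length first, much like dd without conv=notrunc.
    with open(image_path, "r+b") as image:
        image.seek(offset)
        image.write(payload)


def demo():
    # Create a small stand-in for foo.img (1 MiB rather than 20 GiB).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(1024 * 1024)
        path = tmp.name
    try:
        # A fake 512-byte boot sector ending in the 0x55 0xAA signature.
        boot_sector = b"\x00" * 510 + b"\x55\xaa"
        write_without_truncating(path, boot_sector)
        return os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The image keeps its original size after the write, which is the property the truncating dd run lost.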

Why do I like to do this?  Think about automated ways to install DBAN, to lay down a gPXE boot loader to reinstall or reprovision the machine, or to fix a corrupt partition table or MBR.  Doing things this way allows for a high level of automation.

Bad Experiences With Fedora

I normally run Fedora on my personal systems at home.  I usually enjoy it and it helps keep me up to date with all the new toys that will eventually be a part of the RHEL machines I sysadmin as a professional.  I’ve been running Fedora 15 and had switched to XFCE as the new Gnome 3 user interface and I didn’t get along.  It was past time I updated to Fedora 16.

I’m a professional systems administrator.  If there is one thing that job has taught me, and that I’d like to teach everyone else, it is that all hard drives fail.  Not if, but when.  So most machines I use (save for laptops) have 2 hard drives installed using Linux’s software RAID 1 to mirror them.  (Not two identical hard drives either.  Find ones that are different.)  Needless to say, my workstation at home is configured this way.  The /boot partition is md0 and everything else runs in LVM on top of md1.

I installed Fedora 16 on my workstation after backing up my data.  I normally do a clean install and reformat everything except for the /home logical volume.  I get a brand new system and don’t lose my data.  Everything appeared to go well during the install.  When I rebooted I was greeted with an unfriendly Grub2 Rescue Mode.  The new boot loader couldn’t boot my system.

I’m quite familiar with the older Grub and using its shell mode to recover my system.  Boy was I in for a surprise.  Grub 2’s command shell is completely different.  Unequipped with a “help” or “?” command to boot!  At this point the Grub2 rescue shell has 4 commands: ls, set, unset, and insmod.  Helpful isn’t it?

I was already using Google (from my smart phone).  There’s not a lot of quality documentation about recovering a system with Grub2.  There are quite a lot of Ubuntu articles.  This is a big problem on two fronts: Grub2 should have more visible documentation, and so should Fedora.

Learn More About Grub2 Rescue Mode

Running ls showed me the problem after I had figured out how Grub2 was working.  It only listed “(hd0) (hd1)”.  Grub didn’t see the /boot partitions because they were RAID 1 partitions.

Turns out Fedora has never “supported” /boot on a RAID 1 device.  It’s worked for years and allowed me to recover broken systems many times.  The Fedora 16 installer does not load the Grub modules needed to support Linux Software RAID devices.  (Yes, RAID 1, where you can mount one of the mirrors as a normal ext4 filesystem in a pinch.)  There is a warning buried in my install logs that Grubby didn’t complete, but the install itself reported no errors.  Needless to say I am disappointed in many ways in the Fedora project.

Fortunately, the above Fedora 16 Common Bugs page had most of the solution.  After the install was complete and Grub2 ends up in the rescue shell this is how to recover your system.

  1. Boot the Fedora 16 install media and use its Rescue Mode.
  2. Tell the rescue mode to find and mount up your existing Linux system.  In my case the rescue mode couldn’t find my system.  I dropped down to the shell and ran
    # mdadm --assemble --scan
    # vgchange -ay

    This loaded up my Software RAID devices and the LVMs on top of them. Mount them properly under /mnt/sysimage.

  3. You need to set up a chroot to run the Grub2 install program correctly.  Because the automated rescue didn’t locate my system and I mounted it by hand, I needed to create enough nodes in /mnt/sysimage/dev so I could continue.
    # cp -a /dev/* /mnt/sysimage/dev/
  4. Do the chroot.
    # chroot /mnt/sysimage
  5. Add the following lines to /boot/grub2/grub.cfg:
    insmod raid
    insmod mdraid09
    insmod mdraid1x

    At the top of the file is fine.

  6. Now run the following commands:
    # grub2-install /dev/sda
    # grub2-install /dev/sdb

    At this point you should see a successful install of Grub 2 on both mirrors of the RAID 1. You should be able to reboot and have the system come up from the hard disks.
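
If you want to script step 5, a minimal sketch could look like the following.  The helper add_raid_modules is hypothetical and only works on the text of grub.cfg; reading and writing /boot/grub2/grub.cfg inside the chroot is left to the caller.  The module names are the three from the list above.

```python
RAID_MODULES = ("raid", "mdraid09", "mdraid1x")


def add_raid_modules(cfg_text):
    """Return cfg_text with any missing 'insmod' lines added at the top."""
    existing = {line.strip() for line in cfg_text.splitlines()}
    missing = [f"insmod {name}" for name in RAID_MODULES
               if f"insmod {name}" not in existing]
    if not missing:
        # Nothing to do, so running the helper twice is harmless.
        return cfg_text
    return "\n".join(missing) + "\n" + cfg_text


if __name__ == "__main__":
    print(add_raid_modules("set default=0\n"))
```

Checking for existing lines first keeps the edit idempotent, which matters if the recovery script gets run more than once.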

IPTables: The MARK Target

Load balancing for High Availability and Disaster Recovery with LVS and Keepalived is fun, and quite powerful.  One of the most useful aspects is that you can use IPTables with the MARK target to select what traffic is routed to a set of real servers.  It’s a lot more powerful than simple IP or IP/port combinations.

For example, a specific service may have a web site as well as another protocol.  Printing uses the IPP protocol and we have a web site documenting our printing system.  With the above trick you can create one virtual IP and have web traffic directed to a pool of web servers doing virtual hosting of many sites.  IPP traffic on a different port gets routed to a pool of Cups servers that do not maintain any web infrastructure.  End users only have to remember one DNS name.

However, remember that MARK is one of IPTables’ non-terminating targets.  It doesn’t stop the packet from being processed by later iptables rules, possibly including other MARK rules, so the last matching rule sets the final mark.  Your iptables rules therefore need to be in order from least specific to most specific.  With the above example, let’s say all traffic goes to the Cups pool and only web traffic gets redirected to the Apache pool.  Your snippet that lives in /etc/sysconfig/iptables will look something like this:

*mangle
-A PREROUTING -d 10.0.0.5 -p tcp -m tcp -j MARK --set-mark 0xC
-A PREROUTING -d 10.0.0.5 -p tcp -m tcp --dport 80 -j MARK --set-mark 0xA
-A PREROUTING -d 10.0.0.5 -p tcp -m tcp --dport 443 -j MARK --set-mark 0xA

Where 10.0.0.5 is the print service’s Virtual IP and our LVS configuration is set to direct firewall mark 0xC to the Cups pool and 0xA to the web pool.
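
To see why the ordering matters, here is a toy model of the three rules above.  It is not how netfilter is implemented, only a sketch of the "every matching MARK rule runs, the last one wins" behavior; MarkRule and final_mark are names invented for the example.

```python
from typing import NamedTuple, Optional


class MarkRule(NamedTuple):
    dst: str
    dport: Optional[int]  # None matches any destination port
    mark: int


RULES = [
    MarkRule("10.0.0.5", None, 0xC),  # everything to the VIP: Cups pool
    MarkRule("10.0.0.5", 80, 0xA),    # HTTP: web pool
    MarkRule("10.0.0.5", 443, 0xA),   # HTTPS: web pool
]


def final_mark(rules, dst, dport):
    """Walk every rule in order; each match overwrites the mark."""
    mark = 0
    for rule in rules:
        if rule.dst == dst and rule.dport in (None, dport):
            mark = rule.mark  # MARK does not stop rule traversal
    return mark


if __name__ == "__main__":
    print(hex(final_mark(RULES, "10.0.0.5", 631)))
    print(hex(final_mark(RULES, "10.0.0.5", 80)))
```

Reversing the list puts the catch-all rule last, and web traffic then ends up with the Cups mark, which is exactly the mistake the ordering rule guards against.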

Yum API: Reloading Repos

While working with the Yum driver for Bcfg2 (a configuration management tool) I came across an interesting problem.  Bcfg2 would load the driver and Yum would read all its configuration and .repo files.  Next, Bcfg2 would attempt to install files and packages according to its specification.  Bcfg2 would gladly install additional .repo files for Yum, but the Yum driver would have no knowledge of them and wouldn’t be able to install packages from the new repo.  You would have to run the Bcfg2 client again for the Yum driver to properly install the new packages.  A bug, and it was time to fix it.

Turns out, using the Yum API, there isn’t a way to reset or reload the .repo files without a lot of work.  All my attempts just removed all repos from Yum’s little mind rendering it unable to install any package.  A suggestion from the Yum-devel mailing list was to simply reinstantiate the Yum API. So,

import yum

b = yum.YumBase()
# ...do interesting things...

# Throw the old object away and start fresh so the .repo files are re-read.
b = yum.YumBase()
# ...do interesting things with updated .repo files...

As I was cleanly between transactions at that point in the code, this worked very well.
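
The pattern is easy to demonstrate without Yum installed.  The sketch below uses a stand-in class, RepoSnapshot, that I made up for illustration; it is not part of the Yum API.  It reads the .repo files in a directory once, at construction, so an existing instance goes stale when a new file shows up.

```python
import configparser
import glob
import os
import tempfile


class RepoSnapshot:
    """Stand-in for an object that reads .repo files once, at construction."""

    def __init__(self, repo_dir):
        parser = configparser.ConfigParser()
        parser.read(sorted(glob.glob(os.path.join(repo_dir, "*.repo"))))
        self.repo_ids = parser.sections()


def demo():
    with tempfile.TemporaryDirectory() as repo_dir:
        with open(os.path.join(repo_dir, "base.repo"), "w") as f:
            f.write("[base]\nname=Base\n")
        first = RepoSnapshot(repo_dir)

        # A new .repo file appears mid-run, as when Bcfg2 installs one.
        with open(os.path.join(repo_dir, "extra.repo"), "w") as f:
            f.write("[extra]\nname=Extra\n")

        stale = list(first.repo_ids)             # old instance: unchanged
        fresh = RepoSnapshot(repo_dir).repo_ids  # new instance: re-reads
        return stale, fresh


if __name__ == "__main__":
    print(demo())
```

Creating a new object is blunt, but it sidesteps any cached state, which is why it only makes sense between transactions.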

My Bcfg2 patches and work can be found at GitHub: https://github.com/jjneely/bcfg2

Repairing Users’ Accounts

How does one go about repairing a user’s UNIX/Linux account?  We very often run into this issue where things have gone wrong in a user’s account: a corrupted Firefox profile, a bad .xsession script, or any number of other problems.  How do other people go about resetting a user’s account back to its default configuration at scale?

We have several tools available on the web or through the GDM session manager that all end up calling one very small script.  This script has a collection of files and directories it moves safely out of the way and attempts to repair Firefox and Chrome profiles.

repair_dotfiles.sh

What does your IT shop do?

Configuring kdump on RHEL 6

Some quick notes for configuring kdump on RHEL 6.  Kdump produces a vmcore on a kernel panic, oops, or other condition that our friends at Red Hat support can use to debug kernel level issues.

  1. Make sure /var/crash has space for vmcores.  You need to have enough space for an entire dump of RAM just to be safe.
  2. Add crashkernel=128M to your kernel command line in /boot/grub/grub.conf
  3. Set up /etc/kdump.conf to save vmcores to the right place.  I normally have /var in a separate logical volume so I need to change the default location.  We also configure which memory pages to leave out and turn on compression.
    # cat /etc/kdump.conf
    ext4 /dev/mapper/Volume00-var
    path /crash
    core_collector makedumpfile -c --message-level 1 -d 31
  4. Make sure the kdump service is set to start on boot and restart the system.
  5. Check that there is an initrd in /boot created for kdump.  It will have “kdump” in the file name.
  6. Test your configuration.
    # echo 1 > /proc/sys/kernel/sysrq
    # echo c > /proc/sysrq-trigger
  7. Configure the system to kernel panic on oops or NMI depending on the problem you are attempting to capture. Add these lines to /etc/sysctl.conf and then run sysctl -p as root.
    kernel.panic_on_oops = 1
    kernel.unknown_nmi_panic=1
    kernel.panic_on_unrecovered_nmi=1
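
The -d 31 in the core_collector line of step 3 is a bitmask of page types that makedumpfile leaves out of the vmcore.  A small sketch to decode it, assuming the bit meanings given in the makedumpfile man page; the helper name excluded_pages is mine.

```python
# Bit values for makedumpfile's -d dump level, per its man page.
DUMP_LEVEL_BITS = {
    1: "zero pages",
    2: "cache pages (non-private)",
    4: "cache pages (private)",
    8: "user data pages",
    16: "free pages",
}


def excluded_pages(dump_level):
    """List the page types a given -d value tells makedumpfile to skip."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump level must be between 0 and 31")
    return [name for bit, name in DUMP_LEVEL_BITS.items() if dump_level & bit]


if __name__ == "__main__":
    for name in excluded_pages(31):
        print(name)
```

A level of 31 sets every bit, so all five page types are dropped, which gives the smallest vmcore.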

Dumb Tricks with gPXE

For my first bit of magic with gPXE I decided to replace the boot ISOs I have for folks that are unable to install machines via PXE.  With gPXE I don’t need my RHEL initrd and kernel image to bootstrap myself into an install from a CD or USB stick.  I’ve encoded the TFTP server and the file name to grab and execute into the gPXE image, so as long as the machine can get any type of DHCP lease it will load up my PXELINUX environment.  This makes the boot CD images work identically to doing a real PXE boot…because you are.

Step 1:  I grabbed the gPXE distribution and unpacked it.  I patched its autoboot functionality as described here.  This lets me DHCP automatically even if the first ethernet device is not the one connected to the network.  For gPXE 1.0.1 you can use my patch instead.

Step 2: Make an embedded script file.  This just supplies the information to gPXE that a normal PXE boot would get from the next-server and filename options in the DHCP response.

#!gpxe
autoboot
chain tftp://FQDN/pxelinux.0

Yup, we have DNS support so just add the FQDN of your TFTP server. In my setup I have pxelinux.0 in the root of my TFTP server.

Step 3: Build gPXE with your embedded script.

make EMBEDDED_IMAGE=path/to/your/script

Step 4: Burn the resulting ISO onto a CD and PXE boot a PXE-less machine.

Recovering RAID 5 Arrays With Multiple Failed Devices

Failed Linux MD RAID devices. That’s what I got to deal with yesterday. The ext3 file system produced scary errors and remounted the file system as read-only. A quick look at /proc/mdstat showed

# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid5 sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sda1[0]
   645136128 blocks level 5, 256k chunk, algorithm 2 [10/8] [U_UUUUUUU_]

That’s bad.  A second hard drive had failed in a RAID 5 array with no spares.  Our mission: get the data back as best we can.
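
If you monitor this sort of thing with a script, the status line is easy to pick apart.  A minimal sketch, using the output above as sample input; the function name missing_slots is made up, and it only looks at the first array it finds.

```python
import re

SAMPLE = """\
Personalities : [raid1] [raid5]
md1 : active raid5 sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sda1[0]
   645136128 blocks level 5, 256k chunk, algorithm 2 [10/8] [U_UUUUUUU_]
"""

# Matches "[expected/active] [U_U...]" on an array's status line.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def missing_slots(mdstat_text):
    """Return (expected, active, indexes of missing members)."""
    match = STATUS_RE.search(mdstat_text)
    if match is None:
        raise ValueError("no array status found")
    expected = int(match.group(1))
    active = int(match.group(2))
    missing = [i for i, flag in enumerate(match.group(3)) if flag == "_"]
    return expected, active, missing


if __name__ == "__main__":
    print(missing_slots(SAMPLE))
```

RAID 5 survives one missing member, so more than one entry in the missing list means the array has lost more than its redundancy can cover.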

This was a fileserver that was in use.  I pulled emergency maintenance and rebooted the server into single user mode for safety.  After the reboot /dev/md1 was listed as inactive.  There were not enough working devices to bring up the array.  Obviously, it wasn’t mounted either.  This was exactly where I wanted to be.  (You could also unmount the broken file system and use mdadm --stop /dev/md1 to stop the array.)

Next, use mdadm to force assemble the array.

# mdadm --assemble --force /dev/md1
mdadm: forcing event count in /dev/sdb1 from X to Y

Now that should bring the array online.  The event counter for /dev/sdb1 was the least out of date and mdadm just fudged it.  This means we have introduced corruption.  Once the second device fails, it’s not long before the array fails.  So, provided that you bring back the second failed disk (not the first which failed 3 years ago, right?) you should introduce minimal corruption.  However, you now have a working array.

Next, we back up the array.  Mount /dev/md1 as a read-only file system.  Use rsync or another tool to copy off the data.  Don’t try to add more disks and rebuild the array before you back it up.

Our mission is accomplished.  We have data.

Encryption Types Order in Kerberos

Things I don’t wish to forget the next time I need them.  In a Kerberos Realm with multiple encryption types available the KDC will use the first type in the list that’s compatible with the client.  So let’s say you are adding encryption types to your KDC; of course you need to add new keys to the krbtgt/<REALM> principal.  Once you do so check out the list of keys.  What will the KDC choose to use if things are in this order?

Key: vno 2, DES cbc mode with CRC-32, no salt
Key: vno 2, AES-256 CTS mode with 96-bit SHA-1 HMAC, no salt
Key: vno 2, ArcFour with HMAC/md5, no salt
Key: vno 1, DES cbc mode with CRC-32, no salt

You guessed it, des-cbc-crc!  The weakest type.  Even when the AES encryption is compatible with the client.

This is where I began to bang my head on my desk.  The order is, of course, set when you use the change_password command in kadmin or via the kpasswd tool.  The order is read from the supported_enctypes variable in your kdc.conf.  So make sure the encryption types listed for that variable are listed in the order of their strength.

Unfortunately, there’s no way to reorder the keys already stored on a principal.  You need to correct the order in your kdc.conf file, restart the kadmin server, then use the change_password command again (with -keepold) to rekey the krbtgt/<REALM> principal.  Or use the kpasswd tool to update your principal.  Yes, the restart in there is required.
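
The selection rule is simple enough to model.  This sketch only mirrors the behavior as I described it above (first key in stored order that the client supports), not the KDC’s actual code; choose_enctype is a made-up name and the enctype strings stand for the vno 2 keys in the listing.

```python
def choose_enctype(key_order, client_enctypes):
    """Return the first stored key type the client also supports, else None."""
    supported = set(client_enctypes)
    for enctype in key_order:
        if enctype in supported:
            return enctype
    return None


# The vno 2 keys in the order shown in the listing.
BAD_ORDER = ["des-cbc-crc", "aes256-cts-hmac-sha1-96", "arcfour-hmac"]
# The same keys ordered strongest first.
GOOD_ORDER = ["aes256-cts-hmac-sha1-96", "arcfour-hmac", "des-cbc-crc"]

CLIENT = ["aes256-cts-hmac-sha1-96", "arcfour-hmac", "des-cbc-crc"]

if __name__ == "__main__":
    print(choose_enctype(BAD_ORDER, CLIENT))
    print(choose_enctype(GOOD_ORDER, CLIENT))
```

With the bad order a fully capable client still lands on des-cbc-crc, and with the keys stored strongest first the same client gets AES.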