Configuration Management

June 17, 2007 Operations

Configuration anagement (CM) is a critical point of doing large scale, or even small, systems administration. Its more than overly important that your various machines pick up new and updated configuration files easily and in a timely matter. At NCSU, I’ve been doing what’s counted at CM using a python project I call Realmconfig. Okay, Realmconfig has been around at NCSU managing linux machines longer than I’ve been its maintainer. As Realmconfig developed, it gained more and more CM-like features such as a arbitrary collection of modules than run at boot to handle initial configuration. One of these modules “manages” a selection of files, if the file isn’t identical to the gold copy its replaced with the gold version. Generally, its worked well for initial configuration put pushing out changes hurt. They hurt bad.

Said module that “manages” files can either run once, run every boot, or run only when I bump its version which requires a new Realmconfig package. Run once handles inital configuration but ignores any updates. Run every boot sees updates, provided I’ve included them in a new package, but is draconian in applying those updates. Certain systems have modified configuration files in place and need to keep them. Only running when the version is bumped is a compromise, but I end up with the worst of all the problems.

Obviously, I need to move away from a haphazard collection of simplistic scripts to something that can scale to an environment of thousands of machines, not require a new RPM package for any update (unless one solely decides to do CM by RPM), propagate updates easily without weird scripts, ability to handle restarting of services and small scriptlets, and allow for administrators to override/replace aspects of the configuration I’ve provided.

The last point sounds a bit odd. Allow my configuration to be overridden? Most CM systems scale well but are designed around centralized administration. That’s a little different than what I will call centralized management. Centralized administration is one or a group of system administrators that act as one, unified entity to manage machines. In this case they are all trusted and working together to build and maintain their infrastructure. Any configuration changes made from outside the centralized administration are, by definition, not approved and should be quickly reconciled with the known good configuration. Or, more simply, you have a compromised machine.

However, I work at a university with many fiefdoms. Seldom is anything done that may be perceived as giving away direct control of something to another fiefdom. Fortunately, systems administrators are normally smart folks and understand that working together across fiefdoms they can achieve bigger and better things. Some, however, don’t. So what we have at the university are modified versions of Solaris, Windows, and Linux that we (the central IT folks) make available to the university. The colleges, and departments can deploy these “kits” as they need and leverage centralized management of the machines and deploy their own labs, workstations, and services. Most importantly, they can still be “in control” of the machines themselves.

So, where does this leave us in the realm of Configuration Management? I require a system where I can push out changes to all the managed linux machines on campus. Also, local systems administrators that may not be trusted with all the configuration of all machines may wish to add configuration and have it enforced on their machines. Its possible that there might be a third layer as well. Also, if a local administrator decides to manage a file I also manage we need to do something a lot smarter than replace it with the global copy. We need to merge, or make sure that their files remain intact and ignore the global changes.

I’m not aware of any CM tool that this flexible. I’ve been looking at Bcfg2 and will spend some more time with it as well. For a CM it seems designed well, stays away from inventing new languages, scalable, and is written in Python. We’ll see how it plays out in my testing. An important part for a useful CM tool is something that there is a community around rather than some random code I wrote. Bcfg2 has a very active community and maintainer.

Now we get into my crazy ideas. Toss in the hopper that configuration should be managed by some sort of SCM so that we have backups, machines can have their configuration rolled back, and a log is kept of why the configuration was updated. Suddenly, to me at least, we have the use case of a distributed SCM. Each machine has its own repository where configuration changes can be made locally and the machine can pull its configuration from any other machine, by default a master repository. An easy way to make a configuration hierarchy. We just need to be smart about automation and conflicts.

Using Git and pretending we have a useful data schema and tools to make use of it, how do we manage the magic local repository based on another machine’s?

git clone SOURCE_REPO
git branch upstream origin
git pull origin :upstream
git merge -s ours upstream
System distrubutes configuration files. Local admins can commit their own configuration to HEAD.
Goto step 3.

Alas, the local changes override the upstream default in an automated way and pull their configuration information from any arbitrary point.

Probably complete crack.

LinuxCzar

Configuration Management