Interview Questions for System Administrators

I originally wrote this post in October of 2008. Where did those 9 years go? I think it’s time for an update.

As the Linux Czar, I’m regularly asked to interview folks who are applying for various jobs that require some Linux skills. Interviewing isn’t really my strong point, and I always struggle to come up with good questions that will lead candidates to talk about themselves and their skills in a helpful way. So, here are the interview questions I’ve been known to use:

General

What email lists, blogs, websites, etc. do you read regularly to keep current in your areas of interest and IT in general?
Tell us about the systems/infrastructure you deployed that you are most proud of.
What tools and techniques have you used to deploy servers, workstations, etc. at scale?
Tell us about a time you screwed up. What did you do and how did you realize the problem? How did you get it fixed?

File system

What’s an inode and how can you find the inode number of a file?
How do you delete a file whose name starts with a dash (e.g., -foo)? What about a file name that contains control characters?
What are the differences between RAID levels 0, 1, 4, 5, and 6? What about 10 and 50?
When would you not partition a storage array?
What are the advantages of BTRFS and ZFS? How are the two different?
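
For the inode and awkward file name questions, most candidates reach for the shell first: `ls -i` or `stat` to show the inode number, `rm -- -foo` or `rm ./-foo` for the leading dash, and `find . -inum N -delete` when the name cannot be typed at all. The same idea can be sketched in Python. This is only an illustration: the helper names `inode_of`, `remove_by_inode`, and `demo` are made up for this post, and the sketch assumes a POSIX file system.

```python
import os
import tempfile


def inode_of(path):
    """Return the inode number of path without following symlinks."""
    return os.lstat(path).st_ino


def remove_by_inode(directory, inode):
    """Delete the entry in directory whose inode matches.

    Returns the removed name, or None if nothing matched. Looking the
    file up by inode avoids typing or quoting the awkward name at all.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.stat(follow_symlinks=False).st_ino == inode:
                os.unlink(entry.path)
                return entry.name
    return None


def demo():
    """Create a file named '-foo<BEL>' in a scratch directory and remove it."""
    with tempfile.TemporaryDirectory() as tmp:
        awkward = os.path.join(tmp, "-foo\x07")  # leading dash plus a control character
        open(awkward, "w").close()
        removed = remove_by_inode(tmp, inode_of(awkward))
        return removed, os.path.exists(awkward)
```

Calling a system call such as `os.unlink` directly sidesteps the problem, because the dash only matters to a command's option parser. A good follow-up is asking why `rm -- -foo` works.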

Automation

Have you used Red Hat Kickstart files before? When and why would you use the %pre and %post sections?
How do you define a new macro in an RPM .spec file?
How do you override the build process in a Debian package?
Chef, Puppet, Ansible, Docker familiarity?
Mesos, Aurora, Kubernetes, Docker Compose, or Nomad experience?
Build system experience: pbuilder, Bazel, Pants?

Coding

Let’s look at your GitHub account!!
What coding languages do you like? Why? What features do you dislike about them?
How does a web server handle requests at scale?
Sort the keys of a hash or dictionary object without using library functions.
How does a hash or dictionary object work? Can you implement one?
Name 3 sorting algorithms, the Big O of each, and the reasons you might use them.
What’s the dirtiest code you’ve written that’s still in production and why?
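
To give a feel for the depth I’m after on the hash and sorting questions, here is a minimal sketch of one possible answer, not the only acceptable one. It assumes separate chaining for collisions and uses merge sort, which is O(n log n), for the keys. The names `ChainedHashMap` and `merge_sort` are made up for this post, and the sketch leans on Python’s built-in `hash()` rather than a hand-written hash function.

```python
class ChainedHashMap:
    """Toy hash table using separate chaining (illustration only)."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # Map the key's hash onto one of the bucket lists.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._resize()  # keep chains short as the table fills

    def get(self, key, default=None):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return default

    def keys(self):
        return [k for bucket in self._buckets for k, _ in bucket]

    def _resize(self):
        pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for key, value in pairs:
            self._bucket(key).append((key, value))


def merge_sort(items):
    """Return a new sorted list without calling sorted() or list.sort()."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Sorting the keys is then `merge_sort(table.keys())`. Useful follow-ups are what happens to lookup cost when every key lands in the same bucket, and why the table resizes.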

Monitoring

Talk about the difference between blackbox and whitebox monitoring, for example a product like Nagios versus a data-driven solution like Prometheus. What are the pros and cons?
What is synthetic monitoring?
What’s the difference between a mean and an average?
What are the mean, median, and mode?
What is a percentile or quantile? How do you calculate them from a set?
StatsD gives you a percentile for each application instance representing latency. How do you build a global percentile for your entire application?
What are histograms? How can they be implemented in data based monitoring solutions? What algorithm(s) can be used to build quantiles?
What is an SLA or SLO? If these thresholds are known in advance and are used as bucket boundaries in a histogram, build a simple calculation that is a quantile of observations in violation of the SLO.
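
The histogram questions are easier to discuss with something concrete on the whiteboard. The sketch below is one way a candidate might approach them, under a few assumptions of mine: latencies are in seconds, the bucket boundaries in `BOUNDS` are invented, the SLO is exactly one of those boundaries, and quantiles are estimated by linear interpolation inside a bucket. The key point is that percentiles computed per instance cannot be meaningfully averaged, while bucket counts can simply be summed across instances.

```python
import math

# Hypothetical bucket upper bounds in seconds; the last bucket catches everything else.
BOUNDS = [0.1, 0.25, 0.5, 1.0, math.inf]


def observe(counts, value, bounds=BOUNDS):
    """Increment the first bucket whose upper bound is >= value."""
    for i, upper in enumerate(bounds):
        if value <= upper:
            counts[i] += 1
            return


def merge(histograms):
    """Sum per-instance bucket counts into one global histogram."""
    return [sum(column) for column in zip(*histograms)]


def fraction_violating(counts, slo, bounds=BOUNDS):
    """Share of observations slower than slo; exact because slo is a bucket bound."""
    total = sum(counts)
    if total == 0:
        return 0.0
    within = sum(counts[: bounds.index(slo) + 1])
    return (total - within) / total


def estimate_quantile(counts, q, bounds=BOUNDS):
    """Estimate the q-quantile by interpolating inside the bucket holding rank q*N."""
    total = sum(counts)
    if total == 0:
        return math.nan
    rank = q * total
    cumulative, lower = 0, 0.0
    for upper, count in zip(bounds, counts):
        if count and cumulative + count >= rank:
            if math.isinf(upper):
                return lower  # cannot interpolate into an unbounded bucket
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
        lower = upper
    return lower
```

With two instances, each keeps its own counts via `observe`, the collector calls `merge`, and `fraction_violating(merged, 0.5)` gives the share of requests over a 500 ms SLO. The estimate from `estimate_quantile` is only as good as the bucket layout, which is a good thread to pull on in the interview.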

Problem Solving

A fleet of bare metal systems with RAID cards regularly crashes after about a month and a half of uptime. The physical drives are taken offline by the RAID controller, leaving read-only file systems or damaged RAID arrays. Walk through how you would debug the problem and what questions you would ask for additional clues.
A fork bomb is in progress and you happen to have a root shell still open on the machine. Clean up the fork bomb attack without rebooting or (obviously) spawning additional processes.

In general, these aren’t really questions so much as starting points for a conversation. I’m looking for T-shaped people in most cases, and these starting points help me find the depth and breadth of someone’s skill set.

LinuxCzar