I’ve been researching quite a few algorithms for my client, Bruce, as I continue to scale out certain systems. I thought that getting them on my blog would be very useful for a future version of myself and many others. I suspect, and hope, that most folks that work in Systems Administration / Operations will find these at least familiar.

## Flap Detection

Flap detection is the ability to determine rapid state changes so that one can take corrective action. The text book example is BGP route flapping where a new route is advertised and then withdrawn (a flap) multiple times in a short period. In general, this “scores” an event with higher scores the more frequent the event.

## Warning: Math

For each event, a “penalty” $P$, “timestamp” $t$, and a boolean “isFlapping” variable is stored. $P$ is the current “score” for this event and is $0$ for an event that has not happened. $t$ is the timestamp, usually Unix Epoch time, of the last event.

Each time an event occurs we add a penalty value to the current penalty. The penalty value decays exponentially over time. A suppress limit is known which defines if the event is flapping when the event’s penalty is greater than the suppress limit. A lower reuse limit is known when the event is considered no longer flapping once the penalty value is less than the reuse limit.

### Summary

 Penalty $P$ An event's current score. Half-Life $h$ How much time for half of the penalty to decay. Timestamp $t$ Unix Epoch time of last event. Supress Limit Penalty > Suppress Limit == Flapping Reuse Limit Penalty < Reuse Limit && Flapping == Recovery

### Calculating for $P$

$$P(t_2) = P(t_1) \times e^{-(t_2 - t_1) \times ln(2) / h}$$

## Notes

f(x) = 2.71828 ** (-x*log(2) / 60)
set ylabel "Penalty"
set xlabel "Time"
set key off
set terminal png size 800,600
set output 'output.png'
set label "Reuse Limit" at 40,11
set label "Suppress Limit" at 100,26
plot [0:240] x < 10 ? f(x) : x < 20 ? f(x-10)*10 : x < 30 ? f(x-20)*20 : f(x-30)*30, 10 with dots, 25 with dots