This blog is highly personal, makes no attempt at being politically correct, will occasionally offend your sensibility, and certainly does not represent the opinions of the people I work with or for.
Hydra, or Fun and Happiness seen by Pascal

A couple of days ago I mentioned Hydra, a new daemon running on alexandra. It is a perfect example of the kind of stuff I write to make me happy :-). Seriously, just happy. Let me explain what it is by describing the data structures and the functions on them.

Part 1

First, a collection is an ordered set of unix timestamps (an array in fact). An example of a collection is [1346793147, 1346796986, 1346799873, 1346817413, 1346824180, 1346831914, 1346841071, 1346847553]. Collections have the following properties: they always have at least three elements, the timestamps are always in the future and always within 24 hours of the current time (the current time is 1346782313 at the moment this sentence is written), and they do not contain duplicate values.

As time passes, the current time is going to reach the first element of the collection (the one with the lowest future value). At this moment, the following happens: the first element is removed from the collection, and two new timestamps, randomly chosen between now and now+24 hours, are added to the collection (at the relevant positions so as to keep the collection ordered).
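This growth transition can be sketched in a few lines of Python (the function and variable names are mine, not from Hydra's actual code):

```python
import bisect
import random
import time

def grow_transition(collection, now=None):
    """When the current time reaches the first timestamp, drop it and
    insert two new random timestamps within the next 24 hours."""
    now = int(time.time()) if now is None else now
    if collection and now >= collection[0]:
        collection.pop(0)  # remove the expired first element
        for _ in range(2):
            # resample until we get a value not already present,
            # to respect the no-duplicates property
            while True:
                t = random.randint(now + 1, now + 86400)
                if t not in collection:
                    break
            bisect.insort(collection, t)  # keep the collection ordered
    return collection
```

Note that nothing happens when the first element is still in the future; the daemon simply checks again a few seconds later.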

As you can see, each such transition increases the number of elements in the collection by 1. We do not want collections to become unboundedly large, so let us describe another kind of transition, but for this we need some metrics and notations.

Given an ordered array of numbers a, the object a.adj_diffs is the array of differences between adjacent values. This array has one less element than the original array. For instance, if a = [1, 2, 4, 10, 12], we have a.adj_diffs = [1, 2, 6, 2]. We then consider the functions .min and .max on arrays of numbers, which return, respectively, the smallest and biggest elements. So, a.adj_diffs.min is the minimum distance between two adjacent elements of the array and a.adj_diffs.max is the maximum distance between two adjacent elements. For instance, [1, 2, 4, 10, 12].adj_diffs.min = 1 and [1, 2, 4, 10, 12].adj_diffs.max = 6. We also have a function .count that returns the number of elements in an array. For instance, [1, 2, 4, 10, 12].count = 5. Finally, given an array a, we define a.std_dev as the standard deviation of its elements.
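These helpers map almost one-to-one onto Python (the post does not say whether Hydra uses the population or sample standard deviation; the sketch below assumes the population one):

```python
import statistics

def adj_diffs(a):
    """Differences between adjacent values of an ordered array."""
    return [b - x for x, b in zip(a, a[1:])]

a = [1, 2, 4, 10, 12]
print(adj_diffs(a))           # [1, 2, 6, 2]
print(min(adj_diffs(a)))      # 1  (a.adj_diffs.min)
print(max(adj_diffs(a)))      # 6  (a.adj_diffs.max)
print(len(a))                 # 5  (a.count)
print(statistics.pstdev(a))   # a.std_dev, assuming population std dev
```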

Given a collection c, we denote by c(+) the collection to which we add the current time and the current time plus 24 hours, in other words time() and time() + 86400.

To limit the number of elements in a collection, we do as follows. Given a collection c, when (and as long as) c(+).adj_diffs.max is less than 3600 (the number of seconds in one hour), we perform the following transition: we choose a random integer between 1 and c.count-3 and remove that many randomly chosen elements from the collection. The resulting collection has a number of elements anywhere between 3 and c.count-1.
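A sketch of the shrinking transition, reusing the notation above (again, the names are mine):

```python
import random
import time

def c_plus(collection, now=None):
    """c(+): the collection plus time() and time() + 86400."""
    now = int(time.time()) if now is None else now
    return sorted(collection + [now, now + 86400])

def shrink_transition(collection, now=None):
    """If the largest adjacent gap of c(+) is under one hour,
    remove a random number (1 to count-3) of random elements."""
    cp = c_plus(collection, now)
    gaps = [b - a for a, b in zip(cp, cp[1:])]
    if max(gaps) < 3600:
        for _ in range(random.randint(1, len(collection) - 3)):
            collection.pop(random.randrange(len(collection)))
    return collection
```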

Part 2

The previous section describes the two kinds of transitions that a collection can go through (essentially increasing its size and decreasing its size). We are now going to define the index of a collection.

First, we define c.n1 as ln(c(+).adj_diffs.std_dev + 1), where ln is the natural logarithm. The thing to remember here is that the fewer the elements of c and the less regularly spaced they are, the bigger c.n1 is. I think that its maximum possible value is about ln( [1,86399].std_dev ) = 11.36673, but I haven't proven it.

Second, we define c.n2 as ln(c.count). The value c.n2 might look theoretically unbounded; actually it has a maximum value of about ln(86400 - 3600) = 11.32418 (coming from the fact that we use discrete time: the algorithm is accurate to the second, not to fractions of seconds). This said, even if the process ran for as long as ten times the age of the universe, c.n2 would not come anywhere near this upper bound. The thing to remember here is that the bigger the collection (in number of elements), the bigger c.n2 is.

Now, given any real number alpha between 0 and 1, we define

c.index(alpha) = alpha*c.n1 + (1-alpha)*c.n2

In other words, c.index(alpha) is the weighted average of c.n1 and c.n2 parameterised by alpha. Given the above discussion, its theoretical maximum value is [11.36673,11.32418].max = 11.36673.
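Putting Parts 1 and 2 together, the index can be computed like this (hypothetical names; the use of the population standard deviation is my assumption):

```python
import math
import statistics

def index(collection, alpha, now):
    """c.index(alpha) = alpha*c.n1 + (1-alpha)*c.n2."""
    cp = sorted(collection + [now, now + 86400])          # c(+)
    diffs = [b - a for a, b in zip(cp, cp[1:])]           # c(+).adj_diffs
    n1 = math.log(statistics.pstdev(diffs) + 1)           # c.n1
    n2 = math.log(len(collection))                        # c.n2
    return alpha * n1 + (1 - alpha) * n2
```

With alpha = 0 the index reduces to ln(c.count), and with alpha = 1 to c.n1; any alpha in between blends the two linearly.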

Implementing the above is a trivial exercise, nothing difficult. You could display the current value of the index in real time, and you would have something to look at during your long days at work... With the above algorithm, the index of the collection jumps to a new value at each collection transition.

Part 3

So now, let us push things a little bit. The value of alpha is no longer constant, but slowly and continuously oscillates between 0 and 1. The exact formula is

alpha = 0.5*sin(theta(time) + theta_0) + 0.5

where theta(time) is a linear function of time and theta_0 a constant called the shift. The function theta is such that theta(time) increases by 2 * PI in 24 hours. In other words, alpha "circles" from 0 to 1 and then back to 0 in 24 hours.
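A possible implementation of this time-varying alpha (the signature is mine; the post only gives the formula):

```python
import math

def alpha(t, shift=0.0, period_hours=24):
    """0.5*sin(theta(t) + shift) + 0.5, where theta(t) grows
    by 2*pi per period (24 hours by default)."""
    theta = 2 * math.pi * t / (period_hours * 3600)
    return 0.5 * math.sin(theta + shift) + 0.5
```

With a zero shift, alpha starts at 0.5, peaks at 1 six hours in, bottoms out at 0 eighteen hours in, and returns to 0.5 after a full day.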

Part 4

So now, I do not have one but three collections running in parallel, and their shifts are 0, 2*PI/3 and 4*PI/3. Those values are not random: they place the shifts at the vertices of an equilateral triangle on the unit circle. (I hope you guys remember high school trigonometry ^_^)

With the above running, I call hydra index the absolute value of the biggest difference between any two of the three collection indexes. In other words, the hydra index is

[ abs(c1.index-c2.index), abs(c1.index-c3.index), abs(c2.index-c3.index) ].max

This index jumps a little bit at each transition of any of the three collections and is continuously sensitive to the current time (remember that the underlying alphas depend on the current time). Interestingly, c(+).adj_diffs.std_dev is also sensitive to time, because c(+) itself is. Also note that what I said at the end of the previous part is not quite accurate: the three collections do not actually have the same period. Their alphas "cycle" at different speeds, completing a full circle in, respectively, 7, 13 and 29 hours.
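The hydra index itself is then a one-liner over the three collection indexes (sketch, my naming):

```python
from itertools import combinations

def hydra_index(indexes):
    """Biggest absolute difference between any two collection indexes."""
    return max(abs(a - b) for a, b in combinations(indexes, 2))

# e.g. hydra_index([c1_index, c2_index, c3_index])
```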

The daemon performs any pending transition and recomputes the index every few seconds. It also dumps the computational state to a file after each collection transition. This file is read at startup so the computation carries on where it left off.

I have a real-time display of the three collection indexes and the main hydra index in alexandra's HUD.

Part 5

One more thing... The daemon also keeps track of hourly hydra indexes for the past 28 days. It then notifies me (both in the OS X Notification Center and in activity-stream) every time the current index breaks the record of the past 28 days. Hydra has been running for a few days and the current record is still relatively low, but it will increase during the next hours/days; then, in the long run, as previous records become older than 28 days and are discarded, I will get a notification... once in a while.
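The record-keeping can be sketched with a bounded deque of hourly samples (a guess at the mechanism; the post does not describe the implementation):

```python
from collections import deque

class RecordTracker:
    """Hourly hydra indexes for the past 28 days; flags new records."""

    def __init__(self):
        self.history = deque(maxlen=28 * 24)  # one value per hour

    def sample(self, value):
        """Record the hourly hydra index; old hours fall off the window."""
        self.history.append(value)

    def is_record(self, value):
        """True when value beats everything seen in the 28-day window."""
        return not self.history or value > max(self.history)
```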

My only problem now is to decide what I am going to do every time a new record is broken... Something naughty :-)

[update, 28th September]

The values of the three collection indexes composing the main hydra index were particularly concentrated when I woke up this morning, resulting in the index having a particularly low value, so I modified the daemon to also notify me on record-breaking low values.