**Hydra, or Fun and Happiness seen by Pascal**

A couple of days ago I mentioned Hydra, a new deamon running on alexandra. It is a perfect example of the kind of stuff I write to make me happy :-). Seriously, just happy. Let me explain what it is by describing the data structures and the functions on them.

### Part 1

First, a **collection** is an **ordered** set of unix timestamps (an array in fact). An example of collection is [1346793147, 1346796986, 1346799873, 1346817413,1346824180, 1346831914, 1346841071, 1346847553]. Collections have the following properties: they always have at least three elements, the timestamps are always in the future, they do not have duplicate values, and always within 24 hours of the current time (current time is 1346782313 at the moment this sentence is written).

As time passes, the current time is going to reach the first element of the collection (the one with the lowest future value). At this moment, the following happens: the **first** element is removed from the collection, and **two** new timestamps, randomly chosen between now and now+24 hours, are added to the collection (at the relevant positions as to keep the collection ordered).

As it must be obvious to you, at each transition the number of elements in the collection increases by 1. We do not want collections to become unboundedly large, so let us describe another kind of transition, but for this we need some metrics and notations.

Given an ordered array of numbers **a**, the object **a.adj_diffs** is the array of differences between adjacent values. This array has got one less element than the original array. For instance if **a = [1, 2, 4, 10, 12]**, we have **a.adj_diffs = [1,2,6,2]**. We then consider the functions, **.min** and **.max** on arrays of numbers, which return, respectively, the smallest and biggest elements. So, **a.adj_diffs.min** is the minimum distance between two adjacent elements of the array and **a.adj_diffs.max** is the maximum distance between two adjacent elements of the array. For instance, **[1, 2, 4, 10,12].adj_diffs.min = 1** and **[1, 2, 4, 10,12].adj_diffs.max = 6**. We also have a function that returns the number of elements in an array **.count**. For instance, **[1, 2, 4, 10, 12].count = 5**. Given an array **a**, we define **a.std_dev** as the standard deviation of its elements.

Given a collection **c**, we denote by **c(+)**, the collection to which we add the current time and time+24hours, in other words, **time()** and **time()+86400**.

To limit the number of elements in a collection, we do as follows. Given a collection **c**, when (and as long as) **c(+).adj_diffs.max** is less than **3600** (number of seconds in one hour), we perform the following transition: we chose a random integer number between **1** and **c.count-3** and we remove one randomly chosen element from the collection that number of times. The resulting collection has a number of elements anywhere between **3** and **c.count-1**.

### Part 2

The previous section describes the two kinds of transitions that a collection can go through (essentially increasing its size and decreasing its size). We are now going to define the **index** of a collection.

First, we define **c.n1** as **ln( c(+).adj_diffs.std_dev + 1)**, where **ln** is the natural logarithm. The thing to remember here is that the lesser the number of elements of **c** and the more regularly spaced they are, the bigger **c.n1** is. I think that its maximum possible value is about **ln( [1,86399].std_dev ) = 11.36673**, but I haven't proven it.

Second define **c.n2** as **ln(c.count)**. The value **c.n2** is theoretically unbounded. Well, actually it has a maximum value of about **ln(86400-3600) = 11.32418** (coming from the fact that we use discrete time, the algorithm time is accurate to the second, not fractions of seconds). This said, it is obvious that even if the process had to run for as long as 10 times the age of universe, **c.n2** would not come anywhere near this upper bound. The thing to remember here is that the bigger the collection (in terms of numbers of elements), the bigger **c.n2** is.

Now, given any real number **alpha** between 0 and 1, we define

**c.index(alpha) = alpha*c.n1 + (1-alpha)*c.n2**

In other words,

**c.index(alpha)**is the weighted average of

**c.n1**and

**c.n2**parameterised by

**alpha**. Given the above discussion, its theoretical maximum value is

**[11.36673,11.32418].max = 11.36673**.

Implementing the above is a trivial exercise, nothing difficult. You could display the current value of **alpha** in real time, and you have something to look at during your long days at work... With the above algorithm, the index of the collection jumps to a new value at each collection transition.

### Part 3

So now, let us push things a little bit. The value of alpha is not constant, but slowly alternate continuously between 0 and 1. The exact formula is

**0.5*sin(theta(time) + theta_{0}) + 0.5**

where

**theta(time)**is a linear function of time and

**theta_{0}**a constant called

**shift**. The function

**theta**is such that

**theta(time)**increases by

**2 * PI**in 24 hours. In other words

**alpha**"circles" from 0 to 1 and then back to 0 in 24 hours.

### Part 4

So now, I do not have one, but three collections running in parallel, and their shifts are **0**, **2*PI/3** and **4*PI/3**. Those values are not random, they put the shifts on the vertices of an equilateral triangle on the unit circle. (I hope you guys remember high school trigonometry ^_^)

With the above running, I call hydra index the absolute value of the biggest difference between any two collection indexes. In other words, the hydra index is

**[ abs(c1.index-c2.index), abs(c1.index-c3.index), abs(c2.index-c3.index) ].max**

This index jumps a little bit at each transition of one of the three collections and is continuously sensitive to the current time (remember that the underlying alphas depend on the time modulo 12 hours). Interestingly,

**c(+).adj_diffs.std_dev**is also sensitive to time, because

**c(+)**itself is. Also note that what I said that the end of the previous part is incorrect because actually the three collections do not have the same period. Their alphas "cycle" at different speeds. They do a complete circle in, respectively, 7, 13 and 29 hours.

The demon performs any possible transition and recomputes the index every few seconds. It also dumps the computational state to a file after each collection transition. This file is read at startup and the computation carries on.

I have a real time display of the three collections indexes and the main hydra index in alexandra's HUD.

### Part 5

One more thing... The demon also keeps track of hourly hydra indexes for the past 28 days. It then notify me (both in OSX Notification Center and activity-stream) every time the current index has broken the record of the past 28 days. Hydra has been running for few days and the current record is still relatively low, but it will increase during the next hours/days, and then in the long run, as previous records become older than 28 days and are discarded, I will get a notification.... once in a while.

My only problem, now is to decide what I am going to do every time a new record is broken... Something naughty :-)

### [update, 28th September]

The values of the three collection indexes composing the main hydra index (hydra) where particularly concentrated when I woke up this morning, resulting in the index having a particularly low value, so I modified the deamon to notifiy me on low record breaking values.