This blog is highly personal, makes no attempt at being politically correct, will occasionally offend your sensibility, and certainly does not represent the opinions of the people I work with or for.
Lucille Object Store, or the art of awesomeness

Yes, have an object store, you deserve it, and it will make you a better person.

In this entry I am going to cheerlead for an idea, an awesome idea. An old idea in some sense... But first let us see what leads to it...

In the beginning you learn programming, you are happy to manipulate data in non trivial ways. This data could be your file system and you build something like an archive utility, or just something that consumes an API you came across on the Internet.

Quickly, you discover that you want to manipulate your own data; you want to have your own, say, shopping/todo list. There are lots of those out there, both as online and desktop apps, but the problem is that with the programming skills comes something else: you have become mentally sophisticated, and you think that if a program you use doesn't do exactly what you want, then it sucks; unforgivably. No worries, you quickly learn about databases, or a nice data storage API, and you enjoy defending your favourite NoSQL tool in online discussions.

As your ecosystem of programs grows, you start wondering "wouldn't it be nice if all those programs could communicate...". Yes, you know about unix pipes, but that's not what you are talking about: you want to build APIs so that programs can query each other's services, capabilities and data.

This goes on for a while, and can actually go on for ever. Many companies are quite happy with that level of development. The others, they prefer having people manually copy-paste data from one system to another, you know, so that everybody has got a job to do...

Anyway, you keep thinking about it. You have a collection of APIs to worry about and maintain, and after a while you realise that every time you want something new you end up, to some extent, rebuilding what is basically a CMS. This leads to stuff like this: Lucille's dataflow in October 2013 (lost link).

The thing is that, as you realise it, you are now at a stage where you are less interested in new core functionality than in new ways to manipulate your existing data. For instance, you want to find the set of your emails which have a certain word in common with your todolist archives. To do this you would have to make a modification in your email client, and then another one in your todolist program, take the risk of breaking something while you are at it, and in the end you would still have to save the result somewhere... You are good at programming so it's not a problem, but you keep thinking "Why am I wasting my time modifying those programs when all I want is just to compute the intersection of two sets ?"

... and this is when it hits you: emails and todolist items (to take the above examples) ought to be in the same repository. The same.

While you ponder that, you get lucky and come across camlistore. You then think "Hoo yes baby, that's an awesome idea", but you might also not want to invest too much in a program that is still being developed and might not totally suit your needs in its current form.

When I came across camlistore I got myself distracted by the content addressable storage part (which wasn't bad at all since it led to a redesign of my backup solutions). I also let myself be distracted by the fact that camlistore's default data organisation schema is a reproduction of the unix file tree, something I have wanted to get away from (the whole purpose of Galactica). It then took me a long time (during which I was stuck in the write-try-to-use-throw-code-to-the-bin-restart cycle) before realising that I had missed the most important part about camlistore: the fact that each object has got a real unique reference, that objects (json entities) have got a very self-descriptive layout (Brad Fitzpatrick said "to facilitate the work of data archeologists hundreds of years in the future"), and that objects can so easily refer to each other across object types. And, cherry on the cake, the blob storage part makes the entire universe suited for managing non-json data, like your collection of pictures or pdf files, or whatever...

Eventually Lucille Object Store was born. As it says in the name, it is the new general object store of the Lucille ecosystem. It is a layer of json objects above a content addressable (binary) store. Each object has at least a unique id and a type (which clearly identifies its schema). For the rest, everything is open. This store is where all (well, almost all) of my programs now store and query their data.
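
To make the layering concrete, here is a minimal Python sketch of the idea, not the actual Lucille Object Store code. The class names, the "uuid" and "type" key names, the "sha256-" reference prefix and the in-memory dictionaries are all assumptions made for illustration.

```python
import hashlib
import json
import uuid


class BlobStore:
    """Content addressable store: a blob's reference is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}  # in-memory stand-in for disk or network storage

    def put(self, data: bytes) -> str:
        ref = "sha256-" + hashlib.sha256(data).hexdigest()
        self._blobs[ref] = data  # identical content always maps to the same ref
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]


class ObjectStore:
    """Layer of json objects above the blob store."""

    def __init__(self, blobs: BlobStore):
        self.blobs = blobs
        self._latest = {}  # object uuid -> ref of its most recent json blob

    def commit(self, obj: dict) -> str:
        # The only two mandatory keys; everything else is left open.
        if "uuid" not in obj or "type" not in obj:
            raise ValueError("every object needs at least a uuid and a type")
        data = json.dumps(obj, sort_keys=True).encode("utf-8")
        self._latest[obj["uuid"]] = self.blobs.put(data)
        return obj["uuid"]

    def get(self, object_id: str) -> dict:
        return json.loads(self.blobs.get(self._latest[object_id]))


store = ObjectStore(BlobStore())
todo_id = store.commit(
    {"uuid": str(uuid.uuid4()), "type": "todo-item", "text": "buy milk"}
)
```

Calling `store.get(todo_id)` gives back the same dictionary that was committed. Any program that knows the id, or later the type, can read the object without caring which program wrote it.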

To come back to a previous analogy, emails and todo list items now live side by side in the same universe and programs can just query the sets that they want to operate on and then commit modifications/updates/new-objects if needed. You want all emails ? Fine. You want all todolist items ? Fine. You want all objects created over the past 24 hours regardless of which program manages them (essentially regardless of their type) ? Fine as well; awesome! You want all objects that refer to this particular picture, whether it is a weblog entry, an email or an activity stream item ? Fine as well, super awesome!!
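
Those queries could look something like the Python sketch below. It is only an illustration: the "created" and "refs" keys, the sample objects and the helper function names are assumptions, and a real store would use an index rather than scan a list.

```python
from datetime import datetime, timedelta, timezone

# Fixed "now" so that the example is reproducible.
now = datetime(2013, 11, 1, 12, 0, tzinfo=timezone.utc)

# Hypothetical objects of different types living in the same store.
objects = [
    {"uuid": "a1", "type": "email", "created": now - timedelta(hours=2), "refs": ["p1"]},
    {"uuid": "b2", "type": "todo-item", "created": now - timedelta(days=3), "refs": []},
    {"uuid": "c3", "type": "weblog-entry", "created": now - timedelta(hours=30), "refs": ["p1"]},
    {"uuid": "p1", "type": "picture", "created": now - timedelta(hours=5), "refs": []},
]


def by_type(objs, type_name):
    """All objects of one type, e.g. all emails."""
    return [o for o in objs if o["type"] == type_name]


def created_since(objs, cutoff):
    """All objects created at or after the cutoff, regardless of their type."""
    return [o for o in objs if o["created"] >= cutoff]


def referring_to(objs, target_uuid):
    """All objects that refer to the given object, regardless of their type."""
    return [o for o in objs if target_uuid in o["refs"]]


emails = by_type(objects, "email")
recent = created_since(objects, now - timedelta(hours=24))
about_picture = referring_to(objects, "p1")
```

Here `emails` holds the object a1, `recent` holds a1 and p1, and `about_picture` holds a1 and c3. None of the three queries needed to know which program manages which object.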

An additional key that an object can have specifies whether the object is public or not. Public objects can be seen on the web. For instance this weblog entry is an object of type 'galactica-atom' of uuid '9730b731-df4f-4af4-91aa-23df551ed56e'. Galactica atoms are in principle managed by my program called galactica, but this atom is in fact created by a program called glock. Glock is active as I write these lines and glock will create the galactica atom and put it in the Lucille Object Store, where it will be picked up by galactica as if galactica had created it itself. Objects that can be seen online are handled by what I call the Lucille Object Server.

You can see this entry under various forms. First you can see it as a pure LOS object, and you can also see it as a galactica object. Then, depending on whether you are reading this from the RSS feed or the website, it was given to you as an RSS item or as the standard weblog item (this latter one comes with facilities to add a user comment).

When the data referenced by a LOS object is essentially binary, for instance a picture, you can see the json blob as well as the binary blob, both served with the correct mime type.
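
As a rough Python sketch of those two views (not the actual Lucille Object Server code): the "mime" and "blob" keys, the `serve` function and the sample id are hypothetical names chosen for the example.

```python
import hashlib
import json

blobs = {}  # content addressable store: ref -> bytes


def put_blob(data: bytes) -> str:
    """Store bytes under the hash of their content and return that reference."""
    ref = "sha256-" + hashlib.sha256(data).hexdigest()
    blobs[ref] = data
    return ref


# Not a real image, just placeholder bytes starting with the PNG signature.
picture_bytes = b"\x89PNG\r\n\x1a\n" + b"placeholder"

picture = {
    "uuid": "0f1e2d3c",               # hypothetical id
    "type": "picture",
    "mime": "image/png",              # hypothetical key: mime type of the payload
    "blob": put_blob(picture_bytes),  # hypothetical key: reference to the payload
}


def serve(obj: dict, view: str):
    """Return (content_type, body) for the 'json' or 'binary' view of an object."""
    if view == "json":
        return "application/json", json.dumps(obj, sort_keys=True).encode("utf-8")
    if view == "binary":
        return obj["mime"], blobs[obj["blob"]]
    raise ValueError("unknown view: " + view)


json_type, json_body = serve(picture, "json")
binary_type, binary_body = serve(picture, "binary")
```

The json view describes the picture, while the binary view returns the picture bytes themselves under the mime type recorded in the object.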

In the end, be awesome. Get yourself a general life long object store. You won't regret it!

Now that I have done a rather good job explaining most of camlistore through the story of my own general purpose object store, there is an aspect of camlistore that I have never mentioned and which at first looked like overkill but should now make perfect sense: all the crypto stuff.

Camlistore is simply what happens when two or more people are sharing the very same internet-wide object store (for instance because they are using it to build a new kind of social network). The problem that those people now have (and that I do not) is: how do we ensure that the system allows data to be "shared" (and what does that even mean ?) while protecting its integrity (against intentional or accidental tampering) ?

This is not a simple problem because you might not know in advance which data might be shared, you might not know what "sharing" will actually mean (because "sharing" relates to policies that are not really covered by the store's internal logic) and you might also want to have the certainty that even the admins of the store (or whoever has privileged access to it) won't tamper with your data.

The solution chosen by the creators of camlistore is: sign every mutation of your data with your private key. Easy, simple, no fuss.
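
To show why this works, here is a toy Python sketch of signing a mutation with a private key and checking it with the matching public key. Loud warning: this is textbook RSA on two well-known Mersenne primes, written only so the example runs with the standard library. It is not secure, and it is not camlistore's actual signature format. The layout of the mutation dictionary is also an assumption.

```python
import hashlib
import json

# TOY key pair for illustration only. Do not use anything like this for real.
P, Q = 2**89 - 1, 2**127 - 1        # two known Mersenne primes
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest(mutation: dict) -> int:
    """Hash a canonical json encoding of the mutation into an integer below N."""
    data = json.dumps(mutation, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(mutation: dict) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest(mutation), D, N)


def verify(mutation: dict, signature: int) -> bool:
    """Anyone who knows the public key (N, E) can check a signature."""
    return pow(signature, E, N) == digest(mutation)


# Hypothetical mutation: make one object public.
mutation = {"object": "9730b731-df4f-4af4-91aa-23df551ed56e", "set": {"public": True}}
signature = sign(mutation)

# Someone without the private key alters the mutation but keeps the signature.
tampered = {"object": mutation["object"], "set": {"public": False}}

original_ok = verify(mutation, signature)
tampered_ok = verify(tampered, signature)
```

The original mutation verifies and the altered one does not. An admin of the store can therefore refuse or delete your mutations, but cannot quietly rewrite them without the private key.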
