This blog is highly personal, makes no attempt at being politically correct, will occasionally offend your sensibilities, and certainly does not represent the opinions of the people I work with or for.
Aion, the File System

When I introduced Aion some time ago, as a follow-up to the work I had been doing on backup mechanisms (notably improving on Apple's Time Machine), I casually mentioned that it has the important feature of detecting data errors and tampering by itself. I had not yet realised how important that is, nor had I realised that I had accidentally rediscovered a part of ZFS's own design.

Section 1: How Aion works

First, let me explain exactly how Aion works. Say that I have a small data set I need to back up, given as follows:

Documents/
    Pictures/
        holidays.jpg
        moon.png
    letter.txt
Documents has a folder called Pictures and a file called letter.txt, and Pictures has two pictures, holidays.jpg and moon.png.

If I aion this, then here is what is going to happen. First, holidays.jpg will be found and stored as a file under its own sha1. The Aion repository will get a first file called sha1-eb4d93b6b1. (I use fake sha1s for convenience.) This file has exactly the same contents as the holidays.jpg file. Then Aion will create an object for this file, which looks like this:

{
	"type" : "file",
	"name" : "holidays.jpg",
	"contents" : ["sha1-eb4d93b6b1"]
}
As you can see, the original file name is preserved (but not the other POSIX attributes, since I just don't care). In case you wonder why "contents" is an array of sha1s and not just one, it is because Aion may decide to break up files into pieces, each piece having its own sha1. So "contents" is abstractly an ordered list of hashes. Same operation for moon.png, where two pieces are created, sha1-e768f3c8d3 and sha1-94c7fd4de: two new files in the Aion repository, each having half of moon.png's contents. A new file object will then be created, looking like
{
	"type" : "file",
	"name" : "moon.png",
	"contents" : ["sha1-e768f3c8d3","sha1-94c7fd4de"]
}
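
The splitting into pieces can be sketched in a few lines of Python. This is only an illustration under assumptions, not Aion's actual code: the repository is modelled as a plain dict from hash names to bytes, and `store_file`, `rebuild` and `max_piece` are hypothetical names.

```python
import hashlib


def store_file(repo: dict, name: str, data: bytes, max_piece: int) -> dict:
    """Store data in pieces of at most max_piece bytes; return the file object."""
    contents = []
    for start in range(0, len(data), max_piece):
        piece = data[start:start + max_piece]
        key = "sha1-" + hashlib.sha1(piece).hexdigest()
        repo[key] = piece        # identical pieces land under the same key
        contents.append(key)     # order matters: it is how the file is rebuilt
    return {"type": "file", "name": name, "contents": contents}


def rebuild(repo: dict, file_object: dict) -> bytes:
    """Concatenate the pieces in order to recover the original bytes."""
    return b"".join(repo[key] for key in file_object["contents"])


repo = {}
moon = store_file(repo, "moon.png", b"0123456789", max_piece=5)
assert len(moon["contents"]) == 2          # two halves, as in the moon.png example
assert rebuild(repo, moon) == b"0123456789"
```

Because the pieces are kept in a list, a file made of two identical pieces simply references the same hash twice.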

Now, the big question is: where do those objects go? Well, they are JSON-serialised and stored in the Aion repository under their own sha1, just like any other file. This leads to two new files being written in the repository, whose sha1s are sha1-33e0198f6 and sha1-26cc1090340.
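
Here is a minimal sketch of that step, again with a dict standing in for the repository. `put_object` is a hypothetical name, and the exact JSON serialisation (sorted keys, compact separators) is an assumption made so that the same object always produces the same bytes and therefore the same hash; the post does not say how Aion itself serialises.

```python
import hashlib
import json


def put_object(repo: dict, obj: dict) -> str:
    """Serialise obj to JSON, store the bytes under their own sha1, return the key."""
    # sort_keys and fixed separators make the bytes, and so the hash, reproducible.
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    key = "sha1-" + hashlib.sha1(blob).hexdigest()
    repo[key] = blob
    return key


repo = {}
holidays = put_object(repo, {"type": "file", "name": "holidays.jpg",
                             "contents": ["sha1-eb4d93b6b1"]})
moon = put_object(repo, {"type": "file", "name": "moon.png",
                         "contents": ["sha1-e768f3c8d3", "sha1-94c7fd4de"]})
# A folder object only lists the hashes of its children's objects.
pictures = put_object(repo, {"type": "folder", "name": "Pictures",
                             "contents": [holidays, moon]})
```

The real hashes will of course be full 40-character sha1s rather than the shortened fake ones used in this post.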

At this point, the Pictures folder object can be created and looks like

{
	"type" : "folder",
	"name" : "Pictures",
	"contents" : ["sha1-33e0198f6","sha1-26cc1090340"]
}
And again this object will hit the Aion repository. Then we get
{
	"type" : "folder",
	"name" : "Documents",
	"contents" : ["sha1-7d8ab4cba2f","sha1-dc0b06624b"]
}
Where, as you have guessed, sha1-7d8ab4cba2f is the sha1 of the Pictures folder object and sha1-dc0b06624b is the sha1 of the letter.txt file object, the letter.txt file itself having its own sha1.

One interesting thing with this scheme is that, just by looking at the object representing a folder, we only know how many entries (files or folders) it contains in total. We don't know which ones are files and which are folders; you need to look up the corresponding objects for that.

Now, let us assume that the top object's sha1 is sha1-fe0a79be521. You can safely go to bed, and when you come back the following morning you can make sure that nobody tampered with your files during the night by taking sha1-fe0a79be521 and asking the Aion repository to give you the object under this name. The first thing you do is compute the sha1 of the blob that the Aion repository just gave you, and if it checks out you know that this is the top object you stored the night before. You do this recursively until you exhaust the tree and, voilà, you have got your data back with the certainty that it is what you stored. Actually, Aion's fsck subcommand does exactly this: you give it a top hash and it recursively checks that the entire archive is correct (each file is present and is what it is supposed to be).
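
The recursive check described above can be sketched as follows. This is not the real fsck subcommand, just an illustration on a dict-based repository; `put` and `fsck` are hypothetical helpers, and the `is_object` flag reflects the fact that only the parent tells us whether a hash names an object or a raw piece of data.

```python
import hashlib
import json


def put(repo: dict, blob: bytes) -> str:
    """Store a blob under its own sha1 and return that name."""
    key = "sha1-" + hashlib.sha1(blob).hexdigest()
    repo[key] = blob
    return key


def fsck(repo: dict, key: str, is_object: bool = True) -> bool:
    """True if the blob under key, and everything it references, checks out."""
    blob = repo.get(key)
    if blob is None or "sha1-" + hashlib.sha1(blob).hexdigest() != key:
        return False                      # missing or altered
    if not is_object:
        return True                       # a raw piece of file data: nothing below it
    obj = json.loads(blob)
    # A folder lists objects; a file lists raw data pieces.
    children_are_objects = obj["type"] == "folder"
    return all(fsck(repo, child, children_are_objects) for child in obj["contents"])


repo = {}
data = put(repo, b"Dear moon,")
letter = put(repo, json.dumps(
    {"type": "file", "name": "letter.txt", "contents": [data]}).encode("utf-8"))
top = put(repo, json.dumps(
    {"type": "folder", "name": "Documents", "contents": [letter]}).encode("utf-8"))

print(fsck(repo, top))        # True: nothing has changed since archiving
repo[data] = b"Dear mo0n,"    # simulate tampering during the night
print(fsck(repo, top))        # False: the piece no longer matches its name
```

Only the top hash has to be kept somewhere safe; everything below it is vouched for by the hashes stored in its ancestors.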

I recently discovered that ZFS does the same...

An even better feature is that if you archive the same tree again (*1), no new object is going to be created in the Aion repository. If you store each top object's hash under a timestamp (outside the Aion repository itself), you have just made yourself an extremely efficient archive system. And one that doesn't use hard linking (*2).

(*1) If you modify a file, then all objects from this file up to the top object have to be recomputed, but only those.

(*2) Time Machine, notably mine, uses hard linking, and this became a problem one day when I was repairing a datablock: while rewriting the file I destroyed its inode number. None of the files that were supposed to be linked to it were repaired; they remained linked to the old version.
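
To illustrate the snapshot idea and footnote (*1), here is a small sketch that archives an in-memory tree three times. Everything in it is an assumption made for the example: the nested dict standing in for the Documents folder, the `archive` helper, and the "night-1" style labels used instead of real timestamps.

```python
import hashlib
import json


def put(repo: dict, blob: bytes) -> str:
    key = "sha1-" + hashlib.sha1(blob).hexdigest()
    repo[key] = blob              # storing the same content twice changes nothing
    return key


def archive(repo: dict, name: str, node) -> str:
    """Archive a file (bytes) or a folder (dict of name -> node); return its object hash."""
    if isinstance(node, dict):
        kind = "folder"
        # sorted() keeps the child order, and therefore the folder hash, stable
        contents = [archive(repo, child, node[child]) for child in sorted(node)]
    else:
        kind, contents = "file", [put(repo, node)]
    obj = {"type": kind, "name": name, "contents": contents}
    return put(repo, json.dumps(obj, sort_keys=True).encode("utf-8"))


repo, snapshots = {}, {}
tree = {"Pictures": {"holidays.jpg": b"beach", "moon.png": b"crater"},
        "letter.txt": b"hello"}

snapshots["night-1"] = archive(repo, "Documents", tree)
first_size = len(repo)

snapshots["night-2"] = archive(repo, "Documents", tree)    # nothing changed
unchanged_size = len(repo)

tree["letter.txt"] = b"hello again"
snapshots["night-3"] = archive(repo, "Documents", tree)    # one file changed
new_entries = len(repo) - unchanged_size

print(first_size, unchanged_size, new_entries)
```

In this run the second snapshot adds nothing, and the third adds exactly three entries: the new letter.txt data, its file object, and the Documents object above it. The Pictures subtree is shared by all three snapshots.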

There is another interesting thing you can do with Aion repositories: garbage collection. If you decide to get rid of one top object's reference, you can recursively discover all the datablocks that are no longer relevant to any other top object. For instance, files (and more often versions of files) that appeared only in the snapshot you logically discarded.
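
One way to do this is a classic mark-and-sweep pass: mark everything reachable from the top hashes you keep, then delete the rest. The sketch below assumes a dict-based repository and hypothetical names (`mark`, `collect_garbage`); it is one possible approach, not necessarily how Aion does it.

```python
import json


def mark(repo: dict, key: str, is_object: bool, live: set) -> None:
    """Record key, and everything reachable from it, as still in use."""
    if (key, is_object) in live:
        return                                   # shared subtree, already visited
    live.add((key, is_object))
    if is_object:
        obj = json.loads(repo[key])
        for child in obj["contents"]:
            # Children of a folder are objects; children of a file are raw data.
            mark(repo, child, obj["type"] == "folder", live)


def collect_garbage(repo: dict, kept_tops: list) -> list:
    """Delete every entry not reachable from kept_tops; return the deleted keys."""
    live = set()
    for top in kept_tops:
        mark(repo, top, True, live)
    live_keys = {key for key, _ in live}
    dead = [key for key in repo if key not in live_keys]
    for key in dead:
        del repo[key]
    return dead
```

Calling `collect_garbage(repo, remaining_top_hashes)` after dropping a snapshot removes only what that snapshot alone was holding on to; anything shared with a surviving snapshot stays.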

It is worth mentioning that since Aion can limit the size of datablocks, you can have a repository that easily synchronises over the Internet. No more of those 2 GB movies that keep you in the office because the sync program needs to move them over in one go. If you limit datablocks to 100 MB, that's 20 blocks that can be moved one at a time.

Section 2: Data availability

One of the reasons it took me so long to adopt Aion is that, by design, it has one characteristic that usually drives me away from backup/archive solutions: you cannot just plug in an Aion repository and browse your files directly. I didn't want to store my data under a system which, to be useful, would need me to write a virtual file system. This is easy enough to do but was against higher principles.

I recently found a nice compromise. A simple command line tool can be my window into an Aion repository, allowing me to browse it from the command line. This has the very useful added feature that I can now extract a subtree of a stored tree, or even just a file, and not the entire tree (as would be required if I only had the top objects' hashes). This is when it fully hit me: Aion *is* a file system.
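
The core of such a browsing tool is small. The sketch below shows the three operations it needs: resolving a path from a top hash, listing a folder, and reading a single file. It assumes a dict-based repository, and `resolve`, `listing` and `read_file` are hypothetical names, not the commands of the actual tool.

```python
import json


def load(repo: dict, key: str) -> dict:
    return json.loads(repo[key])


def resolve(repo: dict, top: str, path: str) -> str:
    """Return the hash of the object at path, e.g. 'Documents/Pictures/moon.png'."""
    parts = path.strip("/").split("/")
    key, obj = top, load(repo, top)
    if obj["name"] != parts[0]:
        raise KeyError(path)
    for part in parts[1:]:
        if obj["type"] != "folder":
            raise KeyError(path)              # cannot descend into a file
        for child in obj["contents"]:
            child_obj = load(repo, child)
            if child_obj["name"] == part:
                key, obj = child, child_obj
                break
        else:
            raise KeyError(path)              # no child with that name
    return key


def listing(repo: dict, key: str) -> list:
    """Names inside a folder object, like a tiny 'ls'."""
    obj = load(repo, key)
    if obj["type"] != "folder":
        raise NotADirectoryError(obj["name"])
    return [load(repo, child)["name"] for child in obj["contents"]]


def read_file(repo: dict, key: str) -> bytes:
    """Reassemble one file from its pieces without touching the rest of the tree."""
    obj = load(repo, key)
    if obj["type"] != "file":
        raise IsADirectoryError(obj["name"])
    return b"".join(repo[piece] for piece in obj["contents"])
```

Since `resolve` returns a hash, the result can be handed to any routine that works from a top hash, which is what makes extracting just a subtree possible.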

Section 3: Self repair

All the above would just be a clever thing if I hadn't also realised something far more important, something which became a focus of attention after my recent data corruption (when .v8 and Venus both died): if you maintain two Aion repositories and during one fsck you discover that a datablock is corrupted (bit rot, for instance), don't panic. Just look up the other repository and find the block with the same hash, check it first, and then repair the corrupted one by copying the uncorrupted one over it. ZFS does the same in its storage pools :-)
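
As a sketch of that repair step, with two dicts standing in for the two repositories (`repair`, `primary` and `mirror` are hypothetical names):

```python
import hashlib


def is_intact(repo: dict, key: str) -> bool:
    """A block is intact if it exists and still hashes to its own name."""
    blob = repo.get(key)
    return blob is not None and "sha1-" + hashlib.sha1(blob).hexdigest() == key


def repair(primary: dict, mirror: dict, key: str) -> bool:
    """Make primary[key] intact, using the mirror if needed. True on success."""
    if is_intact(primary, key):
        return True                    # nothing to do
    if not is_intact(mirror, key):
        return False                   # check the other copy first: it is bad too
    primary[key] = mirror[key]         # copy the good block over the bad one
    return True


block = b"holiday pixels"
key = "sha1-" + hashlib.sha1(block).hexdigest()
primary, mirror = {key: b"holiday pixel5"}, {key: block}   # bit rot in primary

assert repair(primary, mirror, key)
assert primary[key] == block
```

The important detail is the order: the mirror's copy is verified against the hash before it is trusted, so a block that is corrupted in both places is reported rather than silently copied.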

Section 4

I am about to go full Aion for all my long term archives. I will tell you how it went in half a year or something...
