This blog is highly personal, makes no attempt at being politically correct, will occasionally offend your sensibility, and certainly does not represent the opinions of the people I work with or for.
JSON and Haskell, stumbling upon the mental missing link

I have just made a relatively remarkable semantic discovery. I think this is the first time that I actually understand something new as a side effect of using Haskell... In fact I think that it's the first time that a programming language has challenged me into having new thoughts.

As the regular readers of this blog know, I am a big fan of JSON. In the early days of Ajax I briefly had to deal with XML, but as soon as the web collectively decided to ditch XML and replace it with JSON as the API response format, programming became fun again.

So now, what is JSON? When we think of JSON, we think of those strings such as '[1,2,3]' or '{"name":"Pascal"}' and their pretty print variations. I never liked calling those strings "JSON" and have always referred to them as "JSON strings", mostly as a mental defence mechanism. But then the question remains: what is JSON itself?

Imagine that for some reason you write something that outputs values of a given programming language as JSON strings (I do that often when I write in Ruby). You know that some values will easily have a JSON string transform; for instance, running JSON.generate on the Ruby array [1,2,3] (or json_encode on the PHP array array(1,2,3)), you will get the string "[1,2,3]". Now, imagine that you document your little logger and want to specify which values can be fed to your transform. This is important because if you pass the weird data types of your language, the encoding function will complain (I have never tried to json encode a PHP file handle, for good reasons...). So then you end up with this embarrassing semantic situation where you somehow can only say "The domain of this function is all the values for which it doesn't break." And note that you do not even need to go to the weird stuff: [1] and ["1"] won't break your transform, but 1 or "1" will, so there is something non-trivial going on here.

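Haskell, through the Aeson package, takes the following position: the only things that get json encoded (into JSON strings) are values of the Value datatype. That's all. Nothing else. (Strictly speaking, encode accepts any type that has a ToJSON instance, but such an instance is precisely a recipe for turning your value into a Value.)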

So then how do you json encode the simple [1,2,3] :: [Int] into "[1,2,3]"? Easy: first you _have_ to create a member of the data type Value that contains the same information as the list [1,2,3], and then run json encode on that value. Sounds easy enough, right? Well, building that member of the Value datatype by hand looks like this:

$ ghci
GHCi, version 7.10.2: http://www.haskell.org/ghc/  :? for help
Prelude> import Data.Aeson as A
Prelude A> import qualified Data.Vector as V
Prelude A V> let x = A.Array $ V.fromList [A.Number 1, A.Number 2, A.Number 3]
Prelude A V> A.encode x
"[1,2,3]"

I will spare you encoding actual objects...

The first time I realised that I was about to start using Aeson (because Gaia works on existing conventions -- the hierarchy of JSON strings representing a file system that my backup system, Aion, uses), I internally thought "FuuuuuuK!!". But today it occurred to me that Aeson has the _right_ approach to JSON. Aeson doesn't hide from you the mental step that I somehow missed all those years because the programming languages I was using let me get away with it. If your [put your favourite dynamic programming language] lets you "conveniently" run json encode on a value without bothering you about the details, you are missing the point: JSON itself is an algebraic data type.

This is a departure from the RFC, RFC 7159, which states: "JavaScript Object Notation (JSON) is a text format for the serialization of structured data. It is derived from the object literals of JavaScript, as defined in the ECMAScript Programming Language Standard, Third Edition [ECMA-262].", thereby presenting it as a text format, which is sad, to say the least. The official source, json.org, also fails to convey the correct ideas...

Anyway, I can now answer the question of what JSON actually is. JSON strings are admittedly a data exchange format, actually a data exchange format as well as a data storage format, but JSON itself is an algebraic datatype, an abstract way to organize data, which is defined independently of its standard text serialization.
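
To make that distinction concrete, here is a minimal sketch of JSON as an algebraic data type, written with nothing but base. The names JValue and render are made up for this post; this is not Aeson's actual definition (Aeson's Value uses Text, Scientific and Vector rather than String, Double and lists), and the string escaping is deliberately incomplete.

```haskell
import Data.List (intercalate)

-- A simplified, hypothetical model of JSON as an algebraic data type.
-- Aeson's real Value is richer; String, Double and lists keep this in base.
data JValue
  = JNull
  | JBool Bool
  | JNumber Double
  | JString String
  | JArray [JValue]
  | JObject [(String, JValue)]
  deriving (Show, Eq)

-- One possible text serialization of the type above.
-- The data type exists independently of this function.
render :: JValue -> String
render JNull        = "null"
render (JBool b)    = if b then "true" else "false"
render (JNumber n)  = renderNumber n
render (JString s)  = renderString s
render (JArray xs)  = "[" ++ intercalate "," (map render xs) ++ "]"
render (JObject kv) = "{" ++ intercalate "," (map pair kv) ++ "}"
  where pair (k, v) = renderString k ++ ":" ++ render v

-- Print whole numbers without a trailing ".0".
-- NaN and infinities are not handled in this sketch.
renderNumber :: Double -> String
renderNumber n
  | n == fromIntegral r = show r
  | otherwise           = show n
  where r = round n :: Integer

-- Escapes only quotes, backslashes and newlines (an assumption made
-- for brevity; a real encoder must handle all control characters).
renderString :: String -> String
renderString s = "\"" ++ concatMap escape s ++ "\""
  where
    escape '"'  = "\\\""
    escape '\\' = "\\\\"
    escape '\n' = "\\n"
    escape c    = [c]

main :: IO ()
main = do
  putStrLn (render (JArray [JNumber 1, JNumber 2, JNumber 3]))
  putStrLn (render (JObject [("name", JString "Pascal")]))
```

The two lines printed by main are the two JSON strings from the beginning of this post. The point is that JValue is the JSON, and render is just one way of writing it down.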

Oh yes, last but not least, Aeson doesn't assume anything about how you might want to convert your datatype into a member of Value; you have to do that by hand, by making your datatype an instance of the ToJSON typeclass. (With great power comes great responsibility.) The most difficult part of the whole Aeson thing is the reconstruction of your datatype from a JSON string. At first I found it difficult to understand, but once you think of it as just a map from one abstract datatype to another, things are suddenly more digestible.
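
For the record, here is roughly what those two maps look like. This is only a sketch: the Person record and its fields are hypothetical and invented for the illustration, and it assumes a reasonably recent aeson with the OverloadedStrings extension turned on.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson

-- Hypothetical record, used only for illustration.
data Person = Person { name :: String, age :: Int }
  deriving (Show, Eq)

-- The map from Person to Value.
instance ToJSON Person where
  toJSON (Person n a) = object ["name" .= n, "age" .= a]

-- The map from Value back to Person; it can fail, hence the Parser.
instance FromJSON Person where
  parseJSON = withObject "Person" $ \o ->
    Person <$> o .: "name" <*> o .: "age"

main :: IO ()
main = do
  let p = Person "Pascal" 42
  -- Encode to a JSON string, then decode back to the original datatype.
  print (decode (encode p) :: Maybe Person)
```

Note that neither instance mentions strings at all: both talk about Value, and encode and decode take care of the text on either side.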
