What is the difference between the decode and decode' functions from the aeson package?

What is the difference between the decode and decode' functions from the aeson package?


The difference between the two is real, but it is subtle and a little complicated. We can start by taking a look at the types.

The Value type

It’s important to note that the Value type that aeson provides has been strict for a very long time (specifically, since version 0.4.0.0). This means that there cannot be any thunks between a constructor of Value and its internal representation. This immediately means that Bool (and, of course, Null) must be completely evaluated once a Value is evaluated to WHNF.

Next, let’s consider String and Number. The String constructor contains a value of type strict Text, so there can’t be any laziness there, either. Similarly, the Number constructor contains a Scientific value, which is internally represented by two strict values. Both String and Number must also be completely evaluated once a Value is evaluated to WHNF.

We can now turn our attention to Object and Array, the only nontrivial datatypes that JSON provides. These are more interesting. Objects are represented in aeson by a lazy HashMap. Lazy HashMaps only evaluate their keys to WHNF, not their values, so the values could very well remain unevaluated thunks. Similarly, Arrays are Vectors, which are not strict in their values, either. Both of these sorts of Values can contain thunks.

With this in mind, we know that, once we have a Value, the only places where decode and decode' may differ are in the production of objects and arrays.
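To make the strict-scalar versus lazy-container distinction concrete, here is a minimal sketch using a simplified stand-in type. MiniValue and whnfThrows are hypothetical names invented for this illustration; this is not aeson's actual definition, and it uses Data.Map.Lazy in place of a hash map purely to keep the example small.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import qualified Data.Map.Lazy as M
import qualified Data.Vector as V

-- Hypothetical, simplified stand-in for aeson's Value (not the library's definition).
data MiniValue
  = MObject !(M.Map String MiniValue)  -- lazy map: values may remain thunks
  | MArray  !(V.Vector MiniValue)      -- boxed vector: elements may remain thunks
  | MNumber !Integer                   -- strict field: forced along with the constructor
  | MBool   !Bool
  | MNull

-- Report whether evaluating a value to WHNF throws an ErrorCall.
whnfThrows :: MiniValue -> IO Bool
whnfThrows v = do
  r <- try (evaluate v) :: IO (Either ErrorCall MiniValue)
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  -- A scalar cannot hide a thunk: the strict field is forced at WHNF.
  whnfThrows (MNumber (error "boom")) >>= print
  -- An array can: WHNF stops at the vector and leaves its elements alone.
  whnfThrows (MArray (V.fromList [MNumber 1, error "boom"])) >>= print
  -- An object can: Data.Map.Lazy does not force its values.
  whnfThrows (MObject (M.fromList [("k", error "boom")])) >>= print
```

Running this should print True, False, False: only the scalar is forced by evaluation to WHNF, while the array and the object happily carry an unevaluated element.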

Observational differences

The next thing we can try is to actually evaluate some things in GHCi and see what happens. We’ll start with a bunch of imports and definitions:

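    :seti -XOverloadedStrings
    :seti -XBangPatterns

    import Control.Exception
    import Control.Monad
    import Data.Aeson
    import Data.ByteString.Lazy (ByteString)
    import Data.List (foldl')
    import qualified Data.HashMap.Lazy as M
    import qualified Data.Vector as V

    :{
    -- Evaluate a value to WHNF and discard the result.
    -- (Assumed definition of the force helper used in this session.)
    force :: a -> IO ()
    force = void . evaluate

    -- Evaluate the spine of a list without forcing its elements.
    forceSpine :: [a] -> IO ()
    forceSpine = evaluate . foldl' const ()
    :}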

Next, let’s actually parse some JSON:

    let jsonDocument = "{ \"value\": [1, { \"value\": [2, 3] }] }" :: ByteString

    let !parsed = decode jsonDocument :: Maybe Value
    let !parsed' = decode' jsonDocument :: Maybe Value
    force parsed
    force parsed'

Now we have two bindings, parsed and parsed', one of which is parsed with decode and the other with decode'. They are forced to WHNF so we can at least see what they are, but we can use the :sprint command in GHCi to see how much of each value is actually evaluated:

    ghci> :sprint parsed
    parsed = Just _
    ghci> :sprint parsed'
    parsed' = Just
                (Object
                   (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                      15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                      (Array (Data.Vector.Vector 0 2 _))))

Would you look at that! The version parsed with decode is still unevaluated, but the one parsed with decode' has some data. This leads us to our first meaningful difference between the two: decode' forces its immediate result to WHNF, but decode defers it until it is needed.

Let’s look inside these values to see if we can’t find more differences. What happens once we evaluate those outer objects?

    let (Just outerObjValue) = parsed
    let (Just outerObjValue') = parsed'
    force outerObjValue
    force outerObjValue'

    ghci> :sprint outerObjValue
    outerObjValue = Object
                      (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                         15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                         (Array (Data.Vector.Vector 0 2 _)))
    ghci> :sprint outerObjValue'
    outerObjValue' = Object
                       (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                          15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                          (Array (Data.Vector.Vector 0 2 _)))

This is pretty obvious. We explicitly forced both of the objects, so they are now both evaluated to hash maps. The real question is whether or not their elements are evaluated.

    -- Unwrap the Object constructor to get at the underlying hash map.
    -- (In aeson versions before 2.0, an Object is a HashMap Text Value.)
    let (Object outerObj) = outerObjValue
    let (Object outerObj') = outerObjValue'

    let (Array outerArr) = outerObj M.! "value"
    let (Array outerArr') = outerObj' M.! "value"
    let outerArrLst = V.toList outerArr
    let outerArrLst' = V.toList outerArr'
    forceSpine outerArrLst
    forceSpine outerArrLst'

    ghci> :sprint outerArrLst
    outerArrLst = [_,_]
    ghci> :sprint outerArrLst'
    outerArrLst' = [Number (Data.Scientific.Scientific 1 0),
                    Object
                      (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                         15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                         (Array (Data.Vector.Vector 0 2 _)))]

Another difference! In the array decoded with decode, the elements are still unevaluated, while the ones decoded with decode' have been forced. In other words, decode doesn’t perform the conversion to Haskell values until they are needed, which is what the documentation means when it says it “defers conversion”.

Impact

Clearly, these two functions are slightly different, and clearly, decode' is stricter than decode. What’s the meaningful difference, though? When would you prefer one over the other?

Well, it’s worth mentioning that decode never does more work than decode', so decode is probably the right default. Of course, decode' will never do significantly more work than decode, either, since the entire JSON document needs to be parsed before any value can be produced. The only significant difference is that decode avoids allocating Values if only a small part of the JSON document is actually used.
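As an illustration of the "only a small part is used" case, here is a small sketch that decodes a document and reads a single field. The document contents and the nameOf helper are hypothetical, made up for this example. Both functions return the same answer; the difference in how much of the unused payload gets converted is not visible in the output and would have to be inspected with :sprint or measured.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, decode, decode', withObject, (.:))
import Data.Aeson.Types (parseMaybe)
import Data.ByteString.Lazy (ByteString)

-- Hypothetical document: one small field next to a larger array.
doc :: ByteString
doc = "{ \"name\": \"aeson\", \"payload\": [1, 2, 3, 4, 5] }"

-- Read only the "name" field; "payload" is never inspected.
nameOf :: Value -> Maybe String
nameOf = parseMaybe (withObject "doc" (\o -> o .: "name"))

main :: IO ()
main = do
  print (decode  doc >>= nameOf)  -- payload elements may stay unconverted
  print (decode' doc >>= nameOf)  -- payload elements are converted eagerly
```

Both lines should print Just "aeson".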

Of course, laziness is not free, either. Being lazy means adding thunks, which can cost space and time. If all of the thunks are going to be evaluated, anyway, then decode is simply wasting memory and runtime adding useless indirection.

In this sense, the situations when you might want to use decode' are situations in which the whole Value structure is going to be forced, anyway, which is probably dependent on which FromJSON instance you’re using. In general, I wouldn’t worry about picking between them unless performance really matters and you’re decoding a lot of JSON or doing JSON decoding in a tight loop. In either case, you should benchmark. Choosing between decode and decode' is a very specific manual optimization, and I would not feel very confident that either would actually improve the runtime characteristics of my program without benchmarks.
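For what such a benchmark might look like, here is a rough sketch using the criterion package. The choice of criterion is an assumption of this example, not something the answer prescribes, and the tiny document is only a placeholder; a meaningful comparison needs input representative of your real workload. The whnf benchmarks measure the cost of obtaining the outer Maybe, while the nf benchmarks measure the cost when the whole Value ends up being forced.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Criterion.Main (bench, defaultMain, nf, whnf)
import Data.Aeson (Value, decode, decode')
import Data.ByteString.Lazy (ByteString)

-- Placeholder input; substitute a document that resembles your real data.
doc :: ByteString
doc = "{ \"value\": [1, { \"value\": [2, 3] }] }"

-- Monomorphic wrappers so the benchmarks decode to a plain Value.
decodeLazy, decodeStrict :: ByteString -> Maybe Value
decodeLazy   = decode
decodeStrict = decode'

main :: IO ()
main = defaultMain
  [ bench "decode, whnf"  (whnf decodeLazy doc)
  , bench "decode', whnf" (whnf decodeStrict doc)
  , bench "decode, nf"    (nf decodeLazy doc)    -- relies on the NFData instance for Value
  , bench "decode', nf"   (nf decodeStrict doc)
  ]
```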


Haskell is a lazy language. When you call a function, it doesn't actually execute right then; instead, the information about the call is "remembered" and returned up the stack (this remembered call information is referred to as a "thunk" in the docs), and the actual call only happens if somebody up the stack actually tries to do something with the returned value.

This is the default behavior, and this is how json and decode work. But there is a way to "cheat" the laziness and tell the compiler to execute code and evaluate values right then and there. And this is what json' and decode' do.

The tradeoff there is obvious: decode saves computation time in case you never actually do anything with the value, while decode' saves the necessity to "remember" the call information (the "thunk") at the cost of executing everything in place.
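The general mechanism can be seen without aeson at all. The following is a small sketch using only base; lazyPair and strictFst are hypothetical names for this illustration. The second component of the pair is a thunk that is only evaluated when something demands it, and seq is one way to demand it right away.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- A pair whose second component is never computed unless demanded.
lazyPair :: (Int, Int)
lazyPair = (1 + 1, error "second component was forced")

-- Like fst, but forces the second component as well, using seq.
strictFst :: (a, b) -> a
strictFst (x, y) = y `seq` x

main :: IO ()
main = do
  -- Lazy: the erroring thunk is remembered but never run.
  print (fst lazyPair)
  -- Strict: forcing the second component runs the thunk and hits the error.
  r <- try (evaluate (strictFst lazyPair)) :: IO (Either ErrorCall Int)
  putStrLn (either (const "forcing hit the error") show r)
```

This should print 2 followed by "forcing hit the error".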