aeson and dlist in HP 2013.4.0.0
Bas van Dijk
v.dijk.bas at gmail.com
Thu Nov 28 16:42:49 GMT 2013

On 28 November 2013 13:42, Sven Panne <svenpanne at gmail.com> wrote:
> Just two add my 2c: Given all these new packages which would need to
> be pulled into the HP just for aeson, let's not include aeson for
> 2013.4.0.0 and release 2013.4.0.0 soon without the need for lengthy
> discussions.

As the proposer for inclusion of aeson in the HP, I'm beginning to agree.

There's another reason I would like to postpone the aeson inclusion: I
just started working on improving the encoding performance of aeson.
This requires some significant changes to the API. Therefore I think
it would be better to see how well this new API works out. If it works
out, release it as aeson-7 (or aeson-8) and include that release in
the HP after next. This way we have time to discuss the new
dependencies and the HP remains stable.

The following is a brief explanation of the new aeson API (you can
stop reading here if you're not interested in it):

The idea is to use the same trick that is used in the upcoming binary
package[1].

First of all, toJSON will return a JsonBuilder instead of a Value:

class ToJSON a where
  toJSON :: a -> JsonBuilder

A JsonBuilder is basically a difference list:

newtype JsonBuilder = JsonBuilder (IStream -> IStream)
instance Monoid JsonBuilder where ...
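
The elided Monoid instance is presumably plain function composition, as
for any difference list. A minimal sketch under that assumption, using a
cut-down IStream with only a few constructors and a hypothetical `render`
helper that prints the stream as a String (on a current GHC the Semigroup
instance is required as well):

```haskell
-- Cut-down instruction stream: a handful of constructors is enough
-- to show how builders compose.
data IStream
  = INull       IStream
  | ITrue       IStream
  | IComma      IStream
  | IBeginArray IStream
  | IEndArray   IStream
  | IEnd

newtype JsonBuilder = JsonBuilder (IStream -> IStream)

-- Appending is function composition: the left builder prepends its
-- instructions to whatever the right builder produces.
instance Semigroup JsonBuilder where
  JsonBuilder f <> JsonBuilder g = JsonBuilder (f . g)

instance Monoid JsonBuilder where
  mempty = JsonBuilder id

-- Hypothetical helper (not part of the proposal): render to a String.
render :: JsonBuilder -> String
render (JsonBuilder g) = go (g IEnd)
  where
    go (INull       is) = "null" ++ go is
    go (ITrue       is) = "true" ++ go is
    go (IComma      is) = ","    ++ go is
    go (IBeginArray is) = "["    ++ go is
    go (IEndArray   is) = "]"    ++ go is
    go IEnd             = ""

main :: IO ()
main = putStrLn (render example)  -- prints [null,true]
  where
    example = mconcat
      [ JsonBuilder IBeginArray, JsonBuilder INull, JsonBuilder IComma
      , JsonBuilder ITrue, JsonBuilder IEndArray ]
```

Because appending is just composition, each append is O(1) no matter how
the builders are nested.
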

The "list", here represented as an IStream, is a sequence of
instructions to the encoder:

data IStream =
    INull                              IStream
  | ITrue                              IStream
  | IFalse                             IStream
  | IDoubleQuote                       IStream
  | IChar   {-# UNPACK #-} !Char       IStream
  | IString                !String     IStream
  | IText                  !Text       IStream
  | IInt    {-# UNPACK #-} !Int        IStream
  | IInt8   {-# UNPACK #-} !Int8       IStream
  | IInt16  {-# UNPACK #-} !Int16      IStream
  | IInt32  {-# UNPACK #-} !Int32      IStream
  | IInt64  {-# UNPACK #-} !Int64      IStream
  | IWord   {-# UNPACK #-} !Word       IStream
  | IWord8  {-# UNPACK #-} !Word8      IStream
  | IWord16 {-# UNPACK #-} !Word16     IStream
  | IWord32 {-# UNPACK #-} !Word32     IStream
  | IWord64 {-# UNPACK #-} !Word64     IStream
  | IFloat  {-# UNPACK #-} !Float      IStream
  | IDouble {-# UNPACK #-} !Double     IStream
  | IInteger               !Integer    IStream
  | IScientific            !Scientific IStream
  | IComma                             IStream
  | IBeginArray                        IStream
  | IEndArray                          IStream
  | IBeginObject                       IStream
  | IEndObject                         IStream
  | IColon                             IStream
  | IValue                 !Value      IStream
  -- Fused:
  | IBeginObject_IDoubleQuote          IStream
  | IComma_IDoubleQuote                IStream
  -- TODO: more
  | IEnd

Converting a JsonBuilder to a Builder (note that I'm using the new
bytestring Builder here) is simply a matter of executing the right
Builder for each instruction:

toBuilder :: JsonBuilder -> Builder
toBuilder (JsonBuilder g) = go (g IEnd)
 where
  go :: IStream -> Builder
  go is = case is of
    INull             is' -> nullB               <> go is'
    ITrue             is' -> trueB               <> go is'
    IFalse            is' -> falseB              <> go is'
    IDoubleQuote      is' -> char8          '"'  <> go is'
    IChar        c    is' -> char           c    <> go is'
    IString      cs   is' -> string         cs   <> go is'
    IText        t    is' -> text           t    <> go is'
    IInt         i    is' -> intDec         i    <> go is'
    IInt8        i8   is' -> int8Dec        i8   <> go is'
    IInt16       i16  is' -> int16Dec       i16  <> go is'
    IInt32       i32  is' -> int32Dec       i32  <> go is'
    IInt64       i64  is' -> int64Dec       i64  <> go is'
    IWord        w    is' -> wordDec        w    <> go is'
    IWord8       w8   is' -> word8Dec       w8   <> go is'
    IWord16      w16  is' -> word16Dec      w16  <> go is'
    IWord32      w32  is' -> word32Dec      w32  <> go is'
    IWord64      w64  is' -> word64Dec      w64  <> go is'
    IFloat       f    is' -> floatDec       f    <> go is'
    IDouble      d    is' -> doubleDec      d    <> go is'
    IInteger     i    is' -> integerDec     i    <> go is'
    IScientific  s    is' -> fromScientific s    <> go is'
    IComma            is' -> char8          ','  <> go is'
    IBeginArray       is' -> char8          '['  <> go is'
    IEndArray         is' -> char8          ']'  <> go is'
    IBeginObject      is' -> char8          '{'  <> go is'
    IEndObject        is' -> char8          '}'  <> go is'
    IColon            is' -> char8          ':'  <> go is'
    IValue       v    is' -> fromValue      v    <> go is'
    -- Fused:
    IBeginObject_IDoubleQuote is' -> fixed2 ('{','"') <> go is'
    IComma_IDoubleQuote       is' -> fixed2 (',','"') <> go is'
    -- TODO: more
    IEnd -> mempty

nullB :: Builder
nullB = fixed4 ('n',('u',('l','l')))
{-# INLINE nullB #-}

trueB :: Builder
trueB = fixed4 ('t',('r',('u','e')))
{-# INLINE trueB #-}

falseB :: Builder
falseB = fixed5 ('f',('a',('l',('s','e'))))
{-# INLINE falseB #-}

fixed2 :: (Char, Char) -> Builder
fixed2 = P.primFixed (P.char8 >*< P.char8)
{-# INLINE fixed2 #-}

fixed4 :: (Char, (Char, (Char, Char))) -> Builder
fixed4 = P.primFixed (P.char8 >*< P.char8 >*< P.char8 >*< P.char8)
{-# INLINE fixed4 #-}

fixed5 :: (Char, (Char, (Char, (Char, Char)))) -> Builder
fixed5 = P.primFixed (P.char8 >*< P.char8 >*< P.char8 >*< P.char8 >*< P.char8)
{-# INLINE fixed5 #-}
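
The fixedN helpers can be tried in isolation. The sketch below assumes
that `P` stands for Data.ByteString.Builder.Prim from the bytestring
package and that `>*<` is imported from the same module; the key name and
the `main` wrapper are only for illustration:

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Builder.Prim as P
import Data.ByteString.Builder.Prim ((>*<))
import System.IO (stdout)

-- Two characters written with a single fixed-size primitive.
fixed2 :: (Char, Char) -> B.Builder
fixed2 = P.primFixed (P.char8 >*< P.char8)

-- (>*<) is right-associative, hence the right-nested tuple.
fixed4 :: (Char, (Char, (Char, Char))) -> B.Builder
fixed4 = P.primFixed (P.char8 >*< P.char8 >*< P.char8 >*< P.char8)

nullB :: B.Builder
nullB = fixed4 ('n',('u',('l','l')))

main :: IO ()
main = B.hPutBuilder stdout $
     fixed2 ('{','"')        -- fused  {"
  <> B.string7 "key"
  <> fixed2 ('"',':')        -- fused  ":
  <> nullB
  <> B.char8 '}'
  <> B.char7 '\n'            -- prints {"key":null}
```

A fixed-size primitive knows its output length up front, so the builder
can check for buffer space once per group instead of once per character.
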

This representation allows a lot of optimizations. For example, we can
define rewrite rules that "fuse" the Builders of common sequences
like:

{-# RULES
  "IBeginObject_IDoubleQuote" forall is.
   IBeginObject (IDoubleQuote is) =
   IBeginObject_IDoubleQuote is #-}
{-# RULES
  "IComma_IDoubleQuote" forall is.
   IComma (IDoubleQuote is) =
   IComma_IDoubleQuote  is #-}

The encoder can handle these common sequences more efficiently.

Of course the JsonBuilder is abstract to the user. There will be a
safe API to construct well-formed JsonBuilders. (While writing this I
realize that users will be able to use the Monoid instance for
JsonBuilders, which is undesirable. I will solve this by wrapping the
JsonBuilder returned from toJSON in another newtype which doesn't have
a Monoid instance.)
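
One way such a safe API could look is sketched below. All exported names
(`Json`, `null_`, `bool`, `int`, `array`, `render`) are hypothetical and
the IStream is cut down to a few constructors; the point is only that the
wrapper has no Monoid instance, so every `Json` denotes exactly one
complete value:

```haskell
import Data.List (intersperse)

data IStream
  = INull       IStream
  | ITrue       IStream
  | IFalse      IStream
  | IInt !Int   IStream
  | IComma      IStream
  | IBeginArray IStream
  | IEndArray   IStream
  | IEnd

-- Internal builder: composes freely, so it would not be exported.
newtype JsonBuilder = JsonBuilder (IStream -> IStream)

instance Semigroup JsonBuilder where
  JsonBuilder f <> JsonBuilder g = JsonBuilder (f . g)

instance Monoid JsonBuilder where
  mempty = JsonBuilder id

-- Public wrapper without a Monoid instance (hypothetical name).
newtype Json = Json JsonBuilder

null_ :: Json
null_ = Json (JsonBuilder INull)

bool :: Bool -> Json
bool b = Json (JsonBuilder (if b then ITrue else IFalse))

int :: Int -> Json
int i = Json (JsonBuilder (IInt i))

-- Brackets and commas are inserted here, never by the caller.
array :: [Json] -> Json
array xs = Json $
     JsonBuilder IBeginArray
  <> mconcat (intersperse (JsonBuilder IComma) [b | Json b <- xs])
  <> JsonBuilder IEndArray

-- String renderer, for demonstration only.
render :: Json -> String
render (Json (JsonBuilder g)) = go (g IEnd)
  where
    go (INull       is) = "null"  ++ go is
    go (ITrue       is) = "true"  ++ go is
    go (IFalse      is) = "false" ++ go is
    go (IInt i      is) = show i  ++ go is
    go (IComma      is) = ","     ++ go is
    go (IBeginArray is) = "["     ++ go is
    go (IEndArray   is) = "]"     ++ go is
    go IEnd             = ""

main :: IO ()
main = putStrLn (render (array [int 1, bool True, array [], null_]))
-- prints [1,true,[],null]
```
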

What do we lose? In the current API of aeson, toJSON directly
returns a Value. This Value can then be inspected or extended. To do
the same in the new API, the JsonBuilder first has to be parsed
to a Value, which is less efficient. However, if the new API proves to
be significantly more efficient for encoding, I think this extra
parsing cost is warranted, since inspecting or extending a Value is far
less common than encoding.
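
To illustrate what parsing a JsonBuilder back to a Value involves, here
is a minimal sketch over a cut-down IStream. The `Value` type is a
simplified stand-in for aeson's, and `toValue`, `value` and `elements`
are hypothetical names; only null, booleans, ints and arrays are handled:

```haskell
data IStream
  = INull       IStream
  | ITrue       IStream
  | IFalse      IStream
  | IInt !Int   IStream
  | IComma      IStream
  | IBeginArray IStream
  | IEndArray   IStream
  | IEnd

newtype JsonBuilder = JsonBuilder (IStream -> IStream)

-- Simplified stand-in for aeson's Value.
data Value = Null | Bool Bool | Number Int | Array [Value]
  deriving (Eq, Show)

-- Parse one value from the front of the stream, returning the rest.
value :: IStream -> Maybe (Value, IStream)
value (INull  is)                  = Just (Null, is)
value (ITrue  is)                  = Just (Bool True, is)
value (IFalse is)                  = Just (Bool False, is)
value (IInt i is)                  = Just (Number i, is)
value (IBeginArray (IEndArray is)) = Just (Array [], is)
value (IBeginArray is)             = do
  (v, rest) <- value is
  elements [v] rest
value _                            = Nothing

-- Parse the remaining array elements; acc holds them in reverse order.
elements :: [Value] -> IStream -> Maybe (Value, IStream)
elements acc (IComma is) = do
  (v, rest) <- value is
  elements (v : acc) rest
elements acc (IEndArray is) = Just (Array (reverse acc), is)
elements _   _              = Nothing

-- Succeeds only if the stream holds exactly one well-formed value.
toValue :: JsonBuilder -> Maybe Value
toValue (JsonBuilder g) = case value (g IEnd) of
  Just (v, IEnd) -> Just v
  _              -> Nothing

main :: IO ()
main = print (toValue (JsonBuilder
  (IBeginArray . IInt 1 . IComma . ITrue . IEndArray)))
-- prints Just (Array [Number 1,Bool True])
```

The extra pass over the instruction stream is the cost mentioned above:
it is paid only by code that needs a Value, not by plain encoding.
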

A first version of this API will soon be ready and I will push it to
my GitHub. Hopefully I can come up with some convincing benchmarks!

Bas
    
    
More information about the Haskell-platform
mailing list