[cl-json-devel] proposed: improvements to decoder customization

20 Aug 2008

On 7 Aug 2008, at 03:51, Boris Smilga wrote:
> ...
> On 4 Aug 2008, at 00:14, Henrik Hjelte wrote:
> > ...
> > 2. I think the A (Accumulator) version seems to be good, and if it
> > has other advantages as you write, I think we should go for it.
>
> Actually, I rather prefer B, which is cleaner, though probably
> worse-performing.  I think I'm going to implement both and run
> some heavy workload tests to see how bad the regression could be,
> and then we'll decide.

Please find attached a patch bundle implementing the decoder in
two flavours: one with extra arguments and one with dynamic
variables of json-structure scope.  They are in two separate
files with much code overlap between them, so let us choose
which one we eventually want.
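
To make the difference between the flavours concrete, here is a
minimal sketch of the extra-args style, in which handler state is
threaded through arguments and return values.  It is not the code
from the patch bundle: the handler names and the driver WALK-ARRAY
are hypothetical, and nested Lisp vectors stand in for nested JSON
arrays so that no parser is needed.

```lisp
;; Extra-args flavour (hypothetical names): each handler receives the
;; current state as an argument and returns the new state.
;; State for list semantics is (HEAD . LAST), HEAD being a sentinel cons.

(defun begin-array-handler ()
  (let ((head (list nil)))
    (cons head head)))

(defun array-element-handler (state value)
  (let ((cell (list value)))
    (setf (cdr (cdr state)) cell      ; append after the current last cons
          (cdr state) cell)           ; the new cell becomes the last cons
    state))

(defun end-array-handler (state)
  (cdr (car state)))                  ; drop the sentinel

(defun walk-array (vector begin element end)
  "Call the handlers for VECTOR.  Nested non-string vectors stand in
for nested JSON arrays; their results become elements of the parent."
  (let ((state (funcall begin)))
    (loop for item across vector
          do (setf state
                   (funcall element state
                            (if (typep item '(and vector (not string)))
                                (walk-array item begin element end)
                                item))))
    (funcall end state)))
```

Since every recursive WALK-ARRAY call keeps its own STATE in a local
variable, nested arrays cannot interfere with their parent, at the
price of every handler having to accept and return the state.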

The workload tests were done as follows.  First, I generated a
large random JSON file (see the code in the attached file
random-json.lisp) by running

(with-open-file (s #p"test.in" :direction :output)
   (generate-random-json s 120))

which yielded some 214 KB of JSON data with a maximum nesting
depth of 21.  The contents of the file were then fed to
DECODE-JSON under different setups by running

(with-open-file (s #p"test.in")
   (time (decode-json s)))

NB: To get comparable results for CLOS semantics, I had to restart
Lisp between tests.  This is because implementations usually
employ optimizations geared toward stable CLOS-style classes.
With the kind of transient classes that we create from
prototypeless JSON objects, these optimizations can actually
degrade performance, sometimes severely.  See my exchange with
Gary Byers from Clozure on the openmcl-devel mailing list:
http://clozure.com/pipermail/openmcl-devel/2008-August/008458.html,
http://clozure.com/pipermail/openmcl-devel/2008-August/008463.html.
Hence, now that we have a customizable decoder, I would either
scrap transient classes altogether, substituting some default
prototype (e.g. {"lispClass":"cons"}, or some wrapper around
alists, as in st-json), or else post notices in big red letters
saying “Thou Shalt Not Omit Prototypes” all over the place.
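
A minimal sketch of the alist-wrapper option, assuming a
hypothetical JSON-OBJECT structure (this does not reproduce
st-json's actual API): decoding a prototypeless object would then
just build an alist and wrap it, without creating any class.

```lisp
;; Hypothetical default representation for prototypeless JSON objects:
;; a thin wrapper around an alist, so no transient class is created.
(defstruct (json-object
            (:constructor make-json-object (&optional (alist '()))))
  (alist '() :type list))

(defun json-object-get (object key &optional default)
  "Return the value stored under KEY (a string) in OBJECT and T,
or DEFAULT and NIL if KEY is absent."
  (let ((entry (assoc key (json-object-alist object) :test #'string=)))
    (if entry
        (values (cdr entry) t)
        (values default nil))))
```

Returning a presence flag as the second value, as GETHASH does,
distinguishes a missing key from one whose value is null.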

The test results are as follows:

Clozure CL v. 1.2 RC1 (DarwinPPC32) on a 1.5 GHz PowerBook G4 with
1.25 GB RAM:

   List semantics:
   Extra-args: 1.442 sec. run time, 10,209,120 bytes allocated.
   Dynamic-vars: 1.571 sec. run time, 12,317,928 bytes allocated.

   CLOS semantics:
   Extra-args: 10.870 sec. run time, 34,559,256 bytes allocated.
   Dynamic-vars: 10.961 sec. run time, 34,972,216 bytes allocated.

************

SBCL 1.0.18 on the same machine:

   List semantics:
   Extra-args: 1.163 sec. run time, 18,111,072 bytes allocated.
   Dynamic-vars: 1.218 sec. run time, 18,407,344 bytes allocated.

   CLOS semantics:
   Extra-args: 7.193 sec. run time, 96,173,152 bytes allocated.
   Dynamic-vars: 7.451 sec. run time, 96,767,968 bytes allocated.

************

CLISP 2.40 on the same machine:

   List semantics:
   Extra-args: 1.351 sec. run time, 8,328,464 bytes allocated.
   Dynamic-vars: 1.406 sec. run time, 8,447,900 bytes allocated.

   CLOS semantics:
   Extra-args: 9.057 sec. run time, 28,362,544 bytes allocated.
   Dynamic-vars: 9.112 sec. run time, 28,775,504 bytes allocated.

************

SBCL 1.0.17 on a 2 GHz AMD 64 3000+ with 512 MB RAM:

   List semantics:
   Extra-args: 0.472 sec. run time, 19,891,336 bytes allocated.
   Dynamic-vars: 0.471 sec. run time, 20,187,112 bytes allocated.

   CLOS semantics:
   Extra-args: 2.171 sec. run time, 80,166,760 bytes allocated.
   Dynamic-vars: 2.180 sec. run time, 80,724,960 bytes allocated.
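
The size of the regression can be rechecked from the figures above.
The following sketch (PERCENT-OVERHEAD and REGRESSION-SUMMARY are ad
hoc helpers) computes, for each test, the relative increase of the
dynamic-vars figures over the extra-args ones, then the maximum and
the arithmetic mean.

```lisp
;; Each row: test, extra-args run time (s), extra-args bytes allocated,
;; dynamic-vars run time (s), dynamic-vars bytes allocated.
(defparameter *results*
  '(("Clozure CL 1.2 RC1, list"  1.442 10209120  1.571 12317928)
    ("Clozure CL 1.2 RC1, CLOS" 10.870 34559256 10.961 34972216)
    ("SBCL 1.0.18, list"         1.163 18111072  1.218 18407344)
    ("SBCL 1.0.18, CLOS"         7.193 96173152  7.451 96767968)
    ("CLISP 2.40, list"          1.351  8328464  1.406  8447900)
    ("CLISP 2.40, CLOS"          9.057 28362544  9.112 28775504)
    ("SBCL 1.0.17 (AMD), list"   0.472 19891336  0.471 20187112)
    ("SBCL 1.0.17 (AMD), CLOS"   2.171 80166760  2.180 80724960)))

(defun percent-overhead (baseline value)
  "Increase of VALUE over BASELINE, in percent."
  (* 100.0 (/ (- value baseline) baseline)))

(defun regression-summary (&optional (results *results*))
  "Return max and mean run-time overhead, then max and mean memory
overhead, all in percent."
  (flet ((mean (xs) (/ (reduce #'+ xs) (length xs))))
    (let ((times (mapcar (lambda (r) (percent-overhead (second r) (fourth r)))
                         results))
          (bytes (mapcar (lambda (r) (percent-overhead (third r) (fifth r)))
                         results)))
      (values (reduce #'max times) (mean times)
              (reduce #'max bytes) (mean bytes)))))
```

Taking the arithmetic mean of the per-test overheads gives roughly
2.9% run time and 3.6% memory; the maxima, about 8.9% and 20.7%,
both come from the Clozure CL list-semantics test.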

I would say that the regression is fairly benign: never more than
about 9% in run time and 21% in memory, and on average about 3% in
run time and 4% in memory.  I allow myself once again to state my
preference for the dynamic-vars approach, which has a much cleaner
interface.

It would be cleaner still, since the elaborate passing-around of
handler states could be done away with altogether and replaced by
handlers that directly change the state of dynamic variables.  The
implementation of the two simple semantics in decoder-vars.lisp
illustrates this point: a beginning-of-structure handler
initializes the variables *ACCUMULATOR* and *ACCUMULATOR-LAST*,
and structure-element handlers imperatively modify their state.
As the variables have json-structure scope, sub-structures cannot
clobber their parent structures' accumulators.
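
A minimal sketch of this dynamic-variable style, for JSON arrays of
integers written without whitespace.  Only *ACCUMULATOR* and
*ACCUMULATOR-LAST* come from the description above; BEGIN-ARRAY,
ARRAY-ELEMENT, END-ARRAY, DECODE-ARRAY and DECODE-VALUE are
hypothetical names, not the functions in decoder-vars.lisp.  The LET
in DECODE-ARRAY is what gives the variables json-structure scope.

```lisp
(defvar *accumulator* nil
  "Sentinel-headed list being built for the innermost JSON array.")
(defvar *accumulator-last* nil
  "Last cons of *ACCUMULATOR*, so that appending is O(1).")

;; Handlers: no state arguments, no return-value protocol.
(defun begin-array ()
  (setf *accumulator* (list nil)
        *accumulator-last* *accumulator*))

(defun array-element (value)
  (setf *accumulator-last*
        (setf (cdr *accumulator-last*) (list value))))

(defun end-array ()
  (cdr *accumulator*))                      ; drop the sentinel

(defun decode-array (stream)
  "Decode a JSON array of integers and arrays, written without whitespace."
  ;; Fresh bindings for every array: nested arrays get their own
  ;; accumulators, and ours are restored when they return.
  (let ((*accumulator* nil)
        (*accumulator-last* nil))
    (read-char stream)                      ; consume #\[
    (begin-array)
    (if (eql (peek-char nil stream) #\])
        (read-char stream)                  ; empty array: consume #\]
        (loop (array-element (decode-value stream))
              ;; after each element comes either #\, or #\]
              (when (char= (read-char stream) #\]) (return))))
    (end-array)))

(defun decode-value (stream)
  "Decode an array or an integer at the current position of STREAM."
  (if (eql (peek-char nil stream) #\[)
      (decode-array stream)
      (parse-integer
       (with-output-to-string (out)
         (loop for c = (peek-char nil stream nil nil)
               while (and c (or (digit-char-p c) (char= c #\-)))
               do (write-char (read-char stream) out))))))
```

Note that in (ARRAY-ELEMENT (DECODE-VALUE STREAM)) the nested array is
fully decoded, and its bindings undone, before ARRAY-ELEMENT runs in
the parent's bindings.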

There were two more important sets of changes.  Firstly, the
basic JSON reader was redesigned to work in terms of tokens.  A
token is either a number literal, a symbol, or a punctuation
mark (string characters and escape sequences can be seen as
tokens as well).  This splits the design into a lower-level
tokenizer and a higher-level decoder.
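
A rough sketch of such a tokenizer, assuming hypothetical functions
READ-JSON-TOKEN, READ-RUN, NUMBER-CHAR-P and TOKENIZE-STRING.  For
brevity it does not treat string characters as tokens: it only
returns the opening #\" as punctuation and leaves the string body to
the decoder, so TOKENIZE-STRING is meant for string-free input.

```lisp
(defun number-char-p (c)
  "True for characters that may occur in a JSON number literal."
  (or (digit-char-p c) (find c "+-.eE")))

(defun read-run (stream predicate)
  "Read and return the longest run of characters satisfying PREDICATE."
  (with-output-to-string (out)
    (loop for c = (peek-char nil stream nil nil)
          while (and c (funcall predicate c))
          do (write-char (read-char stream) out))))

(defun read-json-token (stream)
  "Return the category and text of the next token, or :EOF."
  (let ((c (peek-char t stream nil nil)))   ; T: skip whitespace first
    (cond ((null c) (values :eof nil))
          ((find c "{}[]:,\"")
           (values :punctuation (read-char stream)))
          ((or (digit-char-p c) (char= c #\-))
           (values :number (read-run stream #'number-char-p)))
          ((alpha-char-p c)                 ; true, false, null
           (values :symbol (read-run stream #'alpha-char-p)))
          (t (error "Unexpected character ~S" c)))))

(defun tokenize-string (string)
  "List every token of STRING as (CATEGORY TEXT)."
  (with-input-from-string (s string)
    (loop for (kind text) = (multiple-value-list (read-json-token s))
          until (eq kind :eof)
          collect (list kind text))))
```

The decoder above this layer then only dispatches on token
categories and never looks at individual characters outside strings.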

Secondly, I have replaced CAMEL-CASE-TO-LISP with my own version,
which handles conventions like "JSONAllCapitals"
(⇒ "+JSON+-ALL-CAPITALS"), "TWO_WORDS" (⇒ "+TWO-WORDS+"), and
"camelCase_mixed_4_PARTS" (⇒ "CAMEL-CASE--MIXED--+4-PARTS+").

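The three conversions above can be reproduced by the following
sketch.  It is a guess at rules consistent with these examples, not
the new CAMEL-CASE-TO-LISP itself: split the name at underscores,
render each run of parts without lowercase letters as one +CONSTANT+
joined by single hyphens, convert the other parts camel-case style
(a run of capitals becoming a +ACRONYM+), and join the pieces with
double hyphens.  All function names are hypothetical, and leading,
trailing or doubled underscores are not handled.

```lisp
(defun split-on-underscore (string)
  "Split STRING at every #\\_."
  (loop with start = 0
        for pos = (position #\_ string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun camel-words (part)
  "Split PART (which contains lowercase letters) into upcased words;
a run of two or more capitals becomes a +ACRONYM+ word."
  (let ((i 0) (n (length part)) (words '()))
    (loop while (< i n)
          do (let ((run-end (or (position-if-not #'upper-case-p part :start i) n)))
               (if (>= (- run-end i) 2)
                   ;; Acronym; if a lowercase letter follows, the last
                   ;; capital starts the next word instead ("JSONAll").
                   (let ((end (if (and (< run-end n)
                                       (lower-case-p (char part run-end)))
                                  (1- run-end)
                                  run-end)))
                     (push (format nil "+~A+" (subseq part i end)) words)
                     (setf i end))
                   ;; Ordinary word: this character up to the next capital.
                   (let ((end (or (position-if #'upper-case-p part :start (1+ i)) n)))
                     (push (string-upcase (subseq part i end)) words)
                     (setf i end)))))
    (nreverse words)))

(defun camel-case-to-lisp-sketch (name)
  "Hypothetical conversion matching the examples in the message."
  (let ((pieces '()) (caps '()))
    (flet ((flush-caps ()
             ;; Consecutive parts without lowercase letters form one +CONSTANT+.
             (when caps
               (push (format nil "+~{~A~^-~}+" (reverse caps)) pieces)
               (setf caps '()))))
      (dolist (part (split-on-underscore name))
        (cond ((notany #'lower-case-p part) (push part caps))
              (t (flush-caps)
                 (push (format nil "~{~A~^-~}" (camel-words part)) pieces))))
      (flush-caps))
    (format nil "~{~A~^--~}" (reverse pieces))))
```
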
Sincerely,
  - B. Smilga.