[This memo started as a follow-up to the e-mail exchange between Henrik and your humble author (now recorded in the TODO file), but, on reviewing it, I thought that it would perhaps be of some interest to the broader readership of cl-json-devel. This is to explain the rather personal style of discourse, and make apologies to a reader who might —but who need not!— feel excluded.]
§1. The design of the improved decoder interface laid out below came to be as I was considering an application of CL-JSON that would involve local structured storage of transmitted data in a Berkeley DB. In such a setting one would like to bypass creating Lisp data structures from JSON input, instead packing it directly to FFI C data to be processed by libdb. The question that I asked myself was how to (re)design the decoder so that this task could be dealt with gracefully. I have also checked the scheme against the several probable sets of requirements you had described in your e-mail:
Configurable: I really would want to have it optional exactly how to decode and encode objects. You are probably right in that your solution is a good default behaviour, but it will not be perfect in every situation for every user. So if possible, I would like to have the decoding and encoding configurable. So backwards compatibility does not really mean backwards, it means compatibility between different setups. For example, when doing testcases or a simple json-bind the old alist setup is good, for a more advanced setup your code is great. For a secure setup you will probably want to have access control to what objects you are allowed to create and so on. One solution will never be sufficient for everyone.
§2. Hence, the general principles that the implementation should obey are:
A. Separation of concerns. The implementation should comprise a fixed basic level to handle the parsing, JSON well-formedness, and flow control; and a customizable level to produce Lisp data (or perform some other JSON-driven task, as it might please the user to enjoin). You were very rightfully incensed against my intermingling parsing with looking for prototype in READ-JSON-OBJECT. It will become clear below how this can be done away with.
B. Fine grain. Not only the handling of objects, but also that of other JSON types should be customizable. In particular, the handling of arrays and strings should be customizable on elemental level, i.e. the user should have a way to determine how the decoder handles elements of arrays and strings. In handling objects, customization should be available for keys as well as for key-value pairs.
§3. This suggests a design similar in spirit to SAX, with a set of handlers triggered by “events”. The current implementation partly follows this scheme by providing the *JSON-OBJECT-FACTORY... callbacks, but there are more kinds of events than can be handled by those three; here's a tentatively exhaustive list:
1. An atomic constant (integer, real, or boolean).
2a. Beginning of string. 2e. A string character or escape sequence. 2z. End of string.
3a. Beginning of array. 3e. An array element. 3z. End of array.
4a. Beginning of object. 4k. An object property key. 4v. An object property value. 4z. End of object.
Accordingly, we need at least as many handlers. The handlers for (1), (2e), (3e), (4k/v) shall be passed the token that triggered the event; (2a), (3a), and (4a) shall produce some fresh “accumulator” value that is then piped through successive calls to (2e), (3e), and (4k/v), respectively. E.g., reading the JSON
{"f\u0151o": [1, true]}
could result in the following flow of calls to handlers (I use mnemonic names of function-valued handler variables, which are certainly not definitive):
READ-JSON-OBJECT
  4a: *BEGINNING-OF-OBJECT-HANDLER* () produces some value (⇒) O
  READ-JSON-STRING
    2a: *BEGINNING-OF-STRING-HANDLER* () ⇒ S
    2e: *STRING-CHAR-HANDLER* (#x66, S) ⇒ S′
    2e: *STRING-CHAR-HANDLER* (#x151, S′) ⇒ S″
    2e: *STRING-CHAR-HANDLER* (#x6f, S″) ⇒ S‴
    2z: *END-OF-STRING-HANDLER* (S‴) ⇒ T
  4k: *OBJECT-KEY-HANDLER* (T) ⇒ K
  READ-JSON-ARRAY
    3a: *BEGINNING-OF-ARRAY-HANDLER* () ⇒ A
    1: *INTEGER-HANDLER* ("1") ⇒ I
    3e: *ARRAY-ELEMENT-HANDLER* (I, A) ⇒ A′
    1: *BOOLEAN-HANDLER* ("true") ⇒ B
    3e: *ARRAY-ELEMENT-HANDLER* (B, A′) ⇒ A″
    3z: *END-OF-ARRAY-HANDLER* (A″) ⇒ V
  4v: *OBJECT-VALUE-HANDLER* (K, V, O) ⇒ O′
  4z: *END-OF-OBJECT-HANDLER* (O′) ⇒ P,
which is also the return value of READ-JSON-OBJECT.
The nature of the values O, S, S′, S″, S‴, T, etc. is of absolutely no concern to the base level of the decoder, whose only duty is to faithfully pass them around. For instance, a dumb JSON syntax checker can have all handlers set to (CONSTANTLY T).
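To make the division of labour concrete, here is a minimal sketch of what such a base level might look like, restricted to arrays, integers and booleans. The handler variables borrow the mnemonic names used above; the functions READ-TOKEN, INTEGER-TOKEN-P, SKETCH-READ-VALUE, SKETCH-READ-ARRAY and SKETCH-DECODE are hypothetical names of mine and are not part of CL-JSON.

```lisp
;;; Hypothetical sketch; none of these names is CL-JSON's actual API.
;;; By default every handler is (CONSTANTLY T): a dumb syntax checker.
(defvar *beginning-of-array-handler* (constantly t))
(defvar *array-element-handler*      (constantly t))
(defvar *end-of-array-handler*       (constantly t))
(defvar *integer-handler*            (constantly t))
(defvar *boolean-handler*            (constantly t))

(defun read-token (stream)
  "Collect a run of alphanumeric characters and minus signs as a string."
  (with-output-to-string (out)
    (loop for c = (peek-char nil stream nil nil)
          while (and c (or (alphanumericp c) (char= c #\-)))
          do (write-char (read-char stream) out))))

(defun integer-token-p (token)
  "True if TOKEN is an optional minus sign followed by at least one digit."
  (let ((digits (string-left-trim "-" token)))
    (and (plusp (length digits))
         (<= (- (length token) (length digits)) 1)
         (every #'digit-char-p digits))))

(defun sketch-read-value (stream)
  "Event 1, or dispatch to the array reader. Tokens reach handlers as strings."
  (if (char= (peek-char t stream) #\[)      ; PEEK-CHAR with T skips whitespace
      (sketch-read-array stream)
      (let ((token (read-token stream)))
        (cond ((member token '("true" "false") :test #'string=)
               (funcall *boolean-handler* token))
              ((integer-token-p token)
               (funcall *integer-handler* token))
              (t (error "Malformed JSON near ~S" token))))))

(defun sketch-read-array (stream)
  "Events 3a, 3e, 3z. The accumulator is passed around, never inspected."
  (read-char stream)                        ; consume the opening bracket
  (let ((accumulator (funcall *beginning-of-array-handler*)))
    (if (char= (peek-char t stream) #\])
        (read-char stream)                  ; empty array
        (loop
          (setf accumulator
                (funcall *array-element-handler*
                         (sketch-read-value stream) accumulator))
          (let ((c (progn (peek-char t stream) (read-char stream))))
            (cond ((char= c #\]) (return))
                  ((char/= c #\,)
                   (error "Expected , or ] but got ~S" c))))))
    (funcall *end-of-array-handler* accumulator)))

(defun sketch-decode (string)
  "Decode one value from STRING. Trailing input is not checked."
  (with-input-from-string (stream string)
    (sketch-read-value stream)))

;; With the default handlers this merely checks well-formedness:
;;   (sketch-decode "[1, [true, -2], []]")   ; → T
;; Rebinding three handlers turns the same parser into an element counter:
;;   (let ((*beginning-of-array-handler* (constantly 0))
;;         (*array-element-handler*
;;           (lambda (element count) (declare (ignore element)) (1+ count)))
;;         (*end-of-array-handler* #'identity))
;;     (sketch-decode "[1, true, [2, 3]]"))   ; → 3
```

The point of the sketch is only that SKETCH-READ-ARRAY never looks inside the accumulator: whether it is T, a counter, or a list under construction is decided entirely by the handlers.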
§4. Now the current list semantics can be expressed straightforwardly in terms of these handlers:
*BEGINNING-OF-ARRAY-HANDLER* ⇒
  (lambda ()
    (let ((list (cons nil nil)))   ; First element is never used
      (cons list list)))           ; First and last pair of the list
*ARRAY-ELEMENT-HANDLER* ⇒
  (lambda (elt accumulator)
    (destructuring-bind (head . last) accumulator
      (cons head (setf (cdr last) (cons elt nil)))))
*END-OF-ARRAY-HANDLER* ⇒
  (lambda (accumulator)
    (coerce (cdar accumulator) *json-array-type*))
*BEGINNING-OF-OBJECT-HANDLER* ⇒ same as *BEGINNING-OF-ARRAY-HANDLER*
*OBJECT-KEY-HANDLER* ⇒ #'json-intern
*OBJECT-VALUE-HANDLER* ⇒
  (lambda (key value accumulator)
    (destructuring-bind (head . last) accumulator
      (cons head (setf (cdr last) (cons (cons key value) nil)))))
*END-OF-OBJECT-HANDLER* ⇒ #'cdar
*INTEGER-HANDLER* ⇒ #'parse-integer
You can easily reconstruct the rest.
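As a check on the list semantics, the handler calls for a small input can be replayed by hand. The sketch below does so for {"foo": [1, true]}. Everything in it is illustrative: the function names are mine, the value 'VECTOR for *JSON-ARRAY-TYPE* is an assumed setting, STAND-IN-KEY merely stands in for JSON-INTERN, and I assume the boolean handler maps "true" to T.

```lisp
;;; Illustrative replay of the list semantics; the names are hypothetical.
(defvar *json-array-type* 'vector)   ; assumed setting; 'LIST works as well

(defun begin-list ()
  "Events 3a and 4a: a pair of (first pair . last pair) of a list."
  (let ((list (cons nil nil)))       ; the first element is never used
    (cons list list)))

(defun add-element (elt accumulator)
  "Event 3e: append ELT in constant time via the last-pair pointer."
  (destructuring-bind (head . last) accumulator
    (cons head (setf (cdr last) (cons elt nil)))))

(defun finish-array (accumulator)
  "Event 3z: drop the unused first element and coerce the rest."
  (coerce (cdar accumulator) *json-array-type*))

(defun add-pair (key value accumulator)
  "Event 4v: append a (KEY . VALUE) pair to the alist under construction."
  (destructuring-bind (head . last) accumulator
    (cons head (setf (cdr last) (cons (cons key value) nil)))))

(defun finish-object (accumulator)
  "Event 4z: the alist without the unused first element."
  (cdar accumulator))

(defun stand-in-key (string)
  "Stand-in for JSON-INTERN: upcase and intern in the KEYWORD package."
  (intern (string-upcase string) :keyword))

(defun replay-example ()
  "Replay by hand the handler calls for {\"foo\": [1, true]}."
  (let* ((o  (begin-list))                          ; 4a
         (k  (stand-in-key "foo"))                  ; 4k
         (a  (begin-list))                          ; 3a
         (a1 (add-element (parse-integer "1") a))   ; 1, 3e
         (a2 (add-element t a1))                    ; 1, 3e ("true" taken as T)
         (v  (finish-array a2))                     ; 3z
         (o1 (add-pair k v o)))                     ; 4v
    (finish-object o1)))                            ; 4z

;; (replay-example)   ; → ((:FOO . #(1 T)))
```

Keeping a pointer to the last pair is what lets each element be appended without walking the list or reversing it at the end.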
§5. Our CLOS semantics is a more tricky affair. If I may allow myself to reiterate what I previously said about the handling of prototype fields:
READ-JSON-OBJECT works by reading in the key string and the colon separator, and then recursively calling READ-JSON to consume characters from the input and construct the object which would be the corresponding value. With my modifications, when the key "prototype" (or whatever the name should be) is encountered on the input, the decoder is told that the value we are going to construct shall be the prototype object. If we were not able to communicate this beforehand, we'd have to do much post-processing of the factory object: look up the prototype (note that, at this point, we would not know the package and could not yet intern the keys, so that the matching would be done on strings, degrading the performance), convert it to some internal format (note that we would not be able to predict accurately enough what it would have been decoded to, as that may be influenced by user-side configuration), remove the prototype and key from the factory, and only then could we create the object.
Put rather abstractly, to overcome this difficulty we need a means to pass certain information from outer to inner recursive calls of READ-JSON-OBJECT, and between handlers invoked on the same level. I currently see two options of implementing this:
A. “Accumulator” return values of (3a/e) and (4a/k/v) handlers (what is signified by A, A′, ..., and O, O′ in the above example) are passed down as additional arguments to the recursive calls to READ-JSON-OBJECT and READ-JSON-ARRAY, and are then also passed to (3a) and (4a) handlers in these inner functions (and perhaps also to (1) and (2a)). The (4k) handler, like (4v), receives and produces an “accumulator” value rather than a representation of the key. Thus, the flow in the above example becomes:
  ...
  4k: *OBJECT-KEY-HANDLER* (T, O) ⇒ O′
  READ-JSON-ARRAY (..., O′)
    3a: *BEGINNING-OF-ARRAY-HANDLER* (O′) ⇒ A
    ...
    3z: *END-OF-ARRAY-HANDLER* (A″) ⇒ V
  4v: *OBJECT-VALUE-HANDLER* (V, O′) ⇒ O″
  ...
This would allow for an implementation of the CLOS semantics along the following lines:
*BEGINNING-OF-OBJECT-HANDLER* ⇒
  (lambda (&optional (above-accumulator nil not-toplevel))
    (let* ((prototype (if not-toplevel (caar above-accumulator)))
           (list (cons prototype nil)))
      (cons list list)))
*OBJECT-KEY-HANDLER* ⇒
  (lambda (key accumulator)
    (destructuring-bind (head . last) accumulator
      (let ((prototype (car head)))
        (if (and (not prototype)
                 (string= key (symbol-name *prototype-name*)))
            ;; Update HEAD in place so that it stays chained to LAST
            ;; even when the prototype is the first key of the object.
            (progn (setf (car head) t) accumulator)
            (cons head (setf (cdr last) (cons (cons key nil) nil)))))))
*OBJECT-VALUE-HANDLER* ⇒
  (lambda (value accumulator)
    (destructuring-bind (head . last) accumulator
      (if (typep value 'prototype)
          (progn (setf (car head) value) accumulator)
          (progn (setf (cdar last) value) accumulator))))
*END-OF-OBJECT-HANDLER* ⇒
  (lambda (accumulator)
    (destructuring-bind ((prototype . fields) . last) accumulator
      (declare (ignore last))
      (let ((*json-object-prototype* prototype))
        (json-factory-make-object fields))))
B. In addition to custom handlers, the user is given the possibility to provide a list of dynamic variables which he wishes to have “JSON-structure scope”. That is, the bodies of READ-JSON-OBJECT and READ-JSON-ARRAY are wrapped in PROGVs which establish new dynamic bindings for these variables (using their outer-level bindings for respective initial values). If a handler sets a structure-scope variable, the new value is then visible to all subsequent handlers until the current READ-JSON-OBJECT or READ-JSON-ARRAY loop exits. By contrast, the handlers on the outer levels never see the value change. The flow in the example is unaltered, and the CLOS semantics is implemented thus:
*JSON-STRUCTURE-SCOPE-VARIABLES* ⇒ '(*json-object-prototype*)
*BEGINNING-OF-OBJECT-HANDLER* ⇒
  (lambda ()
    (let ((list (cons nil nil)))
      (cons list list)))
*OBJECT-KEY-HANDLER* ⇒
  (lambda (key)
    (when (and (not *json-object-prototype*)
               (string= key (symbol-name *prototype-name*)))
      (setq *json-object-prototype* t))
    key)
*OBJECT-VALUE-HANDLER* ⇒
  (lambda (key value accumulator)
    (destructuring-bind (head . last) accumulator
      (if (typep value 'prototype)
          (progn (setq *json-object-prototype* value) accumulator)
          (cons head (setf (cdr last) (cons (cons key value) nil))))))
*END-OF-OBJECT-HANDLER* ⇒
  (lambda (accumulator)
    (json-factory-make-object (cdar accumulator)))
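The “JSON-structure scope” of option B can be tried out in isolation. The following is a minimal sketch, assuming that every variable on the list is already bound at the outer level; the macro WITH-STRUCTURE-SCOPE and the function SCOPE-DEMO are hypothetical names of mine, standing for the PROGV wrapper that READ-JSON-OBJECT and READ-JSON-ARRAY would put around their bodies.

```lisp
;;; Hypothetical sketch of option B's scoping; not CL-JSON's actual code.
(defvar *json-object-prototype* nil)
(defvar *json-structure-scope-variables* '(*json-object-prototype*))

(defmacro with-structure-scope (&body body)
  "Rebind each structure-scope variable, starting from its outer value.
Assumes that all listed variables are bound when the form is entered."
  `(progv *json-structure-scope-variables*
       (mapcar #'symbol-value *json-structure-scope-variables*)
     ,@body))

(defun scope-demo ()
  "Record what handlers at each nesting level would see."
  (let ((trace '()))
    (with-structure-scope                      ; outer READ-JSON-OBJECT
      (setq *json-object-prototype* :outer)
      (with-structure-scope                    ; inner READ-JSON-OBJECT
        (push *json-object-prototype* trace)   ; initial value from outside
        (setq *json-object-prototype* :inner)
        (push *json-object-prototype* trace))  ; visible on the same level
      (push *json-object-prototype* trace))    ; outer level is unaffected
    (push *json-object-prototype* trace)       ; and so is the global value
    (nreverse trace)))

;; (scope-demo)   ; → (:OUTER :INNER :OUTER NIL)
```

Each nesting level costs one PROGV and one MAPCAR over the variable list, which is the overhead to weigh against the simpler handler signatures.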
The choice between options A and B seems to be one between efficiency and simplicity of interface: under B, a lot of nested PROGVs are likely to incur some run-time overhead (I do not feel qualified to predict exactly how much). On the other hand, the handlers under B are rather more readable, as you can see by comparing the two sets of definitions above, and the interface also stays more intuitive that way.
§6. Returning to my BDB + CL-JSON application: the (2a), (3a), (4a) handlers could be customized to call data constructor / initializer functions over the FFI, and (2z), (3z), (4z) to call the DB put method. The uppermost call to READ-JSON would be wrapped up as a transaction. That would be it.
Sincerely, - B. Smilga.