[This memo started as a follow-up to the e-mail exchange between Henrik and your humble author (now recorded in the TODO file), but, on reviewing it, I thought that it would perhaps be of some interest to the broader readership of cl-json-devel. This is to explain the rather personal style of discourse, and make apologies to a reader who might —but who need not!— feel excluded.]
§1. The design of the improved decoder interface laid out below came to be as I was considering an application of CL-JSON that would involve local structured storage of transmitted data in a Berkeley DB. In such a setting one would like to bypass creating Lisp data structures from JSON input, instead packing it directly into FFI C data to be processed by libdb. The question that I asked myself was how to (re)design the decoder so that this task could be dealt with gracefully. I have also checked the scheme against the several probable sets of requirements you described in your e-mail:
Configurable: I really would want to have it optional exactly how objects are decoded and encoded. You are probably right in that your solution is a good default behaviour, but it will not be perfect in every situation for every user. So, if possible, I would like to have the decoding and encoding configurable. Backwards compatibility then does not really mean backwards; it means compatibility between different setups. For example, when doing test cases or a simple json-bind, the old alist setup is good; for a more advanced setup your code is great. For a secure setup you will probably want to have access control over what objects you are allowed to create, and so on. One solution will never be sufficient for everyone.
§2. Hence, the general principles which should be obeyed by the implementation, are:
A. Separation of concerns. The implementation should comprise a fixed basic level to handle the parsing, JSON well-formedness, and flow control; and a customizable level to produce Lisp data (or perform some other JSON-driven task, as it might please the user to enjoin). You were very rightfully incensed at my intermingling parsing with prototype lookup in READ-JSON-OBJECT. It will become clear below how this can be done away with.
B. Fine grain. Not only the handling of objects, but also that of other JSON types should be customizable. In particular, the handling of arrays and strings should be customizable on elemental level, i.e. the user should have a way to determine how the decoder handles elements of arrays and strings. In handling objects, customization should be available for keys as well as for key-value pairs.
§3. This suggests a design similar in spirit to SAX, with a set of handlers triggered by “events”. The current implementation partly follows this scheme by providing the *JSON-OBJECT-FACTORY... callbacks, but there are more kinds of events than can be handled by those three; here's a tentatively exhaustive list:
1. An atomic constant (integer, real, or boolean).
2a. Beginning of string. 2e. A string character or escape sequence. 2z. End of string.
3a. Beginning of array. 3e. An array element. 3z. End of array.
4a. Beginning of object. 4k. An object property key. 4v. An object property value. 4z. End of object.
Accordingly, we need at least as many handlers. The handlers for (1), (2e), (3e), (4k/v) shall be passed the token that triggered the event; (2a), (3a), and (4a) shall produce some fresh “accumulator” value that is then piped through successive calls to (2e), (3e), and (4k/v), respectively. E.g., reading the JSON
{"f\u0151o": [1, true]}
could result in the following flow of calls to handlers (I use mnemonic names of function-valued handler variables, which are certainly not definitive):
READ-JSON-OBJECT
  4a: *BEGINNING-OF-OBJECT-HANDLER* ()       produces some value (⇒) O
  READ-JSON-STRING
    2a: *BEGINNING-OF-STRING-HANDLER* ()     ⇒ S
    2e: *STRING-CHAR-HANDLER* (#x66, S)      ⇒ S′
    2e: *STRING-CHAR-HANDLER* (#x151, S′)    ⇒ S″
    2e: *STRING-CHAR-HANDLER* (#x6f, S″)     ⇒ S‴
    2z: *END-OF-STRING-HANDLER* (S‴)         ⇒ T
  4k: *OBJECT-KEY-HANDLER* (T)               ⇒ K
  READ-JSON-ARRAY
    3a: *BEGINNING-OF-ARRAY-HANDLER* ()      ⇒ A
    1:  *INTEGER-HANDLER* ("1")              ⇒ I
    3e: *ARRAY-ELEMENT-HANDLER* (I, A)       ⇒ A′
    1:  *BOOLEAN-HANDLER* ("true")           ⇒ B
    3e: *ARRAY-ELEMENT-HANDLER* (B, A′)      ⇒ A″
    3z: *END-OF-ARRAY-HANDLER* (A″)          ⇒ V
  4v: *OBJECT-VALUE-HANDLER* (K, V, O)       ⇒ O′
  4z: *END-OF-OBJECT-HANDLER* (O′)           ⇒ P

P is also the return value of READ-JSON-OBJECT.
The nature of the values O, S, S′, S″, S‴, T, etc. is of absolutely no concern to the base level of the decoder whose only duty is to faithfully pass them around. For one, a dumb JSON syntax checker can have all handlers set to (CONSTANTLY T).
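To illustrate that point, such a checker could be set up with a trivial macro like the following (a sketch only; the handler variable names are the mnemonic, non-definitive ones from the example above):

(defmacro with-json-syntax-checking (&body body)
  ;; Bind every handler to a function that ignores its arguments and
  ;; returns T; READ-JSON then does nothing but verify well-formedness.
  `(let ((*integer-handler*             (constantly t))
         (*boolean-handler*             (constantly t))
         (*beginning-of-string-handler* (constantly t))
         (*string-char-handler*         (constantly t))
         (*end-of-string-handler*       (constantly t))
         (*beginning-of-array-handler*  (constantly t))
         (*array-element-handler*       (constantly t))
         (*end-of-array-handler*        (constantly t))
         (*beginning-of-object-handler* (constantly t))
         (*object-key-handler*          (constantly t))
         (*object-value-handler*        (constantly t))
         (*end-of-object-handler*       (constantly t)))
     ,@body))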
§4. Now the current list semantics can be expressed straightforwardly in terms of these handlers:
*BEGINNING-OF-ARRAY-HANDLER* ⇒
  (lambda ()
    (let ((list (cons nil nil)))  ; First element is never used
      (cons list list)))          ; First and last pair of the list

*ARRAY-ELEMENT-HANDLER* ⇒
  (lambda (elt accumulator)
    (destructuring-bind (head . last) accumulator
      (cons head (setf (cdr last) (cons elt nil)))))

*END-OF-ARRAY-HANDLER* ⇒
  (lambda (accumulator)
    (coerce (cdar accumulator) *json-array-type*))

*BEGINNING-OF-OBJECT-HANDLER* ⇒ same as *BEGINNING-OF-ARRAY-HANDLER*

*OBJECT-KEY-HANDLER* ⇒ #'json-intern

*OBJECT-VALUE-HANDLER* ⇒
  (lambda (key value accumulator)
    (destructuring-bind (head . last) accumulator
      (cons head (setf (cdr last) (cons (cons key value) nil)))))

*END-OF-OBJECT-HANDLER* ⇒ #'cdar

*INTEGER-HANDLER* ⇒ #'parse-integer
You can easily reconstruct the rest.
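For instance, the string handlers might be (a sketch; the choice of an adjustable string as the accumulator is mine and quite arbitrary):

*BEGINNING-OF-STRING-HANDLER* ⇒
  (lambda ()
    (make-array 8 :element-type 'character :adjustable t :fill-pointer 0))

*STRING-CHAR-HANDLER* ⇒
  (lambda (code accumulator)
    (vector-push-extend (code-char code) accumulator)
    accumulator)

*END-OF-STRING-HANDLER* ⇒
  (lambda (accumulator) (coerce accumulator 'string))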
§5. Our CLOS semantics is a more tricky affair. If I may allow myself to reiterate what I previously said about the handling of prototype fields:
READ-JSON-OBJECT works by reading in the key string and the colon separator, and then recursively calling READ-JSON to consume characters from the input and construct the object which would be the corresponding value. With my modifications, when the key "prototype" (or whatever the name should be) is encountered on the input, the decoder is told that the value we are going to construct shall be the prototype object. If we were not able to communicate this beforehand, we'd have to do much post-processing of the factory object: look up the prototype (note that, at this point, we would not know the package and could not yet intern the keys, so that the matching would be done on strings, degrading the performance), convert it to some internal format (note that we would not be able to predict accurately enough what it would have been decoded to, as that may be influenced by user-side configuration), remove the prototype and key from the factory, and only then could we create the object.
Put rather abstractly, to overcome this difficulty we need a means to pass certain information from outer to inner recursive calls of READ-JSON-OBJECT, and between handlers invoked on the same level. I currently see two options for implementing this:
A. “Accumulator” return values of (3a/e) and (4a/k/v) handlers (what is signified by A, A′, ..., and O, O′ in the above example) are passed down as additional arguments to the recursive calls to READ-JSON-OBJECT and READ-JSON-ARRAY, and are then also passed to (3a) and (4a) handlers in these inner functions (and perhaps also to (1) and (2a)). The (4k) handler, like (4v), receives and produces an “accumulator” value rather than a representation of the key. Thus, the flow in the above example becomes:
...
4k: *OBJECT-KEY-HANDLER* (T, O)            ⇒ O′
READ-JSON-ARRAY (..., O′)
  3a: *BEGINNING-OF-ARRAY-HANDLER* (O′)    ⇒ A
  ...
  3z: *END-OF-ARRAY-HANDLER* (A″)          ⇒ V
4v: *OBJECT-VALUE-HANDLER* (V, O′)         ⇒ O″
...
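To make the threading concrete, the base level might then look roughly like this (a sketch only: READ-NEXT-VALUE and CLOSING-BRACKET-NEXT-P are hypothetical stand-ins for the real tokenizing machinery):

(defun read-json-array (stream &optional outer-accumulator)
  ;; The outer accumulator is passed to the (3a) handler; the inner
  ;; accumulator is piped through the (3e) calls and finished by (3z).
  (let ((accumulator (funcall *beginning-of-array-handler* outer-accumulator)))
    (loop until (closing-bracket-next-p stream)
          do (setq accumulator
                   (funcall *array-element-handler*
                            (read-next-value stream accumulator)
                            accumulator)))
    (funcall *end-of-array-handler* accumulator)))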
This would allow for an implementation of the CLOS semantics along the following lines:
*BEGINNING-OF-OBJECT-HANDLER* ⇒
  (lambda (&optional (above-accumulator nil not-toplevel))
    (let* ((prototype (if not-toplevel (caar above-accumulator)))
           (list (cons prototype nil)))
      (cons list list)))

*OBJECT-KEY-HANDLER* ⇒
  (lambda (key accumulator)
    (destructuring-bind (head . last) accumulator
      (let ((prototype (car head)))
        (if (and (not prototype)
                 (string= key (symbol-name *prototype-name*)))
            (cons (cons t (cdr head)) last)
            (cons head (setf (cdr last) (cons (cons key nil) nil)))))))

*OBJECT-VALUE-HANDLER* ⇒
  (lambda (value accumulator)
    (destructuring-bind (head . last) accumulator
      (if (typep value 'prototype)
          (cons (cons value (cdr head)) last)
          (progn (setf (cdar last) value)
                 accumulator))))

*END-OF-OBJECT-HANDLER* ⇒
  (lambda (accumulator)
    (destructuring-bind ((prototype . fields) . last) accumulator
      (let ((*json-object-prototype* prototype))
        (json-factory-make-object fields))))
B. In addition to custom handlers, the user is given the possibility to provide a list of dynamic variables which he wishes to have "JSON-structure scope". That is, the bodies of READ-JSON-OBJECT and READ-JSON-ARRAY are wrapped in PROGVs which establish new dynamic bindings for these variables (using their outer-level bindings for respective initial values). If a handler sets a structure-scope variable, the new value is then visible to all subsequent handlers until the current READ-JSON-OBJECT or READ-JSON-ARRAY loop exits. By contrast, the handlers on the outer levels never see the value change. The flow in the example is unaltered, and the CLOS semantics is implemented thus:
*JSON-STRUCTURE-SCOPE-VARIABLES* ⇒ '(*json-object-prototype*)

*BEGINNING-OF-OBJECT-HANDLER* ⇒
  (lambda ()
    (let ((list (cons nil nil)))
      (cons list list)))

*OBJECT-KEY-HANDLER* ⇒
  (lambda (key)
    (if (and (not *json-object-prototype*)
             (string= key (symbol-name *prototype-name*)))
        (setq *json-object-prototype* t))
    key)

*OBJECT-VALUE-HANDLER* ⇒
  (lambda (key value accumulator)
    (destructuring-bind (head . last) accumulator
      (if (typep value 'prototype)
          (progn (setq *json-object-prototype* value)
                 accumulator)
          (cons head (setf (cdr last) (cons (cons key value) nil))))))

*END-OF-OBJECT-HANDLER* ⇒
  (lambda (accumulator)
    (json-factory-make-object (cdar accumulator)))
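For reference, the PROGV wrapping could be as simple as this sketch (CALL-WITH-JSON-STRUCTURE-SCOPE being a hypothetical helper around which READ-JSON-OBJECT and READ-JSON-ARRAY would wrap their bodies):

(defun call-with-json-structure-scope (body-thunk)
  ;; Rebind each structure-scope variable, initializing it to its
  ;; current (outer) value; any assignment by a handler inside
  ;; BODY-THUNK is thus undone when the structure has been read.
  (let ((variables *json-structure-scope-variables*))
    (progv variables (mapcar #'symbol-value variables)
      (funcall body-thunk))))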
The choice between options A and B seems to be one between simplicity of interface and efficiency: a lot of nested PROGVs are likely to incur some cost overhead (I do not feel qualified to predict exactly how much). On the other hand, the handlers are rather more readable, as you can see in the above comparison, and the interface also stays more intuitive that way.
§6. Returning to my BDB + CL-JSON application: the (2a), (3a), (4a) handlers could be customized to call data constructor / initializer functions over the FFI, and (2z), (3z), (4z) to call the DB put method. The uppermost call to READ-JSON would be wrapped up as a transaction. That would be it.
Sincerely, - B. Smilga.
On Wed, Jul 23, 2008 at 7:32 PM, Boris Smilga <boris.smilga@gmail.com> wrote:
§3. This suggests a design similar in spirit to SAX, with a set of handlers triggered by "events".
I like your idea; it will make cl-json very configurable. You are spot on with the requirements list, in my opinion. Here are some thoughts:
1. There will be a lot of variables for the callback functions. Some may say it is ugly.
But: if we can encapsulate them with a function to set them all, I think it is OK.
The alternative is probably to implement this OO style with what I think they call the "strategy" pattern, but I don't think it is obviously prettier and the performance will probably be a little bit worse.
2. I think the A (Accumulator) version seems to be good, and if it has other advantages as you write, I think we should go for it.
3. Will you do it, or should we just put it on the TODO list until someone wants to do it?
By the way, some things that I might want to do if we should release a new version:
* remove the parenscript dependency.
* add a little sample implementation of JSONP
Any other wishes?
/Henrik Hjelte
On Sun, Aug 3, 2008 at 15:14, Henrik Hjelte <henrik@evahjelte.com> wrote:
Any other wishes?
I don't quite like the serialization mechanism that cl-json provides. The attempt to map from a Lisp datatype to a certain json structure is necessarily imperfect because no 1:1 relationship exists, and the requirement to first make up a data structure and then call encode-json to convert it to a json string is wasteful.
A streaming serialization API is more useful, as one has more control over the json format that is being generated and the need to make up a data structure that can uniquely be mapped to json structures is removed. It is inspired by CXML's streaming serialization which I find very easy and straightforward to use.
I have written something that allows my application code to use a similar scheme, see http://bknr.net/trac/browser/trunk/projects/quickhoney/src/handlers.lisp?rev... - The implementation of the json serializer is in http://bknr.net/trac/browser/trunk/projects/quickhoney/src/json.lisp, but it is kind of hackish because it tries to reuse some of cl-json's serialization facilities for atomic types and jumps through some hoops to make the correct separators between serialized elements be generated.
This could certainly be improved, yet it works for me and I'd be glad to see this or a similar mechanism be integrated into cl-json.
-Hans
On Mon, Aug 4, 2008 at 1:58 AM, Hans Hübner <hans@huebner.org> wrote:
I don't quite like the serialization mechanism that cl-json provides. The attempt to map from a Lisp datatype to a certain json structure is necessarily imperfect because no 1:1 relationship exists, and the requirement to first make up a data structure and then call encode-json to convert it to a json string is wasteful.
In principle yes. But to defend the current way, it makes a clear division of concerns, separating the code for formatting output from the code for logic.
A streaming serialization API is more useful, as one has more control over the json format that is being generated and the need to make up a data structure that can uniquely be mapped to json structures is removed.
I agree to a large degree, but in many situations it is nice to have a simple-to-use format and API. What is simple is of course up for debate. But in the sample below, I think cl-json's current encoder could save some code lines. But why not have both? I am sure there are lots of situations where your idea will be better.
In a way this is the reverse situation of the decoder issue that Boris has.
It is inspired by CXML's streaming serialization which I find very easy and straightforward to use.
Is it the SAX thing that is used for serialization that is the inspiration? CXML also has a klacks parser, which is inspired by the Java Streaming XML API, and I know there is a project, Jettison, that implements JSON for this API.
A cool feature would be if a JSON parser/serializer could be used as a simple drop-in replacement for an existing XML parser/generator; there are a few to choose from.
This could certainly be improved, yet it works for me
I haven't looked much at the internals, but the API seems very good from the example below.
and I'd be glad to see this or a similar mechanism be integrated into cl-json.
I agree, I think we should do something along your lines in cl-json. But I also want to preserve the current simple interface.
/Henrik
This is a neat sample from your code.

;;------------------------------------------------------------------------------
(defmethod handle-object ((handler json-news-archive-handler) (channel rss-channel))
  (with-json-response ()
    (with-object-element ("months")
      (with-json-array ()
        (dolist (month (sort (rss-channel-archived-months channel)
                             (lambda (a b)
                               (if (= (first a) (first b))
                                   (> (second a) (second b))
                                   (> (first a) (first b))))))
          (with-json-array ()
            (encode-array-element (first month))
            (encode-array-element (second month))))))))

;;------------------------------------------------------------------------------
;; If I am not wrong, this is how it would look in the
;; current cl-json implementation.

(defmethod handle-object ((handler json-news-archive-handler) (channel rss-channel))
  (encode-json-response  ; or similar
   (with-object-element ("months")
     (sort (rss-channel-archived-months channel)
           (lambda (a b)
             (if (= (first a) (first b))
                 (> (second a) (second b))
                 (> (first a) (first b))))))))
On Mon, Aug 4, 2008 at 05:47, Henrik Hjelte <henrik@evahjelte.com> wrote:
On Mon, Aug 4, 2008 at 1:58 AM, Hans Hübner <hans@huebner.org> wrote:
I don't quite like the serialization mechanism that cl-json provides. The attempt to map from a Lisp datatype to a certain json structure is necessarily imperfect because no 1:1 relationship exists, and the requirement to first make up a data structure and then call encode-json to convert it to a json string is wasteful.
In principle yes. But to defend the current way, it makes a clear division of concerns, separating the code for formatting output from the code for logic.
I don't quite understand what you mean by that. My concern with the current mechanism is that it is not transparent: as there is no 1:1 mapping between native Lisp datatypes and JSON structures, the application needs to be aware of the desired JSON structure format anyway. One would need some kind of "JSON Schema", external to the application, that describes the format in an abstract way if one really wanted to keep the format specification outside of the application code. I am not sure that would be the right direction, though. The idea of JSON is to keep things simple.
A streaming serialization API is more useful, as one has more control over the json format that is being generated and the need to make up a data structure that can uniquely be mapped to json structures is removed.
I agree to a large degree, but in many situations it is nice to have a simple-to-use format and API. What is simple is of course up for debate. But in the sample below, I think cl-json's current encoder could save some code lines. But why not have both? I am sure there are lots of situations where your idea will be better.
I am not advocating that the current API should be removed. I just don't like it and would want to have more control over the format without having to first create a data structure that accidentally serializes to what I want to have. Even with a specially crafted structure, I would have to make explicit calls to specialized JSON functions to select how I want lists to be serialized. A streaming API would be unambiguous in that respect, as it does not iterate over data structures itself.
I am a big fan of saving code lines, but in general terseness can't overrule correctness, and ambiguity should be avoided.
In a way this is the reverse situation of the decoder issue that Boris has.
It is inspired by CXML's streaming serialization which I find very easy and straightforward to use.
Is it the SAX thing that is used for serialization that is the inspiration? CXML also has a klacks parser, which is inspired by the Java Streaming XML API, and I know there is a project, Jettison, that implements JSON for this API.
The CXML XML serializer is inspired by SAX.
A cool feature would be if a JSON parser/serializer could be used as a simple drop-in replacement for an existing XML parser/generator; there are a few to choose from.
I'm not sure that you would not end up with a half-baked solution that either works inefficiently or restricts both the XML schemata and the JSON structures that you can create interchangeably. I would leave such a plug-compatible layer up to the application programmer to write, if desired.
;;------------------------------------------------------------------------------
(defmethod handle-object ((handler json-news-archive-handler) (channel rss-channel))
  (with-json-response ()
    (with-object-element ("months")
      (with-json-array ()
        (dolist (month (sort (rss-channel-archived-months channel)
                             (lambda (a b)
                               (if (= (first a) (first b))
                                   (> (second a) (second b))
                                   (> (first a) (first b))))))
          (with-json-array ()
            (encode-array-element (first month))
            (encode-array-element (second month))))))))

;;------------------------------------------------------------------------------
;; If I am not wrong, this is how it would look in the
;; current cl-json implementation.

(defmethod handle-object ((handler json-news-archive-handler) (channel rss-channel))
  (encode-json-response  ; or similar
   (with-object-element ("months")
     (sort (rss-channel-archived-months channel)
           (lambda (a b)
             (if (= (first a) (first b))
                 (> (second a) (second b))
                 (> (first a) (first b))))))))
I am not sure how an ENCODE-JSON-RESPONSE function would decide whether a list should be serialized as an object or as an array. This is the basic problem that the streaming API solves. I think that no ENCODE-JSON-RESPONSE function exists that matches the requirements of every application, and as such it would be better to provide the building blocks that can be used to universally generate any JSON format that exists.
-Hans
On 4 Aug 2008, at 00:14, Henrik Hjelte wrote:
I like your idea; it will make cl-json very configurable. You are spot on with the requirements list, in my opinion. Here are some thoughts:
- There will be a lot of variables for the callback functions. Some may say it is ugly.
But: if we can encapsulate them with a function to set them all, I think it is OK.
Absolutely. I would say, something like:
(defun set-handlers (&key boolean integer float
                          start-string string-char end-string
                          start-array array-element end-array
                          start-object object-key object-element end-object)
  (setf *boolean-handler* boolean
        *integer-handler* integer
        #| the same for other handlers |#))
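A call configuring, say, the list semantics of §4 would then read (sketch):

(set-handlers :integer #'parse-integer
              :object-key #'json-intern
              :end-object #'cdar
              #| ... and so on for the remaining handlers |#)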
The alternative is probably to implement this OO style with what I think they call the "strategy" pattern, but I don't think it is obviously prettier and the performance will probably be a little bit worse.
Do you mean, along the lines of
(defclass json-parser ()
  ((boolean-handler :accessor boolean-handler :initarg :boolean
                    :type function :allocation :class)
   #| more handler slots |#))
with specific parser semantics being subclasses of the same?
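I.e., something like this sketch, where INITIALIZE-INSTANCE is just one conceivable way of populating the handler slots:

(defclass list-semantics-parser (json-parser) ())

(defmethod initialize-instance :after ((parser list-semantics-parser) &key)
  ;; Install the handlers for the plain list semantics of §4.
  (setf (boolean-handler parser)
        (lambda (token) (string= token "true")))
  #| likewise for the other handler slots |#)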
- I think the A (Accumulator) version seems to be good, and if it has other advantages as you write, I think we should go for it.
Actually, I rather prefer B, which is cleaner, though, probably, worse performing. I think I'm going to implement both and run some heavy workload tests to see how bad the regression could be, and then we'll decide.
- Will you do it, or should we just put it on the TODO list until someone wants to do it?
I'll do it. It's not that much work. For the time being, let's keep things in a separate file (or even two of them, for the two options of devolving information). We'll then copy over the better one into decoder.lisp and discard the other one.
By the way, some things that I might want to do if we should release a new version:
- remove the parenscript dependency.
The only feature of parenscript used by cl-json is the function SYMBOL-TO-JS. I can, of course, substitute my own implementation.
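Something along these lines would do (a rough sketch, not necessarily parenscript's actual algorithm; it merely camel-cases a hyphenated symbol name):

(defun lisp-symbol-to-js-name (symbol)
  ;; "foo-bar-baz" style Lisp names become "fooBarBaz" style JS names.
  (loop with result = (make-array 8 :element-type 'character
                                    :adjustable t :fill-pointer 0)
        with upcase-next = nil
        for char across (string-downcase (symbol-name symbol))
        do (cond ((char= char #\-) (setq upcase-next t))
                 (upcase-next
                  (vector-push-extend (char-upcase char) result)
                  (setq upcase-next nil))
                 (t (vector-push-extend char result)))
        finally (return (coerce result 'string))))

;; (lisp-symbol-to-js-name 'foo-bar-baz) ⇒ "fooBarBaz"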
- add a little sample implementation of JSONP
This, in my best judgement, is not the most urgent priority, and can be easily added after improving the encoder (e.g. in the way that Hans argues for).
Sincerely, - B.Sm.
Now, as to what Hans has proposed regarding the encoder. I completely side with his idea of a streaming API. Currently, the only possible means of customization is to add methods to the ENCODE-JSON generic function. This is, obviously, not very good, because (a) the GF eventually gets cluttered by a multitude of obscure heterogeneous methods; (b) it is not possible to have different encoding strategies for one and the same data type; (c) the library really adds no value, as the technicalities of JSON syntax still have to be dealt with inside the new method.
I have to admit I don't quite understand Henrik's objections against the proposal. As far as I see, there is no conflict at all between that and the current simple API, as the latter can be absolutely gracefully reimplemented on top of the former. E.g., instead of
(defmethod encode-json ((s sequence) stream)
  (with-sequence-iterator (generator s)
    (write-json-array generator stream)))
we could write something like
(defmethod encode-json ((s sequence) stream)
  (with-json-array (:stream stream)
    (map nil #'encode-array-element s)))
The user of the library is, of course, free to write his own encoder specialized for his own classes, as nothing prevents him from calling encode-json in his default method, e.g.:
(defmethod my-own-encode-json ((obj my-class) stream)
  (with-json-object (:stream stream)
    (with-object-element ("initial")
      (my-own-encode-json (get-value-for-initial-field obj) stream))
    (with-object-element ("subsequent")
      (my-own-encode-json (get-value-for-subsequent-field obj) stream))))

(defmethod my-own-encode-json ((value t) stream)
  (encode-json value stream))
CL-JSON can very well provide some syntactic sugar for objects, along the lines of:
(defmacro with-json-object-for-fields ((field (&rest fields) &key stream)
                                       &body body)
  `(with-json-object (:stream ,stream)
     ,@(loop for (name getter) in fields
             collect `(with-object-element (,name)
                        (let ((,field ,getter))
                          ,@body)))))
This allows us to rewrite the method above as
(defmethod my-own-encode-json ((obj my-class) stream)
  (with-json-object-for-fields (field (("initial" (get-value-for-initial-field obj))
                                       ("subsequent" (get-value-for-subsequent-field obj)))
                                      :stream stream)
    (my-own-encode-json field stream)))
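For clarity, that call should expand into:

(with-json-object (:stream stream)
  (with-object-element ("initial")
    (let ((field (get-value-for-initial-field obj)))
      (my-own-encode-json field stream)))
  (with-object-element ("subsequent")
    (let ((field (get-value-for-subsequent-field obj)))
      (my-own-encode-json field stream))))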
This kind of encoder transparency shall make for a natural and useful counterpart to decoder customizability.
Sincerely, - B. Smilga.
P.S. I was thinking about a customizable encoder myself, but only came up with a very clumsy idea of exporting WRITE-JSON-OBJECT and WRITE-JSON-ARRAY, and having their GENERATOR-FN arguments return callbacks where customized encoding is needed. We would have
(defmethod encode-json ((fn function) stream)
  (funcall fn stream))
and then use that in the following manner:
(defmethod my-own-encode-json ((obj my-class) stream)
  (write-json-object (make-my-class-iterator obj) stream))

(defun make-my-class-iterator (obj)
  (let ((state :initial))
    (lambda ()
      (ecase state
        (:initial
         (setq state :subsequent)
         (values t :initial-field
                 (lambda (stream)  ; a callback
                   (my-own-encode-json (get-value-for-initial-field obj)
                                       stream))))
        (:subsequent
         (setq state :final)
         (values t :subsequent-field
                 (lambda (stream)  ; another one
                   (my-own-encode-json (get-value-for-subsequent-field obj)
                                       stream))))
        (:final nil)))))
Compared to Hans's approach, mine is just an insult to good taste and common sense.
On Wed, Aug 20, 2008 at 2:57 AM, Boris Smilga <boris.smilga@gmail.com> wrote:
Now, as to what Hans has proposed regarding the encoder. I completely side with his idea of a streaming API.
I have to admit I don't quite understand Henrik's objections against the proposal.
I really don't have any objections against it, I think it is good. The only thing is that I also want to preserve the old way, since it suits me fine.
As far as I see, there is no conflict at all between that and the current simple API, as the latter can be absolutely gracefully reimplemented on top of the former.
Great, thanks for clarifying that.
/Henrik