Hi,
I've only just started using cl-json, but kanji don't seem to be encoded properly, using the darcs version. The problem can be resolved by changing the following conditional (see diff below), so I'm guessing that's a bug?
Thanks for making cl-json available btw.
Cheers,
Chris Laux
105c105
< ((> code #x1f)
---
> ((<= code #x1f)
Hello, thanks for reporting the problem. My ambition with cl-json is that it should be completely trustworthy and fully tested. It has unit tests based on the official test case data for JSON, plus lots of other test cases.
However, in your case I am not sure I get the problem and the fix:
On 10/23/07, Chris Laux <chris@terraminds.com> wrote:
> but kanji don't seem to be encoded properly, using the darcs version. The problem can be resolved by changing the following conditional (see diff below), so I'm guessing that's a bug?
> < ((> code #x1f)
> ---
> > ((<= code #x1f)
Do you want to change write-json-chars so that every character with a code *above* #x1f is written as a numeric Unicode escape? If you are having some sort of Unicode decoding problem, I can see that this would indeed make it go away, but it is not a proper solution. It would mean that kanji are being mishandled somewhere else in your environment (your Lisp, your web server, or whatever), and escaping every character numerically is just a quick fix. It is also not how JSON is intended to look, even though it is strictly valid. Or did I completely misunderstand?
Also, can you elaborate on the problem? Is it possible to state it in a way that I can make a test case from (expected input > do something > expected output)?
Also, what is your environment? Which Lisp, OS and so on?
Can your Lisp handle Unicode? That is, I guess, an implicit requirement for cl-json; maybe I should make that clearer.
Thanks again, I would really like to help you sort this out.
/Henrik Hjelte
This is how the code looks now (and how it should look, in my opinion). Characters with codes up to and including #x1f are encoded as hex Unicode escapes; characters above that are written as they are (write-char), since we trust that your Lisp system uses Unicode. Some special characters are written in other ways, for example newlines as \n.
(defun write-json-chars (s stream)
  (declare (inline lisp-special-char-to-json))
  (loop for ch across s
        for code = (char-code ch)
        for special = (lisp-special-char-to-json ch)
        do (cond
             ((and special (not (char= special #\/)))
              (write-char #\\ stream)
              (write-char special stream))
             ((<= code #x1f)
              (format stream "\\u~4,'0x" code))
             (t (write-char ch stream)))))
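The behaviour described above can be tried in isolation with a minimal, self-contained sketch. The names sketch-special-char-to-json, sketch-write-json-chars and sketch-json-chars-to-string are hypothetical stand-ins, not cl-json's own definitions. The table of short escapes is an assumption about what lisp-special-char-to-json returns, and the character codes assume an ASCII-compatible Lisp.

```lisp
;; Hypothetical stand-in for cl-json's lisp-special-char-to-json (assumed
;; behaviour): return the letter that follows the backslash in a short JSON
;; escape, or NIL when the character has no short escape.
;; Character codes assume an ASCII-compatible implementation.
(defun sketch-special-char-to-json (ch)
  (case (char-code ch)
    (34 #\")   ; double quote
    (92 #\\)   ; backslash
    (8  #\b)   ; backspace
    (12 #\f)   ; form feed
    (10 #\n)   ; line feed
    (13 #\r)   ; carriage return
    (9  #\t)   ; tab
    (t nil)))

(defun sketch-write-json-chars (s stream)
  (loop for ch across s
        for code = (char-code ch)
        for special = (sketch-special-char-to-json ch)
        do (cond
             ;; Characters with a short escape are written as backslash + letter.
             (special
              (write-char #\\ stream)
              (write-char special stream))
             ;; Remaining control characters become \uXXXX.
             ((<= code #x1f)
              (format stream "\\u~4,'0x" code))
             ;; Everything else, kanji included, is written unchanged.
             (t (write-char ch stream)))))

;; Convenience wrapper (also hypothetical) that collects the output in a string.
(defun sketch-json-chars-to-string (s)
  (with-output-to-string (out)
    (sketch-write-json-chars s out)))
```

For example, (sketch-json-chars-to-string (string (code-char #x5927))) returns the single kanji 大 unchanged, while a string holding only (code-char 1) comes back as the six characters \u0001.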
OK, sorry about that. I was too tired to be working and should have slept on that email. You're right, of course: it is the low codes that need to be escaped. I saw other JSON APIs escaping all the "high" Unicode characters (e.g. kanji) and thought it was part of the standard.
As I don't develop JavaScript myself and only offer an API with JSON output, I don't really know if there's going to be a problem with unescaped Unicode. Probably not.
Thanks,
Chris
If there does turn out to be a problem, which I don't think there will be, we could add an option to encode high Unicode characters as hex escapes.
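Such an option might look roughly like the sketch below. The function names and the :escape-non-ascii keyword are hypothetical and not part of cl-json. Characters above #xFFFF are written as UTF-16 surrogate pairs, since a single \u escape carries only four hex digits. For brevity, only the double quote and the backslash get short escapes here; other control characters fall through to \uXXXX, which is still valid JSON.

```lisp
;; Hypothetical helper: write CODE as one \uXXXX escape, or as a UTF-16
;; surrogate pair when CODE lies above #xFFFF.
(defun sketch-write-escaped-char (code stream)
  (if (<= code #xffff)
      (format stream "\\u~4,'0x" code)
      (let ((v (- code #x10000)))
        (format stream "\\u~4,'0x\\u~4,'0x"
                (+ #xd800 (ash v -10))          ; high surrogate
                (+ #xdc00 (logand v #x3ff)))))) ; low surrogate

;; Hypothetical variant of write-json-chars with an opt-in keyword.
;; When ESCAPE-NON-ASCII is true, every character above #x7e is escaped.
(defun sketch-write-json-chars-escaped (s stream &key escape-non-ascii)
  (loop for ch across s
        for code = (char-code ch)
        do (cond
             ((char= ch #\") (write-string "\\\"" stream))
             ((char= ch #\\) (write-string "\\\\" stream))
             ((or (<= code #x1f)
                  (and escape-non-ascii (> code #x7e)))
              (sketch-write-escaped-char code stream))
             (t (write-char ch stream)))))
```

With :escape-non-ascii t, the kanji 大 (code #x5927) is written as \u5927; without the keyword it is written unchanged, which matches the current behaviour.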
/Henrik