[postmodern-devel] character encoding - postmodern-devel - mailman3.common-lisp.net

newer
[postmodern-devel] Create-table...

[postmodern-devel] character encoding

older
[postmodern-devel] Release 1.13

Roman Budzianowski

9 Sep 2008 9 Sep '08

11:33 p.m.

I am trying to retrieve records from a db with SQL_ASCII. Any advice on how to deal with it? I don't see a way to change the encoding for a connection. Invalid byte at start of character: 0x99 [Condition of type TRIVIAL-UTF-8:UTF-8-DECODING-ERROR] I don't think I can change the encoding of the db. Roman

Reply

Sign in to reply online Use email software

Show replies by date

Roman Budzianowski

10 Sep 10 Sep

5:18 a.m.

Sorry, I misunderstood the design. CL-USER> (pomo:query "SHOW client_encoding") (("UNICODE")) It should work, so I why am I getting this error? On Sep 9, 2008, at 4:33 PM, Roman Budzianowski wrote:

I am trying to retrieve records from a db with SQL_ASCII. Any advice on how to deal with it? I don't see a way to change the encoding for a connection.

Invalid byte at start of character: 0x99 [Condition of type TRIVIAL-UTF-8:UTF-8-DECODING-ERROR]

I don't think I can change the encoding of the db.

Roman

_______________________________________________ postmodern-devel mailing list postmodern-devel@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/postmodern-devel

Reply

Sign in to reply online Use email software

Marijn Haverbeke

7:40 a.m.

Hello Roman,

Invalid byte at start of character: 0x99 [Condition of type TRIVIAL-UTF-8:UTF-8-DECODING-ERROR]

This seems to suggest that the text in the record is not actually ASCII, but some other 8-bit encoding. 0x7f is the highest character code ASCII text can contain. What does 'psql -l' show? Actually, the way I understood it is that if you set a UNICODE/UTF-8 encoding for the connection, which Postmodern does, it should translate any text into that encoding before sending it... but that would mean the error you're seeing could never occur. I've never seen it happen before... switching your database to UNICODE encoding is bound to fix it, or you can try setting Postmodern to use 8-bit encoding (with the 0x7f-0xff characters being interpreted in whichever way your Lisp implementation feels like, most likely latin-1) by changing the *unicode* parameter in the cl-postgres.asd file, and re-compiling the library. Best, Marijn

Reply

Sign in to reply online Use email software

Roman Budzianowski

3:46 p.m.

Marijn, I vaguely remember that I ran into it before, but since I was creating a new db I just switched to Unicode on the postgres side. I tried the same query in psql after setting the encoding to Unicode: NCS=# \encoding SQL_ASCII NCS=# \encoding UNICODE NCS=# \encoding UTF8 The query worked. Not sure how much of a proof it is, but it seems to imply that the problem might be on the lisp side. E.g. a byte might be skipped or misinterpreted. Roman On Sep 10, 2008, at 12:40 AM, Marijn Haverbeke wrote:

Hello Roman,

...
Invalid byte at start of character: 0x99 [Condition of type TRIVIAL-UTF-8:UTF-8-DECODING-ERROR]

This seems to suggest that the text in the record is not actually ASCII, but some other 8-bit encoding. 0x7f is the highest character code ASCII text can contain. What does 'psql -l' show? Actually, the way I understood it is that if you set a UNICODE/UTF-8 encoding for the connection, which Postmodern does, it should translate any text into that encoding before sending it... but that would mean the error you're seeing could never occur. I've never seen it happen before... switching your database to UNICODE encoding is bound to fix it, or you can try setting Postmodern to use 8-bit encoding (with the 0x7f-0xff characters being interpreted in whichever way your Lisp implementation feels like, most likely latin-1) by changing the *unicode* parameter in the cl-postgres.asd file, and re-compiling the library.

Best, Marijn _______________________________________________ postmodern-devel mailing list postmodern-devel@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/postmodern-devel

Reply

Sign in to reply online Use email software

Marijn Haverbeke

3:51 p.m.

Hey,

it seems to imply that the problem might be on the lisp side. E.g. a byte might be skipped or misinterpreted.

Could be, but I can't imagine where. It would be useful if you could send me (some subset of) the DB that causes the error + the query you were trying, so I can try to reproduce the problem. Best, Marijn

Reply

Sign in to reply online Use email software

Roman Budzianowski

4:40 p.m.

Marijn, I can retrieve some records. I am attaching a file with a record that failed in lisp but was retrieved fine in psql - no guarantee of course that the bytes are exactly the same. It failed in lisp like so: Invalid byte at start of character: 0xae [Condition of type TRIVIAL-UTF-8:UTF-8-DECODING-ERROR] Restarts: 0: [ABORT] Return to SLIME's top level. 1: [ABORT] Abort entirely from this (lisp) process. Backtrace: 0: (SWANK:SWANK-DEBUGGER-HOOK #<TRIVIAL-UTF-8:UTF-8-DECODING-ERROR @ #x719c5ec2> #<Function SWANK-DEBUGGER-HOOK>) 1: (ERROR TRIVIAL-UTF-8:UTF-8-DECODING-ERROR :BYTE 174 :MESSAGE "Invalid byte at start of character: 0x~X") Locals: EXCL::DATUM = TRIVIAL-UTF-8:UTF-8-DECODING-ERROR EXCL::ARGUMENTS = (:BYTE 174 :MESSAGE "Invalid byte at start of character: 0x~X") POSTMODERN::STATEMENT-ID = :STATEMENT-1 #:G1332 = WEB-SOURCE-ITEMS PQ-RDB::ARGS = (25) #:G1333 = WEB-SOURCE-ITEMS-SELECT-BY-ID PQ-RDB::SELECT-QUERY = "(SELECT * FROM web_source_items WHERE (id = $1))" ID = NIL POSTMODERN::PARAMS = (22225) EXCL::LOCAL-0 = (:BYTE 174 :MESSAGE "Invalid byte at start of character: 0x~X") EXCL::LOCAL-1 = TRIVIAL-UTF-8:UTF-8-DECODING-ERROR EXCL::LOCAL-2 = #<TRIVIAL-UTF-8:UTF-8-DECODING-ERROR @ #x719c5ec2> EXCL::LOCAL-3 = 97 EXCL::LOCAL-4 = T EXCL::LOCAL-5 = 0 EXCL::LOCAL-6 = 8 :UNKNOWN = NIL :UNKNOWN = NIL :UNKNOWN = NIL 2: (TRIVIAL-UTF-8:UTF-8-GROUP-SIZE 174) Locals: BYTE = 174 POSTMODERN::STATEMENT-ID = :STATEMENT-1 #:G1332 = WEB-SOURCE-ITEMS PQ-RDB::ARGS = (25) #:G1333 = WEB-SOURCE-ITEMS-SELECT-BY-ID PQ-RDB::SELECT-QUERY = "(SELECT * FROM web_source_items WHERE (id = $1))" ID = NIL POSTMODERN::PARAMS = (22225) EXCL::LOCAL-0 = 174 3: (TRIVIAL-UTF-8:READ-UTF-8-STRING #<MULTIVALENT #1=stream socket #1#connected from #2=localhost/33896 to #2#/5435 @ #x719a7dba> :BYTE- LENGTH 1353) 4: ((:INTERNAL (:TOP-LEVEL-FORM "interpret.lisp" 4106) 0) #<MULTIVALENT #1=stream socket #1#connected from #2=localhost/33896 to #2#/5435 @ #x719a7dba> 1353) 5: ((FLET POSTMODERN::SYMBOL-ALIST-ROW-READER CL-POSTGRES:NEXT- FIELD) #<CL-POSTGRES::FIELD-DESCRIPTION @ #x719c4542>) 6: (POSTMODERN::SYMBOL-ALIST-ROW-READER #<MULTIVALENT #1=stream socket #1#connected from #2=localhost/33896 to #2#/5435 @ #x719a7dba> #(#<CL-POSTGRES::FIELD-DESCRIPTION @ #x719c370a> #<CL-POSTGRES::FIELD- DESCRIPTION @ #x719c3912> #<CL-POSTGRES::FIELD-DESCRIPTION @ #x719c3b1a> #<CL-POSTGRES::FIELD-DESCRIPTION @ #x719c3d22> #<CL- POSTGRES::FIELD-DESCRIPTION @ #x719c3f2a> #<CL-POSTGRES::FIELD- DESCRIPTION @ #x719c4132> ...)) 7: (CL-POSTGRES::SEND-EXECUTE #<MULTIVALENT #1=stream socket #1#connected from #2=localhost/33896 to #2#/5435 @ #x719a7dba> "STATEMENT-1" (22225) POSTMODERN::SYMBOL-ALIST-ROW-READER) 8: ((LABELS CL-POSTGRES:EXEC-PREPARED #:G3648)) 9: (CL-POSTGRES:EXEC-PREPARED #<CL-POSTGRES:DATABASE-CONNECTION @ #x719a7a9a> "STATEMENT-1" (22225) POSTMODERN::SYMBOL-ALIST-ROW-READER) 10: ((:INTERNAL WEB-SOURCE-ITEMS-SELECT-BY-ID) 22225) 11: (LET* ((PQ-RDB::SELECT-QUERY (GET # #)) (PQ-RDB::SELECT-FUNCTION (OR # #))) (APPLY PQ-RDB::SELECT-FUNCTION PQ-RDB::ARGS)) 12: (WEB-SOURCE-ITEMS-SELECT-BY-ID 22225) On Sep 10, 2008, at 8:51 AM, Marijn Haverbeke wrote:

Hey,

...
it seems to imply that the problem might be on the lisp side. E.g. a byte might be skipped or misinterpreted.

Could be, but I can't imagine where. It would be useful if you could send me (some subset of) the DB that causes the error + the query you were trying, so I can try to reproduce the problem.

Best, Marijn _______________________________________________ postmodern-devel mailing list postmodern-devel@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/postmodern-devel

Reply

Sign in to reply online Use email software

Marijn Haverbeke

4:54 p.m.

Hey Roman, There are indeed only ascii characters in that record. Anyway, I can't reproduce the problem with this information: I still don't know what your table looks like, and which encoding the database uses... if the database isn't too huge, could you try dumping it, importing it into a new database with the same encoding, testing whether the problem still occurs, and if yes, delete most of the rows, dump again, and send to me? Best, Marijn

Reply

Sign in to reply online Use email software

Roman Budzianowski

11 Sep 11 Sep

6:18 p.m.

Marijn, I tried setting cl-postgres-system:*unicode* to nil and reloading. It works! I notice a non-ascii char: the registered sign. The database is in SQL_ASCII encoding, but it contains non ascii chars. Somehow psql in unicode mode deals with it, but trivial-utf-8 doesn't. So, it doesn't have anything to do with postmodern. However, maybe it would be useful to be able to set encoding mode per connection? Any other suggestions? It seems that this is a not uncommon case. Roman CL-USER> (pomo:query "SELECT item_description FROM web_source_items WHERE (id = 22225)") (("<br>The Aeroneb® Go Micropump Nebulizer is a fast, efficient, simple-to-use device developed for the millions of patients worldwide who require respiratory therapy in and away from home, and eliminates many of the problems associated with current methods of ...... and water, and disinfected (if desired).")) On Sep 10, 2008, at 9:54 AM, Marijn Haverbeke wrote:

Hey Roman,

There are indeed only ascii characters in that record. Anyway, I can't reproduce the problem with this information: I still don't know what your table looks like, and which encoding the database uses... if the database isn't too huge, could you try dumping it, importing it into a new database with the same encoding, testing whether the problem still occurs, and if yes, delete most of the rows, dump again, and send to me?

Best, Marijn _______________________________________________ postmodern-devel mailing list postmodern-devel@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/postmodern-devel

Reply

Sign in to reply online Use email software

Marijn Haverbeke

6:45 p.m.

Hey Roman,

I notice a non-ascii char: the registered sign. The database is in SQL_ASCII encoding, but it contains non ascii chars.

Ah, I somehow missed the (R). This seems like rather awful behaviour on the part of postgres -- there's no (R) in ASCII, and sending it in latin encoding over a utf-8 encoded connection is just wrong. But yes, it would be nice if Postmodern didn't choke on it. However, I have no idea how to distinguish cases like this from genuine utf-8 without completely killing performance.

So, it doesn't have anything to do with postmodern. However, maybe it would be useful to be able to set encoding mode per connection? Any other suggestions?

That would probably be nice. The current character-encoding approach is rather ad-hoc -- babel didn't exist yet at the time, and flexi-streams was so slow it wasn't even an option, but nowadays it should be possible to use babel to decode any format the server is likely to throw at us. Unfortunately, the current behaviour is not wrong enough to distress me to the point of making it a priority to fix it, and I have enough stuff on my hands. If anyone feels up to it, I'd be happy to review and incorporate a patch, of course. Best, Marijn

Reply

Sign in to reply online Use email software

6136

Age (days ago)

6138

Last active (days ago)

Download

8 comments

2 participants

tags

participants (2)

Marijn Haverbeke
Roman Budzianowski