Hello,
I just stumbled upon an encoding problem with the authorization header. It cannot really handle UTF-8 encoded user names (or passwords), because base64:base64-string-to-string does not respect the encoding in use.
In my local instance, I fixed this by changing hunchentoot's authorization function as follows:
(defun authorization (&optional (request *request*))
  "Returns as two values the user and password (if any) as encoded in
the 'AUTHORIZATION' header. Returns NIL if there is no such header."
  (let* ((authorization (header-in :authorization request))
         (start (and authorization
                     (> (length authorization) 5)
                     (string-equal "Basic" authorization :end2 5)
                     (scan "\\S" authorization :start 5))))
    (when start
      ;; Decode the base64 payload to octets first, then decode the
      ;; octets to a string using the configured external format.
      (let* ((auth-octets (base64:base64-string-to-usb8-array
                           (subseq authorization start)))
             (auth (octets-to-string auth-octets
                                     :external-format *hunchentoot-default-external-format*)))
        (destructuring-bind (&optional user password)
            (split ":" auth)
          (values user password))))))
Or as patch:

286,288c286,293
<       (destructuring-bind (&optional user password)
<           (split ":" (base64:base64-string-to-string (subseq authorization start)))
<         (values user password)))))
---
>       (let* ((auth-octets (base64:base64-string-to-usb8-array (subseq authorization start)))
>              (auth (octets-to-string auth-octets :external-format
>                                      *hunchentoot-default-external-format*)))
>         (destructuring-bind (&optional user password)
>             (split ":" auth)
>           (values user password))))))
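The effect of the change can be sketched outside of Hunchentoot. The following is a minimal Python illustration, not the Hunchentoot code: the function name `parse_basic_auth` and the sample credentials are made up for this example. It decodes the base64 payload to octets first and only then applies a character encoding, which is the step the patch adds. Unlike the Lisp code, it splits on the first colon only, so that a password containing a colon stays intact.

```python
import base64


def parse_basic_auth(header_value, encoding="utf-8"):
    """Return (user, password) from a Basic Authorization header value.

    Returns (None, None) if the value is not a Basic credential.
    """
    scheme, _, payload = header_value.partition(" ")
    if scheme.lower() != "basic" or not payload.strip():
        return None, None
    # Step 1: base64 -> raw octets (no character decoding yet).
    raw = base64.b64decode(payload.strip())
    # Step 2: octets -> string, using an explicitly chosen encoding.
    text = raw.decode(encoding)
    user, sep, password = text.partition(":")
    return user, (password if sep else None)


# A client that (in violation of the RFCs) sends UTF-8 credentials.
header = "Basic " + base64.b64encode("jürgen:geheim".encode("utf-8")).decode("ascii")

print(parse_basic_auth(header))                      # decoded as UTF-8
print(parse_basic_auth(header, encoding="latin-1"))  # same octets read as Latin-1
```

Reading the same octets as Latin-1 yields a mangled user name, which is the symptom described above.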
Regards, Christian
On Mon, 12 May 2008 11:48:14 +0200, Christian Haselbach ch@mr-co.de wrote:
> I just stumbled upon an encoding problem with the authorization header. It cannot really handle UTF-8 encoded user names (or passwords), because base64:base64-string-to-string does not respect the used encoding.
> In my local instance, I fixed this by changing hunchentoot's authorization function as follows:
My understanding of the standard is that what you are doing is wrong. Basic authentication is described in RFC 2617 where it says that the "userid" and "password" parts must be "TEXT" and "TEXT" is defined in RFC 2616 as "any octet", /but/
"words of *TEXT may contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047."
So, you'd have to add support for proper RFC 2047 parsing to make the function deal with this correctly. (And I'd say that that would probably be worth its own library, maybe based on FLEXI-STREAMS.)
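For readers unfamiliar with RFC 2047: it wraps non-ASCII text in ASCII-only "encoded words" of the form `=?charset?encoding?encoded-text?=`, where the encoding is `B` (base64) or `Q` (a quoted-printable variant). The sketch below uses Python's standard `email.header` module purely to show the format; the helper names `encode_rfc2047` and `decode_rfc2047` are made up for this example and are unrelated to any Lisp library.

```python
from email.header import Header, decode_header


def encode_rfc2047(text, charset="utf-8"):
    """Encode text as RFC 2047 encoded word(s)."""
    return Header(text, charset=charset).encode()


def decode_rfc2047(value):
    """Decode a string that may contain RFC 2047 encoded words."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words come back as octets plus their declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Plain text without encoded words comes back as str.
            parts.append(chunk)
    return "".join(parts)


encoded = encode_rfc2047("jürgen")
print(encoded)                  # an ASCII-only encoded word
print(decode_rfc2047(encoded))  # the original text again
```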
> Or as patch:
BTW, note that your email program rendered the patch unusable.
Thanks, Edi.
Edi Weitz wrote:
> My understanding of the standard is that what you are doing is wrong.
I guess you are right.
> So, you'd have to add support for proper RFC 2047 parsing to make the function deal with this correctly. (And I'd say that that would probably be worth its own library, maybe based on FLEXI-STREAMS.)
Well, yes, looks like it. It seems to be a bit overprotective to have something base64 encoded and then again encoded inside, which might again use base64.
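The nesting in question can be made concrete with a small Python sketch. It assumes that the user name and the password are each sent as a single `B`-encoded word in UTF-8; the helper names are hypothetical, and RFC 2047's length limit of 75 characters per encoded word is ignored here.

```python
import base64
from email.header import decode_header


def b_word(text):
    """Inner layer: one RFC 2047 encoded word using the B (base64) encoding."""
    return "=?UTF-8?B?" + base64.b64encode(text.encode("utf-8")).decode("ascii") + "?="


def decode_word(word):
    """Decode a single RFC 2047 encoded word back to text."""
    chunk, charset = decode_header(word)[0]
    return chunk.decode(charset) if isinstance(chunk, bytes) else chunk


def build_header(user, password):
    """Outer layer: base64 over 'user:password', as Basic authentication requires."""
    inner = b_word(user) + ":" + b_word(password)
    return "Basic " + base64.b64encode(inner.encode("ascii")).decode("ascii")


def parse_header(value):
    """Undo both layers: outer base64 first, then the encoded words."""
    inner = base64.b64decode(value[len("Basic "):]).decode("ascii")
    user, _, password = inner.partition(":")
    return decode_word(user), decode_word(password)


header = build_header("jürgen", "pässwort")
print(header)                # base64 that itself contains base64
print(parse_header(header))  # both layers removed
```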
Regards, Christian
On Wed, 14 May 2008 21:10:26 +0200, Christian Haselbach ch@mr-co.de wrote:
>> So, you'd have to add support for proper RFC 2047 parsing to make the function deal with this correctly. (And I'd say that that would probably be worth its own library, maybe based on FLEXI-STREAMS.)
> Well, yes, looks like it. It seems to be a bit overprotective to have something base64 encoded and then again encoded inside, which might again use base64.
And, BTW, it probably won't buy you much unless the main browsers play along. I just tried with Firefox 2.0 and if you enter characters as login or password which aren't in Latin-1, Firefox simply sends the lower octet of each character's code point. Not very helpful...
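The behaviour described here can be modelled in a few lines of Python. This is only a model of the observation above (keep the low octet of each code point), not Firefox's actual code, and the function name `low_octets` is made up for the example.

```python
def low_octets(text):
    """Keep only the lowest 8 bits of each character's code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


# Characters within Latin-1 survive, because their code point fits in one octet.
print(low_octets("é") == "é".encode("latin-1"))

# Characters outside Latin-1 are silently turned into something else:
# U+0141 (Ł) has low octet 0x41, which is "A".
print(low_octets("Łukasz"))
print(low_octets("Łukasz") == low_octets("Aukasz"))
```

Because the high bits are discarded, the server cannot recover the original characters, and distinct logins can collide.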
Edi Weitz wrote:
> And, BTW, it probably won't buy you much unless the main browsers play along. I just tried with Firefox 2.0 and if you enter characters as login or password which aren't in Latin-1, Firefox simply sends the lower octet of each character's code point. Not very helpful...
That's no big problem (at this point), because the header is written by the JavaScript client. That's also the reason it works for me with the patch I sent. But, as you pointed out, it only works in violation of the standards. So I just have to fix this in the JavaScript client, which I have control over.
Regards, Christian
Edi Weitz wrote:
> So, you'd have to add support for proper RFC 2047 parsing to make the function deal with this correctly. (And I'd say that that would probably be worth its own library, maybe based on FLEXI-STREAMS.)
I tried it and called it cl-rfc2047; it can be found at http://mr-co.de/projects/cl-rfc2047/
Though I guess you do not want to add such a young library from someone with limited Common Lisp experience to the project's dependencies.
> BTW, note that your email program rendered the patch unusable.
Thanks for the hint. This time I attached the patch.
Regards, Christian
On Mon, 19 May 2008 21:43:24 +0200, Christian Haselbach ch@mr-co.de wrote:
> I tried it and called it cl-rfc2047; it can be found at http://mr-co.de/projects/cl-rfc2047/
I haven't looked at everything in detail, but it looks good at first sight.
One comment: I'd make the two regular expression variables constants and I'd replace the expression
  (string+ *decoded-line-regexp* "(" *crlfsp* *decoded-line-regexp* ")*")
with
  #.(string+ +decoded-line-regexp+ "(" +crlfsp+ +decoded-line-regexp+ ")*")
(You'll have to wrap the definition of STRING+ with EVAL-WHEN if you want to keep everything in one file.)
> Though, I guess you do not want to add such a young library from someone with limited common lisp experience to the project's dependencies.
As I said in a previous reply, even Mozilla (which I think is one of the more standards-compliant browsers) doesn't seem to use RFC 2047, so it probably doesn't make much sense to make this an integral part of Hunchentoot (not to mention that it's easy to add if you need it), but I'll add a link to the website (and to Drakma's as well).
Thanks, Edi.
Edi Weitz wrote:
> As I said in a previous reply, even Mozilla (which I think is one of the more standards-compliant browsers) doesn't seem to use RFC 2047, so it probably doesn't make much sense to make this an integral part of Hunchentoot (not to mention that it's easy to add if you need it), but I'll add a link to the website (and to Drakma's as well).
That's true. Thanks for the tips and the links.
I just rechecked how Twitter does it for their REST API. They seem not to allow non-ASCII characters.
Regards, Christian
Leslie P. Polzer wrote:
> I like the approach of having a separate login name and nickname (the SimpleMachines Forum does this, for example).
We are getting a bit OT here. The display name + login name approach is common and OK for things like Jabber, where the client stores this information for you. But it is yet another detail the user has to remember. A lot of applications use the e-mail address as login, which is quite sensible. But this also means that the application knows your e-mail address, which is not necessarily what you want. In short: I'd like to give the user as much choice as possible and at most as much complexity as needed.
Anyway, I've now also got a patch for jQuery to send an RFC 2047 compliant auth header. :) I'll publish it soon on the cl-rfc2047 page.
Regards, Christian