Hi,
On Tue, 20 Nov 2007 09:31:01 -0500, "Kyle R. Burton" kyle.burton@gmail.com wrote:
I'm not sure if there is a mailing list that would be more appropriate.
Really?
http://weitz.de/hunchentoot/#mail
I'd appreciate if we could continue this discussion on the list, see Cc. (You have to subscribe first, of course.)
I'd like to start by saying thanks for Hunchentoot! (and all the other CL libraries you've developed).
You're welcome... :)
I am just starting with Hunchentoot, but I noticed that the query string parser supports the ampersand (&) as a pair separator, and I remembered the semi-colon (;) also being a valid separator for query strings. I only remember this from having worked with Perl's CGI. A bit of quick searching on the net led me to:
http://en.wikipedia.org/wiki/Query_string
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2
to confirm that the semi-colon is mentioned as a separator.
I have to admit that this is the first time I've heard of this recommendation. A quick test with the vastly popular PHP shows that they don't implement it either:
<? print_r($_GET); ?>
Try these:
http://zappa.agharta.de/test.html?foo=bar&baz=quux
http://zappa.agharta.de/test.html?foo=bar;baz=quux
So, to everyone on the list - what's your experience with semicolons as query string separators? Is this normal or esoteric? And, even more interesting, does any client out there actually use this convention?
I looked into the sources, and I think the change is minor: in the :after method of initialize-instance, just update the split from:
(form-url-encoded-list-to-alist (split "&" query-string))
to
(form-url-encoded-list-to-alist (split "[;&]" query-string))
A quick empirical test with a browser shows symptoms of working:
/lisp/test?foo=bar&qux=baz&blarf=qbozzle
/lisp/test?foo=bar;qux=baz;blarf=qbozzle
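For readers who don't read Lisp, here is a rough Python sketch of what the proposed (split "[;&]" query-string) change amounts to: split on either character, then decode each name/value pair. It is not Hunchentoot's code. `parse_query` is a hypothetical name, and skipping empty chunks is an assumption made here about how stray separators should be handled.

```python
import re
from urllib.parse import unquote_plus


def parse_query(query_string):
    """Split on '&' or ';' (the proposed "[;&]" regex), then decode each pair."""
    pairs = []
    for chunk in re.split(r"[;&]", query_string):
        if not chunk:
            continue  # assumption: ignore empty chunks such as in "a=1&&b=2"
        name, _, value = chunk.partition("=")
        # '+' becomes a space and %HH escapes are decoded
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(parse_query("foo=bar&qux=baz&blarf=qbozzle"))
print(parse_query("foo=bar;qux=baz;blarf=qbozzle"))
```

Both calls print the same list, [('foo', 'bar'), ('qux', 'baz'), ('blarf', 'qbozzle')], which is the behaviour the browser test above is checking for.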
I'm new to CL so I'm not entirely confident of my suggestion...
Seems OK, but it might break existing applications.
Also, is there a place in the source tree for any kind of unit tests or assertions? I'd have added one for this test, but I didn't recognize where it would have been appropriate to put it.
No, only the demo in the "test/" directory.
Is this something you'd consider enhancing Hunchentoot with?
Lots of stuff...
Cheers, Edi.
On Nov 22, 2007 9:30 AM, Edi Weitz edi@agharta.de wrote:
So, to everyone on the list - what's your experience with semicolons as query string separators? Is this normal or esoteric? And, even more interesting, does any client out there actually use this convention?
Esoteric these days; you still occasionally see it on the websites of newspapers using a certain CMS. I suspect it exists because of HTML's special treatment of the ampersand character; modern browsers are good at understanding ampersands used in HTTP requests, but I can imagine they were problematic in the past.
I can't imagine any client does; not all that much server software supports it. It might be worth supporting it from a compliance point of view, but there may be a potential for breaking things; I don't think most clients will bother encoding semicolons, so things which previously worked might break. (Say, http://localhost/?phrase=this+is+a+test;+it+contains+a+semicolon )
Rob.
On Thu, 22 Nov 2007 13:50:25 +0000, "Robert Synnott" rsynnott@gmail.com wrote:
but there may be a potential for breaking things; I don't think most clients will bother encoding semicolons, so things which previously worked might break. (Say, http://localhost/?phrase=this+is+a+test;+it+contains+a+semicolon )
Right, that's what I was fearing as well. If someone needs this desperately enough to send a patch, then it should be made configurable and the default should be that the semicolon as a separator is disabled.
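A minimal sketch of the configurable variant, in Python rather than Lisp: the set of separator characters is a parameter that defaults to '&' alone, so semicolons only split pairs when a caller opts in. `parse_query` and its `separators` parameter are hypothetical, not Hunchentoot API. Running it on Rob's example shows exactly the breakage he describes.

```python
import re
from urllib.parse import unquote_plus


def parse_query(query_string, separators="&"):
    """Parse a query string; every character in `separators` splits pairs.

    The default accepts only '&'; pass "&;" to opt in to semicolons.
    """
    pattern = "[" + re.escape(separators) + "]"  # character class of separators
    pairs = []
    for chunk in re.split(pattern, query_string):
        if not chunk:
            continue  # skip empty chunks
        name, _, value = chunk.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


q = "phrase=this+is+a+test;+it+contains+a+semicolon"
print(parse_query(q))
# [('phrase', 'this is a test; it contains a semicolon')]
print(parse_query(q, separators="&;"))
# [('phrase', 'this is a test'), (' it contains a semicolon', '')]
```

With the default, the unescaped semicolon stays inside the value; with semicolons enabled, the value is cut in half and the remainder turns into a bogus parameter name. For comparison, Python's own urllib.parse.parse_qsl made the same default choice in 3.9.2, splitting on '&' alone unless a different single `separator` is passed.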
Thanks, Edi.
Edi Weitz edi@agharta.de writes:
So, to everyone on the list - what's your experience with semicolons as query string separators?
I've used them a lot; they're my preferred method of separating query args.
Is this normal or esoteric?
Something less than normal and more than esoteric, methinks.
And, even more interesting, does any client out there actually use this convention?
The clients don't create URL args; those are generated by the site in question or by the user's typing.
For the same reason, I'm not certain that compatibility is a worry: if the client is requesting a URL the server previously served, surely that URL is A-OK.
On Sat, 2007-11-24 at 08:43 -0700, Robert Uhl wrote:
Edi Weitz edi@agharta.de writes:
So, to everyone on the list - what's your experience with semicolons as query string separators?
I've used them a lot; they're my preferred method of separating query args.
I personally never ever saw them used.
Is this normal or esoteric?
Something less than normal and more than esoteric, methinks.
I disagree - the RFC (http://www.w3.org/TR/html4/interact/forms.html#form-content-type) is actually rather clear about this. Section 17.13.4 "Form content types" reads as follows:
* application/x-www-form-urlencoded
This is the default content type. Forms submitted with this content type must be encoded as follows:
1. Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
2. The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
And that's it. Nowhere does it even mention the use of `;' as a separator of name/value pairs.
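The two encoding steps quoted above are mechanical enough to sketch. This Python version (`encode_component` and `encode_form` are hypothetical names) takes "non-alphanumeric" literally. In practice browsers leave a few punctuation characters such as '-', '_' and '.' unescaped, and the spec only speaks of ASCII codes, so encoding non-ASCII text as UTF-8 bytes is an assumption made here.

```python
def encode_component(text):
    """Escape a control name or value following the rules quoted above."""
    # Normalize line breaks to CR LF so they come out as %0D%0A.
    text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    out = []
    for ch in text:
        if ch == " ":
            out.append("+")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # Non-alphanumeric -> %HH (non-ASCII via UTF-8 bytes: assumption)
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
    return "".join(out)


def encode_form(controls):
    """Join name=value pairs with '&', keeping document order."""
    return "&".join(encode_component(n) + "=" + encode_component(v)
                    for n, v in controls)


print(encode_form([("foo", "a=1;b=2&c=3")]))
# foo=a%3D1%3Bb%3D2%26c%3D3
```

Note that a semicolon typed into a form field is escaped to %3B like any other punctuation, which is why form-generated query strings never contain a bare ';' separator.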
And, even more interesting, does any client out there actually use this convention?
The clients don't create URL args; those are generated by the site in question or by the user's typing.
??? I think you must misunderstand something here - of course the client application needs to construct a URL every time a user fills out an HTML form that uses the GET method. Please refer to section 17.13.3 "Processing form data" of the RFC.
For the same reason, I'm not certain that compatibility is a worry: if the client is requesting a URL the server previously served, surely that URL is A-OK.
GET urls are usually _not_ served by the server but generated by the client application. How should a server process the following request:
GET /foo/bar?name1=value1;&name2=value2
Do we expect 'name1' -> 'value1;' and 'name2' -> 'value2' or do we expect 'name1' -> 'value1' and ';name2' -> 'value2'????
Cheers, Ralf Mattes
Ralf Mattes rm@seid-online.de writes:
Is this normal or esoteric?
Something less than normal and more than esoteric, methinks.
I disagree - the RFC (http://www.w3.org/TR/html4/interact/forms.html#form-content-type) is actually rather clear about this. Section 17.13.4 "Form content types" reads as follows:
application/x-www-form-urlencoded This is the default content type. Forms submitted with this content type must be encoded as follows:
- Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
- The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
And that's it. Nowhere does it even mention the use of `;' as a separator of name/value pairs.
That's for forms. Not all URLs are created from forms; in fact, the vast majority of URLs are _not_ thus created. Most are simply href anchors.
And, even more interesting, does any client out there actually use this convention?
The clients don't create URL args; those are generated by the site in question or by the user's typing.
??? I think you must misunderstand something here - of course the client application needs to construct a URL every time a user fills out an HTML form that uses the GET method. Please refer to section 17.13.3 "Processing form data" of the RFC.
I'd forgotten about forms. I was thinking about the normal case for GET urls.
For the same reason, I'm not certain that compatibility is a worry: if the client is requesting a URL the server previously served, surely that URL is A-OK.
GET urls are usually _not_ served by the server but generated by the client application.
*boggle* Every single time you click a link, that's a GET request. The vast, vast, _vast_ majority of GET requests are not generated by forms but by following links.
How should a server process the following request:
GET /foo/bar?name1=value1;&name2=value2
Do we expect 'name1' -> 'value1;' and 'name2' -> 'value2' or do we expect 'name1' -> 'value1' and ';name2' -> 'value2'????
RFC 2396 states that semicolons are reserved, and that if they conflict with a reserved purpose then they must be escaped; further, it states that 'Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.' Thus those semicolons would have meaning within a query.
AFAIK, it's up to the server how to parse those characters, and thus Hunchentoot could be compliant and not parse them as most folks have come to expect.
Me, I would parse your example as 'name1' -> 'value1'; an empty pair; 'name2' -> 'value2'. That is, I see semicolon and ampersand as equivalent to one another.
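The two readings of Ralf's example can be made concrete with a small Python sketch that only splits the raw string and leaves decoding aside (`split_pairs` is a hypothetical helper). Splitting on '&' alone gives the first reading, with the semicolon kept in the value; treating ';' and '&' as equivalent gives the reading just described, where the empty string in the middle is the "empty pair".

```python
def split_pairs(query_string, separators):
    """Split a raw query string on every character in `separators`."""
    chunks = [query_string]
    for sep in separators:
        # Re-split every chunk produced so far on the next separator.
        chunks = [piece for chunk in chunks for piece in chunk.split(sep)]
    return chunks


q = "name1=value1;&name2=value2"
print(split_pairs(q, "&"))   # ['name1=value1;', 'name2=value2']
print(split_pairs(q, "&;"))  # ['name1=value1', '', 'name2=value2']
```

Neither splitting rule produces the ';name2' -> 'value2' reading; that would require splitting on '&' and then stripping the ';' from the end of one chunk instead of the start of the next.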
Tim Berners-Lee in http://www.w3.org/DesignIssues/MatrixURIs.html suggests a syntax for matrix URLs which uses semicolons. With a '?' prefixed it can be adapted to use the query syntax.
Also, semicolons do not need to be HTML-escaped; ampersands must be. Thus it's convenient for HTML-writers to avoid ampersands; otherwise URLs must be written as /foo/bar?name1=value1&amp;name2=value2 which gets very old very quickly.
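Python's standard html.escape shows the difference: it escapes '&' (along with '<', '>' and quotes) but leaves ';' untouched. The `href` helper below is hypothetical and only illustrates what an HTML author has to write in an attribute.

```python
import html


def href(url):
    """Return an <a> start tag with the URL HTML-escaped for an attribute."""
    return '<a href="%s">' % html.escape(url, quote=True)


print(href("/foo/bar?name1=value1&name2=value2"))
# <a href="/foo/bar?name1=value1&amp;name2=value2">
print(href("/foo/bar?name1=value1;name2=value2"))
# <a href="/foo/bar?name1=value1;name2=value2">
```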
Edi Weitz wrote on 22/11/2007 at 10:30:
So, to everyone on the list - what's your experience with semicolons as query string separators? Is this normal or esoteric? And, even more interesting, does any client out there actually use this convention?
I've seen them even recently, and they're quite useful, as said already (no escaping hell).
Seems OK, but it might break existing applications.
Any client-side application that doesn't escape them when sending them to a server is buggy, and so is any server-side application that parses the query string without also treating them as a separator.
If I have a form with a text field named foo and type "a=1;b=2&c=3" in it, my current browser, Firefox, will send the following query string:
foo=a%3D1%3Bb%3D2%26c%3D3
In a PHP program, then $_GET['foo'] would evaluate as "a=1;b=2&c=3".
If arg_separator.input in php.ini contains ";", then it would parse the following query string as defining two parameters, foo and b:
foo=a%3D1%3Bb%3D2%26c%3D3;b=bar
Note that the default value is "&" and here is the commented line in my default php.ini:
#arg_separator.input = ";&"
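A rough Python model of the behaviour described here, assuming only what is stated above: every character in the separator setting splits pairs, while separators that were escaped inside values survive. `parse_like_php` is a hypothetical helper and ignores other PHP details such as array-style names and name mangling.

```python
from urllib.parse import unquote_plus, urlencode


def parse_like_php(query_string, arg_separator_input=";&"):
    """Every character of arg_separator_input acts as a pair separator."""
    first = arg_separator_input[0]
    # Map all separator characters to the first one, then split once.
    table = str.maketrans({sep: first for sep in arg_separator_input})
    params = {}
    for chunk in query_string.translate(table).split(first):
        if chunk:
            name, _, value = chunk.partition("=")
            params[unquote_plus(name)] = unquote_plus(value)  # last one wins
    return params


query = urlencode({"foo": "a=1;b=2&c=3"}) + ";b=bar"
print(query)                       # foo=a%3D1%3Bb%3D2%26c%3D3;b=bar
print(parse_like_php(query))       # {'foo': 'a=1;b=2&c=3', 'b': 'bar'}
print(parse_like_php(query, "&"))  # {'foo': 'a=1;b=2&c=3;b=bar'}
```

Because the browser escaped the ';' and '&' typed into the field, only the bare ';' before b=bar acts as a separator when semicolons are enabled.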
Making Hunchentoot configurable on this point would be the best, I think.
Quickly, Pierre
On Sat, 1 Dec 2007 12:45:22 +0100, Pierre THIERRY nowhere.man@levallois.eu.org wrote:
If arg_separator.input in php.ini contains ";", then it would parse the following query string as defining two parameters, foo and b:
foo=a%3D1%3Bb%3D2%26c%3D3;b=bar
Note that the default value is "&" and here is the commented line in my default php.ini:
#arg_separator.input = ";&"
Making Hunchentoot configurable on this point would be the best, I think.
I fully agree. That's what I said as well. Except that my point is that right now I don't need it and I'm busy with other things anyway, so it has a very low priority for me. Patches welcome, of course.
Edi.