Ralf Mattes rm@seid-online.de writes:
Is this normal or esoteric?
Something less than normal and more than esoteric, methinks.
I disagree: the HTML 4 specification (http://www.w3.org/TR/html4/interact/forms.html#form-content-type) is actually rather clear about this. Section 17.13.4, "Form content types", reads as follows:
application/x-www-form-urlencoded: This is the default content type. Forms submitted with this content type must be encoded as follows:
- Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
- The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
And that's it. Nowhere does it even mention the use of ';' as a separator of name/value pairs.
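The two quoted rules can be sketched in Python. This is a minimal illustration, not any particular library's implementation; the helper names are hypothetical, and encoding non-ASCII characters as UTF-8 bytes is an assumption, since the quoted text only speaks of ASCII codes.

```python
def escape_form_component(text: str) -> str:
    """Escape a control name or value following the quoted HTML 4 rules."""
    # Normalize every line break to CR LF so it comes out as %0D%0A.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    out = []
    for ch in text:
        if ch == " ":
            out.append("+")  # spaces become '+'
        elif ch.isascii() and ch.isalnum():
            out.append(ch)  # ASCII letters and digits pass through unchanged
        else:
            # Everything else becomes %HH; non-ASCII is encoded as UTF-8 bytes
            # here (an assumption: the quoted rules only mention ASCII codes).
            out.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
    return "".join(out)


def encode_form(pairs: list[tuple[str, str]]) -> str:
    """Join (name, value) pairs in document order with '=' and '&'."""
    return "&".join(
        f"{escape_form_component(name)}={escape_form_component(value)}"
        for name, value in pairs
    )


print(encode_form([("name", "a b"), ("note", "x;y&z")]))
# prints name=a+b&note=x%3By%26z
```

Under these rules a literal ';' in a name or value is always escaped as %3B, so a conforming form submission never contains a bare semicolon.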
That's for forms. Not all URLs are created from forms; in fact, the vast majority of URLs are _not_ thus created. Most are simply href anchors.
And, even more interesting, does any client out there actually use this convention?
The clients don't create URL args; those are generated by the site in question or by the user's typing.
??? I think you must misunderstand something here: of course the client application needs to construct a URL every time a user submits an HTML form that uses the GET method. Please refer to section 17.13.3, "Processing form data", of the specification.
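Section 17.13.3 says that for a GET form whose action is an HTTP URI, the user agent appends a '?' and the form data set, encoded as application/x-www-form-urlencoded, to the action URI. A minimal Python sketch of that step using the standard library's urlencode (which escapes via quote_plus and, unlike the letter of HTML 4, leaves '-', '_', '.' and '~' unescaped); build_get_url and the example URL are hypothetical:

```python
from urllib.parse import urlencode


def build_get_url(action: str, form_data: list[tuple[str, str]]) -> str:
    """Build the URL a user agent requests when a GET form is submitted.

    form_data is the form data set as (name, value) pairs in document order.
    """
    # urlencode keeps the pair order, joins pairs with '&', and escapes
    # names and values with quote_plus (spaces become '+').
    return action + "?" + urlencode(form_data)


print(build_get_url("http://www.example.com/search",
                    [("q", "common lisp"), ("lang", "en")]))
# prints http://www.example.com/search?q=common+lisp&lang=en
```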
I'd forgotten about forms. I was thinking about the normal case for GET URLs.
For the same reason, I'm not certain that compatibility is a worry: if the client is requesting a URL the server previously served, surely that URL is A-OK.
GET URLs are usually _not_ served by the server but generated by the client application.
*boggle* Every single time you click a link, that's a GET request. The vast, vast, _vast_ majority of GET requests are not generated by forms but by following links.
How should a server process the following request:
GET /foo/bar?name1=value1;&name2=value2
Do we expect 'name1' -> 'value1;' and 'name2' -> 'value2' or do we expect 'name1' -> 'value1' and ';name2' -> 'value2'????
RFC 2396 states that the semicolon is a reserved character, and that data which would conflict with a reserved purpose must be escaped; further, it states that 'Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.' Thus an unescaped semicolon can carry meaning within a query.
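In practice, a value that itself contains ';' or '&' should be percent-escaped before it goes into a query, so it cannot be read as a separator under either convention. A short Python sketch with the standard library's quote_plus, whose default safe='' escapes every reserved character; query_pair is a hypothetical helper:

```python
from urllib.parse import quote_plus


def query_pair(name: str, value: str) -> str:
    """Format one name=value pair with all reserved characters escaped."""
    # With the default safe='', quote_plus escapes ';' as %3B, '&' as %26
    # and '=' as %3D, so data can never be mistaken for a delimiter.
    return quote_plus(name) + "=" + quote_plus(value)


query = "&".join([query_pair("name1", "value1;"), query_pair("name2", "value2")])
print(query)  # prints name1=value1%3B&name2=value2
```

Escaped this way, the first reading of the example request (where the value is 'value1;') is unambiguous whether or not the server splits on semicolons.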
AFAIK, it's up to the server how to interpret those characters, so Hunchentoot can be perfectly compliant and still not parse them the way most folks have come to expect.
Me, I would parse your example as 'name1' -> 'value1', then an empty pair, then 'name2' -> 'value2'. That is, I treat semicolon and ampersand as equivalent separators.
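That interpretation can be sketched as a small Python query parser. This is an illustration only, not Hunchentoot's actual code (Hunchentoot is written in Common Lisp); parse_query and its semicolon_separates flag are hypothetical. With the flag off it splits on '&' alone, which yields the first reading of the request above.

```python
import re
from urllib.parse import unquote_plus


def parse_query(query: str, semicolon_separates: bool = True) -> list[tuple[str, str]]:
    """Split a query string into decoded (name, value) pairs."""
    separators = r"[&;]" if semicolon_separates else r"&"
    pairs = []
    for chunk in re.split(separators, query):
        if not chunk:
            continue  # skip empty pairs, e.g. the one between ';' and '&'
        name, _, value = chunk.partition("=")
        # Decode only after splitting, so an escaped %3B stays inside the value.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(parse_query("name1=value1;&name2=value2"))
# prints [('name1', 'value1'), ('name2', 'value2')]
print(parse_query("name1=value1;&name2=value2", semicolon_separates=False))
# prints [('name1', 'value1;'), ('name2', 'value2')]
```

Skipping empty chunks is what makes 'value1;&' harmless: the empty pair between the two separators simply disappears.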
Tim Berners-Lee, in http://www.w3.org/DesignIssues/MatrixURIs.html, proposes a syntax for matrix URIs that separates parameters with semicolons. Prefixed with a '?', the same syntax can be adapted to query strings.
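For comparison, matrix-style parameters on a path segment can be split on semicolons in a few lines of Python. This is a rough sketch under my own assumption that a segment looks like base;name=value;name=value, not a parser for the exact grammar in Berners-Lee's note; parse_matrix_segment is hypothetical.

```python
from urllib.parse import unquote


def parse_matrix_segment(segment: str) -> tuple[str, list[tuple[str, str]]]:
    """Split 'map;lat=50;long=20' into ('map', [('lat', '50'), ('long', '20')])."""
    base, *params = segment.split(";")
    pairs = []
    for param in params:
        if not param:
            continue  # tolerate doubled or trailing semicolons
        name, _, value = param.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return unquote(base), pairs


print(parse_matrix_segment("map;lat=50;long=20"))
# prints ('map', [('lat', '50'), ('long', '20')])
```

Putting the same name=value;name=value list after a '?' instead gives a query string that a semicolon-aware parser reads the same way.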
Also, semicolons do not need to be HTML-escaped; ampersands do. Thus it's convenient for HTML writers to avoid ampersands; otherwise URLs in attributes must be written as /foo/bar?name1=value1&amp;name2=value2, which gets very old very quickly.
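The difference is easy to see with Python's html.escape, which escapes '&' (along with '<', '>' and, with quote=True, quotation marks) but leaves ';' alone; href_attribute is a hypothetical helper for emitting a link:

```python
import html


def href_attribute(url: str) -> str:
    """Return an <a> start tag with the URL escaped for an HTML attribute."""
    return '<a href="' + html.escape(url, quote=True) + '">'


print(href_attribute("/foo/bar?name1=value1&name2=value2"))
# prints <a href="/foo/bar?name1=value1&amp;name2=value2">
print(href_attribute("/foo/bar?name1=value1;name2=value2"))
# prints <a href="/foo/bar?name1=value1;name2=value2">
```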