[hunchentoot-devel] url-encode doesn't encode apostrophes

I'm using url-encode to encode strings that I use for keys (see e.g. <http://octopodial-chrome.com/tasting-notes/beer/Fuller%26rsquo%3Bs/2000%20Vintage%20Ale>; the brewer and beer name strings are both Unicode/HTML and contain characters meaningful in a URL).

While running my pages through the w3c validator, I discovered that url-encode doesn't encode apostrophes ('), so now I run my strings through (cl-who:encode-string (hunchentoot:url-encode string)); is this the appropriate way to do things?

It might be: url-encode turns a string into a string suitable for a URL and encode-string turns a string into a string suitable for an HTML attribute value. Still, it seems a bit...complex.

--
Robert Uhl <http://public.xdi.org/=ruhl>
I'm proud to be an old-fashioned bigoted unixoid. They'll take my keyboard away from me when they pry it from the shattered skulls of my enemies. --Mark Hughes

> While running my pages through the w3c validator, I discovered that url-encode doesn't encode apostrophes ('), so now I run my strings through (cl-who:encode-string (hunchentoot:url-encode string)); is this the appropriate way to do things?

I'm trying to follow along here, but I wasn't aware that cl-who had an encode-string function. What version are you using?

Second question: why does the apostrophe need to be encoded? Is it that the w3c validator wants it to be?

Cheers,
Chris Dean

On Fri, 21 Dec 2007 13:57:15 -0700, Robert Uhl <eadmund42@gmail.com> wrote:
> I'm using url-encode to encode strings that I use for keys (see e.g. <http://octopodial-chrome.com/tasting-notes/beer/Fuller%26rsquo%3Bs/2000%20Vintage%20Ale>; the brewer and beer name strings are both Unicode/HTML and contain characters meaningful in a URL). While running my pages through the w3c validator, I discovered that url-encode doesn't encode apostrophes ('), so now I run my strings through (cl-who:encode-string (hunchentoot:url-encode string)); is this the appropriate way to do things?
> It might be: url-encode turns a string into a string suitable for a URL and encode-string turns a string into a string suitable for an HTML attribute value. Still, it seems a bit...complex.
It seems complex, but it is the right way to do it. FWIW, the CL-WHO function is called ESCAPE-STRING, not ENCODE-STRING, and that shows its intent more clearly.

As you said, you URL-encode the string to make it suitable for a URL. You might want to use the result for a header value in an HTTP reply, or as a URL you're giving to a client like Drakma. That's fine. But if you want to put the URL-encoded string into an HTML page, then the HTML rules apply, and you might have to escape the string in order not to create conflicts. That's life... :)

Still, I'm surprised to see parts like "r%26rsquo%3Bs" in your example URL. With recent Hunchentoot and CL-WHO I get this:

  CL-USER 4 > (hunchentoot:url-encode "Fuller's")
  "Fuller's"

  CL-USER 5 > (cl-who:escape-string *)
  "Fuller&#039;s"

And if I put a link like

  <a href="http://weitz.de/foo?a=Fuller&#039;s">click me</a>

into an HTML file, then Firefox will go to

  http://weitz.de/foo?a=Fuller's

if I click the link. Your example almost looks as if you have it the other way around.

Edi.
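The two layers described in this message can be sketched as follows. This is only an illustration, assuming Hunchentoot and CL-WHO are loaded through Quicklisp (the thread does not say how the libraries were loaded); QUERY-LINK is a hypothetical helper name, not part of either library, and the URL is the one from the example above.

```lisp
;; Assumption: Quicklisp is installed and can load both libraries.
(ql:quickload '(:hunchentoot :cl-who))

(defun query-link (value)
  "Hypothetical helper: return an HTML link whose query value is VALUE."
  ;; Step 1: URL-encode the raw value so it is a valid part of a URL.
  (let ((url (concatenate 'string
                          "http://weitz.de/foo?a="
                          (hunchentoot:url-encode value))))
    ;; Step 2: the URL is about to be placed in an HTML attribute, so HTML
    ;; rules apply; escape it explicitly with ESCAPE-STRING.
    (cl-who:with-html-output-to-string (s)
      (:a :href (cl-who:escape-string url) "click me"))))

;; Print the generated markup for the string discussed in the thread.
(princ (query-link "Fuller's"))
```

The order matters: the value is URL-encoded first, and only the finished URL is escaped for HTML.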

On Sat, 22 Dec 2007 04:03:53 +0100, Edi Weitz <edi@agharta.de> wrote:
> FWIW, the CL-WHO function is called ESCAPE-STRING

And, I forgot to mention, you don't need CL-WHO for this:

  http://weitz.de/hunchentoot/#escape-for-html

Edi.
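For code that only depends on Hunchentoot, the same idea might look like the sketch below. HREF-VALUE is a hypothetical helper, the path layout is borrowed from the example URL at the start of the thread, and loading through Quicklisp is an assumption.

```lisp
;; Assumption: Quicklisp is installed and can load Hunchentoot.
(ql:quickload :hunchentoot)

(defun href-value (brewer beer)
  "Hypothetical helper: build a tasting-notes path and escape it for HTML."
  ;; Each key is URL-encoded on its own, so a slash inside a name
  ;; cannot be mistaken for a path separator.
  (hunchentoot:escape-for-html
   (format nil "/tasting-notes/beer/~A/~A"
           (hunchentoot:url-encode brewer)
           (hunchentoot:url-encode beer))))

;; Print the attribute value for one brewer/beer pair.
(princ (href-value "Fuller's" "2000 Vintage Ale"))
```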

Edi Weitz <edi@agharta.de> writes:
> And, I forgot to mention, you don't need CL-WHO for this:

Ah, thanks. I'm using CL-WHO and Hunchentoot, but in this particular place within my code it's a lot more logical to use Hunchentoot. I hadn't found ESCAPE-FOR-HTML because I was searching for 'encode' rather than 'escape' (silly me).

Thanks again for the good work. I couldn't have gotten where I am without it.

--
Robert Uhl <http://public.xdi.org/=ruhl>
...benefits arising from moderate use of liquor have been experienced in all armies and are not to be disputed. --George Washington, writing to John Hancock

Edi Weitz <edi@agharta.de> writes:
> It seems complex, but it is the right way to do it. FWIW, the CL-WHO function is called ESCAPE-STRING, not ENCODE-STRING, and that shows its intent more clearly.

Doh! Typing from memory. Because of course with a computer capable of billions of operations per second it's just too painful to look up these minor details *grin*

> As you said, you URL-encode the string to make it suitable for a URL. You might want to use the result for a header value in an HTTP reply, or as a URL you're giving to a client like Drakma. That's fine. But if you want to put the URL-encoded string into an HTML page, then the HTML rules apply, and you might have to escape the string in order not to create conflicts. That's life... :)

Yeah, it makes sense--just a bit surprising I guess.

> Still, I'm surprised to see parts like "r%26rsquo%3Bs" in your example URL. With recent Hunchentoot and CL-WHO I get this:
>
>   CL-USER 4 > (hunchentoot:url-encode "Fuller's")
>   "Fuller's"
>
>   CL-USER 5 > (cl-who:escape-string *)
>   "Fuller&#039;s"

The '&rsquo;' is part of the original string (for hysterical raisins all strings in the database are HTML strings); it then gets encoded into %26rsquo%3B by URL-ENCODE; OTOH when I have an ASCII apostrophe (_not_ a right single quote) then it gets passed as-is by URL-ENCODE but gets escaped by ESCAPE-STRING.

--
Robert Uhl <http://public.xdi.org/=ruhl>
OTOH there are some of us in .UK who would rather like NAFTA to become the North Atlantic Free-Trade Area, and for .UK to sign up to it instead of us piddling around on the periphery of some wannabe second-rate-superpower European bureaucracy that, in all honesty, couldn't come to a consensus on how many primary colours there are. --Tanuki
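The two cases described here can be compared side by side. The following is a sketch only, assuming both libraries are loaded through Quicklisp; ENCODE-STEPS is a hypothetical helper introduced for illustration.

```lisp
;; Assumption: Quicklisp is installed and can load both libraries.
(ql:quickload '(:hunchentoot :cl-who))

(defun encode-steps (key)
  "Hypothetical helper: return KEY after URL-encoding and after HTML escaping."
  (let* ((encoded (hunchentoot:url-encode key))
         (escaped (cl-who:escape-string encoded)))
    (list encoded escaped)))

;; First key: an HTML string holding the entity for a right single quote.
;; Second key: a plain ASCII apostrophe.
(dolist (key '("Fuller&rsquo;s" "Fuller's"))
  (format t "~S => ~S~%" key (encode-steps key)))
```

Going by the behaviour reported in the thread, the entity is changed by URL-ENCODE (the ampersand and semicolon become %26 and %3B) and then has nothing left for ESCAPE-STRING to touch, while the ASCII apostrophe passes through URL-ENCODE unchanged and is only altered by ESCAPE-STRING.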

On Fri, 21 Dec 2007 21:30:53 -0700, Robert Uhl <eadmund42@gmail.com> wrote:
> OTOH when I have an ASCII apostrophe (_not_ a right single quote) then it gets passed as-is by URL-ENCODE

Yes, that's correct behaviour. You can test it here:

  http://www.albionresearch.com/misc/urlencode.php

Edi.

participants (3)
- Chris Dean
- Edi Weitz
- Robert Uhl