Hello folks,
I'm trying to use Drakma to fetch URLs that contain UTF-8 characters, but HTTP-REQUEST automatically URL-encodes any characters outside the Latin-1/ASCII range.
From my cursory reading of the RFCs, this seems to be conforming behavior, but in this case it is definitely unwanted.
For example, the following url
http://translate.google.com/translate_tts?tl=ru&q=%D0%B2%D1%8B
if entered directly into the browser, correctly returns the text-to-speech audio file, but
when I use HTTP-REQUEST, the URL is URL-encoded into
http://translate.google.com/translate_tts?tl=ru&q=%D0%B2%D1%8B
which Google does not URL-decode, so it fails to return the correct data.
So, for this case, the url-encoding is unwanted.
I am willing to submit a patch that adds an argument to HTTP-REQUEST to disable the encoding.
Thoughts?
Thank you, William
You are not allowed to send arbitrary characters in the request line. FWIW, I just tried your example with Firefox and this is what the browser sends according to LiveHttpHeaders:
http://translate.google.com/translate_tts?tl=ru&q=%D0%B2%D1%8B
And Google returns the requested audio file.
Edi.
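For reference, the escaped query in the URL above is simply the UTF-8 percent-encoding of the two Cyrillic letters U+0432 and U+044B, which is why a browser and HTTP-REQUEST end up sending the same request line. A minimal sketch to check this at the REPL, assuming Drakma is installed through Quicklisp (the Quicklisp setup and the variable name *QUERY* are assumptions for illustration, not part of the thread):

```lisp
;; Assumption: Quicklisp is available in the running image.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :drakma))

;; Build the two-letter query from code points (U+0432, U+044B) so the
;; example does not depend on the encoding of the source file.
(defparameter *query*
  (coerce (list (code-char #x0432) (code-char #x044B)) 'string))

;; DRAKMA:URL-ENCODE takes a string and an external format and returns
;; the percent-encoded string.
(print (drakma:url-encode *query* :utf-8))
```

This should print "%D0%B2%D1%8B", the same escaped form that appears in the URL.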
On Sat, Mar 10, 2012 at 1:08 AM, William Halliburton whalliburton@gmail.com wrote:
drakma-devel mailing list drakma-devel@common-lisp.net http://lists.common-lisp.net/cgi-bin/mailman/listinfo/drakma-devel
Thanks much. After some wiresharking, I found that Google doesn't like the Drakma user-agent, and changing it to Firefox did the trick.
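The user-agent change described above can be made through HTTP-REQUEST's :USER-AGENT keyword argument, which accepts either a literal string or a keyword such as :FIREFOX that selects a canned browser string. A minimal sketch, assuming Drakma is loaded through Quicklisp; the function name FETCH-AS-FIREFOX is made up for illustration:

```lisp
;; Assumption: Quicklisp is available in the running image.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :drakma))

(defun fetch-as-firefox (url)
  "Fetch URL with Drakma, sending a Firefox-style User-Agent header.
Returns the values of DRAKMA:HTTP-REQUEST (body, status code, headers, ...)."
  (drakma:http-request url :user-agent :firefox))

;; Usage (performs a network request; the service may no longer respond
;; the way it did in 2012):
;; (fetch-as-firefox
;;  "http://translate.google.com/translate_tts?tl=ru&q=%D0%B2%D1%8B")
```

Passing a string instead of :FIREFOX sends that exact string as the User-Agent header, which helps when a server expects a specific browser version.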
On Sat, Mar 10, 2012 at 2:46 AM, Edi Weitz edi@agharta.de wrote:
On Sun, Mar 11, 2012 at 6:50 AM, William Halliburton whalliburton@gmail.com wrote:
After some wiresharking, I found that Google doesn't like the Drakma user-agent
Bastards... :)