The real-remote-addr function currently returns the value of the x-forwarded-for header if it's set, or remote-addr if it's not. In the case of chains of proxies, this gives unexpected results as each proxy appends the address it's proxying for onto the end of the list.
Since I imagine this function is intended to be used in situations where Hunchentoot is sitting behind proxies of its own, I've written a function to split things up to give a particular entry in this chain. Most of the time, I imagine you'd just want the address added by the closest proxy but if (for example) you're behind mod_proxy behind squid, this function can tell you the address of the agent that hit the squid server.
I'm not actually quite sure what to call this function. Originally at my local site I'd called it "real-remote-addr", replacing Hunchentoot's own function, but the name's not sitting easily with me. For one thing, it's got a different API since it takes the number of hops along the proxy chain to look, and for another it returns NIL if you fall off the end of the chain (I'm not quite sure that's correct, but it seems to me to fail better than DWIMly returning the last available address, since values supplied by the user's original request are useless anyway). Anyway, here it is, still with the name real-remote-addr.
(defvar *proxy-count* 0
  "The length of the chain of server-side proxies in front of Hunchentoot.")

(defun real-remote-addr (&optional (nth *proxy-count*) (request *request*))
  "Returns the address of the NTH host in the chain of proxies that set
the X-Forwarded-For header. 0 is the address of the last proxy (or the
client, if there are no proxies), 1 the address of the second proxy (or
client if there is only one), and so forth."
  (if (zerop nth)
      (remote-addr request)
      (let* ((xff (header-in :x-forwarded-for request))
             ;; Without the header there is no chain to split.
             (proxies (when xff
                        (loop for pos = 0 then (1+ comma)
                              for comma = (position #\, xff :start pos)
                              collect (string-trim " " (subseq xff pos comma))
                              while comma)))
             (position (- (length proxies) nth)))
        (if (minusp position)
            nil
            (nth position proxies)))))
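To try the splitting logic without a running server, here is a minimal sketch that works on a plain header string. The names split-forwarded-for and nth-from-end are made up for illustration and are not part of Hunchentoot, and the sample addresses are invented.

```lisp
;; Hypothetical helpers, not part of Hunchentoot. They take the header
;; value as a plain string instead of reading it from a request object.

(defun split-forwarded-for (xff)
  "Split the comma-separated string XFF into a list of trimmed entries.
Returns NIL if XFF is NIL."
  (when xff
    (loop for pos = 0 then (1+ comma)
          for comma = (position #\, xff :start pos)
          collect (string-trim " " (subseq xff pos comma))
          while comma)))

(defun nth-from-end (n list)
  "Return the Nth element of LIST counting from the end (1 = last),
or NIL if LIST has fewer than N elements."
  (let ((index (- (length list) n)))
    (if (minusp index)
        nil
        (nth index list))))

;; Sample header as it might look after two proxies (invented addresses).
(defparameter *sample-xff* "10.0.0.7, 192.168.1.5")

(nth-from-end 1 (split-forwarded-for *sample-xff*)) ; => "192.168.1.5"
(nth-from-end 2 (split-forwarded-for *sample-xff*)) ; => "10.0.0.7"
(nth-from-end 3 (split-forwarded-for *sample-xff*)) ; => NIL, off the end
```

Counting from the end matches the function above: hop 1 is the entry added by the closest proxy, and asking for more hops than the header holds gives NIL rather than a guess.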
On Thu, 02 Nov 2006 11:15:56 -0700, "Robert J. Macomber" tbnl@rojoma.com wrote:
> The real-remote-addr function currently returns the value of the x-forwarded-for header if it's set, or remote-addr if it's not. In the case of chains of proxies, this gives unexpected results as each proxy appends the address it's proxying for onto the end of the list.
> Since I imagine this function is intended to be used in situations where Hunchentoot is sitting behind proxies of its own, I've written a function to split things up to give a particular entry in this chain. Most of the time, I imagine you'd just want the address added by the closest proxy but if (for example) you're behind mod_proxy behind squid, this function can tell you the address of the agent that hit the squid server.
Hmm, I see the problem, but that actually wasn't the only situation this was written for. I also imagined proxies I wouldn't have control of like those used by, say, AOL customers. To be honest, I didn't even know that chained proxies will add to the XFF header instead of just replacing it. Is this behaviour specified somewhere?
Anyway, I was thinking that maybe a better API would look like this:
1. If there is no XFF header, return REMOTE-ADDR as it is now.
2. If there is an XFF header, return two values - the second one is a list of all IP addresses in the header, the first one is the last element of this list.
How about that?
Cheers, Edi.
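The two-value API proposed above could be sketched roughly as follows. This is only an illustration of the proposal as stated at this point in the thread (first value is the last list element), not the code that ended up in Hunchentoot. The name proposed-remote-addr is hypothetical, and the header value and socket address are passed in as plain arguments so the sketch runs on its own.

```lisp
;; Hypothetical function, not Hunchentoot's implementation. XFF is the
;; X-Forwarded-For header value (or NIL), REMOTE-ADDR the socket address.
(defun proposed-remote-addr (xff remote-addr)
  "Without a header, return REMOTE-ADDR as the only value. With a header,
return its last address and, as a second value, the list of all addresses."
  (if (null xff)
      remote-addr
      (let ((addresses
              (loop for pos = 0 then (1+ comma)
                    for comma = (position #\, xff :start pos)
                    collect (string-trim " " (subseq xff pos comma))
                    while comma)))
        (values (first (last addresses)) addresses))))

;; Invented sample addresses.
(proposed-remote-addr nil "127.0.0.1")
;; => "127.0.0.1"
(proposed-remote-addr "10.0.0.7, 192.168.1.5" "127.0.0.1")
;; => "192.168.1.5", ("10.0.0.7" "192.168.1.5")
```

Callers that only want a single address can ignore the second value; callers that need the whole chain can pick it up with MULTIPLE-VALUE-BIND.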
On Sun, Nov 05, 2006 at 09:18:42PM +0100, Edi Weitz wrote:
> Hmm, I see the problem, but that actually wasn't the only situation this was written for. I also imagined proxies I wouldn't have control of like those used by, say, AOL customers.
> To be honest, I didn't even know that chained proxies will add to the XFF header instead of just replacing it. Is this behaviour specified somewhere?
I'm not sure. I couldn't find a formal specification anywhere, but proxies (in particular Apache's mod_proxy, which is the one I'm concerned with) certainly do this, specified or not. Section 4.2 of RFC 2616 says that multiple fields with the same name are combined by appending each subsequent value onto the end, but I can't see anything that requires a proxy not to outright replace things.
> Anyway, I was thinking that maybe a better API would look like this:
> 1. If there is no XFF header, return REMOTE-ADDR as it is now.
> 2. If there is an XFF header, return two values - the second one is a list of all IP addresses in the header, the first one is the last element of this list.
> How about that?
Well, if you actually want the XFF header, there's always (header-in :x-forwarded-for) but this would save some processing. If you want the "real remote address", presumably you're thinking of looking past some known proxy (or chain of known proxies) and want the single address that came into it. I'm sure the "last address in the list" is the common case, though.
That's a roundabout way to say "yes, that sounds about right".
Oh, and on another note, COOKIE-OUT returns a (name . cookie) pair instead of just the cookie. Is there a reason for this? I'm pretty sure it's just a forgotten call to CDR.
On Sun, 05 Nov 2006 13:48:18 -0700, "Robert J. Macomber" tbnl@rojoma.com wrote:
> Well, if you actually want the XFF header, there's always (header-in :x-forwarded-for) but this would save some processing. If you want the "real remote address", presumably you're thinking of looking past some known proxy (or chain of known proxies) and want the single address that came into it. I'm sure the "last address in the list" is the common case, though.
Er, if the proxies add to the end of the list (which I didn't take into account), it'd be better to return the first and not the last element of the list, right?
> Oh, and on another note, COOKIE-OUT returns a (name . cookie) pair instead of just the cookie. Is there a reason for this? I'm pretty sure it's just a forgotten call to CDR.
Yep, will be fixed.
Thanks, Edi.
On Sun, Nov 05, 2006 at 09:59:24PM +0100, Edi Weitz wrote:
> Er, if the proxies add to the end of the list (which I didn't take into account), it'd be better to return the first and not the last element of the list, right?
Depends on whether you're interested in the user's (claimed) real address, or the address the user presents to the web server to which he thinks he's speaking. In my case, I'm interested in the latter (that is, I'm just interested in what REMOTE-ADDR would be if Hunchentoot weren't behind a server-side proxy). Having looked at the documentation for sessions in the past five minutes, I see that that machinery is interested in the former.
On Sun, 05 Nov 2006 14:18:50 -0700, "Robert J. Macomber" tbnl@rojoma.com wrote:
> Depends on whether you're interested in the user's (claimed) real address, or the address the user presents to the web server to which he thinks he's speaking. In my case, I'm interested in the latter (that is, I'm just interested in what REMOTE-ADDR would be if Hunchentoot weren't behind a server-side proxy). Having looked at the documentation for sessions in the past five minutes, I see that that machinery is interested in the former.
OK, I've implemented the version I described in my previous email. The new release also fixes the bug in COOKIE-OUT you mentioned.
Thanks, Edi.
On Sun, Nov 05, 2006 at 11:57:39PM +0100, Edi Weitz wrote:
> OK, I've implemented the version I described in my previous email. The new release also fixes the bug in COOKIE-OUT you mentioned.
Both work beautifully for me. Thanks!
Edi Weitz wrote:
> Er, if the proxies add to the end of the list (which I didn't take into account), it'd be better to return the first and not the last element of the list, right?
I've made a few tests with open proxies on the net.
Most, if not all, add to the end of the list, instead of replacing it.
Some don't bother to append/replace the list with the client address (I guess they would be called "anonymous") and some even append 127.0.0.1 or other internal addresses at the end, for whatever reason.
In any case, since an X-Forwarded-For header is quite easy to forge, trusting the first element of the list doesn't make much sense.
I guess the only real use would be getting the n-th to last item, to trim away n known (and trusted) proxies and get to the real client address, as seen by the proxy + Lisp image server setup.
Toby
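The forging concern can be made concrete with a small simulation. This is a sketch under stated assumptions: each proxy is assumed to append the address of whoever connected to it, the helper names append-forwarded-for and entry-from-end are hypothetical, and all addresses are invented.

```lisp
;; Hypothetical helpers for illustration only, not part of Hunchentoot.

(defun append-forwarded-for (xff peer-address)
  "Return the header value an appending proxy would send on, given the
incoming header XFF (or NIL) and the address PEER-ADDRESS it saw."
  (if (and xff (plusp (length xff)))
      (concatenate 'string xff ", " peer-address)
      peer-address))

(defun entry-from-end (xff n)
  "Return the Nth entry of the header string XFF counting from the end
(1 = last), or NIL if there are fewer than N entries."
  (let* ((entries (loop for pos = 0 then (1+ comma)
                        for comma = (position #\, xff :start pos)
                        collect (string-trim " " (subseq xff pos comma))
                        while comma))
         (index (- (length entries) n)))
    (unless (minusp index)
      (nth index entries))))

;; A client at 203.0.113.9 sends a forged header "1.2.3.4". The first
;; trusted proxy appends the client's address, the second trusted proxy
;; appends the first proxy's address (10.0.0.2).
(defparameter *seen-by-server*
  (append-forwarded-for (append-forwarded-for "1.2.3.4" "203.0.113.9")
                        "10.0.0.2"))

*seen-by-server*                     ; => "1.2.3.4, 203.0.113.9, 10.0.0.2"
(entry-from-end *seen-by-server* 3)  ; => "1.2.3.4", supplied by the client
(entry-from-end *seen-by-server* 2)  ; => "203.0.113.9", added by a trusted proxy
```

Only the entries appended by proxies you control can be relied on, so with two trusted proxies the second entry from the end is the address the outermost proxy actually saw.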