On Sun Feb 25, 2007 at 11:25:04AM +0100, Edi Weitz wrote:
> On Sat, 24 Feb 2007 16:39:54 -0800, Jeffrey Cunningham jeffrey@cunningham.net wrote:
>> In this case I set it to make a substitution for the 'bad' character. Is it possible for there to be more than one?
> Not yet. See current discussion on the FLEXI-STREAMS mailing list.
>> And more generally, should there not be a way to set drakma so it may take a performance hit but is guaranteed not to die on any HTML that is thrown at it?
> It's not dying, it just signals an error.
> And, no, I don't think there's a way to provide meaningful results and at the same time to be prepared to accept whatever bogus data or headers the server chooses to send. If you find something like that, send patches, but it sounds like magic (or at least very good AI) to me.
I guess I disagree.
If I try to access a page like that using links, lynx, wget, mozilla, firefox, or any other HTML-parsing program I can think of, they don't stop functioning, signal an error, or whatever you want to call it. They give me their best approximation of the content. It seems like that ought to be the goal here, or at least a possibility.
In an automated process, signaling an error means that processing has stopped (or 'died') unless something handles it. The source of the error signal may be in flexi-streams (I have read the discussions on that list), but it's drakma that has to deal with the consequences.
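
To make the point about automated processing concrete, here is a minimal sketch in Python (not Lisp, and not the drakma or flexi-streams API). It assumes a batch of already-fetched byte payloads; the names decode_all and pages are hypothetical. A strict decoder raises on the first bad byte, so the caller has to catch the error per page or the whole batch stops.

```python
def decode_all(pages, encoding="utf-8"):
    """Decode each payload strictly; record failures instead of stopping."""
    results = {}
    failures = {}
    for name, raw in pages.items():
        try:
            # Strict decoding: raises UnicodeDecodeError on the first bad byte.
            results[name] = raw.decode(encoding)
        except UnicodeDecodeError as err:
            # Remember the offset of the offending byte and move on.
            failures[name] = err.start
    return results, failures


# Hypothetical sample data: one well-formed page, one mislabelled page.
pages = {
    "good": "caf\u00e9".encode("utf-8"),
    "bad": b"caf\xe9 au lait",  # Latin-1 byte in a page labelled UTF-8
}
results, failures = decode_all(pages)
```

With this arrangement the batch finishes, but the "bad" page yields no text at all, which is exactly the outcome being objected to here.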
How do the above-mentioned applications manage this problem? Certainly not by magic. And I doubt the AI in links or lynx is very sophisticated.
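
One plausible answer, offered as an assumption rather than a description of how any of those programs actually work, is that they substitute a placeholder for every byte sequence they cannot decode and carry on. A minimal sketch of that idea in Python (again an analogy, not flexi-streams code; decode_leniently is a hypothetical name):

```python
def decode_leniently(raw, encoding="utf-8"):
    """Decode bytes, replacing each undecodable sequence with U+FFFD."""
    # errors="replace" never raises on malformed input; each bad
    # sequence becomes the Unicode replacement character instead.
    return raw.decode(encoding, errors="replace")


# Hypothetical payload with two stray Latin-1 bytes in "UTF-8" text.
raw = b"caf\xe9 au lait, na\xefve"
text = decode_leniently(raw)
bad_count = text.count("\ufffd")  # how many substitutions were made
```

Here both bad bytes are replaced, not just the first, and the rest of the text survives intact. The trade-off is that the caller no longer learns about the problem unless it checks for the placeholder afterwards.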
--Jeff