A few years back I used the standalone 'antiword' binary to convert .doc files to plaintext. It seemed to work pretty well.
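
In case it helps, here is roughly how it can be driven from a script. This is only a sketch, in Python rather than Lisp, and it assumes an `antiword` executable is on your PATH and that it writes the converted text to standard output when given a file name. `doc_to_text` is just a name I made up for illustration.

```python
import subprocess


def doc_to_text(doc_path, antiword="antiword"):
    """Run antiword on a .doc file and return whatever it prints to stdout.

    Assumption: an `antiword` executable is installed and on PATH.
    Pass another command name or a full path via `antiword` if it
    lives somewhere else.
    """
    result = subprocess.run(
        [antiword, doc_path],  # argument list, so no shell quoting problems
        capture_output=True,   # collect stdout and stderr instead of printing
        text=True,             # decode output using the locale's encoding
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Something like `print(doc_to_text("report.doc"))` would then dump the text, where `report.doc` is a placeholder file name. Any wrapper that can run an external program and capture its output should work the same way.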
-Shaneal
On Mon, Feb 28, 2011 at 5:20 PM, Daniel Herring <dherring@tentpost.com> wrote:
> On Mon, 28 Feb 2011, Mark H. David wrote:
>> Does anyone know of any CL libraries for dealing with Microsoft Word files? Tools for creating them, reading from them, parsing them, converting them to plain text or other formats, things like that?
>
> I suspect that RDNZL might provide the best results. You can use it to hook into the beast itself.
>
> Your other approach is to hook into the code for another office suite such as Open/LibreOffice, AbiWord, or KWord.
>
> In addition to Apache POI, there is also wvWare, but it doesn't support the new XML formats...
>
> Right when the libraries were becoming good at doc, MS went and changed formats. Funny coincidence, that.
>
> Later,
> Daniel
pro mailing list
pro@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/pro