Word Html 2 Formatting Objects
WH2FO is a Java application that processes HTML output created with Word 2000 and transforms it into an XML content file and an XSL stylesheet file. A standard XSLT processor can then combine these two files into a file containing only XSL-FO markup. You can also apply a stylesheet that converts the XML back into HTML, discarding all the extra markup added by Word. With an XSL-FO renderer such as FOP, you can render the document as PDF.
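To make the XSLT step concrete, here is a minimal sketch using the standard javax.xml.transform API that ships with the JDK. It is not WH2FO's own code: the class name XsltSketch, the doc and para elements, and the one-template stylesheet are all hypothetical stand-ins for the XML and XSL files the tool would produce, and the output is only an XSL-FO fragment, not a complete document that FOP could render.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {

    // Applies an XSLT stylesheet to an XML document (both passed as strings)
    // and returns the serialized result.
    static String transform(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Hypothetical content file: the real WH2FO vocabulary may differ.
        String xml = "<doc><para>Hello</para></doc>";

        // Hypothetical stylesheet: maps each <para> to an fo:block.
        String xsl =
                "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
                + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
                + "<xsl:template match=\"para\">"
                + "<fo:block><xsl:value-of select=\".\"/></fo:block>"
                + "</xsl:template>"
                + "</xsl:stylesheet>";

        System.out.println(transform(xml, xsl));
    }
}
```

With real files you would pass new StreamSource(new File(...)) for the content and stylesheet instead of string readers; the rest stays the same.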
I was looking for a way to convert heavily-formatted Word docs to HTML, and was starting to gibber in fear of having to write a parser of my own. But someone has (perhaps) already made that unnecessary. In my dreams, I was hoping to generate real XML, and this tool claims to do just that.