Wednesday, November 04, 2009

Old Word 2002 glitch can make a document's links appear to be corrupted

Occasionally I have run into a situation where an HTML document with many hyperlinks, originally written in Microsoft Word and then converted to HTML, develops corrupted links.

One symptom is that after I add a new link, the file suddenly shows links above or below the intended one when the cursor passes over the link or the link is visited. Sometimes whole passages of text seem to be included in a single link.

Microsoft Word 2000 was the first version to offer automatic conversion to HTML. But it seems that Word 2002 had a bug in the way it generated XSL meta-code, which could cause this problem.

In the past, I corrected it by opening the document in Notepad and manually deleting the excess XSL code that generated spurious “a” links. This could be a tedious process. Today I encountered it again with a document that had been converted in Word 2002 and then edited in FrontPage. When I needed to make changes, I edited it in Word 2007 and ran into the problem again. This time I restored the old copy of the file and created a new file to hold the new movie review links (the cable movies file on my website).
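Rather than hunting for the stray code by eye, a short script can at least point to where the anchors go wrong. The sketch below is a hypothetical helper (`check_anchors` is my own name, not part of Word, FrontPage, or any Microsoft tool) built on Python's standard `html.parser` module. It assumes the corruption shows up as nested, unclosed, or orphaned `<a>` tags in the HTML, which may not cover every form the Word 2002 problem takes.

```python
from html.parser import HTMLParser


class AnchorChecker(HTMLParser):
    """Flags <a> tags that are nested, never closed, or closed without being opened.

    Assumption: spurious links appear as unbalanced <a>...</a> pairs.
    """

    def __init__(self):
        super().__init__()
        self.open_anchor = None  # (line, col) where the currently open <a> began
        self.problems = []       # list of (line, col, message)

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # html.parser lowercases tag names, so <A HREF=...> matches too
            return
        pos = self.getpos()
        if self.open_anchor is not None:
            # An <a> inside another <a> is invalid HTML; browsers will guess where
            # the first link ends, which can make text above or below "belong" to it.
            self.problems.append(
                (*pos, f"<a> opened while <a> from line {self.open_anchor[0]} is still open")
            )
        self.open_anchor = pos

    def handle_endtag(self, tag):
        if tag != "a":
            return
        if self.open_anchor is None:
            self.problems.append((*self.getpos(), "</a> without a matching <a>"))
        self.open_anchor = None

    def close(self):
        super().close()  # flush any buffered text first
        if self.open_anchor is not None:
            self.problems.append((*self.open_anchor, "<a> never closed"))
            self.open_anchor = None


def check_anchors(html_text):
    """Return (line, col, message) tuples for suspicious anchors; line is 1-based, col 0-based."""
    checker = AnchorChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

To use it on a saved page, read the file and print each result, for example with `open(path, encoding="cp1252")`; Word's "Web Page" output is often saved as Windows-1252, but check the `charset` in the file's header. The script only reports locations: the actual cleanup in Notepad is still manual, though it narrows the search considerably.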

A visitor to a file with corrupted links might believe that it is infected by a virus. In this case it is not; the corruption is due to a past software bug, not malware. It is conceivable that website-rating services such as McAfee SiteAdvisor or Web of Trust could flag such pages with warnings, but so far I haven’t seen this happen.

Microsoft ended mainstream support for Word 2002 (part of Office XP) in 2006.

Note: In 1997, as I was completing my book in Word 95, one large file with many footnotes went bad, with some of it turning into gibberish. I restored it from a floppy disk and never had the problem again. Another large file turned into gibberish when printed at Kinko's, but printed fine at home. One wonders.
