Saving Word Files as HTML

Category: Software

"When I use the 'Save As' feature in Microsoft Word to save a document as a web page, the resulting HTML is a bloated mess. Is all that formatting stuff really needed? If not, is there a way to get rid of it?"

I'd Like to Phone a Friend, Regis

I asked my friend Allen Wyatt, an internationally recognized author, software expert, and publisher of the WordTips newsletter, to handle this question. Here's what he says:

When Word 2000 creates a Web document, it saves quite a bit of Word-specific information in the HTML file. Your Web browser does not need this information; it is only useful if you plan to load the HTML document back into Word 2000 at a later date. One element that it records is font sizes. The Web, by default, doesn't support a large number of different font sizes and typographical conventions, and certainly not as many as Word does. Word 2000 stores that information in the HTML document anyway, tucked away so that it can decipher it when you later load the document back into Word.

Some people don't like the way font formatting is done by Word, and prefer to take advantage of the "relative" font sizing that is natural to the Web. The relative font sizing allows the browser--and the user through the browser--to specify the relative size of the text that appears on-screen. This can be a great feature to some people. Word, however, doesn't use the relative font sizing, instead trying to make the font appear as close to what the document author used as possible.
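To make the difference concrete, here is a minimal Python sketch that rewrites absolute point sizes of the kind Word writes (for example `font-size:18.0pt`) as percentages, which browsers scale relative to the reader's chosen text size. The function name `to_relative` and the 12-point base size are assumptions made for illustration; a real document may use a different base.

```python
import re

BASE_PT = 12.0  # assumed body text size; adjust to match your document


def to_relative(css, base_pt=BASE_PT):
    """Rewrite 'font-size:<n>pt' declarations as percentages of base_pt."""
    def repl(match):
        # Convert the point size to a whole-number percentage of the base size.
        percent = round(float(match.group(1)) / base_pt * 100)
        return f"font-size:{percent}%"

    return re.sub(
        r"font-size:\s*([0-9]+(?:\.[0-9]+)?)pt",
        repl,
        css,
        flags=re.IGNORECASE,
    )


print(to_relative("font-size:18.0pt"))  # font-size:150%
print(to_relative("font-size:12.0pt"))  # font-size:100%
```

With percentages in place of point sizes, a reader who enlarges text in the browser gets headings and body text that grow together.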

If you are not going to load the document back into Word, you can get rid of all that extra baggage. You can do this the tedious way or the somewhat-less-tedious way. The tedious way involves opening the HTML file in a text editor and removing everything except the bare HTML code needed to display your information. This requires that you be fairly conversant in HTML coding.
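As a rough illustration of what that manual cleanup removes, the following Python sketch strips a few common kinds of Word-specific markup using regular expressions. The function name `strip_word_markup` and the list of patterns are assumptions chosen for this example; they are not exhaustive, and they are not a description of how any Microsoft tool works. Note that the sketch removes every inline `style` attribute, which is more aggressive than you may want.

```python
import re


def strip_word_markup(html):
    """Remove some common Word-specific markup from an HTML string (illustrative only)."""
    flags = re.DOTALL | re.IGNORECASE
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html, flags=flags)
    # Office namespace tags such as <o:p>, </o:p>, <w:WordDocument>
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=flags)
    # class attributes generated by Word (MsoNormal, MsoBodyText, ...)
    html = re.sub(r"\s+class=(\"Mso[^\"]*\"|'Mso[^']*'|Mso\w+)", "", html, flags=flags)
    # Inline style attributes, where Word keeps mso-* properties and fixed font sizes
    html = re.sub(r"\s+style=(\"[^\"]*\"|'[^']*')", "", html, flags=flags)
    # XML namespace declarations on the <html> tag
    html = re.sub(r"\s+xmlns(:\w+)?=\"[^\"]*\"", "", html, flags=flags)
    return html


sample = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office">'
    "<body><!--[if gte mso 9]><xml><w:WordDocument/></xml><![endif]-->"
    "<p class=MsoNormal style='font-size:12.0pt'>Hello<o:p></o:p></p>"
    "</body></html>"
)
print(strip_word_markup(sample))  # <html><body><p>Hello</p></body></html>
```

Regular expressions are a blunt instrument for HTML, so treat the output as a starting point and review it in a browser before publishing.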

The somewhat-less-tedious way involves the use of a Microsoft add-in for Word 2000 (called the Office 2000 HTML Filter) that will remove all the Word-specific HTML code for you. The add-in is free; you can learn more about it (and download it) at the following address:

http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN

Even after running the Office 2000 HTML Filter, you may still want to open the file and examine the resulting HTML code to make sure it displays information exactly as you intend. While this may require some knowledge of HTML, it doesn't require all the tedious steps of doing the removal and recoding yourself.

Thanks, Allen!

