Converting Microsoft Word to DocBook 5

Many Evergreen community documents (documentation, guides, training manuals, etc.) are currently in Microsoft Word format. Some of these documents are being used as the basis of Evergreen "core" documentation, which will be produced in DocBook 5 XML.

There are three options for converting Word to DocBook 5: brute force (rekeying the documents), a commercial conversion tool, or OpenOffice, which performs a crude, one-way, partial Word-to-DocBook conversion.

The DocBook wiki also maintains a list of conversion tools.

Brute Force Method

This is sometimes the best way to convert very small documents, especially those of a type that conversion tools do not support, such as glossaries.

Using commercial conversion tools

(Needs description)

Converting Word to DocBook with OpenOffice

The OpenOffice method has a clear cost advantage (where labor is free or low-cost). The drawback is that the resulting files will be in DocBook 4 format, limited to a few tags and document types, and will require extensive cleanup and reformatting. But it's a start.

Step 1: Convert text and layout

Open the source file (the Word document) in OpenOffice. The file can now be re-saved in different formats by clicking File > Save As, then selecting the desired file type from the Save as type dropdown. Save the file as type DocBook (.xml).

This creates a DocBook XML file with the text and general layout of the source. Again, this produces a DocBook 4 file that will still need extensive editing/conversion to meet Evergreen project requirements (DocBook 5, to start with), but at least it's XML.
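One mechanical part of moving from DocBook 4 to DocBook 5 is that DocBook 5 elements live in the `http://docbook.org/ns/docbook` namespace and the root element carries `version="5.0"`, with no DOCTYPE declaration required. The following Python sketch shows only that step. The function name `to_docbook5_namespace` is hypothetical, the script assumes the OpenOffice output is well-formed XML that does not rely on entities defined in the DTD, and it does not handle elements or attributes that were renamed or removed between versions, so the result still needs manual review.

```python
import xml.etree.ElementTree as ET

DOCBOOK5_NS = "http://docbook.org/ns/docbook"


def to_docbook5_namespace(docbook4_xml: str) -> str:
    """Return the document with every element moved into the DocBook 5 namespace.

    Hypothetical helper: covers the namespace and version attribute only.
    """
    # ElementTree drops the DOCTYPE declaration while parsing.
    root = ET.fromstring(docbook4_xml)
    for elem in root.iter():
        # Only touch ordinary elements that are not already namespaced.
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            elem.tag = f"{{{DOCBOOK5_NS}}}{elem.tag}"
    root.set("version", "5.0")
    # Serialize the DocBook namespace as the default (unprefixed) namespace.
    ET.register_namespace("", DOCBOOK5_NS)
    return ET.tostring(root, encoding="unicode")
```

Validate the output against the DocBook 5 schema afterwards; passing this step alone does not make a document valid DocBook 5.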

If you don't need to convert images from the source document, you're done; otherwise, proceed to Step 2.

Step 2: Extract embedded images

Much like HTML, DocBook XML files do not have embedded images. Instead, a DocBook file "points" to separate image files to display. To render the images in DocBook, you must first extract them from the source Word document.

Use File > Save As to re-save the source file as type HTML Document. This will create several new files: one HTML file and an image file for each picture embedded in the source document.
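If the export produces many image files, gathering and renaming them can be scripted. The sketch below is one possible approach, not part of the original procedure: `collect_images`, the `figure` prefix, and the list of image extensions are all assumptions chosen for illustration. It orders files by name, which may not match the order in which the images appear in the document, so check the result by eye.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever the export actually produced.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def collect_images(export_dir: str, prefix: str = "figure") -> list[str]:
    """Rename exported images to <prefix>-01.ext, <prefix>-02.ext, ...

    Returns the new file names. Non-image files (such as the HTML file)
    are left alone.
    """
    images = sorted(
        p for p in Path(export_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    new_names = []
    for number, path in enumerate(images, start=1):
        target = path.with_name(f"{prefix}-{number:02d}{path.suffix.lower()}")
        path.rename(target)
        new_names.append(target.name)
    return new_names
```

Run it on a copy of the export folder, since renaming is not undone automatically and an existing file with the target name could be overwritten on some platforms.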

The last step is to link the extracted images to the DocBook XML file created in Step 1. Delete the HTML file, rename the image files (optional), then open the DocBook file in an XML or text editor and edit the <imagedata> tags to point to the corresponding image files.
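For documents with many images, the `fileref` attribute of each `<imagedata>` element can be set with a short script instead of by hand. This is a minimal sketch under stated assumptions: `relink_images` is a hypothetical helper, the DocBook file is assumed to be un-namespaced DocBook 4 as produced in Step 1, and the list of image file names is assumed to be in the same order as the images appear in the document.

```python
import xml.etree.ElementTree as ET


def relink_images(docbook_xml: str, image_files: list[str]) -> str:
    """Point each <imagedata> element, in document order, at the next image file.

    Hypothetical helper: if the counts differ, the extra elements or file
    names are left untouched.
    """
    root = ET.fromstring(docbook_xml)
    # iter("imagedata") matches elements without a namespace (DocBook 4).
    for imagedata, filename in zip(root.iter("imagedata"), image_files):
        imagedata.set("fileref", filename)
    return ET.tostring(root, encoding="unicode")
```

A call such as `relink_images(xml_text, ["images/figure-01.png", "images/figure-02.png"])` returns the updated XML as a string. Note that ElementTree drops the DOCTYPE declaration on output, so compare the result with the original before replacing it.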

zzz_archive/docs/dig_word_conversion.txt · Last modified: 2022/02/10 13:34 by 127.0.0.1

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International

© 2008-2022 GPLS and others. Evergreen is open source software, freely licensed under GNU GPLv2 or later.
The Evergreen Project is a U.S. 501(c)3 non-profit organization.