Jeremy Buhler, Karen Schneider, Tina Ji
Any text editor that can save plain-text files can be used as a DocBook authoring tool. Professional XML editors go further: they help you write error-free XML documents, validate your XML against a DTD or a schema, and keep you to a valid XML structure.
There are numerous XML editors available. Unless otherwise noted the editors below have the following features:
XML Mind Personal Edition (Transformation function is not available.)
For more DocBook authoring tools, go to DocBook Wiki
Many Evergreen community documents (documentation, guides, training manuals, etc.) are currently in Microsoft Word format. Some of these documents are being used as the basis of Evergreen "core" documentation, which will be produced in DocBook 5 XML.
There are three options for converting Word to DocBook 5: using brute force (rekeying the documents), using a commercial conversion tool, or using OpenOffice to do a crude, one-way, partial Word-to-DocBook conversion.
The DocBook wiki also maintains a list of conversion tools.
This is sometimes the best way to convert very small documents, especially those that are in types not supported by conversion tools, such as glossaries.
The OpenOffice method has a clear cost advantage (where labor is free or low-cost), but the bad news is that the resulting files will be in the DocBook 4 format, and will also be limited to a few tags and document types. The files will require extensive cleanup and reformatting. But it's a start.
Open the source file (the Word document) in OpenOffice. The file can now be re-saved in different formats by clicking File > Save As, then selecting the desired file type from the Save as type dropdown. Save the file as type DocBook (.xml).
This creates a DocBook XML file with the text and general layout of the source. Again, this produces a DocBook 4 file that will still need extensive editing/conversion to meet Evergreen project requirements (DocBook 5, to start with), but at least it's XML.
If you don't need to convert images from the source document, you're done; otherwise proceed to Step 2.
Much like HTML, DocBook XML files do not have embedded images. Instead a DocBook file "points" to separate image files to display. To render the images in DocBook you must first extract them from the source DOC file.
Use File > Save As to re-save the source file as type HTML Document. This will create several new files: one HTML file and an image file for each picture embedded in the source document.
The last step is to link the extracted images to the DocBook XML file created in Step 1. Delete the HTML file, rename the image files (optional), then open the DocBook file in an XML or text editor and edit the <imagedata> tags to point to the corresponding image files.
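As a rough sketch of that last step, the shell functions below list the fileref values on the <imagedata> tags of a DocBook file and print a copy of the file with a directory prefixed to each one. The file name manual.xml and the images directory in the usage note are hypothetical, and the sketch assumes each fileref is a bare file name and that the directory name contains no `|` or `&` characters. Review the result in an editor before relying on it.

```shell
# List every fileref value found on <imagedata> tags in a DocBook file.
# $1 = DocBook XML file
list_filerefs() {
    grep -o '<imagedata[^>]*fileref="[^"]*"' "$1" |
        sed 's/.*fileref="\([^"]*\)"/\1/'
}

# Print the file to standard output with DIR/ prefixed to each
# <imagedata> fileref value. The input file itself is not modified.
# $1 = DocBook XML file, $2 = directory that holds the renamed images
prefix_filerefs() {
    sed 's|\(<imagedata[^>]*fileref="\)\([^"]*"\)|\1'"$2"'/\2|g' "$1"
}
```

For example, `list_filerefs manual.xml` shows which image names the file currently points to, and `prefix_filerefs manual.xml images > manual-linked.xml` writes an updated copy.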
The DocBook website provides an excellent collection of default XSL stylesheets. In theory, these stylesheets can be further tweaked and customized. As the Evergreen documentation project gets under way, more guidance will be available, but for now, the best advice is to stick with the default stylesheets. It will be challenging enough to get these to work the first few times.
Transformed with the standard XSL stylesheets, DocBook XHTML is, well, ugly: unstyled HTML. DocBook is a markup language, not a style language. Most DocBook sites style their HTML with CSS (Cascading Style Sheets). Usually, the CSS files are referenced during the XSL transformation process.
The Evergreen project currently does not have a set of CSS for DocBook.
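Once a stylesheet does exist, one common way to reference it is the html.stylesheet parameter of the DocBook XSL stylesheets, passed to xsltproc with --stringparam. The following is a minimal sketch, not project policy: the function name is made up for illustration, and style.css in the usage note is a hypothetical file that would need to sit where the generated HTML can find it.

```shell
# Transform a DocBook file to XHTML and link a CSS file from the output.
# $1 = directory holding the DocBook XSL stylesheets
# $2 = DocBook source file
# $3 = CSS file name or URL to write into the generated <link> element
xhtml_with_css() {
    xsltproc --stringparam html.stylesheet "$3" \
        "$1/xhtml/docbook.xsl" "$2"
}
```

A call such as `xhtml_with_css ~/doctools/docbook ~/eg_manual/index.xml style.css > ~/eg_manual/index.html` would write the styled page; the paths follow the layout used elsewhere in this guide.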
If you are using an XML editor that does not have a document transformation function, you can set up the transformation process yourself by following the guide below.
Every Linux distribution seems to ship with different tools for transforming DocBook. It is relatively simple to set up a set of transforms to XHTML and PDF using the standard XSLT stylesheets and FO tools. This guide will get you up and running quickly.
Only one binary package from your distribution is required. Every distribution makes the libxslt processor, xsltproc, available in some package. Look for a package named libxslt and install it.
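To confirm that the install worked, it can help to check that the commands used in the rest of this guide are on your PATH. The helper below is a small sketch; the function name check_tools is made up for illustration, and the list of tools in the usage note is an assumption based on the commands this guide runs.

```shell
# Report whether each named command can be found on the PATH.
# Returns 0 if all were found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools xsltproc wget tar unzip` prints one line per tool and returns a non-zero status if any of them is missing.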
In this phase, we download, extract, and create symbolic links to the build tools. You can probably use more recent versions of the tools as they become available.
    mkdir doctools
    cd doctools

    # Install the DocBook RelaxNG schema
    wget http://www.docbook.org/xml/5.0CR5/rng/docbook.rng

    # Install the DocBook XSL stylesheets
    wget http://downloads.sourceforge.net/docbook/docbook-xsl-1.73.2.tar.bz2
    tar xjf docbook-xsl-1.73.2.tar.bz2
    ln -sf docbook-xsl-1.73.2 docbook

    # Install Apache FOP
    wget http://apache.sunsite.ualberta.ca/xmlgraphics/fop/fop-0.94-bin-jdk1.4.tar.gz
    tar xzf fop-0.94-bin-jdk1.4.tar.gz
    ln -sf fop-0.94 fop

    # Install hyphenation support for Apache FOP
    wget http://downloads.sourceforge.net/offo/offo-hyphenation.zip
    unzip offo-hyphenation.zip
The following is a simple script for generating XHTML and PDF from a DocBook source file. It assumes that your tools are installed in a subdirectory called doctools within your home directory:
    FOP=~/doctools/fop
    XSL=~/doctools/docbook
    DOC=~/eg_manual

    # Generate XHTML
    xsltproc $XSL/xhtml/docbook.xsl $DOC/index.xml > $DOC/index.html

    # Generate PDF via FO
    xsltproc $XSL/fo/docbook.xsl $DOC/index.xml > $DOC/index.fo
    $FOP/fop $DOC/index.fo -pdf $DOC/index.pdf -c $FOP/fop.xconf
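Before transforming, it may be worth validating the source against the RelaxNG schema downloaded during setup. The sketch below assumes that xmllint is installed; it ships with libxml2 but is not part of the setup described in this guide. The function name validate_docbook is made up for illustration.

```shell
# Validate a DocBook file against a RELAX NG schema using xmllint.
# $1 = RELAX NG schema, $2 = DocBook source file
# Returns 2 if either file is unreadable, otherwise xmllint's exit status.
validate_docbook() {
    [ -r "$1" ] || { echo "schema not readable: $1" >&2; return 2; }
    [ -r "$2" ] || { echo "document not readable: $2" >&2; return 2; }
    # --noout suppresses the serialized document so only validation
    # messages are printed.
    xmllint --noout --relaxng "$1" "$2"
}
```

With the layout used in this guide, the call would look like `validate_docbook ~/doctools/docbook.rng ~/eg_manual/index.xml`.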
Sagehill.net points to the most popular free XML tools for Windows transforms. XMLMind and Eclipse are two Windows XML editors frequently mentioned on discussion lists.
Some XML processors that run on Windows, such as oXygen ($), automate all or part of the following, and for substantial editorial work in a Windows environment, investing in a serious tool may be worth your while. But the following will get you going with a free XML editing toolkit.
Note: The above packages can be downloaded from http://xmlsoft.org/sources/win32/ or ftp://ftp.zlatkovic.com/pub/libxml/. Besides these two packages, you may also need their dependencies - iconv and zlib - also available from these sources. They may also need to be placed within your PATH as described below.
Then on your computer:
Create a directory C:\XMLTOOLS\ to contain the DocBook tools.
You can set this permanently in your computer's environment variable settings if you like. (From My Computer's properties, look for the Advanced tab, then the Environment Variables button.)
rename C:\WINDOWS\system32\libxml2.dll libxml2.old
(although note that this might break some other tool on your Windows system…)
xsltproc C:\XMLTOOLS\docbook-xsl\xhtml\chunk.xsl C:\XMLTOOLS\docbook-xsl\tests\refentry.007.ns.xml
This should produce three HTML files in your current working directory. If it does, then your DocBook processing toolchain is set up to successfully produce XHTML.