This document describes dbtotexi, a simple utility for converting XML documents that conform to a subset of the DocBook DTD into GNU texinfo format. The dbtotexi program is implemented using the XSL Transformations language as described in the working draft http://www.w3.org/TR/1999/WD-xslt-19990421. A Java-based XSL engine [1] carries out the actual transformation as determined by the style sheet dbtotexi.xsl. A small amount of additional Java code provides a few utility routines not provided by the XSL implementation.
This software is subject to the terms of the GNU General Public License. Please see the file COPYING for details. The license terms that apply to the supplied third party software contained in the files sax.jar, xp.jar and xt.jar are specified in the files sax-copying.txt, xp-copying.txt and xt-copying.txt respectively.
Once the tar archive has been unpacked [2], check the Makefile to see if the settings at the top are suitable for your site, then run make followed by make install. By default, the dbtotexi bash shell script goes into /usr/local/bin and the support files into /usr/local/share/dbtotexi. A compiled version of the Java support code is supplied, so you do not need a Java compiler unless you change the Java code.
The installation defaults to using Sun's jre VM but any JDK 1.1 compliant implementation (such as Kaffe [3]) should work. No GUI facilities or additional libraries are required. If you use a different VM, the shell script dbtotexi.sh may need editing.
A DocBook source file, foo.xml, is converted to texinfo format very simply:
dbtotexi foo.xml
This produces output in foo.texinfo. The name of the output file can be explicitly specified as a second argument. If the output file name is specified as -, the output is sent to stdout. A third argument specifies the name of the info file to produce; this defaults to the input file name modified to have a .info suffix. Any DocBook elements that are not recognised (either because of an error in the input document or because the translator does not yet support a translation for that element) are reported to stderr and shown in the output in bold.
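The argument defaults described above can be summarised in a short Python sketch. This is not the code in the dbtotexi shell script; the function name resolve_names is hypothetical, and the assumption that the suffix is swapped on the full input path (directory included) is mine.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def resolve_names(source: str,
                  output: Optional[str] = None,
                  info: Optional[str] = None) -> Tuple[str, str]:
    """Return (output_name, info_name) following the documented defaults.

    An output name of "-" is returned unchanged; the caller would treat it
    as "write to stdout".
    """
    src = PurePosixPath(source)
    if output is None:
        # Default: foo.xml -> foo.texinfo
        output = str(src.with_suffix(".texinfo"))
    if info is None:
        # Default: input file name with a .info suffix
        info = str(src.with_suffix(".info"))
    return output, info


print(resolve_names("foo.xml"))
print(resolve_names("foo.xml", "-"))
print(resolve_names("foo.xml", "manual.texi", "manual.info"))
```

The second call corresponds to running dbtotexi foo.xml - and the third to supplying all three arguments explicitly.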
A document that conforms to the SGML DocBook DTD must first be converted to XML before it can be processed by dbtotexi. This can be done using the sx program that is part of James Clark's SP SGML toolset. Typical usage would be:
sx -xlower foo.sgm > foo.xml
Note
The XML version of the DocBook DTD is not actually required by the conversion process (but see the section on the texinfo processing instruction). In fact, if the document to be converted doesn't contain a DOCTYPE declaration then the conversion process is somewhat quicker. Irrespective of whether the document contains a DOCTYPE declaration, it should be valid (i.e. it should conform to the DocBook XML DTD).
This section describes how the translation of some of the elements is influenced by the setting of the element's role attribute.
indexterm
The role attribute can be set to one of c, f, v, k, p and d to indicate which index the entry should be entered in. If the role attribute is not specified, the entry is entered into the concept index by default.
index
The role attribute can be set to one of c, f, v, k, p and d to indicate which index should be output. If the role attribute is not specified, the concept index is output by default.
variablelist
The role attribute can be set to either bold or fixed to indicate that the list's terms should be displayed in bold or fixed-width font respectively. If the role attribute is not specified, the list's terms are displayed "as is".
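As an illustration of the indexterm and index roles, the following Python sketch maps each role letter to a standard Texinfo index command and index name. The pairing of d with the data type index (@tindex / tp) and the fallback to the concept index for an unknown letter are assumptions; the source only lists the letters and the concept-index default. The function names are hypothetical.

```python
from typing import Optional

# role letter -> (index entry command, name used with @printindex)
INDEX_COMMANDS = {
    "c": ("cindex", "cp"),  # concept index (the documented default)
    "f": ("findex", "fn"),  # function index
    "v": ("vindex", "vr"),  # variable index
    "k": ("kindex", "ky"),  # keystroke index
    "p": ("pindex", "pg"),  # program index
    "d": ("tindex", "tp"),  # assumption: "d" selects the data type index
}


def _lookup(role: Optional[str]):
    # Assumption: a missing or unknown role falls back to the concept index.
    return INDEX_COMMANDS.get(role or "c", INDEX_COMMANDS["c"])


def index_entry(term: str, role: Optional[str] = None) -> str:
    """Texinfo line for an indexterm with the given role."""
    command, _ = _lookup(role)
    return f"@{command} {term}"


def print_index(role: Optional[str] = None) -> str:
    """Texinfo line for an index element with the given role."""
    _, name = _lookup(role)
    return f"@printindex {name}"


print(index_entry("style sheet"))
print(index_entry("main", "f"))
print(print_index("v"))
```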
texinfo Processing Instruction
The texinfo processing instruction can be used within a document to insert arbitrary markup into the output. The characters @, { and } are not escaped. This facility can be used to define entities that contain texinfo markup. For example, given that the following general entity declaration is placed in the DTD subset:
<!ENTITY hellip "<?texinfo @dots{}?>">
One can write &hellip; and expect to get dots...!
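The difference between ordinary text and the content of a texinfo processing instruction can be sketched in Python as follows. In Texinfo, the characters @, { and } are written literally as @@, @{ and @}; the node kinds "text" and "pi" and both function names are hypothetical and are not taken from dbtotexi.xsl.

```python
def escape_texinfo(text: str) -> str:
    """Prefix each Texinfo special character (@, {, }) with @."""
    return "".join("@" + ch if ch in "@{}" else ch for ch in text)


def render_node(kind: str, data: str) -> str:
    """Render one node: PI content is copied verbatim, text is escaped."""
    if kind == "pi":
        return data
    return escape_texinfo(data)


print(render_node("text", "user@example.org {sic}"))
print(render_node("pi", "@dots{}"))
```

The second call shows why the hellip entity works: its replacement text reaches the output as the @dots{} command rather than as escaped characters.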
dircategory & direntry Processing Instructions
The dircategory and direntry processing instructions may be used to set the resulting info file's directory category and menu entry. These processing instructions are best positioned after the document type declaration but before the first element (<book> or <article>). Here's what this document uses:
<?dircategory Texinfo documentation system?>
<?direntry * Dbtotexi: (dbtotexi). DocBook to Texinfo convertor.?>
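To show how these two processing instructions could end up in the texinfo output, here is a small Python sketch that reads them with the standard xml.dom.minidom module and builds the usual @dircategory / @direntry block. It is an illustration only: the sample document omits the DOCTYPE declaration for brevity, the function name directory_block is hypothetical, and the exact layout dbtotexi emits may differ.

```python
from xml.dom import minidom

SAMPLE = """<?xml version="1.0"?>
<?dircategory Texinfo documentation system?>
<?direntry * Dbtotexi: (dbtotexi). DocBook to Texinfo convertor.?>
<article><title>dbtotexi</title></article>
"""


def directory_block(xml_text: str) -> str:
    """Collect top-level dircategory/direntry PIs into a Texinfo block."""
    doc = minidom.parseString(xml_text)
    category = None
    entries = []
    # PIs placed before the first element are children of the document node.
    for node in doc.childNodes:
        if node.nodeType != node.PROCESSING_INSTRUCTION_NODE:
            continue
        if node.target == "dircategory":
            category = node.data
        elif node.target == "direntry":
            entries.append(node.data)
    lines = []
    if category is not None:
        lines.append(f"@dircategory {category}")
    if entries:
        lines.append("@direntry")
        lines.extend(entries)
        lines.append("@end direntry")
    return "\n".join(lines)


print(directory_block(SAMPLE))
```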
A few Unicode characters are recognised in element content and converted into the equivalent texinfo command. Unrecognised Unicode characters are passed through unchanged. Norman Walsh's DocBook XML DTD defines the ISO entity set in terms of Unicode characters. The table at the end of this document lists the set of Unicode characters that are currently recognised.
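The conversion described above amounts to a per-character lookup with pass-through for anything unrecognised. The Python sketch below covers only a handful of characters; the commands on the right are standard Texinfo commands, but whether dbtotexi emits exactly these forms is an assumption, and the names TEXINFO_COMMANDS and translate are hypothetical.

```python
# A small, illustrative subset of the recognised characters.
TEXINFO_COMMANDS = {
    "\u00a1": "@exclamdown{}",    # iexcl
    "\u00a3": "@pounds{}",        # pound
    "\u00a9": "@copyright{}",     # copy
    "\u00bf": "@questiondown{}",  # iquest
    "\u00e9": "@'e",              # eacute
    "\u00fc": '@"u',              # uuml
    "\u00e7": "@,{c}",            # ccedil
    "\u2022": "@bullet{}",        # bull
    "\u2026": "@dots{}",          # hellip
}


def translate(text: str) -> str:
    """Replace recognised characters; pass everything else through unchanged."""
    return "".join(TEXINFO_COMMANDS.get(ch, ch) for ch in text)


print(translate("caf\u00e9 \u2026"))
print(translate("\u00a9 1999"))
```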
A couple of points should be borne in mind:
More information can be found from these links:
http://www.w3.org/TR/WD-xslt
http://www.jclark.com/ (xt and the SP toolset)
http://nwalsh.com/
http://www.kaffe.org/
The following table lists the set of Unicode characters that are currently recognised. The name of the XML entity that yields each character is also listed.
| Unicode Character | Rendered As | Entity Name |
|---|---|---|
| 00a0 | | nbsp |
| 00a1 | ¡ | iexcl |
| 00a3 | £ | pound |
| 00a9 | © | copy |
| 00bf | ¿ | iquest |
| 00c6 | Æ | AElig |
| 00df | ß | szlig |
| 00e6 | æ | aelig |
| 2022 | | bull |
| 2026 | ... | hellip |
| | | |
| 0131 | i | inodot |
| | | |
| 00a8 | ¨ | uml |
| 00e4 | ä | auml |
| 00c4 | Ä | Auml |
| 00eb | ë | euml |
| 00cb | Ë | Euml |
| 00ef | ¨i | iuml |
| 00cf | Ï | Iuml |
| 00f6 | ö | ouml |
| 00d6 | Ö | Ouml |
| 00fc | ü | uuml |
| 00dc | Ü | Uuml |
| 00ff | ÿ | yuml |
| 0178 | ¨Y | Yuml |
| | | |
| 00b4 | ´ | acute |
| 00e1 | á | aacute |
| 00c1 | Á | Aacute |
| 00e9 | é | eacute |
| 00c9 | É | Eacute |
| 00ed | ´i | iacute |
| 00cd | Í | Iacute |
| 00f3 | ó | oacute |
| 00d3 | Ó | Oacute |
| 00fa | ú | uacute |
| 00da | Ú | Uacute |
| 00fd | ý | yacute |
| 00dd | Ý | Yacute |
| 0107 | ´c | cacute |
| 0106 | ´C | Cacute |
| 01f5 | ´g | gacute |
| 013a | ´l | lacute |
| 0139 | ´L | Lacute |
| 0144 | ´n | nacute |
| 0143 | ´N | Nacute |
| 0155 | ´r | racute |
| 0154 | ´R | Racute |
| 015b | ´s | sacute |
| 015a | ´S | Sacute |
| 017a | ´z | zacute |
| 0179 | ´Z | Zacute |
| | | |
| 00b8 | ¸ | cedil |
| 00e7 | ç | ccedil |
| 00c7 | Ç | Ccedil |
| 0122 | ¸G | Gcedil |
| 0137 | ¸k | kcedil |
| 0136 | ¸K | Kcedil |
| 013c | ¸l | lcedil |
| 013b | ¸L | Lcedil |
| 0146 | ¸n | ncedil |
| 0145 | ¸N | Ncedil |
| 0157 | ¸r | rcedil |
| 0156 | ¸R | Rcedil |
| 015f | ¸s | scedil |
| 015e | ¸S | Scedil |
| 0163 | ¸t | tcedil |
| 0162 | ¸T | Tcedil |
| | | |
| 00af | ¯ | macr |
| 0101 | a¯ | amacr |
| 0100 | A¯ | Amacr |
| 0113 | e¯ | emacr |
| 0112 | E¯ | Emacr |
| 012a | I¯ | Imacr |
| 012b | i¯ | imacr |
| 014c | O¯ | Omacr |
| 014d | o¯ | omacr |
| 016b | u¯ | umacr |
| 016a | U¯ | Umacr |
| | | |
| 00e2 | â | acirc |
| 00c2 | Â | Acirc |
| 00ea | ê | ecirc |
| 00ca | Ê | Ecirc |
| 00ee | ^i | icirc |
| 00ce | Î | Icirc |
| 00f4 | ô | ocirc |
| 00d4 | Ô | Ocirc |
| 00fb | û | ucirc |
| 00db | Û | Ucirc |
| 0109 | ^c | ccirc |
| 0108 | ^C | Ccirc |
| 011d | ^g | gcirc |
| 011c | ^G | Gcirc |
| 0125 | ^h | hcirc |
| 0124 | ^H | Hcirc |
| 0135 | ^j | jcirc |
| 0134 | ^J | Jcirc |
| 015d | ^s | scirc |
| 015c | ^S | Scirc |
| 0175 | ^w | wcirc |
| 0174 | ^W | Wcirc |
| 0177 | ^y | ycirc |
| 0176 | ^Y | Ycirc |
| | | |
| 00e0 | à | agrave |
| 00c0 | À | Agrave |
| 00e8 | è | egrave |
| 00c8 | È | Egrave |
| 00ec | `i | igrave |
| 00cc | Ì | Igrave |
| 00f2 | ò | ograve |
| 00d2 | Ò | Ograve |
| 00f9 | ù | ugrave |
| 00d9 | Ù | Ugrave |
| | | |
| 00e3 | ã | atilde |
| 00c3 | Ã | Atilde |
| 00f1 | ñ | ntilde |
| 00d1 | ~N | Ntilde |
| 00f5 | õ | otilde |
| 00d5 | Õ | Otilde |
| 0129 | ~i | itilde |
| 0128 | ~I | Itilde |
| 0169 | ~u | utilde |
| 0168 | ~U | Utilde |
[1] Currently, I am using James Clark's xt.
[2] You must have done that already to be reading this!
[3] http://www.kaffe.org/