LSB Addendum: SGML & XML

Enclosed is a proposal, submitted by Eric Bischoff, for the LSB regarding SGML & XML. A more general proposal has been submitted to the Filesystem Hierarchy Standard workgroup for adoption.

It is proposed that the enclosed detailed draft be adopted as an addendum to the LSB written specification. A new SourceForge CVS module would be created so that this document could initially be maintained separately from the ongoing API/ABI written specification.

This document is still under active discussion. It is neither a standard nor a recommendation, and it is currently used only as a basis for further discussion.

Introduction

As part of a standardization effort, about thirty people, including packagers from several Linux distributions and developers of SGML-related tools such as the SGML-Tools and DocBook-tools projects, informally discussed and agreed on a series of recommendations that will be submitted as a draft to the Linux Standard Base project. A reference implementation will also be developed as part of the DocBook-tools project.

Work on this document started as an attempt to end the nightmare of DocBook distributions, but it quickly became apparent that the approach was generic enough to apply to any SGML or XML DTD. The rationale for all our choices is given in a separate document.
Following a list of definitions, you will find a set of recommendations:

  R001--SGML Directory layout
  R002--DocBook Directory layout
      (standard names for directories, their contents)
  R003--Open Catalog usage for SGML
  R004--Open Catalog usage for DocBook
      (for the centralized catalogs and for the individual catalogs)
  R005--Configuration files
      (other /etc/sgml files)
  R006--ISO-entities
      (file names and FPI declarations)
  R007--Packages
      (how to package this type of material)

We would like to thank the following people, who participated intensively in this standardization effort:

  Camille Begnis (MandrakeSoft) camille@mandrakesoft.com
  Eric Bischoff (Caldera, KDE) eric@caldera.de
  Karl Eichwalder (SuSE) ke@suse.de
  Mark Galassi (DocBook-tools) rosalia@lanl.gov
  Jorge Godoy (Conectiva) godoy@conectiva.com.br
  Cees de Groot (SGML-tools) cg@cdegroot.com
  Jochem Huhmann joh@revier.com
  David Mason (RedHat, Gnome) dcm@redhat.com
  Manoj Srivastava (Debian) srivasta@datasync.com
  Norman Walsh (Sun, OASIS) ndw@nwalsh.com

and the many other people who contributed.

Definitions

In the scope of this document, we will use the following terms:

Centralized catalog
    An Open Catalog that contains only comments and CATALOG directives pointing to other catalogs (or DELEGATE directives, if supported).

DTD
    A Document Type Definition. It specifies the syntax used in documents. Well-known DTDs include HTML, XHTML, DocBook, TEI, MathML, MusicML, etc. SGML and XML provide a framework for writing DTDs.

Open Catalog
    A set of directives defined by OASIS Technical Resolution TR9401, mostly used for defining equivalences between FPIs (Formal Public Identifiers) and real file names (see TR9401:1997 on http://www.oasis-open.org).

Package
    A set of files assembled together for distribution. This includes RPMs, DEBs, and packages for any other packaging system.
SGML/XML computer program
    Any program used to view, edit, convert, or otherwise process a document written with an SGML or XML DTD (Document Type Definition). This includes command-line utilities as well as GUI-based applications.

Style sheets
    Declarations or scripts that define formatting during the conversion or editing of an SGML or XML document. They can be written in any style sheet language: CSS, DSSSL, FOSIs, XSL, ...

Super catalog
    An Open Catalog pointing to all the centralized catalogs.

R001--SGML Directory layout

/etc/sgml/
    Configuration files, including centralized catalogs:
      *.conf                            generic configuration files
      sgml-docbook.cat, tei.cat, ...    DTD-specific centralized catalogs
      catalog                           the super catalog
      ...

/usr/share/sgml/
    Architecture-independent files used by SGML/XML computer programs: Open Catalogs (other than the centralized ones), DTDs, entities, style sheets, and other declarative files, if any. It is organized into DTD-specific subdirectories:
      docbook/
      tei/
      html/
      ...
    Data that are not DTD-specific go directly into /usr/share/sgml, preferably in their own directory.

At least for the present, all XML documents are also SGML documents, so it seems unnecessary to create /usr/share/xml and /etc/xml.

R002--DocBook Directory layout

This is the layout for a Jade-based or an OpenJade-based system. Systems based on other SGML/XML computer programs can use this layout as well. The lower-level directories correspond to packages and carry version numbers.

  /usr/share/sgml/
      sgml-iso-entities-8879.1986/
      xml-iso-entities-8879.1986/
          (the ISO entities)
      jade-1.2.1/
      openjade-1.3/
      ...
          (architecture-independent files of the parsers and DSSSL engines)
      ...
  /usr/share/sgml/docbook/
      sgml-dtd-3.1/
      sgml-dtd-4.0/
      xml-dtd-4.0/
          (the DocBook DTD)
      dsssl-stylesheets-1.54/
      xsl-stylesheets-1.12/
          (DSSSL and XSL style sheets for DocBook)
      kde-customization-0.1/
      gnome-customization-0.1/
      ldp-customization-0.1/
          (customized DTDs, entities and style sheets for the various projects)
      ...

  (version numbers in this list are arbitrary examples)

R003--Open Catalog usage for SGML

Open Catalog files include:

  - the individual catalogs provided with the DTDs, style sheets, or entities;
  - the centralized catalogs, used as a central source of information specific to DocBook, TEI, or any other DTD;
  - the super catalog, which indirectly references all the available catalog files.

The centralized catalog file names must end in .cat, and the files must reside in /etc/sgml. They contain only comments and CATALOG directives pointing to the real catalogs, for example:

    -- sample contents of /etc/sgml/foo-1.05.cat --
    CATALOG /usr/share/sgml/foo/xml-dtd-1.05/catalog
    CATALOG /usr/share/sgml/foo/xsl-stylesheets-0.1/catalog

DELEGATE can be used instead of CATALOG if this directive is known to be supported.

The centralized catalogs are DTD-specific and can be version-numbered. Examples of such centralized catalogs:

  /etc/sgml/
      sgml-docbook.cat
      sgml-docbook-3.1.cat
      sgml-docbook-4.0.cat
      xml-docbook-4.0.cat

Version-less centralized catalogs may simply be symbolic links to the latest version (or to any older version).

/etc/sgml/catalog is the super catalog. It contains CATALOG pointers to all the centralized catalogs:

    -- sample contents of /etc/sgml/catalog --
    CATALOG /etc/sgml/sgml-docbook.cat
    CATALOG /etc/sgml/xhtml.cat
    CATALOG /etc/sgml/mathml.cat

Here too, DELEGATE can be used instead of CATALOG if this directive is known to be supported. The super catalog should not point to a centralized catalog that is merely a symbolic link to one it already references.
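The indirection described in R003 can be illustrated with a minimal Python sketch of how a program might walk from the super catalog through the centralized catalogs down to the individual catalogs. It is an illustration under stated assumptions, not part of any existing tool: the function names catalog_entries and collect_catalogs are hypothetical, the tokenizer handles only common TR9401 syntax, BASE is not honored, and DELEGATE entries are not followed. Symbolic links are resolved with os.path.realpath, so a centralized catalog reached both directly and through a version-less link is read only once.

```python
import os
import re
import tempfile

# Open Catalog comments are delimited by "--" ... "--"; literals may be
# single- or double-quoted; anything else is a whitespace-separated token.
_TOKEN = re.compile(r'--.*?--|"[^"]*"|\'[^\']*\'|\S+', re.S)

# Number of arguments taken by each TR9401 keyword (simplified table).
_ARITY = {"PUBLIC": 2, "SYSTEM": 2, "ENTITY": 2, "DOCTYPE": 2,
          "LINKTYPE": 2, "NOTATION": 2, "DELEGATE": 2, "DTDDECL": 2,
          "CATALOG": 1, "BASE": 1, "OVERRIDE": 1, "SGMLDECL": 1,
          "DOCUMENT": 1}


def catalog_entries(text):
    """Return the targets of all CATALOG directives in catalog text."""
    tokens = [t for t in _TOKEN.findall(text) if not t.startswith("--")]
    targets = []
    i = 0
    while i < len(tokens):
        keyword = tokens[i].upper()
        if keyword == "CATALOG" and i + 1 < len(tokens):
            targets.append(tokens[i + 1].strip("\"'"))
        # Skip the keyword and its arguments; unknown tokens are skipped alone.
        i += 1 + _ARITY.get(keyword, 0)
    return targets


def collect_catalogs(path, seen=None):
    """Depth-first list of the catalog files reachable from `path`.

    Files reached twice (for instance through a symbolic link) are read
    only once; missing files are silently ignored.
    """
    if seen is None:
        seen = set()
    real = os.path.realpath(path)
    if real in seen or not os.path.isfile(real):
        return []
    seen.add(real)
    with open(real, encoding="utf-8", errors="replace") as f:
        text = f.read()
    found = [path]
    for target in catalog_entries(text):
        # Relative targets are resolved against the referencing catalog.
        found += collect_catalogs(os.path.join(os.path.dirname(path), target), seen)
    return found


if __name__ == "__main__":
    # Throwaway tree mimicking R003: super -> centralized -> individual.
    root = tempfile.mkdtemp()
    etc = os.path.join(root, "etc", "sgml")
    dtd_dir = os.path.join(root, "share", "sgml", "docbook", "sgml-dtd-4.0")
    os.makedirs(etc)
    os.makedirs(dtd_dir)
    individual = os.path.join(dtd_dir, "catalog")
    centralized = os.path.join(etc, "sgml-docbook-4.0.cat")
    link = os.path.join(etc, "sgml-docbook.cat")
    super_catalog = os.path.join(etc, "catalog")
    with open(individual, "w") as f:
        f.write('PUBLIC "-//OASIS//DTD DocBook V4.0//EN" docbook.dtd\n')
    with open(centralized, "w") as f:
        f.write(f"-- DocBook 4.0 --\nCATALOG {individual}\n")
    os.symlink(centralized, link)  # version-less name -> latest version
    with open(super_catalog, "w") as f:
        # Deliberately lists the link AND its target, which R003 advises
        # against; the walk still reads the target only once.
        f.write(f"CATALOG {link}\nCATALOG {centralized}\n")
    for name in collect_catalogs(super_catalog):
        print(os.path.relpath(name, root))
```

Run as a script on a Linux system, it should print etc/sgml/catalog, etc/sgml/sgml-docbook.cat and share/sgml/docbook/sgml-dtd-4.0/catalog, in that order; the second reference to sgml-docbook-4.0.cat is skipped because it resolves to a file that has already been visited.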
Users should be able to define their own centralized catalogs and their own super catalog in their home directories:

    $HOME/.sgml-docbook.cat
    $HOME/.catalog

SGML/XML computer programs are not required to use the centralized catalogs, although doing so is strongly encouraged: if other mechanisms allow a program to locate the real catalogs, they can be used as well. However, distribution packagers should always make sure that the right entries are added to the super catalog and to the centralized catalogs.

The interface of a script named install-catalog that performs these maintenance tasks is:

    install-catalog --add|--remove centralized_catalog ordinary_catalog

Example:

    bash# install-catalog --add \
        /etc/sgml/sgml-docbook-3.1.cat \
        /usr/share/sgml/docbook/dsssl-stylesheets-1.54/catalog

The other catalogs should be placed in subdirectories of /usr/share/sgml and should all be named catalog. They are the ones that do the real work of mapping FPIs to file names (among other tasks).

R004--Open Catalog usage for DocBook

This recommendation follows directly from the preceding ones. For a distribution of DocBook based on Jade or OpenJade, we suggest the following names. Again, other SGML or XML DTDs and other computer programs can use a similar structure.

  /etc/sgml/
      sgml-docbook.cat
      xml-docbook.cat
      sgml-docbook-3.0.cat
      sgml-docbook-3.1.cat
      sgml-docbook-4.0.cat
      xml-docbook-4.0.cat

  /usr/share/sgml/sgml-iso-entities-8879.1986/catalog
  /usr/share/sgml/xml-iso-entities-8879.1986/catalog
  /usr/share/sgml/jade-1.2.1/catalog
  /usr/share/sgml/openjade-1.0/catalog
  /usr/share/sgml/docbook/sgml-dtd-3.0/catalog
  /usr/share/sgml/docbook/sgml-dtd-3.1/catalog
  /usr/share/sgml/docbook/sgml-dtd-4.0/catalog
  /usr/share/sgml/docbook/xml-dtd-4.0/catalog
  /usr/share/sgml/docbook/dsssl-stylesheets-1.54/catalog
  /usr/share/sgml/docbook/xsl-stylesheets-1.12/catalog

R005--Configuration files

Other configuration files may also reside in /etc/sgml, either DTD-specific or program-specific.
Their names should end in .conf, and they should follow the usual rules for files residing in /etc as defined by the LSB. Users should be able to override them in their home directories. Their syntax and purpose are not defined in this document.

R006--ISO-entities

The file names should be standardized as:

    ISOamsa.ent
    ISOamsb.ent
    ...

The public identifiers should be standardized likewise, for example:

    "ISO 8879:1986//ENTITIES Added Math Symbols: Arrow Relations//EN"

During the transition period, symbolic links and duplicate declarations will be allowed as a means of preserving compatibility with previous naming schemes.

R007--Packages

A C program can usually be compiled with any version of a given compiler, but an SGML document cannot be processed with just any version of its DTD: it needs the corresponding DTD version to reside on the same system, or at least to be reachable. The various versions of a given DTD may in turn require particular versions of the style sheets. This leads to an unusual situation in which old DTDs and style sheets should not be replaced when a package is updated.

We would like to make distribution packagers aware of the suggested solutions. They may choose to:

  - put the version number in the package name field (example: docbook-dtd-3.1-1.0.rpm), or
  - leave the version number out of the package name and use a subpackage for each version.
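Why the first R007 option protects old DTDs can be shown with a toy Python model, assuming that an update replaces an installed package only when the package names match (the usual behavior of an RPM or DEB upgrade, ignoring mechanisms such as Obsoletes or Replaces). The upgrade function and the package names are hypothetical illustrations, and the subpackage option is not modeled.

```python
def upgrade(installed, name, version):
    """Return the {package name: version} mapping after installing `name`.

    Toy model of an upgrade: the new package replaces any installed package
    with the same name and leaves every other package alone.
    """
    result = dict(installed)
    result[name] = version
    return result


# Version kept out of the name field: DocBook 4.0 displaces DocBook 3.1.
print(sorted(upgrade({"docbook-dtd": "3.1"}, "docbook-dtd", "4.0").items()))
# → [('docbook-dtd', '4.0')]

# Version in the name field: each DTD version is a distinct package, so the
# 3.1 DTD stays available for documents that still need it.
print(sorted(upgrade({"docbook-dtd-3.1": "1.0"}, "docbook-dtd-4.0", "1.0").items()))
# → [('docbook-dtd-3.1', '1.0'), ('docbook-dtd-4.0', '1.0')]
```

The same reasoning applies to style sheet packages whose versions are tied to particular DTD versions.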