The Crash Course to DocBook

1.4. Markup based on content

So how do you mark up your documents so that useful information can be extracted and indexed? The approach in DocBook is to provide a very rich set of markup tags that all relate to the structure and nature of the document's content.

Two examples of tags that help with generating automatic indices are <attribution> and <command>. If you have a large body of documentation (for example, all Sun software and hardware is documented with DocBook), you can easily search for any document that discusses a command called mount, or that contains a quote attributed to Ken Thompson. Better still, such a structured search finds occurrences of mount only where it is a command name, and of Thompson only where he is the author of a quote.
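To make the difference concrete, here is a minimal sketch of a structured search compared with a plain-text one. It is only an illustration: the DocBook fragment, its wording, and the helper names (plain_text_hits, find_commands, find_quotes_by) are all made up for this example, and the fragment is assumed to use no XML namespace (DocBook 5 documents do use one, so the element names would need qualifying there).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DocBook fragment, invented for this illustration.
SAMPLE = """\
<article>
  <para>Use <command>mount</command> to attach a file system.</para>
  <para>We had to mount a new disk shelf in the rack.</para>
  <blockquote>
    <attribution>Ken Thompson</attribution>
    <para>The text of the quote goes here.</para>
  </blockquote>
  <para>Thompson is mentioned here only in passing.</para>
</article>
"""


def plain_text_hits(root, word):
    """Count whole-word matches anywhere in the text, ignoring all markup."""
    text = "".join(root.itertext())
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


def find_commands(root, name):
    """Return the <command> elements whose content is exactly `name`."""
    return [el for el in root.iter("command")
            if (el.text or "").strip() == name]


def find_quotes_by(root, surname):
    """Return the <blockquote> elements whose <attribution> mentions `surname`."""
    quotes = []
    for quote in root.iter("blockquote"):
        for attribution in quote.findall("attribution"):
            if surname in "".join(attribution.itertext()):
                quotes.append(quote)
                break
    return quotes


root = ET.fromstring(SAMPLE)
print("plain-text hits for 'mount':   ", plain_text_hits(root, "mount"))
print("<command> hits for 'mount':    ", len(find_commands(root, "mount")))
print("plain-text hits for 'Thompson':", plain_text_hits(root, "Thompson"))
print("quotes attributed to Thompson: ", len(find_quotes_by(root, "Thompson")))
```

The plain-text search matches both paragraphs that contain the word mount, while the structured search matches only the one where mount is marked up as a command. The same holds for Thompson: only the marked-up attribution counts as a quote by him.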

Now imagine for a moment what would happen if the entire World Wide Web used a rich content-based markup language instead of HTML: a search engine would give you the information you need without all the extra references that just happen to use those words casually. As things stand, a plain search for mount on the web would almost certainly not turn up references to the UNIX mount command.

So a rich markup language like DocBook is a good idea from many points of view, but it can also be difficult to use. DocBook has hundreds of tags (compared with far fewer in HTML), so you might find the learning curve steep. It is steep, and the only way around that is to write documentation on how to use DocBook!

On the other hand, once you are familiar with DocBook, typing in markup all the time will not slow you down much. Keep in mind that most of the time a person is not writing, but rather worrying about meta-level problems with their document. If you use DocBook well you will spend a bit more time writing and a lot less time worrying about other issues like the layout on paper. (There is nothing you can do about the layout anyway!)