html2pot and po2html
html2pot and po2html are command-line tools for translating web sites. They work with any tool that handles po files, such as GNU gettext or KDE's KBabel. Their purpose is to separate the content to be translated from the markup. These tools are published under the GPL.
Advantages:
- avoid being annoyed by the HTML markup when translating web sites;
- be sure to have exactly the same structure of the web pages in the translation as in the original;
- translate only once a text that occurs at several places on several web pages;
- translate only the changed strings when the web site evolves;
- have the comfort of modern translation tools like KBabel;
- have an exact estimation of the quantity of text to translate;
- translate into many languages at different rhythms, without having to stop updating the web site while translating;
- publish web sites that are only half-translated.
Requirements:
- html2pot and po2html need a bash shell (as found on Linux, Mac OS X, etc.), basic utilities like awk and sed, and an XSLT processor like libxslt.
- html2pot and po2html have one major limitation: the web site must be "clean XHTML", i.e. it must validate against an XHTML DTD. You can check this with the W3C validator. On the other hand, this is less a requirement than a guarantee of quality for your web site ;-).
- Another current limitation is that these scripts know little about PHP, JavaScript, and other code embedded in your web site. That could change in the future, however.
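A quick way to check the command-line requirements above is to look the tools up on the PATH. This is only a sketch: it assumes that the XSLT processor is libxslt's xsltproc, and missing_tools is a hypothetical helper name, not part of html2pot or po2html.

```shell
#!/usr/bin/env bash
# Print the name of every given command that cannot be found on the PATH.
# missing_tools is a hypothetical helper, not shipped with html2pot/po2html.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Possible use (xsltproc is an assumption about the XSLT processor in use):
#   missing_tools bash awk sed xsltproc
```

If the call prints nothing, all the listed tools were found.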
Work cycle:
- Modify the web pages with your favourite HTML editing tools. Make sure they validate against XHTML.
- Extract the po template: html2pot *.html > en.pot
- Create a first, empty translation: cp en.pot de.po
or merge with previous translations: msgmerge de.po en.pot > de.new; mv de.new de.po
- Translate the new or fuzzy strings: kbabel de.po &
- Publish the translation:
for file in *.html; do po2html de.po "$file" > "de/$file"; done
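The work cycle above can be wrapped in two small shell functions, one per language-specific step. This is a minimal sketch, assuming that html2pot, po2html and msgmerge are on the PATH and that each language gets its own output directory; update_po and publish are hypothetical helper names, not part of the tools.

```shell
#!/usr/bin/env bash
# update_po and publish are hypothetical helpers, not shipped with the tools.

# Create LANG.po from the template, or merge the template into an existing one.
update_po() {
    local lang=$1 template=$2
    if [ -f "$lang.po" ]; then
        msgmerge "$lang.po" "$template" > "$lang.new" && mv "$lang.new" "$lang.po"
    else
        cp "$template" "$lang.po"
    fi
}

# Write a translated copy of each given page into the directory LANG/.
publish() {
    local lang=$1 file
    shift
    mkdir -p "$lang"
    for file in "$@"; do
        po2html "$lang.po" "$file" > "$lang/$file"
    done
}

# Possible use for German:
#   html2pot *.html > en.pot
#   update_po de en.pot
#   kbabel de.po        # translate the new or fuzzy strings
#   publish de *.html
```

Keeping the merge and publish steps separate lets the same template be reused for several languages, each translated at its own pace.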
Usage:
- html2pot file1.html file2.html ... > template.pot
extract the strings to be translated
- po2html translation.po original.html > translation.html
merge the translation with the HTML skeleton
Known bugs:
- Problem with empty tags like <hr></hr>.
Please use <hr /> as a temporary workaround.
- HTML comments like <!-- --> get lost. It's a problem for
embedded JavaScript code, for example.
- Whitespace at the beginning of preformatted lines in <pre>
tags gets lost.
- The shell part is hard to maintain and slow. It needs to be rewritten
from scratch (in C?).
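The <hr /> workaround for the first bug can be applied mechanically before extraction. The sketch below is not part of html2pot; collapse_empty_hr is a hypothetical helper that assumes a sed supporting the -E option (GNU or BSD) and only handles an <hr> whose opening and closing tags are on the same line.

```shell
#!/usr/bin/env bash
# Rewrite <hr></hr> (with or without attributes) as a self-closing <hr />.
# collapse_empty_hr is a hypothetical helper; it reads the given files,
# or standard input when no file is given, and writes to standard output.
collapse_empty_hr() {
    sed -E 's#<hr([^>]*)></hr>#<hr\1 />#g' "$@"
}

# Possible use:
#   collapse_empty_hr page.html > page.fixed.html
```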
Download:
This is beta software. Use at your own risk. The latest version is 0.1.4:
Credits:
Many thanks to Stephan Kulow for the inspiration (the xml2pot and po2xml scripts for DocBook)!
Last updated: 2005-05-23. Maintained by webmaster@bureau-cornavin.com.