html2pot and po2html
html2pot and po2html are command-line tools for translating web sites. They work with any tool that handles PO files, such as GNU gettext or KDE's Lokalize. Their purpose is to separate the content to be translated from the markup. The tools are published under the GNU GPL, version 2.
Advantages:
- avoid being annoyed by the HTML markup when translating web sites;
- be sure to have exactly the same structure of the web pages in the translation as in the original;
- translate only once a text that occurs at several places on several web pages;
- translate only the changed strings when the web site evolves;
- have the comfort of modern translation tools like Lokalize;
- have an exact estimation of the quantity of text to translate;
- translate into many languages at different rhythms, without having to stop updating the web site while translations are in progress;
- publish web sites that are only half-translated.
Requirements:
- html2pot and po2html need a bash shell (as found on Linux, Mac OS X, etc.), basic utilities like awk and sed, and an XSLT processor like libxslt.
- html2pot and po2html have one major requirement: the web site must be "clean XHTML", i.e. it must validate against the XHTML DTD. You can check this with the W3C validator. This is less a limitation than a guarantee of quality for your web site ;-).
- Another current limitation is that the scripts know little about PHP, JavaScript, and other code embedded in your web pages. That could change in the future.
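The requirements above can be checked before a first run. The following is only a sketch: the function name missing_tools is made up for this example, and the tool names passed to it are assumptions (xsltproc is the command-line front end shipped with libxslt; whether html2pot calls exactly that command is not stated here).

```shell
#!/usr/bin/env bash
# missing_tools is a hypothetical helper, not part of html2pot/po2html.
# It prints the name of every given command that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (tool list is an assumption, adapt it to your system):
#   missing_tools bash awk sed xsltproc
```

An empty output means everything was found. For a local XHTML check, xmllint --valid --noout page.html (from libxml2) validates a page against the DTD named in its DOCTYPE; it may need network access or a local catalog to load that DTD.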
Work cycle:
- Modify the web pages with your favourite HTML editing tools. Make sure they validate against XHTML.
- Extract the po template: html2pot *.html > en.pot
- Create a first, empty translation: cp en.pot de.po
or merge the new template with the previous translation: msgmerge de.po en.pot > de.new; mv de.new de.po
- Translate the new or fuzzy strings: kbabel de.po &
- Publish the translation (the de/ directory must already exist):
for file in *.html; do po2html "$file" de.po > "de/$file"; done
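The work cycle above can be wrapped in a small script when several languages are maintained. This is a minimal sketch, not part of the distribution: the function names update_po, publish and translate_site are invented for the example, and it assumes the original pages are the *.html files in the current directory and that each language is published into a directory named after its code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the work cycle; run from the directory
# that holds the original *.html pages.

# Create LANG.po from the template, or merge the template into an existing one.
update_po() {
    local lang=$1 template=$2
    if [ -f "$lang.po" ]; then
        msgmerge "$lang.po" "$template" > "$lang.new" && mv "$lang.new" "$lang.po"
    else
        cp "$template" "$lang.po"
    fi
}

# Rebuild every page for one language into a directory named after it.
publish() {
    local lang=$1 file
    mkdir -p "$lang"
    for file in *.html; do
        po2html "$file" "$lang.po" > "$lang/$file"
    done
}

# Full cycle: extract the template, then update and publish each language.
translate_site() {
    local lang
    html2pot *.html > en.pot || return 1
    for lang in "$@"; do
        update_po "$lang" en.pot && publish "$lang"
    done
}
```

Calling translate_site de fr would refresh en.pot, then de.po and fr.po, then the de/ and fr/ directories. The translation step itself (editing the .po files) still happens by hand between two runs.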
Usage:
- html2pot file1.html file2.html ... > template.pot
extract the strings to be translated
- po2html original.html translation.po > translation.html
merge the translation with the HTML skeleton
Known bugs:
- XML and DTD declarations get lost during the transformation; you may want to re-add them manually.
- HTML comments like <!-- --> get lost. This is a problem for embedded JavaScript code, for example.
- Whitespace at the beginning of preformatted lines in <pre> tags gets lost.
- The shell part is hard to maintain and slow. It needs to be rewritten from scratch (in C?).
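The first bug in the list (lost XML and DTD declarations) can be worked around with a few lines of shell. The sketch below is an assumption-laden example, not part of the tools: restore_prolog is a made-up name, and it only works when the declarations in the original page sit on their own lines before the line containing the <html start tag, and when the translated page does not already contain them.

```shell
#!/usr/bin/env bash
# restore_prolog is a hypothetical helper, not part of html2pot/po2html.
# It prints the lines of ORIGINAL that precede the first line containing
# "<html", followed by the whole TRANSLATED page.
restore_prolog() {
    local original=$1 translated=$2
    awk '/<html/ { exit } { print }' "$original"
    cat "$translated"
}

# Example call (file names are placeholders):
#   restore_prolog index.html de/index.html > de/index.fixed.html
```

Writing to a separate file and renaming it afterwards avoids reading and overwriting the same file in one command.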
Download:
The latest version is 0.3.
Credits:
Many thanks to Stephan Kulow for the inspiration (xml2pot and po2xml scripts for DocBook)!
Last updated: 2011-01-11. Maintained by webmaster@bureau-cornavin.com.