Internationalization (I18N), Localization (L10N), and Globalization (G11N)

Evergreen started as a project for the state of Georgia. Fortunately, the developers provided hooks for translating most of the Evergreen catalog and staff client interface. This page tries to explain what is currently possible, identify outstanding problems, and point in the direction that Evergreen is headed. Feel free to help out!

Localization

Build environment

The translatable strings from the i18n files are being converted to GNU gettext (POT and PO) format for ease of translation. The tools for converting the files between their native format and the gettext format live in the build/i18n/ directory. They require a few supplemental tools over and above the normal Evergreen requirements.

Prerequisites

Install the translation build tools:

make -f Open-ILS/src/extras/Makefile.install <osname>-translator

Creating or updating a set of PO files for translation

To create or update a set of PO files for a particular locale from the translatable files in your current Evergreen source directory:

# Change into the i18n directory
cd build/i18n
# Create the 'locale' directory; this step is also required
mkdir locale
# Update the en-US POT files
make newpot
# Create or update the PO files for the locale 'fr-CA'
make LOCALE=fr-CA updatepo

Creating translated project files from translated PO files

To create the translated project files for a locale from a set of translated PO files in the build/i18n/po/<locale>/ directory:

# Create new project files for the locale 'fr-CA'
make LOCALE=fr-CA install

This will automatically update the PO files with the latest definitions from the en-US source POT files, then generate the full set of project files for the requested locale. Any strings that have not been translated, or that are marked as "fuzzy", will be substituted with the English version of the strings.

NOTE If you have trouble installing your translated PO files, make sure they are in valid gettext PO format. Things to look for include (but are not limited to):

Byte order marks. Gettext tools do not support a BOM (http://en.wikipedia.org/wiki/BOM).
Runaway newlines. Some translators like to end their translations with a newline. On each line, msgid and msgstr definitions must start and end with a double quote.
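As a rough aid for the checks above, the following shell sketch detects and strips a leading UTF-8 BOM, then asks msgfmt to validate each PO file. The has_bom and strip_bom helpers are hypothetical names, not part of the Evergreen build tools, and the po/fr-CA/ path assumes you are running from build/i18n.

```shell
# Hypothetical helper: succeed if the file starts with a UTF-8 BOM (EF BB BF).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical helper: remove a leading UTF-8 BOM in place, if present.
strip_bom() {
    if has_bom "$1"; then
        tail -c +4 "$1" > "$1.nobom" && mv "$1.nobom" "$1"
    fi
}

# Assumed layout: translated PO files live in po/fr-CA/ under build/i18n.
for po in po/fr-CA/*.po; do
    [ -e "$po" ] || continue               # skip if the glob matched nothing
    strip_bom "$po"
    msgfmt --check -o /dev/null "$po"      # reports PO syntax errors, writes no output file
done
```

msgfmt reports unterminated strings, so it should also catch the missing-quote problem described above.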

Inserting the strings into the database

A number of initial "seed" value strings are stored in the database. For each translated locale, we generate a "seed" value file of INSERT statements in Open-ILS/src/sql/Pg/950.data.seed-values-<LOCALE>.sql. You can add the translated strings for a given locale <LOCALE> to an existing Evergreen database instance by issuing the following command:

# Install the fr-CA strings in the Evergreen database:
#   * hostname 'localhost'
#   * username 'evergreen'
psql -h localhost -U evergreen -f Open-ILS/src/sql/Pg/950.data.seed-values-fr-CA.sql

NOTE This does not copy the correct files into /openils/var/web/opac/common/js/<locale>/

NOTE If loading 950.data.seed-values-<locale>.sql fails, you may first need to add your language definition to config.i18n_locale by issuing the following command:

INSERT INTO config.i18n_locale VALUES ('fi-FI','fin','Suomi','Suomen komia kieli');

Localizing the TPAC

To make your localization visible in the TPAC, you should follow these instructions: http://docs.evergreen-ils.org/2.3/_creating_a_new_skin_the_bare_minimum.html

Technical details behind localization

The technical details of how we handle localization in Evergreen (staff client, catalogue, and OpenSRF) have been moved to a separate page to keep this more practical page uncluttered.

Other internationalization features

We will want to explore how the catalog handles collation, diacritics, etc. I wrote a small blog post about these features at http://www.coffeecode.net/archives/105-Evergreen-internationalization-chat.html after talking to Mike Rylander for the first time in November 2006. An evaluation of these features on a working system is one of my goals for the research proposal I will be starting in the very near future.

Collation

The collating sequence for the entire PostgreSQL database cluster is determined by the initial --lc-collate parameter to the initdb command.

Mike has this crazy idea where, if you search for works written or performed in a specific language in advanced search, the collating sequence should dynamically switch to sort the results according to the rules for that particular language. If he can get that to work – awesome!

See Jan Pazdziora's project site for info and code for arbitrary collation support for PG.

Spell checking

Evergreen uses aspell with the en-US dictionary by default (this can be set in opensrf.xml in the /apps/open-ils.search/app_settings/spelling_dictionary element). It should be possible, however, to teach Open-ILS/src/perlmods/OpenILS/Application/Search.pm in Evergreen to use the dictionary corresponding to the user's chosen interface language.
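To illustrate the idea of choosing a dictionary from the interface language, here is a minimal shell sketch. It is not how Search.pm works today; locale_to_dict and pick_dict are hypothetical helpers, and the sketch assumes that the aspell dictionary name is the locale code with the hyphen replaced by an underscore (for example, fr_CA).

```shell
# Hypothetical helper: map a locale code such as 'fr-CA' to an
# aspell-style dictionary name such as 'fr_CA'.
locale_to_dict() {
    printf '%s\n' "$1" | sed 's/-/_/g'
}

# Hypothetical helper: print the dictionary for a locale, falling back to
# en_US when aspell is missing or has no dictionary by that name.
pick_dict() {
    wanted=$(locale_to_dict "$1")
    if command -v aspell >/dev/null 2>&1 &&
       aspell dump dicts 2>/dev/null | grep -qx "$wanted"; then
        printf '%s\n' "$wanted"
    else
        printf '%s\n' "en_US"
    fi
}

# Example: choose a dictionary for a user whose interface locale is fr-CA.
pick_dict fr-CA
```

Falling back to en_US keeps the current default behaviour for locales that have no installed dictionary.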

Diacritics in search

Search currently ignores diacritics (e == é == è) as diacritics are removed during indexing normalization. This can be made an optional normalization step in the future.
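The effect of that normalization can be illustrated with a small shell sketch. This is only a demonstration of the concept, not Evergreen's indexing code: fold_diacritics is a hypothetical helper, and it handles only the handful of precomposed characters listed.

```shell
# Hypothetical helper: replace a few accented letters with their base letter.
# Each mapping is a separate substitution so it works byte-wise in any locale.
fold_diacritics() {
    printf '%s\n' "$1" | sed \
        -e 's/é/e/g' -e 's/è/e/g' -e 's/ê/e/g' -e 's/ë/e/g' \
        -e 's/à/a/g' -e 's/â/a/g' \
        -e 's/ç/c/g'
}

# After folding, 'é' and 'è' both compare equal to 'e'.
fold_diacritics 'Montréal'
fold_diacritics 'père'
```

Making the step optional would amount to skipping this kind of folding for indexes where accents are significant.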

Number, date, time, currency formatting

Locale-specific data is currently formatted according to en-US conventions. We're looking at using Dojo in the catalog and staff client interfaces to format data according to the user's chosen locale.

Globalization (G11N)

We need to be able to reflect the requirements of different countries. Consider this a stub section that will eventually be replaced by a HOWTO document explaining, for example, how to alter the patron templates for states and zip codes to provinces and postal codes.

A simple approach is to modify the labels in the DTD for the staff client and the OPAC (Open-ILS/web/opac/locale/ll-LL/lang.dtd and Open-ILS/web/opac/locale/ll-LL/opac.dtd, respectively); for example, the labels are defined in en-US as:

$ grep -i zip *.dtd

lang.dtd:<!ENTITY staff.patron_display.mailing.post_code.label 'Mailing ZIP:'>
lang.dtd:<!ENTITY staff.patron_display.physical.post_code.label 'Physical ZIP:'>
lang.dtd:<!ENTITY staff.patron_search_form.post_code.label 'ZIP:'>
opac.dtd:<!ENTITY myopac.summary.address.zip "Zip">

There are other implications in the client code (for example, the JavaScript includes regular expressions that check the contents of a field and expect it to be a five-digit ZIP code). But it's a start.
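To show why a hard-coded five-digit check is a problem, the sketch below validates a postal code against a per-country pattern instead. The valid_post_code helper and its country keys are hypothetical and do not exist in the Evergreen client code; the Canadian pattern is a simplified letter-digit-letter, digit-letter-digit form.

```shell
# Hypothetical helper: validate a postal code against a per-country pattern.
# Usage: valid_post_code <country> <code>; returns 2 for an unknown country.
valid_post_code() {
    case "$1" in
        US) pattern='^[0-9]{5}$' ;;                                 # five-digit ZIP
        CA) pattern='^[A-Za-z][0-9][A-Za-z] ?[0-9][A-Za-z][0-9]$' ;;  # e.g. K1A 0B1
        *)  return 2 ;;
    esac
    printf '%s\n' "$2" | grep -Eq "$pattern"
}

# A Canadian postal code fails the US rule but passes the CA rule.
valid_post_code US 'K1A 0B1' || echo "rejected by the US pattern"
valid_post_code CA 'K1A 0B1' && echo "accepted by the CA pattern"
```

Keeping the patterns in one lookup makes it easier to add a country than editing regular expressions scattered through the interface code.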

Conify interface g11n problems

Just keeping track as I take a first pass through these interfaces…

Org unit interface:
- Major: Has inline JavaScript regexes for telephone number, zip code
