QA Labs brings you practical tips and tools on testing, quality assurance (QA), and related topics through this monthly e-newsletter.
Planning Ahead for Software Internationalization
Introduction
Localization does not begin with the translation of the source material. It starts during the development phase, by designing the product to be adaptable to different markets and customers and flexible enough to handle linguistic and cultural diversity. This process is called internationalization. A general definition of software internationalization and localization was outlined from the marketing and testing perspectives in our newsletters [Issue #3 and Issue #5]. This article addresses the common language and cultural issues involved in the software internationalization process.
There are many issues to consider when internationalizing software; the following are among the most important.
Text Expansion
Some languages require more space than others to communicate the same idea. It is not possible to predict precisely how long a translation will be, but studies comparing languages give a rough idea. For example, on average, a text in English (100%) would expand to 128% in Dutch and to 109% in Italian.
Text expansion can cause many headaches and is particularly critical for software designed for small devices, such as mobile phones. A title that fits nicely in English could be severely truncated in German.
By considering text expansion in advance, designers can avoid truncation with solutions such as wrapping text automatically onto two lines, using different fonts, or sizing dialog boxes and their controls dynamically.
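One way to act on this early is to check each source string against a width budget after applying an expansion factor. The following is a minimal sketch in Python, not a definitive method: the function names and the character-count budget are assumptions made for illustration, and the 128% Dutch figure quoted above is used as an assumed worst case. A real layout check would measure rendered width rather than count characters.

```python
import math

# Assumed worst-case expansion factor, based on the 128% Dutch figure above.
EXPANSION_FACTOR = 1.28

def expanded_length(source_text, factor=EXPANSION_FACTOR):
    """Estimate the translated length in characters, rounded up."""
    return math.ceil(len(source_text) * factor)

def fits(source_text, max_chars, factor=EXPANSION_FACTOR):
    """Return True if the estimated translation fits within max_chars."""
    return expanded_length(source_text, factor) <= max_chars

def strings_at_risk(strings, max_chars, factor=EXPANSION_FACTOR):
    """List the source strings whose estimated translation may be truncated."""
    return [s for s in strings if not fits(s, max_chars, factor)]

# Hypothetical titles checked against a 10-character field.
print(strings_at_risk(["OK", "Settings"], max_chars=10))
```

Running a check like this over all user-visible strings during development flags the fields that need wrapping or dynamic sizing before any translation is ordered.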
Reusing Strings
Text might have different translations depending on the context where it appears; therefore it's not always advisable to reuse individual strings. In the example below, the word "Unknown" appears in a property dialog in two different situations:
"Date": "Unknown"
"Size": "Unknown"
Why not consider "Unknown" as a single string and pay for just one translation? Effective internationalization takes into account that different situations may call for different translations. For example, in Spanish the word for "date" is feminine and the word for "size" is masculine, hence "Unknown" would be translated differently in each case:
"Fecha" [Date]: "Desconocida" [Unknown]
"Tamaño" [Size]: "Desconocido" [Unknown]
Concatenation
Concatenation means joining two or more strings to construct a new string. Translators then have to match fragments that belong together, which makes translation risky, as shown in the Hungarian example below.
"Unable to" ... (verb) (object) = "Nem" ...
"open" = "megnyitás"
"save" = "mentés"
"file" = "fájl"
"image" = "kép"
Concatenation of these strings causes an incorrect translation:
"Unable to open file" = "Nem megnyitás fájl"
"Unable to save image" = "Nem mentés kép"
Hungarian uses word endings instead of prepositions and the sentence structure is different from that used in English. To avoid incorrect or strange language constructions it is better not to concatenate and instead to create different strings for each case.
In other words, phrases and sentences should be made as independent as possible in the development phase to permit straightforward and accurate localization, as shown below:
"Unable to open file" = "A fájl nem nyitható meg"
"Unable to save image" = "A kép nem menthető"
Variables
A variable is a replaceable part of a string, e.g., "You received %S messages". Translating strings with variables is a challenge for internationalization. First of all, the item that will replace the variable should be clearly identified, e.g., a number of files or an e-mail address.
Then language-specific characteristics such as gender, plural, prepositions, etc., need to be considered to cover all possible combinations. In cases where there are several variables, even the order in which they appear may need to be considered with respect to localization.
"Downloading %S of %S pictures" could be "Downloading 1 of 5 pictures"
The string is correct if there are several pictures to be downloaded, but what if there is only 1 picture?
"Downloading 1 of 1 pictures"
To avoid problems with variables it is advisable to create different strings for each case, in this example, one for singular and another for plural.
"Downloading %S of %S picture"
"Downloading %S of %S pictures"
Another possible solution when localizing strings with variables is to use non-sentence constructions:
"Number of pictures downloaded: %S", where "%S" can be "0", "1", "2" or "30" with no grammatical problems.
Locale Data
The term "locale" means a "collection of rules and data specific to a language and a geographic area" [1]. Locales include representation of date and time, currency symbols, collation, etc., some of which are described below.
Calendars
Calendars may be localized to display different layouts. For example, in some countries the week doesn't start on Sunday, and the working days may vary as well. Although most countries use the Gregorian calendar, some use another calendar system, such as the Chinese lunar calendar.
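The first day of the week is a good example of a layout setting that should be data rather than code. The sketch below uses Python's standard calendar module; the function name and the hard-coded English abbreviations are assumptions made to keep the example short.

```python
import calendar

# English abbreviations indexed the way the calendar module numbers days
# (Monday is 0, Sunday is 6).
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_header(first_weekday):
    """Return day names in display order for a week starting on first_weekday."""
    cal = calendar.Calendar(firstweekday=first_weekday)
    return [DAY_NAMES[day] for day in cal.iterweekdays()]

print(weekday_header(calendar.SUNDAY))  # week starting on Sunday
print(weekday_header(calendar.MONDAY))  # week starting on Monday
```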
Date Formats
- Short date format: dd/mm/yy
- Long date format: day month, year, e.g., "15 de mayo, 2005"
- Order of date components: year-month-day, day-month-year
- Separator: slash, comma, hyphen, full stop, e.g., "15-05-05"
- Leading zero: can be used or not, e.g. "01/01/05" or "1/1/05"
- Abbreviations: in some cases (like MS Outlook) calendars may need days and months abbreviated to 1 or 2 characters, e.g., "01/Ja/05"
- Capitalization: days and months can be either capitalized or not, e.g., in English "January", but in Italian "gennaio"
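The points in the list above suggest keeping the date pattern outside the code, so that it can change per locale. A minimal sketch follows, assuming a hypothetical table of patterns; a real product would normally take these patterns from the platform's locale data rather than define its own.

```python
from datetime import date

# Hypothetical short date patterns, for illustration only.
SHORT_DATE_PATTERNS = {
    "day-month-year": "%d/%m/%y",
    "month-day-year": "%m/%d/%y",
    "year-month-day": "%Y-%m-%d",
}

def format_short_date(value, style):
    """Format a date using the pattern registered for the given style."""
    return value.strftime(SHORT_DATE_PATTERNS[style])

sample = date(2005, 5, 15)
print(format_short_date(sample, "day-month-year"))  # 15/05/05
print(format_short_date(sample, "month-day-year"))  # 05/15/05
print(format_short_date(sample, "year-month-day"))  # 2005-05-15
```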
Time Formats
- 12 or 24-hour clock: e.g., "5:30pm" or "17:30"
- Time separators: colon, comma, full stop, "h" (hour), e.g., "2:35" or "2h 35"
- AM/PM format: can be in upper or lower case, e.g., "5:30am" or "5:30AM"
- Position of AM/PM symbol: before or after the time expression
- Space before AM/PM symbol: can be used or not, e.g., "5:30pm" or "5:30 pm"
- Leading zero: can be used or not, e.g., "05:00 h"
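Most of the options in the list above can be expressed as parameters instead of being fixed in the code. The function below is a sketch under that assumption; its name and parameters are hypothetical, and it does not cover placing the AM/PM symbol before the time.

```python
def format_time(hour, minute, use_24_hour=True, separator=":",
                uppercase_marker=False, space_before_marker=False,
                leading_zero=False):
    """Format a time of day from explicit, locale-style options."""
    if use_24_hour:
        shown_hour, marker = hour, ""
    else:
        shown_hour = hour % 12 or 12  # 0 becomes 12, 17 becomes 5
        marker = "pm" if hour >= 12 else "am"
        if uppercase_marker:
            marker = marker.upper()
        if space_before_marker:
            marker = " " + marker
    hour_text = f"{shown_hour:02d}" if leading_zero else str(shown_hour)
    return f"{hour_text}{separator}{minute:02d}{marker}"

print(format_time(17, 30))                     # 17:30
print(format_time(17, 30, use_24_hour=False))  # 5:30pm
print(format_time(5, 0, leading_zero=True))    # 05:00
```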
Numbers, Currency, Units
- Leading/trailing currency symbol: e.g., "20€" or "€20"
- Space between symbol and numbers: e.g., "$8,500" or "$ 8,500"
- Number of decimal places used for currency: one, two, e.g., "3.41" or "3,4"
- Negative sign: represented with or without brackets, e.g., "(100)" or "-100"
- Thousands separators: comma, full stop, space, e.g., "25,000" or "25 000"
- Decimal separators: comma, full stop, e.g., "14.6" or "14,6"
- Different measurement units: metric, imperial, etc.
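The separator and symbol choices listed above can likewise be passed in as data. The two functions below are an illustrative sketch; their names and parameters are assumptions, and production code would usually rely on the platform's locale data instead.

```python
def format_number(value, decimals=2, thousands_sep=",", decimal_sep="."):
    """Format the absolute value with the given separators."""
    text = f"{abs(value):,.{decimals}f}"  # e.g. "25,000.50"
    # Swap separators through a placeholder so they cannot collide.
    text = text.replace(",", "\x00").replace(".", decimal_sep)
    return text.replace("\x00", thousands_sep)

def format_currency(value, symbol, symbol_first=True, space=False,
                    negative_in_brackets=False, **number_options):
    """Place the currency symbol and the negative sign as requested."""
    body = format_number(value, **number_options)
    gap = " " if space else ""
    text = f"{symbol}{gap}{body}" if symbol_first else f"{body}{gap}{symbol}"
    if value < 0:
        text = f"({text})" if negative_in_brackets else f"-{text}"
    return text

print(format_number(25000, decimals=0, thousands_sep=" "))  # 25 000
print(format_number(14.6, decimals=1, decimal_sep=","))     # 14,6
print(format_currency(8500, "$", space=True, decimals=0))   # $ 8,500
```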
Sorting Order
Languages have different alphabets and therefore different sorting orders. For example, in French "ö" is a variation of "o" and not a different character, but in Swedish it is the last letter in the alphabet.
Sort order can vary even between countries that use the same language. That was the case with "ch", which until a few years ago was considered a single letter in Latin America and sorted between "c" and "d", while in Spain it was treated as "c" followed by "h" and sorted between "c + e" and "c + i".
Although this is no longer the case, an example is shown below:
American Spanish: a b c ch d e f g h ...
avión, beso, casa, celeste, cintura, crisis, choza, detalle...
European Spanish: a b c d e f g h ...
avión, beso, casa, celeste, choza, cintura, crisis, detalle...
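The two orderings above can be reproduced with a sort key built from an explicit alphabet. The sketch below is a simplified illustration: the names are hypothetical, only "ch" is handled as a two-character letter, and accented vowels are simply sorted as their base letters. Real applications should use the collation support of their platform.

```python
# Accented vowels are sorted as their base letters in this simplified sketch.
ACCENTS = str.maketrans("áéíóúü", "aeiouu")

WITHOUT_CH = list("abcdefghijklmnñopqrstuvwxyz")
WITH_CH = WITHOUT_CH[:3] + ["ch"] + WITHOUT_CH[3:]  # "ch" between "c" and "d"

def sort_key(word, alphabet):
    """Map a word to a list of letter ranks in the given alphabet."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    text = word.lower().translate(ACCENTS)
    key, i = [], 0
    while i < len(text):
        # Prefer a two-character letter such as "ch" when the alphabet has one.
        letter = text[i:i + 2] if text[i:i + 2] in rank else text[i]
        key.append(rank.get(letter, len(alphabet)))  # unknown characters last
        i += len(letter)
    return key

words = ["crisis", "choza", "detalle", "casa", "cintura", "avión", "celeste", "beso"]
print(sorted(words, key=lambda w: sort_key(w, WITH_CH)))
print(sorted(words, key=lambda w: sort_key(w, WITHOUT_CH)))
```

The first call places "choza" after "crisis"; the second places it between "celeste" and "cintura", matching the two lists above.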
Conclusion
The guidelines and examples above illustrate some of the issues that should be considered when planning the internationalization of a software product.
As explained in Accent on Internationalization: Guidelines For Software Internationalization [1]: "Thinking ahead and properly developing a product for multiple markets will help avoid a long, painful, and costly localization experience...The goal of software internationalization is to permit localization that does not require changes to application code."
Keeping these issues in mind will help avoid problems in the final phase of the localization process, during translation, and therefore avoid code rework, schedule delays, and unplanned costs.
Finally, a poorly internationalized product will make a negative impression on end users and therefore hurt product acceptance and sales.
References
[1] International Language Engineering Corporation. Accent on Internationalization: Guidelines for software internationalization. Second edition. Boulder, CO. 1997.
[2] Bert Esselink. A practical guide to localization. Philadelphia, PA. John Benjamins Publishing Company, 2000.
[3] Lingo Systems. The guide to translation & localization. Portland, OR. 2004.
QA Labs is a powerful player on your team supplying the critical competitive advantage you need today. Our mission is to help you make your software products succeed in the marketplace, whatever the climate. We work with you to make wise choices that reflect project constraints, industry trends, and business considerations. We are the largest independent software quality assurance and testing service provider in Canada. For more information, please visit www.qalabs.com.