Continuation of a discussion in OoLacksMathArgument.
Objective: A non-trivial application must be customized for multiple languages.
Associated Issues and Challenges:
- Integrating the UI Model with the appropriate translations. That is, what sort of code needs to be written to obtain a certain expression? And how much context is needed? How does the UI react if a translation for a particular resource is not available? (Can the user select which fallback language to use, or is one selected for her?) How do you avoid conflicts in keyboard-based menu-shortcuts for different translations?
- Convenience:
- translators should not need to know how to code, and programmers should not need to know how to translate.
- programmers may also not be writers; it may be that a translator needs to take text from the programmer and distill or refine it into something a layman can grok.
- especially for short-text, such as menu and button text, an intelligent translation may require more knowledge of context than a one or two word English phrase. In a program, this context is often expressed as a comment. How do we ensure these comments survive?
- there may be multiple translators working concurrently, so there may be conflict-resolution issues
- to what degree can the approach support indirect translations (e.g. Japanese -> Chinese -> English)?
- can arbitrary users also become translators? this might be useful in a project that cannot afford translators. A 'translate me!' icon next to any untranslated text might be nice, but how, then, does one vote a given translation into a more official position?
- the change-impact for a fix to an expression or addition of a new language should be minimal
- Adaptations:
- different read orders (e.g. right-to-left vs. left-to-right, and even combinations thereof, e.g. when dealing with numbers)
- different widths for simple expressions and impact on, say, button widths
- different simple numeric expressions (e.g. commas vs. periods, long billion vs. short billion)
- different quote styles (<<text>>, "text", 'text', etc.)
- different date/time formats and expressions
- different units of measure (feet vs. meters, different latitude-longitude formats). Often context-dependent! (e.g. knots vs. mph for sea vs. land travel).
- Modifiers - esp. important when producing text dynamically
- verb tense forms
- feminine vs. masculine modifiers
- context-dependent expressions (e.g. use of articles)
- counts and counting expressions - e.g. Japanese uses different words or modifiers to describe counts of different things (e.g. stick-shaped objects vs. vaguely-spherical objects), or so I've been told.
- different complex numeric expressions (articles like a, an, the; how small numbers, like 'twenty-two', are expressed)
- Performance: keep the screens zippy. This is likely a relatively minor challenge compared to the other ones: in practice, simple caching/memoization (possibly multi-staged for local persistence if the official translations are across a network) should keep the storage model of the resources pretty much irrelevant for performance.
Assume an international character-set and a common string encoding (such as UTF-8), and associated fonts. No need to increase the challenge even further by introducing Big-5, explicit font management, and so on.
Also, International UI is a subset of all 'accessibility' issues. We aren't concerning ourselves with support for the blind or deaf or motor-function limited, etc.
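The menu-shortcut question raised in the issue list above can at least be checked mechanically. What follows is only a minimal sketch: it assumes labels mark the shortcut letter with a preceding '&' (a common convention, but not one this page specifies), and the resource ids and sample labels are hypothetical. It does not handle an escaped '&&'.

```python
from collections import defaultdict


def find_mnemonic_conflicts(menu_labels):
    """Return {key: [resource ids]} for every shortcut key used more than once.

    menu_labels maps a resource id to translated text in which '&'
    precedes the shortcut letter (an assumed convention).
    """
    by_key = defaultdict(list)
    for resource_id, text in menu_labels.items():
        pos = text.find("&")
        if pos == -1 or pos + 1 >= len(text):
            continue  # no shortcut marked in this label
        by_key[text[pos + 1].casefold()].append(resource_id)
    return {key: ids for key, ids in by_key.items() if len(ids) > 1}


# Hypothetical labels for one menu: the English set is clean,
# the German draft assigns 'S' twice.
english = {
    "file.save": "&Save",
    "file.save_as": "Save &As",
    "file.close": "&Close",
}
german = {
    "file.save": "&Speichern",
    "file.save_as": "&Speichern unter",
    "file.close": "S&chließen",
}

print(find_mnemonic_conflicts(english))  # {}
print(find_mnemonic_conflicts(german))   # {'s': ['file.save', 'file.save_as']}
```

A check like this could run whenever a translator submits a catalog, so the clash is reported to the translator rather than discovered by a user.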
Approach 1: DB-centric
Approach 1:
Application
---------------
ResourceID
.
Resource File (one for each language) Typo? This is not a DB
------------------
ResourceID
Text
Approach 2:
Table: translations
-------------
widgetRef
languageRef
transText
Approach 3:
Table: translations
-------------
phraseID
English
Spanish
Japanese
Arabic
Etc.....
transNote
Another variation:
Table: messages
---------------
msgID // could be number or mnemonic
desc // what the programmer wants the message to 'mean'
.
Table: Languages
----------------
langID
langDescript // example: Spanish, French, English, etc.
.
Table: languageMessages
-----------------------
msgRef
langRef
msg
The reason "English" is in the Messages table instead of the languageMessages table is to serve as a description also. If you find this odd, then don't use it.
I think it useful: it can serve as a description by the programmer of what a message is intended to mean, and translators can still usefully translate it to English. In practice, translators need more than just a short-text English description to perform an intelligent translation. They need context, and this description might provide it. I hope you don't mind, but I modified the above to express this different view.
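For concreteness, the three-table variation above might look like this in SQLite. This is a sketch only: the table and column names come from the outline, but the key types, the constraints, the sample rows, and the two helper functions are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    msgID  TEXT PRIMARY KEY,   -- could be number or mnemonic
    "desc" TEXT NOT NULL       -- what the programmer wants the message to mean
);
CREATE TABLE languages (
    langID       TEXT PRIMARY KEY,
    langDescript TEXT NOT NULL
);
CREATE TABLE languageMessages (
    msgRef  TEXT NOT NULL REFERENCES messages(msgID),
    langRef TEXT NOT NULL REFERENCES languages(langID),
    msg     TEXT NOT NULL,
    PRIMARY KEY (msgRef, langRef)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO messages VALUES (?, ?)", [
    ("file.save", "Button: write the open document to disk"),
    ("file.quit", "Menu item: leave the application"),
])
conn.executemany("INSERT INTO languages VALUES (?, ?)",
                 [("en", "English"), ("es", "Spanish")])
conn.executemany("INSERT INTO languageMessages VALUES (?, ?, ?)", [
    ("file.save", "en", "Save"),
    ("file.quit", "en", "Quit"),
    ("file.save", "es", "Guardar"),
])


def lookup(msg_id, lang_id):
    """Return the translated text, or None if no translation exists."""
    row = conn.execute(
        "SELECT msg FROM languageMessages WHERE msgRef = ? AND langRef = ?",
        (msg_id, lang_id),
    ).fetchone()
    return row[0] if row else None


def untranslated(lang_id):
    """List message ids that still lack a translation in lang_id."""
    rows = conn.execute(
        """SELECT m.msgID FROM messages m
           LEFT JOIN languageMessages lm
             ON lm.msgRef = m.msgID AND lm.langRef = ?
           WHERE lm.msg IS NULL
           ORDER BY m.msgID""",
        (lang_id,),
    )
    return [r[0] for r in rows]


print(lookup("file.save", "es"))  # Guardar
print(untranslated("es"))         # ['file.quit']
```

The second query is the kind of thing a translator-facing tool would need: the programmer's description in messages stays available as context even when no translation exists yet.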
Approach 2: flat files
The text of every UI component is loaded from resource files. The original monolingual resource file is sent to translators. They provide equivalent text in different languages, each in its own resource file. Users pick which language they want and the app uses the appropriate resource file.
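A minimal sketch of that flat-file approach, including one possible answer to the fallback question from the problem statement. The 'key = text' file format, the file names, and the Catalog class are all hypothetical; real projects would more likely use an established catalog format.

```python
import tempfile
from pathlib import Path


def load_resources(path):
    """Parse a 'key = text' resource file; lines starting with '#' are notes."""
    table = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, text = line.partition("=")
        table[key.strip()] = text.strip()
    return table


class Catalog:
    def __init__(self, directory, languages):
        # languages is the user's preference order, e.g. ["es", "en"]
        self.tables = [
            load_resources(Path(directory) / f"{lang}.txt") for lang in languages
        ]

    def text(self, key):
        for table in self.tables:
            if key in table:
                return table[key]
        return key  # last resort: show the resource id itself


with tempfile.TemporaryDirectory() as d:
    Path(d, "en.txt").write_text(
        "# main toolbar\nfile.save = Save\nfile.quit = Quit\n", encoding="utf-8"
    )
    Path(d, "es.txt").write_text("file.save = Guardar\n", encoding="utf-8")

    ui = Catalog(d, ["es", "en"])
    labels = [ui.text("file.save"), ui.text("file.quit"), ui.text("file.print")]

print(labels)  # ['Guardar', 'Quit', 'file.print']
```

Here the fallback order is simply whatever list the user (or the application) supplies, so either party can choose it.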
A very long discussion ensued. It boiled down to this:
- Participants who'd internationalized applications testified that flat files and standard tools do the job just fine.
- A participant who hadn't internationalized an application argued that an RDBMS might be useful if there are other factors involved (e.g., integrated resource editing and billing). However, as these concerns were not part of the problem statement nor did they arise in practice, this was considered MovingGoalPosts.
- Dear EditWarrior who keeps removing my comments: please explain which goalposts were moved. Did the original goals dictate implementation? If so, that is not the proper way to request a system. You dictate needs, not implementation. Files are not necessarily simpler than using DB's. This page appears to be C-centric and file-centric in its thinking. Why assume people are using only C-family langs? Most biz apps I work on already use a DB. --top
- First-hand experience of working on localization efforts involving over 30,000 strings here. Having used both flat files and a database-driven application to do so, the flat file wins - hands down, every time. Ever tried to track "fuzzy" strings or run diffs against a database? I have. ItSucks, big time. Even using a DB as a repository from which the actual code is spat out sucks, because you still have to check the strings into the database, and don't gain anything. Conflict resolution doesn't happen, because you don't know that someone else has already translated a string in advance of you doing it.
- I agree that there are tools common to files that are not as common to DB's, such as text difference detectors. And, what are examples of "fuzzy" strings? Some RDBMS have text indexing capabilities. How is "checking strings into a database" more complicated than with files? You seem to be talking about using file-based source-code version managers. But a scenario I gave (perhaps deleted in the EditWar) was where you don't want translators to have direct file access. Say you wish to create a web-based form for distributed international translators to work on various translations so that they aren't touching actual code files. Tracking, who, when, what, where, and costs would seem easier in a DB. Files have no inherent AcId handling, for example. Again, I am not saying that file-based editing does not have its place, but some got upset at me for discussing a DB version and deleted stuff. Quite rude. --top
- In GNU gettext, strings are marked as "fuzzy" where the underlying (English) string has changed and a translation has been automatically appropriated - sometimes this will be a change in punctuation or spelling, other times the meaning of the phrase will have changed dramatically. It's intended to cut down on having to re-enter a translated string just because someone made a mistake in the original, but without having potential mistranslations exposed to the user. The scenario you described in "Say you wish ..." is precisely what I've had to put up with in the past, and it was useless. It utterly failed on non-ASCII characters because there was no way to guarantee the encoding would be correct in all cases, whereas we had a set of repeatable, known-good settings for people to set up their editors. Co-ordination was impossible, as there was no way of preventing two people from working on the same strings, whereas when passing files around it didn't matter - run them through diff(1) and change the credits field, and it's sorted. The Web-based system also had the obvious weakness of requiring that people had access to the Web. I believe AcId is a RedHerring here, because it doesn't help in this context - in fact, atomicity is something you explicitly don't want when localizing fast-changing software.
- It is possible to use file-based parsing utilities on any database text. It just requires a few more steps to implement to go from db-to-file-to-db. If you want to use "locks" or branch versioning instead of (or in addition to) AcId, that is also possible. And lack of web access is an odd complaint in 2008. If you have a distributed team of translators in many countries working on many projects, then a database with a web front-end seems more and more appropriate.
- You still haven't provided a reason to move. Not working for people who might have access to email to exchange the files but no long-term Web access is a show-stopper (Hint: parts of GnomeDesktop are available in AmharicLanguage, NepaliLanguage and TurkmenLanguage - languages spoken primarily in countries not known for their high-speed cable). You've suggested plenty of things which suggest that using a database might be JustAsGood as using normal files (remember, in most sane setups the strings are stored in catalog files separately from the rest of the code), but you haven't suggested actual concrete ways in which it might be better. Be warned that I'll be happy to shoot down any cute theory that doesn't work in practice (such as your "this seems more and more appropriate" nonsense).
- "Move"? WAN is no longer the default. Text is usually fine over dial-up browsing if the website is designed properly (no artsy newbie web designers running amok). Further, if such is too slow for HTML forms, then it would be too slow for, say, FTP also. Thus, I reject the high-speed argument. And keeping track of changes, charges, flakers, and the other stuff mentioned via files stinks. I know some people enjoy reinventing databases with file systems, but I am not one of them. Anyhow, I never proposed that such a solution was ideal for all situations. It is one solution among many. If it does not fit your particular situation, then use the other approaches. It's good to have different options. I've already agreed a DB would be overkill for a short-lived or small project. But if you were a shop that specialized in international software and had wide-ranging translators, then keeping track of more attributes and a web-based editor interface would warrant a DB to track the intersection of money, text, changes, and staff. That's common in the industry. They don't use file systems for that kind of thing unless they are afraid of DB's.
- "Thus, I reject the high-speed argument." What high-speed argument? The argument put forward was in relation to access in general. I still don't see why people might move to a bespoke DB, when many projects track their plain-text string files in version control, which is JustAsGood. Evidently, you couldn't integrate an external database into source control without a lot of additional work, so if you're already using a sane string file format and good versioning, moving is a lot of work for very little gain. As I said - cute theories, but no practical merit.
- Remotely? WAN's don't work so well across the ocean, can be a security risk, and are difficult to set up at the remote "station". It seems clear that a web-based tool would be much easier for an often-changing, globally-distributed team, which may be necessary to get varied language experts without lots of hassle. You guys need to get with the times. You are thinking 80's style. Web-based apps are briskly replacing WAN-based apps.
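For readers unfamiliar with the "fuzzy" idea described in the gettext comment above, here is a rough sketch of the concept. It is not gettext's actual matching algorithm: Python's difflib stands in for the similarity test, and the catalog layout, the cutoff value, and the function names are assumptions.

```python
import difflib


def merge_catalog(old, new_sources, cutoff=0.6):
    """Carry translations over to a changed set of source strings.

    old: {source string: translation}
    new_sources: the source strings in the current version of the program
    Returns {source string: (translation, status)}.
    """
    merged = {}
    for source in new_sources:
        if source in old:
            merged[source] = (old[source], "ok")
            continue
        # No exact match: borrow the translation of the most similar old
        # string, but flag it so a translator reviews it.
        close = difflib.get_close_matches(source, list(old), n=1, cutoff=cutoff)
        if close:
            merged[source] = (old[close[0]], "fuzzy")
        else:
            merged[source] = ("", "untranslated")
    return merged


def runtime_text(merged, source):
    """Only reviewed translations reach the user; otherwise show the source."""
    translation, status = merged.get(source, ("", "untranslated"))
    return translation if status == "ok" else source


old = {"Save file": "Guardar archivo", "Quit": "Salir"}
merged = merge_catalog(old, ["Save file.", "Quit", "Print preview"])

for source, entry in merged.items():
    print(source, entry)
# Save file. ('Guardar archivo', 'fuzzy')
# Quit ('Salir', 'ok')
# Print preview ('', 'untranslated')
```

The point of the status flag is the one made in the discussion: a borrowed translation saves retyping after a punctuation fix, but it is withheld from users until someone confirms the meaning did not change.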
A compromise variation got deleted somehow here. I also proposed using a DB to store the language strings, but generating code so that the app performs faster. In other words, the DB is a developer-side "language manager", but not necessarily the run-time source of the language strings. --top
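One way to read that compromise: the database is where strings are managed, and a build step exports plain per-language files that the application loads at run time. The sketch below assumes the languageMessages table outlined earlier on this page; the JSON output format, the file names, and the export_catalogs function are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def export_catalogs(conn, out_dir):
    """Write one JSON catalog per language from the languageMessages table."""
    catalogs = {}
    query = "SELECT msgRef, langRef, msg FROM languageMessages ORDER BY langRef, msgRef"
    for msg_ref, lang_ref, msg in conn.execute(query):
        catalogs.setdefault(lang_ref, {})[msg_ref] = msg

    written = []
    for lang, table in catalogs.items():
        path = Path(out_dir) / f"{lang}.json"
        # ensure_ascii=False keeps non-ASCII text readable in the output file
        path.write_text(
            json.dumps(table, ensure_ascii=False, indent=2, sort_keys=True),
            encoding="utf-8",
        )
        written.append(path.name)
    return written


# Developer-side database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languageMessages (msgRef TEXT, langRef TEXT, msg TEXT)")
conn.executemany("INSERT INTO languageMessages VALUES (?, ?, ?)", [
    ("file.save", "en", "Save"),
    ("file.save", "es", "Guardar"),
    ("file.quit", "en", "Quit"),
])

with tempfile.TemporaryDirectory() as d:
    written = export_catalogs(conn, d)
    # The running application only ever reads the generated file.
    spanish = json.loads(Path(d, "es.json").read_text(encoding="utf-8"))

print(written)  # ['en.json', 'es.json']
print(spanish)  # {'file.save': 'Guardar'}
```

Because the exported files are sorted and plain text, they can also be checked into version control and diffed, which addresses part of the objection raised in the discussion above.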
Language Width Differences
In some languages the word/phrase equivalent to "Save" may be, say, 25 characters. Designing a UI to make room for the largest possible string in all common languages could be really messy. Even if the UI engine had auto-stretch features, it may be difficult to control screen width and/or auto-wrapping behavior. Computers have no esthetic sense. It's a sticky problem.
However, there are two things that could reduce the pain. First, have horizontally scalable text such that the aspect ratio changes for longer phrases. In other words, "squish fonts" to make the aspect narrower to force a fit.
And second, if there is still not enough space or the aspect adjustment above makes the text hard to read, then make sure mouse roll-overs are working to pop-up the full-sized description when needed. -t
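The two-step proposal above (squish first, fall back to a roll-over) amounts to a small calculation. This is a sketch under loud assumptions: the text width is estimated as character count times a fixed per-character width, and the 0.7 readability floor is an arbitrary illustrative value; a real toolkit would measure the rendered string instead.

```python
def fit_label(text, button_width_px, char_width_px=8.0, min_scale=0.7):
    """Return (horizontal_scale, needs_tooltip) for a label on a fixed-width button.

    char_width_px and min_scale are illustrative assumptions, not measured values.
    """
    natural_width = len(text) * char_width_px  # crude width estimate
    if natural_width <= button_width_px:
        return 1.0, False          # fits as-is
    scale = button_width_px / natural_width
    if scale >= min_scale:
        return scale, False        # squish the font horizontally to fit
    # Squishing further would hurt readability: clamp the scale and rely on
    # a roll-over to show the full-sized text.
    return min_scale, True


print(fit_label("Save", 100))     # (1.0, False)
print(fit_label("x" * 25, 100))   # (0.7, True)
```

A 15-character label on the same 100-pixel button would land in the middle case, squished to roughly 83% of its natural width with no roll-over needed.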
Fixed-size, non-fluid, non-stretchy, brochure-style UIs haven't been considered good design for over a decade, but mobile devices of varying resolutions are forcing developers to adopt fluid, dynamic, automatic UI layout. Stop poking about in forms designers -- let the code do it dynamically. It's got a better understanding of layout aesthetics than you do, anyway.
Mobile devices often don't do anything sophisticated enough to test the limits. They are very MS-Wizard-like in their one-step-at-a-time UI's to change settings etc. While good enough for casual use, it's not sophisticated enough when user productivity matters, usually in the work-place. Related: CoordinateVersusNestedGui.
You're quibbling over the example, but missing the point. Coordinate-based GUIs are obsolete.
That topic explains and visually demonstrates why it's nice to have the option of both, and why different situations require different approaches. Computers lack esthetic sense and don't get things "right" in tight quarters. Maybe "tight quarters" is a design smell, but if the boss wants to cram as much as possible onto one screen, that's his/her prerogative. The picky and illogical people who pay us want things a certain way and don't want lectures on architecture purity or claims of flexibility 5 years down the road. They want fine control now. DilbertIsNoJoke. -t
See also: DoubleDispatchExample
CategoryInternationalization, CategoryExample