Uni Code

UniCode != Unicode Transformation Format (UTF) != UtfEight != UtfSixteen

The successor to AsciiCode. It aims to describe the characters of any writing system you can imagine (Japanese, Hebrew, Arabic, Korean, Latin...)
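To make the distinction between the character set and its encodings concrete, here is a minimal Python sketch. The choice of the euro sign is just an illustration: the code point is the Unicode part, and the two byte sequences are what UtfEight and UtfSixteen (big-endian, no byte order mark) make of it.

```python
# One abstract character, identified by its Unicode code point.
ch = "\u20ac"  # EURO SIGN, U+20AC (an arbitrary example character)

code_point = ord(ch)                  # the Unicode part: just a number
utf8_bytes = ch.encode("utf-8")       # one possible byte serialization
utf16_bytes = ch.encode("utf-16-be")  # another one (big-endian, no BOM)

print(f"code point: U+{code_point:04X}")
print("UTF-8 bytes: ", utf8_bytes.hex(" "))
print("UTF-16 bytes:", utf16_bytes.hex(" "))
```

The same code point comes out as three bytes in one encoding and two in the other, which is why "Unicode" alone never tells you what the bytes on disk look like.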

IIRC, the unification of the Chinese, Japanese and Korean versions of Chinese characters (Han unification) was a political choice made for Unicode. It causes much quarrel.

It's only "political" if you don't have to deal with interfaces in, say, Chinese and Japanese at the same time, where the character strokes are sometimes considerably different. If you do, it becomes a concrete problem: accurate translation strings are not enough, and you have to tie each language to a subset of fonts (not a simple process unless you have the luxury of a rather large localization department). If you want to provide a polished user experience, you have to double-check that those fonts look correct, because the text itself cannot select the right glyph now that all CJK variants are merged into shared code points. --CraigEverett
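A rough sketch of what "tying each language to a subset of fonts" can look like in practice. The language tags, the font table and the `font_for` helper are all hypothetical names chosen for illustration; the only thing taken from Unicode itself is that the example character has a single unified code point regardless of language.

```python
import unicodedata

# A Han character often cited as being drawn differently across regions.
ch = "\u76f4"
name = unicodedata.name(ch)  # one unified name, no language attached

# Hypothetical table: the language has to be carried outside the text,
# and each language is pinned to a font that draws its regional glyphs.
FONT_FOR_LANG = {
    "ja": "Noto Sans CJK JP",
    "zh-Hans": "Noto Sans CJK SC",
    "zh-Hant": "Noto Sans CJK TC",
    "ko": "Noto Sans CJK KR",
}

def font_for(lang_tag, default="Noto Sans"):
    """Pick a font from the language tag, since the text cannot tell us."""
    return FONT_FOR_LANG.get(lang_tag, default)

print(name)
for tag in ("ja", "zh-Hans"):
    print(tag, "->", font_for(tag))
```

The string is identical in both cases; only the out-of-band language tag decides which glyph the user ends up seeing.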

The best single-webpage introduction to Unicode for programmers: http://www.joelonsoftware.com/articles/Unicode.html

Small nitpick: He confuses UTF-16 and UCS-2. Very good introduction to the subject, though. Presents the motivations and some basics about the issue, including how it affects everyday C++ programming.
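For anyone wondering where the UTF-16 versus UCS-2 difference actually bites, a small Python sketch: UCS-2 is fixed at one 16-bit unit per character, so it stops at U+FFFF, while UTF-16 reaches higher code points with a surrogate pair. The helper name `utf16_units` and the two sample characters are just illustrative choices.

```python
def utf16_units(s):
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-be")) // 2

bmp_char = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # U+1F600, outside it

print(utf16_units(bmp_char))     # one unit: same in UCS-2 and UTF-16
print(utf16_units(astral_char))  # two units: a surrogate pair, UTF-16 only
print(astral_char.encode("utf-16-be").hex(" "))
```

Code that assumes "one 16-bit unit equals one character" is really assuming UCS-2, and it miscounts or splits anything encoded as a surrogate pair.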

Recently seen bumper sticker:


which I see in my browser as:

Too bad UtfSeven never caught on for mail. When everything is Base64-encoded UTF-8 or Latin-1, you can't just look at the archives using a simple text pager, or grep them. --ClaesWallin
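A small Python sketch of the grep problem, using a made-up message line. Base64 scrambles even the plain ASCII words, whereas the UTF-7 codec leaves the ASCII parts readable and only escapes the non-ASCII character.

```python
import base64

text = "Meeting in K\u00f6ln on Monday"  # made-up sample line

b64 = base64.b64encode(text.encode("utf-8"))
utf7 = text.encode("utf-7")

needle = b"Monday"  # what you would type into grep

print(b64)
print(utf7)
print("found in Base64 body:", needle in b64)
print("found in UTF-7 body: ", needle in utf7)
```

With the Base64 body, a search for an ordinary word only works after decoding every message first; with UTF-7 a plain byte search finds it directly.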

See http://www.unicode.org/

Last edited January 18, 2013