The purpose of this exercise is to gain some appreciation of the problems of internationalization: the wide variation among languages and scripts, bidirectional text, and the differences between character sets, fonts, and character encodings.

You will need Java installed on your laptop so that you can run the sample program. You'll also need a modern web browser such as Firefox, Safari, or Opera, and a reasonably recent version of Windows, Mac OS, or Linux; older platforms may have weaker support for internationalization (i18n).


Run the CharacterEncoding program, just so you can see what its interface looks like. Then examine its code, particularly the class's main constructor, with an eye toward internationalization. What would need to be changed to internationalize it?
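One common way to approach the question above is to move every user-visible string out of the constructor and look it up by key at run time. The sketch below is not taken from the CharacterEncoding code: the keys `label.text` and `window.title`, the English window title, and the French translations are hypothetical. It feeds `java.util.PropertyResourceBundle` from in-memory strings only so that it runs standalone; a real program would keep the text in files such as `Labels.properties` and `Labels_fr.properties` and load them with `ResourceBundle.getBundle("Labels", locale)`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class LabelBundleSketch {
    // Hypothetical bundle contents. A real program would keep each of these in a
    // .properties file on the classpath and load it with ResourceBundle.getBundle.
    static final String ENGLISH = "window.title=Character Encoding\nlabel.text=Text\n";
    static final String FRENCH = "window.title=Codage des caract\u00e8res\nlabel.text=Texte\n";

    // Parse properties text into a bundle; reading through a Reader keeps non-ASCII intact.
    static ResourceBundle load(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (ResourceBundle bundle : new ResourceBundle[] { load(ENGLISH), load(FRENCH) }) {
            // Instead of a hard-coded new JLabel("Text"), the constructor would use
            // new JLabel(bundle.getString("label.text")).
            System.out.println(bundle.getString("window.title") + " / " + bundle.getString("label.text"));
        }
    }
}
```

With this arrangement, supporting another language means adding a bundle, not editing the constructor.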

Character Encodings

Now type some text into the box labeled Text. What do the other two boxes display?

Now copy the word été into the Text box. What do you observe?
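To see why é behaves differently from plain ASCII letters, it helps to look at the bytes that different encodings assign to it. This sketch makes no assumption about what the CharacterEncoding program itself displays; it simply encodes été as ISO-8859-1 (Latin-1) and as UTF-8, then decodes the UTF-8 bytes with the wrong charset.

```java
import java.nio.charset.StandardCharsets;

public class EteBytes {
    // Format bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ete = "\u00e9t\u00e9"; // "été", escaped so the source file's encoding doesn't matter
        System.out.println(ete.length());                                   // 3
        System.out.println(hex(ete.getBytes(StandardCharsets.ISO_8859_1))); // E9 74 E9
        byte[] utf8 = ete.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(utf8));                                      // C3 A9 74 C3 A9
        // Decoding UTF-8 bytes as Latin-1 produces classic mojibake: "Ã©tÃ©"
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00c3\u00a9t\u00c3\u00a9"));    // true
    }
}
```

The same three characters take three bytes in Latin-1 but five in UTF-8, and reading one encoding as the other silently produces the wrong text rather than an error.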

Now visit this Russian Wikipedia page and copy a word from it into the Text box. What do you observe?
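Latin-1 has no Cyrillic letters at all, so a Russian word cannot be represented in it. (Legacy Cyrillic encodings such as KOI8-R and windows-1251 exist, but their byte values mean different letters when read as Latin-1.) The sketch below uses the word мир ("world" or "peace"), chosen for illustration rather than taken from the Wikipedia page, and shows that Java's `String.getBytes` quietly substitutes `?` for characters the target charset cannot encode.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CyrillicBytes {
    // Format bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String mir = "\u043c\u0438\u0440"; // "мир"
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        System.out.println(latin1.canEncode(mir));                         // false
        // getBytes replaces each unmappable character with '?' (0x3F) instead of failing
        System.out.println(hex(mir.getBytes(StandardCharsets.ISO_8859_1))); // 3F 3F 3F
        // UTF-8 needs two bytes per Cyrillic letter
        System.out.println(hex(mir.getBytes(StandardCharsets.UTF_8)));      // D0 BC D0 B8 D1 80
    }
}
```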

Finally, visit this Japanese Wikipedia page and copy a few characters from it into the Text box. What do you observe?
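Japanese text raises two further issues. Kanji take three bytes each in UTF-8, and some kanji lie outside Unicode's Basic Multilingual Plane, so Java's UTF-16 `String` stores them as surrogate pairs and `length()` no longer counts characters. The examples below, 日本語 and U+20BB7 (𠮷, a variant of 吉 found in some Japanese names), are our own choices, not text from the Wikipedia page.

```java
import java.nio.charset.StandardCharsets;

public class JapaneseLengths {
    public static void main(String[] args) {
        String nihongo = "\u65e5\u672c\u8a9e"; // "日本語", all in the Basic Multilingual Plane
        System.out.println(nihongo.length());                                // 3 UTF-16 code units
        System.out.println(nihongo.getBytes(StandardCharsets.UTF_8).length); // 9: three bytes per kanji

        // U+20BB7 is a supplementary character, stored in a Java String as a surrogate pair
        String yoshi = new String(Character.toChars(0x20BB7));
        System.out.println(yoshi.length());                                  // 2 code units
        System.out.println(yoshi.codePointCount(0, yoshi.length()));         // 1 character
        System.out.println(yoshi.getBytes(StandardCharsets.UTF_8).length);   // 4
    }
}
```

Code that assumes one `char` per character (for example, truncating a string with `substring`) can split a surrogate pair and corrupt such text.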

Bidirectional Text

Visit this Arabic Wikipedia page or this Hebrew Wikipedia page. How is the page laid out, relative to other Wikipedia pages?
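Desktop Java has a rough counterpart to this mirrored page layout. The sketch below uses `java.awt.ComponentOrientation.getOrientation(Locale)` to choose a layout direction from a locale; Arabic (`ar`) is one of the languages it reports as right-to-left. How this would be applied to the CharacterEncoding window (for example, via `applyComponentOrientation` on its content pane) is an assumption about how that program is built, not something taken from its code.

```java
import java.awt.ComponentOrientation;
import java.util.Locale;

public class OrientationSketch {
    // True when components for this locale should be laid out right to left.
    static boolean isRightToLeft(Locale locale) {
        return !ComponentOrientation.getOrientation(locale).isLeftToRight();
    }

    public static void main(String[] args) {
        for (String tag : new String[] { "en", "ar" }) {
            Locale locale = Locale.forLanguageTag(tag);
            System.out.println(tag + " right-to-left: " + isRightToLeft(locale));
        }
        // A Swing program might then call, for example:
        // frame.getContentPane().applyComponentOrientation(ComponentOrientation.getOrientation(locale));
    }
}
```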

Try to understand the effect of bidirectional text on editing by making selections. What happens when you select multiple lines? Notice that the page has English words embedded in it, like "International Business Machines." Make some selections that start in one language and end in the other. What happens?
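The odd selection behavior comes from the gap between logical order (the order characters are stored in memory) and visual order (the order they appear on screen). Java's `java.text.Bidi` class implements the Unicode Bidirectional Algorithm, so it can show both. The sketch uses a made-up Hebrew string, שלום IBM עולם, with an embedded English acronym standing in for phrases like "International Business Machines"; it prints the directional runs and then the logical indices rearranged into left-to-right display order.

```java
import java.text.Bidi;
import java.util.Arrays;

public class BidiSelection {
    // Logical indices of the text, rearranged into left-to-right display order.
    static Integer[] visualOrder(Bidi bidi, int length) {
        byte[] levels = new byte[length];
        Integer[] order = new Integer[length];
        for (int i = 0; i < length; i++) {
            levels[i] = (byte) bidi.getLevelAt(i);
            order[i] = i;
        }
        Bidi.reorderVisually(levels, 0, order, 0, length);
        return order;
    }

    public static void main(String[] args) {
        // Two Hebrew words (shalom, olam) around an embedded English acronym
        String text = "\u05e9\u05dc\u05d5\u05dd IBM \u05e2\u05d5\u05dc\u05dd";
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_RIGHT_TO_LEFT);

        // Each run is a maximal stretch at one embedding level: odd = right-to-left, even = left-to-right
        for (int i = 0; i < bidi.getRunCount(); i++) {
            System.out.printf("run %d: [%d, %d) level %d%n",
                    i, bidi.getRunStart(i), bidi.getRunLimit(i), bidi.getRunLevel(i));
        }
        // Print indices rather than characters so the console's own bidi handling can't reorder them
        System.out.println(Arrays.toString(visualOrder(bidi, text.length())));
    }
}
```

This should report three runs: the first Hebrew word and the following space at level 1, IBM at level 2, and the second space and Hebrew word at level 1, with visual order [12, 11, 10, 9, 8, 5, 6, 7, 4, 3, 2, 1, 0]. A selection that is contiguous in memory, say indices 2 through 6, therefore lands on display positions 5, 6, 8, 9, and 10, skipping position 7 (the M of IBM). That is one reason a drag across a language boundary can highlight what looks like two separate pieces.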

If you have Firebug installed (or are using your browser's built-in developer tools), you can use Inspect to highlight a paragraph, drill down into it in the tree display, and edit its text directly. Explore bidirectional editing by changing both the Arabic or Hebrew text and the English text. (You'll need to use copy-and-paste to simulate editing the Arabic or Hebrew, unless you know how to enter those characters from your keyboard.)