Adapted from a newsletter article by Devin Asay
If you have ever tried to create stacks in a language other than English and the more common West European languages, you may have run into the problem of how to produce all the character glyphs that the language requires. Fortunately, Unicode is there to help us out. The following lesson will teach you how to use Unicode in your LiveCode stacks.
Note: This lesson has been tested and works as described on Mac and Windows platforms. Some aspects of it may not behave exactly as described on Linux.
Important Deprecation Notes:
- As of LiveCode 7.0 the unicodeText property has been deprecated. It will continue to work as in previous versions but should not be used in new code, as the existing behaviour is incompatible with the new, transparent Unicode handling (the resulting value will be treated as binary data rather than text). This property is only useful in combination with the also-deprecated uniEncode and uniDecode functions.
- As of LiveCode 7.0 the uniEncode function has been deprecated. It will continue to work as in previous versions but should not be used in new code, as the existing behaviour is incompatible with the new, transparent Unicode handling (the resulting value will be treated as binary data rather than text). This function is only useful in combination with the also-deprecated uniDecode function and unicodeText field property. Instead, for converting text between encodings, use the textEncode and textDecode functions.
Step 1: Typing unicode text into fields
This is a good place to start because it's the easiest. Livecode fields can handle Unicode text input without any intervention by the developer. That is because Livecode simply uses the text input methods supplied by the host operating system. So if you want to type Japanese characters into a field, you simply select the Japanese text input system you want to use and start typing. Livecode knows how to render it properly in the field, and it is then ready for use. If you want to learn how to select the text input method on your OS, see the help documentation for that OS.
However, there are a couple of issues with Unicode text input in Livecode. Livecode currently has trouble rendering right-to-left languages like Hebrew and Arabic while you are typing them. Specifically, it will properly render characters in a word from right to left, but when you type a space to begin a new word, the new word is inserted to the right of the previous word, not to the left as it should be. For this reason it is recommended that you create Hebrew and Arabic text outside of Livecode and import it, rather than trying to type them within Livecode.
By default text in LiveCode is ASCII text. So let's first look at some of the ways LiveCode provides for working with ASCII text encoding. We're all familiar with the rich collection of tools that LiveCode provides for working with text. Among them are two functions, charToNum() and numToChar(), that allow us to work with the ASCII value for any character: charToNum() returns the numeric value of a character, and numToChar() returns the character for a given numeric value. You can try both in the message box.
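As a minimal sketch of these two functions, assuming a legacy (pre-7.0) engine or a plain ASCII character, you could run the following lines one at a time in the message box. The letter "A" is only an illustrative choice.

```livecode
-- Run each line separately in the message box
put charToNum("A")   -- the numeric value of the character: 65
put numToChar(65)    -- the character for that value: A
```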
Step 2: Use of the charToNum and numToChar functions
You can use the numToChar() function to create a rudimentary ASCII table. Just create a new field, name it "ascii" and run this routine in the message box:
put empty into field "ascii"
repeat with i = 0 to 255
   put i & tab & numToChar(i) & return after field "ascii"
end repeat
That's how these two functions work by default. But you can tell Livecode to expect Unicode values for these two functions by first setting the useUnicode property to true.
An important thing to remember: The useUnicode property only affects the charToNum() and numToChar() functions. No other text operations are affected by this property.
Let's look at how this works in practice. Let's say you have a field "russText" containing the sentence Я люблю тебя. The sentence begins with the upper case Russian letter 'Я'. If you wanted to find out which Unicode code point corresponds to that letter you would do this:
set the useUnicode to true
put charToNum(char 1 to 2 of the unicodeText of fld "russText") -- 1071, the code point of Я
Step 3: Using the useUnicode property
Conversely, to render a Unicode character from its code point, do the following. The letter 'Я' should appear in the field.
set the useUnicode to true
set the unicodeText of fld "russText" to numToChar(1071)
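For comparison, here is a rough sketch of the same two operations on LiveCode 7.0 or later, where Unicode handling is transparent and useUnicode is not needed. It assumes the same field "russText" and uses the codepointToNum() and numToCodepoint() functions introduced in 7.0; treat it as an illustration rather than a drop-in replacement for the legacy code above.

```livecode
-- LiveCode 7.0+ only: no useUnicode, no unicodeText
put codepointToNum(codepoint 1 of field "russText")   -- 1071 when the field starts with Я
put numToCodepoint(1071) into field "russText"        -- the field now contains Я
```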
The unicodeText property.
The previous example is a good way to introduce another important tool for using Unicode in LiveCode: the unicodeText property. If you want to move Unicode text from field to field, you have to use this property. In the normal ASCII world you can just do this:
put field 1 into field 2
However, if you want to put Unicode text into a field you have to set its unicodeText property:
set the unicodeText of fld "newPlace" to the unicodeText of fld "oldPlace"
Step 4: Using the UnicodeText property
Another important thing to remember: The secret to manipulating Unicode text in fields lies in the unicodeText of the field.
So if you want to move chunks of text, you have to refer to chunks of the unicodeText:
1. Copying a Unicode character to another field -
set the unicodeText of fld "letter" to char 1 to 2 of the unicodeText of fld "sentence"
2. Moving words -
set the unicodeText of fld "other" to word 1 to 2 of the unicodeText of fld "this"
3. Inserting Unicode text from one field into another -
get the unicodeText of fld "info"
set the unicodeText of fld "info" to it && word 2 of line 2 of the unicodeText of fld "bottom"
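On LiveCode 7.0 or later, where the unicodeText property is deprecated, the three operations above can probably be written with ordinary chunk expressions. The sketch below assumes the same field names as the legacy examples and that each character is addressed as a single char rather than a pair of bytes.

```livecode
-- LiveCode 7.0+ only; field names match the legacy examples above
-- 1. Copy a Unicode character to another field
put char 1 of field "sentence" into field "letter"
-- 2. Move words
put word 1 to 2 of field "this" into field "other"
-- 3. Append a word from one field to another, separated by a space
put space & word 2 of line 2 of field "bottom" after field "info"
```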
Step 5: Converting between single and double-byte encodings
When using Unicode text, especially if you are importing or exporting text from or to other systems or environments, you may need to convert your Unicode to a single-byte encoding system, or vice versa. The most common reason for doing this is reading and writing UTF-8 files. I recommend storing your Unicode text in UTF-8 format if you are planning to share it with others or send it over the internet. UTF-8 is part of the Unicode standard and encodes each character as a sequence of one to four bytes, with plain ASCII characters stored unchanged as single bytes, so Unicode text can be kept in an ordinary byte-oriented text file. UTF-8 is especially important for encoding Unicode text for use in web browsers and email.
The keys to using UTF-8 text in Livecode are the uniEncode() and uniDecode() functions. Let's say you've gotten some UTF-8 text from a web site and you want to display it in your Livecode stack. You store it in a file called myUniText.ut8. This is how you would read it in:
put url ("binfile:/path/to/file/myUniText.ut8") into tRawTxt
set the unicodetext of fld "display" to uniencode(tRawTxt,"UTF8")
Conversely, to save Unicode text from Livecode to a UTF-8 file, use uniDecode():
get the unicodeText of fld "myUniText"
put unidecode(it,"utf8") into url "binfile:/path/to/file/myUniFile.ut8"
This is another important thing to remember: For reliably transporting Unicode text, convert it and store it as UTF-8 text.
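Since uniEncode() and uniDecode() are deprecated as of LiveCode 7.0, here is a hedged sketch of the same read and write using textDecode() and textEncode(), the replacements named in the deprecation notes. The file paths and field names are the placeholders used in the legacy examples above.

```livecode
-- LiveCode 7.0+ only
-- Read: decode the UTF-8 bytes of the file into text and display it
put textDecode(url ("binfile:/path/to/file/myUniText.ut8"), "UTF-8") into field "display"
-- Write: encode the field's text as UTF-8 bytes and save it
put textEncode(field "myUniText", "UTF-8") into url ("binfile:/path/to/file/myUniFile.ut8")
```

Using binfile: rather than file: matters here, because the encoded data should be read and written as raw bytes without any line-ending or character-set translation.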
Step 6: Using Unicode in buttons and menus
So far, we've only been talking about Unicode text in fields. Almost none of that applies to buttons, primarily because buttons have no unicodeText property. Instead, the basic approach for displaying Unicode text in buttons and menus consists of two steps:
1. Set the textFont of the button to a Unicode font;
2. Set the label of the button to the desired Unicode text.
Unicode font names in Livecode take the form Font Name,language, where Font Name is the name of any font installed on the system, and language is the name of the language you want, or the term "unicode". For example, for Russian Cyrillic text I might use "Arial,Russian" as the font name; for Japanese, "Osaka,Japanese"; and for Greek, "Geneva,Unicode". Not every language can be used as the second part of a Unicode font name. For a complete list of valid language names see the Livecode Dictionary entry for uniEncode.
One way to assign a Unicode label to a button is to reference some existing Unicode text in a hidden field. Let's say, for example, that we are making a stack for Mandarin Chinese speakers and we want to give our Start button a Chinese label, 開始. We could type or import the Unicode text to a field and use that field as the source text for the button label:
set the textFont of button "start" to "BiauKai,Chinese"
set the label of button "start" to the unicodeText of fld "hiddenChinText"
Note: As of LiveCode 5.5 DP2 it is no longer possible to set the Unicode text by marking the text font property with a Unicode tag. The Unicode nature of the text and/or label properties of buttons, groups and graphics is now an intrinsic property that cannot be altered by script. You can copy and paste Unicode text into the label fields in the property inspector instead.
One technique that works well for creating Unicode button labels is to store the Unicode label text in a custom property of the button. When you do this, store it as UTF-8 text to avoid the byte order problem when moving the stack from machine to machine. So first you would store the Unicode text in a custom property:
set the chinLabel of button "start" to unidecode(the unicodeText of fld "hiddenChinText","UTF8")
Once that is in place, you would use the custom property as the source of the Unicode text:
set the textFont of button "start" to "BiauKai,Chinese"
set the label of button "start" to uniencode(the chinLabel of btn "start","UTF8")
One more note on Unicode buttons: Because Unicode text doesn't always "travel" well from platform to platform, I usually set Unicode button labels and menu contents each time I go to the card, in a preOpenCard handler.
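A minimal sketch of such a handler, placed in the card script, might look like this. It simply wraps the two legacy statements shown earlier, so it carries the same assumptions: the button "start", the custom property chinLabel, and the "BiauKai,Chinese" font name.

```livecode
-- Card script: refresh the Unicode label every time the card opens
on preOpenCard
   -- Legacy (pre-7) calls, as used earlier in this lesson
   set the textFont of button "start" to "BiauKai,Chinese"
   set the label of button "start" to uniEncode(the chinLabel of button "start", "UTF8")
   pass preOpenCard -- let handlers further along the message path run too
end preOpenCard
```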
Step 7: Using Unicode in Ask and Answer dialogs
Ask and answer dialogs can display Unicode prompts, but you can't pass Unicode text in the ask and answer command arguments. Instead you use another handy technique for setting Unicode text: store the Unicode as entities in HTML text. Storing the htmlText of a field that contains Unicode text is another reliable way of keeping the Unicode text intact during transfers. It also is the only way to display Unicode text in ask and answer dialog prompts.
To see what this means, let's look at the Chinese start button example above. In the first case we had the Unicode Chinese text 開始 in a text field "hiddenChinText". If I were to examine the htmlText of this field it would look something like this (the exact markup, such as any font tags, may differ):
<p>&#38283;&#22987;</p>
Notice that the two Chinese characters are embedded in the htmlText as Unicode entities: &#38283; and &#22987;. HTML Unicode entities like this will reliably render as the proper Unicode characters in LiveCode, regardless of the operating system the stack is running on. So to use Unicode characters in ask and answer prompts, do something like this:
put the htmlText of fld "hiddenChinText" into tChinPrompt
answer tChinPrompt with "Cancel" or "OK"
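If you have no hidden field to draw on, a prompt can presumably be built directly from numeric entities. The sketch below assumes that the answer command treats a prompt wrapped in <p> tags as htmlText, which is the behaviour this technique relies on. The field "status" is hypothetical and only there to show how the user's choice is read back.

```livecode
on mouseUp
   -- 38283 and 22987 are the decimal code points of 開 and 始
   put "<p>&#38283;&#22987;</p>" into tChinPrompt
   answer tChinPrompt with "Cancel" or "OK"
   -- 'it' holds the name of the button the user clicked
   if it is "OK" then
      put "Confirmed" into field "status" -- hypothetical field, for illustration only
   end if
end mouseUp
```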
There is one other advantage of saving Unicode text as HTML entities: it is the best way to save Unicode text with text styles like bold and italic and font attributes like size and color.
Step 8: Setting a Unicode stack title
I'll conclude this lesson with one more piece of functionality in LiveCode: the ability to use Unicode text for the title of the stack window. Just set the unicodeTitle property of the stack to a valid Unicode string. Here's an example:
set the unicodeTitle of this stack to the unicodeText of fld "russTitle"
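On LiveCode 7.0 or later, where text properties handle Unicode transparently, the ordinary title property should be enough. This is a sketch under that assumption, using the same field "russTitle".

```livecode
-- LiveCode 7.0+ only: the plain title property accepts Unicode text
set the title of this stack to field "russTitle"
```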