How Do I Use UTF-8 Text With Fields?

UTF-8 is a popular text encoding. Although Revolution fields do not support UTF-8 natively, you can display UTF-8 text using a few of Revolution's built-in functions. This lesson shows you how.

UTF-8 Encoded Text File

This is the text file I want to display in a Revolution field. Notice that the encoding is UTF-8.

The Revolution Field

Here is the field I will display the text in. The button allows me to select a text file and then assigns the contents to the field.

Setting Field Contents: Attempt #1

Here is a simple script that does the following:

1) Asks the user to select a file.

2) Displays the contents of the file in a field named "Text" by setting the text property of the field.

The script is assigned to the button.
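
The two steps above can be sketched as a button script along the following lines. This is a minimal sketch rather than the exact script from the lesson: the prompt string and the variable name theFile are assumptions chosen for illustration, while the field name "Text" comes from the description above.

```livecode
on mouseUp
   -- Step 1: ask the user to select a file.
   -- The chosen path is placed in the special variable "it".
   answer file "Select a text file:"
   if it is empty then exit mouseUp -- the user cancelled the dialog

   -- theFile is a hypothetical variable name
   put it into theFile

   -- Step 2: read the file as text and assign it to the text
   -- property of the field named "Text"
   set the text of field "Text" to URL ("file:" & theFile)
end mouseUp
```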

The Result

After clicking the button and selecting my UTF-8 encoded file, this is what I see in the field when running on OS X. Not exactly the result I was after. What went wrong? The text property of a field expects text encoded in Mac OS Roman on OS X or ISO 8859-1 on Windows. My file is encoded in UTF-8 instead, so the field interprets each byte as a separate Mac OS Roman character, and any character that UTF-8 stores as more than one byte shows up as a run of wrong characters.

Setting Field Contents: Attempt #2

A field also has a unicodeText property. Let's see what happens if I set the unicodeText property of the field (1) to the contents of my text file.

Notice that I switched from the file keyword to the binfile keyword when loading the file contents (2). With file, Revolution converts the line endings of the text it reads in. With binfile, Revolution reads the data without performing any conversion.
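
A sketch of how this second attempt might look is shown here. As before, the prompt string and the variable name theFile are illustrative assumptions, and the comments mark roughly where the two numbered points apply.

```livecode
on mouseUp
   answer file "Select a text file:"
   if it is empty then exit mouseUp -- the user cancelled the dialog

   -- theFile is a hypothetical variable name
   put it into theFile

   -- Set the unicodeText property instead of the text property,
   -- and read the raw bytes with binfile so that no line-ending
   -- conversion is applied to the data
   set the unicodeText of field "Text" to URL ("binfile:" & theFile)
end mouseUp
```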

The Result

The result is actually worse than my first attempt. The problem is that the unicodeText property of a field expects text encoded as UTF-16, using the same byte order as the processor of the computer that Revolution is currently running on.

So What Is the Solution?

The solution is to use the uniEncode function to convert the UTF-8 encoded text into UTF-16 encoded text (1). When UTF8 is passed as the second parameter (2), Revolution performs the necessary conversion.
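
Put together, the working version might look something like the sketch below. The variable names theFile and theUTF8Text and the prompt string are assumptions made for illustration; uniEncode and the "UTF8" parameter are the parts described above.

```livecode
on mouseUp
   answer file "Select a text file:"
   if it is empty then exit mouseUp -- the user cancelled the dialog

   -- theFile and theUTF8Text are hypothetical variable names
   put it into theFile

   -- Read the raw UTF-8 bytes without any line-ending conversion
   put URL ("binfile:" & theFile) into theUTF8Text

   -- Convert from UTF-8 to the UTF-16 that unicodeText expects,
   -- then assign the result to the field
   set the unicodeText of field "Text" to uniEncode(theUTF8Text, "UTF8")
end mouseUp
```

Keeping the read and the conversion on separate lines makes it easier to inspect the intermediate data if the field still shows unexpected characters.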

The Result

Now the text displays correctly.

Retrieving UTF-8 Text From a Field

To retrieve UTF-8 encoded text from a field, I can just reverse the operation. I retrieve the unicodeText property of the field (1) and then pass that text through the uniDecode function (2), passing UTF8 as the second parameter (3).
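
The reverse operation could be wrapped in a small function, as in this sketch. The function name getFieldUTF8 and the variable name theUTF16Text are hypothetical; only unicodeText, uniDecode, and the "UTF8" parameter come from the description above.

```livecode
-- getFieldUTF8 is a hypothetical helper name
function getFieldUTF8
   -- Retrieve the field contents as UTF-16
   put the unicodeText of field "Text" into theUTF16Text

   -- Convert the UTF-16 data to UTF-8 and return it
   return uniDecode(theUTF16Text, "UTF8")
end getFieldUTF8
```

A caller that wants to save the result would most likely write it to a binfile URL rather than a file URL, for the same reason binfile was used when reading: the bytes should reach the disk without line-ending conversion.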
