Listing all the unique words in a piece of text

This lesson demonstrates how to list each unique word in a piece of text.

The uniqueWords function

The uniqueWords function takes one parameter, pString. It uses a repeat loop to check each word in turn, building an array variable named tWordsList. Each element of tWordsList corresponds to a different word: the element's key is the word, and the element's content is a number that counts how many times the word has been seen. For example, if the first word of the string is "Cans", then after that word is processed, tWordsList contains one element, with the key "Cans", which contains the number 1.

When a word is processed, the handler adds 1 to the element corresponding to that word. If there is no array element with that name already, one is created automatically by the add command. In general, changing a variable, a chunk in a variable, or an element in an array variable creates the variable, chunk, or element automatically, if it doesn't already exist. If there is already an element with that name, that is, if the word already exists in the array, 1 is added to that existing element.
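The automatic creation described above can be seen in isolation with a minimal sketch. The function name demoAutoCreate and the variable tCounts are made up for this illustration and are not part of the lesson's code:

```livecode
-- Hypothetical helper, for illustration only
function demoAutoCreate
   local tCounts
   
   -- No element with the key "Cans" exists yet, so add creates it
   -- and it now contains 1
   add 1 to tCounts["Cans"]
   
   -- The element already exists, so 1 is added to it and it contains 2
   add 1 to tCounts["Cans"]
   
   return tCounts["Cans"]
end demoAutoCreate
```

Calling demoAutoCreate() should return 2, since the same element is incremented twice.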

After all the words have been processed, the function exits the repeat loop. At this point, the array variable tWordsList contains one element for each unique word, and the key of each element is the word itself. The keys of tWordsList are therefore a list of all the unique words in the string, one word per line.

LiveCode chunk expressions

This form of word-by-word processing is possible because LiveCode uses chunk expressions to manage text. A chunk expression is a way of describing a specific portion of a container. LiveCode can directly address individual words, characters, lines, and items (delimited by any character).

In this example, we use the repeat for each chunk form of the repeat control structure:

repeat for each word tWord in pString

This repeat structure loops through each word in the parameter pString, putting the current word into a variable called tWord. You can also loop through other chunk types in a repeat structure, processing each character, line, or item.
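As a rough sketch of the same structure applied to a different chunk type, the following loops over characters instead of words. The function name countSpaces is hypothetical and is used only for this example:

```livecode
-- Hypothetical helper: counts the space characters in pString
function countSpaces pString
   local tCount
   put 0 into tCount
   
   -- tChar holds one character of pString on each pass through the loop
   repeat for each char tChar in pString
      if tChar is space then add 1 to tCount
   end repeat
   
   return tCount
end countSpaces
```

With the input "a b c", this function should return 2. Replacing char with line or item would loop over those chunk types in the same way.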

The uniqueWords function code

function uniqueWords pString
	local tWordsList
   
	repeat for each word tWord in pString
		add 1 to tWordsList[tWord]
	end repeat

	return the keys of tWordsList
end uniqueWords
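Because each element of the array holds a count, the same loop can also report how often each word occurs. The variation below is only a sketch: the function name wordCounts, the tab-separated output format and the final sort are assumptions made for this illustration, not part of the lesson's function.

```livecode
-- Hypothetical variation on uniqueWords: returns one line per unique
-- word, in the form word <tab> count, sorted alphabetically
function wordCounts pString
   local tWordsList, tResult
   
   -- Same counting loop as in uniqueWords
   repeat for each word tWord in pString
      add 1 to tWordsList[tWord]
   end repeat
   
   -- Build one output line per key of the array
   repeat for each key tKey in tWordsList
      put tKey & tab & tWordsList[tKey] & return after tResult
   end repeat
   delete the last char of tResult -- drop the trailing return
   
   -- The order of array keys is not guaranteed, so sort the lines
   sort lines of tResult
   return tResult
end wordCounts
```

For example, wordCounts("the cat saw the dog") should return four lines: cat with a count of 1, dog with 1, saw with 1, and the with 2.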

A note on efficiency

This example uses the repeat for each word form of the repeat control structure. When looping over chunk types in a string, this form is the fastest. The following repeat structure is functionally equivalent to the one in this example, but is much slower:

repeat with x = 1 to the number of words in pString
	add 1 to tWordsList[word x of pString]
end repeat
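The slower form has to locate word x by counting from the start of the string on every pass, whereas repeat for each keeps track of its position as it goes. One way to see the difference on your own data is to time both loops with the milliseconds function. The following is a rough sketch; the function name timeWordLoops and the comma-separated return value are assumptions made for this example:

```livecode
-- Hypothetical helper: returns "fast,slow", the elapsed milliseconds
-- for the repeat for each form and the repeat with form respectively
function timeWordLoops pString
   local tStart, tFast, tSlow, tCounts, tWord, x
   
   -- Time the repeat for each form
   put the milliseconds into tStart
   repeat for each word tWord in pString
      add 1 to tCounts[tWord]
   end repeat
   put (the milliseconds - tStart) into tFast
   
   -- Reset the array, then time the repeat with form
   put empty into tCounts
   put the milliseconds into tStart
   repeat with x = 1 to the number of words in pString
      add 1 to tCounts[word x of pString]
   end repeat
   put (the milliseconds - tStart) into tSlow
   
   return tFast & comma & tSlow
end timeWordLoops
```

The actual timings depend on the machine and on the length of the text. With short strings both values may be 0, so the difference only becomes visible with text containing many thousands of words.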
