How to implement audio transcription (speech-to-text) in LiveCode on Windows
Whisper is an AI-based automatic speech recognition system. It can produce transcriptions of various media files, as well as translations of them. This lesson covers installing Whisper and a basic way of using it from LiveCode to transcribe your own speech.
Install Python
Whisper and some of the libraries it uses are Python projects. Whether the latest version of Python works with Whisper is unclear, but at the time of writing, 3.12.x works as intended. After installing Python, you may wish to locate the folder containing Python and add it to your PATH environment variable.
Install pip
pip is a package manager for Python. If it was not installed alongside Python, download the get-pip.py bootstrap script and run it with Python. To run the script, open the folder containing it, type cmd into the File Explorer address bar (this opens a Command Prompt set to that folder), then in the Command Prompt type:
python get-pip.py
Again, check your environment variables to ensure pip's folder is included in PATH.
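If you are unsure whether a particular folder is actually on PATH, a quick check can be sketched with the Python standard library. The folder names below are hypothetical examples, not your actual install locations:

```python
def is_on_path(folder, path_env, sep=";"):
    """Check whether folder appears in a Windows-style PATH string.

    Comparison is case-insensitive and ignores trailing separators,
    matching how Windows treats paths.
    """
    target = folder.rstrip("\\/").lower()
    return any(p.rstrip("\\/").lower() == target
               for p in path_env.split(sep) if p)

# A hypothetical Windows PATH value for illustration:
sample_path = r"C:\Windows\system32;C:\Python312;C:\Python312\Scripts"
print(is_on_path(r"C:\Python312\Scripts", sample_path))  # prints True
```

On a real system you would pass `os.environ["PATH"]` instead of the sample string.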
Install PyTorch
With pip installed, you can now use it to install PyTorch, whose libraries Whisper depends on. Open cmd.exe as administrator and enter the following:
pip3 install torch torchvision
Note: Towards the end of the installation you may encounter an error like "OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed. Error loading ...\torch\lib\c10.dll or one of its dependencies." In that case, first try downloading and installing the Microsoft Visual C++ Redistributable. If that does not help, downgrade PyTorch to an earlier version with something like the following in cmd.exe:
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
Install Chocolatey (optional)
For the next step, you may wish to download either the licensed or individual edition of the Chocolatey package manager. The exact installation steps differ between the two, but either way you will need to use PowerShell.
Install FFmpeg
FFmpeg is an open-source multimedia project that Whisper relies on to interpret audio. If you installed Chocolatey as above, then in PowerShell, enter:
choco install ffmpeg
Otherwise, you can download it here: https://www.ffmpeg.org/download.html. While only the source code is offered on that website, it provides external links to third parties that offer compiled binaries.
Install Whisper
Now that all of the prerequisites are handled, we can install Whisper. In command prompt, enter:
pip install -U openai-whisper
Once it is installed, you can check that it works by opening Command Prompt, using cd to navigate to a folder containing spoken-word audio, and typing:
whisper <filename>
where <filename> is the name of the audio file in question. If Whisper is working correctly, it will do the following:
- Print, among other things, what it believes to be a transcript of the audio file.
- Create a multitude of files of differing formats in the same folder, each containing that transcript.
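The same CLI call can also be scripted from Python with the standard subprocess module. This is only a sketch: it assumes whisper is on PATH, and the filename is a placeholder. Passing the arguments as a list avoids having to quote paths by hand:

```python
import subprocess  # only needed if you actually run the command

def build_whisper_cmd(filename, model="tiny", language="en"):
    """Assemble a whisper CLI invocation as an argument list.

    An argument list (rather than a single string) lets subprocess.run
    handle quoting of paths that contain spaces.
    """
    return ["whisper", "--model", model, "--language", language, filename]

cmd = build_whisper_cmd("speech.wav")  # "speech.wav" is a placeholder
print(cmd)
# To actually transcribe (requires Whisper installed):
# subprocess.run(cmd, check=True)
```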
Creating the app
It is now time to have LiveCode use Whisper. Starting with a new stack, add a single button and a single field. The button will start recording if recording is not already in progress; if it is, the button will stop the recording and have Whisper transcribe it, with the result output to the field.
Rather than controlling the recording directly, the button script simply tells the card script whether to start or stop recording.
on mouseUp
   get the label of me
   if it is "Record" then
      startRecording
      set the label of me to "Stop"
   else
      stopRecording
      set the label of me to "Record"
      getTranscript
   end if
end mouseUp
The card script will be a little more involved. It is responsible for ensuring there is somewhere for the stack to place files and for managing the stack's internal state.
local sFolder, sFile, sCamera

on openCard
   put specialFolderPath("desktop") & "/recordings" into sFolder
   if there is no folder sFolder then create folder sFolder
   put sFolder & "/test.wmv" into sFile
   put "thisRecorder" into sCamera
end openCard

on closeCard
   deleteCamera
end closeCard
Because this stack is being created specifically with Windows in mind, we cannot use the start recording/stop recording commands, as they are not supported there. The workaround is the cameraControl functionality. We will set up the camera and define the commands that the button script invokes:
command createCamera
   cameraControlCreate sCamera
   cameraControlSet sCamera, "audioDevice", "default"
   cameraControlSet sCamera, "videoDevice", ""
end createCamera

command deleteCamera
   if sCamera is in cameraControls() then
      cameraControlDelete sCamera
   end if
end deleteCamera

command startRecording
   if there is a file sFile then
      delete file sFile
   end if
   if there is a file (sFolder & "/test.txt") then
      delete file (sFolder & "/test.txt")
   end if
   deleteCamera
   createCamera
   cameraControlDo sCamera, "startRecording", sFile
end startRecording

command stopRecording
   if sCamera is in cameraControls() then
      cameraControlDo sCamera, "stopRecording"
   end if
end stopRecording
Lastly, the card script passes the recording to Whisper (via the shell() function), asking it to transcribe the audio and place a txt version of the transcript somewhere the stack can find it, then reads the resulting file and outputs it. Note that shell() blocks until the command finishes, so longer recordings take longer to return. Several flags are used here:
- --model: The model to use. The default is "turbo", but "tiny" has been chosen here for the sake of performance.
- --language: The assumed language. Whisper will attempt to detect this by default but, again for performance, it is specified explicitly in this example.
- -o: The output folder. We have chosen to make this the same folder that holds the audio recording.
- -f: The output format(s). By default Whisper writes a multitude of formats, but our example only requires txt.
command getTranscript
   local tCommand
   put "whisper --model tiny --language en" && "-o" && quote & sFolder & quote \
         && "-f txt" && quote & sFile & quote into tCommand
   get shell(tCommand)
   put url ("file:" & sFolder & "/test.txt") into field 1
end getTranscript
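The quote constants in tCommand matter because the Desktop path usually contains the user name, which may contain spaces. Python's standard library can illustrate the same Windows quoting rule; the paths below are hypothetical examples:

```python
from subprocess import list2cmdline

# Hypothetical paths; note the space in the user name.
folder = r"C:\Users\Jane Doe\Desktop\recordings"
audio = folder + r"\test.wmv"

# list2cmdline applies Windows command-line quoting: arguments containing
# spaces are wrapped in double quotes, just as the LiveCode script does
# manually with the quote constant.
cmd = list2cmdline(["whisper", "--model", "tiny", "--language", "en",
                    "-o", folder, "-f", "txt", audio])
print(cmd)
```

Without the quotes, the shell would split the path at each space and Whisper would receive garbage arguments.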
