How to implement audio transcription (speech-to-text) in LiveCode on Windows

Whisper is an AI-based automatic speech recognition system. It can produce transcriptions of various media files, as well as translations of those transcriptions. This lesson covers installing Whisper and a basic way of using it from LiveCode to transcribe your own speech.

Install Python

Whisper and some of the libraries it uses are Python projects. The very latest version of Python is not guaranteed to work with Whisper, but at the time of writing 3.12.x works as intended. After installing Python, you may wish to locate the folder containing it and add it to your PATH environment variable.

Install pip

pip is a package manager for Python. If it was not installed along with Python, download the get-pip.py script and run it with Python. To run the script, open the folder containing it, type cmd into the address bar (this should open a Command Prompt set to that folder), then in the Command Prompt type:

python get-pip.py

Again, check your environment variables to ensure the folder containing pip (Python's Scripts folder) is included in PATH.
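As a quick sanity check, you can confirm that pip is reachable from Python before moving on. A minimal sketch, assuming only that Python itself is installed:

```python
import subprocess
import sys

def pip_available():
    """Return True if `python -m pip` runs successfully for this interpreter."""
    result = subprocess.run([sys.executable, "-m", "pip", "--version"],
                            capture_output=True, text=True)
    return result.returncode == 0

if __name__ == "__main__":
    print("pip available:", pip_available())
```

If this prints False, revisit the get-pip.py step above before installing anything else.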

Install PyTorch

With pip installed, you can now use it to install PyTorch, whose libraries Whisper depends on. Open cmd.exe as administrator and enter the following:

pip3 install torch torchvision

Note: Towards the end of the installation you may encounter an error like "OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed. Error loading "...\torch\lib\c10.dll" or one of its dependencies." In that case, try downloading and installing the Microsoft Visual C++ Redistributable or, if that does not help, downgrading PyTorch to an earlier version with something like the following in cmd.exe:

pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
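Once the install (or downgrade) finishes, you can check that the package was actually installed without triggering a full import. A small sketch using the standard library:

```python
import importlib.util

def package_installed(name):
    """Check whether a package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    # The DLL error described above only surfaces on a real import, so
    # follow up with `python -c "import torch"` to be fully sure.
    print("torch installed:", package_installed("torch"))
```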

Install Chocolatey (optional)

For the next step, you may wish to download either the licensed or individual edition of the Chocolatey package manager. The installation steps differ between the two, but either way you will need to use PowerShell.

Install FFmpeg

FFmpeg is an open-source multimedia project that Whisper relies on to interpret audio. If you installed Chocolatey as above, then in PowerShell enter:

choco install ffmpeg

Otherwise, you can download it from here: https://www.ffmpeg.org/download.html. While only the source code is offered on that website, it provides external links to third parties that offer compiled binaries.
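However you install it, the ffmpeg executable must end up on your PATH, because Whisper invokes it as a subprocess. A quick Python check:

```python
import shutil

def ffmpeg_on_path():
    """Whisper shells out to ffmpeg, so the binary must be findable on PATH."""
    return shutil.which("ffmpeg") is not None

if __name__ == "__main__":
    print("ffmpeg found:", ffmpeg_on_path())
```

If this prints False after a manual install, add the folder containing ffmpeg.exe to PATH and open a new Command Prompt.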

Install whisper

Now that all of the prerequisites are handled, we can install Whisper. In Command Prompt, enter:

pip install -U openai-whisper

Once it is installed, you can check that it works by opening Command Prompt, using cd to navigate to a folder containing spoken-word audio, and typing:

whisper <filename>

where <filename> is the name of the audio file in question. If Whisper is working correctly, it will do the following:

  • Respond with, among other things, what it believes to be a transcript of the audio file (see below).
  • Create a multitude of files of differing formats in the same folder, each containing that transcript.
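Whisper also exposes a Python API alongside the CLI. The sketch below shows both a helper that predicts where the txt output lands (Whisper names each output after the input file's base name) and a minimal transcription call; the second function assumes openai-whisper is installed, and the file names are placeholders:

```python
import os

def transcript_path(audio_path, out_dir):
    """Whisper names each output after the input's base name,
    e.g. test.wmv -> test.txt in the output folder."""
    base = os.path.splitext(os.path.basename(audio_path))[0]
    return os.path.join(out_dir, base + ".txt")

def transcribe(audio_path, model_name="tiny"):
    # Imported lazily so the path helper above works without Whisper installed.
    import whisper
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"]
```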

Creating the app

It is now time to have LiveCode use Whisper. Starting with a new stack, add a single button and a single field. If recording is not already in progress, the button will start it; if it is, the button will stop recording, have Whisper transcribe what was captured, and output the result to the field.

Rather than controlling the recording directly, the button script will simply tell the card script whether to start or stop recording.

on mouseUp
   get the label of me
   if it is "Record" then
      startRecording
      set the label of me to "Stop"
   else
      stopRecording
      set the label of me to "Record"
      getTranscript
   end if
end mouseUp

The card script will be a little more involved. It will be responsible for ensuring there is somewhere for the stack to place files and manage the internal state of the stack.

local sFolder, sFile, sCamera

on openCard
   put specialFolderPath("desktop") & "/recordings" into sFolder
   if there is no folder sFolder then create folder sFolder
   put sFolder & "/test.wmv" into sFile
   put "thisRecorder" into sCamera
end openCard

on closeCard
   deleteCamera
end closeCard

Because this stack is being created specifically with Windows in mind, we cannot use the start recording/stop recording commands, as they are not supported there. The workaround is the cameraControl functionality. We will set up the camera and define the commands that the button script invokes:

command createCamera
   cameraControlCreate sCamera
   cameraControlSet sCamera, "audioDevice", "default"
   cameraControlSet sCamera, "videoDevice", ""
end createCamera

command deleteCamera
   if sCamera is in cameraControls() then
      cameraControlDelete sCamera
   end if
end deleteCamera

command startRecording
   if there is a file sFile then
      delete file sFile
   end if
   if there is a file (sFolder & "/test.txt") then
      delete file (sFolder & "/test.txt")
   end if
   deleteCamera
   createCamera
   cameraControlDo sCamera, "startRecording", sFile
end startRecording

command stopRecording
   if sCamera is in cameraControls() then
      cameraControlDo sCamera, "stopRecording"
   end if
end stopRecording

Lastly, the card script will pass the recording to Whisper (via the shell() function), asking it to transcribe the audio and place a txt version of the transcript somewhere the stack can find it, then read the file created and output its contents. Several flags are used here:

  • --model: The model used. By default this is "turbo" but "tiny" has been chosen here for the sake of performance.
  • --language: Assumed language. Whisper will by default attempt to detect this but again, for performance it is being specified in this example.
  • -o: Output folder. We have chosen to make that the same as the output folder for the audio recording.
  • -f: File output format(s). By default, there is a multitude of outputs but our example only requires one.

command getTranscript
   local tCommand
   put "whisper --model tiny --language en" && "-o" && quote & sFolder & quote \
         && "-f txt" && quote & sFile & quote into tCommand
   get shell(tCommand)
   put url("file:" & sFolder & "/test.txt") into field 1
end getTranscript
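For reference, the shell command that getTranscript assembles can be sketched outside LiveCode as a Python argument list (the file and folder names mirror the example above and are otherwise arbitrary):

```python
import subprocess

def whisper_args(audio_file, out_folder):
    """Mirror the shell command built in getTranscript."""
    return ["whisper", "--model", "tiny", "--language", "en",
            "-o", out_folder, "-f", "txt", audio_file]

# Example (requires Whisper on PATH; writes test.txt into the folder):
#   subprocess.run(whisper_args("recordings/test.wmv", "recordings"), check=True)
```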
