How to Use Whisper: A Free Speech-to-Text AI Tool by OpenAI
Whisper is an automatic speech recognition (ASR) system that can understand multiple languages. It was trained on 680,000 hours of supervised data collected from the web.
Whisper is developed by OpenAI, and it’s free and open source.
Speech processing is a critical component of many modern applications, from voice-activated assistants to automated customer service systems. This tool will make it easier than ever to transcribe and translate speeches, making them more accessible to a wider audience. OpenAI hopes that by open-sourcing their models and code, others will be able to build upon their work to create even more powerful applications.
Whisper can handle transcription in multiple languages, and it can also translate those languages into English.
I’m not very knowledgeable in speech recognition, but given how well this tool performs, and considering that it’s free and open source, I think it’s fantastic. It will probably be used by a lot of people who don’t have the time or money to invest in a commercial speech recognition tool. It will also be used by commercial software developers who want to add speech recognition capabilities to their products, saving them the cost of a commercial speech recognition tool.
I think this tool is going to be very popular, and I think it has a lot of potential.
In this tutorial we’ll get started using Whisper in Google Colab. We’ll quickly install it, and then we’ll run it with one line to transcribe an mp3 file. We won’t go in-depth; we just want to test it out and see what it can do.
- Quick Video Demo
- Using Whisper For Speech Recognition Using Google Colab
- Open a Google Colab Notebook
- Enable GPU
- Install Whisper
- Upload an Audio File
- Run Whisper to Transcribe Speech to Text
- Using Whisper Models
- Conclusion
- Useful Resources & Acknowledgements
This is a short demo showing how we’ll use Whisper in this tutorial.
Using Whisper For Speech Recognition Using Google Colab

Google Colab is a cloud-based service that allows users to write and execute code in a web browser. Essentially, Google Colab is like Google Docs, but for coding in Python.
You can use Google Colab on any device and you don’t have to download anything.
If you don’t have a powerful computer or don’t have experience with Python, using Whisper on Google Colab will be much faster and hassle-free. For example, on my computer (CPU: i7-7700K / GPU: GTX 1660 SUPER) transcribing 30 seconds of audio takes a few minutes, whereas on Google Colab it takes a few seconds.
First we’ll need to open a Colab notebook. To do that, you can just visit this link https://colab.research.google.com/#create=true and Google will generate a new Colab notebook for you. Alternatively, you can go anywhere in your Google Drive > right-click in an empty space (as if creating a new file) > More > Google Colaboratory. A new tab will open with your new notebook. It’s called Untitled.ipynb, but you can rename it to anything you want.
Next we want to make sure our notebook is using a GPU. Google often allocates us a GPU by default, but not always.
To do this, in the Google Colab menu go to Runtime > Change runtime type.
Next, a small window will pop up. Under Hardware accelerator there’s a dropdown; make sure GPU is selected and click Save.
Now we can install Whisper. (You can also check the install instructions in the official GitHub repository.)
To install it, just paste the following lines in a cell. To run the commands, click the play button at the left of the cell or press Ctrl + Enter. The install process should take 1-2 minutes.
!pip install git+https://github.com/openai/whisper.git
!sudo apt update && sudo apt install ffmpeg

Upload an Audio File
Now we can upload a file to transcribe it. To do this, open the file browser at the left of the notebook by pressing the folder icon.
Now you can press the upload file button at the top of the file browser, or just drag and drop a file from your computer, and wait for it to finish uploading.
Next we can simply run Whisper to transcribe the audio file using the following command. If this is the first time you’re running Whisper, it will first download some dependencies.
!whisper "Rick Astley - Never Gonna Give You Up Official Music Video.mp3"
In less than a minute it should start transcribing.
When it’s finished, you can find the transcription files in the same directory, in the file browser.
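Among the transcription files, Whisper writes subtitle formats like .srt alongside the plain-text transcript. As a quick illustration of what that output looks like, here’s a minimal sketch that parses SRT-style blocks into timestamped captions (the sample contents below are made up, not actual Whisper output):

```python
# Hypothetical sample in the SRT subtitle format that Whisper writes
# alongside the plain-text transcript (captions made up for illustration).
sample_srt = """1
00:00:00,000 --> 00:00:04,000
We're no strangers to love

2
00:00:04,000 --> 00:00:08,000
You know the rules and so do I
"""

def parse_srt(text):
    """Split SRT text into (start, end, caption) tuples."""
    entries = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        # Line 0 is the sequence number, line 1 the timestamps,
        # the rest is the caption text.
        start, end = lines[1].split(" --> ")
        entries.append((start, end, " ".join(lines[2:])))
    return entries

for start, end, caption in parse_srt(sample_srt):
    print(f"[{start} -> {end}] {caption}")
```

The .vtt file follows a very similar layout, so the same idea applies there too.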
Using Whisper Models

Whisper comes with multiple models. You can read more about them in the official GitHub repository.
A model is a statistical representation of the speech-to-text engine, trained to recognize speech and convert it to text. There are several models of different sizes, each trading accuracy for speed. By default Whisper uses the small model. It’s faster, but not as accurate as a larger model. For example, let’s use the medium model.
We can do this by running the command:
!whisper AUDIO_FILE --model medium
In my case:
!whisper "Rick Astley - Never Gonna Give You Up Official Music Video.mp3" --model medium
The result is more accurate with the medium model than with the small one.
In this tutorial we covered the basic usage of Whisper by running it from the command line in Google Colab. It was meant just to get us started and see how OpenAI’s Whisper performs.
You can easily use Whisper from the command line or in Python, as you’ve probably seen in the GitHub repository. We’ll most likely see some amazing apps pop up that use Whisper under the hood in the near future.
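For reference, the Python API shown in the GitHub repository looks roughly like this (the file name and model choice below are just examples, and this assumes Whisper is already installed):

```python
import whisper

# Load one of the available models: tiny, base, small, medium, large
model = whisper.load_model("medium")

# Transcribe an audio file; the result is a dict with the full
# transcript under "text" and timestamped "segments"
result = model.transcribe("audio.mp3")
print(result["text"])
```

This does the same thing as the !whisper command we ran earlier, but gives you the result as Python data you can process further.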
Useful Resources & Acknowledgements

- The GitHub repository for Whisper – https://github.com/openai/whisper. It has useful information on Whisper, as well as some nice examples of using Whisper from the command line or in Python.
- OpenAI Whisper – MultiLingual AI Speech Recognition Live App Tutorial – https://www.youtube.com/watch?v=ywIyc8l1K1Q. A very useful intro on Whisper, as well as a great demo on how to use it with a simple Web UI using Gradio.
- Hacker News Thread – https://news.ycombinator.com/item?id=32927360. You can find some great insights in the comments.