OpenAI on Wednesday launched the Whisper API, a hosted version of the open-source Whisper speech-to-text model that the company released in September.
Priced at $0.006 per minute, Whisper is an automatic speech recognition system that, according to OpenAI, enables “robust” transcription in numerous languages as well as translation from those languages into English.
Among the file types it accepts are M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM.
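The pricing and format list can be made concrete with a minimal Python sketch that checks a filename against the formats named above and estimates a transcription's cost at the quoted $0.006 per minute. The helper names `is_supported_format` and `estimate_cost_usd` are hypothetical and are not part of any OpenAI SDK. The sketch also assumes cost scales linearly with audio duration in minutes; OpenAI's actual billing granularity may differ.

```python
from decimal import Decimal
from pathlib import PurePath

# File formats the article says the Whisper API accepts at launch.
SUPPORTED_FORMATS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}

# Launch price quoted in the article, in US dollars per minute of audio.
PRICE_PER_MINUTE_USD = Decimal("0.006")


def is_supported_format(filename: str) -> bool:
    """Hypothetical helper: True if the file extension is in the accepted list."""
    suffix = PurePath(filename).suffix.lstrip(".").lower()
    return suffix in SUPPORTED_FORMATS


def estimate_cost_usd(minutes: float) -> Decimal:
    """Hypothetical helper: assumes cost scales linearly with audio length."""
    if minutes < 0:
        raise ValueError("duration must be non-negative")
    # Converting via str() keeps the decimal form of the input and avoids float error.
    return PRICE_PER_MINUTE_USD * Decimal(str(minutes))


if __name__ == "__main__":
    print(is_supported_format("interview.MP3"))  # True
    print(is_supported_format("notes.txt"))      # False
    print(estimate_cost_usd(60))                 # 0.360
```

Using `Decimal` rather than `float` keeps small per-minute multiples such as $0.36 for an hour of audio exact.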
Countless businesses have developed speech recognition systems, which sit at the heart of software and services from tech giants like Google, Amazon, and Meta.
But according to OpenAI president and chairman Greg Brockman, what sets Whisper apart is that it was trained on 680,000 hours of multilingual and “multitask” data collected from the web, which improved its recognition of distinctive accents, background noise, and technical jargon.
“We released a model, but that actually was not enough to cause the whole developer ecosystem to build around it,” Brockman said in a video chat with TechCrunch yesterday afternoon.
“The Whisper API is the same large model that you can get open source, but we’ve optimized to the extreme. It’s much, much faster and extremely convenient.”
Whisper is not without flaws, though, especially when it comes to “next-word” prediction. Because the system was trained on a large amount of noisy data, it may include words in its transcriptions that weren’t actually spoken, presumably because it is simultaneously trying to predict the next word in the audio and transcribe the recording itself.
Whisper also doesn’t perform equally well across languages, showing a higher error rate for speakers of languages that are underrepresented in its training data.
Despite these limitations, OpenAI expects Whisper’s transcription capabilities to be used to improve existing software, services, and applications.
The Whisper API is already being used by the AI-powered language learning app Speak to enable a brand-new in-app virtual speaking companion.
If OpenAI, a Microsoft-backed company, can break into the speech-to-text market in a significant way, it could prove quite profitable. One estimate puts the segment’s market value at $5.4 billion by 2026, up from $2.2 billion in 2021.
“Our picture is that we really want to be this universal intelligence,” Brockman said. “We aspire to be a force multiplier for that attention by being able to extremely flexibly take in any kind of data you have or task you want to complete.”