Amazon Transcribe, an automatic speech recognition (ASR) service, now supports over 100 languages and offers new AI capabilities for customers. The expansion, announced during the AWS re:Invent event, lets the service recognise more spoken languages and makes it easier to offer transcription. AWS customers can use Transcribe to add speech-to-text capabilities to their apps on the AWS Cloud.
On the company's official blog, Sumit Kumar and Vivek Singh of AWS said: “It is trained on millions of hours of unlabeled audio data from over 100 languages. The training recipes are optimised through smart data sampling to balance the training data between languages, ensuring that traditionally under-represented languages also reach high accuracy levels.” By comparison, in late 2022 Amazon Transcribe supported 79 languages.
Amazon Transcribe now delivers an accuracy improvement of between 20 per cent and 50 per cent across most languages. On telephony speech, which the company describes as a challenging and data-scarce domain, the improvement is between 30 per cent and 70 per cent. The service also offers automatic punctuation, custom vocabulary, automatic language identification, and custom vocabulary filters. It can recognise speech in audio and video formats and in noisy environments.
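For developers wondering how features such as automatic language identification and custom vocabulary might be requested in practice, the following is a minimal sketch using the AWS SDK for Python (boto3). It is illustrative only: the helper names `build_transcription_request` and `start_job`, and any job name, media URI, and vocabulary name passed to them, are hypothetical and not taken from the announcement. The sketch assumes the custom vocabulary is only applied when a language is given explicitly.

```python
from typing import Optional


def build_transcription_request(
    job_name: str,
    media_uri: str,
    language_code: Optional[str] = None,
    vocabulary_name: Optional[str] = None,
) -> dict:
    """Build keyword arguments for boto3's start_transcription_job call.

    Hypothetical helper: if no language code is given, the request asks
    Transcribe to identify the language automatically.
    """
    if language_code is None and vocabulary_name is not None:
        # Assumption in this sketch: a custom vocabulary is tied to one
        # explicit language, so we refuse the ambiguous combination.
        raise ValueError("vocabulary_name requires an explicit language_code")

    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }
    if language_code is None:
        # Let the service detect the dominant language in the audio.
        request["IdentifyLanguage"] = True
    else:
        request["LanguageCode"] = language_code
        if vocabulary_name is not None:
            request["Settings"] = {"VocabularyName": vocabulary_name}
    return request


def start_job(request: dict) -> dict:
    """Submit the request. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("transcribe")
    return client.start_transcription_job(**request)
```

Calling `build_transcription_request("demo-job", "s3://example-bucket/call.wav")` would produce a request that relies on automatic language identification, while passing `language_code="en-US"` together with a vocabulary name would pin the language and attach the custom vocabulary.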
“Enabled by the high accuracy of Amazon Transcribe across different accents and noise conditions, its support for a large number of languages, and its breadth of value-added feature sets, thousands of enterprises will be empowered to unlock rich insights from their audio content, as well as increase the accessibility and discoverability of their audio and video content across various domains. For instance, contact centers transcribe and analyse customer calls to identify insights and subsequently improve customer experience and agent productivity,” said company representatives.
Besides AWS, several other companies offer AI-powered transcription services. Otter, for example, has been providing AI transcriptions to consumers and enterprises for a while and the results are top-notch. Meta's offering is slightly different: it has announced that it is working on a generative AI-powered translation model that recognises nearly 100 spoken languages. The model, SeamlessM4T, which stands for Massively Multilingual and Multimodal Machine Translation, can perform speech-to-text and text-to-text translation for nearly 100 languages.