Meta develops artificial intelligence model for direct speech-to-speech translation for up to 101 languages

SEAMLESSM4T outperforms existing state-of-the-art systems and could pave way for rapid universal translation, bypassing limitations associated with current speech-to-text-to-speech translation systems

G.S. Mudur Published 16.01.25, 06:26 AM

Representational image (file picture)

Humans may be edging closer to a digital version of the Babel fish — a fictional creature in Douglas Adams’s science fiction creation The Hitchhiker’s Guide to the Galaxy — that enables instant translation when placed in the ear.

Scientists at the US technology company Meta have developed an artificial intelligence (AI) model that can facilitate direct speech-to-speech translation for up to 101 languages, including several spoken in India.

The model, named SEAMLESSM4T, outperforms existing state-of-the-art systems and could pave the way for rapid universal translation, bypassing limitations associated with current speech-to-text-to-speech translation systems, the researchers said in the journal Nature on Wednesday.

SEAMLESSM4T — short for massively multilingual and multimodal machine translation — supports speech-to-speech translation from 101 to 36 languages and speech-to-text translation from 101 to 96 languages, Marta Costa-Jussa at Fundamental AI Research (FAIR), Meta, and her colleagues reported.

The model, which was publicly released to the scientific community two years ago, also facilitates text-to-speech translation from 96 to 36 languages, text-to-text translation for 96 languages, and automatic speech recognition for 96 languages.

Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Sindhi, Tamil and Telugu are among the Indian languages supported by the model for speech and text input and text output. The model also supports speech output for Bengali, Hindi, Kannada and Telugu, among other languages.
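
For readers who want to try the model, the publicly released checkpoints can be loaded through the Hugging Face transformers library. The snippet below is a minimal sketch, not code from the Meta paper: the checkpoint name ("facebook/hf-seamless-m4t-medium") and the three-letter language codes ("eng", "hin", "ben") are assumptions based on the public release. It translates an English sentence into Hindi text and Bengali speech.

```python
# Minimal sketch, not the authors' code: checkpoint name and language
# codes are assumptions based on the public SeamlessM4T release.
import scipy.io.wavfile
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Prepare an English text input.
inputs = processor(text="The weather is pleasant today.", src_lang="eng",
                   return_tensors="pt")

# Text-to-text translation: English -> Hindi (no speech generation).
hindi_tokens = model.generate(**inputs, tgt_lang="hin", generate_speech=False)
print(processor.decode(hindi_tokens[0].tolist()[0], skip_special_tokens=True))

# Text-to-speech translation: English -> Bengali waveform, saved as WAV.
bengali_audio = model.generate(**inputs, tgt_lang="ben")[0].cpu().numpy().squeeze()
scipy.io.wavfile.write("bengali.wav", rate=model.config.sampling_rate,
                       data=bengali_audio)
```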

To build the model, the researchers created a corpus of over 4,70,000 hours of automatically aligned speech translations and trained the model’s translation component on a large dataset containing over 4.5 million hours of multilingual spoken audio.

SEAMLESSM4T translates text with up to 23 per cent more accuracy than existing systems and is about 50 per cent more resilient against background noise and speaker variations than previous state-of-the-art systems, the researchers said in their paper.

“I’m excited about this development,” Tanel Alumae, professor at the language technology laboratory at Tallinn University of Technology, Estonia, who was not associated with the work, told The Telegraph.

“The biggest virtue of this work is that all the data and code are publicly available…. This allows other researchers to fine-tune the model for their own applications,” he said.

Alumae and his colleagues, for instance, have demonstrated that the model can translate between Estonian, English and Russian. “We have also successfully used it for things like emotion recognition from speech and detecting early cognitive decline, such as Alzheimer’s disease, from speech,” he said.

Conventional AI models for speech-to-speech translation typically use a cascaded approach in which speech is first transcribed into text, translated into text in another language, and then converted back into speech.

But such speech-to-text-to-speech translation models come with limitations, including a phenomenon called “hallucination”, in which the model introduces words or phrases that were never uttered by the original speaker, Allison Koenecke, an assistant professor of information science at Cornell University who was not associated with the work, said in a commentary on the new model in the same issue of Nature.
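
For illustration only, the sketch below contrasts the cascaded pipeline described above with a direct speech-to-speech model. The function names (transcribe, translate_text, synthesise_speech, translate_speech) are hypothetical placeholders, not any real API; the point is simply that a cascade has extra hand-off points at which errors, including hallucinated words, can creep in and compound.

```python
# Hypothetical placeholders for illustration only -- not a real API.

def cascaded_s2st(audio, src_lang, tgt_lang):
    """Conventional cascade: mistakes made in transcription or text
    translation (including 'hallucinated' words) are passed on to the
    next stage and compounded."""
    text = transcribe(audio, lang=src_lang)                # speech -> text
    translated = translate_text(text, src_lang, tgt_lang)  # text -> text
    return synthesise_speech(translated, lang=tgt_lang)    # text -> speech

def direct_s2st(audio, src_lang, tgt_lang):
    """Direct approach taken by models such as SEAMLESSM4T: a single model
    maps source speech to target speech without intermediate text hand-offs."""
    return translate_speech(audio, src_lang, tgt_lang)
```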

The work by the Meta team is a promising advance towards speech technology that rivals the stuff of science fiction, Alumae wrote in an accompanying commentary.
