Meta develops artificial intelligence model for direct speech-to-speech translation for up to 101 languages

SEAMLESSM4T outperforms existing state-of-the-art systems and could pave way for rapid universal translation, bypassing limitations associated with current speech-to-text-to-speech translation systems

G.S. Mudur
Published 16.01.25, 06:26 AM

Humans may be edging closer to a digital version of the Babel fish — a fictional creature in Douglas Adams’s science fiction creation The Hitchhiker’s Guide to the Galaxy — that enables instant translation when placed in the ear.

Scientists at the US technology company Meta have developed an artificial intelligence (AI) model that can facilitate direct speech-to-speech translation for up to 101 languages, including several spoken in India.

The model, named SEAMLESSM4T, outperforms existing state-of-the-art systems and could pave the way for rapid universal translation, bypassing limitations associated with current speech-to-text-to-speech translation systems, the researchers said in the journal Nature on Wednesday.

SEAMLESSM4T, short for massively multilingual and multimodal machine translation, supports speech-to-speech translation from 101 to 36 languages and speech-to-text translation from 101 to 96 languages, Marta Costa-Jussa at Fundamental AI Research, Meta, and her colleagues reported.

The model, which was publicly released for the science community two years ago, also facilitates text-to-speech translation from 96 to 36 languages, text-to-text translation for 96 languages, and automatic speech recognition for 96 languages.

Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Sindhi, Tamil and Telugu are among the Indian languages supported by the model for speech and text input and text output. The model also supports speech output for Bengali, Hindi, Kannada and Telugu, among other languages.

To build the model, the researchers created a corpus of over 4,70,000 hours of automatically aligned speech translations and trained the model’s translation component on a large dataset containing over 4.5 million hours of multilingual spoken audio.

SEAMLESSM4T translates text with up to 23 per cent more accuracy than existing systems and is about 50 per cent more resilient against background noise and speaker variations than previous state-of-the-art systems, the researchers said in their paper.

“I’m excited about this development,” Tanel Alumae, professor at the language technology laboratory at Tallinn University of Technology, Estonia, who was not associated with the work, told The Telegraph.

“The biggest virtue of this work is that all the data and code are publicly available…. This allows other researchers to finetune the model for their own applications,” he said.

Alumae and his colleagues, for instance, have demonstrated that the model can translate between Estonian, English and Russian. “We have also successfully used it for things like emotion recognition from speech and detecting early cognitive decline, such as Alzheimer’s disease, from speech,” he said.

Conventional AI models for speech-to-speech translation typically use a cascaded approach in which speech is first transcribed into text, the text is translated into another language, and the translated text is then converted back into speech.

But such speech-to-text-to-speech translation models come with limitations, such as a phenomenon called “hallucinations” in which the model introduces words or phrases that were never uttered by the original speaker, Allison Koenecke, an assistant professor of information science at Cornell University who was not associated with the work, said in a commentary on the new model in the same issue of Nature.

The work by the Meta team is a promising advance towards speech technology that rivals the stuff of science fiction, Alumae wrote in an accompanying commentary.
