
OpenAI’s new GPT-4o model points to a future that we have seen in the film Her

The San Francisco start-up has showcased a number of improvements to its GPT-4 model, including the ability to interpret voice, video, images and code in a single interface

Theodore (Joaquin Phoenix) forms an intimate bond with his computer operating system Samantha (voiced by Scarlett Johansson) in the Spike Jonze film Her

Mathures Paul
Published 15.05.24, 06:34 AM

The 2013 Spike Jonze film Her anticipated an aspect of generative artificial intelligence that is now coming true: these systems are going to upend our relationships before they even try to help our economies. OpenAI is releasing a Her-inspired voice assistant that can read your facial expressions and translate spoken language in real time.

The San Francisco start-up has showcased a number of improvements to its GPT-4 model, including the ability to interpret voice, video, images and code in a single interface. Billed as GPT-4o, the update “provides GPT-4 level intelligence, but it’s much faster and improves on capabilities across text, vision and audio”, OpenAI chief technology officer Mira Murati said.


OpenAI engineers and Murati huddled around a phone to show off the new capabilities. They asked the assistant to be more expressive while telling a bedtime story, then abruptly asked it to switch to a different vocal tone, and finally had it conclude the story in a singing voice. Next, they asked the model what the phone's camera was seeing, and they also had the assistant act as a translator.

Her is the favourite film of the company's CEO, Sam Altman. Last year, he said: “I like Her. The things Her got right — like the whole interaction models of how people use AI — that was incredibly prophetic.”

OpenAI CTO Mira Murati announcing GPT-4o's capabilities during the company's event on May 13 Picture: OpenAI

“The special thing about GPT-4o is it brings GPT-4 level intelligence to everyone, including our free users,” said Murati. “This is the first time we’re making a huge step forward when it comes to ease of use.”

During the presentation, OpenAI showed off GPT-4o translating live between English and Italian and helping a researcher solve a linear equation, written on paper, in real time.

The “o” in GPT-4o stands for “omni”, a reference to the model's multimodal capabilities. The company said that GPT-4o was trained across text, vision and audio, meaning all inputs and outputs are processed by the same neural network. This sets it apart from the company's previous models (GPT-3.5 and GPT-4), whose voice mode chained separate models to transcribe speech, generate a reply and convert that reply back into audio.

The new voice assistant was announced days after Bloomberg reported that OpenAI is nearing a deal with Apple to put ChatGPT on the iPhone. Siri, the iPhone's voice assistant, has fallen behind the competition, so a Her-inspired assistant built into the iPhone could help.

“The new voice (and video) mode is the best computer interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real,” Altman said in a blog post. “Getting to human-level response times and expressiveness turns out to be a big change.”

Meanwhile, start-ups including Anthropic and Mistral, as well as big tech companies Google and Meta, have closed in on OpenAI’s early lead, developing AI tools that can complete complex tasks.

The problem with large language models is that they can talk about anything, but they also tend to make things up. That is going to be an issue for anyone who wants to replace, say, a researcher with an AI system. Such a system can, however, become the snarky friend on your phone.

At one point during OpenAI's demonstrations, the assistant engaged in coquettish banter. A company researcher told the chatbot he was in a great mood because he was demonstrating “how useful and amazing you are”. The assistant responded: “Stop it, you're making me blush!” Samantha is no longer coming; she's here.
