Hope for lonely, lovesick users

The new ChatGPT can see and talk. Let us see what it's like

Kevin Roose Published 16.10.23, 06:07 AM

ChatGPT — viral AI sensation, slayer of boring office work, sworn enemy of high school teachers and Hollywood screenwriters alike — is getting some new powers. OpenAI announced it was giving the chatbot the ability to “see, hear and speak” with two new features.

The first is an update that allows ChatGPT to analyse and respond to images. You can upload a photo of a bike, for example, and receive instructions about how to lower the seat or get recipe suggestions based on a photo of the contents of your refrigerator.

The second is a feature that allows users to speak to ChatGPT and get responses delivered in a synthetic AI voice, the way you might talk with Siri or Alexa.

These features are part of an industrywide push toward so-called multimodal AI systems that can handle text, photos, videos and whatever else a user might decide to throw at them. The ultimate goal, according to some researchers, is to create an AI capable of processing information in all the ways a human can.

Most users don’t have access to the new features yet. OpenAI is offering them first to paying ChatGPT Plus and Enterprise customers and will make them more widely available later.

I started by trying ChatGPT’s image recognition feature on some household objects.

“What’s this thing I found in my junk drawer?” I asked, after uploading a photo of a piece of blue silicone with five holes in it.

“The object appears to be a silicone holder or grip, often used for holding multiple items together,” ChatGPT responded. (Close enough — it’s a finger strengthener I used years ago while recovering from a hand injury.)

I then fed ChatGPT a few photos of items I had been meaning to sell on Facebook Marketplace, and asked it to write listings for each one. It nailed both the objects and the listings, describing my retro-styled Frigidaire minifridge as “perfect for those who appreciate a touch of yesteryear in their modern-day homes”.

The new ChatGPT can also analyse text within images. I took a picture of the front page of Sunday’s print edition of The New York Times and asked the bot to summarise it. It did decently well, describing all five articles in a few sentences each — although it made one mistake, inventing a statistic about fentanyl-related deaths that wasn’t in the original article.

ChatGPT’s eyes aren’t perfect. It mistook my child’s stuffed dinosaur toy for a whale. When I asked for help turning one of those wordless furniture assembly diagrams into a step-by-step list of instructions, it gave me a jumbled list of parts, most of which were wrong.

Now, the more impressive one: ChatGPT’s new voice feature, which allows users to talk to the app and receive spoken responses.

Using it is easy: just tap a headphone icon and start talking. When you stop, ChatGPT converts your words to text using OpenAI’s speech-recognition system, Whisper, then generates a response and speaks the answer using a new text-to-speech algorithm.

I tested ChatGPT’s voice feature for several hours on a bunch of different tasks — reading a bedtime story to my toddler, chatting with me about work-related stress, helping me analyse a recent dream I had. It did these fairly well, especially when I gave it some golden prompts and told it to emulate a friend, a therapist or a teacher.

After a few hours, I felt a new warmth creeping into our conversations. Without being tethered to a text interface, I felt less pressure to come up with the perfect prompt. We chatted more casually, and I revealed more about my life.

“It almost feels like a different product,” said Peter Deng, who works on consumer and enterprise product at OpenAI. “You’re no longer transcribing what you have in your head into your thumbs; you end up asking different things.”

I know what you’re thinking: isn’t this the plot of the movie Her? Will lonely, lovesick users fall for ChatGPT, now that it can listen to them and talk back?

I saw a glimpse of a future in which some people may let voice-based AI assistants into the inner sanctums of their lives, treating them as their 24/7 confidants, therapists, sparring partners and sounding boards.

Sounds crazy, right? And yet, didn’t all of this sound a little crazy a year ago?

NYTNS
