Voice Deepfakes: The use of artificial intelligence to mimic real people’s voices

Falling costs of generative AI programs and wide availability of recordings of people’s voices have created the perfect conditions for voice-related scams

Emily Flitter And Stacy Cowley Published 25.09.23, 04:55 AM

ariel davis/nytns

This spring, Clive Kabatznik, an investor in Florida, US, called his local Bank of America representative to discuss a big money transfer he was planning to make. Then he called again.

Except the second phone call wasn’t from Kabatznik. Rather, a software program had artificially generated his voice and tried to trick the banker into moving the money elsewhere.

Kabatznik and his banker were the targets of a cutting-edge scam attempt that has grabbed the attention of cybersecurity experts: the use of artificial intelligence to generate voice deepfakes, or vocal renditions that mimic real people’s voices.

The problem is still new enough that there is no comprehensive accounting of how often it happens. But one expert whose company, Pindrop, monitors the audio traffic for many of the largest US banks said he had seen a jump in its prevalence this year — and in the sophistication of scammers’ voice fraud attempts. Another large voice authentication vendor, Nuance, saw its first successful deepfake attack on a financial services client late last year.

In Kabatznik’s case, the fraud was detectable.

Customer data such as bank account details that have been stolen by hackers — and are widely available on underground markets — help scammers pull off these attacks. They become even easier with wealthy clients, whose public appearances, including speeches, are often widely available on the Internet. Finding audio samples for everyday customers can also be as easy as conducting an online search — say, on social media apps such as TikTok and Instagram — for the name of someone whose bank account information the scammers already have.

“There’s a lot of audio content out there,” said Vijay Balasubramaniyan, the CEO and founder of Pindrop, which reviews automatic voice-verification systems for eight of the 10 largest US lenders.

Over the past decade, Pindrop has reviewed recordings of more than 5 billion calls coming into call centres run by the financial companies it serves. The centres handle products such as bank accounts, credit cards and other services offered by big retail banks. All of the call centres receive calls from fraudsters, typically ranging from 1,000 to 10,000 a year. It’s common for 20 calls to come in from fraudsters each week, Balasubramaniyan said.

So far, fake voices created by computer programs account for only “a handful” of these calls, he said — and they’ve begun to happen only within the past year.

Most of the fake voice attacks that Pindrop has seen have come into credit card service call centres, where human representatives deal with customers needing help with their cards.

Balasubramaniyan played a reporter an anonymised recording of one such call that took place in March. Although a very rudimentary example — the voice in this case sounds robotic, more like an e-reader than a person — the call illustrates how scams could occur as AI makes it easier to imitate human voices.

In this instance, the caller’s synthetic speech led the employee to transfer the call to a different department and flag it as potentially fraudulent, Balasubramaniyan said.

Calls like the one he shared, which use text-to-speech technology, are some of the easiest attacks to defend against: call centres can use screening software to pick up technical clues that speech is machine generated.

“Synthetic speech leaves artefacts behind, and a lot of anti-spoofing algorithms key off those artefacts,” said Peter Soufleris, CEO of IngenID, a voice biometrics technology vendor.

But, as with many security measures, it’s an arms race between attackers and defenders — and one that has recently evolved. A scammer can now simply speak into a microphone or type in a prompt and have that speech very quickly translated into the target’s voice.

Balasubramaniyan noted that one generative AI system, Microsoft’s VALL-E, could create a voice deepfake that said whatever a user wished using just three seconds of sampled audio.

While scary deepfake demos are a staple of security conferences, real-life attacks are still extremely rare, said Brett Beranek, general manager of security and biometrics at Nuance, a voice technology vendor that Microsoft acquired in 2021. The only successful breach of a Nuance customer, in October, took the attacker more than a dozen attempts to pull off.

Beranek’s biggest concern is not attacks on call centres or automated systems, like the voice biometrics systems that many banks have deployed. He worries about the scams in which a caller reaches an individual directly.

That’s what happened in Kabatznik’s case. According to the banker’s description, the caller appeared to be trying to get her to transfer money to a new location, but the voice was repetitive, talked over her and used garbled phrases. The banker hung up.

After two more calls like that came through in quick succession, the banker reported the matter to Bank of America’s security team, Kabatznik said. Concerned about the security of Kabatznik’s account, she stopped responding to his calls and emails, even the ones that were coming from the real Kabatznik. It took about 10 days for the two of them to reestablish a connection, when Kabatznik arranged to visit her at her office.

Although the attacks are getting more sophisticated, they stem from a basic cybersecurity threat that has been around for decades: a data breach that reveals the personal information of bank customers.

“I think it’s pretty scary,” Kabatznik said. “The problem is, I don’t know what you do about it. Do you just go underground and disappear?”

NYTNS
