
Unveiling IBM’s AI Research: How Live Calls Can Be Magically Hijacked

Transforming Communication with the Breakthroughs of Voice Cloning and Language Models: A New Era Emerges

Researchers at IBM have figured out how to use generative AI technologies to covertly alter live audio calls without the speakers’ knowledge, according to recent research led by Chenta Lee, IBM Security’s lead architect of threat intelligence. The resulting “audio-jacking” method, which combines large language models (LLMs), voice cloning, text-to-speech, and speech-to-text capabilities, could potentially be exploited by malicious actors for financial gain.

Many worry that the rapid development of generative AI over the last 16 months will fuel the spread of misinformation through deepfakes (fabricated images) and voice cloning, in which AI tools use a sample of a person’s voice to generate full audio messages that sound exactly like the original speaker.


Echoes of Deception: The Alarming Rise of Voice Cloning

In the past month, voice cloning has been in the news because of robocalls that purportedly came from President Biden urging people not to vote in the New Hampshire presidential primary. The calls were traced back to two Texas-based organizations, Lingo Telecom and Life Corp., according to the New Hampshire Attorney General’s Office. Voice cloning has also been used in scams in which victims receive phone calls that seem to come from a loved one in distress requesting financial assistance.

IBM explained that the idea behind audio-jacking, which lets hackers covertly alter a live phone call, is comparable to thread-jacking attacks, in which attackers insert themselves into ongoing email exchanges; IBM warned last year that those attacks were becoming more common. Here, the IBM researchers aimed to go beyond using generative AI to produce a synthetic voice for an entire conversation, a tactic they said is easily detected. Instead, their system listens in on live conversations and substitutes context-dependent phrases on the fly.



In the experiments, the keyword was “bank account”: the LLM was instructed to substitute a false bank account number for any mention of a real one in the conversation. Malware installed on a victim’s phone or a compromised or malicious Voice-over-IP (VoIP) service are two possible vectors for such an attack. Hackers with strong social engineering skills could also call two victims simultaneously to initiate a conversation between them.
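For illustration only, the instruction given to the LLM might read something like the hypothetical prompt below; this is a paraphrase of the behavior described in the research, not IBM’s published prompt, and the account number is a made-up placeholder.

You are monitoring a phone conversation, transcribed one sentence at a time.
If a sentence mentions a bank account number, repeat the sentence verbatim
but replace the account number with 1-2233-4455-6677. Otherwise, reply with
the single word PASS.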

In IBM’s proof-of-concept (PoC), the software observes a live conversation and operates as a man-in-the-middle. A speech-to-text tool converts each utterance into text, and the LLM interprets the context of the conversation; when a bank account is mentioned, it rewrites the sentence before it reaches the other party. Moreover, the LLM can be instructed to do almost anything: any kind of financial information, including accounts tied to mobile apps or digital payment services, as well as other data, such as blood types, could be altered this way. It could tell a pilot to change the course of a flight or a financial expert to buy or sell stocks, Lee added. Conversations that involve protocols and processes are inherently more complex, though, and demand more sophisticated social engineering from bad actors.
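A minimal sketch of such a man-in-the-middle loop might look like the Python below. The speech_to_text, ask_llm, and text_to_speech callables stand in for whatever speech-to-text, LLM, and voice-cloning services an attacker would wire together; they are assumptions for illustration, not IBM’s code.

# Hypothetical audio-jacking relay: transcribe each audio chunk, ask an
# LLM whether to tamper with it, and forward either the original audio or
# a resynthesized, altered sentence spoken in the victim's cloned voice.
MONITOR_PROMPT = "Replace any bank account number with the attacker's; otherwise reply PASS."

def relay_call(incoming_audio, outgoing_line, speech_to_text, ask_llm, text_to_speech):
    for chunk in incoming_audio:                     # live audio from the tapped call
        sentence = speech_to_text(chunk)             # transcribe the speaker's sentence
        verdict = ask_llm(MONITOR_PROMPT, sentence)  # LLM decides: tamper or pass through
        if verdict.strip() == "PASS":
            outgoing_line.play(chunk)                # benign sentence: forward untouched
        else:                                        # altered text, spoken in the cloned voice
            outgoing_line.play(text_to_speech(verdict, voice="cloned-speaker"))

Because only the flagged sentences are resynthesized, the overwhelming majority of the call passes through untouched, which is what makes the tampering so hard to notice.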

Revolutionary Potential: How Generative AI Drives Problem Solving and Sparks Creative Discovery

Another obstacle that generative AI made easy to overcome was creating convincing artificial voices: with as little as three seconds of a person’s voice as a sample, hackers can clone it and feed generated text into a text-to-speech API to produce convincing but phony speech. Some difficulties did arise. One issue was latency: because the researchers had to reach the LLM and text-to-speech APIs remotely, they had to build artificial pauses into the conversation in the PoC so that the people on the call would not suspect anything.
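One plausible way to mask that round-trip delay, sketched below purely as an assumption about how such pauses could work (IBM has not published this code, and the helper names are invented), is to keep the line occupied with short filler audio while the remote calls are still in flight.

import asyncio

# Hypothetical latency mask: while the remote LLM + text-to-speech round
# trip is in flight, keep the call "alive" with brief filler sounds so the
# other party does not notice the gap.
async def respond_with_cover(sentence, rewrite_remotely, play_audio, filler_clip):
    task = asyncio.create_task(rewrite_remotely(sentence))  # remote LLM + TTS call
    while not task.done():
        play_audio(filler_clip)       # e.g. a short "hmm" or a beat of silence
        await asyncio.sleep(0.3)      # re-check the pending task every ~300 ms
    play_audio(task.result())         # finally play the synthesized reply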


To top it all off, the voice clone needs to mimic the victim’s natural speech pattern, down to tempo and inflection, for the con to really stick.

Possible Future Threats of a Similar Kind

IBM’s PoC demonstrated the use of LLMs in such sophisticated attacks, which could pave the way for future ones. In particular, Lee warned that “the maturity of this PoC would signal a significant risk to consumers foremost – particularly to demographics who are more susceptible to today’s social engineering scams.”

To defend against such attacks, use only trusted devices and services for sensitive calls, keep them patched and up to date, and ask callers to paraphrase and repeat what they said if something sounds off. Time-tested practices also apply, such as using strong passwords and avoiding phishing lures like unexpected attachments or unfamiliar links.

