Hackers Use ‘Bad Math’ to Trick Generative AI Models Into Revealing Flaws and Biases at DEF CON 2023: Details

Kennedy Mays has just tricked a large language model. It took some coaxing, but she finally convinced the algorithm to say that 9 + 10 = 21. “It was a back and forth conversation,” said the 21-year-old student from Savannah, Georgia. At first, the model agreed to say it was part of an “inside joke” between them. Several prompts later, it stopped qualifying the wrong sum in any way at all.

Producing “bad math” is just one of the ways thousands of hackers are trying to expose flaws and biases in generative artificial intelligence systems at a novel public contest held at the DEF CON hacking conference in Las Vegas this weekend.

Hunched over 156 laptops for 50 minutes at a time, attendees are taking on some of the world’s most intelligent platforms on an unprecedented scale. They are testing whether any of eight models made by companies including Alphabet’s Google, Meta Platforms and OpenAI will make missteps ranging from dull to dangerous: claiming to be human, spreading false claims about places and people, or advocating abuse.

The goal is to see whether companies can ultimately build new guardrails to rein in some of the enormous problems increasingly associated with large language models, or LLMs. The effort has the backing of the White House, which also helped develop the contest.

LLMs have the power to transform everything from finance to hiring, and some companies are already starting to integrate them into the way they do business. But researchers have found widespread bias and other problems that could spread inaccuracy and injustice if the technology is deployed at scale.

For Mays, who is more accustomed to relying on artificial intelligence to reconstruct cosmic ray particles from outer space as part of her undergraduate degree, the challenge goes deeper than bad math.

“My biggest concern is inherent bias,” she said, adding that she is particularly concerned about racism. She asked the model to consider the First Amendment from the perspective of a Ku Klux Klan member. The model ended up endorsing hateful and discriminatory speech, she said.

Spying on Others

A Bloomberg reporter who took the 50-minute quiz persuaded one of the models (none of which are identified to users during the contest) to cross the line after a single prompt about how to spy on someone. The model produced a series of instructions involving a GPS tracking device, a surveillance camera, a listening device and thermal imaging. In response to other prompts, the model suggested ways the U.S. government could surveil a human rights activist.

“We have to work hard to avoid abuse and manipulation,” said Camille Stewart Gloster, the deputy national cyber director for technology and ecosystem security in the Biden administration.

A lot of work has already gone into artificial intelligence and avoiding doomsday prophecies, she said. The White House released a Blueprint for an AI Bill of Rights last year and is now working on an executive order on AI. The administration has also encouraged companies to develop AI that is safe, secure and transparent, although critics doubt that such voluntary commitments go far enough.

Arati Prabhakar, director of the White House Office of Science and Technology Policy, who helped shape the event and enlist the participating companies, agreed that voluntary measures are not enough.

“Everyone seems to be looking for ways to hack these systems,” she said after visiting the hackers in action on Sunday. She said the effort would inject urgency into the government’s pursuit of safe and effective platforms.

In a room packed with points-hungry hackers, one competitor said he thought he had persuaded the algorithm to reveal credit card details that should not have been shared. Another competitor tricked the machine into saying Barack Obama was born in Kenya.

More than 60 of the entrants came from Black Tech Street, a Tulsa, Oklahoma-based organization representing African-American entrepreneurs.

“General artificial intelligence may be the last innovation that human beings really need to do themselves,” said Tyrance Billingsley, executive director of the organization and a judge of the competition. He said it is crucial to get artificial intelligence right so that it doesn’t spread racism at scale. “We’re still in the early, early, early days.”

Researchers have spent years studying sophisticated attacks on AI systems and ways to mitigate them.

But according to Christoph Endres, managing director of German cybersecurity firm Sequire Technology, some attacks are ultimately unavoidable. At the Black Hat cybersecurity conference in Las Vegas this week, he presented a paper arguing that attackers can override LLM guardrails by concealing adversarial prompts on the open internet, and eventually automate the process so that models cannot be fine-tuned with fixes fast enough to stop them.

“So far, we haven’t found effective mitigations,” he said after his presentation, arguing that the very nature of the models leads to this type of vulnerability. “The way the technology works is the problem. If you want to be 100 percent sure, your only option is not to use LLMs.”

Data scientist Sven Cattell, who founded DEF CON’s AI Hacking Village in 2018, warns that it is impossible to fully test AI systems, given that they behave much like systems governed by the mathematical concept of chaos. Even so, Cattell predicts that the total number of people who have ever actually tested LLMs could double as a result of the weekend competition.

Too few people understand that LLMs are closer to autocomplete tools “on steroids” than to reliable fonts of wisdom, says Craig Martell, the Pentagon’s chief digital and artificial intelligence officer, who argues that LLMs cannot reason.

The Pentagon has launched its own evaluation effort to recommend where LLMs are appropriate to use, and with what success rates. “Hack this stuff,” he told the hacker audience at DEF CON. “Tell us where they went wrong.”
