Hackers red-teaming A.I. are ‘breaking stuff left and right,’ but don’t expect quick fixes from DefCon: ‘There are no good guardrails’

White House officials worried about AI chatbots' potential for societal harm, and the Silicon Valley giants rushing them to market, are heavily invested in a three-day contest that ends Sunday at the DefCon hacker conference in Las Vegas.

Some 2,200 competitors tapped away on laptops seeking to expose flaws in eight leading large language models representative of technology's next big thing. But don't expect quick results from this first-ever independent "red-teaming" of multiple models.

Findings won't be made public until about February. Even then, fixing flaws in these digital constructs, whose inner workings are neither fully trustworthy nor fully understood even by their creators, will take time and millions of dollars.

Academic and corporate research shows that current AI models are simply too cumbersome, brittle, and malleable. Security was an afterthought in their training, as data scientists amassed extremely complex collections of images and text. They are susceptible to racial and cultural bias and easily manipulated.

"It's tempting to pretend we can sprinkle some magic security dust on these systems after they are built, patch them into submission, or bolt special security apparatus on the side," said Gary McGraw, a cybersecurity veteran and co-founder of the Berryville Institute of Machine Learning. Bruce Schneier, a public-interest technologist at Harvard University, said DefCon competitors are "more likely to walk away finding new, hard problems." "This is what computer security was like 30 years ago. We're just breaking stuff left and right."

Understanding their capabilities and safety issues "is an open area of scientific inquiry," Michael Sellitto of Anthropic, which provided one of the AI test models, acknowledged at a press briefing.

Traditional software uses well-defined code to issue explicit, step-by-step instructions. Language models such as OpenAI's ChatGPT and Google's Bard are different. Trained largely by ingesting and classifying billions of data points from internet crawls, they are perpetual works in progress, an unsettling prospect given their transformative potential for humanity.

Since publicly releasing chatbots last fall, the generative AI industry has had to repeatedly plug security holes exposed by researchers and tinkerers.

Tom Bonner of the AI security firm HiddenLayer, a speaker at this year's DefCon, tricked a Google system into labeling a piece of malware harmless merely by inserting a line saying it was "safe to use."

“There are no good guardrails,” he said.
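For illustration only, here is a minimal sketch of the kind of evasion Bonner describes. The feature extractor and "model" below are made-up stand-ins, not Google's actual system: a single benign-sounding string dilutes the suspicious signal enough to flip the verdict.

```python
# Hypothetical toy example of evading an ML-style malware classifier by
# appending a benign-looking string. Nothing here reflects any real product.

def extract_features(file_bytes: bytes) -> dict:
    """Toy feature extractor: count suspicious and benign-looking tokens."""
    text = file_bytes.decode("latin-1", errors="ignore").lower()
    return {
        "suspicious": sum(text.count(t) for t in ("createremotethread", "virtualallocex")),
        "benign": sum(text.count(t) for t in ("safe to use", "license", "copyright")),
    }

def toy_classifier(features: dict) -> str:
    """Stand-in for a learned model: benign-looking strings pull the score down."""
    score = features["suspicious"] - 2 * features["benign"]
    return "malware" if score > 0 else "harmless"

sample = b"payload...CreateRemoteThread...VirtualAllocEx..."  # placeholder bytes
print(toy_classifier(extract_features(sample)))                     # -> malware
print(toy_classifier(extract_features(sample + b" safe to use ")))  # -> harmless
```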

Another researcher had ChatGPT create phishing emails and a recipe for violently wiping out humanity, in violation of its ethics code.

A team including Carnegie Mellon University researchers found leading chatbots vulnerable to automated attacks that produce harmful content. "It is possible that the very nature of deep learning models makes such threats inevitable," they wrote.

That’s not to say the alarm hasn’t been sounded.

In its 2021 final report, the U.S. National Security Commission on Artificial Intelligence said attacks on commercial AI systems were already happening and that, with rare exceptions, the idea of securing AI systems had been an afterthought in engineering and deploying them, with inadequate investment in research and development.

Serious hacks, regularly reported just a few years ago, are now barely disclosed. The stakes are too great, and in the absence of regulation, "people can hide things now, and they're doing it," Bonner said.

Attacks trick the artificial intelligence's logic in ways that may not even be clear to its creators. Chatbots are especially vulnerable because we interact with them directly in plain language. That interaction can alter them in unexpected ways.

Researchers have discovered that “poisoning” a small fraction of images or text in the vast amounts of data used to train artificial intelligence systems can wreak havoc and be easily overlooked.

A study co-authored by Florian Tramér of the Swiss Federal Institute of Technology in Zurich determined that corrupting just 0.01 percent of a model's training data was enough to spoil it, at a cost of about $60. The researchers waited for a handful of websites used in the web crawls behind two models to expire, then bought those domains and posted bad data on them.
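As a rough illustration of why that attack is cheap, the sketch below (not the researchers' code; the URLs and the DNS check are assumptions) scans a hypothetical crawl manifest for domains that no longer resolve and could therefore be re-registered and filled with poisoned content before the next crawl.

```python
# Rough sketch: flag crawl URLs whose domains no longer resolve, i.e. domains
# an attacker could potentially buy and repoint at poisoned pages.
import socket
from urllib.parse import urlparse

training_urls = [
    "http://long-abandoned-blog-1234.com/post/cats.html",  # made-up entries
    "https://en.wikipedia.org/wiki/Beethoven",
]

def domain_is_unclaimed(url: str) -> bool:
    """Crude check: a domain that fails DNS resolution may be available to buy."""
    host = urlparse(url).hostname
    try:
        socket.gethostbyname(host)
        return False
    except socket.gaierror:
        return True

poisonable = [u for u in training_urls if domain_is_unclaimed(u)]
print(f"{len(poisonable)} of {len(training_urls)} crawl URLs point at re-registrable domains")
```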

Hyrum Anderson and Ram Shankar Siva Kumar, who red-teamed AI while colleagues at Microsoft, call the state of AI security for text- and image-based models "poor" in their book "Not with a Bug, But with a Sticker." One example they cite in live presentations: the AI-powered digital assistant Alexa was tricked into interpreting a Beethoven concerto clip as a command to order 100 frozen pizzas.

The authors surveyed more than 80 organizations and found that the vast majority do not have a response plan for data poisoning attacks or data set theft. Much of the industry “didn’t even know it was happening,” they wrote.

Andrew W. Moore, a former Google executive and Carnegie Mellon dean, said he dealt with attacks on Google's search software more than a decade ago. And between late 2017 and early 2018, spammers gamed Gmail's AI-powered detection service four times.

The major AI players say safety and security are top priorities, and last month they made voluntary commitments to the White House to submit their models, largely "black boxes" whose contents are closely guarded, to outside scrutiny.

But there is concern the companies won't do enough.

Tramér expects search engines and social media platforms to be gamed for financial gain and disinformation by exploiting weaknesses in their AI systems. A savvy job applicant, for example, might figure out how to convince a system that they are the only correct candidate.

Ross Anderson, a computer scientist at the University of Cambridge, worries that AI bots will erode privacy as people engage them to interact with hospitals, banks and employers, and that malicious actors will use them to siphon financial, employment or health data out of supposedly closed systems.

Research shows AI language models can also pollute themselves by retraining on junk data.

Another concern is corporate secrets being picked up and spit out by AI systems. Companies such as Verizon and JPMorgan Chase banned most employees from using ChatGPT at work after a South Korean business news outlet reported on such incidents at Samsung.

While the major AI players have security staff, many of their smaller competitors likely don't, meaning poorly secured plug-ins and digital agents could multiply. Startups are expected to launch hundreds of offerings built on licensed pretrained models in the coming months.

Don't be surprised, researchers say, if one of them makes off with your address book.
