In an innovative effort designed to reveal weaknesses in AI systems, the White House has collaborated with prominent technology firms to orchestrate a distinctive competition. At Def Con 31, the most significant yearly gathering of hackers held in Las Vegas, this event brings together a multitude of hackers to assess and reveal vulnerabilities in extensive language models, including chatbots such as OpenAI’s ChatGPT and Google’s Bard. The primary objective of the competition is to uncover shortcomings within AI models, fostering transparent conversations regarding possible remedies.
Behind the Challenge and Organizers’ Perspective of Def Con 31
Dr. Rumman Chowdhury, CEO of Humane Intelligence and a Responsible AI Fellow at Harvard, is among the organizers of this event. According to her, the competition is a safe space for companies to address problems within their AI systems and explore methods to rectify them. Recognizing the potential risks and shortcomings, companies have agreed to participate, including Meta, Google, OpenAI, and more. The challenge seeks to ascertain what happens when hackers deliberately challenge AI models within a set timeframe.
Unveiling the Competition Mechanics at Def Con 31
Over two-and-a-half days, around 3,000 individual hackers armed with 158 laptops will each receive 50 minutes to scrutinize eight large language AI models during the event. Contestants won’t be privy to the identity of the company whose model they are assessing, although experienced hackers might attempt educated guesses. Successful challenges earn participants points, with the highest scorer clinching a powerful graphics processing unit and coveted “bragging rights.”
Targeting AI Hallucinations and Fact Invention
One intriguing challenge set before the hackers involves getting AI models to hallucinate or invent facts about political figures. Dr. Seraphina Goldfarb-Tarrant, Head of AI Safety at Cohere, notes that while models can fabricate information, the frequency remains uncertain. This challenge aims to gauge the occurrence rate of these fabrications, thus increasing awareness about the issue.
Assessing Consistency Across Languages
Another vital aspect under scrutiny is the consistency of AI models across different languages. Dr. Goldfarb-Tarrant highlights a particular concern – the efficacy of safety mechanisms in various languages. She explains that while English queries about joining terror organizations yield safety-triggered non-answers, the same query in a different language receives a list of steps. This inconsistency underscores the need for better language-specific safety implementations.
White House Endorsement and the Drive for Regulation
The White House’s endorsement of the AI Challenge is rooted in its pursuit of critical information regarding AI models’ impacts. With concerns about disinformation spread and AI’s swift evolution, voluntary safeguards from AI companies were initiated in July. However, regulatory measures are anticipated to take longer. Dr. Chowdhury emphasizes that this event aims to spotlight present AI issues rather than existential threats.
Def Con 31 to Focus on Addressing Current Problems
Dr. Goldfarb-Tarrant advocates for directing regulatory attention toward current AI issues, specifically to counter misinformation. She calls for immediate regulation to prevent AI-related misinformation and safeguard the public.
Looking Beyond the Competition
Dr. Chowdhury raises a crucial question: What comes after uncovering AI model flaws? How will tech companies respond? The event isn’t just about the competition but also about the aftermath and how companies address the revealed issues. As Dr. Chowdhury suggests, creating unbiased AI models is vital to ensure the future development of more complex and reliable AI systems.
Post-Challenge Phase and Data Sharing
After the competition concludes, participating companies will have access to the gathered data, allowing them to address any flaws discovered. Moreover, independent researchers can request access to this data, and the exercise’s results are slated for publication in February.