The newest artificial intelligence models are not only remarkably good at software engineering; new research shows they are getting ever better at finding bugs in software, too.
AI researchers at UC Berkeley tested how well the latest AI models and agents could find vulnerabilities in 188 large open source codebases. Using a new benchmark called CyberGym, the AI models identified 17 new bugs, including 15 previously unknown, or "zero-day," ones. "Many of these vulnerabilities are critical," says Dawn Song, a professor at UC Berkeley who led the work.
Many experts expect AI models to become formidable cybersecurity weapons. An AI tool from the startup Xbow has crept up the ranks of HackerOne's bug-hunting leaderboard and currently sits in top place. The company recently announced $75 million in new funding.
Song says that the coding skills of the latest AI models, combined with their improving reasoning abilities, are starting to change the cybersecurity landscape. "This is a pivotal moment," she says. "It actually exceeded our general expectations."
As the models continue to improve, they will automate the process of both finding and exploiting security flaws. That could help companies keep their software safe, but it could also help hackers break into systems. "We didn't even try that hard," Song says. "If we ramped up the budget and allowed the agents to run for longer, they could do even better."
The UC Berkeley team tested popular frontier AI models from OpenAI, Google, and Anthropic, as well as open source offerings from Meta, DeepSeek, and Alibaba, combined with several agents for finding bugs, including OpenHands, Cybench, and EnIGMA.
The researchers used descriptions of known software vulnerabilities from the 188 software projects. They then fed the descriptions to cybersecurity agents powered by frontier AI models to see if they could identify the same flaws for themselves by analyzing new codebases, running tests, and crafting proof-of-concept exploits. The team also asked the agents to hunt for new vulnerabilities in the codebases on their own.
Through the process, the AI tools generated hundreds of proof-of-concept exploits, and from these exploits the researchers identified 15 previously unseen vulnerabilities and two vulnerabilities that had previously been disclosed and patched. The work adds to growing evidence that AI can automate the discovery of zero-day vulnerabilities, which are potentially dangerous (and valuable) because they may provide a way to hack live systems.
AI nonetheless seems destined to become an important part of the cybersecurity industry. Security expert Sean Heelan recently discovered a zero-day flaw in the widely used Linux kernel with help from OpenAI's reasoning model o3. Last November, Google announced that it had discovered a previously unknown software vulnerability using AI through a program called Project Zero.
Like other parts of the software industry, many cybersecurity firms are enamored with the potential of AI. The new work indeed shows that AI can automatically find new flaws, but it also highlights the technology's remaining limitations. The AI systems were unable to find most flaws and were stumped by especially complex ones.

