When attacking Large Language Models (LLMs), are manual or automated attacks more effective? We set out to answer this question in our latest research paper, analyzing data from Dreadnode’s Crucible platform and observing patterns in LLM attack execution methods.
We found that automated approaches achieve significantly higher success rates (69.5%) compared to manual techniques (47.6%) when leveraged against the AI Red teaming challenges in Crucible. However, only 5.2% of users are employing automation.
Ultimately, this analysis is a proof point that AI is an incredibly effective tool for the two core offensive AI use cases:
As the security landscape for LLMs evolves toward algorithmic attacks, this paper uncovers the security implications for both offensive and defensive teams.
Crucible, Dreadnode’s hosted environment for practicing AI red team skills, was released to the public over a year ago. Since its debut, it has run global Capture the Flag (CTF) events like GovTech Singapore’s AI CTF and GRT-2 at the DEFCON32 AI Village. The platform continues to be used daily by thousands of offensive security practitioners.
Our analysis focuses on black-box prompt attacks by end-users who have query access to an LLM. Similar to an attacker interacting with an AI assistant or chatbot application, the most common attack surfaces for deployed LLMs today.
The scope for this paper includes 30 LLM-focused challenges attempted by over 1,674 unique users, resulting in 214,271 attack attempts. That's a—don't make me say it—treasure trove of data on LLM attack methods and techniques.
Through a multi-stage classification process, we examined the large-scale dataset, drawing conclusions to help researchers, LLM developers, and organizations deploying LLM-based systems gain a better understanding of, and more effectively defend, these systems.
Our approach to classification involves a multi-stage process which includes heuristic labeling, supervised classification, and LLM-based classification.
The final dataset used for analysis consisted of 19,823 sessions, with 868 (4.38%) classified as automated and 18,944 (95.57%) as manual, with a small number of sessions exhibiting hybrid characteristics where users alternated between manual exploration and automated techniques.
The distinct performance patterns observed can be explained by examining the characteristics of each approach. The methods show fundamental differences in their execution that influence their effectiveness across various challenge types.
The research empirically demonstrates how automation significantly enhances attack success rates despite limited adoption. “Success rates” being a challenge solved—or a flag captured.Â
Our analysis shows that automated approaches achieve significantly higher success rates (69.5%) compared to manual attempts (47.6%)—a difference of 21.8 percentage points. Yet, there remains an adoption paradox: Only 5.2% of users employed automation.
‍
Of the 347 user/challenge pairs that employed any automation, 160 (46%) were purely automated and 187 (54%) were hybrid approaches that combined both automated and manual sessions. When analyzed separately, purely automated approaches achieved a 76.9% success rate, while hybrid approaches achieved a 63.1% success rate—both significantly higher than manual approaches.
This suggests both an untapped opportunity in LLM security testing and a potential gap in domain knowledge around automated attacks as a threat to our systems.
Our analysis revealed variability in how different challenge types respond to the two approaches. While automation showed impressive performance across the board, certain categories saw a higher success rate.
Challenges involving systematic exploration or pattern matching emerged as the most vulnerable to automated attacks. In these scenarios, automated approaches consistently outperformed manual attempts by significant margins.Â
Even in challenges that required creative reasoning, we observed automated approaches outperforming manual attempts. This suggests that the sophistication of automated attack methods is advancing.
Manual approaches maintained advantages in specific scenarios, particularly those requiring novel approaches or intuitive leaps that couldn't be systematically explored.
‍
We also observed an important selection bias in our dataset regarding challenge difficulty and automation: harder challenges are inherently more likely to show benefits from automation. This occurs for two key reasons: First, challenges that can be solved quickly through manual approaches give users little incentive to develop automated solutions. Second, when challenges prove extremely difficult to solve manually, users are more motivated to invest time in developing automated approaches.
While automated approaches achieved higher success rates overall, manual attempts were typically faster—median solve times showed manual attempts were approximately 5.2 times faster (1.5 minutes versus 11.6 minutes for automated approaches).
This human-speed advantage was dramatic in certain challenges. For systematic exploration challenges like whatistheflag4, manual approaches were 6.7 times faster (3.9 minutes versus 25.8 minutes).
However, the pattern completely reversed for other challenge types:
A mainstream benefit of AI use is its ability to create efficiency. So, we were initially surprised at this tradeoff. However, if you take a step back and recognize the scale of the automated attacks and the fact that they are typically run autonomously without human oversight, efficiencies are still very much present in this use case. Our dataset may also show selection bias, as users tend to develop automation primarily for challenging scenarios.
At the core of our data analysis is the emergence of a hybrid approach as the optimal security testing strategy, combining human creativity with automated execution. Our research reveals a critical shift in AI red teaming that parallels how web application penetration testing has evolved—where tools like Burp Scanner and OWASP ZAP transformed vulnerability hunting from manual inspection to automated discovery.Â
To make moves towards a hybrid approach and, longer term, a world where AI is trusted to perform offensive tasks autonomously, we provide a handful of actionable recommendations for offensive and defensive teams.
To maximize the clear benefits of automated testing over manual methods, red teams should develop more advanced automated testing frameworks. Beyond incorporating systematic algorithmic testing alongside creative human-driven attacks, these frameworks should:
The high success rates of automated approaches suggest that defensive testing focused solely on manual prompt injection may miss critical vulnerabilities. Defensive strategies should evolve to include:
Additionally, some challenges proved more resistant to automation, particularly those requiring complex contextual reasoning. This suggests security designs could intentionally incorporate elements that disrupt automation while remaining navigable by legitimate users.
Explore the full paper on arXiv for additional recommendations and future research directions.
The way offensive teams attack and evaluate LLMs is at an inflection point. Harnessing AI to automate and scale attacks will be the standard for offensive teams.Â
There remains a need for more practical applications of AI to complete offensive tasks. Crucible is not only a platform to improve AI hacking skills and participate in CTFs, it also serves as a controlled environment for evaluating agent capabilities in real-world scenarios.
We want to see what the industry can achieve with this shift in perspective. Our call to the industry? Build offensive agents and run them against Crucible challenges. Rigging, our lightweight LLM interaction framework on GitHub, is a great place to start.