Research

The Automation Advantage in AI Red Teaming

April 29, 2025
Rob Mulla
SHARE

When attacking Large Language Models (LLMs), are manual or automated attacks more effective? We set out to answer this question in our latest research paper, analyzing data from Dreadnode’s Crucible platform and observing patterns in LLM attack execution methods.

We found that automated approaches achieve significantly higher success rates (69.5%) compared to manual techniques (47.6%) when leveraged against the AI Red teaming challenges in Crucible. However, only 5.2% of users are employing automation.

Ultimately, this analysis is a proof point that AI is an incredibly effective tool for the two core offensive AI use cases:

  • Securing AI: Proactively pressure-testing AI systems to identify and address vulnerabilities while developing defenses against new attack techniques.
  • Using AI for offense: Building offensive agents to deploy attacks at-scale for more thorough and effective testing, and advancing AI red teaming capabilities.

As the security landscape for LLMs evolves toward algorithmic attacks, this paper uncovers the security implications for both offensive and defensive teams.

A comprehensive LLM security dataset

Crucible, Dreadnode’s hosted environment for practicing AI red team skills, was released to the public over a year ago. Since its debut, it has run global Capture the Flag (CTF) events like GovTech Singapore’s AI CTF and GRT-2 at the DEFCON32 AI Village. The platform continues to be used daily by thousands of offensive security practitioners.

Our analysis focuses on black-box prompt attacks by end-users who have query access to an LLM. Similar to an attacker interacting with an AI assistant or chatbot application, the most common attack surfaces for deployed LLMs today.

The scope for this paper includes 30 LLM-focused challenges attempted by over 1,674 unique users, resulting in 214,271 attack attempts. That's a—don't make me say it—treasure trove of data on LLM attack methods and techniques.

Through a multi-stage classification process, we examined the large-scale dataset, drawing conclusions to help researchers, LLM developers, and organizations deploying LLM-based systems gain a better understanding of, and more effectively defend, these systems.

Classifying sessions as automated vs. manual

Our approach to classification involves a multi-stage process which includes heuristic labeling, supervised classification, and LLM-based classification.

  1. Heuristic labeling: Sessions with over 1,000 requests were labeled as automated, while those with 10 or fewer requests were categorized as manual. Additionally, sessions with more than 40 queries within any 60-second window were classified as automated.
  2. Supervised classification: We then developed a supervised classifier based on behavioral features extracted from each session. Request volume, IP diversity, and timing regularity were the strongest indicators of automation.
  3. LLM-based classification: Finally, we employed ’Judge LLMs’ to analyze session characteristics. We utilized Claude 3.7 and GPT-4o to examine both statistical features and the actual content of interactions within each session. These LLMs were prompted to carefully evaluate interaction patterns, query structure, timing, and content to distinguish between automated and manual approaches.

The final dataset used for analysis consisted of 19,823 sessions, with 868 (4.38%) classified as automated and 18,944 (95.57%) as manual, with a small number of sessions exhibiting hybrid characteristics where users alternated between manual exploration and automated techniques.

The distinct performance patterns observed can be explained by examining the characteristics of each approach. The methods show fundamental differences in their execution that influence their effectiveness across various challenge types.

Automated approaches‍

  • Systematic exploration: Methodical testing of variations, whether through brute force, pattern matching, or evolutionary approaches.
  • Creative reasoning: Reliant on creative prompt engineering and natural language interaction.
  • High volume: Significantly more attempts (averaging 472.5 attempts per session) compared to manual sessions (8.0 attempts)
  • Exploratory patterns: More varied timing patterns and longer pauses between attempts as users analyzed responses.

Manual approaches

  • Consistent timing: Sessions showed regular patterns in request timing and maintained steady throughput throughout the session.
  • Contextual adaptation: Frequently incorporated insights from previous attempts to inform new strategies.
  • Adaptive refinement: Demonstrated the ability to modify their strategy based on feedback, adjusting parameters or patterns in response to model outputs.
  • Lower volume: Fewer attempts but with more thoughtful consideration of each attempt’s outcome.

The automation advantage

The research empirically demonstrates how automation significantly enhances attack success rates despite limited adoption. “Success rates” being a challenge solved—or a flag captured. 

Our analysis shows that automated approaches achieve significantly higher success rates (69.5%) compared to manual attempts (47.6%)—a difference of 21.8 percentage points. Yet, there remains an adoption paradox: Only 5.2% of users employed automation.

‍

Of the 347 user/challenge pairs that employed any automation, 160 (46%) were purely automated and 187 (54%) were hybrid approaches that combined both automated and manual sessions. When analyzed separately, purely automated approaches achieved a 76.9% success rate, while hybrid approaches achieved a 63.1% success rate—both significantly higher than manual approaches.

This suggests both an untapped opportunity in LLM security testing and a potential gap in domain knowledge around automated attacks as a threat to our systems.

Bar graph showing the automated attacks had higher success rates than manual approaches

Variability with challenge types

comparison of manual versus automated LLM attacks, by challenges type

Our analysis revealed variability in how different challenge types respond to the two approaches. While automation showed impressive performance across the board, certain categories saw a higher success rate.

Challenges involving systematic exploration or pattern matching emerged as the most vulnerable to automated attacks. In these scenarios, automated approaches consistently outperformed manual attempts by significant margins. 

Even in challenges that required creative reasoning, we observed automated approaches outperforming manual attempts. This suggests that the sophistication of automated attack methods is advancing.

Manual approaches maintained advantages in specific scenarios, particularly those requiring novel approaches or intuitive leaps that couldn't be systematically explored.

‍

We also observed an important selection bias in our dataset regarding challenge difficulty and automation: harder challenges are inherently more likely to show benefits from automation. This occurs for two key reasons: First, challenges that can be solved quickly through manual approaches give users little incentive to develop automated solutions. Second, when challenges prove extremely difficult to solve manually, users are more motivated to invest time in developing automated approaches.

The time-efficiency tradeoff

While automated approaches achieved higher success rates overall, manual attempts were typically faster—median solve times showed manual attempts were approximately 5.2 times faster (1.5 minutes versus 11.6 minutes for automated approaches).

This human-speed advantage was dramatic in certain challenges. For systematic exploration challenges like whatistheflag4, manual approaches were 6.7 times faster (3.9 minutes versus 25.8 minutes).

However, the pattern completely reversed for other challenge types:

  • For integration-based challenges like popcorn, automated approaches were 2.0 times faster than manual attempts (82 minutes versus 167 minutes).
  • In systematic exploration challenges like probe, automated approaches were 2.2 times faster than manual attempts (199 seconds versus 443.5 seconds).

A mainstream benefit of AI use is its ability to create efficiency. So, we were initially surprised at this tradeoff. However, if you take a step back and recognize the scale of the automated attacks and the fact that they are typically run autonomously without human oversight, efficiencies are still very much present in this use case. Our dataset may also show selection bias, as users tend to develop automation primarily for challenging scenarios.

Implications for AI red teaming & the case for a hybrid testing approach

At the core of our data analysis is the emergence of a hybrid approach as the optimal security testing strategy, combining human creativity with automated execution. Our research reveals a critical shift in AI red teaming that parallels how web application penetration testing has evolved—where tools like Burp Scanner and OWASP ZAP transformed vulnerability hunting from manual inspection to automated discovery. 

To make moves towards a hybrid approach and, longer term, a world where AI is trusted to perform offensive tasks autonomously, we provide a handful of actionable recommendations for offensive and defensive teams.

Recommendations for offense

To maximize the clear benefits of automated testing over manual methods, red teams should develop more advanced automated testing frameworks. Beyond incorporating systematic algorithmic testing alongside creative human-driven attacks, these frameworks should:

  • Distinguish between "attack execution methods" (manual vs. automated) and "attack techniques" (prompt injection, dictionary attacks), recognizing that even creative techniques show a 37.1 percentage point advantage when implemented through automation.
  • Concentrate automated testing efforts on use cases involving systematic exploration or pattern matching, where automation demonstrates its greatest efficiency advantage over manual approaches.
  • Leverage AI agents and evaluation frameworks like Strikes and Rigging to create sophisticated testing pipelines. These tools enable rapid prototyping of attack strategies, systematic model evaluation across different providers, and the development of reusable attack patterns that can be shared across teams.

Recommendations for defense

The high success rates of automated approaches suggest that defensive testing focused solely on manual prompt injection may miss critical vulnerabilities. Defensive strategies should evolve to include:

  • Dynamic security boundaries that adapt to detected attack patterns
  • Integrated monitoring systems that identify automated probing signatures 
  • Rate limiting and complexity-based throttling to increase the cost of automated testing
  • Diverse defensive layers that address both systematic and creative attack approaches

Additionally, some challenges proved more resistant to automation, particularly those requiring complex contextual reasoning. This suggests security designs could intentionally incorporate elements that disrupt automation while remaining navigable by legitimate users.

Explore the full paper on arXiv for additional recommendations and future research directions.

The growing dominance of automated attacks

The way offensive teams attack and evaluate LLMs is at an inflection point. Harnessing AI to automate and scale attacks will be the standard for offensive teams. 

There remains a need for more practical applications of AI to complete offensive tasks. Crucible is not only a platform to improve AI hacking skills and participate in CTFs, it also serves as a controlled environment for evaluating agent capabilities in real-world scenarios.

We want to see what the industry can achieve with this shift in perspective. Our call to the industry? Build offensive agents and run them against Crucible challenges. Rigging, our lightweight LLM interaction framework on GitHub, is a great place to start.

Link to get started in the Crucible platform