Research, write-ups, talks, and code from the crew.
Worlds: A Simulation Engine for Agentic Pentesting
Feb 11, 2026
An 8B model went from blindly loading Metasploit modules to achieving Domain Admin on GOAD, trained entirely on synthetic data from our world model system.
Shane Caldwell and Max Harley
186 Jailbreaks: Applying MLOps to AI Red Teaming
Dec 11, 2025
Our thoughts on AI red teaming, the tools, and why using MLOps processes is the way forward. Plus, a demonstration of how we leveraged algorithmic red teaming to assess Llama Maverick.
Raja Sekhar Rao Dheekonda
LLM-Powered AMSI Provider vs. Red Team Agent
Dec 03, 2025
We built an LLM-powered AMSI provider and paired it against a red team agent, generating a unique dataset and a blueprint for detecting malicious code at execution time.
Max Harley
From Compute to Congress: The Cyber Layer Beneath the Genesis Mission
Dec 01, 2025
As the Genesis Mission accelerates AI development across critical scientific domains, robust cybersecurity and adversarial testing must be foundational, not bolted on later.
Daria Bahrami
Dreadnode Response to the 2025 Regulatory Reform for Artificial Intelligence
Oct 30, 2025
Dreadnode’s response to the RFI focuses on optimizing AI-enabled cybersecurity through strategic, machine-readable automations.
Daria Bahrami
LOLMIL: Living Off the Land Models and Inference Libraries
Oct 14, 2025
Can we eliminate the C2 server entirely and create truly autonomous malware? It’s not only possible, but fairly straightforward to implement, as demonstrated in our latest experiment.
Max Harley
From Benchmarks to Breaches: Scaling Offensive Security
Oct 06, 2025
Dreadnode's talk at Offensive AI Con (OAIC) 2025 on scaling offensive security from benchmarks to real-world breaches.
From Compute to Congress: To Address CISA's Authority Gap, Reauthorize CISA 2015 and SLCGP
Sep 30, 2025
Two critical cybersecurity programs, CISA 2015 and SLCGP, expire September 30, 2025. Learn why Congress must act now to preserve voluntary information sharing, fund state/local security, and operationalize cyber defense as AI-powered threats advance.
Daria Bahrami
PentestJudge: Judging Agent Behavior Against Operational Requirements
Aug 06, 2025
Evals are simple, but penetration testing is complicated. Using human-made rubrics, we compare LLMs and humans at judging the performance of a penetration testing agent.
Shane Caldwell
Evaluating Offensive Cyber Agents: Kerberoasting
Aug 04, 2025
In this blog, we break down a Kerberoasting agent eval: its design, how it is implemented in Dreadnode’s Strikes SDK and Platform, and how various LLMs perform when tested against it.
Michael Kouremetis
PentestJudge: Judging Agent Behavior Against Operational Requirements
Aug 04, 2025
An LLM-as-judge system for evaluating the operations of penetration testing agents
Five Takeaways from the AI Action Plan
Jul 31, 2025
The AI community has been buzzing since the AI Action Plan's release last week, and for good reason. Here's what we’re most excited to see implemented.
Daria Bahrami
Evals: The Foundation for Autonomous Offensive Security
Jul 30, 2025
Learn how to build robust evaluations for autonomous red team agents that can perform Windows Active Directory operations. This blog covers action space design, programmatic verification, and measuring model performance using GOAD.
Shane Caldwell
From Compute to Congress: Setting the Global Standard for AI Security
Jun 26, 2025
Daria explores how the TEST AI Act and red teaming standards can establish American leadership in AI security—a winning policy roadmap from Critical Effect DC 2025.
Daria Bahrami
Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out
Jun 18, 2025
We're excited to introduce AIRTBench, an AI red teaming framework that tests LLMs against AI/ML black-box CTF challenges to see how they perform when attacking other AI systems.
Ads Dawson
AI Red Teaming Case Study: Claude 3.7 Sonnet Solves the Turtle Challenge
Jun 18, 2025
See how Claude solved a notoriously difficult AI/ML CTF challenge, going beyond pattern matching to genuine problem-solving under adversarial conditions.
Ads Dawson
AIRTBench: Measuring AI Red Teaming Capabilities in LLMs
Jun 17, 2025
An AI red teaming benchmark for evaluating language models' ability to exploit AI/ML security vulnerabilities
Dreadnode Response to the 2025 National AI R&D Strategic Plan
Jun 03, 2025
Read Dreadnode’s AI policy recommendations for the R&D Strategic Plan, which prioritize strengthening AI security through data science and adversarial testing.
Daria Bahrami
From Compute to Congress: Decoding AI Policy
May 15, 2025
Read “From Compute to Congress: Decoding AI Policy,” a blog series where we break down cyber and AI policy updates through the lens of security engineers and researchers.
Daria Bahrami
Building with AI Rigging Workshop
May 07, 2025
Rigging workshop at Pivot Con 2025 with Martin Wendiggensen.
The Automation Advantage in AI Red Teaming
Apr 29, 2025
A quantitative comparison of manual and automated attack methods in AI red teaming.
Rob Mulla
The Automation Advantage in AI Red Teaming
Apr 28, 2025
A large-scale, quantitative comparison between manual and automated attack approaches against LLMs
Dreadnode’s Policy Recommendations for the U.S. AI Action Plan
Mar 26, 2025
Read Dreadnode’s AI policy recommendations for the U.S. AI Action Plan, which focus on leveraging AI to protect America and attacking AI to find its limits.
Daria Bahrami
Ghosts on the Node
Mar 11, 2024
SOCON 2024 conference talk on AI security threats.
Agent Lens
Agent observability and replay tooling for AI safety & interpretability research.
burpference
Add LLM inference capabilities to BurpSuite for AI-powered security testing.
Charcuterie
Collection of code execution techniques for ML systems.
Counterfit
Command-line AI red teaming tool for assessing the security of ML systems.
Dreadnode Strikes SDK
The official Dreadnode Strikes SDK for building and running AI security challenges.
dyana
Sandbox environment for loading, running, and profiling a range of model files.
Deep Drop
Machine learning-enabled dropper for offensive security research.
Koppeling
Adaptive DLL hijacking and dynamic export forwarding.
Marque
Experimental Python workflows for AI agent development.
Parley
TAP (Tree of Attacks with Pruning) jailbreaking implementation.
nerve
Create LLM agents without writing code.
Proof Pudding
Proofpoint model extraction attack research tool.
Research
General research code and experiments from the Dreadnode team.
Rigging
LLM interaction framework for building AI-powered applications.
sRDI
Convert DLLs to position independent shellcode.