Dreadnode
Research

Research, write ups, talks, and code from the crew.

Blog

Worlds: A Simulation Engine for Agentic Pentesting

Feb 11, 2026

An 8B model went from blindly loading Metasploit modules to achieving Domain Admin on GOAD, trained entirely on synthetic data from our world model system.

Shane Caldwell and Max Harley

Blog

186 Jailbreaks: Applying MLOps to AI Red Teaming

Dec 11, 2025

Our thoughts on AI red teaming, the tools, and why using MLOps processes is the way forward. Plus, a demonstration of how we leveraged algorithmic red teaming to assess Llama Maverick.

Raja Sekhar Rao Dheekonda

Blog

LLM-Powered AMSI Provider vs. Red Team Agent

Dec 03, 2025

We built an LLM-powered AMSI provider and paired it against a red team agent, generating a unique dataset and a blueprint for detecting malicious code at execution time.

Max Harley

Blog

From Compute to Congress: The Cyber Layer Beneath the Genesis Mission

Dec 01, 2025

As the Genesis Mission accelerates AI development across critical scientific domains, robust cybersecurity and adversarial testing must be foundational, not bolted on later.

Daria Bahrami

Blog

Dreadnode Response to the 2025 Regulatory Reform for Artificial Intelligence

Oct 30, 2025

Dreadnode’s response to the RFI focuses on optimizing AI-enabled cybersecurity through strategic, machine-readable automations.

Daria Bahrami

Blog

LOLMIL: Living Off the Land Models and Inference Libraries

Oct 14, 2025

Can we eliminate the C2 server entirely and create truly autonomous malware? It’s not only possible, but fairly straightforward to implement, as demonstrated in our latest experiment.

Max Harley

Talk

From Benchmarks to Breaches: Scaling Offensive Security

Oct 06, 2025

Dreadnode at Offensive AI Con (OAIC) 2025. Scaling offensive security from benchmarks to real-world breaches.

Blog

From Compute to Congress: To Address CISA's Authority Gap, Reauthorize CISA 2015 and SLCGP

Sep 30, 2025

Two critical cybersecurity programs—CISA 2015 and SLCGP—expire September 30, 2025. Learn why Congress must act now to preserve voluntary information sharing, fund state/local security, and operationalize cyber operations as AI-powered threats advance.

Daria Bahrami

Blog

PentestJudge: Judging Agent Behavior Against Operational Requirements

Aug 06, 2025

Evals are simple, but penetration testing is complicated. Using human-made rubrics, we compare LLMs and humans at judging the performance of a penetration testing agent.

Shane Caldwell

Blog

Evaluating Offensive Cyber Agents: Kerberoasting

Aug 04, 2025

In this blog, we break down a Kerberoasting agent eval, covering its design, how it is implemented in Dreadnode’s Strikes SDK and Platform, and how various LLMs performed against it.

Michael Kouremetis

Paper

PentestJudge: Judging Agent Behavior Against Operational Requirements

Aug 04, 2025

An LLM-as-judge system for evaluating the operations of penetration testing agents

Blog

Five Takeaways from the AI Action Plan

Jul 31, 2025

The AI community has been buzzing since the AI Action Plan’s release last week, and for good reason. Here’s what we’re most excited to see implemented.

Daria Bahrami

Blog

Evals: The Foundation for Autonomous Offensive Security

Jul 30, 2025

Learn how to build robust evaluations for autonomous red team agents that can perform Windows Active Directory operations. This blog covers action space design, programmatic verification, and measuring model performance using GOAD.

Shane Caldwell

Blog

From Compute to Congress: Setting the Global Standard for AI Security

Jun 26, 2025

Daria explores how the TEST AI Act and red teaming standards can establish American leadership in AI security—a winning policy roadmap from Critical Effect DC 2025.

Daria Bahrami

Blog

Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out

Jun 18, 2025

We're excited to introduce AIRTBench, an AI red teaming framework that tests LLMs against AI/ML black-box CTF challenges to see how they perform when attacking other AI systems.

Ads Dawson

Blog

AI Red Teaming Case Study: Claude 3.7 Sonnet Solves the Turtle Challenge

Jun 18, 2025

See how Claude solved a notoriously difficult AI/ML CTF challenge, going beyond pattern matching to genuine problem-solving under adversarial conditions.

Ads Dawson

Paper

AIRTBench: Measuring AI Red Teaming Capabilities in LLMs

Jun 17, 2025

An AI red teaming benchmark for evaluating language models' ability to exploit AI/ML security vulnerabilities

Blog

Dreadnode Response to the 2025 National AI R&D Strategic Plan

Jun 03, 2025

Read Dreadnode’s AI policy recommendations for the R&D Strategic Plan, which prioritize strengthening AI security through data science and adversarial testing.

Daria Bahrami

Blog

From Compute to Congress: Decoding AI Policy

May 15, 2025

Read “From Compute to Congress: Decoding AI Policy,” a blog series where we break down cyber and AI policy updates from the lens of security engineers and researchers.

Daria Bahrami

Talk

Building with AI Rigging Workshop

May 07, 2025

Rigging workshop at Pivot Con 2025 with Martin Wendiggensen.

Blog

The Automation Advantage in AI Red Teaming

Apr 29, 2025

A quantitative comparison of manual and automated attack methods in AI red teaming.

Rob Mulla

Paper

The Automation Advantage in AI Red Teaming

Apr 28, 2025

A large-scale, quantitative comparison between manual and automated attack approaches against LLMs

Blog

Dreadnode’s Policy Recommendations for the U.S. AI Action Plan

Mar 26, 2025

Read Dreadnode’s AI policy recommendations for the U.S. AI Action Plan, which focus on leveraging AI to protect America and attacking AI to find its limits.

Daria Bahrami

Talk

Ghosts on the Node

Mar 11, 2024

SOCON 2024 conference talk on AI security threats.

Code

Agent Lens

Agent observability and replay tooling for AI safety & interpretability research.

Code

burpference

Add LLM inference capabilities to Burp Suite for AI-powered security testing.

Code

Charcuterie

Collection of code execution techniques for ML systems.

Code

Counterfit

CLI AI red team tool for assessing the security of ML systems.

Code

Dreadnode Strikes SDK

The official Dreadnode Strikes SDK for building and running AI security challenges.

Code

dyana

Sandbox environment for loading, running, and profiling a range of model files.

Code

Deep Drop

Machine learning enabled dropper for offensive security research.

Code

Koppeling

Adaptive DLL hijacking and dynamic export forwarding.

Code

Marque

Experimental Python workflows for AI agent development.

Code

Parley

TAP (Tree of Attacks with Pruning) jailbreaking implementation.

Code

nerve

Create LLM agents without writing code.

Code

Proof Pudding

Proofpoint model extraction attack research tool.

Code

Research

General research code and experiments from the Dreadnode team.

Code

Rigging

LLM interaction framework for building AI-powered applications.

Code

sRDI

Convert DLLs to position-independent shellcode.