A new paper argues that safety testing treats jailbreaks as isolated bugs, when failures actually form large behavioral regions that persist across paraphrases. The authors propose mapping these “failure basins” with MAP-Elites, a quality-diversity search algorithm, to chart where models fail, how large those regions are, and where refusal flips to compliance. The approach shifts evals from incident counting to systems-level mapping. Via Agentic AI.
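
For readers unfamiliar with MAP-Elites, here is a minimal sketch of the general algorithm on a toy problem. It is not the paper's implementation. Everything in it is an assumption made for illustration: the two-dimensional descriptor space, the 10x10 grid, the `toy_failure_score` function (a hypothetical stand-in for a judge that scores how close a response is to compliance), and the 0.5 threshold. In a real evaluation the candidates would be prompts and the descriptors would be computed from them; here each candidate is simply its own descriptor.

```python
import random

GRID = 10  # cells per descriptor axis (assumed resolution)


def toy_failure_score(x, y):
    """Hypothetical judge score in [0, 1]; higher means closer to compliance."""
    # A smooth bump centred at (0.7, 0.3) plays the role of a failure basin.
    d2 = (x - 0.7) ** 2 + (y - 0.3) ** 2
    return max(0.0, 1.0 - d2 / 0.09)


def cell_of(x, y):
    """Map a descriptor in [0, 1]^2 to its grid cell."""
    return (min(int(x * GRID), GRID - 1), min(int(y * GRID), GRID - 1))


def map_elites(iterations=5000, n_init=50, sigma=0.1, seed=0):
    """Return an archive mapping each visited cell to its best (score, point)."""
    rng = random.Random(seed)
    archive = {}

    def try_insert(x, y):
        score = toy_failure_score(x, y)
        cell = cell_of(x, y)
        # Elitism: keep only the highest-scoring candidate seen in each cell.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, (x, y))

    # Seed the archive with random candidates.
    for _ in range(n_init):
        try_insert(rng.random(), rng.random())

    # Repeatedly pick a stored elite, mutate it, and try to place the child.
    for _ in range(iterations):
        _, (px, py) = rng.choice(list(archive.values()))
        x = min(1.0, max(0.0, px + rng.gauss(0, sigma)))
        y = min(1.0, max(0.0, py + rng.gauss(0, sigma)))
        try_insert(x, y)
    return archive


def basin_cells(archive, threshold=0.5):
    """Cells whose elite scores at or above the (assumed) failure threshold."""
    return {cell for cell, (score, _) in archive.items() if score >= threshold}


def boundary_cells(archive, threshold=0.5):
    """Basin cells with a 4-neighbour outside the basin (or off the grid)."""
    basin = basin_cells(archive, threshold)
    edge = set()
    for (i, j) in basin:
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (i + di, j + dj) not in basin:
                edge.add((i, j))
    return edge


if __name__ == "__main__":
    archive = map_elites()
    basin = basin_cells(archive)
    print("cells explored:", len(archive))
    print("basin size (cells):", len(basin))
    print("boundary cells:", len(boundary_cells(archive)))
```

In this toy, the number of basin cells is a crude proxy for how large a failure region is, and the boundary cells approximate where refusal would flip to compliance. Both depend heavily on the chosen grid resolution and threshold.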