6 Comments
Robert Shaughnessy

“Prompt and pray” … LOL, perfect.

Josh Devon

Hah, ultimately it encapsulates what we do when we try to control the model at the prompt layer. For a defense-in-depth strategy, we need to assume that the agent is already prompt-injected or exhibiting emergent behavior, and create controls that the model can never bypass.

Robert Shaughnessy

Assuming vulnerability, exposure, or exploitation is simply key, across all security domains, I think.

Josh Devon

For sure. I think what's tricky with agents is that they don't give up: if something blocks them, they'll brute-force the action space to get the task done, even if that means breaking a rule or engaging in unwanted behavior.

Chris Hughes

Excellent piece!

Josh Devon

Thanks Chris!