Agentic AI’s OODA Loop Problem
Another seminal post from Bruce Schneier on the security of AI systems. An AI agent is a system that runs in a loop. He uses the Observe-Orient-Decide-Act framework (originally developed for training US air force pilots but applied widely since) and shows how at each stage untrusted input can manipulate or subvert the agent. The reason this is such a good post is that he then adds two more great concepts.
The "AI security trilemma" is a version of the well known CAP theorem from distributed systems (you can have any two of consistency, availability or partition (network split) tolerance), or the similar rule of thumb in project management (you can have any two of cheap, fast and high quality).
This is the agentic AI security trilemma. Fast, smart, secure; pick any two. Fast and smart—you can’t verify your inputs. Smart and secure—you check everything, slowly, because AI itself can’t be used for this. Secure and fast—you’re stuck with models with intentionally limited capabilities.
He then goes on to compare AI systems inability to distinguish malicious prompts from legitimate instructions to an organism's immune system going wrong with an autoimmune disorder. The organism can't distinguish self from non-self, "or like oncogenes, the normal function and the malignant behavior share identical machinery."
Bonus interesting security link: LOLMIL: Living Off the Land Models and Inference Libraries (via ImportAI). This is a proof of concept of autonomous AI agent malware that iteratively writes and executes code using LLMs on the target device to achieve its nefarious aims. The degree of local intelligence will make this kind of approach much harder to counter.
Why is this funny? And why AI doesn’t know — yet
Recognising funny captions is far easier than writing them. The Wisconsin team found that humans overwhelmingly preferred human-authored captions to AI-generated ones. It might just be a matter of time.