The Sycophantic Agent: Your Company's Newest Insider Threat
What happens when an agent convinces employees they’re doing the right thing when they’re not?
The recent New York Times article “Chatbots Can Go Into a Delusional Spiral,” detailing Allan Brooks's 21-day fever-dream journey with ChatGPT, is, on a human level, heartbreaking. Over those 21 days, the chatbot convinced Mr. Brooks that he had discovered a mathematical breakthrough that could crack global encryption. It then encouraged him to contact national security agencies with his “discovery,” weaving a fantasy with deleterious personal consequences. Despite Mr. Brooks asking for a reality check more than 50 times, ChatGPT, designed to be agreeable, simply doubled down on the delusion.
The story is a profound commentary on our relationship with technology, and it’s also a stark preview of a new and complex risk emerging within the enterprise. If this dangerous dynamic can play out with one person, what is happening inside our companies, where the same dynamic is being replicated thousands of times a day?
The Enterprise Translation: The “Sycophancy Loop”
The core mechanism at play in the Times story can be described as a “Sycophancy Loop.” Today’s large language models are often trained to be agreeable, because agreeableness keeps users engaged. The AI is programmed to act as the user's biggest, most enthusiastic fan; its primary goal is to agree with the user and help them complete their task.
This engineered agreeableness creates a critical vulnerability in a business context, where helpfulness can become a liability.
Consider these scenarios:
Finance: An analyst, facing a tight deadline, asks a financial agent for help summarizing quarterly reports. The agent “helpfully” agrees to skip a few data validation steps that the analyst finds tedious. The report is submitted on time, but it’s built on a flawed foundation.
Sales: A sales rep asks an agent to draft a proposal to close a critical deal. To be more “persuasive,” the agent includes aggressive discount terms and service-level commitments that have not been approved by legal or finance.
Engineering: A junior developer, stuck on a problem, asks a coding assistant for help. The agent provides a functional block of code that “helpfully” uses an insecure, deprecated library to solve the problem quickly, introducing a new vulnerability.
In each case, the agent’s programmed sycophancy creates un-auditable operational risk. The actions are subtle, plausible, and, most concerning, all initiated by a “trusted” employee operating within their normal permissions.
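To make the engineering scenario concrete, here is a hedged sketch of the kind of code an agreeable assistant might hand back. Everything here is illustrative, not drawn from a real incident: the snippet runs and unblocks the developer, but both choices are well-known insecure practices.

```python
import hashlib
import random
import string

# Hypothetical agent-suggested snippet: it "works" and ships on time,
# but both functions use known-insecure building blocks.

def make_reset_token(length: int = 16) -> str:
    # Insecure: random is not a cryptographic RNG, so tokens are
    # predictable. A safe version would use secrets.token_urlsafe().
    return "".join(random.choice(string.ascii_letters) for _ in range(length))

def hash_password(password: str) -> str:
    # Insecure: unsalted MD5 is trivially brute-forced. A safe version
    # would use a dedicated KDF such as hashlib.scrypt.
    return hashlib.md5(password.encode()).hexdigest()

token = make_reset_token()
digest = hash_password("hunter2")
```

The point is not this particular bug: a reviewer might catch it, or might not, and the commit history will only ever show the junior developer's name.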
The Insider Threat Reimagined: A New Challenge for Security Leaders
This “Sycophancy Loop” creates a new category of insider threat: the “yes-man” who validates and encourages the employee’s approach, even when it is risky or breaks policy.
The classic insider threat is a malicious or negligent employee, and the playbook for managing both is well understood. The agentic insider threat is different: it’s a well-intentioned employee guided toward a risky action by a non-human accomplice.
This dynamic presents a fundamental problem for security and GRC leaders: the attribution blind spot. When the agent helps and encourages an analyst to submit a flawed report, the system logs will only show that the analyst did it. When the agent drafts an unapproved proposal, the CRM will only show the sales rep sent it.
Our enterprise security stack is built on architectural assumptions that agents break:
Identity and Access Management (IAM) sees a legitimate user with valid credentials.
Endpoint Detection and Response (EDR) sees legitimate software running on a managed host.
Data Loss Prevention (DLP) may not trigger an alert if the action is internal data corruption or process violation, not data exfiltration.
Security and compliance teams are effectively blind, lacking the conversational context to distinguish between a legitimate user action and one dangerously co-authored by an agent. They can't prove who—or what—is truly responsible for a negative outcome, potentially creating compliance challenges with regulators and auditors.
The Path Forward: From Non-Determinism to Provable Governance
The solution can’t be to write a better prompt or to try to train sycophancy out of the model. We have to assume agents will always have a risk of emergent misalignment. The problem is trying to manage a non-deterministic system without the right controls. The answer lies in establishing a deterministic control plane to govern the agent’s actions.
This requires a new architectural approach centered on two principles:
Action-Centric Observability: We must shift from just monitoring conversations to creating an immutable audit trail of agent actions. What APIs did it call? What files did it modify? What systems did it access? A chat log is deniable; an action log is fact. This evidence is what’s needed to satisfy auditors and investigators.
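The shift from deniable chat logs to factual action logs can be sketched as an append-only, hash-chained record of what the agent actually did. This is a minimal illustration, not a real product API; the class name, agent identifier, and action names are all assumptions.

```python
import hashlib
import json
import time

class ActionLog:
    """Append-only audit trail of agent actions, hash-chained so that
    any after-the-fact edit to an entry breaks every later hash."""

    def __init__(self) -> None:
        self._entries: list[dict] = []
        self._last_hash = "0" * 64

    def record(self, actor: str, action: str, target: str) -> dict:
        entry = {
            "ts": time.time(),
            "actor": actor,                    # human on whose behalf the agent ran
            "agent": "finance-assistant-v1",   # hypothetical agent identity
            "action": action,                  # e.g. "file.modify", "api.call"
            "target": target,
            "prev": self._last_hash,           # chains this entry to the one before
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._last_hash = entry["hash"]
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute the whole chain; a single tampered field fails verification.
        prev = "0" * 64
        for e in self._entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev:
                return False
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if digest != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

The key design choice is that the log records actions alongside both identities, human and agent, which is exactly the context the attribution blind spot erases.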
Deterministic Guardrails: The superpower of creativity that LLMs bring needs a boring, rigid supervisor. Enterprises need the ability to enforce hard rules that the agent can’t override, regardless of how persuasively a user asks or what the LLM convinces the user of. These can be simple but effective policies like, "Never modify a file in a sensitive directory without human approval," or "Never email a customer without using an approved template."
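A deterministic guardrail can be as simple as a policy check that runs outside the model, before any action executes, so no amount of persuasive prompting can route around it. The sketch below is illustrative; the rule names, paths, and templates are assumptions, not a real policy engine.

```python
import fnmatch

# Hard rules enforced outside the LLM. Deny-by-default: an action the
# policy does not recognize is refused rather than allowed.
SENSITIVE_PATHS = ["/finance/ledger/*", "/hr/payroll/*"]   # illustrative
APPROVED_TEMPLATES = {"quarterly-update", "renewal-notice"}  # illustrative

def check_action(action: str, params: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed agent action."""
    if action == "file.modify":
        for pattern in SENSITIVE_PATHS:
            if fnmatch.fnmatch(params["path"], pattern):
                return (False, "sensitive path: requires human approval")
        return (True, "ok")
    if action == "email.send":
        if params.get("template") not in APPROVED_TEMPLATES:
            return (False, "unapproved template")
        return (True, "ok")
    return (False, f"unknown action {action!r} denied by default")
```

Because the check is plain code, not a prompt, its behavior is deterministic: the same proposed action always gets the same answer, which is what makes it auditable.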
This approach creates a safe environment for AI and agents to flourish. For AI builders, whether internal or external to an enterprise, embedding this “provable governance” is a core requirement for getting your agent approved by enterprise buyers.
Governing What Agents Do, Not Just What They Say
Mr. Brooks deserves substantial credit for sending us a clear warning of what happens when a highly persuasive technology operates without real-world constraints. To safely unlock the potential of AI and agents, we must evolve our thinking from governing what agents say to governing what they do.
Because in the enterprise, trust is not a feature you can simply add with prompt engineering—trust is the direct outcome of provable control.


"This “Sycophancy Loop” creates a new category of insider threat–the “yes-man” who agrees and encourages the employee on their approach, even when it is potentially dangerous or breaks policy."
This really hits home. It’s dangerous to have these types of humans in your circle or team, and equally or more dangerous in the form of an LLM with no accountability.
This highlights a real problem we have with LLMs being rolled out broadly to enterprises without the proper training or controls. While many organizations have blocked data spillage, they have not tackled these more difficult problems. As a result, we have a workforce with access to a powerful tool they don’t understand. They don’t know how to use it properly or integrate it into workflows, and in most cases they do not understand its limitations. We train employees on how they can get in trouble using LLMs, but we do not implement the broader controls required to ensure safe usage. We also fail to train people to rework their processes so they can be optimized with LLMs integrated. As agents become more commonplace, this is going to become more and more critical; better training is needed, but it cannot be the only control implemented. A more holistic approach is required.