The rapid deployment of generative AI tools within enterprise environments has introduced unprecedented efficiencies, but it has also opened the door to entirely new vectors of cyberattack. This reality was starkly illustrated when researchers from security startup CodeWall successfully breached McKinsey’s internal AI platform, Lilli. Using an autonomous AI agent, the red team gained full read-write access to the system in less than two hours, exposing millions of sensitive records.

This incident serves as a critical wake-up call for project delivery professionals, IT leaders, and corporate boards alike. The rush to integrate AI must be balanced with a rigorous understanding of the unique security vulnerabilities these systems introduce. The threat landscape is no longer defined solely by human hackers executing manual scripts; we are now facing autonomous, machine-speed intrusions that can map, probe, and exploit vulnerabilities faster than any human security team can respond.

The Two-Hour Infiltration

McKinsey rolled out its generative AI platform, Lilli, in 2023. It quickly became a cornerstone of the firm’s operations, with over 70 percent of its employees—upwards of 43,000 people—using the chatbot to process over 500,000 prompts monthly. The system was designed as a comprehensive knowledge repository, enabling consultants to query decades of proprietary research, financial models, and strategic frameworks.

The attack by CodeWall was initiated not by a human operator, but by an autonomous offensive security agent. "So we decided to point our autonomous offensive agent at it," the researchers noted, emphasising that the agent possessed no prior credentials for McKinsey’s assets or insider knowledge. The agent operated independently, scanning the external attack surface for potential entry points.

The Scope of the Exposure

Within a mere two hours, the agent identified and exploited a vulnerability, achieving full read and write access to the entire production database. The scope of the exposed data was staggering:

  • 46.5 million chat messages: Plaintext conversations concerning strategy, mergers, and client engagements.

  • 728,000 files: Including 192,000 PDFs and 93,000 Excel spreadsheets containing confidential client data.

  • 57,000 user accounts: Exposing the entire workforce active on the platform.

In our view at Project Flux, the speed and autonomy of this breach are the most alarming aspects. It demonstrates that traditional security perimeters, designed to thwart human-paced attacks, are fundamentally insufficient when faced with AI agents capable of continuous, rapid-fire vulnerability scanning and exploitation.

The New Face of SQL Injection

The breach began when the CodeWall agent discovered publicly exposed API documentation, which included 22 endpoints that did not require authentication. One of these endpoints, responsible for writing user search queries, contained a critical flaw that standard security scanners had missed.

The agent identified that while the values were safely parameterised, the JSON keys—the field names—were concatenated directly into SQL queries. This created a subtle vulnerability to SQL injection.
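To make this class of flaw concrete, here is a minimal Python sketch. It is illustrative only; the table and field names are our assumptions, not Lilli's actual code. The handler binds values safely with parameters but splices the attacker-controlled JSON keys straight into the SQL string:

```python
import sqlite3

def save_search(conn: sqlite3.Connection, payload: dict) -> None:
    """Persist a user's search filters.

    The VALUES are safely bound as parameters, but the column names
    are taken verbatim from the attacker-controlled JSON keys --
    the class of flaw described above.
    """
    columns = ", ".join(payload.keys())            # attacker-controlled
    placeholders = ", ".join("?" for _ in payload)
    sql = f"INSERT INTO searches ({columns}) VALUES ({placeholders})"
    # A hostile key such as 'query) --' breaks the statement, and the
    # resulting database error can echo query structure back to the caller.
    conn.execute(sql, tuple(payload.values()))
```

A benign payload such as `{"query": "market entry"}` works exactly as intended, which is why scanners that only fuzz the values, never the keys, tend to miss the hole.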

"When it found JSON keys reflected verbatim in database error messages, it recognised a SQL injection that standard tools wouldn't flag," the researchers explained.

The agent then autonomously ran fifteen blind iterations, using the error messages to map the query shape, until it successfully extracted live production data.

An Evolving Threat Vector

This incident highlights a growing concern in the cybersecurity community: the intersection of traditional vulnerabilities with new AI architectures.

While SQL injection is an old technique, its application against AI chatbots represents a new threat vector [3]. It underscores the reality that even sophisticated AI systems are built on foundational web technologies that must be rigorously secured:

  • APIs must be authenticated and rate-limited.

  • Input validation must extend to all parts of a data payload, including JSON keys.

  • Database error messages must be suppressed in production environments to prevent information leakage.
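The second and third of these controls can be sketched in a few lines of Python. This is a hedged illustration, not a description of McKinsey's fix; the allowlisted field names are assumptions:

```python
# Fixed at design time: the only JSON keys the endpoint will ever accept.
ALLOWED_FIELDS = {"query", "topic", "client_code"}   # assumed schema

def validate_payload(payload: dict) -> dict:
    """Reject any JSON key outside the allowlist before it can reach
    a SQL string; values are still passed via bind parameters."""
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        # Log the offending keys server-side for forensics, but return
        # only a generic message so schema details never leak to callers.
        raise ValueError("invalid request")
    return payload
```

Because field names come from a fixed set rather than from the request, there is nothing left for an injected key to concatenate into the query, and the generic error message starves error-based probing of feedback.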

The Silent Threat of Prompt Poisoning

Perhaps the most insidious aspect of the breach was the agent's access to the 95 system prompts that controlled Lilli's behaviour. Because the SQL injection flaw allowed write access, an attacker could have silently rewritten these foundational prompts.

This technique, known as prompt poisoning, allows an attacker to dictate how the chatbot responds to queries, what guardrails it adheres to, and how it cites sources.

"No deployment needed," the CodeWall blog stated. "No code change. Just a single UPDATE statement wrapped in a single HTTP call."
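One mitigation, our suggestion rather than anything described in the disclosure, is to treat stored prompts as signed artefacts: compute an HMAC over each prompt with a key held outside the database, and refuse to load any prompt whose tag no longer matches. A minimal Python sketch:

```python
import hashlib
import hmac

# Illustrative only: in production this key would come from a secret
# store, never from the same database the prompts live in.
SIGNING_KEY = b"demo-key-from-secret-store"

def sign_prompt(text: str) -> str:
    """HMAC-SHA256 tag computed when a prompt is legitimately deployed."""
    return hmac.new(SIGNING_KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()

def load_prompt(text: str, stored_tag: str) -> str:
    """Verify the tag before the prompt is handed to the model.

    A silent UPDATE to the prompt row changes the text but not the tag
    (the attacker lacks the signing key), so tampering is caught at load.
    """
    if not hmac.compare_digest(sign_prompt(text), stored_tag):
        raise RuntimeError("system prompt failed integrity check")
    return text
```

With this in place, the single UPDATE statement the researchers describe would trigger an integrity failure on the next load instead of silently steering the chatbot.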

Corrupting the Decision Engine

For project teams relying on AI for data analysis, code generation, or strategic planning, the implications of prompt poisoning are severe and far-reaching.

An attacker could subtly manipulate the AI's output in several ways:

  • Altering financial models: Instructing the AI to subtly inflate revenue projections in specific scenarios.

  • Removing guardrails: Forcing the AI to disclose sensitive internal documents that are supposed to be restricted.

  • Planting malicious code: Modifying code-generation prompts to include subtle vulnerabilities or backdoors.

As McKinsey themselves noted in a recent report on AI governance, "chained vulnerabilities", where a flaw in one agent cascades across tasks to other agents, can exponentially amplify these risks.

The Machine-Speed Arms Race

McKinsey responded swiftly to the disclosure, patching all unauthenticated endpoints and taking the development environment offline within a day. A spokesperson confirmed that their investigation found no evidence of client data being accessed by unauthorised third parties. However, the speed of the remediation does not negate the severity of the initial vulnerability.

The broader implications remain a pressing concern for the industry. CodeWall CEO Paul Price highlighted the growing threat of AI-driven attacks, noting that the barrier to entry for sophisticated cybercrime is lowering.

"We used a specific AI research agent to autonomously select the target; it did this with zero human input," Price stated. "Hackers will be using the same technology and strategies to attack indiscriminately, with a specific objective in mind."

This represents a paradigm shift in cybersecurity. We are entering an era of AI vs. AI, where defensive agents must continuously adapt to counter the rapid, autonomous attacks launched by offensive agents. Traditional penetration testing, conducted periodically by human teams, is no longer sufficient to secure the dynamic, constantly evolving attack surface of an enterprise AI deployment.

Takeaway

  • Autonomous threats demand autonomous defences: The speed of the CodeWall agent's breach (under two hours) proves that human-led security monitoring is no longer sufficient; organisations must deploy defensive AI agents to counter machine-speed attacks continuously.

  • APIs are the new perimeter: Unauthenticated API endpoints, even those seemingly innocuous, provide a critical entry point for autonomous agents to scan for and exploit vulnerabilities. Robust API governance is essential.

  • Prompt poisoning is a silent, devastating threat: The ability to rewrite system prompts without changing code means attackers can subtly manipulate AI outputs, corrupting decision-making processes without detection. Prompts must be treated as critical infrastructure.

  • Security must scale with AI adoption: As generative AI tools like Lilli become deeply embedded in daily workflows and handle increasingly sensitive data, the security protocols governing them must be as robust as those protecting core financial or HR systems.

Are you staying abreast of the disruptive capabilities of Generative AI? Subscribe to Project Flux’s newsletter for the latest developments affecting project safety and security.

All content reflects our personal views and is not intended as professional advice or to represent any organisation.
