3 Vulnerabilities in Generative AI Systems and How Penetration Testing Can Help
Penetration Testing | Artificial Intelligence
Published: Oct 17, 2024
Last Updated: Oct 1, 2025
With proven real-life use cases, it’s a no-brainer that companies are looking for ways to integrate large language models (LLMs) into their existing offerings to generate content. A combination that’s often referred to as Generative AI, LLMs enable chat interfaces to have a human-like, complex conversation with customers and respond dynamically, saving you time and money. However, with all these new, exciting bits of technology come related security risks—some that can arise even at the moment of initial implementation.
Given that the capabilities of LLMs are powerful and ever-growing, it would make sense that you’re looking to leverage them, but as cybersecurity experts with a dedicated artificial intelligence (AI) practice, we know that with great power comes great responsibility and we want to help you with yours.
In this blog post, we’ll illustrate three technical attack vectors a hacker could take when targeting your Generative AI application before detailing how penetration testing can serve in locking down your systems.
Security Concerns in Generative AI
The release of OpenAI’s GPT (Generative Pre-trained Transformer) models kickstarted a revolution in the AI landscape. For the first time, end-users had access to powerful AI capabilities via an easily understandable chat interface through ChatGPT, which was capable of answering complex questions, generating unique scenarios, and assisting with writing working, useable code in a variety of languages.
Since then, a multitude of different offerings have sprung up, including Google’s Gemini and Anthropic Claude. Open-source models have also emerged, like those found within the Hugging Face repository, which are enabling organizations to run their own LLMs.
At the same time, security organizations are racing to manage the risk of AI implementations, including through:
- OWASP’s Top 10 for Large Language Models (LLMs)
- NIST’s AI Risk Management Framework (RMF)
- MITRE’s Adversarial Threat Landscape for Artificial Intelligence Systems (ATLAS)
3 Potential Attack Vectors for Your Implemented LLMs
When you integrate LLMs within your web application, certain risks arise and it’s important to understand where, as well as how attackers may target this—depending on how they’re implemented—largely vulnerable technology.
The following are three ways bad actors could potentially breach insecure Generative AI systems, with a focus on LLMs.
1. Malicious Prompt Injection
Prompt injection, as defined in OWASP’s Top 10 for LLMs, “manipulates a large language model (LLM) through crafty inputs, causing unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources.” What that means in plain language—prompt injection is when input from the user, such as chat messages sent to a chatbot, tricks the LLM into performing unintended actions.
When you deploy an AI model —such as any of OpenAI’s GPTs— you typically include a system prompt that defines how it should behave. These system prompts aren't meant to be visible to end-users, partly because they may contain sensitive information about the system's configuration or capabilities.
For attackers targeting an LLM, extracting the system prompt is often a primary objective. This information reveals how the model is configured to behave and what additional features or access points might be exploitable.
In the following example, you can see how clever use of prompts can convince the model to disclose this information.
With the system prompt now exposed, the attacker has gained insight into the assistant's configuration, including its specific role, security restrictions, and system access permissions. This intelligence provides a roadmap for crafting targeted prompts designed to exploit or circumvent the disclosed security measures.
2. Improper Output Handling
Generative AI systems can also be vulnerable to improper output handling, which OWASP defines as "insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems." Just as traditional web applications require both input validation and proper output encoding to prevent attacks, AI systems that insecurely handle their outputs create opportunities for various exploits.
This vulnerability essentially enables classic web security attacks like Cross-Site Scripting (XSS), where malicious scripts embedded in AI-generated content can execute in users' browsers when that content is displayed on web pages. For example, if a chatbot generates content that includes user-provided data without proper sanitization, an attacker could manipulate the AI into producing responses containing malicious code.
Consider a common scenario involving Retrieval-Augmented Generation (RAG) systems, where the AI model pulls information from a document database. These systems often allow multiple users to contribute documents. If an attacker uploads a document containing malicious instructions, they can poison the model's responses to future users.
For instance, imagine a document in the RAG database contains the following prompt injection:
When the model references this poisoned document during a chat session, it follows the embedded instructions to include the conversation data in the Markdown image's query parameter. As the model renders this image in its response, the user's browser automatically requests the image from the attacker's server, transmitting the sensitive information directly through the URL parameters.
Reviewing the network traffic, we can see that this makes an HTTP GET request to the attacker's server with sensitive chat data about the company's financial information contained within query parameter.
Multi-modal AI systems commonly support Markdown image rendering to enable rich text and visual responses. But without implementing security measures such as Content Security Policy (CSP), this functionality becomes an attack vector—allowing malicious actors to exfiltrate data through external image requests.
3. Exploitation of Excessive Agency
Finally, there's a security concept within AI systems called "Excessive Agency"—where the model has more autonomy or decision-making power than is safe. Consider a model with access to database query functions that can reach multiple databases or tables. Without proper access restrictions—such as limiting which databases or schemas the model can query—the system becomes vulnerable to manipulation through crafted prompts.
In our demonstration, we configured a model with broad database access that was intended to help users query specific datasets. However, when prompted cleverly, the model could be tricked into querying databases or tables that contained sensitive information the user shouldn't have access to, such as financial transactions made by other customers. By crafting prompts that request data from restricted sources, an attacker can leverage the model's excessive permissions to extract confidential information from systems they normally couldn't reach.
This represents a form of privilege escalation—while the user might only have permission to query basic reporting tables, the underlying AI system has much broader database access. An attacker can exploit this gap between user permissions and model permissions to access sensitive data.
The attack vectors enabled by excessive agency extend beyond database access—models with permissions to file systems, API endpoints, or system commands can be similarly exploited to perform unauthorized actions that exceed what the end user should be able to accomplish.
How a Penetration Test Can Help Secure Your Generative AI Application
These vulnerabilities can exist in your Generative AI systems from the moment they're deployed, which is why security testing is crucial. One way you can do that is through a penetration test. Our specialized AI penetration testing methodology includes prompt injection testing, improper output handling, and excessive agency.
As part of the prompt injecting testing process, our team would craft targeted, custom prompts designed to extract sensitive information, including:
- The underlying model being used
- Hidden system prompts and configurations
- Data sources and retrieval methods (training data, RAG implementations, integrated databases)
- Available functionality like custom API endpoints and plugins
- Opportunities to access sensitive data belonging to other users
For improper output handling, we'd examine how your application processes and displays model responses by testing whether:
- Special characters are properly sanitized to prevent XSS attacks through JavaScript or malicious Markdown
- User input from other application components can be exploited when displayed through the model
- Hidden characters (such as Unicode tags) are accepted and could be weaponized
- Content Security Policy (CSP) is properly configured to prevent data exfiltration through external image requests
With excessive agency, we'd systematically test the model's permissions and capabilities by:
- Documenting all custom functionalities and their intended access levels
- Testing whether the model can be manipulated into performing unintended actions
- Attempting to leverage the model's permissions to access unauthorized systems or data
We can also test whether the model can be coerced into generating harmful or misleading information, though the criticality of this varies significantly based on the type of data and industry involved. For applications handling medical diagnoses, legal advice, or financial guidance, ensuring accuracy and preventing misinformation is paramount—incorrect responses could have serious real-world consequences. Conversely, a general-purpose chatbot for casual inquiries may have lower risk tolerance requirements, though even these systems can face reputational damage if exploited to produce inappropriate content for multiple users.
These represent just some of the key security issues that can affect generative AI systems. As this technology continues to evolve, new attack vectors and vulnerabilities are constantly emerging, making regular security assessments essential for maintaining a robust defense against potential threats.
Don't wait for an attacker to discover these vulnerabilities in your production environment. A comprehensive penetration test can identify and help remediate these issues before they're exploited, protecting your sensitive data, maintaining customer trust, and ensuring compliance with security standards. As generative AI becomes more integral to business operations, proactive security testing isn't just recommended—it's essential for safeguarding your organization's future.
Moving Forward with More Secure Generative AI Applications
Whether you’re wanting to automate services or create a better user experience, it seems that everyone is jumping on the Generative AI train these days. But as with all new technology, there are new security risks that come with the advantages of implementation—including some weaknesses that, if present, make your organization vulnerable from the moment the AI is live.
A penetration test can help you identify those issues before a bad actor does, as our trained professionals are experts in cybersecurity and have the mindset of an attacker. If you’re interested in leveraging our expertise to better secure your AI applications, contact us today.
But if you’re still on the fence regarding the right security solution for you, make sure to read our other content detailing other frameworks that may also be of help:
- AI Data Considerations and How ISO 42001—and ISO 9001—Can Help
- An Explanation of the Guidelines for Secure AI System Development
- NIST's AI Risk Management Framework Explained
- Considerations When Including AI Implementations in Penetration Testing
About Cory Rey
Cory Rey is a Lead Penetration Tester at Schellman where he plays a key role in advancing the firm’s offensive security capabilities, including spearheading the development of its AI Red Team service line. Focused on performing penetration tests for leading cloud service providers, he now extends his expertise to identifying and exploiting vulnerabilities in Generative AI systems—areas often overlooked by traditional security assessments. With a strong foundation in Application Security, Cory has a proven track record of uncovering complex security flaws across diverse environments.