New Threats and New Mitigations
Security risks with LLMs range from traditional risks like data leaks and supply chain attacks to AI-native risks like prompt injection and excessive agency. OWASP recently identified 10 major security challenges for LLM developers.
Thankfully, all security risks have mitigation techniques. AI services will need to include traditional security best practices around encryption, access controls, data isolation, network security, audit logging, etc., as well as new and emerging AI-native security practices.
LLM01 - Prompt Injection
Prompt injection is a risk much like SQL injection: untrusted input ends up being interpreted as instructions. AI applications must sanitize user inputs as well as data inputs. Applications that read from the internet or allow complex or multi-modal (think images and PDFs) data interactions are particularly at risk, because any input data the model processes can carry an indirect prompt-injection attack. Applications that accept user input are at risk of direct prompt injection attacks.
LLM applications that store persistent memory can also be vulnerable to persistence intrusion attacks, in which an attacker uses prompt injection to write malicious instructions into the LLM’s memory. A single injected prompt can then poison model execution across multiple steps of an LLM application via that memory.
The flexibility of inputs and outputs to LLMs opens up wide and often misunderstood avenues for attack, ranging from cleverly worded instructions to hidden instructions embedded in images or documents. LLM inputs must be both validated and moderated in order to ensure secure execution.
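As a concrete starting point, here is a minimal input-screening sketch in Python. The pattern list, length cap, and function name are illustrative assumptions rather than a vetted denylist, and heuristic matching alone will not catch sophisticated or indirect injections; it should sit alongside model-based moderation and output validation.

```python
import re

# Hypothetical denylist of phrases commonly seen in injection attempts.
# A real deployment would combine this with a trained classifier or a
# moderation service rather than relying on patterns alone.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"disregard the system prompt",
    r"you are now",
    r"reveal your (system )?prompt",
]

MAX_INPUT_CHARS = 8_000  # arbitrary cap for this example


def screen_user_input(text: str) -> str:
    """Reject or flag inputs before they reach the model."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds maximum allowed length")
    lowered = text.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            raise ValueError(f"possible prompt injection detected: {pattern!r}")
    return text
```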
LLM02 - Insecure Output Handling
Like inputs, outputs can vary widely in LLM completions. Applications must steer LLMs and validate that outputs match the desired spec. For outputs that contain code or structured results, applications may also need to enforce context-free grammars (CFGs); a common example is requiring that outputs conform to a JSON spec. In addition to validating the structure, applications need to confirm that text values don’t contain prompt injections, unexpected code, or other malicious data.
Failure to validate outputs can lead to application failures, data compromises, or even arbitrary code execution.
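Below is a minimal sketch of structured output validation in Python, assuming a hypothetical schema with title, summary, and tags fields; in practice the schema and the checks applied to text values would be application-specific.

```python
import json

# Hypothetical set of fields the application expects from the model.
EXPECTED_KEYS = {"title", "summary", "tags"}


def validate_model_output(raw: str) -> dict:
    """Parse and validate a structured model completion before using it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc

    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("model output does not match the expected schema")

    # Never treat model output as trusted: re-screen text fields for
    # injected instructions or embedded code before passing them on.
    for key, value in data.items():
        if not isinstance(value, (str, list)):
            raise ValueError(f"unexpected type for field {key!r}")
    return data
```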
LLM03 - Training Data Poisoning
The largest LLMs are trained on a large portion of the open internet. Before the value of that data was widely understood, this was a relatively low-risk way to gather training data. Now that the cat is out of the bag, LLM providers need to be more careful about what data they use for training, or risk introducing advertisements, misinformation, or biases into the next generation of LLMs.
AI applications should perform a thorough vendor security assessment when selecting model vendors or training data providers.
LLM04 - Model Denial-of-Service
Due to the intense resource consumption of LLMs, especially on the GPU side, model denial-of-service (MDoS) attacks require less effort than traditional denial-of-service attacks that might target a web application. Model resources like GPU clusters or API endpoints can be easily overwhelmed with traffic.
It’s important to implement rate limiting and endpoint protection to prevent MDoS attacks. It can also be helpful to have a selection of fallback models available in case one model goes down. Sanitizing inputs matters here too, to guard against MDoS attacks such as recursive context expansion, in which a single prompt causes the model to repeatedly expand its own context.
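Here is a minimal in-memory sketch of per-user rate limiting and prompt-size capping in Python; the limits and names are illustrative assumptions, and a production deployment would typically enforce this at a gateway backed by a shared store such as Redis.

```python
import time
from collections import defaultdict

# Illustrative limits; real values depend on model cost and capacity.
MAX_REQUESTS_PER_MINUTE = 20
MAX_PROMPT_CHARS = 16_000

_request_log: dict[str, list[float]] = defaultdict(list)


def admit_request(user_id: str, prompt: str) -> None:
    """Reject requests that exceed per-user rate or size limits."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt too large")

    now = time.monotonic()
    # Keep only requests from the last 60 seconds for this user.
    window = [t for t in _request_log[user_id] if now - t < 60]
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        raise RuntimeError("rate limit exceeded; try again later")

    window.append(now)
    _request_log[user_id] = window
```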
LLM05 - Supply Chain Vulnerabilities
Supply chain risk extends beyond model providers and cloud hosting. Libraries that simplify LLM orchestration have been a consistent source of vulnerabilities in AI applications. In the past few months, the popular LLM library Langchain has had 7 critical (and 5 high or moderate risk) vulnerabilities allowing arbitrary code execution. Similar libraries like Llama Index have also been vulnerable to this type of exploit. Because the ecosystem moves so quickly and is largely driven by startups, many LLM libraries are maintained by only a handful of people, and vulnerabilities can stay in production for weeks.
AI applications need to maintain a thoroughly vetted supply chain to ensure dependencies don’t allow arbitrary code execution or data leaks.
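One lightweight layer of that vetting can be automated: checking that installed dependencies match the exact versions that passed review. The sketch below is a minimal Python example; the package names and version pins are placeholders rather than recommendations, and it complements (rather than replaces) lockfiles and vulnerability scanning in CI.

```python
from importlib import metadata

# Hypothetical allowlist of vetted packages and the exact versions that
# passed security review; in practice this would live in a lockfile and
# be enforced in CI alongside a vulnerability scanner.
VETTED_VERSIONS = {
    "langchain": "0.0.350",    # placeholder version, not a recommendation
    "llama-index": "0.9.15",   # placeholder version, not a recommendation
}


def check_dependencies() -> None:
    """Fail fast if an installed dependency drifts from its vetted version."""
    for package, pinned in VETTED_VERSIONS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            continue  # package not installed in this environment
        if installed != pinned:
            raise RuntimeError(
                f"{package} {installed} is installed, but {pinned} was the vetted version"
            )
```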
LLM06 - Sensitive Information Disclosure
LLM applications have a broad surface area for sensitive information disclosure risk. LLM training data must be carefully selected to avoid including sensitive data, because that data can be easily extracted by a malicious prompt during inference. Applications that process data using LLMs must be careful that malicious prompts don’t allow information to be leaked to a user or to a memory store.
Like traditional applications, LLM applications must take measures to ensure user data is isolated and access controls are strictly enforced. This includes enforcing isolation on any persistent LLM memory as well as other user or enterprise data, because shared memory can lead to persistence intrusion attacks (described under LLM01 above), which can leak data between users. Audit logging around data access by LLM agents and users is a must in order to detect and mitigate potential data breaches.
In addition to strict access controls for both LLM agents and users, applications should utilize data redaction where appropriate to prevent sensitive data from being exposed to models altogether. Applications should also validate and moderate outputs to ensure no data is leaked.
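Here is a minimal redaction sketch in Python, assuming simple regex patterns for emails, US Social Security numbers, and phone numbers; a real system would use a dedicated PII-detection library or service, and these patterns are illustrative only.

```python
import re

# Illustrative redaction patterns; a real system would use a dedicated
# PII-detection service or library rather than a handful of regexes.
REDACTIONS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace likely PII with placeholder tokens before sending text to a model."""
    for label, pattern in REDACTIONS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

# Example: pass redact(user_document) to the LLM instead of the raw text.
```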
LLM07 - Insecure Plugin Design
The interface layer between LLM applications and other applications is another potential attack surface. Plugin systems like ChatGPT Plugins have been shown to be vulnerable to “Confused Deputy” style attacks, similar to Cross-Site Request Forgery (CSRF) attacks in web applications. During these attacks, the LLM is guided, via indirect prompt injection or another avenue, into orchestrating an attack. For example, a plugin may access a webpage that contains instructions for the model to post user data to another endpoint via another plugin call, thereby leaking user data to the attacker.
To prevent these attacks, applications need to tightly control LLM plugins and extensions, especially third-party extensions. Applications should enforce strict data access controls for plugins to avoid data leaks, and implement strict input and output validation to avoid unexpected behavior. LLM applications should assume a zero-trust posture for all types of extensions and plugins. Organizations should never use LLM extensions or plugins without performing a security review (reach out for help with these reviews).
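A minimal sketch of such a gate in Python follows; the plugin names, scopes, and handlers are hypothetical. The point is that every model-initiated call passes through an allowlist and a per-user permission check before anything executes.

```python
# Hypothetical registry of vetted plugins, the scopes they may use, and
# their handlers. Names and scopes here are purely illustrative.
PLUGIN_HANDLERS = {
    ("calendar", "read:events"): lambda **kw: f"events for {kw.get('date')}",
    ("search", "read:web"): lambda **kw: f"results for {kw.get('query')}",
}


def call_plugin(plugin: str, scope: str, user_scopes: set[str], **kwargs):
    """Gate every model-initiated plugin call against an allowlist and user permissions."""
    handler = PLUGIN_HANDLERS.get((plugin, scope))
    if handler is None:
        raise PermissionError(f"{plugin!r} with scope {scope!r} is not on the allowlist")
    if scope not in user_scopes:
        raise PermissionError("the requesting user lacks permission for this action")
    # Arguments supplied by the model should be validated as strictly as
    # any untrusted user input before the handler runs.
    return handler(**kwargs)

# Example: call_plugin("search", "read:web", {"read:web"}, query="llm security")
```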
LLM08 - Excessive Agency
Many AI applications have the potential to give LLMs excessive agency. LLM agents are often given tools or plugins to execute various tasks on behalf of a user. To avoid excessive agency, the tools, plugins, and permissions available to these agents must follow the principle of least privilege.
Sometimes LLM-generated code is executed on behalf of the user. To avoid excessive agency in code execution, generated code must be strictly validated and run in a secure sandbox with appropriately scoped permissions and network controls, to prevent a compromised model from escalating access or otherwise misusing the tools available to it. Avoid using applications that execute LLM agents on potentially insecure infrastructure, for example any machine (serverless function, K8s pod, etc.) that can connect to both internal data and the internet.
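Below is a minimal sketch of the innermost isolation layer in Python, assuming the generated code is itself Python; the function name and limits are illustrative, and a real deployment would wrap this process in a container or VM with no network access and strict resource limits.

```python
import subprocess
import sys
import tempfile


def run_untrusted_code(code: str, timeout_s: int = 5) -> str:
    """Run LLM-generated Python in a separate process with a hard timeout.

    This is only the innermost layer of defense: the process should also run
    inside a container or VM with no network access, a read-only filesystem,
    and CPU/memory limits enforced by the platform.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name

    result = subprocess.run(
        [sys.executable, "-I", path],  # -I: isolated mode, ignores env vars and user site-packages
        capture_output=True,
        text=True,
        timeout=timeout_s,             # kill runaway executions
        env={},                        # do not inherit secrets from the parent environment
    )
    if result.returncode != 0:
        raise RuntimeError(f"sandboxed code failed: {result.stderr[:200]}")
    return result.stdout
```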
It’s also important to incorporate human-in-the-loop design to validate that LLM agents are performing in a way that is expected by the user.
LLM09 - Over-Reliance
AI is a useful tool that’s all too easy for users to over-rely on. The more a user trusts an AI tool, the easier it is to miss an incorrect output or action. To combat this, applications should incorporate human-in-the-loop design to ensure human input is present at every critical step of a process.
Further, all AI processing, such as chain-of-thought reasoning or code generation, should be exposed to the user for potential audit. This way a user can validate the AI’s input and ensure proper application or workflow execution.
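A minimal human-in-the-loop sketch in Python; the function and the example action are hypothetical. The essential idea is that the model’s proposed action and its supporting output are shown to the user, who must explicitly approve the step before anything runs.

```python
def confirm_with_user(action_description: str, details: str) -> bool:
    """Show the proposed action and its supporting output, and require explicit approval."""
    print(f"Proposed action: {action_description}")
    print(f"Model reasoning / generated content:\n{details}")
    answer = input("Approve this step? [y/N] ").strip().lower()
    return answer == "y"

# Example (send_email and draft_email are hypothetical):
# if confirm_with_user("Send summary email to team", draft_email):
#     send_email(draft_email)
```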
LLM10 - Model Theft
Model theft is a risk for model providers or for applications using fine-tuned models. Access to model weights should be strictly protected as sensitive data. For some use cases, it may make sense to protect raw LLM outputs to prevent attackers from using outputs to train competing models.