Skip to content

Latest commit

 

History

History

genai-security

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 

Guidance: Security for Generative AI (GenAI) Applications

GenAI Security

Introduction

As LLMs become more easily available and integrated into our work and personal lives, the promise of the technology is tempered by the potential for it to be misused. And the potential for misuse becomes even more significant when you realize LLMs can be combined with other powerful software components and agents to orchestrate a pipeline of actions. OR combined with proprietary and personal data to introduce new avenues for data disclosure and leakage.

The intention for this page is not to reiterate security guidance that is generally available for more traditional or cloud software applications but to focus on guidance specific to GenAI applications and the unique characteristics and challenges of LLMs.

Threats & Risks

The security threats and risks with traditional software applications are familiar and understood. GenAI and LLMs introduce new and unique security risks including:

  • AI responses are based on statistical probabilities or the best chance for correct output. LLMs generate convincing human-like responses by predicting what words come next in a phrase. While they can be great at helping with tasks like summarizing a document or explaining complicated concepts or boosting creativity, there can be issues like responses being inaccurate, incomplete, inappropriate, or completely fabricated. You may be familiar with one well known example where ChatGPT provided non-existent legal citations that lawyers presented in court: Here's what happens when your lawyer uses ChatGPT.
  • GenAI is by design a non-deterministic technology which means that given identical inputs, responses and output may differ.
  • GenAI applications can be extended with agents, plugins, and even external APIs that can significantly expand the attack surface for a GenAI application. For instance, an LLM may implicitly trust a plugin or 3rd party component that is malicious.
  • Another challenge with GenAI is that it currently it is not possible to enforce an isolation boundary between the data and the control planes. This means that LLMs are not always able to differentiate between data being submitted as content or an adversarial instruction submitted as content. Think about a SQL databases: instructions are supplied through query language and validated with a parser before data is queried, manipulated, or provided as output. With a SQL injection attack, a malicious instruction can piggyback on an ambiguously phrased language construct but it can be mitigated with a parameterized query. GenAI/LLMs do not have that boundary between syntax (control plane) and data so other mechanisms are needed.

The diagram below is from OWASP Top 10 for Large Language Model Applications and depicts the potential security risks for a hypothetical LLM app:

GenAI Security

Security Strategies

Infrastructure plays an indispensable role in helping create a secure landscape for GenAI applications, particularly cloud environments. Below are strategies tht can help ensure the security of a GenAI environment:

Adversarial Prompting

Attacks

An adversarial prompt attack is when a prompt is used to manipulate an LLM in order to generate a malicious or unintended response. A sneaky user can tamper with words or sentence structure to exploit nuances or sentiment in language models. You may be familiar with some types of prompt attacks:

  • Prompt injection: prompt input, output, or instructions are manipulated to lead to unintended behavior.
  • Prompt leaking: is intended to cause the model to leak confidential or proprietary information.
  • Jailbreaking: is a technique to bypass model safety mechanisms to generate illegal or unethical content.
  • DAN: is an acronym for Do Anything Now and is another technique intended to circumvent model safety guardrails and force it to comply with requests that generate unfiltered responses.
  • Multi-prompt: a series of prompts are used to extract private or sensitive information.
  • Multi-language: although LLMs are trained in multiple languages, performance is superior for English. This technique involves submitting a request in languages other than English to cause the model to overlook or bypass security checks.
  • Obfuscation (token smuggling): a technique to present data in an unexpected format to avoid detection.

Note: Details about these and other adversarial techniques can be found here:

Mitigations

As a mitigation strategy for adversarial prompt attacks, consider advanced prompt engineering techniques. There is a growing list of specific techniques that can be used that include enriching prompts with specific instructions, formatting, and providing examples of the kind of output content that is intended. Below are some techniques to consider:

  • Defensive Instructions: Guide model response with explicit instructions. Structure the system message with context and instructions. See also: Add defense in the instruction.

  • Determine intent: Use techniques like few-shot learning to provide content to the model and help set intent.

  • Monitor for degradation in output quality: a decline in output quality can be an indication that a prompt has been tampered with. Monitor the model for output quality using metrics to measure and evaluate the prompt or human-in-the-loop to evaluate feedback or adversarial test cases to confirm prompt resilience. Azure Machine Learning prompt flow has built-in evaluation flows that enable users to assess the quality and effectiveness of prompts.

  • Use other models or dedicated services to process requests. Azure AI Content Safety is an azure service that provides content filtering. AI models are used to detect and classify categories of harm from AI-generated content. Content filters are more contextually aware than blocklists and can provide broad coverage without the manual creation of rules or lists.

  • Use inbound/outbound block/allow lists or filters or rules. When there is a need to screen for items specific to a use case, blocklists can be helpful and can be implemented as part of the AI Content Safety service. See: Use a blocklist in Azure OpenAI.

  • Use the native power of models to steer zero- or few-shot prompting strategies. See promptbase for a growing collection of resources, best practices, and sample scripts.

See Exploring Adversarial Prompting and Mitigations in AI-Infused Applications for more specifics on these types of attacks and defense tactics.

Resources & References