AI Atlas: How Do We Know if LLMs are “Memorizing” Our Data?

AI breakthroughs, concepts, and techniques that are tangibly valuable, specific, and actionable. Written by Glasswing Founder and Managing Partner, Rudina Seseri

Last week, researchers at Carnegie Mellon published a paper on Large Language Models (LLMs) such as ChatGPT and Gemini, exploring how these models condense and retain information. Data usage is one of the largest open questions in AI – if an LLM is trained on the entire internet, how can you be sure it has not memorized personal, trademarked, or private information? This question is difficult to answer due to the “black box” nature of machine learning, which obscures the steps AI models take to reach a decision. In other words, there is a blurry line between “memorizing” the end of a sentence and simply being very good at guessing it.

This ambiguity is particularly relevant today, as OpenAI is facing copyright lawsuits from numerous publishers including the New York Times. Even beyond copyright, the question of data privacy is extremely important for enterprises considering adopting LLM technologies within core business operations. In today’s AI Atlas, I dive into the researchers’ proposed framework for quantifying AI memorization, the Adversarial Compression Ratio.

🗺️ What is the Adversarial Compression Ratio (ACR)?

The Adversarial Compression Ratio (ACR) is a novel benchmark introduced to quantify memorization in LLMs. In the field of AI, there is no agreed-upon definition of “memorization,” which makes it difficult to trace the impact of data misuse. The ACR mitigates this uncertainty by measuring the shortest prompt needed to trigger the model to repeat a given string. If that prompt is shorter than the string itself, the string is considered memorized.

For example, the paper considers the famous Shakespearean monologue beginning with “to be or not to be.” Inputting just the first two words triggers ChatGPT to recite the entire passage, as below:

Example: Hamlet’s entire monologue can be elicited from a two-word input in ChatGPT, suggesting that the model “memorized” the passage from its training data.

Under the ACR benchmark, Shakespeare’s passage would be considered “memorized” because the input is significantly shorter than the string being repeated. Conversely, if eliciting the output had required sophisticated prompt engineering, it would not be considered memorized; rather, the model could be said to be synthesizing information from context clues in the same way a human might make an educated guess. The ACR therefore has practical applications as a tool for identifying data leakage and ensuring compliance.

🤔 What is the significance of the paper’s proposal, and what are its limitations?

The ACR provides a more straightforward, practical, and legally pertinent method for assessing model data usage than previous benchmarks, which were concerned only with whether a data point could be found within the training data. In contrast, the ACR is concerned with whether information from the training data has persisted into the model’s active memory. In essence, while it may be acceptable to include a copyrighted article within training data, you do not want your model to be able to recite the entire work. In particular, the ACR provides the following advantages:

  • Legal and ethical relevance: The ACR aligns closely with legal and regulatory frameworks governing data usage and privacy, making it a useful tool for addressing such concerns.
  • Practical tool for model evaluation: The ACR takes a practical and intuitive approach to understanding memorization in LLMs, which could be easily used by practitioners, researchers, and regulators to evaluate model behavior.
  • Robustness against deception: The researchers demonstrate that the ACR remains robust even when model owners attempt to deceive the system using techniques like in-context unlearning. This highlights its resilience against attempts to circumvent compliance barriers.

While the paper’s findings offer valuable insights and a practical benchmark, this is just one proposed solution for a complex open question. AI represents an entirely new paradigm for technology, and continued research and refinement of metrics like the ACR will be essential as adoption accelerates and our understanding of LLMs improves. At present, limitations of the ACR include:

  • Model specificity: The effectiveness of the ACR varies depending on the specific architecture and training data of the LLM being evaluated, and nuances across models are lost.
  • Reliance on prompt engineering: Finding the shortest prompt that elicits a target string involves complex optimization techniques, which are time-consuming and may introduce biases.
  • Limited scope: While the ACR proposes a quick and valuable metric for assessing memorization, it will not capture all aspects of model performance in real-world scenarios.
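The prompt-minimization limitation above is worth making concrete. The toy sketch below is entirely hypothetical: it uses a dictionary as a stand-in “model” and a brute-force search, whereas the paper relies on optimization techniques. It simply shows why naive minimization is intractable, since the candidate space grows exponentially with prompt length:

```python
from itertools import product

# Hypothetical stand-in for an LLM: maps a prompt to a deterministic completion.
def toy_model(prompt: tuple[str, ...]) -> str:
    completions = {("to", "be"): "to be or not to be that is the question"}
    return completions.get(prompt, "")

def shortest_eliciting_prompt(target: str, vocab: list[str], max_len: int = 3):
    # Exhaustive search tries len(vocab) ** k candidates at prompt length k,
    # which is why real systems need cleverer optimization than brute force.
    for k in range(1, max_len + 1):
        for candidate in product(vocab, repeat=k):
            if toy_model(candidate) == target:
                return candidate
    return None

print(shortest_eliciting_prompt(
    "to be or not to be that is the question",
    ["to", "be", "or", "not"],
))  # ('to', 'be')
```

Even this four-word vocabulary yields 84 candidates at a maximum length of three; a real tokenizer vocabulary of tens of thousands of entries makes the exhaustive approach hopeless, which is the source of both the runtime cost and the potential bias the bullet above describes.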

🛠️ Applying these learnings practically

Creating a benchmark for LLM memorization enables informed decision-making and ultimately streamlines AI adoption for large enterprises. A tool such as the ACR would find substantial utility in areas such as:

  • Copyright: By measuring the link between a model’s training data and its output, the ACR helps determine whether a model is violating copyright law or user privacy agreements.
  • Model development and improvement: Model developers can use ACR results to refine training strategies, adjust model architectures, or implement proactive risk mitigation mechanisms.
  • Regulatory compliance: ACR could help identify instances where LLMs may be retaining sensitive or copyrighted information beyond permissible limits, or it could assist a model provider such as OpenAI to collect evidence that it is remaining compliant.

Stay up-to-date on the latest AI news by subscribing to Rudina’s AI Atlas.