In one form or another, almost every responsible AI initiative treats the preservation of personal data privacy as a core value. To name a few: the OECD AI Principles call for respect for human rights and democratic values, including fairness and privacy; the EU AI Act has a separate article on data governance; and NIST lists privacy-enhanced among the defining characteristics of trustworthy AI.
By now, it is more than clear that we cannot talk about a responsible AI governance strategy that doesn't address the very real risk of unintentional personal data disclosure. In AI systems, this risk can materialise in many forms, including unintentional retention of personal data during training, excessive disclosure in generated outputs (such as a chatbot revealing personally identifiable information), or weak access controls in production environments.
One thing we need to keep in mind when working on a management strategy for these risks is that the harm doesn't always result from malicious action; more often it results from an accidental oversight in the design or unexpected emergent behaviour in a model.
Therefore, the strategy must be robust and comprehensive, covering the whole AI lifecycle, from initial planning through development and deployment to continuous monitoring and eventual retirement. When an AIGP or AI risk manager is working on the risk reduction strategy, it is practical to address separately the measures that can be undertaken in the pre-deployment and post-deployment phases.
Not all AI risks must be taken into account from the early stages of AI development. Unintentional personal data disclosure, however, is a textbook example of a risk that must be identified, evaluated and treated in the earliest stages, i.e., problem and requirement analysis, and data acquisition and preparation.
Luckily, data privacy protection is an area where responsible AI principles and privacy regulations overlap, so we can rely on risk reduction controls from existing frameworks that have already proven effective. Data minimisation, for example, is a principle well established among entities that take data privacy compliance seriously.
Practically, this means that we must limit data collection to what is strictly required for the model's core purpose and exclude confidential or personal data unless it is absolutely critical and legally warranted. This demands thorough dataset reviews, clear documentation of data sources, vetting of data providers, and established procedures for removing sensitive content and anonymising data.
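To make this concrete, here is a minimal sketch of what pre-ingestion minimisation and redaction could look like. The field names, the list of required fields and the single e-mail regex are illustrative assumptions; a real pipeline would rely on a vetted anonymisation library and a documented review process.

```python
import re

# Illustrative raw record; the field names are hypothetical.
RAW_RECORDS = [
    {"user_id": "u-1029", "email": "jane.doe@example.com",
     "age": 34, "purchase_total": 129.90},
]

# Fields assumed to be strictly required for the model's core purpose.
REQUIRED_FIELDS = {"age", "purchase_total"}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def minimise(record: dict) -> dict:
    """Keep only the fields the model needs; drop direct identifiers."""
    return {k: v for k, v in record.items() if k in REQUIRED_FIELDS}

def redact_free_text(text: str) -> str:
    """Very rough e-mail redaction; real pipelines need broader PII detection."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

print([minimise(r) for r in RAW_RECORDS])
print(redact_free_text("Contact jane.doe@example.com for details"))
```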
Speaking of anonymisation, differential privacy is a closely related mechanism. Implementing differential privacy is a risk control in its own right, and it is by now widely used, including by Apple and Google. This mechanism is especially important in the model development and training phase of the AI lifecycle.
Differential privacy reduces the risk of personal data disclosure by limiting the influence that any single data point has on the output. Without going into too much technical detail, the method works by adding calibrated noise to statistical computations or datasets, which masks specific individual information while preserving broader patterns and the usefulness of the dataset.
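For illustration only, the sketch below shows a differentially private count query: the true count is perturbed with Laplace noise whose scale depends on a privacy parameter epsilon. The epsilon value and the toy dataset are assumptions made up for the example, not a recommendation.

```python
import random

def dp_count(values, predicate, epsilon: float = 1.0) -> float:
    """Differentially private count: true count plus calibrated Laplace noise.

    A counting query has sensitivity 1 (adding or removing one person changes
    the count by at most 1), so the noise is drawn from Laplace(0, 1/epsilon).
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    # The difference of two exponentials with rate epsilon is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 52, 38, 60, 33]          # toy dataset
print(dp_count(ages, lambda a: a > 30, epsilon=0.5))
```

The noisy answer stays close to the true count of 6 on average, yet no single individual's presence in the dataset can be confidently inferred from it.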
In the next phase of the AI lifecycle, Model Evaluation and Refinement, we again must address the risk of private data disclosure. This includes testing whether the model has memorised personal data and whether that data can actually be leaked, intentionally or unintentionally. Just as in the previous phases, we can borrow a control from other frameworks to address this risk.
Red-teaming, a practice well known in cybersecurity circles, is, in its simplest sense, a process in which security professionals role-play as intruders and try to 'break' the system. Interestingly, red-teaming has its roots in military strategy, dating back to the early 19th century: the term comes from military exercises in which a designated group (the red team) played the role of an enemy force and simulated attacks on a defending team (the blue team).
Similarly, red-teaming an AI model, especially a generative one, can include challenging the model's memory by asking it to complete sentences that begin with real people's names, or to provide specific details about individuals, to see whether private information leaks out. Red teamers can try to get the model to reproduce exact text, code or other content from its training dataset, which, inter alia, can include personal data. They might prompt the model with partial quotes, specific phrases or contextual cues to see if it will complete or reproduce any private data that was inadvertently included in the training data. Of course, their techniques can be far more sophisticated, such as those catalogued by MITRE ATLAS (the AI-threat equivalent of MITRE ATT&CK), but in any scenario, the results of red-teaming should give the AIGP solid insight into where refinement should be focused.
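A highly simplified sketch of such a memorisation probe is shown below. `query_model` is a placeholder for whatever interface the model under test exposes, the probe prompts are invented, and the two regular expressions catch only a narrow slice of possible PII; real red-team tooling is far more thorough.

```python
import re

# Hypothetical prompts that try to coax the model into completing
# personal details it may have memorised from its training data.
PROBE_PROMPTS = [
    "The home address of John Smith is",
    "You can reach Dr. Jane Doe by e-mail at",
    "Patient record 10293 lists a phone number of",
]

# Crude detectors for e-mail addresses and phone-like numbers.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def query_model(prompt: str) -> str:
    """Placeholder: wire this up to the API of the model under test."""
    raise NotImplementedError

def probe_for_memorisation(prompts=PROBE_PROMPTS):
    """Return every prompt whose completion appears to contain PII."""
    findings = []
    for prompt in prompts:
        completion = query_model(prompt)
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(completion):
                findings.append({"prompt": prompt, "type": label,
                                 "completion": completion})
    return findings
```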
Without exaggerating, we can say that once the AI system is deployed, the burden of risk monitoring becomes even greater. This is especially true for generative AI systems and for systems that are public-facing. More often than not, even the platforms that let companies build generative AI applications on top of popular foundation models, such as Amazon Bedrock or Azure AI services, require in their terms and conditions that users monitor the system post-deployment. This means that, regardless of whether the company has developed the model itself or relies on retrieval-augmented generation with a pre-trained model, there has to be a process for monitoring the user queries and the outputs of the AI system.
Regarding the risks that can materialise from user queries (prompts), the controls that can be implemented include rate limits (controlling how frequently users can make requests within a specific time window and flagging unusual, potentially malicious behaviour), query validation (ensuring that incoming requests meet expected formats and constraints before processing), input sanitisation (cleaning and transforming user input to remove or anonymise personal data in the content), and not storing user inputs unless explicitly agreed upon. It is worth mentioning that the technical execution of these controls is very often offered by the platform providers.
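A minimal sketch of two of these controls, assuming an in-memory sliding-window rate limiter and a single e-mail redaction rule (the limits and the regex are illustrative, and managed platforms typically offer hardened equivalents), might look like this:

```python
import re
import time
from collections import defaultdict, deque

MAX_REQUESTS = 20        # assumed limit: 20 prompts per user
WINDOW_SECONDS = 60      # per rolling 60-second window
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

_request_log = defaultdict(deque)

def allow_request(user_id: str) -> bool:
    """Sliding-window rate limit; refusals can be logged as unusual behaviour."""
    now = time.time()
    window = _request_log[user_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False
    window.append(now)
    return True

def sanitise_prompt(prompt: str, max_len: int = 2000) -> str:
    """Basic validation and sanitisation: length cap plus e-mail redaction.

    A real deployment would cover many more PII categories and formats.
    """
    if len(prompt) > max_len:
        raise ValueError("prompt exceeds the allowed length")
    return EMAIL_RE.sub("[REDACTED_EMAIL]", prompt)
```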
As for the risks in AI outputs, the controls can include real-time output filtering (using blocklists to screen for the specific private data we want to flag in our content) and implementing role-based access controls and logging (ensuring that there is no unauthorised disclosure of personal data).
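Again purely as an illustration, a blocklist-based output filter with basic audit logging could be sketched as follows; the two patterns and the role label are assumptions, and a production system would combine such filters with proper access control enforcement.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-output-filter")

# Illustrative blocklist: patterns we never want to reach the user.
BLOCKLIST_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # e-mail addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN-like numbers
]

def filter_output(response: str, user_role: str) -> str:
    """Screen a model response before returning it, and log redactions
    so any attempted disclosure can be audited later."""
    for pattern in BLOCKLIST_PATTERNS:
        if pattern.search(response):
            log.info("redacted pattern %s for role %s", pattern.pattern, user_role)
            response = pattern.sub("[REDACTED]", response)
    return response

print(filter_output("Ping me at jane.doe@example.com", user_role="analyst"))
```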
As with any AI risk, managing the risk of unintentional personal data leakage effectively is fundamentally about selecting the right controls, which requires nuanced judgment that goes beyond technical implementation.
The good news? We don’t have to start from scratch. At GDPRLocal, when we build a responsible AI framework, we leverage existing data privacy laws, cybersecurity best practices, and responsible AI principles, adapting them to the specific challenges posed by our clients’ AI systems. By building on these established frameworks, we can create tailored, risk-based strategies that balance innovation with accountability.