BLOG

Top AI and Data Privacy Concerns

F5 Newsroom Staff
Published July 16, 2025

Artificial intelligence has arrived: 96% of organizations surveyed for the F5 2025 State of Application Strategy Report are currently deploying AI models.

While AI promises to help organizations work smarter, faster, and more efficiently, it also brings concerns and risks. AI systems, particularly those based on machine learning and large language models (LLMs), are powered by massive volumes of data, which are used to train and tune AI models and fuel AI engines. This data can include sensitive information such as personal identifiers, behavioral patterns, location data, and financial and health records. As AI becomes increasingly embedded in everyday applications, the risk of exposing or misusing personal data also rises: AI data privacy is now a major concern.

This blog post examines the concept of AI privacy and explores how AI creates data privacy risks and concerns. The post also looks at AI privacy laws and discusses how to protect data privacy in AI applications.

What is AI privacy?

AI privacy is the set of practices that protect the data collected, stored, and processed by AI systems. It is closely related to data privacy, the principle that a person should have control over their personal data, but it is a distinct concept that differs in key ways.

AI systems leverage vast amounts of data. In fact, the more data these systems consume, the more accurate and capable they tend to become. For instance, GPT-4 is estimated to have approximately 1.8 trillion parameters, a scale made possible only by enormous training datasets, and the sheer volume of data being collected introduces privacy concerns. Because these systems are trained on such large datasets—often scraped from the Internet or other large data repositories—it is difficult to ensure that private or personal data wasn't included, or that consent was ever given to use it.

In addition, AI pipelines—from data collection through to application delivery—are largely automated, making it difficult to identify data privacy issues unless privacy guardrails are designed into the system from the start. Developers must anticipate potential issues in advance, as an oversight can have significant data privacy implications that are challenging to address later on. If personal data was used in a training dataset and the individual requests to have their data deleted, what is the implication for the AI model?

AI, by its nature, also recognizes patterns extremely well, so an AI system can combine seemingly unconnected data points to make highly accurate inferences about a person's private information. AI doesn't just memorize; it learns correlations, increasing the risk that a model can infer someone's identity from a combination of traits or reconstruct sensitive information from fragments of data.

These issues raise serious ethical and regulatory concerns, even when the data in an AI system is anonymized.

AI data privacy issues

The public is both deeply concerned about privacy and largely uninformed about how to protect it. According to Pew Research, 70% of Americans don't trust companies to use AI responsibly, and 81% assume organizations will use their personal information in ways that would make them uncomfortable. The same survey found that 78% of respondents trust themselves to make the right decisions to protect their personal information, yet 56% always, almost always, or often agree to online privacy policies without reading them first.

Public attitudes toward AI's use of personal data vary significantly depending on the context. According to the same Pew Research report, only 28% of respondents are comfortable with AI being used to determine eligibility for public assistance, whereas 42% are unconcerned with a smart speaker analyzing voices to recognize individual users.

Organizations need to weigh regulatory requirements around AI and data privacy alongside the public's sentiment and comfort with how their personal data is used.

How AI creates privacy risks

AI systems are vulnerable to data privacy risks across the entire AI lifecycle, and these risks must be understood and addressed at every phase of development and deployment to ensure ethical and secure use of AI data.

  • Data ingestion. AI models require massive datasets for training, and the data collection stage often introduces the highest risk to data privacy, especially when sensitive data, such as healthcare information, personal finance data, and biometrics, is included. Personal data may have been gathered without proper consent or in ways not fully transparent to individuals.
  • AI model training. During the training phase, an AI model learns patterns from the ingested data. Even if personal data is collected with consent, AI models may use that data for purposes beyond the original intent. The training stage can create trust and transparency concerns, especially if individuals feel misled or uninformed about how their data is being used by AI systems. For instance, individuals may be comfortable with organizations accessing their data for account management but would not have consented to sharing their information if they knew it would be used to train an AI system.
  • Inference engine. The inference stage is when the trained AI model is used to generate insights or predictions from new data. Privacy risks emerge here because AI systems can make highly accurate inferences about individuals based on seemingly harmless, anonymized inputs, or they may reveal or amplify biases that existed in the training data. This situation is similar to Open-Source Intelligence (OSINT), the practice of gathering intelligence from publicly available sources. OSINT plays a vital role in cybersecurity, threat detection, investigations, and competitive research by turning open data into strategic insights. However, OSINT does not involve unauthorized access or hacking, only the collection of data that is legally and publicly available.
  • Application layer. AI applications are an attractive entry point for hackers, as they are the most visible and accessible part of an AI system. Attackers can exploit APIs or web services to access troves of sensitive data. AI applications can also inadvertently expose sensitive data to legitimate users through poor access controls, overexposed error messages, or overly verbose AI responses (see the sketch after this list).
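
To make the application-layer risks above concrete, the sketch below shows two basic safeguards for an AI inference endpoint: rejecting unauthenticated requests and returning generic error messages rather than leaking internal details such as stack traces or prompt contents. It is a minimal illustration only, assuming a Python service built with Flask; run_model and the API key set are hypothetical placeholders, not part of any specific product.

    # Minimal sketch of application-layer guardrails for an AI inference API.
    # Assumes Flask; run_model() and API_KEYS are hypothetical placeholders.
    import logging

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    API_KEYS = {"example-key-123"}  # in practice, store and rotate keys securely


    def run_model(prompt: str) -> str:
        """Placeholder for the real model call."""
        return f"(model output for {len(prompt)} input characters)"


    @app.route("/v1/generate", methods=["POST"])
    def generate():
        # Basic access control: reject requests without a valid API key.
        if request.headers.get("X-API-Key") not in API_KEYS:
            return jsonify({"error": "unauthorized"}), 401

        prompt = (request.get_json(silent=True) or {}).get("prompt", "")
        if not prompt:
            return jsonify({"error": "missing 'prompt' field"}), 400

        try:
            return jsonify({"output": run_model(prompt)})
        except Exception:
            # Log details server-side, but keep the client-facing error generic
            # so stack traces, prompts, and internal data are never exposed.
            logging.exception("inference failed")
            return jsonify({"error": "internal error"}), 500

The same pattern applies regardless of framework: authenticate every request, validate inputs, and make sure failure paths disclose nothing about the model, its prompts, or its data.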

Generative AI privacy risks

Generative AI systems, such as LLMs used for creating text, images, code, or audio, introduce particularly high levels of data privacy risk. Most AI models are trained on datasets scraped from the public Internet, often without explicit permission or informed consent from their sources or content creators. In addition, scraped data may contain personally identifiable information, which the generative AI system could reveal during inference.

Generative AI applications, especially public-facing writing assistants, chatbots, and image generators, are typically interactive and accessible over the web. This makes them vulnerable to prompt injections, where attackers craft inputs designed to manipulate the model's behavior, bypass controls, or trick the AI into producing restricted, offensive, or confidential content. In addition, users may paste personal or sensitive information into AI tools without realizing the content may be stored in the AI system and used for training or tuning future AI models, exposing the information to accidental data leakage.

Put together, these two factors create high-risk scenarios, in which an LLM trained on unconsented or sensitive content could be prompted to regenerate that content and reveal personal information, or users could unintentionally submit sensitive data into prompts, exposing it to unauthorized access or reuse.
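
One way to reduce the accidental-leakage half of this risk is to screen prompts for obvious personal data before they ever reach the model. The sketch below is a deliberately simple illustration that uses regular expressions to redact a few common patterns (email addresses, U.S. Social Security-style numbers, and payment-card-like numbers); the patterns and placeholder tokens are illustrative assumptions, and production systems would need far more robust detection.

    # Illustrative prompt screening: redact obvious PII before sending text to an LLM.
    # Patterns are intentionally simple; production systems need broader coverage.
    import re

    PII_PATTERNS = {
        "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    }


    def redact_pii(prompt: str) -> str:
        """Replace matched PII with a placeholder token before the prompt leaves the app."""
        for label, pattern in PII_PATTERNS.items():
            prompt = pattern.sub(f"[REDACTED-{label.upper()}]", prompt)
        return prompt


    if __name__ == "__main__":
        raw = "Summarize this: Jane's SSN is 123-45-6789, email jane@example.com."
        print(redact_pii(raw))
        # -> "Summarize this: Jane's SSN is [REDACTED-SSN], email [REDACTED-EMAIL]."

Output-side filtering is the mirror-image control: scanning model responses for the same kinds of patterns before they are returned to users or written to logs.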

AI privacy laws

As AI adoption accelerates, governments are creating or updating laws to address the data privacy risks associated with AI systems, especially those that use or store personal or sensitive data. Currently, 144 countries have enacted national data privacy laws, while other countries, such as the U.S., have a patchwork of local data privacy laws. Not all of these data privacy regulations are AI-specific, but in most cases AI systems must comply with them.

Examples of data privacy laws include the following.

  • European Union General Data Protection Regulation (GDPR)
    GDPR requires that personal data be collected and used fairly, transparently, and for clearly defined purposes established at the time of collection. Organizations must limit data collection to only what is necessary for the intended purpose, ensure that data is retained only as long as needed, and maintain strict measures to keep it secure and confidential. While GDPR does not explicitly reference artificial intelligence, any AI system that collects, processes, or stores personal data must fully comply with GDPR principles, ensuring responsible data handling throughout the AI lifecycle.
  • European Union Artificial Intelligence Act
    The EU AI Act is the world’s first comprehensive regulatory framework for artificial intelligence introduced by a major governing body. It prohibits AI applications deemed to pose “unacceptable risk,” such as government-run social scoring or the indiscriminate scraping of facial images from CCTV footage. In addition, it imposes strict legal obligations on “high-risk” AI systems, including tools used for CV screening in hiring processes. Developers of high-risk AI systems must implement robust data governance practices, ensuring that all training, validation, and testing datasets meet defined quality standards.
  • U.S. state regulations
    The California Consumer Privacy Act (CCPA) and the Texas Data Privacy and Security Act grant individuals greater control over their personal data. These laws establish key rights, including the right to know how organizations collect, use, and share their data, the right to request deletion of personal information, and the right to opt out of the sale or sharing of that data.

    The Utah Artificial Intelligence Policy Act specifically addresses high-risk generative AI applications. It requires that AI systems provide clear disclosures when users are interacting with generative AI, ensuring transparency and informed engagement in AI-driven experiences.

The law firm White & Case publishes AI Watch: Global regulatory tracker, a good resource for keeping up to date on AI privacy regulations.

AI privacy best practices

As AI systems grow in complexity and reach, protecting data privacy throughout the AI lifecycle is critical. The following best practices can help ensure compliance, safeguard user trust, and reduce risk.

  • Develop strong data governance policies. Comprehensive guidelines should include data classification, role-based access controls, data security strategies such as encryption and pseudonymization (a simple pseudonymization sketch follows this list), regular audits, and alignment with relevant data privacy regulations such as GDPR and CCPA.
  • Limit data collection. Collect only the data necessary for the intended function of the AI system. Minimizing data reduces exposure and is a requirement under many data privacy laws.
  • Seek explicit consent. Clearly inform users of how their data will be collected, used, and stored. Always obtain explicit, informed consent before collecting and using personal data. This transparency is not only a legal requirement in many jurisdictions but also strengthens user trust.
  • Verify the supply chain of training data. Ensure that third-party datasets used to train AI models are trustworthy and have followed strong data privacy practices.
  • Follow general application security best practices. Strengthen the foundation of your AI systems by implementing proven data security measures, including robust identity and access management, encryption of data in transit and at rest, and anonymization of data to protect personal information.
  • Conduct ongoing risk assessments. Risk management is never “one-and-done,” and that is especially true for AI, which is evolving rapidly and requires that privacy risks be reassessed regularly. Perform continuous evaluations to identify emerging threats, ensure compliance, and update governance policies as necessary.
  • Leverage software to enhance data protection. Use intelligent tools like the F5 Application Delivery and Security Platform to improve observability, strengthen compliance with privacy regulations, and monitor sensitive data flows through AI applications in real time.
  • Evaluate vendor and partner privacy policies. Ensure that all technology partners and service providers have strong data privacy and data security practices. AI data often flows through shared ecosystems, making third-party governance essential to minimizing risk; here are F5’s data privacy declarations.
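
As an illustration of the pseudonymization practice mentioned in the list above, the sketch below replaces direct identifiers with keyed tokens before records enter an AI pipeline, so analysis can proceed without exposing raw values. It is a simplified example using Python's standard hmac library; the secret key, field names, and token truncation are hypothetical choices, and a real deployment would manage keys in a dedicated key management system.

    # Simplified pseudonymization sketch: replace direct identifiers with keyed tokens.
    # The secret key is hypothetical; in practice it would live in a key management system.
    import hashlib
    import hmac

    SECRET_KEY = b"rotate-me-and-store-in-a-kms"
    IDENTIFIER_FIELDS = {"email", "phone", "customer_id"}


    def pseudonymize(record: dict) -> dict:
        """Return a copy of the record with identifier fields replaced by stable tokens."""
        out = {}
        for field, value in record.items():
            if field in IDENTIFIER_FIELDS:
                token = hmac.new(SECRET_KEY, str(value).encode(), hashlib.sha256).hexdigest()
                out[field] = token[:16]  # truncated for readability; keep the full digest in practice
            else:
                out[field] = value
        return out


    if __name__ == "__main__":
        record = {"customer_id": "C-1042", "email": "jane@example.com", "plan": "premium"}
        print(pseudonymize(record))

Because HMAC is keyed and deterministic, the same identifier always maps to the same token, so records can still be joined and analyzed without revealing the underlying values, and the mapping cannot be reversed without the key.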

Frameworks for AI risk management

A common feature across AI-specific regulations is the requirement to classify AI applications based on risk levels. This risk-based approach enables organizations to apply appropriate safeguards and oversight according to the AI system’s potential impact.

High-risk AI applications may include:

  • Safety-critical systems, in which AI functionality is used in critical infrastructure, transportation, or medical devices, where failures could result in harm to life or property.
  • Employment and education systems, in which AI tools are used in hiring, admissions, performance evaluations, or access to educational opportunities, where bias or errors can significantly impact individuals' rights and prospects.
  • Public safety and law enforcement, where systems used for facial recognition, predictive policing, surveillance, or risk scoring can raise civil liberties and discrimination concerns.
  • Public-facing generative AI, when applications that generate text, images, or videos are directly accessible to users, especially in situations where outputs can influence behavior, spread misinformation, or inadvertently disclose sensitive information.

The U.S. National Institute of Standards and Technology (NIST) published an Artificial Intelligence Risk Management Framework (AI RMF) in 2023 that offers general practical guidance across multiple industries and use cases. This framework may prove useful for developers of AI applications because, rather than broadly defining categories of high-risk applications, it provides guidance for determining risk and how to mitigate it.

The AI RMF Core is organized into four high-level functions that reflect a continuous and cyclical approach to AI risk management:

  1. Govern. Establish and oversee the organizational policies, processes, procedures, and practices to support responsible AI development, and ensure transparency and effective implementation.
  2. Map. Understand the context in which the AI system will operate, and evaluate the entire AI lifecycle to understand the risks and potential impacts at each stage of AI development.
  3. Measure. Assess, monitor, and quantify AI risks and impacts using appropriate tools, metrics, and evaluation techniques.
  4. Manage. Prioritize and respond to risks, applying mitigations and controls to reduce harm and optimize outcomes. Allocate resources to monitor AI systems and to respond to, recover from, and communicate about incidents.

Balancing AI innovation and data privacy

As AI innovation accelerates, it is critically important to strike a balance between advancing technology and upholding strong data privacy protections. Current privacy regulations recognize the importance of creating space for continued innovation while maintaining safeguards for data privacy and security.

General data privacy laws—such as GDPR and CCPA—provide a foundational framework for managing data privacy, even as new AI-specific rules emerge. Organizations should continuously evaluate the privacy implications of their AI systems, particularly as capabilities evolve or new use cases arise.

Make sure to assess your organization’s AI privacy considerations on an ongoing basis and routinely update data governance policies to ensure they remain aligned with evolving technologies, regulatory requirements, and cultural expectations.