Privacy PowerUp Series #2
Two questions are commonly asked about an organization’s website data collection practices:
- Why do you need to collect that data?
- What are you going to use it for?
Often, the answers sound like this: “We aren’t using that data for anything” or “We don’t know yet; we might use it at a later date.” This is especially true when talking to startups.
Privacy professionals deal with this challenge daily and are often surprised to find their company collecting data that has little to do with what the company actually does.
Companies tend to think that more data is better and that all the data they collect is necessary, especially now that generative AI models are more readily available. However, that is not the case.
This article explores the principle of data minimization: the challenges of data collection, why less data is actually more, and some tips for determining whether collecting data is necessary and how long to keep it.
What is data minimization?
Data minimization means limiting the data collected to what is necessary and relevant to a stated purpose or use, and keeping it only as long as it is needed for that purpose. Although it is a core privacy principle required by privacy laws such as the GDPR, it received relatively little attention until recently.
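To make the definition concrete, here is a minimal Python sketch. It assumes a hypothetical purpose register (`PURPOSES`) that maps each stated purpose to the fields it needs and a retention period. The hypothetical `minimize` helper drops any submitted field the purpose does not need and records when the data should be deleted. The purposes, fields, and periods are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical purpose register: each stated purpose lists only the fields it
# needs and how long data collected for it may be kept. The purposes, fields,
# and retention periods are illustrative placeholders, not recommendations.
PURPOSES = {
    "order_fulfillment": {
        "fields": {"name", "shipping_address", "email"},
        "retention": timedelta(days=365),
    },
    "newsletter": {
        "fields": {"email"},
        "retention": timedelta(days=730),
    },
}


def minimize(submission: dict, purpose: str) -> dict:
    """Keep only the fields the stated purpose needs and stamp a delete-by date."""
    spec = PURPOSES[purpose]  # raises KeyError if no purpose was documented
    kept = {key: value for key, value in submission.items() if key in spec["fields"]}
    kept["_purpose"] = purpose
    kept["_delete_after"] = datetime.now(timezone.utc) + spec["retention"]
    return kept


# A sign-up form that asks for more than the newsletter needs.
form = {"email": "ana@example.com", "birth_date": "1990-01-01", "phone": "555-0100"}
record = minimize(form, "newsletter")
print(sorted(record))  # ['_delete_after', '_purpose', 'email']
```

Failing loudly on an undocumented purpose is a deliberate choice. Under this design, "we might use it later" cannot be expressed in code until someone writes the purpose down.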
What has brought data minimization to the forefront? The answer is simple – AI.
Companies need vast amounts of data to train their AI models, especially large language models (LLMs). This need for massive amounts of data sets off alarm bells for regulators and privacy professionals. The risks associated with collecting data for the purpose of training AI models are high and can harm individuals through biases inherent in the training data.
Because most privacy laws contain data minimization principles, regulators are questioning whether companies really need all that data, especially when the business is using it to train an AI model.
In turn, companies argue that they need that data to train their AI models. This leaves privacy professionals with the challenge of helping the business determine which data is truly needed.
The risks and challenges of collecting and keeping data
There are inherent risks and challenges in collecting and keeping data:
- Data accuracy and quality: Ensuring the data collected is accurate is challenging at scale. “Garbage in, garbage out.” The accuracy, quality, relevance, and timeliness of data degrade if it is kept for long periods.
- Data volume: Storing large amounts of data can get expensive, and time and resources will need to be spent finding scalable and cost-effective storage solutions.
- Privacy infringement: If the data is not handled properly, it could infringe upon individuals’ privacy rights, especially if it is collected from third-party sources rather than directly from the individuals.
- Security: Storing large amounts of data increases the impact of a breach and its associated costs, including any fines that result from it.
- Data integration: When data comes from multiple sources, integrating it across systems to get a complete picture of business operations is challenging.
These challenges and risks apply to all types of data collection, not just data collected for training AI models. Understanding these challenges will help you and your business make necessary decisions about what data is essential to meet business needs and determine how long you need to keep it.
Hanging on to old, dusty data only increases your data protection risk.
Why less is more in data collection
In terms of data collection, less is actually more. Here are some benefits of collecting only the data necessary for the specified processing purposes:
- Reducing the noise: Collecting less data enables companies to focus on the information that is most relevant to achieving business goals.
- Faster and more reliable decision-making: Less data can reduce the number of errors and inconsistencies, enabling faster access to information for more efficient decision-making.
- Cost reduction: Storage, maintenance, and human resource costs are lower when maintaining less data.
- Security: Having less data reduces breach risk by limiting the number of records that could be affected in the event of a breach.
- Digital carbon footprint reduction: Storing data has an environmental impact. Reducing the amount of data stored saves energy, because less data has to be stored and processed.
Data minimization reduces costs and risks, and it enables the business to achieve its goals faster and more effectively by making more reliable decisions.
Five practical tips for implementing data minimization
Now that we understand the risks and benefits, let’s explore some practical tips for determining if the collection of data is necessary and how long it should be kept:
1. Review your acceptable data use policies
Ensure collection and storage of data is limited to what is necessary and relevant to a stated purpose, and data is only kept as long as it’s legally required or needed for those purposes. Ensure there are requirements to delete data when it is no longer needed.
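A retention rule only helps if something enforces it. The Python sketch below assumes a hypothetical `RETENTION` schedule keyed by purpose. It splits records into those to keep and those due for deletion. Records with no documented purpose are treated as deletion candidates rather than kept indefinitely. The periods shown are placeholders, not legal guidance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    record_id: str
    purpose: str
    collected_at: datetime


# Hypothetical retention schedule per stated purpose (placeholder values).
RETENTION = {
    "order_fulfillment": timedelta(days=365),
    "support_ticket": timedelta(days=180),
}


def split_expired(records, now):
    """Return (keep, delete). Records past their retention period, or with no
    documented purpose, go to the delete list for review and removal."""
    keep, delete = [], []
    for rec in records:
        limit = RETENTION.get(rec.purpose)
        if limit is None or now - rec.collected_at > limit:
            delete.append(rec)
        else:
            keep.append(rec)
    return keep, delete


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    Record("r1", "order_fulfillment", datetime(2024, 1, 10, tzinfo=timezone.utc)),
    Record("r2", "support_ticket", datetime(2023, 9, 1, tzinfo=timezone.utc)),
    Record("r3", "might_use_later", datetime(2024, 5, 1, tzinfo=timezone.utc)),
]
keep, delete = split_expired(records, now)
print([r.record_id for r in keep])    # ['r1']
print([r.record_id for r in delete])  # ['r2', 'r3']
```

Passing `now` in explicitly keeps the function easy to test. It also lets a scheduled deletion job evaluate every record against the same timestamp.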
2. Embed privacy by design
Integrate privacy by design into product and service development processes, reviewing what the product or service is designed to do, and what data is truly necessary to deliver it effectively. This will also create awareness of data minimization principles among employees involved in the design and development process.
3. Leverage data minimization techniques and technologies
Use techniques like anonymization and pseudonymization.
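Anonymization removes or transforms identifying information in data sets so that individuals can no longer be identified, allowing the anonymized data to be used for analysis and product improvement.
Pseudonymization replaces personal identifiers with a unique code (a pseudonym) and keeps the information needed to link that code back to the individual separately. This reduces security and privacy risks while allowing for re-identification if necessary.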
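To illustrate pseudonymization, here is a minimal Python sketch using a token vault. This is one common approach among several; keyed hashing is another. The `PseudonymVault` class and its method names are hypothetical. Because the vault keeps a lookup table, anyone who holds that table can still re-identify the data, so the result is pseudonymized, not anonymized.

```python
import secrets


class PseudonymVault:
    """Hypothetical token vault that swaps direct identifiers for random tokens.

    In practice the token-to-identifier table would be stored separately from
    the pseudonymized data set, under stricter access controls.
    """

    def __init__(self):
        self._token_for = {}  # identifier -> token
        self._value_for = {}  # token -> identifier (the re-identification table)

    def pseudonymize(self, identifier: str) -> str:
        # Reuse the existing token so the same person always maps to the same pseudonym.
        if identifier not in self._token_for:
            token = "p_" + secrets.token_hex(8)
            while token in self._value_for:  # guard against an unlikely collision
                token = "p_" + secrets.token_hex(8)
            self._token_for[identifier] = token
            self._value_for[token] = identifier
        return self._token_for[identifier]

    def reidentify(self, token: str) -> str:
        return self._value_for[token]


vault = PseudonymVault()
rows = [
    {"email": "ana@example.com", "plan": "pro"},
    {"email": "ana@example.com", "plan": "basic"},
    {"email": "li@example.com", "plan": "pro"},
]
# The analytics copy carries tokens instead of email addresses.
safe_rows = [{**row, "email": vault.pseudonymize(row["email"])} for row in rows]
print(safe_rows[0]["email"] == safe_rows[1]["email"])  # True
print(vault.reidentify(safe_rows[2]["email"]))         # li@example.com
```

Consistent tokens let analysts count or join records per person without seeing who that person is. Only the holder of the vault can reverse the mapping.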
4. Conduct a data system inventory
Include systems provided by third-party vendors. For each system, understand what data it collects and processes and for what purpose. This helps determine whether that data is necessary.
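A data system inventory can start as a simple structured list. The Python sketch below assumes a hypothetical `SystemEntry` record that maps each data field a system collects to its documented purpose, including for systems supplied by third-party vendors. It then flags any field nobody can justify. The system and vendor names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SystemEntry:
    name: str
    vendor: Optional[str]  # None for systems built in-house
    # Each data field the system collects, mapped to its documented purpose
    # (None or "" when nobody can say why the field is collected).
    field_purposes: Dict[str, Optional[str]] = field(default_factory=dict)


def fields_without_purpose(inventory: List[SystemEntry]) -> List[Tuple[str, str]]:
    """Return (system, field) pairs with no documented purpose, as candidates
    for a necessity review."""
    return [
        (system.name, data_field)
        for system in inventory
        for data_field, purpose in system.field_purposes.items()
        if not purpose
    ]


inventory = [
    SystemEntry("CRM", "Example CRM Vendor", {"email": "customer support", "birth_date": None}),
    SystemEntry("Web analytics", None, {"ip_address": "", "page_views": "site performance"}),
]
print(fields_without_purpose(inventory))
# [('CRM', 'birth_date'), ('Web analytics', 'ip_address')]
```

The flagged fields are a natural input to the risk assessment step that follows.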
5. Use the risk assessment process
Conduct Privacy Impact Assessments (PIAs) and Data Protection Impact Assessments (DPIAs) to understand the risk associated with the data identified through your system inventory, its processing, and how long it is being kept. This will help determine if the data is truly needed for its specific processing purpose.
Managing the data protection practices of your business and your third-party organizations can be painstaking and time-consuming. Save time and reduce your privacy risk with TrustArc’s Data Inventory Hub and Risk Profile, which help you automate and streamline your data mapping and risk mitigation work. Leverage TrustArc’s Assessment Manager to streamline privacy assessments with pre-built assessments (e.g., PIAs, DPIAs, TIAs) and easily produce on-demand reports for auditing.
Mitigating risks and enhancing trust
In this new world of more generally available AI technologies, data minimization is now more important than ever. Using too much data to train AI models may result in unexpected outputs and create biases that can erode trust.
As more companies look to integrate AI into their products or data processing activities, it is essential to understand your company’s data collection and processing risks and to consider data minimization techniques from the outset.
By understanding and implementing data minimization principles, you can reduce costs, mitigate risks, and make more reliable decisions to achieve your business goals more efficiently.
Continue mastering the privacy essentials by reviewing all the resources in the Privacy PowerUp series.
Data Collection, Minimization, Retention, Deletion & Necessity Infographic
Review the building blocks of data collection, minimization, retention, deletion, and necessity.
PowerUp Your Privacy
Watch all ten videos in the Privacy PowerUp series – designed to help professionals master the privacy essentials.
Read the next article in this series: #3 Building a Data Inventory, Mapping, and Records of Processing Activities (ROPA).
Read more from the Privacy PowerUp Series:
- Getting Started in Privacy
- Data Collection, Minimization, Retention, Deletion, and Necessity
- Building a Data Inventory, Mapping, and Records of Processing Activities (ROPA)
- Understanding Data Subject Rights (Individual Rights) and Their Importance
- The Foundations of Privacy Contracting
- Choice and Consent: Key Strategies for Data Privacy
- Managing the Complexities of International Data Transfers and Onward Transfers
- Emerging Technologies in Privacy: AI and Machine Learning for Privacy Professionals
- Privacy Program Management: Buy-in, Governance, and Hierarchy