Leveraging GDPR ‘Legitimate Interests Processing’ for Data Science

 

Darren Abernethy, Senior Counsel, TrustArc
Ravi Pather, VP Sales, CryptoNumerics

The GDPR is not intended to be a compliance overhead for controllers and processors. It is intended to bring higher and more consistent standards and processes for the secure treatment of personal data, and fundamentally to protect the privacy rights of individuals. This is especially true in emerging data science, analytics, AI and ML environments, where the volume and variety of data sources create a higher risk of identifying an individual's personal and sensitive information.

The GDPR requires that personal data be collected for “specified, explicit and legitimate purposes,” and that a data controller must define a separate legal basis for each and every purpose for which, e.g., customer data is used. If a customer took out a bank loan, the bank may only use the collected account and transactional data to manage that customer's account and fulfil its obligations under the loan. This is colloquially referred to as the “primary purpose” for which the data is collected. If the bank wants to re-use this data for any purpose incompatible with or beyond the scope of the primary purpose, that is referred to as a “secondary purpose,” and it requires a separate legal basis for each and every such secondary purpose.

To remove any doubt: if the bank wanted to use that customer's data for profiling in a data science environment, then under GDPR the bank must document a legal basis for each separate purpose for which it stores and processes this customer's data. For example, 'cross-sell and up-sell' is one purpose, while 'customer segmentation' is another, separate purpose. If relied upon as the lawful basis, consent must be freely given, specific, informed and unambiguous, and an additional condition, such as explicit consent, is required when processing special categories of personal data, as described in GDPR Article 9. Additionally, in this example, the loan division of the bank cannot share data with its credit card or mortgage divisions without the informed consent of the customer. None of this should be confused with a further, separate legal basis available to the bank: processing necessary for compliance with a legal obligation to which the controller is subject (AML, fraud, risk, KYC, etc.).

The challenge arises when selecting a legal basis for secondary-purpose processing in a data science environment, because a separate and specific legal basis is needed for each and every purpose.

Obtaining consent for each and every purpose in a data science use case quickly becomes an impractical exercise for the bank, and an annoyance for its customers. Evidence shows, in any case, a very low rate of positive consent with this approach. Consent management under GDPR is also tightening up: no longer will 'blackmail' clauses or general and ambiguous consent clauses be deemed acceptable.

GDPR offers controllers a more practical and flexible legal basis for exactly these scenarios, and it encourages controllers to raise their standards for protecting the privacy of their customers, especially in data science environments. Legitimate interests processing (LIP) is an often misunderstood legal basis under GDPR. This is in part because reliance on LIP may entail additional technical and organizational controls to mitigate the possible impact or risk of a given processing activity on an individual. Depending on the processing involved, the sensitivity of the data and the intended purpose, traditional tactical data security solutions such as encryption and hashing may not go far enough in mitigating the risk to individuals for the LIP balancing test to come out in favour of the controller's identified legitimate interest.

If approached correctly, GDPR LIP can provide a framework, with defined technical and organisational controls, to lawfully support controllers' use of customer data in data science, analytics, AI and ML applications. Without it, controllers may be more exposed to non-compliance with GDPR and to the risk of legal action, as seen in many high-profile privacy-related lawsuits.

Legitimate Interests Processing is the most flexible lawful basis for secondary purpose processing of customer data, especially in data science use cases. But you cannot assume it will always be the most appropriate. It is likely to be most appropriate where you use an individual’s data in ways they would reasonably expect and which have a minimal privacy impact, or where there is a compelling justification for the processing.

If you choose to rely on GDPR LIP, you take on extra responsibility not only for implementing, where needed, technical and organisational controls to support and defend LIP compliance, but also for demonstrating the ethical and proper use of your customers' data while fully respecting and protecting their privacy rights and interests. This extra responsibility may include implementing enterprise-class, fit-for-purpose systems and processes (not just paper-based processes). Automated privacy solutions that demonstrate such technical and organisational controls are available today: CryptoNumerics CN-Protect, for example, offers a systems-based (Privacy by Design) risk-assessment and scoring capability that detects the risk of re-identification, together with integrated privacy protection that retains the analytical value of the data for data science while protecting the identity and privacy of the data subject.

Data controllers first need to perform the GDPR three-part test to validate LIP as a legal basis. You need to:

• identify a legitimate interest;
• show that the processing is necessary to achieve it; and
• balance it against the individual's interests, rights and freedoms.

The legitimate interests can be your own (as controller) or those of a third party. They can include commercial interests (e.g., marketing), individual interests (e.g., risk assessments) or broader societal benefits. The processing must be necessary: if you can reasonably achieve the same result in another, less intrusive way, legitimate interests will not apply. You must also balance your interests against the individual's: if they would not reasonably expect the processing, or if it would cause them unjustified harm, their interests are likely to override your legitimate interests. Conducting such assessments for accountability purposes is now easier than ever, for example with TrustArc's Legitimate Interests Assessment (LIA) and Balancing Test, which identifies the benefits and risks of data processing, assigns numerical values to both sides of the scale, and uses conditional logic and back-end calculations to generate a full report on the use of legitimate interests at the business-process level.
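The three gates of the test lend themselves to a simple, structured representation. The sketch below is a minimal illustration only (it is not TrustArc's LIA logic; the class, field names and scoring scale are hypothetical), showing how failing any one gate rules out LIP as a legal basis:

```python
from dataclasses import dataclass

@dataclass
class ThreePartTest:
    """Hypothetical encoding of the GDPR three-part test for LIP."""
    purpose: str                 # the identified legitimate interest
    is_necessary: bool           # no less intrusive way to achieve it
    individual_impact: int       # 1 (minimal) .. 5 (severe) impact on the data subject
    controller_benefit: int      # 1 (marginal) .. 5 (compelling) benefit

    def passes(self) -> bool:
        # Purpose test: a legitimate interest must actually be identified.
        if not self.purpose:
            return False
        # Necessity test: LIP fails if a less intrusive alternative exists.
        if not self.is_necessary:
            return False
        # Balancing test: the interest must not be overridden by the
        # individual's interests, rights and freedoms.
        return self.controller_benefit >= self.individual_impact

test = ThreePartTest("fraud-pattern analytics", True,
                     individual_impact=2, controller_benefit=4)
print(test.passes())  # True -> LIP may be defensible; document the assessment
```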

What are the benefits of choosing legitimate interests processing?

Because this basis is particularly flexible, it can apply in a wide range of situations, including data science applications. It can also give you more ongoing control over long-term processing than consent, which an individual can withdraw at any time. Remember, though, that you still have to manage marketing opt-outs independently of whatever legal basis you use to store and process customer data.

It also promotes a risk-based approach to data compliance, because you need to think about the impact of your processing on individuals, which helps you identify risks and apply appropriate safeguards. This can also support your obligation to ensure “data protection by design”: performing risk assessments for re-identification and demonstrating the privacy controls applied to balance privacy against the need to retain the analytical value of the data in data science environments. This in turn contributes towards your PIAs (Privacy Impact Assessments), which form part of your DPIA (Data Protection Impact Assessment) requirements and obligations.

LIP as a legal basis, if implemented correctly and supported by the correct organisational and technical controls, also provides a platform for data collaboration and data sharing. However, you may need to demonstrate that the data has been sufficiently de-identified, including by showing that risk assessments for re-identification are performed not just on direct identifiers but on all indirect identifiers as well.
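One widely used way to quantify re-identification risk across indirect (quasi-) identifiers is k-anonymity: every record should share its combination of quasi-identifier values with at least k-1 other records. The sketch below is a minimal illustration using pandas with a hypothetical customer extract (the column names and threshold are assumptions, and this is not CN-Protect's implementation); it flags quasi-identifier combinations that are too rare:

```python
import pandas as pd

# Hypothetical customer extract: direct identifiers already removed,
# but indirect (quasi-) identifiers remain and can combine to re-identify.
df = pd.DataFrame({
    "postcode":   ["SW1", "SW1", "SW1", "N7", "N7"],
    "birth_year": [1980, 1980, 1980, 1975, 1962],
    "gender":     ["F", "F", "F", "M", "M"],
})

QUASI_IDENTIFIERS = ["postcode", "birth_year", "gender"]
K = 3  # each combination must appear at least K times

# Count how many records share each quasi-identifier combination.
group_sizes = df.groupby(QUASI_IDENTIFIERS).size()
risky = group_sizes[group_sizes < K]

print(f"k-anonymity achieved: {group_sizes.min()} (target {K})")
print("Risky quasi-identifier combinations:")
print(risky)  # these rows need generalisation or suppression before sharing
```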

Using LIP as a legal basis for processing may help you avoid bombarding people with unnecessary and unwelcome consent requests, and can help avoid “consent fatigue.” Done properly, it can also be an effective way of protecting the individual's interests, especially when combined with clear privacy information and an upfront, continuing right to object to such processing. Lastly, LIP not only gives you a legal framework for performing data science; it also provides a platform for demonstrating the proper and ethical use of customer data, a topic and business objective for most boards of directors.

About the Authors  

Darren Abernethy is Senior Counsel at TrustArc in San Francisco.  Darren provides product and legal advice for the company’s portfolio of consent, advertising, marketing and consumer-facing technology solutions, and concentrates on CCPA, GDPR, cross-border data transfers, digital ad tech and EMEA data protection matters. 

Ravi Pather of CryptoNumerics has spent the last 15 years helping large enterprises address data compliance regimes such as GDPR, PIPEDA, HIPAA, PCI DSS, data residency, data privacy and, more recently, CCPA. He has extensive experience assisting large, global companies in implementing privacy compliance controls, particularly as they relate to more complex secondary-purpose processing of customer data in data lake and data warehouse environments.

Four Boxes You Must Have Checked Before You Leverage Legitimate Interests as Your Basis for Data Processing

The GDPR, Brazil's LGPD, Thailand's PDPA and many other privacy regulations around the globe require that organizations determine the legal basis for processing individuals' data (customers, employees, etc.) as part of their business operations. For example, Article 6 of the GDPR states that processing shall be lawful only if at least one of the following applies: the data subject has consented; processing is necessary for the performance of a contract; processing is necessary for compliance with a legal obligation, to protect someone's life, or to perform a task in the public interest; or processing is necessary for your legitimate interests.

Legitimate interests is a preferred approach for many organizations because of its flexibility and its applicability to any reasonable processing purpose. In contrast, other legal bases of processing, such as demonstrable consent, center around a specific purpose the individual agreed to. Under what circumstances can you use legitimate interests as your basis of processing? Here are the four boxes you must have checked in order to leverage legitimate interests. 

Box 1. The processing is not required by law but is of a clear benefit to you or others. For example, an online retailer can promote a pair of sunglasses to someone browsing from an area where it’s the high summer season. Alternatively, an online store might use a visitor’s location data to offer a limited time free shipping offer to the visitor’s area.

Box 2. There’s a limited privacy impact on the individual. For example, most websites collect their visitors’ browsing data to optimize performance for the user. Most often, this aligns well with the Legitimate Interests provision. Collecting this data doesn’t pose a threat as long as it is anonymized.

Box 3. The individual should reasonably expect you to use their data in that way. For example, some businesses will want to send communications via email or SMS to remind clients of upcoming appointments. Even where explicit consent is also sought for such messages, most individuals expect their data to be used in this way.

Box 4. You cannot (or do not want to) give the individual full upfront control (i.e., consent), or you want to avoid bothering them with disruptive consent requests when they are unlikely to object to the processing. For example, the use of second-party and third-party data can provide insights about the demographics of customers, which can be used to identify target segments for personalized content. When processing this data, you may not want to hand full control to the individual when the result is messaging they will ultimately want to receive because it is relevant to who they are as a person or professional.

Checking off each of these boxes is the single most complex aspect of leveraging legitimate interests as your basis for processing data. Conducting a legitimate interests assessment is challenging because the logic for determining whether the significance of the benefits outweighs the risk to individuals is complex.

If the benefits outweigh the risks, then the organization may use legitimate interests as its basis for processing data. The challenging part is that companies must quantify each side of the scale across subcategories of benefits and risks. Privacy leaders could spend hours building a spreadsheet to perform a balancing test for each business process for which the company wants to establish legitimate interests as its basis for processing. Multiplied by the total number of business processes a company has, the time spent creating balancing tests can quickly amount to dozens or hundreds of tests across the organization.
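A spreadsheet balancing test is essentially a weighted scoring model, so it automates naturally. The sketch below is a generic illustration of that idea (the subcategories, weights and 0-5 scores are invented for the example; this is not TrustArc's scoring logic):

```python
# Hypothetical weights and 0-5 scores for one business process.
benefits = {
    "commercial_value": (0.4, 4),   # (weight, score)
    "customer_value":   (0.4, 3),
    "societal_value":   (0.2, 1),
}
risks = {
    "re_identification":          (0.5, 2),
    "data_sensitivity":           (0.3, 1),
    "individual_expectation_gap": (0.2, 2),
}

def weighted_total(items: dict) -> float:
    # Sum each subcategory's score scaled by its weight.
    return sum(weight * score for weight, score in items.values())

benefit_score = weighted_total(benefits)
risk_score = weighted_total(risks)
print(f"benefit={benefit_score:.2f} risk={risk_score:.2f}")

if benefit_score > risk_score:
    print("Balancing test favours legitimate interests; record the rationale.")
else:
    print("Risks outweigh benefits; mitigate further or choose another basis.")
```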

The balancing test can be completely automated. Learn how you can save time, respond to business needs faster and generate an audit trail for legitimate interests with TrustArc's Legitimate Interests Assessment and Balancing Test in the TrustArc Platform.

 

Can You Legally do Analytics Under the GDPR?


by Gary LaFever, CEO of Anonos
Taking the “personal” out of Personal Data®

Many companies are not yet aware that they are, or will be, doing anything wrong in processing analytics or using historical databases under the GDPR. While many companies are understandably focused on conducting data inventories and data protection impact assessments, it is critical to note that inventories and assessments will not support the new legal bases required under the GDPR for processing data analytics or for using historical databases involving EU personal data.

An important aspect of the GDPR is the new requirement that “consent” must be specific and unambiguous to serve as a valid legal basis. For “consent” to serve as a lawful basis for processing personal data, it must be “freely given, specific, informed and an unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her.”[1] These requirements are impossible to satisfy in the case of iterative data analytics, where successive analyses, correlations and computations cannot be described with specificity and unambiguity at the time of consent. In addition, the GDPR has no “grandfather” provision allowing continued use of data collected under non-compliant consent prior to the GDPR's effective date.

To lawfully process data analytics, and to legally use historical databases containing EU personal data, new technical measures that support alternate (non-consent) GDPR-compliant legal bases are required. After May 25, 2018, companies that continue to rely on consent for analytics, AI and the use of historical databases involving EU personal data will be noncompliant with GDPR requirements, subjecting themselves, as well as co-data controller and data processor partners,[2] to the risk of well-publicized fines of up to 4% of global turnover or 20 million euros, whichever is greater. The good news is that new technical requirements under the GDPR (Pseudonymisation and Data Protection by Default) help to satisfy alternate (non-consent) legal bases[3] for data analytics and the use of historical databases involving EU personal data.

GDPR-Compliant Pseudonymisation

The GDPR embraces a new risk-based approach to data protection and shifts the primary burden of risk for inadequate data protection from individual data subjects to corporate data controllers and processors. Prior to the GDPR, the burden of risk was borne principally by data subjects, owing to their limited recourse against data controllers and the lack of direct liability for data processors.

The GDPR recognizes that static (persistent), purportedly “anonymous” identifiers used to “tokenize” or replace identifiers are ineffective in protecting privacy. Owing to increases in the volume, variety and velocity of data, combined with advances in technology, static identifiers can be linked, or are readily linkable, via the Mosaic Effect,[4] leading to unauthorized re-identification of data subjects. Continued use of static identifiers by data controllers and processors inappropriately places the risk of unauthorized re-identification on data subjects. However, the GDPR encourages data controllers and processors to continue using personal data by implementing new technical measures to “Pseudonymise”[5] data and so reduce the risk of unauthorized re-identification. GDPR-compliant Pseudonymisation requires separating the information value of data from the means of linking the data to individuals. In contrast to static identifiers, which are subject to unauthorized relinking via the Mosaic Effect, dynamically changing pseudonymous identifiers can satisfy the requirement to separate the information value of personal data from the means of attributing the data back to individual data subjects.
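The contrast between static tokenization and dynamic Pseudonymisation can be made concrete. The sketch below is a simplified illustration (the class and function names are hypothetical, and this is not Anonos's product): a static token maps the same customer to the same identifier everywhere, so datasets can be joined, while a dynamic scheme issues a fresh pseudonym per use and holds the re-linking table separately, in the spirit of Article 4(5):

```python
import hashlib
import secrets

def static_token(customer_id: str) -> str:
    # Static: the same input always yields the same token, so tokens
    # can be joined across datasets (the Mosaic Effect).
    return hashlib.sha256(customer_id.encode()).hexdigest()[:12]

class DynamicPseudonymiser:
    """Fresh pseudonym per use; the link table is kept separately under
    additional technical and organisational measures (cf. Art. 4(5))."""

    def __init__(self):
        self._link_table = {}  # pseudonym -> customer_id, stored apart from the data

    def pseudonymise(self, customer_id: str) -> str:
        pseudonym = secrets.token_hex(6)       # new random value every call
        self._link_table[pseudonym] = customer_id
        return pseudonym

print(static_token("cust-42"), static_token("cust-42"))      # identical: linkable
p = DynamicPseudonymiser()
print(p.pseudonymise("cust-42"), p.pseudonymise("cust-42"))  # differ per use
```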

Data Protection by Default

The GDPR imposes a new mandate to provide Data Protection by Default,[6] which goes further than perimeter-only protection and is much more than merely “privacy by design”; it is the most stringent implementation of privacy by design. Data Protection by Default requires that data protection be applied at the earliest opportunity (e.g., by dynamically Pseudonymizing data) and that affirmative steps be taken before personal data can be used. This is in stark contrast to common practice prior to the GDPR, when the default was that data was available for use and affirmative steps had to be taken to protect it. Data Protection by Default requires granular, context-sensitive control over data in use, so that only the data proportionally necessary at any given time, and only as required to support each authorized use, is made available.
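In practice, this means data is withheld unless a specific authorized use requires it. The sketch below is a hypothetical purpose-based filter (the purposes, field names and record are invented for the example, not a reference implementation): each authorized purpose sees only the minimum fields it needs, and an unknown purpose sees nothing by default:

```python
# Hypothetical mapping from authorized purpose to the minimum fields it needs.
ALLOWED_FIELDS = {
    "fraud_detection":   {"account_id", "txn_amount", "txn_timestamp"},
    "service_analytics": {"txn_amount", "txn_timestamp"},
}

record = {
    "account_id": "A-991",
    "name": "Jane Doe",
    "txn_amount": 120.50,
    "txn_timestamp": "2018-05-25T09:00:00Z",
}

def release(record: dict, purpose: str) -> dict:
    # Default is deny: an unrecognized purpose receives no data at all.
    allowed = ALLOWED_FIELDS.get(purpose, set())
    return {k: v for k, v in record.items() if k in allowed}

print(release(record, "service_analytics"))  # name and account_id withheld
print(release(record, "marketing"))          # {} -- no authorized purpose, no data
```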

GDPR Technical Requirements and Data Stewardship

Prior to the GDPR, the risks associated with not fully comprehending broad grants of consent were borne by individual data subjects. Under the GDPR, broad consent no longer provides a sufficient legal basis for data analytics or the use of historical databases involving personal data. As a result, data controllers and processors must adopt new technical safeguards to satisfy an alternate legal basis. Complying with the new Pseudonymisation and Data Protection by Default requirements helps support alternate (non-consent) legal bases for analytics and the use of historical databases.

Even where a company is not required to comply with EU regulations, compliance with the GDPR's Pseudonymisation and Data Protection by Default requirements is evidence of state-of-the-art initiatives to serve as a good steward of data, thereby engendering maximum trust with customers.

[1] See Recital 32 and Article 4(11).

[2] See Articles 26 and 82.

[3] See Articles 6(1)(b)-(f).

[4] The “Mosaic Effect” occurs when a person is indirectly identifiable due to a phenomenon the Article 29 Working Party refers to as “unique combinations”: notwithstanding the lack of identifiers that directly single out a particular person, the person is still “identifiable” because that information may be combined with other pieces of information (whether or not the latter is retained by the data controller), enabling the individual to be distinguished from others. See http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2007/wp136_en.pdf.

[5] See Article 4(5).

[6] See Article 25.
