4. Description of the Processing of Personal Data
Summary: This step describes how the personal data described in step 3 is processed. This is necessary to demonstrate that the processing of personal data is properly secure, accurate and limited to what is necessary to reach the purpose of the activity.
Step 4 describes how each of the personal data categories listed in step 3 is expected to be processed. As in steps 2 and 3, this description should cover all the different processing activities described in step 1 – jointly or separately depending on whether the activities use the same or different tools, storage places, etc.
The thoroughness of the descriptions below depends on the nature of the processing. Complex projects require more thorough explanations describing how and why the processing can be reasonably considered as safe, necessary and minimized.
Data source:
The goal is to describe the provenance of personal data: Where is it coming from? Personal data can be directly provided by data subjects, observed, inferred or derived from data subjects, or obtained/repurposed from data already collected by previous processing activities and/or third parties.
Often in research projects, survey and interview responses are directly provided by data subjects; interview response timing and emotion are inferred or derived from audio recordings; contact information like name and email is scraped from company websites; and playground use can be derived from observing data subjects when they use and interact with it.
Data storage and processing:
The goal is to describe how data is processed – from data collection and analysis to data archiving, deletion or anonymisation – by describing the tools and storage used in the processing. This description demonstrates that data is securely processed.
As explained above, the thoroughness of this description depends on the nature of the processing. Relatively simple processing using tools and storage already considered as safe (by the UU tool advisor and storage finder) requires short explanations – just stating how chosen tools and storage are used is often enough.
For a simple research survey project, this description could simply state that survey responses are collected and stored in Qualtrics, then transferred to UU OneDrive, temporarily stored and analysed on the researchers’ UU-managed laptops using Excel, and archived in Yoda at the end of the survey.
The use of tools/storage not listed as safe requires a description that sufficiently explains how the confidentiality, availability and integrity of the data are protected. If the tool/storage is hosted by a third party, then it also requires a UU-specific Data Processing Agreement (described in Step 8), which is necessary to legally guarantee sufficient protection of the processed personal data.
Data access:
Who has access to what data, and for what purposes? The goal is to describe who has access to the data (described in Step 3), including the tools/storage described above, and to explain why this access is necessary to the project’s goals. This description includes the stage at which data access is necessary – is it necessary early in the processing (e.g., “raw” interview or survey data), or at a later stage (e.g., deidentified or “cleaned-up” data)? It also includes the way this access is managed, and who is responsible for managing it.
In addition to the controllers already listed at the start of the document, it is necessary to list any additional parties (organizations or individuals) with access to the data, including any data recipients. When working with external controllers (who are not UU members) it is necessary to explain (in Step 8) the measures that ensure data is still protected while processed by these external controllers.
Data retention:
For how long is the data listed in Step 3 retained/stored, and when (under what circumstances/conditions) is that data deleted and/or anonymized? The aim here is to demonstrate that data is indeed deleted or anonymized once it is no longer necessary to reach the goal of the project. Retention times can be functional (‘data X and Y will be deleted after process A or B is completed’) but unless the processing is expected to keep running indefinitely, it is also required to provide specific time frames (‘data X and Y will be deleted within 6 months after data collection is completed’).
Certain types of personal data, like tax, student and employee records, can have legally defined retention periods. The Selectielijst Universiteiten en Universitair Medische Centra 2020 (in Dutch) provides additional guidance on retention periods for diverse types of personal data.
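As an illustration of turning a specific time frame into a checkable date, the sketch below computes a deletion deadline from the date data collection was completed. It is a minimal example under stated assumptions: the function names and the six-month default are hypothetical, taken only from the example wording above, and the actual retention period must come from the project's own description.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter (e.g. 31 August + 6 months),
    the day is clamped to the last day of that month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def deletion_deadline(collection_completed: date, retention_months: int = 6) -> date:
    """Latest date by which the data should be deleted (hypothetical 6-month default)."""
    return add_months(collection_completed, retention_months)


def is_overdue(collection_completed: date, today: date, retention_months: int = 6) -> bool:
    """True if `today` is past the deletion deadline."""
    return today > deletion_deadline(collection_completed, retention_months)
```

For example, `deletion_deadline(date(2024, 1, 15))` gives 15 July 2024, which can be recorded next to the functional retention condition.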
Collection times:
The goal is to state when and how often processing activities are started and, when possible, indicate an end date. Often, data processing starts when data is initially collected, for example, when survey or interview invitations are distributed.
Collection/processing location:
It is necessary to describe where (as in which countries) data subjects are likely to be located when their data is being processed, as this location information is relevant for the applicability of the GDPR and the data protection laws of other countries.
Data minimisation measures:
The goal here is to describe any additional measures or protocols, not described anywhere else in this document, that contribute to the principle of data minimization: “only personal data that is adequate, relevant and limited to what is necessary for the purpose shall be processed”. Some examples of data minimization measures that may be relevant to the processing activity are presented below. Ensure that this description clearly states how each measure is specifically applied in the processing activity, mentioning the specific tools and storage methods used.
Suggested data minimization measures:
When data is not needed, it should either be deleted, or not collected in the first place. For example, if photographs are used in observations, measures may consist of “taking care to avoid capturing people without their knowledge/consent (data avoidance), or if already captured in the photo, blurring, cropping and/or deleting them from the photo (data deletion)”.
Data can be deidentified by partially or completely removing identifiable information like names and emails; by generalizing, diluting or aggregating indirectly identifiable information (e.g., replacing birthdate with age), etc. Appropriate deidentification measures must be driven by the necessity of the data – too much deletion may impair the viability of the processing, whereas too little may create unnecessary risks. More information on deidentification measures can be found here.
Keep in mind that deidentification is sometimes mistakenly equated with anonymisation; however, deidentification is only the first step of anonymisation. A de-identified dataset may easily be re-identified when combined with data that is publicly or easily accessible, as explained here. The AEPD has published a helpful resource with more information on this topic, and the PDPC has also published a handy guide to basic anonymisation.
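The removal and generalisation measures described above can be sketched in a few lines. This is a minimal illustration, not a recommended implementation: the field names (name, email, birthdate) and the ten-year age bands are assumptions chosen for the example, and the output is de-identified, not anonymous.

```python
from datetime import date

# Hypothetical field names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_on(birthdate: date, reference: date) -> int:
    """Age in whole years on the reference date."""
    years = reference.year - birthdate.year
    if (reference.month, reference.day) < (birthdate.month, birthdate.day):
        years -= 1  # birthday has not yet occurred in the reference year
    return years


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record: dict, reference: date) -> dict:
    """Drop direct identifiers and replace the birthdate with an age band."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "birthdate"
    }
    if "birthdate" in record:
        cleaned["age_band"] = age_band(age_on(record["birthdate"], reference))
    return cleaned
```

How coarse the bands should be depends on what the analysis needs: wider bands reduce identifiability but also reduce the usefulness of the data.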
Sometimes, data is only needed at certain points of the processing activity, so it is neither possible nor desirable to fully delete personal identifiers. In those cases, identifiers are instead replaced by one or more artificial identifiers – pseudonyms. A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing. Unlike anonymisation, pseudonymisation enables re-identification, as ‘additional information’ (like reidentification keys) is created and kept separate – reversible pseudonymisation. If this reidentification information is deleted (or not created in the first place), then pseudonymisation becomes irreversible – and irreversible pseudonymisation is actually a data deidentification method. When describing pseudonymisation measures, do not forget to describe how the reidentification key is managed and ultimately deleted.
More information on data pseudonymisation techniques is available in this ENISA guide.
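A minimal sketch of reversible pseudonymisation follows. It assumes records are dictionaries and that a hypothetical "email" field is the identifier to replace. Pseudonyms are random rather than derived from the identifier, so the only route back to the original is the key table, which should be stored separately from the pseudonymised data.

```python
import secrets


def pseudonymise(records: list, id_field: str = "email"):
    """Replace `id_field` with a random pseudonym.

    Returns (pseudonymised_records, key_table), where key_table maps each
    pseudonym back to the original identifier. Deleting the key table makes
    the pseudonymisation irreversible.
    """
    key_table = {}    # pseudonym -> original identifier (keep separate and secure)
    by_original = {}  # original identifier -> pseudonym (used only during this run)
    pseudonymised = []
    for record in records:
        original = record[id_field]
        if original not in by_original:
            pseudonym = "P-" + secrets.token_hex(8)
            while pseudonym in key_table:  # extremely unlikely collision
                pseudonym = "P-" + secrets.token_hex(8)
            by_original[original] = pseudonym
            key_table[pseudonym] = original
        replaced = {k: v for k, v in record.items() if k != id_field}
        replaced["pseudonym"] = by_original[original]
        pseudonymised.append(replaced)
    return pseudonymised, key_table
```

The same person receives the same pseudonym across records, so responses can still be linked during analysis without exposing the identifier.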
Personal data that has been sufficiently deidentified becomes anonymous once it has been shown that it can’t be re-identified – and for some datasets, this is a high bar, as explained in Opinion 05/2014 on Anonymisation Techniques of the Article 29 Working Party (the predecessor of the EDPB). See also the resources mentioned in data deidentification above.
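One simple way to get a first impression of re-identification risk is to count how many records share each combination of indirectly identifying values (a k-anonymity style check, a technique not named in this guide). The sketch below assumes dictionary records and hypothetical field names. A smallest group of size 1 means at least one person is unique on those fields; a large smallest group does not by itself prove that the data is anonymous.

```python
from collections import Counter


def smallest_group_size(records: list, quasi_identifiers: list) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values.

    Returns 0 for an empty dataset.
    """
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0
```

Running the check on different combinations of fields shows which ones make individuals stand out and may need further generalisation.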
Data encryption is a useful measure to apply to data that is in storage or in transit, that is, not being used at the time. When encrypting data, it is important to store the key to decrypt the data in a safe location – once the key is lost, access to the data is also lost. A common and easy approach to encrypt files and folders is to use 7-zip to produce an encrypted zip file, but other tools and approaches are also available.
Communicating with data subjects is often done via email, phone, or other means. While this is fine in many situations, sometimes a better way is preferable. The issue with email and with phone-based tools like Signal is that they rely on unique identifiers (email addresses or phone numbers) that are rarely changed or updated. Once these are shared (or leaked), they are difficult to control – spam and phone scams are examples of risks associated with this loss of control.
Common measures to address these shortcomings include creating project-specific identifiers. For example, a project-associated email or phone number can be used to communicate with data subjects. It is also possible to use privacy-focused tools such as SimpleX, Syncthing and SURFDrive.
- SimpleX is a Signal-like open-source messaging tool that does not require a phone number to work. In fact, it does not have any User IDs. Instead, it relies on temporary anonymous pairwise addresses and credentials which are unique for each user contact or group member – contacts need to be invited, and once credentials are deleted, contact can’t be reestablished, which means spam or unknown callers are not possible.
- Syncthing is an open-source continuous file synchronization application that facilitates sharing data between devices. Each device (laptop, mobile phone, tablet) is identified by a device ID, which is shared with other users’ devices; once authenticated, selected folders can be synchronized in real time in a peer-to-peer, private and secure manner. For a connection to be established, both devices need to know the other’s device ID, but knowing a device ID is not enough to actually establish a connection to that device or get a list of files, etc. The Device ID is actually the public part of a public/private 384 bit ECDSA key pair, so it can be used for address resolution, authentication and authorization. More information is available on the documentation page.
- SURFDrive is a good alternative to Microsoft OneDrive, especially useful for sharing data with non-UU data subjects. SURFDrive has different options for sharing and access control: access can be granted based on email, or using unique links (called public links – but despite the name, these links can be password protected, ensuring only intended recipients get access). Access to shared folders can also be granularly controlled: link recipients can view or download contents (Download / View); view, download and upload contents (Download / View / Upload); or view, download, edit, delete and upload contents (Download / View / Upload / Edit). Usefully, it can also be set up to receive files from link recipients without revealing the actual contents of the folder (Upload only – File Drop).
Qualtrics is an online survey tool approved for use at the UU. When using this tool, it is important to ensure that data collection and use are minimised as much as possible, considering the needs of the project. For example, in broad terms, online surveys can use either individualized or general links – either a unique link is used for each targeted individual, or the same link is used for all targeted individuals. In Qualtrics, these are called trackable and reusable links, respectively. Which type to use depends on the goals of the project: trackable links are appropriate when it is actually necessary to get responses from specifically targeted individuals. Otherwise, reusable links are likely sufficient for most survey projects, as they require much less personal data processing than trackable links – so their use should be preferred as much as possible. Qualtrics also has several security and fraud detection settings that a project may need to use, but that also have an impact on privacy. If these settings are applied, make sure they are properly described and justified. Also make sure to describe whether the survey will use cookies or collect respondents’ IP addresses and location.