Geo data – support for researchers

4. Description of the Processing of Personal Data

Once the categories of processed personal data have been described in the previous step, the aim here is to describe how each of those personal data categories is expected to be processed.

Data Source:

A description that explains how each type of personal data (described in step 3) was obtained. In research projects, personal data is often provided directly by data subjects, but it can also be obtained indirectly from other sources, for example by scraping websites or by observing data subjects in public (see examples below).

  • Provided by data subjects – Personal data that has been directly provided by, and collected from, data subjects, often with their full awareness. For example, data collected during an interview or a survey.
  • Observed from data subjects – Personal data that has been collected from observations of data subjects, where data subjects may not necessarily be aware of this data collection. For example, data generated by observing/monitoring the behaviour of individuals in focus groups, or in their activities in public spaces.
  • Obtained from other sources – Personal data that is obtained from other independent sources. For example, contact information scraped from a company directory, tweets scraped from Twitter, or personal data collected by other researchers and reused.
  • Processed/extracted/analysed from already collected data – Personal data that is obtained from existing data after it has been subjected to some processing. This ranges from simple processing, like extracting a location from an address or a first name from a full name, to more advanced processing, like inferring the personal traits of an individual from their browsing history, or transcribing, cleaning and deidentifying interview recordings.
  • Newly generated personal data – Personal data that is generated or brought into existence without a previous relation to existing data (i.e., it is not derived or inferred from other data). A common example is the generation of random pseudonyms, which become personal data once they are assigned to individual research participants.
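The "processed/extracted" category above can be illustrated with a minimal Python sketch. It assumes Dutch-style postal codes (four digits followed by two letters) and uses a made-up record; the function names and the sample data are hypothetical and only serve to show how derived personal data comes into existence.

```python
import re
from typing import Optional

# Assumption: postal codes look like "3584 CS" (four digits, optional space,
# two letters). Adjust the pattern for other countries.
POSTCODE_PATTERN = re.compile(r"\b(\d{4})\s?([A-Za-z]{2})\b")


def extract_postal_code(address: str) -> Optional[str]:
    """Return the postal code found in a free-text address, or None."""
    match = POSTCODE_PATTERN.search(address)
    if match is None:
        return None
    digits, letters = match.groups()
    return f"{digits} {letters.upper()}"


def extract_first_name(full_name: str) -> str:
    """Return the first whitespace-separated token of a full name."""
    parts = full_name.split()
    return parts[0] if parts else ""


# Hypothetical record, as it might be provided by a data subject.
record = {"name": "Alex Example", "address": "Examplestraat 1, 3584 CS Utrecht"}

# The derived values are a new personal data category with its own source.
derived = {
    "first_name": extract_first_name(record["name"]),
    "postal_code": extract_postal_code(record["address"]),
}
print(derived)
```

In a privacy scan, the derived fields would be listed with "processed from already collected data" as their source, separately from the name and address they were extracted from.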

Example: Contact information (name and address) and physical characteristics (age, gender, weight) are directly provided by data subjects. Location data (postal code) is obtained from the data subjects’ address information. The contact information of experts in the field is obtained from the research institute website. Deidentified interview transcripts are obtained by transcribing, cleaning and deidentifying interview recordings.

This information will later be used to define the legitimacy of the processing (in step 7 – lawful basis of processing). For example, the processing of personal data directly provided by data subjects is often based on ‘consent’, while contact information scraped from a company website may be used based on ‘legitimate interest’.

Data Storage and Processing:

The aim here is to describe how the personal data described in step 3 will be processed – the tools and other resources used to collect, store and analyse data, from the moment data is acquired until its deletion/anonymisation.

The storage and data processing description also helps in assessing the security of the process – the confidentiality, integrity and availability of the data. For example, if you are using one of the UU security-approved available tools like Yoda or OneDrive (listed in tools.uu.nl), it can be safely assumed that your data will be sufficiently protected – as long as access rights are handled properly.

If you are using a tool/storage site that is not included or listed as safe in tools.uu.nl – for example, a software suite hosted in the cloud by an external company – you will need to provide an in-depth description that sufficiently explains how the confidentiality, availability and integrity of the data are protected by that tool. This will likely also require securing a UU-specific Data Processing Agreement with the external supplier, in order to guarantee sufficient protection of the processed personal data (see step 8).

The thoroughness of this description depends on the nature of the processing. Performing a survey or interview is a relatively straightforward form of data processing, as long as sufficiently secure tools (listed in tools.uu.nl) are used. More complex projects, for example the development of a new educational software tool, require more detailed explanations to properly visualise the full data processing picture. For more complex data processing descriptions, it usually helps to make a diagram – diagrams.net is an open source application that can be used to make diagrams easily.

Example: Survey responses are collected and stored in Qualtrics while the survey is ongoing. After the survey collection stops, data is transferred to and stored in Yoda. During data analysis, portions of the data are temporarily transferred to and stored on the researchers’ UU laptops, which have full disk encryption installed by default. Survey responses are deleted once their analysis is completed, and the analysis results will be checked to ensure they are sufficiently deidentified to be considered anonymous data.
In this example, only UU approved tools are used, so there is no need to describe security measures in detail (for example, UU laptops have full disk encryption installed by default), as those are already known.

Example: Interview recordings are transcribed using the services of Uitgetypt. A Data Processing Agreement has been signed with this processor (see step 8). Recording files are transferred to this processor using SURFfilesender (using encryption).
In this example, a DPA was required to ensure privacy and security of the recordings when transcribed by Uitgetypt (which is considered a data processor), using the UU-specific DPA template. In that agreement, privacy and security measures are described in more detail. Working with external parties is further discussed in step 8.

Data Access:

You have already listed the main controllers at the start of the privacy scan. Here, the aim is to describe what kind of access each of the group members is expected to have – including the individuals listed in the administrative section at the start of the privacy scan document, and any external (non-UU) individuals, like external collaborators. In addition to their names and emails, you should describe what data they will have access to, how this access will be controlled, and who will be responsible for managing this access. Also explain why (for which purposes) these individuals need access to the data. Remember that data access should follow the principles of necessity (there must be a legitimate reason to access data) and proportionality (only provide access to the minimum amount of data necessary to fulfil the purpose). List the project members who were not listed as controllers at the start of the privacy scan document by name, job title, department, and e-mail. If you plan to work with others outside the UU – external collaborators – their roles will be described in more detail in step 8.

Example: Data that is stored on researchers’ UU laptops can only be accessed by the researcher. Research data stored in Yoda can only be accessed by the PI, the PhD (described as controllers at the start of this privacy scan document), and the assistant professor (listed below). Only the PI and the PhD can grant research team members access to data stored in Yoda. In particular, pseudonymised GPS data, decryption keys and GPS reidentification keys (required to link participants’ GPS data with the other research data) can only be accessed by the PhD or the PI.
The PI, William Dyer, has access to review and supervise the research project. The PhD, Alicia Jones, has access to perform the research project. Assistant professor Harleen Quinzel, H.quinn@uu.nl, will have access to perform further analysis and to advise the PhD.
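An access description like this one can also be written down as a small access matrix, which makes it easier to check against the principles of necessity and proportionality. The following Python sketch is purely illustrative: the role and category labels are hypothetical names chosen for this example, and the code does not represent how Yoda or any other UU tool configures access.

```python
# Hypothetical access matrix: role -> data categories that role may read.
# Labels are illustrative only, not Yoda or UU settings.
ACCESS = {
    "PI": {"research_data", "gps_data", "reidentification_keys"},
    "PhD": {"research_data", "gps_data", "reidentification_keys"},
    "assistant_professor": {"research_data"},
}

# Roles that are allowed to grant access to other team members.
CAN_GRANT = {"PI", "PhD"}


def can_access(role: str, category: str) -> bool:
    """Return True if the role may read the given data category."""
    return category in ACCESS.get(role, set())


def grant(granting_role: str, target_role: str, category: str) -> None:
    """Give target_role access to a category, if granting_role may do so."""
    if granting_role not in CAN_GRANT:
        raise PermissionError(f"{granting_role} cannot grant access")
    ACCESS.setdefault(target_role, set()).add(category)


print(can_access("assistant_professor", "research_data"))
print(can_access("assistant_professor", "gps_data"))
```

Roles that are not listed get no access by default, which mirrors the idea that access is only provided where there is a stated purpose.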

Data Retention:

The aim here is to explicitly state when each type of data from step 3 is going to be deleted and/or anonymised. You can indicate functional dates (‘data X and Y will be deleted after interviews are completed’) but you should also include specific dates (‘data X and Y will be deleted within 6 weeks after data collection’). Be aware that certain types of personal data, like tax, student and employee records, usually have legally defined retention periods. The Selectielijst Universiteiten en Universitair Medische Centra 2020 (in Dutch) also provides some guidance on retention periods for diverse types of personal data.

Example: Names, contact details and reidentification keys of participants will be fully deleted after the interviews, focus groups and GPS data collection have been completed, and no later than one year from the start of data collection. The audio recordings of the interviews will be deleted after transcription, and no later than six months after data collection. The remaining data (de-identified transcripts, de-identified and pseudonymised GPS data, and photos) will be archived in the Yoda repository for ten years after the completion of the project, to preserve the integrity of the research, in accordance with the University Policy Framework for Research Data (in line with the Netherlands Code of Conduct for Scientific Practice).
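Functional retention periods such as these can be turned into concrete calendar dates once the project dates are known. The sketch below is one way to do that in Python; all dates in it are hypothetical placeholders, and `add_months` is a helper written for this example, not part of any UU tool.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamped to the last day of the month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Hypothetical project dates; replace them with the real ones.
collection_start = date(2022, 2, 1)
collection_end = date(2022, 4, 30)
project_end = date(2024, 8, 31)

retention_ends = {
    # Within one year from the start of data collection.
    "names, contact details, reidentification keys": add_months(collection_start, 12),
    # Within six months after data collection.
    "audio recordings": add_months(collection_end, 6),
    # Archived for ten years after completion of the project.
    "archived research data": add_months(project_end, 120),
}

for data_type, end_date in retention_ends.items():
    print(f"{data_type}: retention ends {end_date.isoformat()}")
```

Listing the resulting dates next to the functional description makes it easier to verify later that deletion actually happened on time.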

Collection Times:

When, and how often, data will be collected – for example, whether data is collected once or at different time points. More complex data flows will require more detailed descriptions.

Example: All personal data will be collected between February and April of 2022. It will be a one-time data collection process.

Collection/Processing Location:

Where (as in which geographical areas) are the participants expected to be located when their data is being collected, and where is data processing (analysis, storage) expected to take place? Geographical location of data processing is relevant for the applicability of the GDPR and the data protection laws of other countries.

Example: Data collection will take place in purposively selected areas of the metropolitan city of Dhaka (Bangladesh), whereas data will be stored and analysed at the UU in the Netherlands. As such, the personal data processing operations fall within the material scope of both the GDPR and the Technology Act and the Digital Security Act of Bangladesh.

Data Minimisation Measures:

The principle of data minimisation is “only personal data that is adequate, relevant and limited to what is necessary for the purpose shall be processed”. The aim is to verify whether the project goals can be met by processing less personal data, by using less detailed or aggregated personal data, or even without processing personal data at all.

  • Data avoidance – Avoid processing personal data altogether when this is possible for the relevant purpose.
  • Limitation – Limit the amount of personal data collected to what is necessary for the purpose.
  • Access limitation – Shape the data processing in a way that a minimal number of people need access to personal data to perform their duties, and limit access accordingly.
  • Relevance – Personal data should be relevant to the processing in question, and the controller should be able to demonstrate this relevance.
  • Necessity – Each personal data category shall be necessary for the specified purposes and should only be processed if it is not possible to fulfil the purpose by other means.
  • Aggregation – Use aggregated data when possible.
  • Pseudonymisation – Pseudonymise personal data as soon as it is no longer necessary to have directly identifiable personal data, and store identification keys separately.
  • Anonymisation and deletion – Where personal data is not, or is no longer, necessary for the purpose, it shall be anonymised or deleted.
  • Data flow – The data flow should be made efficient enough to not create more copies than necessary.
  • “State of the art” – Consider applying up-to-date and appropriate technologies for data avoidance and minimisation, like encryption, differential privacy, VPNs, cryptographic hash functions, etc.

While some of these data minimisation measures may be described elsewhere in the document, it is helpful to list them all in one place to make them easy to find and to understand as a whole.
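As an illustration of the aggregation measure for location data, the Python sketch below coarsens GPS coordinates to a grid and counts points per cell instead of keeping exact positions. The coordinates are made up, and the choice of two decimals (roughly 1 km, since 0.01 degrees of latitude is about 1.1 km) is an assumption; the right precision depends on the research question, and cells with very few points may still be identifying.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumption: two decimals gives grid cells of roughly 1 km.
GRID_DECIMALS = 2


def to_grid_cell(lat: float, lon: float, decimals: int = GRID_DECIMALS) -> Tuple[float, float]:
    """Round a coordinate pair to the chosen grid precision."""
    return (round(lat, decimals), round(lon, decimals))


def aggregate(points: Iterable[Tuple[float, float]]) -> Counter:
    """Count how many GPS points fall into each grid cell."""
    return Counter(to_grid_cell(lat, lon) for lat, lon in points)


# Hypothetical GPS points (latitude, longitude).
points = [
    (23.81031, 90.41252),
    (23.81049, 90.41198),
    (23.77702, 90.39945),
]

counts = aggregate(points)
for cell, n in counts.items():
    print(cell, n)
```

Only the per-cell counts would then be used in analysis or publication, while the exact tracks remain subject to the stricter access and retention rules.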

Pseudonymisation: Each interview participant will be assigned a unique (first) pseudonym at the start of data collection (interviews, focus groups) to identify them in the research dataset. Participants’ real names are never associated with the research dataset. Rather, participants’ contact information is kept separate from the research data, and pseudonyms identify participants within the research dataset.

The GPS data contain georeferenced coordinates of the participants on the survey days. These GPS tracks will only be accessible to and used by researchers with training in protecting geodata confidentiality. They will be used only for the specific purpose of addressing the objectives of the study and will not be disseminated. GPS information is associated with a second pseudonym. A look-up table (reidentification key) that links the first and second pseudonyms will be kept and managed by the PI and stored separately, as will the encryption key. GPS data will be encrypted and stored separately from all other data. In the unlikely case that the database is hacked, this separation of pseudonymised GPS data from other data will reduce the risk of re-identification.
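The two-pseudonym scheme described in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the pseudonym format (`P-` and `G-` prefixes with random hexadecimal suffixes), the function names and the participant labels are all hypothetical, and encryption and secure storage of the key files are left out.

```python
import secrets
from typing import Dict, Iterable, Set, Tuple


def new_pseudonym(prefix: str, existing: Set[str]) -> str:
    """Draw a random pseudonym that is not already in use."""
    while True:
        candidate = f"{prefix}-{secrets.token_hex(4)}"
        if candidate not in existing:
            existing.add(candidate)
            return candidate


def build_keys(participants: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (contact_key, gps_key).

    contact_key maps a real name to the first pseudonym and stays with the
    contact information. gps_key maps the first pseudonym to the second one
    and is the reidentification key that is stored separately.
    """
    used: Set[str] = set()
    contact_key: Dict[str, str] = {}
    gps_key: Dict[str, str] = {}
    for name in participants:
        first = new_pseudonym("P", used)   # used in the research dataset
        second = new_pseudonym("G", used)  # used only with the GPS data
        contact_key[name] = first
        gps_key[first] = second
    return contact_key, gps_key


# Hypothetical participants.
contact_key, gps_key = build_keys(["Participant One", "Participant Two"])
print(sorted(gps_key.items()))
```

Because the pseudonyms are random rather than derived from names, someone who obtains only the GPS dataset cannot link it to the other research data without the separately stored `gps_key`.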

In the case of photographs taken by participants of their spatial surroundings and transport infrastructure, participants will be explicitly instructed to avoid capturing other people in the picture without their knowledge/consent. In the unlikely case that people are present in the pictures, one or more of the following measures will be taken, depending on the framing of the picture: (1) faces will be blurred, (2) the person will be cropped out, and/or (3) the picture will not be used and will be permanently deleted.
