Geo data – support for researchers

What is Research Data?

Data, in its simplest form, refers to information that can be collected, stored, and analyzed. It can take various forms, including numbers, text, images, and more. 

Virtually anything can be data: 

  • Measurements 
  • Simulations 
  • Books 
  • Transactions 
  • Diaries 
  • Musical scores 
  • Algorithms 
  • X-Rays 
  • Historical logs 
  • Recipes 
  • Geographies 
  • etcetera

It is common to think of data as only hard numbers or categories, but data includes much more. So much can be considered data that it is hard to explain without looking at specific types, so we found a couple of short YouTube videos that explain the types of data succinctly:

  • What is Data? – Univ of Houston
  • What is Data? – Univ of Guelph

While “data” is a broad term encompassing any form of information, “research data” is a more specialized subset specifically collected and analyzed within the structured context of a research project to address specific research questions or objectives. The table below summarizes the key differences between data and research data. 

Aspect | Data | Research Data
Purpose | Various purposes, from personal information to business metrics | Aim of answering research questions, testing hypotheses, or contributing to scholarly knowledge
Collection Process | Can be collected in various ways | Collected using systematic research methods tailored to the study’s objectives
Context | Exists in numerous contexts | Inherently linked to the research process, contributing to the academic or scientific exploration
Application | Widely applicable in everyday life, business decision-making, technology, and more | Specifically used for academic purposes, contributing to the advancement of scientific knowledge

Qualitative vs Quantitative

Quantitative data consists of numerical measurements and quantifiable variables, and is typically expressed as numbers and statistics. Qualitative data, on the other hand, comprises non-numerical information such as text, images, or observations. Some data is purely quantitative or purely qualitative, but many kinds of data have both aspects.

This short video explains qualitative and quantitative data in more detail: Qualitative and Quantitative Data – Nucleus Biology

Data Type | Quantitative Aspects | Qualitative Aspects
Measurements | The numerical values measured | Methods used to operate the machine
Simulations | Numerical results of the simulation | Who wrote the simulation software?
Books | How many times each word is used; Number of chapters | What kind of narration style used?; Motives and relationships between the characters?
Transactions | How much money was transacted?; Dates of the transactions | What kinds of products were purchased?; Payment method
Diaries | The date range in the diary | What was the author doing on a given date?
Musical Scores | How many key changes?; What are the frequencies of the harmonies? | What is the cultural context of the music score?; What kind of instrumentation and orchestration has been used?
Algorithms | How many lines of code does it take to implement this algorithm?; Performance metrics | What language was used to write the algorithm?; What biases are captured in the algorithm
X-Rays | The amount of X-Ray energy captured by the sensor or film | Medical diagnoses that can be determined from the X-Ray
Historical Logs | Measurements (temperature, number of sunspots, transactions); Skew of the measurement device to modern standards | Who recorded the logs?; What equipment was used for the logs
Recipes | How much of each ingredient is used; Cooking time and temperature | What ingredients are used?; Units used to describe time and temperature
Geographies | Coordinates of features | Types of features studied
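
To make the "Books" row more concrete, here is a minimal Python sketch of how one quantitative aspect (how many times each word is used) could be derived from a text. The sample sentence and the function name word_frequencies are made up for illustration; a real analysis would load the full text of a book and would probably need more careful handling of punctuation and language.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the contents of a book
sample_text = "The sea was calm. The ship sailed on, and the crew sang."


def word_frequencies(text: str) -> Counter:
    """Count how often each word occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


frequencies = word_frequencies(sample_text)

# Show the three most frequent words with their counts
print(frequencies.most_common(3))
```

The qualitative aspects of the same book, such as narration style or the motives of the characters, cannot be counted this way and call for interpretation by a reader.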

Primary vs Secondary

The distinction between primary and secondary data lies in their origin and the method through which they are collected.

Primary data is collected by the researcher directly from the source. It can include data gathered through surveys, experiments, interviews, or observation. Researchers collect these data for the specific purpose of addressing the research question at hand. The focus on collecting primary data ensures that the data is current and highly relevant to the topic.

Secondary data is collected by someone other than the researcher. It can include data from sources such as government reports, academic journals, or industry publications. This data tends to be less specific to the research question, but it can also be more extensive, providing broader context to a research area. Secondary data is often used to supplement or support primary data or to provide context for a research project.

We also found a short video on YouTube which explains the differences between primary and secondary data:

Primary and Secondary Data – Prof. Essa

Data vs. Statistics 

While the terms ‘data’ and ‘statistics’ are often used interchangeably, there is an important distinction between them.   

Data are individual pieces of factual information recorded and used for the purpose of analysis. They are the raw information from which statistics are created. Statistics are the results of data analysis: the interpretation and presentation of the data. In other words, some computation has taken place that provides some understanding of what the data means. Statistics are often, though not necessarily, presented in the form of a table, chart, or graph.
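
The step from data to statistics can be illustrated with a short Python sketch. The temperature readings and the function name summarize are invented for this example. The list holds the raw data, and the computed summary holds the statistics.

```python
import statistics

# Hypothetical raw data: five daily temperature readings in degrees Celsius
temperatures_c = [12.5, 14.0, 13.2, 15.8, 11.9]


def summarize(values):
    """Compute a few descriptive statistics from a list of raw numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize(temperatures_c)

# Print each statistic on its own line
for name, value in summary.items():
    print(f"{name}: {value}")
```

The individual readings say little on their own; the mean, median and range are what a report would typically present.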

Both statistics and data are frequently used in research. Statistics are often reported by government agencies – for example, unemployment statistics or educational literacy statistics. Often these types of statistics are referred to as ‘statistical data’. 

Difference between Data and Statistics – Univ of Guelph

What is Personal Data? 

Data becomes personal data when it is collected from, linked to or related to a living individual. What makes data personal depends on the context and the content of the data. Data such as a phone number, age or height does not become personal until it is linked to an individual. A date like “12 December 1980” becomes personal data if the context indicates that it is an individual’s birthday.

The individual linked to the data does not need to be identified; they only need to be identifiable. This means that data is still personal even if the identity of the linked individual is not known. For example, the number assigned to a customer in a supermarket loyalty program is considered personal data even if it is not linked to the customer’s name, address or any other identifying information. The individual using this number is still identifiable: because the unique number is used exclusively to track their shopping profile, they can be singled out from the pool of other customers in the loyalty program. Therefore, all the information captured, such as the customer number and the shopping data, is considered personal data.

Personal data can become anonymous when its content refers to people as a group rather than individually, so that it is no longer possible to identify a single individual from the group. In the previous example, customers are identifiable because of the uniqueness of the customer number and of their shopping profile. If this ‘uniqueness’ is removed, by removing the customer number and aggregating the shopping profiles of all customers, the whole dataset can be considered anonymous.
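
The loyalty-program example can be sketched in Python as follows. The records, customer numbers and the function name aggregate_by_category are hypothetical. This is a minimal illustration of dropping the identifier and aggregating, not a complete anonymization procedure; with very few customers, group totals could still point to an individual.

```python
from collections import defaultdict

# Hypothetical loyalty-program records:
# (customer number, product category, amount spent)
records = [
    ("C001", "bakery", 4.50),
    ("C001", "dairy", 2.00),
    ("C002", "bakery", 3.00),
    ("C003", "dairy", 5.00),
]


def aggregate_by_category(rows):
    """Drop the customer number and sum spending per product category."""
    totals = defaultdict(float)
    for _customer_number, category, amount in rows:
        # The customer number is deliberately ignored, so the result
        # describes the group of customers, not any single one of them.
        totals[category] += amount
    return dict(totals)


group_totals = aggregate_by_category(records)
print(group_totals)
```

The aggregated result contains only category totals, so the individual shopping profiles that made each customer unique are no longer present.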

You can learn more about personal and sensitive data on our website:

Personal Data