What is Research Data?
Data, in its simplest form, refers to information that can be collected, stored, and analyzed. It can take various forms, including numbers, text, images, and more.
Virtually anything can be data:
- Measurements
- Simulations
- Books
- Transactions
- Diaries
- Musical scores
- Algorithms
- X-Rays
- Historical logs
- Recipes
- Geographies
- etcetera
It is common to think of data as only hard numbers or categories, but data includes much more. So much can be considered data that it is hard to explain without looking at specific types. We found a couple of short videos on YouTube that explain the types of data succinctly:
- What is Data? Univ of Houston
- What is Data? Univ of Guelph
While “data” is a broad term encompassing any form of information, “research data” is a more specialized subset specifically collected and analyzed within the structured context of a research project to address specific research questions or objectives. The table below summarizes the key differences between data and research data.
Aspect | Data | Research Data |
---|---|---|
Purpose | Various purposes, from personal information to business metrics | Aim of answering research questions, testing hypotheses, or contributing to scholarly knowledge |
Collection Process | Can be collected in various ways | Collected using systematic research methods tailored to the study’s objectives |
Context | Exists in numerous contexts | Inherently linked to the research process, contributing to the academic or scientific exploration |
Application | Widely applicable in everyday life, business decision-making, technology, and more | Specifically used for academic purposes, contributing to the advancement of scientific knowledge |
Qualitative vs Quantitative
Quantitative data involves numerical measurements and quantifiable variables and is typically expressed in terms of numbers and statistics. Qualitative data, on the other hand, comprises non-numerical information, such as text, images, or observations. Some data is purely quantitative or purely qualitative, but many kinds of data have both aspects, as the table below shows.
For more explanation of qualitative and quantitative data, see this short video: Qualitative and Quantitative Data – Nucleus Biology
Data Type | Quantitative Aspects | Qualitative Aspects |
---|---|---|
Measurements | The numerical values measured | Methods used to operate the machine |
Simulations | Numerical results of the simulation | Who wrote the simulation software? |
Books | How many times each word is used<br>Number of chapters | What kind of narration style is used?<br>Motives and relationships between the characters |
Transactions | How much money was transacted?<br>Dates of the transactions | What kinds of products were purchased?<br>Payment method |
Diaries | The date range in the diary | What was the author doing on a given date? |
Musical Scores | How many key changes?<br>What are the frequencies of the harmonies? | What is the cultural context of the music score?<br>What kind of instrumentation and orchestration has been used? |
Algorithms | How many lines of code does it take to implement this algorithm?<br>Performance metrics | What language was used to write the algorithm?<br>What biases are captured in the algorithm? |
X-Rays | The amount of X-Ray energy captured by the sensor or film | Medical diagnoses that can be determined from the X-Ray |
Historical Logs | Measurements (temperature, number of sunspots, transactions)<br>Skew of the measurement device to modern standards | Who recorded the logs?<br>What equipment was used for the logs? |
Recipes | How much of each ingredient is used<br>Cooking time and temperature | What ingredients are used?<br>Units used to describe time and temperature |
Geographies | Coordinates of features | Types of features studied |
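To make the "Books" row more concrete, here is a minimal Python sketch of how the quantitative aspects of a text (word frequencies and number of chapters) could be extracted. The sample text, the function name `quantitative_aspects`, and the rule that a chapter heading is a line starting with "Chapter" are all assumptions made for illustration. The qualitative aspects, such as narration style, cannot be computed this simply and still require interpretation.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for a book (not from a real source).
SAMPLE_BOOK = """Chapter 1
The sea was calm. The ship sailed on.
Chapter 2
The storm came, and the ship turned back.
"""

def quantitative_aspects(text):
    """Return (word frequencies, chapter count) for a plain-text book."""
    lines = text.splitlines()
    # Assumption: a chapter heading is any line starting with "Chapter".
    chapter_count = sum(1 for line in lines if line.startswith("Chapter"))
    # Count words in the body only, ignoring the heading lines.
    body = " ".join(line for line in lines if not line.startswith("Chapter"))
    words = re.findall(r"[a-z']+", body.lower())
    return Counter(words), chapter_count

word_counts, chapters = quantitative_aspects(SAMPLE_BOOK)
print("Chapters:", chapters)
print("Most common words:", word_counts.most_common(2))
```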
Primary vs Secondary
The distinction between primary and secondary data lies in their origin and the method through which they are collected.
Primary data is collected by the researcher directly from the source. It can include data gathered through surveys, experiments, interviews, or observation. Researchers collect this data for the specific purpose of addressing the research question at hand. Collecting primary data helps ensure that the data is current and highly relevant to the topic.
Secondary data is collected by someone other than the researcher. It can include data from sources such as government reports, academic journals, or industry publications. This data tends to be less specific to the research question, but it can also be more extensive, providing broader context for a research area. Secondary data is often used to supplement or support primary data or to provide context for a research project.
We also found a short video on YouTube which explains the differences between primary and secondary data:
Primary and Secondary Data – Prof. Essa
Data vs. Statistics
While the terms ‘data’ and ‘statistics’ are often used interchangeably, there is an important distinction between them.
Data are individual pieces of factual information recorded and used for the purpose of analysis. They are the raw information from which statistics are created. Statistics are the results of data analysis: its interpretation and presentation. In other words, some computation has taken place that provides some understanding of what the data mean. Statistics are often, though not always, presented in the form of a table, chart, or graph.
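The distinction can be illustrated with a short Python sketch. The monthly figures below are invented for illustration and do not come from any real agency. The list holds the data (raw recorded values), while the computed summary holds the statistics derived from it.

```python
from statistics import mean, median

# Hypothetical raw data: number of unemployed people recorded each month.
unemployed_per_month = [1200, 1150, 1300, 1250, 1100, 1200]

# Statistics: the results of computation on the raw data.
summary = {
    "mean": mean(unemployed_per_month),
    "median": median(unemployed_per_month),
    "min": min(unemployed_per_month),
    "max": max(unemployed_per_month),
}

# Present the statistics as a simple table.
for name, value in summary.items():
    print(f"{name:<8}{value}")
```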
Both statistics and data are frequently used in research. Statistics are often reported by government agencies – for example, unemployment statistics or educational literacy statistics. Often these types of statistics are referred to as ‘statistical data’.
Difference between Data and Statistics – Univ of Guelph
What is Personal Data?
Data becomes personal data when it is collected from, linked to, or related to a living individual. Whether data is personal depends on its context and content. Data such as a phone number, age, or height does not become personal until it is linked to an individual. A date like “12 December 1980” becomes personal data if the context indicates that it is an individual’s birthday.
The individual linked to the data does not need to be identified; they only need to be identifiable. This means that data is still personal even if the identity of the linked individual is not known. For example, the number assigned to a customer in a supermarket loyalty program is considered personal data even if it is not linked to the customer’s name, address, or any other identifying information. The individual using this number is still identifiable: because the unique number is used exclusively to track their shopping profile, they can be singled out from the pool of other customers in the program. Therefore, all the information captured, such as the customer number and the associated shopping data, is considered personal data.
Personal data can become anonymous when its content refers to people as a group rather than individually, so that it is no longer possible to identify a single individual within the group. In the previous example, customers are identifiable because of the uniqueness of the customer number and of their shopping profile. If this uniqueness is removed, by deleting the customer number and aggregating the shopping profiles of all customers, the dataset as a whole can be considered anonymous.
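The loyalty-program example can be sketched in Python as follows. The records, the field layout, and the function name `aggregate_without_ids` are hypothetical and chosen only for illustration. The sketch drops the customer number and sums purchases across all customers, so the output describes the group rather than any individual. Note that in practice, aggregates over very small groups may still allow someone to be singled out, so this is a simplified illustration rather than a complete anonymization procedure.

```python
from collections import defaultdict

# Hypothetical loyalty-program records:
# (customer_number, product, amount_spent_in_cents)
purchases = [
    ("C001", "bread", 250),
    ("C001", "milk", 120),
    ("C002", "bread", 250),
    ("C003", "coffee", 400),
    ("C003", "milk", 120),
]

def aggregate_without_ids(records):
    """Discard customer numbers and aggregate purchases per product."""
    totals = defaultdict(lambda: {"purchases": 0, "total_spent_cents": 0})
    for _customer_number, product, amount in records:
        # The customer number is deliberately ignored, so no output row
        # can be traced back to a single customer's shopping profile.
        totals[product]["purchases"] += 1
        totals[product]["total_spent_cents"] += amount
    return dict(totals)

anonymous_summary = aggregate_without_ids(purchases)
for product, stats in sorted(anonymous_summary.items()):
    print(product, stats)
```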
You can learn more about personal and sensitive data on our website:
Personal Data