What to Preserve and What to Delete
What is Archiving?
Archiving research data is the practice of identifying data that is directly or indirectly related to a research project or program and is no longer active, and transferring it from research computers to a long-term storage system such as the Data Archive Geoscience (DAG), an internal archive of the Faculty of Geosciences. You can learn more about the DAG on its website. Storing data in an archive protects it from loss and unauthorized access, and it frees up local hard drive space. Archived data can also be downloaded and brought back into a research project.
Good open data and research integrity principles suggest archiving research data for at least 10 years, sometimes longer. Once archived, data should not be deleted until that retention period has ended.
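As a small illustration of the retention rule, the sketch below computes the earliest date on which an archived dataset could be deleted. The function names and the handling of 29 February are assumptions made for this example; only the 10-year minimum comes from the text, and your project or funder may require a longer period.

```python
from datetime import date

# Minimum retention period named in the text; some data must be kept longer.
MIN_RETENTION_YEARS = 10


def earliest_deletion_date(archived_on: date,
                           retention_years: int = MIN_RETENTION_YEARS) -> date:
    """Return the first date on which the archived data may be deleted."""
    target_year = archived_on.year + retention_years
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # Archived on 29 February and the target year is not a leap year:
        # fall back to 1 March so the full period is always covered.
        return date(target_year, 3, 1)


def may_delete(archived_on: date, today: date,
               retention_years: int = MIN_RETENTION_YEARS) -> bool:
    """True once the retention period for the archived data has ended."""
    return today >= earliest_deletion_date(archived_on, retention_years)
```

For example, data archived on 15 June 2020 with the default period could be deleted from 15 June 2030 onwards, not before.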
Start archiving once a dataset will no longer be changed for the research project itself and can be considered static. At the very latest, archive it by the end of the project or when a lead researcher or data collector leaves the project or institute, whichever comes first.
Contact the Data Team for help with archiving and preserving your data.
Data Selection
During your research you may accumulate a lot of data, some of which will be eligible for archiving. It is impossible to preserve all data indefinitely. Archiving all digital data brings high costs, both for the storage itself and for maintaining and managing an ever-growing volume of data and associated metadata, and it may also reduce discoverability. For those reasons, it is crucial that you carefully select data for preservation.
Where possible, the original (primary/raw) data should be archived, together with the code, processing scripts, or processing instructions needed to access and use the data. Next in priority are permanent enriched data files that are derived from the primary data and can be used for analysis as described in the methodology section of the research. Finally, deposit the results of data analysis that substantiate the findings described in research articles, papers, or theses.
For maintenance purposes and to ensure long-term accessibility, data files should preferably be archived in ‘sustainable’ file formats that follow the FAIR guidelines, where possible. A list of sustainable file formats can be found here, or you can contact your data steward for help finding an open and sustainable file format.
What to preserve:
- Original (primary / raw) data
- Enriched data files
- Cleaned / prepared data used as input
- Code / scripts
- Analysis results
- Visualizations / maps / graphs
What to delete:
- Redundant or duplicate data
- Temporary byproducts that are irrelevant for future use
- Data that is privacy-sensitive under the GDPR/AVG, such as consent forms, voice recordings, transcripts, DNA data, or any other data that contains information on specific people
- Data containing state secrets
- Commercially sensitive data, where long-term preservation would breach contractual arrangements with your consortium partners or other parties involved
- Data that was licensed or purchased by the UU and is not the intellectual property of the UU. Intellectual property includes copyrights, patents, and trademarks.
In preparing your dataset for archiving, the first step is to determine which parts of your data are sensitive or highly sensitive, so that they can be separated from the other data. Data that you are contractually obliged to delete, temporary data, and incomplete data should also be left out of the data package that will be archived.
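Redundant or duplicate files are one category that can be found with a script. The sketch below groups files with identical content by comparing SHA-256 checksums. It is only an aid for the selection step: the function names are made up for this example, and deciding which copy to keep, and whether a file is sensitive, still requires your own judgment.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are identical."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Only checksums shared by two or more files indicate duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("my_project"))`, where `my_project` is a placeholder for your own project folder, returns a list of groups; each group holds the paths of files with the same content. The script only reports duplicates and does not delete anything.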
What to do after data selection?
Once you have decided which data you want to keep and which data you no longer need, the next step is to secure the data you want to keep and delete the rest.
As for the data you want to keep, move it to a facility where it can be preserved long-term. Besides various repositories, you can also store the data in the faculty data archive DAG; see the DAG website mentioned above for more information. Do not leave the data in its current location, which is meant for active data from current and future research projects.
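Before removing data from its current location, it can be worth checking that the archived copy is complete. The sketch below records a checksum for every file in the data package and compares it with a copy. It assumes the copy is reachable as an ordinary folder; how data is actually transferred to the DAG or another repository is not covered here, and the function names are illustrative only.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_copy(source_manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return relative paths that are missing or altered in the copy."""
    copy_manifest = build_manifest(copy_root)
    return sorted(
        name
        for name, checksum in source_manifest.items()
        if copy_manifest.get(name) != checksum
    )
```

An empty list from `verify_copy` means every file in the original package is present and unchanged in the copy. Saving the manifest alongside the data also lets a later user check that nothing was altered while in storage.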
As for the data you no longer need, move it (temporarily) to the recycle bin or delete it permanently straight away. Do not keep a backup copy of this data on your computer, cloud service, or an external device, so that it does not take up unnecessary storage space on these platforms (or in your head). This also removes the risk that the data resurfaces unintentionally or unexpectedly.