Geo data – support for researchers

Metadata Standards in the Geosciences

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called “data about data” or “information about information”. It ensures that the context in which your data was created, analysed, and stored is clear and detailed, and therefore that your work is reproducible.

Good metadata enables you to understand, use, and share your own data now and in the future, and helps other researchers discover, access, use, repurpose, and cite your data in the long term. It also facilitates long-term archival preservation of the data.

There are many different metadata standards, but they all generally seek to answer the following questions: 

  • What is this data? 
  • Who created it? 
  • When was this created? 
  • What time period does this data cover? 
  • Where was this data created? 
  • What area does this data cover? 
  • How was this data created? 

Providing metadata within the framework of a metadata standard makes your metadata both machine readable and human readable (through metadata viewers and editors) and, by extension, makes your data more FAIR. By publishing metadata with your data, your data becomes more Findable, because data search engines can index it, and more Accessible, because more people can reach it. Using a metadata standard also makes your data more Interoperable across systems that use the same standard.

In summary, providing metadata within the framework of a metadata standard makes your data more FAIR:

  • More Findable because you are using a machine-readable standard that search engines can index
  • More Accessible because your data and metadata will be searchable 
  • More Interoperable because you have used a standard which is inherently interoperable 
  • More Re-usable because more people will be able to understand and work with your data 

Common Metadata Elements

In the following list you can find core metadata elements commonly found in metadata standards: 

Field: Description
Title: A very brief, concise description of the data that contains the most important keywords, ideally between 50 and 60 characters long.
Creator(s) & Contributor(s): The persons and/or organizations that created the dataset, and any other persons and/or organizations that contributed to it.
Topic or Subject: In many cases a single selection (e.g. Geosciences – Earth Science); other related topics or subjects are added under keywords.
Description or Abstract: A long-form description of the dataset covering aspects that are important but not captured by the other metadata fields.
Collection Period: The time period in which the data was collected.
Time Coverage: The time period the data covers (such as for historic data).
Geographic Location: Normally a description of a location (Uithof, Utrecht, Utrecht Province, the Netherlands), geographic coordinates, or a bounding box.
References: Other datasets or materials used to create, or that influenced the creation of, the dataset.
Funding Information: Who funded the project and how.
Language: The language of the data in the dataset, given either as the full name of the language (English, Nederlands) or as an ISO 639 language code (en, eng, nl, nld).
License: How the data is released for re-use: with copyright retained, under a Creative Commons license, or under another license. See our pages on code and data licenses.
Data Access Permissions: How others can access the dataset. For Open Science and Open Data we suggest releasing under open access. Open access is not always possible, for example when the data is the intellectual property of other entities or contains sensitive data such as personal data or state secrets.
Size of the Dataset: How large the dataset is in standard byte units: kilobytes (KB), megabytes (MB), gigabytes (GB), or terabytes (TB).
File Formats: The file formats contained within the dataset.
Used Standards: Industry or field-of-study standards used in the creation of the dataset. Consider using or publishing standards with protocols.io.
Methods: Methods used to create the dataset. This can include collection tools and methods, processing standards and variables, export settings, etc.
Keywords: Individual words or short phrases used to describe the dataset. Consider adding keywords that cover these topics:

  • Major themes
  • Methodologies
  • Geographies
  • Software Used
  • File Types
  • Equipment Used

Keywords are one of the most efficient ways to make your dataset more findable, so provide as many as you can.

Citations: How the dataset should be cited in other literature. For best portability, provide the citation in BibTeX format so that citation management software can easily translate it into the styles used by different publications.
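As a rough illustration of how these elements fit together, the sketch below collects a few of them in a Python dictionary, writes them out as JSON, and formats a minimal BibTeX entry. The field names, the citation key, and all values are hypothetical placeholders and do not follow any particular metadata standard.

```python
import json

# Hypothetical dataset description using some of the common elements.
# Field names and values are illustrative, not taken from a specific standard.
record = {
    "title": "Example soil moisture measurements, Utrecht Science Park",
    "creators": ["Example, Researcher"],
    "publisher": "Example Repository",
    "publication_year": 2023,
    "collection_period": {"start": "2023-04-01", "end": "2023-09-30"},
    "geographic_location": "Uithof, Utrecht, Utrecht Province, the Netherlands",
    "language": "en",  # ISO 639-1 code for English
    "license": "CC-BY-4.0",
    "file_formats": ["csv"],
    "keywords": ["soil moisture", "field measurements", "Utrecht"],
}


def to_bibtex(key, rec):
    """Format a minimal BibTeX @misc entry from the record."""
    authors = " and ".join(rec["creators"])
    lines = [
        f"@misc{{{key},",
        f"  author = {{{authors}}},",
        f"  title = {{{rec['title']}}},",
        f"  year = {{{rec['publication_year']}}},",
        f"  howpublished = {{{rec['publisher']}}},",
        "}",
    ]
    return "\n".join(lines)


# Serialize the record so it can be stored or shared alongside the data.
metadata_json = json.dumps(record, indent=2)
bibtex_entry = to_bibtex("example2023", record)
print(metadata_json)
print(bibtex_entry)
```

The `@misc` entry type is used here only because it is widely supported; a repository may suggest a different entry type or additional fields such as a DOI.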

Metadata Standards

Metadata standards are generic or domain-specific sets of metadata elements that have been accepted as a standard. Here we introduce two of the most used generic standards, the GIS metadata standards, and several photo metadata standards, and show how they are implemented.

Dublin Core is considered the baseline for metadata: it answers all the basic questions about any sort of dataset. It is used widely in libraries all over the world to catalog the books, magazines, journals, newspapers, CDs, DVDs, and other materials they collect.

Example Usage: UU Library 

Read more on Wikipedia
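To give an impression of what a Dublin Core record can look like, the sketch below writes a handful of Dublin Core elements as XML using Python's standard library. The element names (title, creator, date, coverage, language, rights) are among the fifteen core Dublin Core elements; the wrapping `metadata` element and all values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical values for a few Dublin Core elements.
fields = {
    "title": "Example soil moisture measurements",
    "creator": "Example, Researcher",
    "date": "2023-09-30",
    "coverage": "Utrecht, the Netherlands",
    "language": "en",
    "rights": "CC-BY-4.0",
}

# The root element name is arbitrary here; real systems define their own wrapper.
root = ET.Element("metadata")
for name, value in fields.items():
    ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value

dc_xml = ET.tostring(root, encoding="unicode")
print(dc_xml)
```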

DataCite is a metadata standard that aims to make datasets discoverable. It is the standard used by Utrecht University’s YoDa data platform, as well as by many other data repositories such as Zenodo.

Example Usage: YoDa – DataverseNL – Zenodo 

Read more on Wikipedia

You can write DataCite metadata for YoDa before you start working in YoDa. Find the standalone metadata editor here:

YoDa Metadata Editor
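If you prepare DataCite metadata outside an editor, it can help to check that the mandatory properties are present. The sketch below is loosely modelled on the six mandatory properties of DataCite version 4 (identifier, creator, title, publisher, publication year, resource type). The key names, the DOI, and all values are assumptions for illustration; the exact JSON layout used by YoDa or another repository may differ.

```python
# Keys loosely modelled on the mandatory DataCite 4 properties.
# The exact key names in a given repository's JSON are an assumption here.
MANDATORY = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)

# Hypothetical record; the DOI is a placeholder, not a registered identifier.
datacite_like = {
    "identifier": {"identifier": "10.1234/example-doi", "identifierType": "DOI"},
    "creators": [{"creatorName": "Example, Researcher"}],
    "titles": [{"title": "Example soil moisture measurements"}],
    "publisher": "Example Repository",
    "publicationYear": "2023",
    "resourceType": {"resourceTypeGeneral": "Dataset"},
}


def missing_mandatory(record):
    """Return the mandatory keys that are absent or empty in the record."""
    return [key for key in MANDATORY if not record.get(key)]


print(missing_mandatory(datacite_like))
```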

There are several standards for metadata in GIS. Internationally, ISO 19115 is used for geospatial metadata. There are also regional metadata standards: in the EU, member states are expected to follow the INSPIRE Directive when creating authoritative government datasets, and in the United States there is the Federal Geographic Data Committee (FGDC) metadata standard. Confusingly, INSPIRE metadata is an extension of ISO 19115, and another common standard, ISO 19139, is ISO 19115 expressed as an XML schema. If you write ISO 19115 metadata, you will be safe with either of these.

Example Usage: ArcGIS Online – ArcGIS Pro – QGIS – GeoServer 

Many GIS programs (such as ArcGIS or QGIS) will automatically create the technical metadata for you, such as: 

  • Coverage area 
  • How many records there are 
  • Coordinate System  
  • Attribute Field Data Types (Integer, String, Double, etc) 
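As an illustration of how this kind of technical metadata can be derived from the data itself, the sketch below computes a coverage area (bounding box), a record count, and attribute field types from a small list of point features. It is a simplified stand-in for what GIS software does, assuming longitude/latitude coordinates that do not cross the antimeridian; the feature structure and the function name are hypothetical.

```python
# Hypothetical point features with longitude/latitude and one attribute.
features = [
    {"lon": 5.16, "lat": 52.08, "site": "A"},
    {"lon": 5.18, "lat": 52.09, "site": "B"},
    {"lon": 5.17, "lat": 52.07, "site": "C"},
]


def summarize(feats):
    """Derive simple technical metadata from a list of point features."""
    lons = [f["lon"] for f in feats]
    lats = [f["lat"] for f in feats]
    return {
        "record_count": len(feats),
        # Bounding box as (min_lon, min_lat, max_lon, max_lat).
        "bounding_box": (min(lons), min(lats), max(lons), max(lats)),
        # Field types taken from the first feature only.
        "field_types": {name: type(value).__name__ for name, value in feats[0].items()},
    }


summary = summarize(features)
print(summary)
```

The coordinate system cannot be inferred from the numbers alone, which is one reason GIS formats store it explicitly.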

Metadata that you will need to fill in yourself:

  • Who created the dataset, including contact information 
  • Accuracy/precision of the data 
  • Equipment used to collect the data 
  • Data sources 
  • Attribute Field description  
  • Data update frequency 
  • Permissions/Licensing 

In the coming months we will have manuals on how to work with metadata on our manuals page:
GIS Manuals

There are four common photo metadata standards, with the most common and important being the EXIF format.  

EXIF

EXIF metadata provides information on the camera and its settings when the photo was taken, such as (but not limited to): 

  • Camera Manufacturer 
  • Camera Model 
  • Camera Serial 
  • Aperture 
  • Shutter Speed 
  • ISO 
  • Lens Model 
  • Lens Manufacturer 
  • Lens Zoom  
  • Bit Depth 
  • Dimensions 
  • GPS Location 
  • Altitude

Read more on Wikipedia
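EXIF stores a GPS position as degrees, minutes, and seconds, each written as a rational number (numerator and denominator), together with a reference letter (N, S, E, or W). The sketch below converts such values to decimal degrees, which is the form most GIS software expects. The function name and the sample coordinates are made up for this example; reading the raw values out of an image file is left to an EXIF library of your choice.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style ((deg_num, deg_den), (min_num, min_den), (sec_num, sec_den))
    plus a reference letter to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical GPS values as they might be stored in EXIF.
latitude = dms_to_decimal(((52, 1), (5, 1), (6, 1)), "N")
longitude = dms_to_decimal(((5, 1), (10, 1), (12, 1)), "E")
print(latitude, longitude)
```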

XMP

The second major metadata standard is XMP (Extensible Metadata Platform), which is a format for embedding many different metadata standards into an XML file. This standard is very extensible: it can include the EXIF metadata described above, the two metadata standards below (IPTC and DICOM), and custom tags that software commonly uses to record processing history. Processing history can include the steps taken to adjust an image from its capture state to a more stylized look.
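Because XMP is XML, it can be read with general-purpose XML tools. The sketch below parses a small, hand-written XMP packet and extracts the keywords stored in the Dublin Core `dc:subject` field. The packet content is a made-up example; real XMP files usually contain many more namespaces and properties.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly found in XMP packets.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical minimal XMP packet with two keywords.
xmp_packet = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:subject>
        <rdf:Bag>
          <rdf:li>outcrop</rdf:li>
          <rdf:li>sandstone</rdf:li>
        </rdf:Bag>
      </dc:subject>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

root = ET.fromstring(xmp_packet)
# Keywords are stored as list items inside an unordered rdf:Bag.
keywords = [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]
print(keywords)
```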

IPTC

Another photo metadata standard is the International Press Telecommunications Council (IPTC) Information Interchange Model, mostly referred to as IPTC metadata. This metadata standard is intended for news organizations to exchange information about images taken by photojournalists, and includes information such as:

  • Photographer name 
  • Photographer contact 
  • Organization 
  • Photo subject information (such as a model, person, or place) 
  • Keywords 
  • Job information (such as for a contracted photo shoot)
  • Credit information (how to cite the photo)

Read more on Wikipedia

DICOM

DICOM (Digital Imaging and Communications in Medicine) metadata is a standard for embedding patient information directly into a DICOM (.dicom) or JPEG-2000 (.jp2) file, so that images can be directly linked to a patient. This metadata is embedded directly in the medical imaging files or included in XMP metadata.

Find your Metadata Standard

Many researchers who publish their data to YoDa or Zenodo will use the DataCite v4 standard and will find the metadata editor built into those sites.

For GIS data, use the ISO 19115 standard, which is editable within popular GIS software such as QGIS and ArcGIS. If you find ISO 19139 metadata, it is the same standard as ISO 19115 but expressed as an XML schema; both standards are fully interoperable.
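To show what ISO 19139 XML looks like in practice, the sketch below reads the geographic bounding box from a small XML fragment. The element and namespace names follow the `gmd` and `gco` schemas used by ISO 19139; the fragment itself is a hand-written example with made-up coordinates, and a complete metadata file would wrap it in many more elements.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 metadata.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical fragment containing only the bounding box element.
iso_fragment = """<gmd:EX_GeographicBoundingBox
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:westBoundLongitude><gco:Decimal>5.16</gco:Decimal></gmd:westBoundLongitude>
  <gmd:eastBoundLongitude><gco:Decimal>5.18</gco:Decimal></gmd:eastBoundLongitude>
  <gmd:southBoundLatitude><gco:Decimal>52.07</gco:Decimal></gmd:southBoundLatitude>
  <gmd:northBoundLatitude><gco:Decimal>52.09</gco:Decimal></gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>"""

SIDES = (
    "westBoundLongitude",
    "eastBoundLongitude",
    "southBoundLatitude",
    "northBoundLatitude",
)


def read_bbox(xml_text):
    """Return the four bounding box values as floats, keyed by element name."""
    root = ET.fromstring(xml_text)
    return {
        side: float(root.find(f".//gmd:{side}/gco:Decimal", NS).text)
        for side in SIDES
    }


bbox = read_bbox(iso_fragment)
print(bbox)
```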

The Research Data Alliance maintains a directory of metadata standards, which you can find at the link below:

Research Data Alliance Metadata Catalog

What Does Metadata Look Like?

Metadata can be found primarily in two places: either embedded directly in a file, or in a separate file that sits alongside the data in the filesystem (also known as a sidecar file).

Common examples of embedded metadata can be found in the properties of your files in your file explorer, including the user that created the file, when it was created, when it was last edited, and how large it is. Another example is the EXIF metadata described above among the photo metadata standards. EXIF metadata is stored directly in JPEG, HEIF, and TIFF files, but generally not in PNG files, for which it is usually stored in an XMP sidecar file instead.
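The file properties shown in a file explorer can also be read programmatically. The sketch below creates a small temporary file and reads its size and last-modified time with Python's standard library. The file name and content are placeholders; which properties are available (for example the creating user) depends on the operating system.

```python
import os
import tempfile
from datetime import datetime, timezone

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    # Write bytes so the size is identical on every operating system.
    with open(path, "wb") as handle:
        handle.write(b"id,value\n1,42\n")

    info = os.stat(path)
    size_bytes = info.st_size
    # st_mtime is seconds since the epoch; convert it to a UTC timestamp.
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)

print(f"Size: {size_bytes} bytes")
print(f"Last modified: {modified.isoformat()}")
```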

Sidecar file examples include the DataCite metadata found when using YoDa, which exists as a metadata.json file. Another example is GIS metadata, which is commonly found in an .xml file with the same name as the dataset. Sidecar metadata files are normally in .xml or .json format, or in a file that uses one of these formats but has a different extension. Other sidecar metadata formats exist but are less common.
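A sidecar file can be as simple as a JSON file written next to the data file. The sketch below writes and reads such a file, assuming a naming convention in which the sidecar shares the data file's name but uses a .json extension. The function names, the convention, and the metadata values are hypothetical; repositories such as YoDa define their own file names and schemas.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path(data_file):
    """Assumed convention: same name as the data file, with a .json extension."""
    return data_file.with_suffix(".json")


def write_sidecar(data_file, metadata):
    """Write the metadata as JSON next to the data file and return its path."""
    target = sidecar_path(data_file)
    target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target


def read_sidecar(data_file):
    """Load the metadata stored in the data file's sidecar."""
    return json.loads(sidecar_path(data_file).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "samples.csv"
    data_file.write_text("id,value\n1,42\n", encoding="utf-8")

    written = write_sidecar(data_file, {"title": "Example samples", "language": "en"})
    sidecar_name = written.name
    loaded = read_sidecar(data_file)

print(sidecar_name, loaded)
```

Keeping the sidecar name tied to the data file name means the two can be matched again after they are copied or archived together.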