Overview
Metadata inferred during the templating process should be validated
by the user and missing info added. Use spreadsheet and text editors for
this process. Template specific guides are listed below.
NOTES:
- Tabular templates: Leave empty cells blank, don’t fill with
NAs.
- Free-text templates: Keep template content simple. Complex
formatting can lead to errors.
abstract (.docx, .md, .txt)
Describes the salient features of a dataset in a concise summary much
like an abstract does in a journal article. It should cover what the
data are and why they were created.
Example
methods (.docx, .md, .txt)
Describes the data creation methods. Includes enough detail for
future users to correctly use the data. Lists instrument descriptions,
protocols, etc.
Example
keywords.txt
Describes the data in a small set of terms. Keywords facilitate
search and discovery on scientific terms, as well as names of research
groups, field stations, and other organizations. Using a controlled
vocabulary or thesaurus vastly improves discovery. We recommend using
the LTER
Controlled Vocabulary when possible.
Columns:
-
keyword One keyword per line
-
keywordThesaurus URI of the vocabulary from which
the keyword originates.
Example
personnel.txt
Describes the personnel and funding sources involved in the creation
of the data. This facilitates attribution and reporting.
Columns:
-
givenName First name
-
middleInitial Middle initial
-
surName Last name
-
organizationName Organization the person belongs
to
-
electronicMailAddress Email address
-
userId Persons research identifier (e.g. ORCID). Links a persons research profile
to a data publication.
-
role Role of the person with respect to the data.
Persons serving more than one role are listed on separate lines
(e.g. replicate the persons info on separate lines but change the role.
Valid options:
-
creator Author of the data. Will appear in the data
citation.
-
PI Principal investigator the data were created
under. Will appear with project level metadata.
-
contact A point of contact for questions about the
data. Can be an organization or position (e.g. data manager). To do
this, enter the organization or position name under givenName
and leave middleInitial and surName empty.
- Other roles (e.g. Field Technician) will be listed as associated
parties to the data.
- Funding information is listed with PIs
-
projectTitle Title of project the data were created
under. If ancillary projects were involved, then add as new lines below
the primary project with the PIs info replicated.
-
fundingAgency Agency the project was funded
by.
-
fundingNumber Grant or award number.
Example
intellectual_rights.txt
Describes how the data may be used. Releasing without restriction (CC0) or
with minimal attribution (CC BY) maximizes
value and future use.
Example
attributes_*.txt
Describes columns of a data table (classes, units, datetime formats,
missing value codes).
Columns:
-
attributeName Column name
-
attributeDefinition Column definition
-
class Column class. Valid options are:
-
numeric Numeric variable
-
categorical Categorical variable
(i.e. nominal)
-
character Free text character strings
(e.g. notes)
-
Date Date and time variable
-
unit Column unit. Required for numeric
classes. Select from EML’s standard unit dictionary, accessible with
view_unit_dictionary()
. Use values in the “id” column. If
not found, then define as a custom unit (see custom_units.txt).
-
dateTimeFormatString Format string. Required for
Date classes. Valid format string components are:
-
Y Year
-
M Month
-
D Day
-
h Hour
-
m Minute
-
s Second Common separators of format string
components (e.g. - / :) are supported.
-
missingValueCode Missing value code. Required for
columns containing a missing value code).
-
missingValueCodeExplanation Definition of missing
value code.
Example
1, Example
2
custom_units.txt
Describes non-standard units used in a data table attribute
template.
Columns:
-
id Unit name listed in the unit column of the table
attributes template (e.g. feetPerSecond)
-
unitType Unit type (e.g. velocity)
-
parentSI SI equivalent (e.g. metersPerSecond)
-
multiplierToSI Multiplier to SI equivalent
(e.g. 0.3048)
-
description Abbreviation (e.g. ft/s)
Example
catvars_*.txt
Describes categorical variables of a data table (if any columns are
classified as categorical in table attributes template).
Columns:
-
attributeName Column name
-
code Categorical variable
-
definition Definition of categorical variable
Example
1, Example
2
geographic_coverage.txt
Describes where the data were collected.
Columns:
-
geographicDescription Brief description of
location.
-
northBoundingCoordinate North coordinate
-
southBoundingCoordinate South coordinate
-
eastBoundingCoordinate East coordinate
-
westBoundingCoordinate West coordinate
Coordinates must be in decimal degrees and include a minus sign (-)
for latitudes south of the equator and longitudes west of the prime
meridian. For points, repeat latitude and longitude coordinates in
respective north/south and east/west columns.
Example
taxonomic_coverage.txt
Describes biological organisms occuring in the data and helps resolve
them to authority systems. If matches can be made, then the full
taxonomic hierarchy of scientific and common names are automatically
rendered in the final EML metadata. This enables future users to search
on any taxonomic level of interest across data packages in
repositories.
Columns:
-
taxa_raw Taxon name as it occurs in the data and as
it will be listed in the metadata if no value is listed under the
name_resolved column. Can be single word or species binomial.
-
name_type Type of name. Can be “scientific” or
“common”.
-
name_resolved Taxons name as found in an authority
system.
-
authority_system Authority system in which the
taxa’s name was found. Can be: “ITIS”, “WORMS”, “or”GBIF“.
-
authority_id Taxa’s identifier in the authority
system (e.g. 168469).
Example
provenance.txt
Describes source datasets. Explicitly listing the DOIs and/or URLs of
input data help future users understand in greater detail how the
derived data were created and may some day be able to assign attribution
to the creators of referenced datasets.
Provenance metadata can be automatically generated for supported
repositories simply by specifying an identifier (i.e. EDI) in the
systemID column. For unsupported repositories, the systemID column
should be left blank.
Columns:
-
dataPackageID Data package identifier. Supplying a
valid packageID and systemID (of supported systems) is all that is
needed to create a complete provenance record.
-
systemID System (i.e. data repository) identifier.
Currently supported systems are: EDI (Environmental Data Initiative).
Leave this column blank unless specifying a supported system.
-
url URL linking to an online source (i.e. data,
paper, etc.). Required when a source can’t be defined by a packageID and
systemID.
-
onlineDescription Description of the data source.
Required when a source can’t be defined by a packageID and
systemID.
-
title The source title. Required when a source
can’t be defined by a packageID and systemID.
-
givenName A creator or contacts given name.
Required when a source can’t be defined by a packageID and
systemID.
-
middleInitial A creator or contacts middle initial.
Required when a source can’t be defined by a packageID and
systemID.
-
surName A creator or contacts middle initial.
Required when a source can’t be defined by a packageID and
systemID.
-
role “creator” and “contact” of the data source.
Required when a source can’t be defined by a packageID and systemID. Add
both the creator and contact as separate rows within the template, where
the information in each row is duplicated except for the givenName,
middleInitial, surName (or organizationName), and role fields.
-
organizationName Name of organization the creator
or contact belongs to. Required when a source can’t be defined by a
packageID and systemID.
-
email Email of the creator or contact. Required
when a source can’t be defined by a packageID and systemID.
Example
annotations.txt
Adds semantic meaning to metadata (variables, locations, persons,
etc.) through links to ontology terms. This enables greater human
understanding and machine actionability (linked data) and greatly
improves the discoverability and interoperability of data in
general.
Columns:
-
id A unique identifier for the element being
annotated.
-
element The element being annotated.
-
context The context of the subject (i.e. element
value) being annotated (e.g. If the same column name occurs in more than
one data tables, you will need to know which table it came from.).
-
subject The element value to be annotated.
-
predicate_label The predicate label (a.k.a.
property) describing the relation of the subject to the object. This
label should be copied directly from an ontology.
-
predicate_uri The predicate label URI copied
directly from an ontology.
-
object_label The object label (a.k.a. value)
describing the subject. This label should be copied directly from an
ontology.
-
object_uri The object URI copied from an
ontology.
Example
additional_info (.docx, .md, .txt)
Ancillary info not captured by any of the other templates.
Example