1 Introduction
This document contains current ‘Best Practice’ recommendations for the content of EML metadata describing ecological and environmental data. It is intended to augment the EML schema documentation (Jones et al. 2019) for a less-technical audience. The current version (v3, 2017) is one of several resources available to EML preparers. These recommendations are directed towards the following goals:
- Provide guidance and clarification in the implementation of EML for datasets
- Minimize heterogeneity of EML documents to simplify development and re-use of software built to ingest it
- Maximize interoperability of EML documents to facilitate data synthesis
At the time of this document’s publication (late 2017), the version of EML in production was EML 2.1.1. EML 2.2.0 is anticipated within the next year. Contact EDI for more information.
1.1 History
EML Best Practice recommendations have evolved over time. The most active contributors have been members of the LTER Information Managers Committee in multiple working groups and workshops. EML has been widely used for several years with multiple applications written against it, and the community has had the opportunity to observe the consequences of many content patterns. As much as possible, recommendations have been aligned with those experiences, as well as with the capability of data contributors.
Timeline and Previous Revisions
- 2017 Best Practices for Dataset Metadata in EML v3 (this document)
- 2016 EDI inception, see http://edirepository.org
- 2011 EML Best Practices for LTER sites v2
- 2008 EML 2.1 release
- 2004 EML Best Practices for LTER sites
- 2003 LTER adopts EML as network exchange standard
Contributors, including members of the LTER EML Best Practices Working Groups and workshops in 2003, 2004, and 2010 (alphabetical order):
- Dan Bahauddin
- Barbara Benson
- Emery Boose
- James Brunt
- Duane Costa
- Corinna Gries
- Don Henshaw
- Margaret O’Brien
- Ken Ramsey
- Inigo San Gil
- Mark Servilla
- Wade Sheldon
- Philip Tarrant
- Theresa Valentine
- John Vande Castle
- Kristin Vanderbilt
- Jonathan Walsh
- Yang Xia
1.2 General Recommendations
The following are general best practices for handling EML dataset metadata:
1.2.1 Metadata Distribution
Do not publicly distribute EML documents containing elements with incorrect information, e.g., as a workaround for missing metadata or to meet validation requirements. Pre-publication drafts and EML produced for demonstration or testing purposes should be clearly identified as such and not contributed to public archives, because archive contents are passed on to large-scale clearinghouses. To preview drafts or to handle test and demonstration data packages, consult your repository to learn about options.
1.2.2 Data Package Identifiers
Metadata and data set versioning are controlled by the contributor, and so identifiers are tied to local systems. Many repository systems that accept EML-described data support principles of immutable metadata and data entity versioning. EML has elements to contain package identifiers, although these may also be assigned externally. It is the responsibility of the submitters to understand the practices of their intended repository when using identifiers.
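As an illustration of where identifiers can appear in an EML document, the following minimal Python sketch reads the `packageId` and `system` attributes of the root element and collects any `<alternateIdentifier>` values. The function name `package_identifiers` is hypothetical and chosen for illustration only. How these values are assigned, and whether they are overwritten on submission, depends on the repository.

```python
import xml.etree.ElementTree as ET


def package_identifiers(eml_text):
    """Return (packageId, system, [alternateIdentifier values]) from EML text.

    Hypothetical helper for illustration; it does not validate the document.
    """
    root = ET.fromstring(eml_text)
    alternates = [
        el.text.strip()
        for el in root.iter()
        # Compare local names so a namespaced tag like '{uri}name' still matches
        if el.tag.rsplit("}", 1)[-1] == "alternateIdentifier" and el.text
    ]
    # Attributes that are absent come back as None
    return root.get("packageId"), root.get("system"), alternates
```

A value of `None` for `packageId` means that the document carries no package identifier of its own, in which case the identifier would have to be assigned externally.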
1.2.3 High-priority Elements
- To support locating data by time, geographic location, and taxonomy, metadata should provide as much information as possible for the data package in the three <coverage> elements:
- <temporalCoverage> (when),
- <geographicCoverage> (where), and
- <taxonomicCoverage> (what).
- For a potential user to evaluate the relevance and usability of the data package for their research study or synthesis projects, metadata should include detailed descriptions in the
- <project>,
- <methods>,
- <protocols>, and
- <intellectualRights> elements.
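A quick presence check for the high-priority elements can be sketched in Python as follows. This is a minimal sketch, not a validator: it only reports whether an element with a given name occurs anywhere in the document and says nothing about the quality of its content. The function name `missing_elements` and the default list `HIGH_PRIORITY` are hypothetical; the list is an assumption drawn from the elements named above and should be extended or adjusted to the element names of the schema version in use.

```python
import xml.etree.ElementTree as ET

# Assumed default list, based on the elements named in this section
HIGH_PRIORITY = [
    "temporalCoverage",
    "geographicCoverage",
    "taxonomicCoverage",
    "project",
    "methods",
    "intellectualRights",
]


def local_name(tag):
    """Strip a namespace prefix of the form '{uri}name' from a tag."""
    return tag.rsplit("}", 1)[-1]


def missing_elements(eml_text, wanted=HIGH_PRIORITY):
    """Return the names in `wanted` that do not occur anywhere in the document."""
    root = ET.fromstring(eml_text)
    present = {local_name(el.tag) for el in root.iter()}
    return [name for name in wanted if name not in present]
```

An empty result means only that each element is present; whether the descriptions are detailed enough for a potential user still needs a human review.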