1 Introduction

This document contains current ‘Best Practice’ recommendations for EML content for metadata related to ecological and environmental data. It is intended to augment the EML schema documentation (Jones et al. 2019) for a less-technical audience. The current version (v3, 2017) is one component of several resources available to EML preparers. These recommendations are directed towards the following goals:

Provide guidance and clarification in the implementation of EML for datasets
Minimize heterogeneity of EML documents to simplify development and re-use of software built to ingest it
Maximize interoperability of EML documents to facilitate data synthesis

At time of this document’s publication (late 2017), the version of EML currently in production was EML.2.1.1. EML 2.2.0 is anticipated within the next year. Contact EDI for more information.

1.1 History

EML Best Practice recommendations have evolved over time. The most active contributors have been members of the LTER Information Managers Committee in multiple working groups and workshops. EML has been widely used for several years with multiple applications written against it, and the community has had the opportunity to observe the consequences of many content patterns. As much as possible, recommendations have been aligned with those experiences, as well as with the capability of data contributors.

Timeline and Previous Revisions

2017 Best Practices for Dataset Metadata in EML v3 (this document)
2016 EDI inception, see http://edirepository.org
2011 EML Best Practices for LTER sites v2
2008 EML 2.1 release
2004 EML Best Practices for LTER sites
2003 LTER adopts EML as network exchange standard

Contributors, including LTER EML Best Practices Working Groups and workshops in 2003, 2004, 2010 (alphabetical order):

Dan Bahauddin
Barbara Benson
Emery Boose
James Brunt
Duane Costa
Corinna Gries
Don Henshaw
Margaret O’Brien
Ken Ramsey
Inigo San Gil
Mark Servilla
Wade Sheldon
Philip Tarrant
Theresa Valentine
John Vande Castle,
Kristin Vanderbilt
Jonathan Walsh
Yang Xia

1.2 General Recommendations

Following are general best practices for handling EML dataset metadata:

1.2.1 Metadata Distribution

Do not publicly distribute EML documents containing elements with incorrect information, e.g., as a workaround for missing metadata or to meet validation requirements. Pre-publication drafts, or EML produced for demonstration or testing purposes should be clearly identified as such and not contributed to public archives, because these are passed on to large-scale clearinghouses. For previews of drafts or handling test and demonstration data packages, consult your repository to learn about options.

1.2.2 Data Package Identifiers

Metadata and data set versioning are controlled by the contributor, and so identifiers are tied to local systems. Many repository systems that accept EML-described data support principles of immutable metadata and data entity versioning. EML has elements to contain package identifiers, although these may also be assigned externally. It is the responsibility of the submitters to understand the practices of their intended repository when using identifiers.

1.2.3 High-priority Elements

To support locating data by time, geographic location, and taxonomically, metadata should provide as much information as possible for the data package, in the three <coverage>; elements:
- <temporalCoverage>; (when),
- <geographicCoverage>; (where) and
- <taxonomicCoverage> (what).
For a potential user to evaluate the relevance and usability of the data package for their research study or synthesis projects, metadata should include detailed descriptions in the
- <project>,
- <methods>,
- <protocols>, and
- <intellectualRights> elements.