EDI Dataset Preparation Guides

Community-developed guides for preparing and publishing datasets in the environmental sciences and similar contexts using the Ecological Metadata Language (EML).
Author

Environmental Data Initiative and colleagues

Published

May 25, 2022

Overview

Note

These guides and the website are currently being updated.

Most content here comes from the most recent “official” update of the guides, released around May 2022. Newer, community-led content is available for review in the prerelease version.

This website contains a series of documents about preparing and publishing datasets in the environmental sciences and similar contexts. Topics include community-developed metadata standards, serialization and markup formatting guidelines, best practices for content in ecological synthesis datasets, and more. This documentation is maintained by the Environmental Data Initiative (EDI) and all content has been developed and written in coordination with EDI’s community of scientists, data managers, and repository users.

Guides published here are directed toward the following goals:

  • Minimize heterogeneity of EML-described data packages to simplify the development and reuse of software
  • Maximize interoperability to facilitate data synthesis
  • Provide guidance and clarification on
    • using the Ecological Metadata Language (EML)
    • designing a data package
    • preparing a data product for synthesis

To contribute to these documents or participate in the associated working groups, see the “About this site” page or the repository README. This website and all documents are rendered as a Quarto book.

Books

Best Practices for Dataset Metadata in Ecological Metadata Language (EML)

The recommendations for EML metadata apply to all data packages. This book reproduces V3 of the static PDF document “Best Practices for Dataset Metadata in Ecological Metadata Language (EML),” last updated in 2017. The complete, most recent (versioned, citable) release will be made available as a PDF.

Data Package Design for Special Cases

Considerations for a well-designed data package, including special cases based on data type, format, or acquisition method. Examples include images, documents, and raw data stored in other repositories.

Scientific Domain-Specific Dataset Guidelines

Very much a work in progress. Recommendations for community-developed data products from specific scientific domains; not all domains are covered yet. These data packages are derived from raw data and reformatted to meet specific data harmonization standards, often with extensive related code bases.