Recurring data publication was traditionally a manual edit-and-upload process that can now be automated with programmatic approaches (e.g. EAL) and data repository APIs (e.g. EDIutils). These approaches not only reduce effort, but also improve metadata accuracy and enable automated workflows. Automated publication and upload are well suited to two use cases: ongoing data collection, and data products derived from sources with ongoing collection. Each use case is demonstrated below.
Ongoing data collection can be the addition of new data to a time series, or data that change as a result of an evolving data processing or analytical method. In either case, automated publication requires stabilizing the package contents and configuring EAL for those contents. If package contents deviate from what can be programmatically input to EAL, metadata accuracy decreases and human and machine understanding are compromised. In an example automated publication for ongoing collection, the workflow updates make_eml()'s temporal.coverage and package.id arguments, and then runs make_eml() for the new data; a minimal sketch of this step follows.
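The details of such a script vary by project, but its core is small. Below is a minimal sketch assuming a single growing data file, a pre-built EAL template directory, and an EDIutils-style upload via login() and update_data_package(); the file names, package identifiers, and revision bookkeeping are illustrative, make_eml() arguments are abbreviated, and the EDIutils function names and signatures should be checked against your installed version.

```r
# Minimal sketch: republish a growing time series with EAL + EDIutils.
# File names, the package identifier ("edi.1001"), and the revision
# bookkeeping are illustrative assumptions; make_eml() arguments are abbreviated.
library(EMLassemblyline)
library(EDIutils)

data_path <- "./data"                                   # holds streamflow.csv (assumed)
new_data  <- read.csv(file.path(data_path, "streamflow.csv"))

# Derive temporal coverage from the data themselves
temporal_coverage <- as.character(range(as.Date(new_data$date)))

# Increment the revision of the package identifier (scope.identifier.revision)
last_revision <- 3                                      # e.g. read from a log or the repository
package_id    <- paste0("edi.1001.", last_revision + 1)

# Rebuild EML for the new data (other make_eml() arguments omitted for brevity)
make_eml(
  path                    = "./metadata_templates",
  data.path               = data_path,
  eml.path                = "./eml",
  dataset.title           = "Daily streamflow, ongoing collection",
  temporal.coverage       = temporal_coverage,
  maintenance.description = "ongoing",
  data.table              = "streamflow.csv",
  data.table.description  = "Daily streamflow measurements",
  user.id                 = "my_user",
  user.domain             = "EDI",
  package.id              = package_id
)

# Upload the new revision with EDIutils (authenticate first; function names
# and signatures are assumed and may differ across EDIutils versions)
login()
update_data_package(eml = file.path("./eml", paste0(package_id, ".xml")),
                    env = "staging")                    # switch to "production" when ready
logout()
```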
Another valuable use case of automated data publication is the creation of data packages that are derived from sources with ongoing collection. The derived package may be a stand-alone product, or part of a larger science workflow with additional downstream processes. Derived package publication relies on event notifications, a service some data repositories provide: the repository sends a notification to the subscriber whenever a source data package is updated. Building upon the previous workflow, the subscribed workflow updates the temporal.coverage and package.id arguments and runs make_eml() for the derived data product; a sketch of this event-driven step follows.
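Below is a minimal sketch of the derived-product side under the same assumptions as above, plus one more: that the repository's event manager can notify a subscriber-provided URL and that EDIutils exposes a subscription helper such as create_event_subscription() (verify against your EDIutils version). The derivation helper derive_product(), the revision helper next_revision(), the subscriber URL, and all identifiers are hypothetical.

```r
# Minimal sketch: publish a derived data product when its source is updated.
# derive_product(), next_revision(), the subscriber URL, and all identifiers
# are hypothetical; create_event_subscription() is assumed to wrap the
# repository's event manager and may differ by EDIutils version.
library(EMLassemblyline)
library(EDIutils)

# One-time setup: ask the repository to notify the synthesis team's service
# whenever the source package is updated.
login()
create_event_subscription(
  packageId = "edi.1001",                                # source package with ongoing collection
  url       = "https://synthesis.example.org/hook"       # endpoint run by the synthesis team
)
logout()

# Handler run by that service when a notification for a new source revision arrives
on_source_update <- function(source_package_id) {
  # 1. Recreate the derived data from the new source revision (hypothetical helper)
  derive_product(source_package_id, out_file = "./data/derived_summary.csv")

  # 2. Refresh the metadata fields that change between revisions
  derived <- read.csv("./data/derived_summary.csv")
  new_id  <- next_revision("edi.2002")                   # hypothetical revision-increment helper

  make_eml(
    path                    = "./metadata_templates",
    data.path               = "./data",
    eml.path                = "./eml",
    dataset.title           = "Derived summary of daily streamflow",
    temporal.coverage       = as.character(range(as.Date(derived$date))),
    maintenance.description = "ongoing",
    data.table              = "derived_summary.csv",
    data.table.description  = "Summary statistics derived from the source time series",
    user.id                 = "my_user",
    user.domain             = "EDI",
    package.id              = new_id
  )

  # 3. Upload the new revision of the derived package
  login()
  update_data_package(eml = file.path("./eml", paste0(new_id, ".xml")), env = "staging")
  logout()
}
```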
At this point the synthesis science team's workflow ends, but it is the beginning of another automated workflow that uses the published data product and serves the information to the public.
Automated data publication coupled with repository event notifications enables efficient, reproducible, and valuable science workflows.