Adds semantic meaning to dataset features (variables, locations, persons, etc.) through links to ontology terms. Run this function after all your EAL metadata templates are complete, or when annotating an existing EML file. Annotating a dataset improves human understanding and machine actionability (linked data), which makes the data easier to find through search and easier to combine with other data.

template_annotations(
  path,
  data.path = path,
  data.table = NULL,
  other.entity = NULL,
  default.annotations = NULL,
  eml.path = path,
  eml = NULL
)

Arguments

path

(character) Path to the metadata template directory and where annotations.txt will be written.

data.path

(character; optional) Path to the data directory. Defaults to path.

data.table

(character; optional) Table name. If more than one, then supply as a vector of character strings (e.g. data.table = c('nitrogen.csv', 'decomp.csv')).

other.entity

(character; optional) Other entity name. If more than one, then supply as a vector of character strings (e.g. other.entity = c('maps.zip', 'analysis.R')).

default.annotations

(data frame; optional) Default annotations added to annotations.txt. The EMLassemblyline-specified defaults are used unless you supply your own (see note below). You can manually change these after annotations.txt has been created.

eml.path

(character; optional) Path to the EML directory. Use this if creating annotations.txt for an EML file. Defaults to path.

eml

(EML file; optional) An EML file located at eml.path. Use this if annotating an existing EML file.

Value

annotations

Columns:

  • id: A unique identifier for the element being annotated.

  • element: The element being annotated.

  • context: The context of the subject (i.e. element value) being annotated (e.g. if the same column name occurs in more than one data table, the context tells you which table it came from).

  • subject: The element value to be annotated.

  • predicate_label: The predicate label (a.k.a. property) describing the relation of the subject to the object. This label should be copied directly from an ontology.

  • predicate_uri: The predicate label URI copied directly from an ontology.

  • object_label: The object label (a.k.a. value) describing the subject. This label should be copied directly from an ontology.

  • object_uri: The object URI copied from an ontology.

The general user should ignore the id and element fields and focus on the subject, context, predicate_label, predicate_uri, object_label, and object_uri fields. Only the predicate and object fields should be modified. To add another annotation to any of the listed subjects, copy the full row containing the subject, paste it in as a new line, and modify the predicate and object fields of the copy.
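The copy-and-modify step can also be done in R instead of a text editor. The following is a minimal sketch using a hand-built stand-in for annotations.txt; the id value, the labels, and the example.org URIs are hypothetical placeholders, not terms from a real ontology.

```r
# Stand-in for one row of annotations.txt (all values are hypothetical)
annotations <- data.frame(
  id = "/nitrogen.csv/temp",
  element = "/dataTable/attribute",
  context = "nitrogen.csv",
  subject = "temp",
  predicate_label = "contains measurements of type",
  predicate_uri = "http://example.org/ontology/containsMeasurementsOfType",
  object_label = "air temperature",
  object_uri = "http://example.org/ontology/airTemperature",
  stringsAsFactors = FALSE
)

# Copy the full row for the subject of interest
is_subject <- annotations$subject == "temp" &
  annotations$context == "nitrogen.csv"
new_row <- annotations[is_subject, ][1, ]

# Change only the predicate and object fields; id, element, context,
# and subject stay exactly as copied
new_row$object_label <- "daily mean"
new_row$object_uri <- "http://example.org/ontology/dailyMean"

# Append the copy as a new line
annotations <- rbind(annotations, new_row)
rownames(annotations) <- NULL
```

Both rows keep the same id and subject, so the subject ends up with two annotations.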

Details

This function gathers annotatable elements from your EML and assigns default predicate labels and URIs. You must provide object labels and URIs from the ontology of your choosing.

Note

To set your own default annotations, first copy the EMLassemblyline defaults with file.copy(from = system.file("/templates/annotation_defaults.txt", package = "EMLassemblyline"), to = path), where path is the directory the file should be written to. Then change the values in the predicate_label, predicate_uri, object_label, and object_uri fields and save the file. Finally, read the file into R as a data frame and pass it to the default.annotations argument.
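The steps in this note might look like the sketch below. It assumes annotation_defaults.txt is tab-delimited with a header row, which is not stated here; the replacement label and example.org URI are placeholders, and the output directory is a temporary one chosen for illustration. The final call is wrapped in if (FALSE) because it needs completed templates to run.

```r
# Directory the copy of the defaults is written to (hypothetical choice)
path <- tempdir()

# Locate the defaults shipped with EMLassemblyline; system.file() returns ""
# when the package or file is not found
defaults_file <- system.file(
  "templates", "annotation_defaults.txt", package = "EMLassemblyline"
)

if (nzchar(defaults_file)) {
  file.copy(from = defaults_file, to = path, overwrite = TRUE)

  # Assumption: tab-delimited text with a header row
  defaults <- read.delim(
    file.path(path, "annotation_defaults.txt"),
    stringsAsFactors = FALSE,
    quote = ""
  )

  # Change a default object (placeholder label and URI)
  defaults$object_label[1] <- "my preferred term"
  defaults$object_uri[1] <- "http://example.org/ontology/myPreferredTerm"

  if (FALSE) {
    # Needs completed templates in ./metadata_templates
    EMLassemblyline::template_annotations(
      path = "./metadata_templates",
      default.annotations = defaults
    )
  }
}
```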

Some users may want to build annotations.txt from scratch. A few rules to follow when doing this:

  • id - IDs must be unique for each unique subject.

  • element - Supported elements and the required syntax are listed under the element column of annotation_defaults.txt. View this file with View(system.file("/templates/annotation_defaults.txt", package = "EMLassemblyline")).

  • context - Context values are only required for elements that are nested within other elements. Currently, only /dataTable/attribute elements require context, and the context is the dataTable objectName (e.g. nitrogen.csv).

  • subject - Subjects are required for each annotation. For /dataset the subject is "dataset". For /dataTable the subject is the file name. For /dataTable/attribute the subjects are the dataTable field names. For /otherEntity the subject is the file name. For /ResponsibleParty the subject is created with paste(first.name, middle.name, last.name, collapse = " ").
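The rules above can be sketched as a small from-scratch table with two sanity checks. This is an illustration under assumptions, not the package's own validation: the id values, the person's name, and the column name temp are hypothetical, the predicate and object fields are left empty for you to fill in from an ontology, and writing the file as tab-delimited text is an assumption about the expected format. The uniqueness check reads "unique for each unique subject" as one id per context and subject pair.

```r
# Subject for a /ResponsibleParty element (hypothetical name parts)
first.name <- "Jane"
middle.name <- "Q"
last.name <- "Smith"
person <- paste(first.name, middle.name, last.name, collapse = " ")

annotations <- data.frame(
  # Hypothetical ids; the only rule applied here is one id per subject
  id = c("id_dataset", "id_table", "id_attribute", "id_entity", "id_person"),
  element = c("/dataset", "/dataTable", "/dataTable/attribute",
              "/otherEntity", "/ResponsibleParty"),
  # Context is only filled in for the nested attribute element
  context = c("", "", "nitrogen.csv", "", ""),
  subject = c("dataset", "nitrogen.csv", "temp", "maps.zip", person),
  predicate_label = "",
  predicate_uri = "",
  object_label = "",
  object_uri = "",
  stringsAsFactors = FALSE
)

# Check: each context/subject pair maps to exactly one id, and no id is
# shared between different subjects
key <- paste(annotations$context, annotations$subject, sep = "|")
pairs <- unique(data.frame(key = key, id = annotations$id))
stopifnot(!anyDuplicated(pairs$key), !anyDuplicated(pairs$id))

# Check: every /dataTable/attribute row has a context
is_attribute <- annotations$element == "/dataTable/attribute"
stopifnot(all(nzchar(annotations$context[is_attribute])))

# Write the table (assumption: tab-delimited, no quoting)
out_file <- file.path(tempdir(), "annotations.txt")
write.table(annotations, out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)
```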

Examples

if (FALSE) {
# Set working directory
setwd("/Users/me/Documents/data_packages/pkg_260")

# For a set of EAL templates describing 2 tables and 2 other entities
template_annotations(
 path = "./metadata_templates",
 data.path = "./data_objects",
 data.table = c("nitrogen.csv", "decomp.csv"),
 other.entity = c("ancillary_data.zip", "processing_and_analysis.R"))
 
# For an existing EML file
template_annotations(
 path = "./metadata_templates",
 eml = "edi.260.3.xml")
}