Create the taxon_ancillary table
create_taxon_ancillary(
  L0_flat,
  taxon_id,
  datetime = NULL,
  variable_name,
  unit = NULL,
  author = NULL
)
L0_flat: (tbl_df, tbl, data.frame) The fully joined source L0 dataset, in "flat" format (see details).
taxon_id: (character) Column in L0_flat containing the identifier assigned to each unique organism at the observation level.
datetime: (character) An optional column in L0_flat containing the date, and if applicable time, of the ancillary taxon data, following the ISO-8601 standard format (e.g. YYYY-MM-DD hh:mm:ss).
variable_name: (character) Columns in L0_flat containing the ancillary taxon data.
unit: (character) Optional column(s) in L0_flat containing the units of each variable_name, following the column naming convention: unit_<variable_name> (e.g. "unit_average_length").
author: (character) An optional column in L0_flat containing the person associated with identification of taxa in the taxon table.
(tbl_df, tbl, data.frame) The taxon_ancillary table.
This function collects the specified columns from L0_flat and converts them into long (attribute-value) form by gathering the variable_name columns. Regular expression matching joins each unit column to its associated variable_name, and the matched unit is listed in the resulting table's "unit" column.
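The gathering and unit matching described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy data frame, its values, and the de-duplication by taxon_id are assumptions made for the example.

# Toy stand-in for L0_flat; column names and values are invented.
# Taxon "1" appears twice, as it would across multiple observations.
flat <- data.frame(
  taxon_id = c("1", "1", "2"),
  hl = c(1.16, 1.16, 0.98),
  unit_hl = c("millimeter", "millimeter", "millimeter"),
  habitat = c("Open", "Open", "Forest"),
  stringsAsFactors = FALSE
)

variable_name <- c("hl", "habitat")
unit <- c("unit_hl")

# Assumption of this sketch: keep one row per taxon before gathering.
taxa <- flat[!duplicated(flat$taxon_id), ]

# Gather each variable into long (attribute-value) form.
long <- do.call(rbind, lapply(variable_name, function(v) {
  # Look for a supplied unit column named unit_<variable_name>.
  # Note: v is used as a regex here, so "." in a name matches any character.
  unit_col <- unit[grepl(paste0("^unit_", v, "$"), unit)]
  data.frame(
    taxon_id = taxa$taxon_id,
    variable_name = v,
    value = as.character(taxa[[v]]),
    unit = if (length(unit_col) == 1) taxa[[unit_col]] else NA_character_,
    stringsAsFactors = FALSE
  )
}))

# Order by taxon, then assign a row identifier.
long <- long[order(long$taxon_id), ]
long$taxon_ancillary_id <- as.character(seq_len(nrow(long)))
rownames(long) <- NULL
long

Variables without a matching unit_<variable_name> column (here, habitat) end up with NA in the "unit" column, which mirrors the NA units shown for categorical variables in the example output.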
"flat" format refers to the fully joined source L0 dataset in "wide" form with the exception of the core observation variables, which are in "long" form (i.e. using the variable_name, value, unit columns of the observation table). This "flat" format is the "widest" an L1 ecocomDP dataset can be consistently spread due to the frequent occurrence of L0 source datasets with > 1 core observation variable.
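To make the "flat" shape concrete, the following sketch joins two hypothetical miniature tables with base R. The table names, columns, and values are invented for illustration and are not taken from any real L0 dataset.

# Core observations in long form: one row per variable per observation.
observation <- data.frame(
  observation_id = c("o1", "o2", "o3", "o4"),
  taxon_id = c("1", "1", "2", "2"),
  variable_name = c("abundance", "biomass", "abundance", "biomass"),
  value = c(12, 0.8, 3, 0.2),
  unit = c("number", "gram", "number", "gram"),
  stringsAsFactors = FALSE
)

# Ancillary taxon data in wide form: one column per trait,
# plus a unit column following the unit_<variable_name> convention.
taxon_traits <- data.frame(
  taxon_id = c("1", "2"),
  hl = c(1.16, 0.98),
  unit_hl = c("millimeter", "millimeter"),
  stringsAsFactors = FALSE
)

# Join on taxon_id: observation variables stay long, traits stay wide.
flat <- merge(observation, taxon_traits, by = "taxon_id")
flat

Because this toy dataset has two core observation variables (abundance and biomass), they remain stacked in the variable_name, value, and unit columns, while hl and unit_hl repeat on every row for the same taxon.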
flat <- ants_L0_flat
taxon_ancillary <- create_taxon_ancillary(
  L0_flat = flat,
  taxon_id = "taxon_id",
  variable_name = c(
    "subfamily", "hl", "rel", "rll", "colony.size",
    "feeding.preference", "nest.substrate", "primary.habitat",
    "secondary.habitat", "seed.disperser", "slavemaker.sp",
    "behavior", "biogeographic.affinity", "source"),
  unit = c("unit_hl", "unit_rel", "unit_rll"))
taxon_ancillary
#> # A tibble: 742 x 7
#>    taxon_ancillary_id taxon_id datetime variable_name      value    unit  author
#>    <chr>              <chr>    <chr>    <chr>              <chr>    <chr> <chr>
#>  1 1                  1        NA       subfamily          Myrmici~ NA    NA
#>  2 2                  1        NA       hl                 1.1582   mill~ NA
#>  3 3                  1        NA       rel                0.17268~ mill~ NA
#>  4 4                  1        NA       rll                1.32377~ mill~ NA
#>  5 5                  1        NA       colony.size        Medium   NA    NA
#>  6 6                  1        NA       feeding.preference Granivo~ NA    NA
#>  7 7                  1        NA       nest.substrate     Wood     NA    NA
#>  8 8                  1        NA       primary.habitat    Open     NA    NA
#>  9 9                  1        NA       secondary.habitat  Wet      NA    NA
#> 10 10                 1        NA       seed.disperser     Y        NA    NA
#> # ... with 732 more rows