Create the taxon_ancillary table

create_taxon_ancillary(
  L0_flat,
  taxon_id,
  datetime = NULL,
  variable_name,
  unit = NULL,
  author = NULL
)

Arguments

L0_flat: (tbl_df, tbl, data.frame) The fully joined source L0 dataset, in "flat" format (see details).
taxon_id: (character) Column in L0_flat containing the identifier assigned to each unique organism at the observation level.
datetime: (character) An optional column in L0_flat containing the date, and if applicable time, of the ancillary taxon data, following the ISO-8601 standard format (e.g. YYYY-MM-DD hh:mm:ss).
variable_name: (character) Columns in L0_flat containing the ancillary taxon data.
unit: (character) Optional columns in L0_flat containing the units of each variable_name. Each column name must follow the convention unit_<variable_name> (e.g. "unit_average_length").
author: (character) An optional column in L0_flat containing the person associated with identification of taxa in the taxon table.

Value

(tbl_df, tbl, data.frame) The taxon_ancillary table.

Details

This function collects the specified columns from L0_flat and converts them into long (attribute-value) form by gathering variable_name. Regular expression matching joins each unit column to its associated variable_name, and the unit is listed in the resulting table's "unit" column.
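The gathering and unit matching described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy table, its values, and the matching by stripping the "unit_" prefix are all hypothetical stand-ins.

```r
# Hypothetical toy "flat" table: one measured trait with a unit column,
# and one categorical trait without units.
flat <- data.frame(
  taxon_id = c("1", "2"),
  hl = c(1.16, 0.98),
  unit_hl = c("millimeter", "millimeter"),
  colony.size = c("Medium", "Large"),
  stringsAsFactors = FALSE
)

variable_name <- c("hl", "colony.size")
unit <- c("unit_hl")

# Keep one row per unique combination of the requested columns.
taxa <- unique(flat[, c("taxon_id", variable_name, unit)])

# Gather each variable into long (attribute-value) form. A unit column is
# matched to its variable by stripping the "unit_" prefix (assumed approach).
long <- do.call(rbind, lapply(variable_name, function(v) {
  unit_col <- unit[sub("^unit_", "", unit) == v]
  data.frame(
    taxon_id = taxa$taxon_id,
    variable_name = v,
    value = as.character(taxa[[v]]),
    unit = if (length(unit_col) == 1) taxa[[unit_col]] else NA_character_,
    stringsAsFactors = FALSE
  )
}))

# Assign a simple sequential identifier to each row.
long$taxon_ancillary_id <- as.character(seq_len(nrow(long)))

long
```

Values are coerced to character because numeric and categorical variables share a single "value" column in the long form.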

"flat" format refers to the fully joined source L0 dataset in "wide" form, with the exception of the core observation variables, which are in "long" form (i.e. they use the variable_name, value, and unit columns of the observation table). This is the "widest" form into which an L1 ecocomDP dataset can be consistently spread, because L0 source datasets frequently contain more than one core observation variable.
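As a rough illustration of the "flat" layout, the sketch below builds a small flat table in base R from a hypothetical wide source with two core observation variables. The column names, values, and units are invented for the example and are not taken from any real L0 dataset.

```r
# Hypothetical wide source: two core observation variables (abundance,
# biomass) plus one taxon attribute (subfamily).
wide <- data.frame(
  taxon_id = c("1", "2"),
  abundance = c(12, 3),
  biomass = c(0.8, 0.2),
  subfamily = c("Myrmicinae", "Formicinae"),
  stringsAsFactors = FALSE
)

obs_vars <- c("abundance", "biomass")
obs_units <- c(abundance = "number", biomass = "gram")  # assumed units

# Only the core observation variables are stacked into long form; every
# other column (here, subfamily) stays wide and repeats on each row.
flat <- do.call(rbind, lapply(obs_vars, function(v) {
  data.frame(
    taxon_id = wide$taxon_id,
    variable_name = v,
    value = wide[[v]],
    unit = obs_units[[v]],
    subfamily = wide$subfamily,
    stringsAsFactors = FALSE
  )
}))

flat
```

Because the ancillary columns repeat across observation rows in this layout, a table such as taxon_ancillary has to be derived from the unique combinations of the selected columns rather than from every row.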

Examples

flat <- ants_L0_flat

taxon_ancillary <- create_taxon_ancillary(
  L0_flat = flat,
  taxon_id = "taxon_id",
  variable_name = c(
    "subfamily", "hl", "rel", "rll", "colony.size", 
    "feeding.preference", "nest.substrate", "primary.habitat", 
    "secondary.habitat", "seed.disperser", "slavemaker.sp", 
    "behavior", "biogeographic.affinity", "source"),
  unit = c("unit_hl", "unit_rel", "unit_rll"))

taxon_ancillary
#> # A tibble: 742 x 7
#>    taxon_ancillary_id taxon_id datetime variable_name      value    unit  author
#>    <chr>              <chr>    <chr>    <chr>              <chr>    <chr> <chr> 
#>  1 1                  1        NA       subfamily          Myrmici~ NA    NA    
#>  2 2                  1        NA       hl                 1.1582   mill~ NA    
#>  3 3                  1        NA       rel                0.17268~ mill~ NA    
#>  4 4                  1        NA       rll                1.32377~ mill~ NA    
#>  5 5                  1        NA       colony.size        Medium   NA    NA    
#>  6 6                  1        NA       feeding.preference Granivo~ NA    NA    
#>  7 7                  1        NA       nest.substrate     Wood     NA    NA    
#>  8 8                  1        NA       primary.habitat    Open     NA    NA    
#>  9 9                  1        NA       secondary.habitat  Wet      NA    NA    
#> 10 10                 1        NA       seed.disperser     Y        NA    NA    
#> # ... with 732 more rows