Overview

taxonomyCleanr is easy to use and requires no taxonomic expertise. Send your data through a series of cleaning functions (count_taxa, trim_taxa, replace_taxa, remove_taxa), pass the result to the resolver functions (resolve_sci_taxa, resolve_comm_taxa), and create a revision of your raw data (revise_taxa). Voila! Clean taxonomic data!

Below is a demonstration of this process using example data that comes installed with the taxonomyCleanr package.

Installation

Install taxonomyCleanr from the project GitHub.

# Install from GitHub
# remotes::install_github('EDIorg/taxonomyCleanr')
library(taxonomyCleanr)

Load data

Load the taxonomic data into R as a data frame (or tibble). The taxa must be listed in a single column of class character, not factor.

# Load test data installed with the taxonomyCleanr package
data <- data.table::fread(file = system.file('example_data.txt', package = 'taxonomyCleanr'))
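
If your own table stores the taxa as a factor, the column can be converted before cleaning. The following is a minimal base R sketch; `my_data` and its values are hypothetical and are not part of the example data.

```r
# Hypothetical table in which the taxa column was read in as a factor
my_data <- data.frame(
  Species = factor(c("Poa pratensis", "Cyperus sp.", "Poa pratensis")),
  Mass = c(0.07, 1.53, 0.23)
)

# Convert the factor column to character before cleaning
my_data$Species <- as.character(my_data$Species)

# Confirm the class of the taxa column
class(my_data$Species)
```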

This data table has 6 columns:

  • Year - Year the sample was taken
  • Sample_Date - Date the sample was taken
  • Plot - Plot number the sample was taken from
  • Heat_Treatment - The heat treatment applied to the plot.
  • Species - Taxa observed in each sample. NOTE: This column contains species binomials, common names, invalid taxonomic names, taxa names of varying hierarchical ranks, etc. This is the list of taxa we will clean up.
  • Mass - Mass of the organism obtained from each sample.
Test data containing a column of taxa to be cleaned.
Year Sample_Date Plot Heat_Treatment Species Mass
2007 8/22/07 29 Control Mosses 150.53
2007 8/22/07 29 Control Unsorted biomass 150.53
2007 8/24/07 29 High Achillea millefolium(lanulosa) 0.13
2007 8/24/07 29 High Achillea millefolium(lanulosa) 3.60
2007 8/24/07 29 High Achillea millefolium(lanulosa) 9.77
2007 8/24/07 29 High Crepis tectorum 1.43
2007 8/24/07 29 High Cyperus sp. 0.53
2007 8/23/07 29 High Euphorbia glyptosperma 0.05
2007 8/24/07 29 High Lespedeza capitata 139.90
2007 8/21/07 29 Low Achillea millefolium(lanulosa) 0.60
2007 8/21/07 29 Low Achillea millefolium(lanulosaaaa) 1.10
2007 8/21/07 29 Low Achillea millefolium(lanulosabb) 4.90
2007 8/21/07 29 Low Achillea millefolium(lanulosacc) 0.87
2007 8/21/07 29 Low Cyperus sp. 1.53
2007 8/20/07 29 Low Lepidium densiflorum 0.05
2007 8/21/07 29 Low Lespedeza capitata 54.67
2007 8/21/07 29 Low Poa pratensis 0.07
2007 8/21/07 29 Low Rumex acetosella 0.33
2007 8/21/07 29 Low Schizachyrium scoparium 0.70
2007 8/21/07 64 Control Unsorted biomass 202.55
2007 8/20/07 64 High Bouteloua gracilis 12.20
2007 8/20/07 64 High Cyperus sp. 0.23
2007 8/20/07 64 High Koeleria cristata 0.87
2007 8/20/07 64 High Lespedeza capitata 52.83
2007 8/20/07 64 High Liatris aspera 2.80
2007 8/20/07 64 High Lupinus perennis 50.90
2007 8/20/07 64 High Panicum virgatum 0.43
2007 8/20/07 64 High Petalostemum purpureum 9.03
2007 8/20/07 64 High Petalostemum villosum 15.13
2007 8/20/07 64 High Poa pratensis 0.23
2007 8/20/07 64 High Schizachyrium scoparium 23.60
2007 8/20/07 64 High Solidago nemoralis 32.10
2007 8/20/07 64 High Sorghastrum nutans 5.53
2007 8/20/07 64 High Stipa spartea 4.00
2007 8/24/07 64 Low Achillea millefolium(lanulosa) 0.83
2007 8/24/07 64 Low Andropogon gerardi 13.93
2007 8/24/07 64 Low Bouteloua curtipendula 0.20
2007 8/24/07 64 Low Bouteloua gracilis 27.10
2007 8/24/07 64 Low Coreopsis palmata 4.40
2007 8/24/07 64 Low Koeleria cristata 0.83
2007 8/24/07 64 Low Lespedeza capitata 44.10
2007 8/24/07 64 Low Liatris aspera 14.00
2007 8/24/07 64 Low Lupinus perennis 27.87
2007 8/24/07 64 Low Miscellaneous litter 1.87
2007 8/24/07 64 Low Petalostemum candidum 3.00
2007 8/24/07 64 Low Petalostemum purpureum 9.33
2007 8/24/07 64 Low Schizachyrium scoparium 10.97
2007 8/24/07 64 Low Solidago rigida 39.57
2007 8/24/07 64 Low Sporobolus cryptandrus 0.60
2007 8/20/07 69 Control Unsorted biomass 58.85
2007 8/20/07 69 High Achillea millefolium(lanulosa) 3.53
2007 8/20/07 69 High Agropyron repens 0.37
2007 8/20/07 69 High Aristida basiramea 6.53
2007 8/20/07 69 High Crepis tectorum 0.83
2007 8/20/07 69 High Cyperus sp. 3.17
2007 8/20/07 69 High Digitaria sp. 0.10
2007 8/20/07 69 High Erigeron canadensis 8.33
2007 8/20/07 69 High Hedeoma hispida 1.30
2007 8/20/07 69 High Lepidium densiflorum 0.13
2007 8/20/07 69 High Lupinus perennis 0.37
2007 8/20/07 69 High Miscellaneous litter 0.70
2007 8/20/07 69 High Physalis virginiana 0.43
2007 8/20/07 69 High Taraxicum officinalis 0.06
2007 8/24/07 69 Low Achillea millefolium(lanulosa) 2.97
2007 8/24/07 69 Low Agrostis scabra 0.47
2007 8/24/07 69 Low Ambrosia artemisiifolia elatior 0.47
2007 8/24/07 69 Low Aristida basiramea 9.53
2007 8/24/07 69 Low Bouteloua gracilis 0.10
2007 8/24/07 69 Low Crepis tectorum 1.33
2007 8/24/07 69 Low Cyperus sp. 1.70
2007 8/24/07 69 Low Eragrostis spectabilis 1.63
2007 8/24/07 69 Low Erigeron canadensis 2.67
2007 8/24/07 69 Low Hedeoma hispida 0.20
2007 8/24/07 69 Low Lespedeza capitata 0.10
2007 8/24/07 69 Low Miscellaneous litter 0.93
2007 8/24/07 69 Low Physalis virginiana 1.97
2007 8/24/07 69 Low Schizachyrium scoparium 0.23
2007 8/23/07 78 Control Amorpha canescens 9.30
2007 8/21/07 78 Control Unsorted biomass 216.13
2007 8/23/07 78 High Andropogon gerardi 22.60
2007 8/23/07 78 High Bouteloua gracilis 0.73
2007 8/23/07 78 High Lespedeza capitata 7.03
2007 8/23/07 78 High Lupinus perennis 20.90
2007 8/23/07 78 High Petalostemum purpureum 0.30
2007 8/23/07 78 High Poa pratensis 1.30
2007 8/23/07 78 High Schizachyrium scoparium 10.87
2007 8/23/07 78 High Solidago rigida 133.00
2007 8/23/07 78 High Sorghastrum nutans 1.40
2007 8/28/07 78 Low Amorpha canescens 47.80
2007 8/28/07 78 Low Andropogon gerardi 19.10
2007 8/28/07 78 Low Bouteloua gracilis 9.93
2007 8/28/07 78 Low Coreopsis palmata 1.03
2007 8/28/07 78 Low Erigeron canadensis 4.17
2007 8/28/07 78 Low -9999 9.53
2007 8/28/07 78 Low Liatris aspera 11.03
2007 8/28/07 78 Low Lupinus perennis 10.27
2007 8/28/07 78 Low Petalostemum purpureum 6.53
2007 8/28/07 78 Low Poa pratensis 0.90
2007 8/28/07 78 Low Schizachyrium scoparium 10.83
2007 8/28/07 78 Low Solidago nemoralis 3.43
2007 8/22/07 29 Control Mosses 150.53
2007 8/22/07 29 Control Yellow Perch 150.53
2007 8/22/07 29 Control Rainbow smelt 150.53
2007 8/22/07 29 Control Large mouth bass 150.53
2007 8/28/07 78 Low Petalostemum S.p. 6.53
2007 8/28/07 78 Low Poa Cf. 0.90
2007 8/28/07 78 Low Schizachyrium spp. 10.83
2007 8/28/07 78 Low Petalostemum 6.53
2007 8/28/07 78 Low Poa cf… 0.90
2007 8/28/07 78 Low Schizachyrium sPp 10.83
2007 8/24/07 64 Low _Koeleria_cristata 0.83
2007 8/24/07 64 Low Lespedeza_capitata 44.10
2007 8/24/07 64 Low Liatris_aspera 14.00
2007 8/24/07 64 Low 0.83
2007 8/24/07 64 Low 44.10
2007 8/24/07 64 Low Oncorhynchus tshawytscha 44.10
2007 8/24/07 64 Low Oncorhynchus gorbuscha 44.10
2007 8/24/07 64 Low Oncorhynchus kisutch 44.10

Create taxa map

The taxa map (taxa_map.csv) links the raw data to the cleaned data. Each cleaning function logs its changes to taxa_map.csv, which documents how the data were changed and provides the means to update the raw data table. The map's contents are explained in detail after the cleaning and resolver steps have been run on these example data.

# Create the taxa map
my_path <- tempdir()
taxa_map <- create_taxa_map(path = my_path, x = data, col = 'Species')

Count taxa

Get the unique taxa names and their counts with count_taxa. This function helps identify issues that should be fixed before sending the taxa list to the resolver functions. Fixing them increases the chance of an authority match. Notice that some of the taxa in the test data are obviously misspelled (e.g. Achillea millefolium(lanulosa) and Achillea millefolium(lanulosaaaa) likely represent the same taxon), and some of the listed names are clearly not taxa (e.g. -9999 and Miscellaneous litter).

# Get unique taxa and counts
output <- count_taxa(x = data, col = 'Species', path = my_path)
Unique taxa and their respective counts. Several issues exist with these taxa.
Taxa Count
2
_Koeleria_cristata 1
Liatris_aspera 1
-9999 1
Achillea millefolium(lanulosa) 7
Achillea millefolium(lanulosaaaa) 1
Achillea millefolium(lanulosabb) 1
Achillea millefolium(lanulosacc) 1
Agropyron repens 1
Agrostis scabra 1
Ambrosia artemisiifolia elatior 1
Amorpha canescens 2
Andropogon gerardi 3
Aristida basiramea 2
Bouteloua curtipendula 1
Bouteloua gracilis 5
Coreopsis palmata 2
Crepis tectorum 3
Cyperus sp. 5
Digitaria sp. 1
Eragrostis spectabilis 1
Erigeron canadensis 3
Euphorbia glyptosperma 1
Hedeoma hispida 2
Koeleria cristata 2
Large mouth bass 1
Lepidium densiflorum 2
Lespedeza capitata 6
Lespedeza_capitata 1
Liatris aspera 3
Lupinus perennis 5
Miscellaneous litter 3
Mosses 2
Oncorhynchus gorbuscha 1
Oncorhynchus kisutch 1
Oncorhynchus tshawytscha 1
Panicum virgatum 1
Petalostemum 1
Petalostemum candidum 1
Petalostemum purpureum 4
Petalostemum S.p. 1
Petalostemum villosum 1
Physalis virginiana 2
Poa Cf. 1
Poa cf… 1
Poa pratensis 4
Rainbow smelt 1
Rumex acetosella 1
Schizachyrium scoparium 6
Schizachyrium sPp 1
Schizachyrium spp. 1
Solidago nemoralis 2
Solidago rigida 2
Sorghastrum nutans 2
Sporobolus cryptandrus 1
Stipa spartea 1
Taraxicum officinalis 1
Unsorted biomass 4
Yellow Perch 1
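
The counting step can be pictured with base R alone. The following is a minimal sketch and not the implementation of count_taxa; the `species` vector is a hypothetical excerpt of the Species column.

```r
# Hypothetical excerpt of a taxa column
species <- c("Cyperus sp.", "Poa pratensis", "Cyperus sp.",
             "-9999", "Poa pratensis", "Cyperus sp.")

# Tabulate unique names and their frequencies
counts <- as.data.frame(table(species), stringsAsFactors = FALSE)
names(counts) <- c("Taxa", "Count")

# Sort by name so near-duplicates end up next to each other
counts <- counts[order(counts$Taxa), ]
print(counts)
```

Sorting by name places spelling variants side by side, which makes them easier to spot by eye.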

Trim taxa

Several of the taxa carry variations of suffixes that are common in taxonomic data (e.g. c.f. and sp.) and that frequently cause issues when searching taxonomic authorities. The trim_taxa function removes these excess characters, as well as leading and trailing white space and underscore characters.

# Trim excess characters from the taxa list
output <- trim_taxa(path = my_path)

Running count_taxa on the raw data frame (i.e. data), in combination with the information logged to taxa_map.csv from trim_taxa, creates a view of the updated taxa list.

# View the taxa after running trim_taxa
output <- count_taxa(x = data, col = 'Species', path = my_path)
Unique taxa and counts after trim_taxa. Notice, extraneous characters (e.g. c.f., spp., and underscores) have been removed.
Taxa Count
105
Cyperus 5
Digitaria 1
Koeleria cristata 1
Lespedeza capitata 1
Liatris aspera 1
Poa 2
Schizachyrium 2
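
To illustrate the kind of clean-up involved, here is a minimal base R sketch using a regular expression. It is an assumption about how such trimming could be done, not the code inside trim_taxa; the helper name `trim_sketch` is hypothetical, and the pattern only covers trailing sp, spp, and cf tokens written with ASCII periods.

```r
# Hypothetical helper: NOT the implementation of trim_taxa
trim_sketch <- function(x) {
  # Replace underscores with spaces
  x <- gsub("_", " ", x, fixed = TRUE)
  # Remove leading and trailing white space
  x <- trimws(x)
  # Drop a trailing sp, spp, or cf token (optional periods, any letter case)
  x <- sub("[[:space:]]+(s\\.?p\\.?p?|c\\.?f)\\.*$", "", x, ignore.case = TRUE)
  trimws(x)
}

raw_names <- c("Cyperus sp.", "Petalostemum S.p.", "Poa Cf.",
               "Schizachyrium sPp", "Schizachyrium spp.",
               "_Koeleria_cristata", "Poa pratensis")
trimmed <- trim_sketch(raw_names)
print(trimmed)
```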

Replace taxa

Some of the taxa are misspelled. Use replace_taxa to replace the misspelled taxa with the correct spelling, or the best guess of the correct spelling. Use count_taxa to verify these changes.

# Replace misspelled taxa with the correct spelling
output <- replace_taxa(path = my_path, input = 'Achillea millefolium(lanulosa)', output = 'Achillea millefolium')
output <- replace_taxa(path = my_path, input = 'Achillea millefolium(lanulosaaaa)', output = 'Achillea millefolium')
output <- replace_taxa(path = my_path, input = 'Achillea millefolium(lanulosabb)', output = 'Achillea millefolium')
output <- replace_taxa(path = my_path, input = 'Achillea millefolium(lanulosacc)', output = 'Achillea millefolium')

# Get the list of unique taxa
output <- count_taxa(x = data, col = 'Species', path = my_path)
Unique taxa counts after replacing misspelled taxa.
Taxa Count
108
Cyperus 5
Digitaria 1
Poa 2
Schizachyrium 2
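
The idea of mapping several spelling variants onto one name can also be sketched with a named lookup vector in base R. This is an illustration only and does not touch taxa_map.csv; the `species`, `replacements`, and `harmonized` objects are hypothetical.

```r
# Hypothetical excerpt of a taxa column with spelling variants
species <- c("Achillea millefolium(lanulosa)",
             "Achillea millefolium(lanulosaaaa)",
             "Poa pratensis",
             "Achillea millefolium(lanulosabb)")

# Lookup: names are the variants, values are the preferred spelling
replacements <- c(
  "Achillea millefolium(lanulosa)"    = "Achillea millefolium",
  "Achillea millefolium(lanulosaaaa)" = "Achillea millefolium",
  "Achillea millefolium(lanulosabb)"  = "Achillea millefolium"
)

# Replace only the entries that appear in the lookup
hit <- species %in% names(replacements)
harmonized <- species
harmonized[hit] <- unname(replacements[species[hit]])
print(unique(harmonized))
```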

Remove taxa

Some entries in the list are clearly not taxa and should be removed with remove_taxa before attempting to resolve to an authority.

# Remove taxa
output <- remove_taxa(path = my_path, input = '')
output <- remove_taxa(path = my_path, input = '-9999')
output <- remove_taxa(path = my_path, input = 'Unsorted biomass')
output <- remove_taxa(path = my_path, input = 'Miscellaneous litter')

# Get unique taxa and counts
output <- count_taxa(x = data, col = 'Species', path = my_path)
Unique taxa and counts after non-taxa have been removed.
Taxa Count
Cyperus 5
Digitaria 1
Poa 2
Schizachyrium 2

Resolve scientific taxa

Now the list of taxa looks reasonable. Extraneous characters have been removed, occurrences of similarly spelled taxa have been harmonized, and non-taxa names have been removed. Send the list of taxa to resolve_sci_taxa, along with a preferred list of authorities to search, and successful hits will return the accepted scientific spelling, taxonomic serial number, and taxonomic rank. resolve_sci_taxa gives preference to the authorities in the order they are supplied to the function. View the list of authorities supported by resolve_sci_taxa with view_taxa_authorities.

# Supported authorities are listed in the column titled resolve_sci_taxa
view_taxa_authorities()
Authorities supported by resolve_sci_taxa and resolve_comm_taxa
id authority resolve_sci_taxa resolve_comm_taxa
3 Integrated Taxonomic Information System (ITIS) supported supported
9 World Register of Marine Species (WORMS) supported not supported
11 Global Biodiversity Information Facility (GBIF) supported not supported
165 Tropicos - Missouri Botanical Garden supported not supported

The authorities ITIS and WORMS will be used.

# Resolve taxa using ITIS and WORMS
output <- resolve_sci_taxa(path = my_path, data.sources = c(3,9))
Output from resolve_sci_taxa call
taxa_raw taxa_trimmed taxa_replacement taxa_removed taxa_clean rank authority authority_id score difference
Mosses NA NA NA NA NA NA
Unsorted biomass TRUE NA NA NA NA NA NA
Achillea millefolium(lanulosa) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Crepis tectorum NA NA NA NA NA NA
Cyperus sp. Cyperus NA NA NA NA NA NA
Euphorbia glyptosperma NA NA NA NA NA NA
Lespedeza capitata NA NA NA NA NA NA
Achillea millefolium(lanulosaaaa) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Achillea millefolium(lanulosabb) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Achillea millefolium(lanulosacc) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Lepidium densiflorum NA NA NA NA NA NA
Poa pratensis NA NA NA NA NA NA
Rumex acetosella NA NA NA NA NA NA
Schizachyrium scoparium NA NA NA NA NA NA
Bouteloua gracilis NA NA NA NA NA NA
Koeleria cristata NA NA NA NA NA NA
Liatris aspera NA NA NA NA NA NA
Lupinus perennis NA NA NA NA NA NA
Panicum virgatum NA NA NA NA NA NA
Petalostemum purpureum NA NA NA NA NA NA
Petalostemum villosum NA NA NA NA NA NA
Solidago nemoralis NA NA NA NA NA NA
Sorghastrum nutans NA NA NA NA NA NA
Stipa spartea NA NA NA NA NA NA
Andropogon gerardi NA NA NA NA NA NA
Bouteloua curtipendula NA NA NA NA NA NA
Coreopsis palmata NA NA NA NA NA NA
Miscellaneous litter TRUE NA NA NA NA NA NA
Petalostemum candidum NA NA NA NA NA NA
Solidago rigida NA NA NA NA NA NA
Sporobolus cryptandrus NA NA NA NA NA NA
Agropyron repens NA NA NA NA NA NA
Aristida basiramea NA NA NA NA NA NA
Digitaria sp. Digitaria NA NA NA NA NA NA
Erigeron canadensis NA NA NA NA NA NA
Hedeoma hispida NA NA NA NA NA NA
Physalis virginiana NA NA NA NA NA NA
Taraxicum officinalis NA NA NA NA NA NA
Agrostis scabra NA NA NA NA NA NA
Ambrosia artemisiifolia elatior NA NA NA NA NA NA
Eragrostis spectabilis NA NA NA NA NA NA
Amorpha canescens NA NA NA NA NA NA
-9999 TRUE NA NA NA NA NA NA
Yellow Perch NA NA NA NA NA NA
Rainbow smelt NA NA NA NA NA NA
Large mouth bass NA NA NA NA NA NA
Petalostemum S.p. Petalostemum NA NA NA NA NA NA
Poa Cf. Poa NA NA NA NA NA NA
Schizachyrium spp. Schizachyrium NA NA NA NA NA NA
Petalostemum NA NA NA NA NA NA
Poa cf… Poa NA NA NA NA NA NA
Schizachyrium sPp Schizachyrium NA NA NA NA NA NA
_Koeleria_cristata Koeleria cristata NA NA NA NA NA NA
Lespedeza_capitata Lespedeza capitata NA NA NA NA NA NA
Liatris_aspera Liatris aspera NA NA NA NA NA NA
TRUE NA NA NA NA NA NA
Oncorhynchus tshawytscha NA NA NA NA NA NA
Oncorhynchus gorbuscha NA NA NA NA NA NA
Oncorhynchus kisutch NA NA NA NA NA NA

The taxa that could be resolved to ITIS and WORMS were logged to taxa_map.csv, along with their taxonomic serial numbers and taxonomic ranks.

Resolve common taxa

Some of the taxa couldn’t be resolved by resolve_sci_taxa because they were listed by their common names. Use resolve_comm_taxa to attempt resolution of these common names to an authority. resolve_comm_taxa is similar to resolve_sci_taxa in that it requires a preferred list of authorities to search against. Select authorities supported by resolve_comm_taxa.

# View the list of authorities supported by resolve_comm_taxa
view_taxa_authorities()
Authorities supported by resolve_sci_taxa and resolve_comm_taxa
id authority resolve_sci_taxa resolve_comm_taxa
3 Integrated Taxonomic Information System (ITIS) supported supported
9 World Register of Marine Species (WORMS) supported not supported
11 Global Biodiversity Information Facility (GBIF) supported not supported
165 Tropicos - Missouri Botanical Garden supported not supported
# Resolve common names using ITIS
output <- resolve_comm_taxa(path = my_path, data.sources = 3)
Output from resolve_comm_taxa call
taxa_raw taxa_trimmed taxa_replacement taxa_removed taxa_clean rank authority authority_id score difference
Mosses NA NA Common ITIS NA NA NA
Unsorted biomass TRUE NA NA NA
Achillea millefolium(lanulosa) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Crepis tectorum NA NA Common ITIS NA NA NA
Cyperus sp. Cyperus NA NA Common ITIS NA NA NA
Euphorbia glyptosperma NA NA Common ITIS NA NA NA
Lespedeza capitata NA NA Common ITIS NA NA NA
Achillea millefolium(lanulosaaaa) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Achillea millefolium(lanulosabb) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Achillea millefolium(lanulosacc) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Lepidium densiflorum NA NA Common ITIS NA NA NA
Poa pratensis NA NA Common ITIS NA NA NA
Rumex acetosella NA NA Common ITIS NA NA NA
Schizachyrium scoparium NA NA Common ITIS NA NA NA
Bouteloua gracilis NA NA Common ITIS NA NA NA
Koeleria cristata NA NA Common ITIS NA NA NA
Liatris aspera NA NA Common ITIS NA NA NA
Lupinus perennis NA NA Common ITIS NA NA NA
Panicum virgatum NA NA Common ITIS NA NA NA
Petalostemum purpureum NA NA Common ITIS NA NA NA
Petalostemum villosum NA NA Common ITIS NA NA NA
Solidago nemoralis NA NA Common ITIS NA NA NA
Sorghastrum nutans NA NA Common ITIS NA NA NA
Stipa spartea NA NA Common ITIS NA NA NA
Andropogon gerardi NA NA Common ITIS NA NA NA
Bouteloua curtipendula NA NA Common ITIS NA NA NA
Coreopsis palmata NA NA Common ITIS NA NA NA
Miscellaneous litter TRUE NA NA NA
Petalostemum candidum NA NA Common ITIS NA NA NA
Solidago rigida NA NA Common ITIS NA NA NA
Sporobolus cryptandrus NA NA Common ITIS NA NA NA
Agropyron repens NA NA Common ITIS NA NA NA
Aristida basiramea NA NA Common ITIS NA NA NA
Digitaria sp. Digitaria NA NA Common ITIS NA NA NA
Erigeron canadensis NA NA Common ITIS NA NA NA
Hedeoma hispida NA NA Common ITIS NA NA NA
Physalis virginiana NA NA Common ITIS NA NA NA
Taraxicum officinalis NA NA Common ITIS NA NA NA
Agrostis scabra NA NA Common ITIS NA NA NA
Ambrosia artemisiifolia elatior NA NA Common ITIS NA NA NA
Eragrostis spectabilis NA NA Common ITIS NA NA NA
Amorpha canescens NA NA Common ITIS NA NA NA
-9999 TRUE NA NA NA
Yellow Perch NA NA Common ITIS NA NA NA
Rainbow smelt NA NA Common ITIS NA NA NA
Large mouth bass NA NA Common ITIS NA NA NA
Petalostemum S.p. Petalostemum NA NA Common ITIS NA NA NA
Poa Cf. Poa NA NA Common ITIS NA NA NA
Schizachyrium spp. Schizachyrium NA NA Common ITIS NA NA NA
Petalostemum NA NA Common ITIS NA NA NA
Poa cf… Poa NA NA Common ITIS NA NA NA
Schizachyrium sPp Schizachyrium NA NA Common ITIS NA NA NA
_Koeleria_cristata Koeleria cristata NA NA Common ITIS NA NA NA
Lespedeza_capitata Lespedeza capitata NA NA Common ITIS NA NA NA
Liatris_aspera Liatris aspera NA NA Common ITIS NA NA NA
TRUE NA NA NA
Oncorhynchus tshawytscha NA NA Common ITIS NA NA NA
Oncorhynchus gorbuscha NA NA Common ITIS NA NA NA
Oncorhynchus kisutch NA NA Common ITIS NA NA NA

Taxa map overview

Throughout the cleaning process, results have been logged to taxa_map.csv facilitating understanding of the changes to the raw taxa list. The taxa map will be used to create a revision of the raw taxa list, but first an explanation of the columns of this file is warranted. Information about taxa_map.csv can also be found in the documentation for create_taxa_map (i.e. ?create_taxa_map). The taxa map has 10 columns:

  • taxa_raw - The unique taxa extracted from the raw data by create_taxa_map.
  • taxa_trimmed - Taxa that were operated on by trim_taxa, with the resultant name listed.
  • taxa_replacement - Taxa that were operated on by replace_taxa, with the resultant replacement listed.
  • taxa_removed - Taxa that were operated on by remove_taxa. TRUE if taxa was removed, NA otherwise.
  • taxa_clean - Taxa that were successfully resolved to an authority by resolve_sci_taxa or resolve_comm_taxa. The listed taxa spelling is accepted by the matched authority.
  • rank - Taxonomic rank determined by resolve_sci_taxa or a value of Common if resolved by resolve_comm_taxa.
  • authority - Taxonomic authority resolved to by resolve_sci_taxa or resolve_comm_taxa.
  • authority_id - Identification number/value of the resolved taxon in the taxonomic authority listed under authority.
  • score - A value given by some authorities as to how well the raw taxon matched the resolved taxon. See the authority for more information.
  • difference - TRUE if there is a difference between taxa_raw and taxa_clean.

Below is the taxa_map.csv for the cleaning procedures implemented on the test data. Some noteworthy features of this map:

  • taxa_trimmed, taxa_replacement, and taxa_removed contain values as per the specifications listed above.
  • taxa_clean contains the accepted spelling of taxa that could be resolved to a taxonomic authority. Not all taxa were resolved. In these instances a manual search of an authority is recommended. Often a manual search will reveal additional information that helps resolve a taxon. Once a taxon is resolved, the fields taxa_clean, rank, authority, and authority_id can be manually updated.
  • rank, authority, and authority_id contain values if the taxon was resolved, and contain NA otherwise.
taxa_map.csv after all the cleaning procedures have been applied.
taxa_raw taxa_trimmed taxa_replacement taxa_removed taxa_clean rank authority authority_id score difference
Mosses NA Common ITIS NA NA NA
Unsorted biomass TRUE NA NA NA
Achillea millefolium(lanulosa) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Crepis tectorum NA Common ITIS NA NA NA
Cyperus sp. Cyperus NA Common ITIS NA NA NA
Euphorbia glyptosperma NA Common ITIS NA NA NA
Lespedeza capitata NA Common ITIS NA NA NA
Achillea millefolium(lanulosaaaa) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Achillea millefolium(lanulosabb) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Achillea millefolium(lanulosacc) Achillea millefolium NA Achillea millefolium Species ITIS 35423 0.988 NA
Lepidium densiflorum NA Common ITIS NA NA NA
Poa pratensis NA Common ITIS NA NA NA
Rumex acetosella NA Common ITIS NA NA NA
Schizachyrium scoparium NA Common ITIS NA NA NA
Bouteloua gracilis NA Common ITIS NA NA NA
Koeleria cristata NA Common ITIS NA NA NA
Liatris aspera NA Common ITIS NA NA NA
Lupinus perennis NA Common ITIS NA NA NA
Panicum virgatum NA Common ITIS NA NA NA
Petalostemum purpureum NA Common ITIS NA NA NA
Petalostemum villosum NA Common ITIS NA NA NA
Solidago nemoralis NA Common ITIS NA NA NA
Sorghastrum nutans NA Common ITIS NA NA NA
Stipa spartea NA Common ITIS NA NA NA
Andropogon gerardi NA Common ITIS NA NA NA
Bouteloua curtipendula NA Common ITIS NA NA NA
Coreopsis palmata NA Common ITIS NA NA NA
Miscellaneous litter TRUE NA NA NA
Petalostemum candidum NA Common ITIS NA NA NA
Solidago rigida NA Common ITIS NA NA NA
Sporobolus cryptandrus NA Common ITIS NA NA NA
Agropyron repens NA Common ITIS NA NA NA
Aristida basiramea NA Common ITIS NA NA NA
Digitaria sp. Digitaria NA Common ITIS NA NA NA
Erigeron canadensis NA Common ITIS NA NA NA
Hedeoma hispida NA Common ITIS NA NA NA
Physalis virginiana NA Common ITIS NA NA NA
Taraxicum officinalis NA Common ITIS NA NA NA
Agrostis scabra NA Common ITIS NA NA NA
Ambrosia artemisiifolia elatior NA Common ITIS NA NA NA
Eragrostis spectabilis NA Common ITIS NA NA NA
Amorpha canescens NA Common ITIS NA NA NA
-9999 TRUE NA NA NA
Yellow Perch NA Common ITIS NA NA NA
Rainbow smelt NA Common ITIS NA NA NA
Large mouth bass NA Common ITIS NA NA NA
Petalostemum S.p. Petalostemum NA Common ITIS NA NA NA
Poa Cf. Poa NA Common ITIS NA NA NA
Schizachyrium spp. Schizachyrium NA Common ITIS NA NA NA
Petalostemum NA Common ITIS NA NA NA
Poa cf… Poa NA Common ITIS NA NA NA
Schizachyrium sPp Schizachyrium NA Common ITIS NA NA NA
_Koeleria_cristata Koeleria cristata NA Common ITIS NA NA NA
Lespedeza_capitata Lespedeza capitata NA Common ITIS NA NA NA
Liatris_aspera Liatris aspera NA Common ITIS NA NA NA
TRUE NA NA NA
Oncorhynchus tshawytscha NA Common ITIS NA NA NA
Oncorhynchus gorbuscha NA Common ITIS NA NA NA
Oncorhynchus kisutch NA Common ITIS NA NA NA
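
To find the taxa that still need a manual search, one option is to filter the map for rows that were neither removed nor resolved. The following is a minimal base R sketch under that assumption; the `map` data frame is a hypothetical three-row excerpt that uses the column names described above, not the full taxa_map.csv.

```r
# Hypothetical excerpt of a taxa map
map <- data.frame(
  taxa_raw     = c("Achillea millefolium(lanulosa)", "Unsorted biomass",
                   "Taraxicum officinalis"),
  taxa_removed = c(NA, TRUE, NA),
  taxa_clean   = c("Achillea millefolium", NA, NA),
  stringsAsFactors = FALSE
)

# Keep rows with no resolved name that were not flagged as removed
needs_review <- map$taxa_raw[is.na(map$taxa_clean) &
                               !(map$taxa_removed %in% TRUE)]
print(needs_review)
```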

Revise taxa

Now that the taxa have been cleaned as well as they can be, the raw data table can be updated with the new taxonomic information. This new information is contained in 4 new columns, which have the same definitions as listed in the taxa map:

  • taxa_clean - Taxa that were successfully resolved to an authority. The listed taxa spelling is accepted by the matched authority.
  • taxa_rank - Taxonomic rank.
  • taxa_authority - Taxonomic authority that was resolved.
  • taxa_authority_id - Identification number/value of the resolved taxon in the taxonomic authority.

These 4 columns are appended to the raw data table and written to a file named “taxonomyCleanr_output”.

# Revise the raw data table and write to file
output <- revise_taxa(path = my_path, x = data, col = 'Species', sep = '\t')
A revision of the raw data table with new taxonomic data appended
Year Sample_Date Plot Heat_Treatment Species Mass taxa_clean taxa_rank taxa_authority taxa_authority_id
2007 8/22/07 29 Control Mosses 150.53 Common ITIS NA
2007 8/22/07 29 Control Unsorted biomass 150.53 NA
2007 8/24/07 29 High Achillea millefolium(lanulosa) 0.13 Achillea millefolium Species ITIS 35423
2007 8/24/07 29 High Achillea millefolium(lanulosa) 3.60 Achillea millefolium Species ITIS 35423
2007 8/24/07 29 High Achillea millefolium(lanulosa) 9.77 Achillea millefolium Species ITIS 35423
2007 8/24/07 29 High Crepis tectorum 1.43 Common ITIS NA
2007 8/24/07 29 High Cyperus sp. 0.53 Common ITIS NA
2007 8/23/07 29 High Euphorbia glyptosperma 0.05 Common ITIS NA
2007 8/24/07 29 High Lespedeza capitata 139.90 Common ITIS NA
2007 8/21/07 29 Low Achillea millefolium(lanulosa) 0.60 Achillea millefolium Species ITIS 35423
2007 8/21/07 29 Low Achillea millefolium(lanulosaaaa) 1.10 Achillea millefolium Species ITIS 35423
2007 8/21/07 29 Low Achillea millefolium(lanulosabb) 4.90 Achillea millefolium Species ITIS 35423
2007 8/21/07 29 Low Achillea millefolium(lanulosacc) 0.87 Achillea millefolium Species ITIS 35423
2007 8/21/07 29 Low Cyperus sp. 1.53 Common ITIS NA
2007 8/20/07 29 Low Lepidium densiflorum 0.05 Common ITIS NA
2007 8/21/07 29 Low Lespedeza capitata 54.67 Common ITIS NA
2007 8/21/07 29 Low Poa pratensis 0.07 Common ITIS NA
2007 8/21/07 29 Low Rumex acetosella 0.33 Common ITIS NA
2007 8/21/07 29 Low Schizachyrium scoparium 0.70 Common ITIS NA
2007 8/21/07 64 Control Unsorted biomass 202.55 NA
2007 8/20/07 64 High Bouteloua gracilis 12.20 Common ITIS NA
2007 8/20/07 64 High Cyperus sp. 0.23 Common ITIS NA
2007 8/20/07 64 High Koeleria cristata 0.87 Common ITIS NA
2007 8/20/07 64 High Lespedeza capitata 52.83 Common ITIS NA
2007 8/20/07 64 High Liatris aspera 2.80 Common ITIS NA
2007 8/20/07 64 High Lupinus perennis 50.90 Common ITIS NA
2007 8/20/07 64 High Panicum virgatum 0.43 Common ITIS NA
2007 8/20/07 64 High Petalostemum purpureum 9.03 Common ITIS NA
2007 8/20/07 64 High Petalostemum villosum 15.13 Common ITIS NA
2007 8/20/07 64 High Poa pratensis 0.23 Common ITIS NA
2007 8/20/07 64 High Schizachyrium scoparium 23.60 Common ITIS NA
2007 8/20/07 64 High Solidago nemoralis 32.10 Common ITIS NA
2007 8/20/07 64 High Sorghastrum nutans 5.53 Common ITIS NA
2007 8/20/07 64 High Stipa spartea 4.00 Common ITIS NA
2007 8/24/07 64 Low Achillea millefolium(lanulosa) 0.83 Achillea millefolium Species ITIS 35423
2007 8/24/07 64 Low Andropogon gerardi 13.93 Common ITIS NA
2007 8/24/07 64 Low Bouteloua curtipendula 0.20 Common ITIS NA
2007 8/24/07 64 Low Bouteloua gracilis 27.10 Common ITIS NA
2007 8/24/07 64 Low Coreopsis palmata 4.40 Common ITIS NA
2007 8/24/07 64 Low Koeleria cristata 0.83 Common ITIS NA
2007 8/24/07 64 Low Lespedeza capitata 44.10 Common ITIS NA
2007 8/24/07 64 Low Liatris aspera 14.00 Common ITIS NA
2007 8/24/07 64 Low Lupinus perennis 27.87 Common ITIS NA
2007 8/24/07 64 Low Miscellaneous litter 1.87 NA
2007 8/24/07 64 Low Petalostemum candidum 3.00 Common ITIS NA
2007 8/24/07 64 Low Petalostemum purpureum 9.33 Common ITIS NA
2007 8/24/07 64 Low Schizachyrium scoparium 10.97 Common ITIS NA
2007 8/24/07 64 Low Solidago rigida 39.57 Common ITIS NA
2007 8/24/07 64 Low Sporobolus cryptandrus 0.60 Common ITIS NA
2007 8/20/07 69 Control Unsorted biomass 58.85 NA
2007 8/20/07 69 High Achillea millefolium(lanulosa) 3.53 Achillea millefolium Species ITIS 35423
2007 8/20/07 69 High Agropyron repens 0.37 Common ITIS NA
2007 8/20/07 69 High Aristida basiramea 6.53 Common ITIS NA
2007 8/20/07 69 High Crepis tectorum 0.83 Common ITIS NA
2007 8/20/07 69 High Cyperus sp. 3.17 Common ITIS NA
2007 8/20/07 69 High Digitaria sp. 0.10 Common ITIS NA
2007 8/20/07 69 High Erigeron canadensis 8.33 Common ITIS NA
2007 8/20/07 69 High Hedeoma hispida 1.30 Common ITIS NA
2007 8/20/07 69 High Lepidium densiflorum 0.13 Common ITIS NA
2007 8/20/07 69 High Lupinus perennis 0.37 Common ITIS NA
2007 8/20/07 69 High Miscellaneous litter 0.70 NA
2007 8/20/07 69 High Physalis virginiana 0.43 Common ITIS NA
2007 8/20/07 69 High Taraxicum officinalis 0.06 Common ITIS NA
2007 8/24/07 69 Low Achillea millefolium(lanulosa) 2.97 Achillea millefolium Species ITIS 35423
2007 8/24/07 69 Low Agrostis scabra 0.47 Common ITIS NA
2007 8/24/07 69 Low Ambrosia artemisiifolia elatior 0.47 Common ITIS NA
2007 8/24/07 69 Low Aristida basiramea 9.53 Common ITIS NA
2007 8/24/07 69 Low Bouteloua gracilis 0.10 Common ITIS NA
2007 8/24/07 69 Low Crepis tectorum 1.33 Common ITIS NA
2007 8/24/07 69 Low Cyperus sp. 1.70 Common ITIS NA
2007 8/24/07 69 Low Eragrostis spectabilis 1.63 Common ITIS NA
2007 8/24/07 69 Low Erigeron canadensis 2.67 Common ITIS NA
2007 8/24/07 69 Low Hedeoma hispida 0.20 Common ITIS NA
2007 8/24/07 69 Low Lespedeza capitata 0.10 Common ITIS NA
2007 8/24/07 69 Low Miscellaneous litter 0.93 NA
2007 8/24/07 69 Low Physalis virginiana 1.97 Common ITIS NA
2007 8/24/07 69 Low Schizachyrium scoparium 0.23 Common ITIS NA
2007 8/23/07 78 Control Amorpha canescens 9.30 Common ITIS NA
2007 8/21/07 78 Control Unsorted biomass 216.13 NA
2007 8/23/07 78 High Andropogon gerardi 22.60 Common ITIS NA
2007 8/23/07 78 High Bouteloua gracilis 0.73 Common ITIS NA
2007 8/23/07 78 High Lespedeza capitata 7.03 Common ITIS NA
2007 8/23/07 78 High Lupinus perennis 20.90 Common ITIS NA
2007 8/23/07 78 High Petalostemum purpureum 0.30 Common ITIS NA
2007 8/23/07 78 High Poa pratensis 1.30 Common ITIS NA
2007 8/23/07 78 High Schizachyrium scoparium 10.87 Common ITIS NA
2007 8/23/07 78 High Solidago rigida 133.00 Common ITIS NA
2007 8/23/07 78 High Sorghastrum nutans 1.40 Common ITIS NA
2007 8/28/07 78 Low Amorpha canescens 47.80 Common ITIS NA
2007 8/28/07 78 Low Andropogon gerardi 19.10 Common ITIS NA
2007 8/28/07 78 Low Bouteloua gracilis 9.93 Common ITIS NA
2007 8/28/07 78 Low Coreopsis palmata 1.03 Common ITIS NA
2007 8/28/07 78 Low Erigeron canadensis 4.17 Common ITIS NA
2007 8/28/07 78 Low -9999 9.53 NA
2007 8/28/07 78 Low Liatris aspera 11.03 Common ITIS NA
2007 8/28/07 78 Low Lupinus perennis 10.27 Common ITIS NA
2007 8/28/07 78 Low Petalostemum purpureum 6.53 Common ITIS NA
2007 8/28/07 78 Low Poa pratensis 0.90 Common ITIS NA
2007 8/28/07 78 Low Schizachyrium scoparium 10.83 Common ITIS NA
2007 8/28/07 78 Low Solidago nemoralis 3.43 Common ITIS NA
2007 8/22/07 29 Control Mosses 150.53 Common ITIS NA
2007 8/22/07 29 Control Yellow Perch 150.53 Common ITIS NA
2007 8/22/07 29 Control Rainbow smelt 150.53 Common ITIS NA
2007 8/22/07 29 Control Large mouth bass 150.53 Common ITIS NA
2007 8/28/07 78 Low Petalostemum S.p. 6.53 Common ITIS NA
2007 8/28/07 78 Low Poa Cf. 0.90 Common ITIS NA
2007 8/28/07 78 Low Schizachyrium spp. 10.83 Common ITIS NA
2007 8/28/07 78 Low Petalostemum 6.53 Common ITIS NA
2007 8/28/07 78 Low Poa cf… 0.90 Common ITIS NA
2007 8/28/07 78 Low Schizachyrium sPp 10.83 Common ITIS NA
2007 8/24/07 64 Low _Koeleria_cristata 0.83 Common ITIS NA
2007 8/24/07 64 Low Lespedeza_capitata 44.10 Common ITIS NA
2007 8/24/07 64 Low Liatris_aspera 14.00 Common ITIS NA
2007 8/24/07 64 Low 0.83 NA
2007 8/24/07 64 Low 44.10 NA
2007 8/24/07 64 Low Oncorhynchus tshawytscha 44.10 Common ITIS NA
2007 8/24/07 64 Low Oncorhynchus gorbuscha 44.10 Common ITIS NA
2007 8/24/07 64 Low Oncorhynchus kisutch 44.10 Common ITIS NA

Make taxonomicCoverage EML

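When creating Ecological Metadata Language (EML) metadata, it is good practice to include the taxonomic entities and their hierarchies to facilitate search and discovery. In the example below, make_taxonomicCoverage is called with write.file = TRUE so that the node set is written to taxonomicCoverage.xml in my_path.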

# Create the taxonomicCoverage EML node set and write it to file
output <- make_taxonomicCoverage(path = my_path, write.file = TRUE)
# Read the written file back in and print its contents
output <- XML::xmlTreeParse(paste0(my_path, '/taxonomicCoverage.xml'))
output$doc$children
## $eml
## <eml:eml packageId="c72a7030-fb41-4ba3-9412-e3c6c2a28076" system="uuid" schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd" xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.2">
##  <taxonomicClassification>
##   <taxonRankName>kingdom</taxonRankName>
##   <taxonRankValue>Plantae</taxonRankValue>
##   <commonName>plants</commonName>
##   <taxonId provider="https://itis.gov">202422</taxonId>
##   <taxonomicClassification>
##    <taxonRankName>subkingdom</taxonRankName>
##    <taxonRankValue>Viridiplantae</taxonRankValue>
##    <commonName>green plants</commonName>
##    <taxonId provider="https://itis.gov">954898</taxonId>
##    <taxonomicClassification>
##     <taxonRankName>infrakingdom</taxonRankName>
##     <taxonRankValue>Streptophyta</taxonRankValue>
##     <commonName>land plants</commonName>
##     <taxonId provider="https://itis.gov">846494</taxonId>
##     <taxonomicClassification>
##      <taxonRankName>superdivision</taxonRankName>
##      <taxonRankValue>Embryophyta</taxonRankValue>
##      <taxonId provider="https://itis.gov">954900</taxonId>
##      <taxonomicClassification>
##       <taxonRankName>division</taxonRankName>
##       <taxonRankValue>Tracheophyta</taxonRankValue>
##       <commonName>vascular plants</commonName>
##       <commonName>tracheophytes</commonName>
##       <taxonId provider="https://itis.gov">846496</taxonId>
##       <taxonomicClassification>
##        <taxonRankName>subdivision</taxonRankName>
##        <taxonRankValue>Spermatophytina</taxonRankValue>
##        <commonName>spermatophytes</commonName>
##        <commonName>seed plants</commonName>
##        <taxonId provider="https://itis.gov">846504</taxonId>
##        <taxonomicClassification>
##         <taxonRankName>class</taxonRankName>
##         <taxonRankValue>Magnoliopsida</taxonRankValue>
##         <taxonId provider="https://itis.gov">18063</taxonId>
##         <taxonomicClassification>
##          <taxonRankName>superorder</taxonRankName>
##          <taxonRankValue>Asteranae</taxonRankValue>
##          <taxonId provider="https://itis.gov">846535</taxonId>
##          <taxonomicClassification>
##           <taxonRankName>order</taxonRankName>
##           <taxonRankValue>Asterales</taxonRankValue>
##           <taxonId provider="https://itis.gov">35419</taxonId>
##           <taxonomicClassification>
##            <taxonRankName>family</taxonRankName>
##            <taxonRankValue>Asteraceae</taxonRankValue>
##            <commonName>sunflowers</commonName>
##            <taxonId provider="https://itis.gov">35420</taxonId>
##            <taxonomicClassification>
##             <taxonRankName>genus</taxonRankName>
##             <taxonRankValue>Achillea</taxonRankValue>
##             <commonName>yarrow</commonName>
##             <taxonId provider="https://itis.gov">35422</taxonId>
##             <taxonomicClassification>
##              <taxonRankName>species</taxonRankName>
##              <taxonRankValue>Achillea millefolium</taxonRankValue>
##              <commonName>common yarrow</commonName>
##              <commonName>milenrama</commonName>
##              <commonName>milfoil</commonName>
##              <commonName>western yarrow</commonName>
##              <commonName>bloodwort</commonName>
##              <commonName>carpenter&apos;s weed</commonName>
##              <commonName>yarrow</commonName>
##              <taxonId provider="https://itis.gov">35423</taxonId>
##             </taxonomicClassification>
##            </taxonomicClassification>
##           </taxonomicClassification>
##          </taxonomicClassification>
##         </taxonomicClassification>
##        </taxonomicClassification>
##       </taxonomicClassification>
##      </taxonomicClassification>
##     </taxonomicClassification>
##    </taxonomicClassification>
##   </taxonomicClassification>
##  </taxonomicClassification>
## </eml:eml>
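
The nested output above can be hard to scan by eye. As a minimal sketch (not part of taxonomyCleanr), the hierarchy can be flattened into one row per rank with the XML package. The fragment below is abbreviated from the output above to the family, genus, and species levels with common names omitted, and the names fragment, rank_nodes, and hierarchy are illustrative only.

```r
# Sketch: flatten nested taxonomicClassification nodes into a data frame.
# Assumes the XML package is installed. The fragment is abbreviated from
# the output above (family, genus, and species levels only).
fragment <- paste0(
  '<taxonomicClassification>',
  '<taxonRankName>family</taxonRankName>',
  '<taxonRankValue>Asteraceae</taxonRankValue>',
  '<taxonId provider="https://itis.gov">35420</taxonId>',
  '<taxonomicClassification>',
  '<taxonRankName>genus</taxonRankName>',
  '<taxonRankValue>Achillea</taxonRankValue>',
  '<taxonId provider="https://itis.gov">35422</taxonId>',
  '<taxonomicClassification>',
  '<taxonRankName>species</taxonRankName>',
  '<taxonRankValue>Achillea millefolium</taxonRankValue>',
  '<taxonId provider="https://itis.gov">35423</taxonId>',
  '</taxonomicClassification>',
  '</taxonomicClassification>',
  '</taxonomicClassification>'
)

# Parse the text and collect every classification level in document order
doc <- XML::xmlParse(fragment, asText = TRUE)
rank_nodes <- XML::getNodeSet(doc, "//taxonomicClassification")

# Read the direct children of each level, so nested levels are not mixed in
child_value <- function(node, name) XML::xmlValue(node[[name]])
hierarchy <- data.frame(
  rank  = vapply(rank_nodes, child_value, character(1), name = "taxonRankName"),
  value = vapply(rank_nodes, child_value, character(1), name = "taxonRankValue"),
  id    = vapply(rank_nodes, child_value, character(1), name = "taxonId"),
  stringsAsFactors = FALSE
)
print(hierarchy)
```

To run the same extraction against the file written by make_taxonomicCoverage, parse taxonomicCoverage.xml with XML::xmlParse instead of the inline fragment. The XPath expression here relies on the taxonomicClassification elements being unprefixed, as they are in the output above.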