## attributeList

This element tree is found at (XPath):
/eml:eml/dataset/dataTable/attributeList
/eml:eml/dataset/view/attributeList
/eml:eml/dataset/storedProcedure/attributeList
/eml:eml/dataset/spatialRaster/attributeList
/eml:eml/dataset/spatialVector/attributeList
/eml:eml/dataset/otherEntity/attributeList

The <attributeList> tree is required for all data types except for <otherEntity>. It describes all variables in a data entity in individual <attribute> elements. The description includes the name and definition of each attribute, its domain, definitions of coded values, and other pertinent information.

<attributeName> is typically the name of a field in a data table and is often short and/or cryptic. It is recommended that <attributeName> values be suitable for use as variable names (e.g., composed of ASCII characters) and that they match the column headers of a CSV or other text table.

Context: in the EDI repository, <attributeName>s must be unique within a data entity.
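The two recommendations above (unique names, names usable as variables) can be checked mechanically. The following is a minimal sketch, not an EDI tool: the sample fragment, the function name `check_attribute_names`, and the choice of "valid ASCII Python identifier" as the test for "suitable for use as a variable" are all assumptions made for illustration.

```python
import keyword
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment used only for illustration.
SAMPLE = """
<attributeList>
  <attribute id="a1"><attributeName>site_id</attributeName></attribute>
  <attribute id="a2"><attributeName>pH</attributeName></attribute>
  <attribute id="a3"><attributeName>site_id</attributeName></attribute>
  <attribute id="a4"><attributeName>soil temp</attributeName></attribute>
</attributeList>
"""

def check_attribute_names(attribute_list_xml):
    """Return (duplicates, unsuitable) attributeName values for one entity."""
    root = ET.fromstring(attribute_list_xml)
    names = [el.text.strip() for el in root.iter("attributeName") if el.text]
    counts = Counter(names)
    # Names that occur more than once within the entity.
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    # Assumed rule: a usable name is an ASCII identifier and not a keyword.
    unsuitable = sorted(
        n for n in counts
        if not (n.isascii() and n.isidentifier()) or keyword.iskeyword(n)
    )
    return duplicates, unsuitable

duplicates, unsuitable = check_attribute_names(SAMPLE)
print(duplicates)   # ['site_id']
print(unsuitable)   # ['soil temp']
```

Other languages have different identifier rules, so the "unsuitable" test would need adjusting for the tools that will actually read the data.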

<attributeLabel> (optional) provides a less ambiguous or less cryptic alternative to the identification given in <attributeName>. <attributeLabel> is likely to be used as a column or row header in an HTML display.

<attributeDefinition> gives a precise and complete definition of the attribute being documented. It explains the contents of the attribute fully so that a data user can interpret the attribute accurately.

<storageType> may be system specific, as for an RDBMS, e.g., a Microsoft SQL varchar or an Oracle datetime. This field represents a ‘hint’ to processing systems as to how the attribute might be represented in a system or language, but is distinct from the actual expression of the domain of the attribute. Values that are not system specific include float, integer, and string.

<measurementScale> indicates the type of scale from which values are drawn for the attribute. EML’s attribute-unit model is described in detail elsewhere (see “Other Resources”). One of the five scale types must be used: nominal, ordinal, interval, ratio, or dateTime, as follows:

##### Non-numeric types:

The <nominal> scale is used to represent named categories. Values are assigned to distinguish them from other observations. This would include a list of coded values (e.g. 1=male, 2=female), or plain text descriptions. Columns that contain strings or simple text are nominal. Example: plot1, plot2, plot3.

<ordinal> values are categories that have a logical or ordered relationship to one another, but the magnitude of the differences between the values is not defined or meaningful. Example: Low, Medium, High.

Both the nominal and ordinal scales are <nonNumericDomain> types, and can be either text or an enumerated list. The <enumeratedDomain> applies to coded values, and requires a <codeDefinition> or a referenced entity containing the code explanations. For <textDomain>, an optional pattern may describe the text, e.g., a US telephone number can be described by the pattern “\d\d\d-\d\d\d-\d\d\d\d”.
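As an illustration of how a <textDomain> pattern might be used by a consumer of the metadata, the sketch below applies the telephone pattern quoted above with Python's `re` module. The function name and the sample values are hypothetical, and matching the whole value (rather than a substring) is an assumption about the intended behavior.

```python
import re

# Pattern quoted in the text for a US telephone number.
US_PHONE_PATTERN = r"\d\d\d-\d\d\d-\d\d\d\d"

def matches_text_domain(value, pattern=US_PHONE_PATTERN):
    """True when the whole value conforms to the textDomain pattern."""
    return re.fullmatch(pattern, value) is not None

print(matches_text_domain("555-867-5309"))  # True
print(matches_text_domain("5558675309"))    # False: hyphens are part of the pattern
```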

##### Numeric types:

<interval> measurements are ordinal, but in addition, use equal-sized units on a scale between values. Because the units are equal sized, these measurements are numeric. However, the starting point is arbitrary, so a value of zero is not meaningful. For example, the Celsius temperature scale uses degrees which are equally spaced, but where zero does not represent “absolute zero” (i.e., the temperature at which molecular motion stops), and 20 C is not “twice as hot” as 10 C.

<ratio> measurements have a meaningful zero point, and ratio comparisons between values are legitimate. For example, the Kelvin scale reflects the amount of kinetic energy of a substance (i.e., zero is the point where a substance transmits no thermal energy), and so temperature measured in kelvin units is a ratio measurement. Concentration is also a ratio measurement because a solution at 10 micromolePerLiter has twice as much substance as one at 5 micromolePerLiter.
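The difference between the two numeric scales can be seen with a little arithmetic. This sketch simply converts the Celsius values from the example above to kelvin; the variable names are illustrative only.

```python
def celsius_to_kelvin(deg_c):
    """Shift an interval-scale Celsius value onto the ratio-scale Kelvin scale."""
    return deg_c + 273.15

warm_c, cool_c = 20.0, 10.0

# Dividing interval-scale values gives a number with no physical meaning.
naive_ratio = warm_c / cool_c

# On the Kelvin scale the ratio is meaningful, and it is far from 2.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(naive_ratio)             # 2.0
print(round(kelvin_ratio, 3))  # 1.035
```

Differences, by contrast, are the same on both scales (10 degrees), which is why an interval scale is still numeric.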

The numeric scales <interval> and <ratio> require additional tags describing the <unit> and <numericDomain>, and should include <precision> where a single value applies.

<unit> Units should be described in correct physical units. Terms which describe data but are not units should be used in <attributeDefinition>. For example, for data describing “milligrams of Carbon per square meter,” “Carbon” belongs in the <attributeDefinition>, while the <unit> is “milligramPerMeterSquared.”

<standardUnit> and <customUnit>: A unit name must be either a <standardUnit> drawn from the unit dictionary included with EML (http://knb.ecoinformatics.org/software/eml/eml-2.1.0/eml-unitTypeDefinitions.html#StandardUnitDictionary), or a <customUnit> that is defined in <additionalMetadata>.

For general purposes, the following guidelines (from ISO recommendations) apply to <customUnit> names: Units should be written out, not abbreviated. Unit modifiers, such as “squared,” should follow the unit being modified. For example, meterSquared is preferred, while squareMeter is improper. Units should be singular, such as “meter,” and not plural, such as “meters.”
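The modifier-order guideline lends itself to a small check. The helper below is a hypothetical sketch that handles only the “square” case named above; it does not attempt to detect abbreviations or plurals, which are hard to recognize reliably from the name alone.

```python
import re

def suggest_custom_unit_name(name):
    """Move a leading 'square' modifier after the unit it modifies.

    Names that do not start with 'square' followed by a capitalized
    unit are returned unchanged.
    """
    match = re.fullmatch(r"square([A-Z][A-Za-z]*)", name)
    if match is None:
        return name
    unit = match.group(1)
    return unit[0].lower() + unit[1:] + "Squared"

print(suggest_custom_unit_name("squareMeter"))   # meterSquared
print(suggest_custom_unit_name("meterSquared"))  # meterSquared
```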

Context: EDI has adopted the LTER Unit Registry and recommends that the <customUnit> element be used for all units, with content pulled from the Unit Registry, even when the unit is already listed in the standard unit dictionary.

<numericDomain> This tag includes elements specifying the <numberType> and the minimum and maximum allowable values of a numeric attribute. A measurement’s <numberType> should be defined as real, natural, whole, or integer, as explained in the EML handbook (see “Other Resources”). The <bounds> are theoretical or allowable minimum and maximum values (prescriptive), rather than the actual observed range in a data set (descriptive). The <bounds> tree is optional.

<precision> describes the number of decimal places for the attribute. Currently, EML does not allow more than one precision value for a column. For example, a column containing lengths of fish may be measured to a precision of 0.01 meter for one species (e.g., a large one) and 0.001 meter for a different species, yet all the “fish length” data are collected into one attribute, with each value recorded at its own precision. In such cases <precision> can be omitted, but the variable precision should be described in detail in method/methodStep. Together, the information in <numericDomain> and <precision> is sufficient to decide upon an appropriate system-specific data type for representing a particular attribute. For example, an attribute with a numeric domain of 0 to 50,000 and a precision of 1 could be represented in the C language using a ‘long’ value, but if the precision is changed to ‘0.5’ then a ‘float’ type would be needed.
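The type-selection reasoning in the last example can be sketched as follows. The function name and the two-way "integer"/"float" outcome are assumptions for illustration; a real system would also weigh the size of the range against the widths of its own types.

```python
from decimal import Decimal

def suggest_storage_type(minimum, maximum, precision):
    """Suggest 'integer' or 'float' from numericDomain bounds and precision.

    Arguments are strings so that decimal values such as '0.5' are
    compared exactly rather than through binary floating point.
    """
    values = [Decimal(minimum), Decimal(maximum), Decimal(precision)]
    # An integer type suffices only if bounds and precision are all whole.
    all_whole = all(v == v.to_integral_value() for v in values)
    return "integer" if all_whole else "float"

print(suggest_storage_type("0", "50000", "1"))    # integer
print(suggest_storage_type("0", "50000", "0.5"))  # float
```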

The <measurementScale> type <dateTime> describes a date-time value from the Gregorian calendar, and it is recommended that these values be expressed in a format that conforms to the ISO 8601 standard. An example of an allowable ISO date-time format is “YYYY-MM-DD”, as in 2004-06-25, or, more fully, “YYYY-MM-DDThh:mm:ss.sTZD” (e.g., 1997-07-16T19:20:30.45Z). The ISO standard is quite strict about the structure of date components. Since legacy data often contain non-standard dates, and existing equipment (e.g., sensors) may still be producing non-standard dates, the EML authors have provided additional allowable formats. See the EML documentation for a complete list. It is important to note that the dateTime field should not be used for recording time durations. In that case, use a unit such as seconds, nominalMinute, or nominalDay, which defines the duration in terms of its relationship to the SI second.
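To show how the two example values above might be read by a consumer, the sketch below parses them with Python's `datetime.strptime`. The two `strptime` patterns are this example's own translation of a date-only format and of a timestamp with fractional seconds and a time zone designator; they are not part of EML, and other format strings would need their own patterns.

```python
from datetime import datetime

# Assumed strptime counterparts of the two example formats.
DATE_PATTERN = "%Y-%m-%d"
TIMESTAMP_PATTERN = "%Y-%m-%dT%H:%M:%S.%f%z"  # %z accepts a trailing 'Z'

day = datetime.strptime("2004-06-25", DATE_PATTERN)
stamp = datetime.strptime("1997-07-16T19:20:30.45Z", TIMESTAMP_PATTERN)

print(day.date())         # 2004-06-25
print(stamp.microsecond)  # 450000
print(stamp.utcoffset())  # 0:00:00
```

A value that does not match its pattern raises `ValueError`, which is one way to detect non-standard dates in legacy data.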

The <missingValueCode> is optional, but should be included to describe any missing value codes present in the data set (e.g., NA, NaN, ND, 9999). The missing value code is a string, not a value, which means that the content of this field must exactly match what appears in place of data values for it to be correctly interpreted. For example, if data are output with precision 0.01 and with missing values formatted as “-9999.00”, then the content of the <missingValueCode> element must be “-9999.00”, not “-9999”.
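The consequence of the string-matching rule can be seen in a short sketch. The function name and the sample column are hypothetical; the point is only that the comparison happens on the raw text before any numeric conversion.

```python
def to_number(raw, missing_value_code):
    """Convert a raw field to float, or None when it equals the missing code.

    The raw string is compared as text; nothing is parsed before the check.
    """
    if raw == missing_value_code:
        return None
    return float(raw)

column = ["12.31", "-9999.00", "7.05"]

# Code matches the data exactly: the missing value is recognized.
good = [to_number(v, "-9999.00") for v in column]
# Code differs from the data as text: -9999.00 is read as a real measurement.
bad = [to_number(v, "-9999") for v in column]

print(good)  # [12.31, None, 7.05]
print(bad)   # [12.31, -9999.0, 7.05]
```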

The examples show two attribute trees. The first was generated from an SQL system with a defined storage type. The second <attributeList> includes tags for <customUnit>, with the unit defined in the <additionalMetadata> tree.

Example 21: attributeList/attribute dataTable

<attributeList>
<attribute id="soil_chemistry.site_id">
<attributeName>site_id</attributeName>
<attributeDefinition>Site id as used in sites table</attributeDefinition>
<storageType typeSystem="http://www.w3.org/2001/XMLSchema-datatypes">string</storageType>
<measurementScale>
<nominal>
<nonNumericDomain>
<textDomain>
<definition>Site id as used in sites table</definition>
</textDomain>
</nonNumericDomain>
</nominal>
</measurementScale>
</attribute>
<attribute id="soil_chemistry.pH">
<attributeName>pH</attributeName>
<attributeDefinition>pH of soil solution</attributeDefinition>
<storageType typeSystem="http://www.w3.org/2001/XMLSchema-datatypes">float</storageType>
<measurementScale>
<ratio>
<unit>
<standardUnit>dimensionless</standardUnit>
</unit>
<precision>0.01</precision>
<numericDomain>
<numberType>real</numberType>
</numericDomain>
</ratio>
</measurementScale>
</attribute>
<attribute id="pass2001.q110">
<attributeName>q110</attributeName>
<attributeDefinition>Q110-Preference for front yard landscape</attributeDefinition>
<storageType typeSystem="http://www.w3.org/2001/XMLSchema-datatypes">float</storageType>
<measurementScale>
<ordinal>
<nonNumericDomain>
<enumeratedDomain>
<codeDefinition>
<code>1.00</code>
<definition>1-A desert landscape</definition>
</codeDefinition>
<codeDefinition>
<code>2.00</code>
<definition>2-Mostly lawn</definition>
</codeDefinition>
<codeDefinition>
<code>3.00</code>
<definition>3-Some lawn</definition>
</codeDefinition>
</enumeratedDomain>
</nonNumericDomain>
</ordinal>
</measurementScale>
</attribute>
<attribute id="att.2">
<attributeName>Year</attributeName>
<attributeDefinition>Calendar year of the observation from years 1990 - 2010</attributeDefinition>
<storageType>integer</storageType>
<measurementScale>
<dateTime>
<formatString>YYYY</formatString>
<dateTimePrecision>1</dateTimePrecision>
<dateTimeDomain>
<bounds>
<minimum exclusive="false">1993</minimum>
<maximum exclusive="false">2003</maximum>
</bounds>
</dateTimeDomain>
</dateTime>
</measurementScale>
</attribute>
<attribute id="att.7">
<attributeName>Count</attributeName>
<attributeDefinition>Number of individuals observed</attributeDefinition>
<storageType>integer</storageType>
<measurementScale>
<ratio>
<unit>
<standardUnit>number</standardUnit>
</unit>
<precision>1</precision>
<numericDomain>
<numberType>whole</numberType>
<bounds>
<minimum exclusive="false">0</minimum>
</bounds>
</numericDomain>
</ratio>
</measurementScale>
<missingValueCode>
<code>NaN</code>
<codeExplanation>value not recorded or invalid</codeExplanation>
</missingValueCode>
</attribute>
<attribute id="att.8">
<attributeName>cond</attributeName>
<attributeLabel>Conductivity</attributeLabel>
<attributeDefinition>Conductivity measured with SeaBird Electronics CTD-911</attributeDefinition>
<storageType>float</storageType>
<measurementScale>
<ratio>
<unit>
<customUnit>siemensPerMeter</customUnit>
</unit>
<precision>0.0001</precision>
<numericDomain>
<numberType>real</numberType>
<bounds>
<minimum exclusive="false">0</minimum>
<maximum exclusive="false">40</maximum>
</bounds>
</numericDomain>
</ratio>
</measurementScale>
</attribute>
</attributeList>
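One way to review an <attributeList> like the one above is to list each attribute with its measurement scale and unit. The sketch below does this with Python's standard `xml.etree.ElementTree`; the abbreviated sample fragment and the function name `summarize` are hypothetical, and the code assumes un-namespaced child elements as in the example.

```python
import xml.etree.ElementTree as ET

SCALES = ("nominal", "ordinal", "interval", "ratio", "dateTime")

# Abbreviated, hypothetical fragment in the shape of the example above.
SAMPLE = """
<attributeList>
  <attribute id="x.site_id">
    <attributeName>site_id</attributeName>
    <measurementScale><nominal><nonNumericDomain><textDomain>
      <definition>Site id</definition>
    </textDomain></nonNumericDomain></nominal></measurementScale>
  </attribute>
  <attribute id="x.cond">
    <attributeName>cond</attributeName>
    <measurementScale><ratio>
      <unit><customUnit>siemensPerMeter</customUnit></unit>
      <numericDomain><numberType>real</numberType></numericDomain>
    </ratio></measurementScale>
  </attribute>
</attributeList>
"""

def summarize(attribute_list_xml):
    """Return (name, scale, unit) per <attribute>; unit is None if absent."""
    rows = []
    for attr in ET.fromstring(attribute_list_xml).findall("attribute"):
        name = attr.findtext("attributeName")
        # The single child of <measurementScale> names the scale type.
        scale = next(c.tag for c in attr.find("measurementScale") if c.tag in SCALES)
        # Either <standardUnit> or <customUnit>, whichever is present.
        unit = attr.findtext(".//unit/*")
        rows.append((name, scale, unit))
    return rows

rows = summarize(SAMPLE)
for row in rows:
    print(row)
```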

The examples below show complete entity trees for <spatialVector> and <spatialRaster> converted via XSLT (stylesheet) from Esri metadata format. For details see “Other Resources.”

Example 22: Entity and attribute information for spatialVector

<spatialVector id="Landuse for Ficity in 1955">
<entityName>Landuse for Ficity in 1955</entityName>
<entityDescription>This GIS layer represents a reconstructed
generalized landuse map for the area of current Ficity around the time
period of 1955.</entityDescription>
<physical>
<objectName>fls-20.zip</objectName>
<dataFormat>
<externallyDefinedFormat>
<formatName>Esri Shapefile (zipped)</formatName>
</externallyDefinedFormat>
</dataFormat>
<distribution>
<online>
<onlineDescription>fls-20 zipped shapefile</onlineDescription>
</online>
</distribution>
</physical>
<attributeList id="Landuse for Ficity in 1955.attributeList">
<attribute id="Landuse for Ficity in 1955.FID">
<attributeName>FID</attributeName>
<attributeDefinition>Internal feature number.</attributeDefinition>
<measurementScale>
<nominal>
<nonNumericDomain>
<textDomain>
<definition>
Sequential unique whole numbers that are automatically generated.
</definition>
</textDomain>
</nonNumericDomain>
</nominal>
</measurementScale>
</attribute>
<attribute id="Landuse for Ficity in 1955.Shape">
<attributeName>Shape</attributeName>
<attributeDefinition>Feature geometry.</attributeDefinition>
<measurementScale>
<nominal>
<nonNumericDomain>
<textDomain>
<definition>Coordinates defining the features.</definition>
</textDomain>
</nonNumericDomain>
</nominal>
</measurementScale>
</attribute>
<attribute id="Landuse for Ficity in 1955.Z955">
<attributeName>Z955</attributeName>
<attributeDefinition>
This field signifies the landuse value for each polygon.
</attributeDefinition>
<storageType typeSystem="http://www.w3.org/2001/XMLSchema-datatypes">string</storageType>
<measurementScale>
<nominal>
<nonNumericDomain>
<enumeratedDomain>
<codeDefinition>
<code>Agriculture</code>
<definition>Agricultural land use</definition>
</codeDefinition>
<codeDefinition>
<code>Urban</code>
<definition>Urbanized area</definition>
</codeDefinition>
<codeDefinition>
<code>Desert</code>
<definition>Unmodified area</definition>
</codeDefinition>
<codeDefinition>
<code>Recreation</code>
<definition>Recreational land use</definition>
</codeDefinition>
</enumeratedDomain>
</nonNumericDomain>
</nominal>
</measurementScale>
</attribute>
</attributeList>
<geometry>Polygon</geometry>
<geometricObjectCount>78</geometricObjectCount>
<spatialReference>
</spatialReference>
</spatialVector>

Example 23: Entity and attribute information for spatialRaster

<spatialRaster id="fi_24k">
<entityName>fi_24k</entityName>
<entityDescription>Fictitious State 7.5 Minute Digital Elevation Model</entityDescription>
<physical>
<objectName>fls-30.zip</objectName>
<dataFormat>
<externallyDefinedFormat>
<formatName>Esri binary grid</formatName>
</externallyDefinedFormat>
</dataFormat>
<distribution>
<online>
<onlineDescription>fls-30 zipped raster data file</onlineDescription>
</online>
</distribution>
</physical>
<attributeList id="fi_24k.attributeList">
<attribute id="fi_24k.ObjectID">
<attributeName>ObjectID</attributeName>
<attributeDefinition>Internal feature number.</attributeDefinition>
<measurementScale>
<nominal>
<nonNumericDomain>
<textDomain>
<definition>
Sequential unique whole numbers that are automatically generated.
</definition>
</textDomain>
</nonNumericDomain>
</nominal>
</measurementScale>
</attribute>
<attribute id="fi_24k.Cell Value">
<attributeName>Cell Value</attributeName>
<attributeDefinition>Elevation Value</attributeDefinition>
<measurementScale>
<ratio>
<unit>
<standardUnit>meter</standardUnit>
</unit>
<precision />
<numericDomain>
<numberType>integer</numberType>
<bounds>
<minimum exclusive="true">-5193.000000</minimum>
<maximum exclusive="true">14785.000000</maximum>
</bounds>
</numericDomain>
</ratio>
</measurementScale>
</attribute>
<attribute id="fi_24k.Count">
<attributeName>Count</attributeName>
<attributeDefinition>Count</attributeDefinition>
<measurementScale>
<ratio>
<unit>
<standardUnit>number</standardUnit>
</unit>
<precision />
<numericDomain>
<numberType>whole</numberType>
</numericDomain>
</ratio>
</measurementScale>
</attribute>
</attributeList>
<spatialReference>
</spatialReference>
<horizontalAccuracy>not available</horizontalAccuracy>
<verticalAccuracy>not available</verticalAccuracy>
<cellSizeXDirection>30.0</cellSizeXDirection>
<cellSizeYDirection>30.0</cellSizeYDirection>
<numberOfBands>1</numberOfBands>
<rasterOrigin>Upper Left</rasterOrigin>
<rows>21092</rows>
<columns>18136</columns>
<verticals>1</verticals>
<cellGeometry>matrix</cellGeometry>
</spatialRaster>

The <otherEntity> data type includes the free text <entityType> element for naming the type of the entity. The otherEntity/physical/dataFormat/externallyDefinedFormat/formatName element stores the file format. While there is no controlled vocabulary for the content of these elements, format names can be drawn from DataONE’s objectFormatList. Table 3 provides suggestions for some common other entity formats.

Table 3. Entity types and format names for some <otherEntity> types.

| Common Name | Entity Type | Format Name |
| --- | --- | --- |
| R script | script | R programming language script |
| R markdown | script | R Markdown file |
| PHP script | script | application/php |
| JPEG image | photograph | JPEG |
| PDF document | document | Portable Document Format |
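When metadata are generated by script, the suggestions in Table 3 can be held in a small lookup. The sketch below keys the table rows by file extension; the extensions themselves, the dictionary name, and the function name are assumptions added for illustration and are not part of the table.

```python
import os

# (entityType, formatName) pairs from Table 3, keyed by assumed extensions.
OTHER_ENTITY_FORMATS = {
    ".r": ("script", "R programming language script"),
    ".rmd": ("script", "R Markdown file"),
    ".php": ("script", "application/php"),
    ".jpg": ("photograph", "JPEG"),
    ".jpeg": ("photograph", "JPEG"),
    ".pdf": ("document", "Portable Document Format"),
}

def describe_other_entity(filename):
    """Return (entityType, formatName) for a file, or None if not listed."""
    extension = os.path.splitext(filename)[1].lower()
    return OTHER_ENTITY_FORMATS.get(extension)

print(describe_other_entity("analysis.R"))  # ('script', 'R programming language script')
print(describe_other_entity("data.xyz"))    # None
```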