Welcome to TERN Knowledge Base

Manipulating AusPlots data I: Subset data frames

The 'get_ausplots' function extracts and compiles AusPlots data allowing substantial flexibility in the selection of the required data. Up to 8 different types of data can be retrieved into distinct data frames (i.e. data on sampling sites, vegetation structure, vegetation point intercept, vegetation vouchers, vegetation basal wedge, soil characterization, soil bulk density, and soil & soil metagenomics samples). In addition, data can be filtered for particular sets of plots and/or genus/species, as well as geographically using a rectangular bounding box.


However, in some situations we are only interested in a subset of the data retrieved by 'get_ausplots'. To subset AusPlots data we use the variables in the retrieved data frames that correspond to the concept by which we would like to filter the data. On some occasions, sub-setting a single data frame (i.e. one type of data) is all we need. Variables in the 'site.info' data frame contain information that affects all other data frames; so typically, after sub-setting the 'site.info' data frame on the variable of interest, we will also subset the remaining data frames using one of the variables common to all data frames. Common variables among data frames include 'site_location_name', 'site_location_visit_id', and 'site_unique'. Usually 'site_unique' is the best option to ‘connect’ AusPlots data frames, as it is the most specific variable, representing a single visit to a particular site.
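As a quick check of which variables can link the data frames, the sketch below lists the columns shared by every data frame in a list. The 'AP.data' list here is a small toy stand-in with made-up values, not real AusPlots output; only the three linking variable names come from the text above.

```r
# Toy stand-in for the list returned by get_ausplots; all values are made up.
AP.data <- list(
  site.info = data.frame(site_location_name = c("SITEA", "SITEB"),
                         site_location_visit_id = c(1, 2),
                         site_unique = c("SITEA-1", "SITEB-2"),
                         state = c("NT", "SA"),
                         stringsAsFactors = FALSE),
  veg.PI = data.frame(site_location_name = c("SITEA", "SITEB", "SITEB"),
                      site_location_visit_id = c(1, 2, 2),
                      site_unique = c("SITEA-1", "SITEB-2", "SITEB-2"),
                      height = c(12.5, 0.3, 2.0),
                      stringsAsFactors = FALSE)
)

# Keep only the data frames, in case the list also holds other elements.
dfs <- Filter(is.data.frame, AP.data)

# Columns present in every data frame are candidate linking variables.
common.vars <- Reduce(intersect, lapply(dfs, names))
common.vars
```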


To subset a data frame we filter its data by querying the variable(s) of interest using operators. The variables of interest are typically factors, numerical, or boolean variables. Many variables retrieved by 'get_ausplots' have a 'character' class, despite conceptually falling into one of these 3 categories. Therefore, before using a variable to filter a data frame we must inspect its contents and class and, if required, change its class to an adequate one. We use relational operators to filter on individual variables, and logical (and occasionally arithmetic) operators to combine more than one variable in our filtering operations (R Operators).
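The inspect, convert, then filter workflow can be sketched as follows. The data frame is a toy stand-in with made-up values; the column names 'state' and 'site_slope' are assumed to match those in 'site.info' and should be checked against the retrieved data.

```r
# Toy stand-in for a 'site.info' data frame; all values are made up.
site.info <- data.frame(
  site_unique = c("SITEA-1", "SITEB-2", "SITEC-3"),
  state       = c("NT", "SA", "NT"),
  site_slope  = c("2", "15", "7"),   # numbers stored as text
  stringsAsFactors = FALSE
)

# 1. Inspect the class of every variable before filtering.
classes.before <- sapply(site.info, class)
classes.before

# 2. Change classes so that they match the meaning of each variable.
site.info$state      <- as.factor(site.info$state)
site.info$site_slope <- as.numeric(site.info$site_slope)

# 3. Filter: relational operators (==, >) on each variable,
#    combined with a logical operator (&).
nt.sloped <- site.info[site.info$state == "NT" & site.info$site_slope > 5, ]
nt.sloped
```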



EXAMPLES

Multiple examples covering various types of sub-setting are presented below. The examples cover sub-setting a single data frame and all data frames, both with and without variable class transformation. All examples start by loading the 'ausplotsR' library and extracting AusPlots data using the 'get_ausplots' function. In the examples we use the 'AP.data' list of data frames, which contains information for all the currently available AusPlots sites. This list was previously created in the 'Obtaining AusPlots data: 'get_ausplots' function' Step-by-Step Guide (we use the list created in Example 4).


I. SUB-SETTING A SINGLE DATA FRAME

We might be, for example, interested in point intercept data only for vegetation of a particular height, of a particular growth form, growing on a particular substrate type, or found in a particular set of transects. In these examples, we use the variables in the ‘veg.PI’ data frame to filter the retrieved AusPlots data in this data frame. We do not need to subset any other data frames.


Examples


Example 1: Height

Height is 'numeric', so there is no need to change its class.
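A minimal sketch of filtering by height is given below. It uses a toy 'AP.data$veg.PI' with made-up values; the 'height' column name and the thresholds are assumptions chosen for illustration.

```r
# Toy stand-in for AP.data$veg.PI; all values are made up.
AP.data <- list(veg.PI = data.frame(
  site_unique = c("SITEA-1", "SITEA-1", "SITEB-2", "SITEB-2"),
  height      = c(12.5, 1.2, 0.3, NA),
  stringsAsFactors = FALSE
))

# Confirm the class; no conversion is needed for a numeric variable.
class(AP.data$veg.PI$height)

# Hits taller than 2 (subset() drops rows where the condition is NA).
tall.PI <- subset(AP.data$veg.PI, height > 2)

# Hits within a height range, combining two relational tests with '&'.
mid.PI <- subset(AP.data$veg.PI, height >= 1 & height <= 5)
```

Using 'subset' (or wrapping the condition in 'which') avoids the all-NA rows that plain bracket indexing returns when the filtering variable has missing values.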


Example 2: Transect

Transect is a 'factor', so there is no need to change its class.
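Filtering by a set of transects might look like the sketch below. The toy data and the transect labels are made up for illustration; the real labels should be read from 'levels' on the retrieved data.

```r
# Toy stand-in for AP.data$veg.PI; values and transect labels are made up.
AP.data <- list(veg.PI = data.frame(
  site_unique = c("SITEA-1", "SITEA-1", "SITEB-2", "SITEB-2"),
  transect    = factor(c("N1-S1", "E2-W2", "N1-S1", "S3-N3")),
  height      = c(12.5, 1.2, 0.3, 2.0)
))

# Confirm the class and list the available transects.
is.factor(AP.data$veg.PI$transect)
levels(AP.data$veg.PI$transect)

# Keep the hits recorded on a chosen set of transects.
keep  <- c("N1-S1", "S3-N3")
ns.PI <- AP.data$veg.PI[AP.data$veg.PI$transect %in% keep, ]

# Drop the factor levels that no longer occur in the subset.
ns.PI$transect <- droplevels(ns.PI$transect)
levels(ns.PI$transect)
```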


Example 3: Growth Form

Growth Form is a 'character' variable, so we need to change its class to 'factor'.
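For growth form, a minimal sketch of the conversion and filter is shown below. The toy data and the growth form labels are made up; the 'growth_form' column name is an assumption to check against the retrieved 'veg.PI' data frame.

```r
# Toy stand-in for AP.data$veg.PI; values and labels are made up.
AP.data <- list(veg.PI = data.frame(
  site_unique = c("SITEA-1", "SITEA-1", "SITEB-2", "SITEB-2"),
  growth_form = c("Tree/Palm", "Shrub", "Tussock grass", "Shrub"),
  stringsAsFactors = FALSE
))
veg.PI <- AP.data$veg.PI

# The variable arrives as text, so convert it to a factor first.
class(veg.PI$growth_form)
veg.PI$growth_form <- as.factor(veg.PI$growth_form)
levels(veg.PI$growth_form)

# Keep only the hits of one growth form; which() skips missing values.
shrubs.PI <- veg.PI[which(veg.PI$growth_form == "Shrub"), ]
shrubs.PI
```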



II. SUB-SETTING ALL DATA FRAMES

On some occasions, we are interested in sites located in particular states or bioregions. Alternatively, we might only be interested in data obtained at sites on steep slopes and/or with a slope facing (i.e. aspect) south. In these examples, we can use the variables in the 'site.info' data frame to select the sites of interest. In this case, we also need to subset the data in the remaining data frames, as we are only interested in data that have been collected at sites with particular characteristics. Therefore, we then filter the other data frames by site, keeping only the sites selected in our first sub-setting operation on the 'site.info' data frame. To do so we use one of the variables present in all data frames that contain a site identifier (i.e. 'site_location_name', 'site_location_visit_id', or 'site_unique'; see above).


Examples


Example 1: Site Slope

Site Slope is a 'character' variable, so we need to change its class to 'numeric'.
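The two-step operation (filter 'site.info', then carry the selection over to another data frame) can be sketched as below. The toy data are made up, the 'site_slope' column name is assumed, and the threshold of 10 is arbitrary.

```r
# Toy stand-in for AP.data; all values are made up.
AP.data <- list(
  site.info = data.frame(
    site_unique = c("SITEA-1", "SITEB-2", "SITEC-3"),
    site_slope  = c("2", "15", "7"),   # stored as text
    stringsAsFactors = FALSE
  ),
  veg.PI = data.frame(
    site_unique = c("SITEA-1", "SITEB-2", "SITEB-2", "SITEC-3"),
    height      = c(12.5, 0.3, 2.0, 1.1),
    stringsAsFactors = FALSE
  )
)

# Convert slope from text to numbers.
AP.data$site.info$site_slope <- as.numeric(AP.data$site.info$site_slope)

# Step 1: select the steep sites in 'site.info'.
steep.sites <- AP.data$site.info[which(AP.data$site.info$site_slope > 10), ]

# Step 2: keep the 'veg.PI' rows belonging to those site visits.
steep.PI <- AP.data$veg.PI[AP.data$veg.PI$site_unique %in%
                             steep.sites$site_unique, ]
steep.PI
```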


Example 2: Site Aspect

Site Aspect is a 'character' variable, so we need to change its class to 'numeric'.
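Aspect can be combined with slope using logical operators, as in the sketch below. The data are made up, the column names 'site_slope' and 'site_aspect' are assumed, and treating aspects between 135 and 225 degrees as "south facing" is an illustrative choice, not a definition taken from AusPlots.

```r
# Toy stand-in for AP.data; all values are made up.
AP.data <- list(
  site.info = data.frame(
    site_unique = c("SITEA-1", "SITEB-2", "SITEC-3", "SITED-4"),
    site_slope  = c("2", "15", "7", "12"),
    site_aspect = c("180", "170", "300", NA),
    stringsAsFactors = FALSE
  ),
  veg.PI = data.frame(
    site_unique = c("SITEA-1", "SITEB-2", "SITEB-2", "SITEC-3", "SITED-4"),
    height      = c(12.5, 0.3, 2.0, 1.1, 4.0),
    stringsAsFactors = FALSE
  )
)
si <- AP.data$site.info

# Convert both variables from text to numbers.
si$site_slope  <- as.numeric(si$site_slope)
si$site_aspect <- as.numeric(si$site_aspect)

# One logical vector per condition (thresholds are illustrative).
steep <- si$site_slope > 10
south <- si$site_aspect >= 135 & si$site_aspect <= 225

# Steep AND south facing, and steep OR south facing; which() drops NAs.
both.ids   <- si$site_unique[which(steep & south)]
either.ids <- si$site_unique[which(steep | south)]

# Carry the stricter selection over to the point intercept data.
PI.both <- AP.data$veg.PI[AP.data$veg.PI$site_unique %in% both.ids, ]
```

Note how the site with a missing aspect is kept by the OR filter, because it is steep, but not by the AND filter.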


Example 3: State

State is a 'character' variable, so we need to change its class to 'factor'.
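A sketch of selecting sites from more than one state is shown below. The toy data are made up and the 'state' column name and its abbreviations are assumptions to verify with 'levels' on the real data.

```r
# Toy stand-in for AP.data; all values are made up.
AP.data <- list(
  site.info = data.frame(
    site_unique = c("SITEA-1", "SITEB-2", "SITEC-3", "SITED-4"),
    state       = c("NT", "SA", "NT", "WA"),
    stringsAsFactors = FALSE
  ),
  veg.PI = data.frame(
    site_unique = c("SITEA-1", "SITEB-2", "SITEC-3", "SITED-4", "SITED-4"),
    height      = c(12.5, 0.3, 1.1, 4.0, 0.8),
    stringsAsFactors = FALSE
  )
)

# Convert to factor and tabulate the number of site visits per state.
AP.data$site.info$state <- as.factor(AP.data$site.info$state)
table(AP.data$site.info$state)

# Select the sites located in either of two states.
sel.sites <- AP.data$site.info[AP.data$site.info$state %in% c("NT", "SA"), ]
sel.sites$state <- droplevels(sel.sites$state)

# Keep the point intercept rows for those site visits.
sel.PI <- AP.data$veg.PI[AP.data$veg.PI$site_unique %in%
                           sel.sites$site_unique, ]
```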


Example 4: Bioregion name

Bioregion name is a 'character' variable, so we need to change its class to 'factor'.
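To subset every data frame in the list in one pass, one option is a small helper applied with 'lapply', sketched below. The helper name 'subset_by_sites' is hypothetical (it is not part of 'ausplotsR'), the toy data and bioregion codes are made up, and the 'bioregion_name' column name is assumed. The toy list includes a non-data-frame element to show that the guard leaves such elements untouched.

```r
# Toy stand-in for AP.data; all values are made up.
AP.data <- list(
  site.info = data.frame(
    site_unique    = c("SITEA-1", "SITEB-2", "SITEC-3"),
    bioregion_name = c("FIN", "STP", "FIN"),
    stringsAsFactors = FALSE
  ),
  veg.PI = data.frame(
    site_unique = c("SITEA-1", "SITEB-2", "SITEB-2", "SITEC-3"),
    height      = c(12.5, 0.3, 2.0, 1.1),
    stringsAsFactors = FALSE
  ),
  citation = "toy citation string"   # not a data frame
)

# Convert to factor, then collect the identifiers of the sites of interest.
AP.data$site.info$bioregion_name <- as.factor(AP.data$site.info$bioregion_name)
fin.ids <- AP.data$site.info$site_unique[
  AP.data$site.info$bioregion_name %in% "FIN"]

# Hypothetical helper: subset one list element by 'site_unique' when possible.
subset_by_sites <- function(x, ids) {
  if (is.data.frame(x) && "site_unique" %in% names(x)) {
    x <- droplevels(x[x$site_unique %in% ids, , drop = FALSE])
  }
  x  # elements without 'site_unique' are returned unchanged
}

# Apply the helper to every element of the list.
AP.fin <- lapply(AP.data, subset_by_sites, ids = fin.ids)
```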
