Questions and Errors in the Ausplots Rangelands Dataset

Questions

 


QUESTION 1

There are a lot of sites where PLOT_DIMENSIONS are not explicitly recorded (values are NULL). This should not be a problem because we have PLOT_IS_100M_BY_100M field, and for most of them, the value is TRUE, so they actually are 100x100. But there are some cases where PLOT_DIMENSIONS is NULL and PLOT_IS_100M_BY_100M is FALSE, so we don’t know what is the real size of the plot.

Do you have any more information about those sites? 

I have used this query: SELECT * FROM site_location WHERE "PLOT_DIMENSIONS" IS NULL AND "PLOT_IS_100M_BY_100M" IN (false)

ANSWER

Email Emrys (12/05/2020): “I have fixed the issues with question 2 in SWARM.”

Ben 15/5/20: “All of our plots are designed where ever possible to be 100x100 oriented to the grid (Will explain below regarding coords), but on occasions it is not possible to fit a 100 x 100 plot within a homogeneous patch of vegetation in the field, and so (Rarely) plot dimensions can be changed to eg. 200 x 50m (They always need to be 10,000M^2). This field is designed to quickly filter out those plots that “are not normal” in regards their dimensions. It looks from the comment above that Emrys has now populated the critical info!”

ACTION / FIX

No action needed, in future ingestions data should be fixed. PENDING FUTURE REVIEW


QUESTION 2

We have noticed that PHOTO_STORAGE_LOCATION field in ‘photopoints’ table, from sometime in 2015, is always NULL. In all previous rows it has the value “

All photos are stored at the TERN AusPlots facility, Level 12 Schultz building, Adelaide University North Terrace Campus.  Contact christina.pahl@adelaide.edu.au for access requests.

We think that even if the value is NULL, photos are still stored in TERN AusPlots facility, and we should show that message as well. Is this correct?

ANSWER

Ben 15/5/20: “Presumably an old question as i think you’re now working on getting the photos into the new system…. But those contact details are old (by about three years) so best to update if necessary to put contact details.”

ACTION / FIX

PENDING Link to the actual photo file or ask for update the contact details in DB.


QUESTION 3

There is a column for a single latitude and longitude value per site_location row. There is also latitude and longitude values from the site_location_point table as app_lat and app_long. What is the difference? Is there a difference? Can the latitude and longitude value be treated the same as the values in the site_location_point table?

ANSWER

Emrys: “They are and should be the same. Should try and provide both as some users may prefer to work in the UTM coordinate system over the latitude and longitude. site_location lat/long is always a south-west point from the site_location_point. Can probably drop UTM coordinates but need to double check.“

Ben 15/5/20: “A and B - in reference to app_lat and App_long - Given the ease of making the tablet collect the location of the plot whilst collecting the data at the site we chose to collect that data as an ultimate back up - If site name was screwed up, or DGPS data corrupt we would still have an approximate location of the plot ( It’s only GPS not DGPS, and i don’t remember exactly what action in the app triggers it so it would not always be collected from the same location within the plot….. It is for internal use only and should not be exposed to the public”

ACTION / FIX

We are using the SW point as location of the site, we don’t use these coordinates.

QUESTION 3

There are some cases where app_lat and app_long is not the same as the SW site_location_point:

1 SELECT s.SITE_LOCATION_ID, app_lat, app_long, latitude, longitude FROM SITE_LOCATION_POINT p RIGHT JOIN SITE_LOCATION s ON p.SITE_LOCATION_ID = s.SITE_LOCATION_ID WHERE p.point='SW' AND (s.app_long is not null) AND (s.app_lat is not null) AND (s.app_long<>'No Location Data') AND (s.app_lat <>'No Location Data') AND (s.app_long<>'NC') AND (s.app_lat <>'NC') AND (round(p.latitude,3) <> round(s.app_lat,3)) AND (round(p.longitude,3) <> round(s.app_long,3))

ANSWER

(Emrys): Ignore the APP Lat Long- Do not use it to give you the site coordinates. It is really just a backup if everything else turns to shit

ACTION / FIX

We are using the SW point as location of the site, we don’t use these coordinates.


QUESTION 4

The tables CHARACTERISATION_COARSE_FRAGS_LITHOLOGY and CHARACTERISATION_COARSE_FRAGS_shape, are supposed to be related 1 to 1 with the soil_characterisation sample, but in a few cases this is not true. Probably it is some kind of error introducing the data.

1 2 3 4 select SOIL_CHARACTERISATION_ID , count(*) as c from CHARACTERISATION_COARSE_FRAGS_LITHOLOGY group by SOIL_CHARACTERISATION_ID order by c desc select SOIL_CHARACTERISATION_ID , count(*) as c from CHARACTERISATION_COARSE_FRAGS_shape group by SOIL_CHARACTERISATION_ID order by c desc select SOIL_CHARACTERISATION_ID , count(*) as c from CHARACTERISATION_COARSE_FRAGS_size group by SOIL_CHARACTERISATION_ID order by c desc select SOIL_CHARACTERISATION_ID , count(*) as c from CHARACTERISATION_COARSE_FRAGS_abundance group by SOIL_CHARACTERISATION_ID order by c desc

ANSWER

(Emrys): “I just had a look to work out what was going on here. It seems this is collected in the soil data sheet but doesn’t appear to be going into the database, at least not anywhere I can see it.”

Ben 15/5/20 - “Andrew may be able to re-run our import scripts for this data - Worth running this Question past him!”

ACTION / FIX

Ask them for fix the data. PENDING


QUESTION 5

There are 3 site visits with 6 bulk density samples, apparently duplicated rows:

1 2 3 4 5 6 7 8 select SITE_LOCATION_VISIT_ID, count(*) as count from SOIL_BULK_DENSITY group by SITE_LOCATION_VISIT_ID order by count desc SITE_LOCATION_VISIT_ID COUNT 57617 6 57087 6 58009 6 select * from SOIL_BULK_DENSITY WHERE SITE_LOCATION_VISIT_ID = 57617

ANSWER

(Emrys): “As you point out, rows that have somehow been duplicated. We will work on getting rid of the spurious ones from our end”
Ben 15/5/20: “Seemingly Emrys has resolved!”

ACTION / FIX

Nothing in our side. Review data has been fixed in source. PENDING FUTURE REVIEW


QUESTION 6

Do we want all of the points in the site_location_point table for a site_location_id?

ANSWER

Yes, the points help create the plot bounds.

Ben 15/5/20: “Just to add to the previous comment it is important to keep and make available all of the site location info - Including the actual coordinates for the 4 plot corners collected with the dgps. Given it is for monitoring it is essential to sample exactly the same plot through time. These define the plot (incase any of our plot markers in the field are missing, and are commonly used by remote sensors to “cookie cut” imagery that they are able to directly relate to our field data!”

ACTION / FIX

We are storing SW as site location coordinates, 4 corners as a Polygon and linking all samples and observations to the place they were taken (using the rest of the points, if apply).


QUESTION 7

With CORVEG, we only saw 1:1 relationships between an observation collection and an observation. AusPlots Rangelands has 1:M relationship between an observation collection and observation. When we present the integrated data to users in its tabular form, how should it be presented? So far each “theme” is a table of data, and to represent this data accurately, we would have to have multiple tables for a theme.

This questions is more for the data team to review once all the data has been integrated. We will have to review this question to decide how to present the data to the users.

ANSWER

Ben 15/5/20: “Given the complexity of parts of our data collection ( Particularly the different intercepts through the vertical projection of each point intercept) it was not possible to avoid one to many relationships, but Andrew and i were clear that it was necessary and why it was done when we designed and built the database - When you have that discussion as a team you might find it useful to include both Andrew and Myself!”

ACTION / FIX

All parameters of a theme will be offered to the user as a table (if no additional filter are applied)


QUESTION 8

In the Plot Ontology, it only has properties to capture latitude and longitude, and not properties from the UTM system. Do we extend the model, or do we convert all UTM coordinates to latitude and longitude?

This question is for the data team to decide whether we extend the information model or not. We can always keep it simple and only support latitude and longitude points since it can be converted to UTM.

ANSWER

Ben 15/5/20: “We purposely designed the plot locations to be laid out using the UTM…. That meant that if anything went wrong with the dgps then we could lay out the plot (More roughly) with a hand held notebook and a hand held GPS….. Tale SW corner coord, Add 100 to easting to get SE, 100 to northing to get NW, and 100 to both to get NE. Delivery is probably an issue for you guys to discuss, but it would be nice if a user wants to use UTM that they get the same coords as we collected rather than some slight error induced from the conversion from UTM to LAT/Long and then back to UTM!”

ACTION / FIX

REVIEW Should we include as well the UTM in the model?


QUESTION 9

In table herbarium_determination, what is state_accession_number?

ANSWER

Ben 15/05/20: “When we get the herbaria to ID specimens we give them the option of keeping specimens that they think will make a valuable contribution to their collection on the understanding that they will access them into their collection - I.e. make them fully included into their Library and databasing etc. ( This cuts down the number of samples that we need to keep and gives the samples wider usage) - The accession number is essentially their primary key in their database - A unique number for every sample. it’s essential that this stays with our data and is easily publicly accessible so that anyone needing access to the sample can easily find it and arrange access.”

ACTION / FIX

Already storing the State Accession Number in the tern-ssn:StateIdentification as dwct:identificationID


QUESTION 10

The Plot Ontology only supports recording location data at the site level. In AusPlots Rangelands, there are location data at the site level and sample level. Example tables: soil_subsite_observations.

ANSWER

The Plot Ontology supports sites and sub-sites. Each sub-site can then have a relationship to a Location instance, which holds the location data.

Ben 17/05/20: “Looks like you’ve solved this - I’ll also skip the two crossed out questions”

Updated (05/06/2020): Specific location of samples are recorded as plot:SamplingPoint, which has the actual coordinates.

ACTION / FIX

Modelled as plot:SamplingPoint


QUESTION

The table species_growth_form is missing.

ANSWER

This is a data team problem. Mosheh will enable this in the “datarunner” script.

ACTION / FIX

Solved


QUESTION

The table lut_structure_size is missing.

ANSWER

This is a data team problem. Mosheh will enable this in the “datarunner” script.

ACTION / FIX

Solved


QUESTION 11

Leaf area index data missing. Relationship to the stratum is also missing. Is it possible to provide this information? Capture this as an observation with result as a boolean.

ANSWER

The information is stored in Adelaide. Potentially in the future, the schema will change to accommodate for the information in one of the tables, or in a new table.


Ben 17/05/20: “Currently this data is not available in the DB. Tern currently having a review of LAI and if we should continue to collect it… If yes then we should get it into the DB, if no, probably best to just deliver the data as an opaque blob. The path forward should be known in the next month or so.”

ACTION / FIX

PENDING FUTURE REVIEW Review in late June/July 2020 if LAI data will be continue collected.


QUESTION 12

Is AusPlots Rangelands released annually? What is the release schedule?

ANSWER

Ben 17/05/20: “The old AEKOS used to set up Specified ingests. We’re pretty convinced that that is not the best past forward. We would prefer you guys to interact with our data via the API so that users can access the most up to data data. It is important though if things happen this way that only sites with the “o.k. to publish” flag set to “yes” or “true” (Sorry i cant remember the exact format of the field in the DB) are made publicly available. Other data (that flag not set) is in various states of upload / curation / Waiting for herbarium id’s etc….. There is probably potential to publish some of this data in future, but we probably need to have a longer discussion of the implications / benefits of doing so…. It would be nice to have a “collected, but data not yet available” category so that we can fairly rapidly make the locations of our sample sites known!”

ACTION / FIX

Currently we getting rid of all the data related to sites with “o.k. to publish”=FALSE.
(05/06/2020) Currently Guru/Mosheh are in conversations with Tok in the best solution to get the data directly from the source. PENDING FUTURE REVIEW


QUESTION 13

Where is AusPlots Forests? Does it have a database?

ANSWER

Ben 17/05/20: “This was initially ingested into Aekos despite my recommendation not to bother,,,, At this stage it is essentially a legacy dataset with no ability for ongoing monitoring - It is completely separate to APR and is probably a lower priority. It is a lot of work for 48 plots!”

ACTION / FIX

Is is not been ingested as part of Ausplots Reangelads dataset.


QUESTION 14

Where do you source the species names from?

ANSWER

Species name are mostly sourced from the herbarium for each state. For some, it is faster and more efficient to determine by AusPlot staff - this is true for some stuff south or west of NSW where the NSW herbarium are not familiar with the species.

Ben 17/05/20: “The answer you already have is mostly right - They are indeed provided by the state and territory herbaria, with details of the id’er and accessions if relevant stored. Any work on species names etc. should be discussed with Andrew T as we are likely to make a few changes in this part of our Database to make it easier to search and rapidly exclude incomplete ids are the species or genus level. At this stage it is important that the herbarium species id is the species name used - The field name (often recorded in the form of a scientific name, but not always) is just a field name by staff while collecting in the field - It is superseded (but not replaced) by an official herbarium determination.“

ACTION / FIX

Nothing to fix


QUESTION 15

In the table point_intercept, what is the in_canopy_sky field?

ANSWER

From within a cover, can the observer look up and see the sky.

Ben 17/05/20: “The answer that you have is essentially correct, but it is a little more nuanced than that. There are two widely used measures of vegetation cover. “Opaque canopy cover” records the cover of vegetation canopy across a site. “Foliage Projected Cover” records the cover of foliage across a site. The difference being the first essentially makes a boundary around the outside of the tree canopy and calls everything inside that boundary as cover. The second FPC only classes areas inside that boundary as cover if the densiometer actually intersects with foliage. If it is within that area, but the densiometer does not intersect vegetation then it is not classed as cover. Our solution to the user being able to use either form of cover as they desire is to create the category “in Canopy Sky”. So we have three categories. No intercept = When the user sees sky through the densiometer and that is outside of any canopy boundary. a “Hit” or “Intercept” is recorded when the densiometer intercept with foliage. the In canopy Sky category is where the densiometer does not intercept with foliage, but is within the boundary of that canopy - Hence “in Canopy Sky”. If users are after Opaque canopy cover they include any In Canopy sky record with actual hits to determine cover. if the user wants FPC then they exclude in canopy sky from their cover calculations!”

ACTION / FIX

Nothing to fix


QUESTION 16

In soil_bulk_density table, field sample_id defines the depth where the sample was taken ([0-0.1],[0.1,0.2],[0.2,0.3]). As the field is a Free Text Field, we can see multiple ways to fill that information (even showing metres and centrimetres, etc.)

We can also see 3 site visits whose sample_id are (1,2,3) instead of (0,1,2), the most common pattern. This would crash any attempt to automatise the cleaning and reformatting of the field.

If would be very helpful to clean this field in the source and maybe create some controls to validate what is introduced in DB.

ANSWER

Email Emrys (Fri 5/06/2020): “Yes, the sample Id field for bulk densities is a mess. We will figure out the best way to homogenise this field and get back to you.”

ACTION / FIX

Until data is fixed in the source, we will clean and format the field in the ETL. PENDING FUTURE REVIEW


QUESTION 17

In soil_characterisation table, there are 3 site_visits with incomplete sample depth information:

Not sure if you could fix those 4 values.

1 select * from soil_characterisation where "SITE_LOCATION_VISIT_ID" IN (53641,53665,53680)

ANSWER

PENDING

ACTION / FIX

PENDING


QUESTION 18

We have identified one site_visit where there is no “soil_observation_type” defined (Not Collected in DB) but there are 4 soil characterisation samples related to the site_visit which we don’t know the soil_observation_type.
We understand that NC (Not Collected) means that no soil characterisation samples where taken in the site_visit so there shouldn't be any samples linked to it. Is that correct?

Query:

1 select * from soil_characterisation sc join site_location_visit slv on sc."SITE_LOCATION_VISIT_ID" = slv."SITE_LOCATION_VISIT_ID" where "SOIL_OBSERVATION_TYPE" = 'NC'

ANSWER

PENDING

ACTION / FIX

PENDING


QUESTION

In table site_location_point, what is the field threedcq?

ANSWER

The accuracy of the location recorded. what is the measurement? - not used - will be replaced in the future. It’s to say to within a metre, how accurate the recording is. New data mostly are entered as latitude and longitude from the DGPS on the tablets.

ACTION / FIX

Nothing to fix


QUESTION

What are the pit markers in the site_location_visit table?

ANSWER

Pit markers mark the soil pit outside the main plot site. Usually within 5 metres of the main plot site. The reason it is outside the main plot is because they don’t want to make a big whole in the main plot site.

ACTION / FIX

Nothing to fix


QUESTION

What part of the site is the soil_characterisation table recording?

ANSWER

This table contains information about the soil pit that is outside the main plot site. It contains soil chemistry data about the main plot site.

ACTION / FIX

Nothing to fix


QUESTION

What is the soil_subsite_observations table?

ANSWER

The sub-sites of the main plot site. The actual samples are collected in this sub-site. In the future, soil chemistry data may be recorded in these tables, but for now, they are recorded in soil_characterisation table.

ACTION / FIX

Nothing to fix


QUESTION

What is r_basechemicalanalysis table in AEKOS? There is no schema in AusPlots' source database.

ANSWER

CSV added to the database during the datatransfer runner execution. Not used.

ACTION / FIX

Nothing to fix


QUESTION

What is r_basestudylocationvisit table in AEKOS? It is not in the source database. There appears to be some new information in this table.

ANSWER

CSV added to the database during the datatransfer runner execution. Not used.

ACTION / FIX

Nothing to fix


QUESTION

In Soil Characterisation , the values for colour_when_moist are like 2.5YR2.54 or 10R34 or 7.5YR2.53. The letters are straight out of the Yellow Book but what do the numbers mean? I suspect the numbers may be a mixture of the other characteristics such as abundance or size, but I am not certain.

ANSWER

 

ACTION / FIX

 


QUESTION

There are Sites (site_location) without latitude/longitude coordinates. In addition, there are Sites with no site_location_point, so there are site without any location information.

1 2 3 4 5 6 7 8 # Sites without app_long, app_lat = 311rows SELECT * from SITE_LOCATION WHERE app_lat IS NULL OR app_lat='No Location Data' OR app_lat='NC' # Site without location points = 101 rows SELECT * FROM SITE_LOCATION_POINT p RIGHT JOIN SITE_LOCATION s ON p.SITE_LOCATION_ID = s.SITE_LOCATION_ID WHERE p.site_location_id IS NULL # Sites with no location information = 85 rows SELECT * FROM SITE_LOCATION_POINT p RIGHT JOIN SITE_LOCATION s ON p.SITE_LOCATION_ID = s.SITE_LOCATION_ID WHERE p.site_location_id IS NULL AND (s.app_lat IS NULL OR s.app_lat='No Location Data' OR s.app_lat='NC')

ANSWER

(Emrys 04/02/2020): I get 48 ( not 101) that don’t have any points at all and these are forest plots ( the third letter of the site code will be an F). The data for these is stored elsewhere but we are the custodians of the physical samples- tea bags, soil etc. Other sites sometimes have less than the standard 25 points for a range of different reasons- DGPS malfunction, other groups set up the sites and only provided us with corner markings etc.

ACTION / FIX

After fixing the sites filtering, no more sites without location have been found.


QUESTION

The table species_growth_form is not consistent with the Look-Up Table lut_growth_form.

In species_growth_form the growth_form is Fungi whereas in lut_growth_form is Fungus.

ANSWER

(Emrys 05/02/2020): “Probably makes sense to change the lookup table to fungi rather than change all the entries for each species?“

ACTION / FIX

Review if data have been fixed in the source PENDING FUTURE REVIEW

Errors

ERROR

Typos in horizon classification column in soilcharacterisation table:

  1. Are major case suffixes "K" and "W" typos for "k" and "w"?

  2. What is "10YR53" and does it belong in horizon column?

  3. is Ash a typo for "A"?

  4. is B/c a typo for B/C or for Bc?

  5. is A! a typo for A1 or A?

 

ANSWER

 

 

ACTION / FIX

 

 

TEST

 

 

ERROR

misspelled transect identifiers

 

ANSWER

 

 

ACTION / FIX

Review if data have been fixed in source

 

TEST

expect_false(any(unique(point_intercept$transect) %in% c("W5-.E", "1-N1")))

 


ERROR

Missing transect across datatables

 

ANSWER

 

 

ACTION / FIX

Review if data have been fixed in source

 

TEST

show_failure(expect_setequal(site_location_visit$site_location_visit_id, point_intercept$site_location_visit_id))

 

 

ERROR

Negative values for basal area and basal area factor in basal area table.

 

ANSWER

 

 

ACTION / FIX

 

 

TEST

select * from basal_area where basal_area_factor < 0

 

 

ERROR

text in slope and aspect columns

 

ANSWER

 

 

ACTION / FIX

get rid of strings, get rid of non-numeric charcacters (e.g, °)

 

TEST

expect_numeric(site_location$slope & site_location$aspect) == TRUE

 

In table soil_characterisation the column mottles_colour is not linked to the look-up table lut_mottle_colour.


In table soil_bulk_density the field sample_id is inputted as free text. Ask if AusPlots data entry software can be a controlled drop-down list to only enter a consistent set of text. Ideally a new look-up table should be created for this.


The table photograph is not linked in the schema. A: photograph table is a generated table from site_location in postprocessing.


The table species_distinct_per_site is missing in the schema.


In the site_location table, both site_aspect and site_slope need to be fixed. They contain free text, numbers and percentage.