
Commit 64da65b

Merge pull request #179 from KristinaRiemer/add_walkthrough
Add Rmd for traits and weather download runthrough
2 parents c8d9dae + c27ef33 commit 64da65b

2 files changed

Lines changed: 379 additions & 0 deletions


videos/pilot_walkthrough.Rmd

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
---
title: "Pilot Walkthrough"
author: "Kristina Riemer"
output: github_document
urlcolor: blue
---

### Intro

Using data from the TERRA REF project. Get and plot trait data, then do the same for weather data.

Will be live coding, so you can follow along on your own machine if you want.

Will be using R + RStudio and the following R packages: traits to get data, dplyr and lubridate for data cleaning, jsonlite to read the weather data, tidyr to reshape it, and ggplot2 for plotting.

Full tutorials are at terraref.github.io/tutorials/. I can also send the specific code used here if people want it.

### Traits download

Set some global options for the function used to get data.
Using the subset of data that's publicly available, so don't need your own API key. Will need to find and use your own API key to access other data.

```{r}
options(betydb_url = "https://terraref.ncsa.illinois.edu/bety/",
        betydb_api_version = 'beta',
        betydb_key = '9999999999999999999999999999999999999999')
```

Using the traits R package. The function is betydb_query, which works for several datasets including TERRA REF.

Pulling data from Season 4, but only a subset (set with limit) because there's a lot of it.

```{r}
library(traits)
season_4 <- betydb_query(sitename = "~Season 4", limit = 1000)
```

Look at dataframe.

Look at just the traits available; canopy_height is one. Using the dplyr data cleaning package.

```{r}
library(dplyr)
season_4 %>%
  distinct(trait) %>%
  print(n = Inf)
```

Want to look at just the values for this trait during a more recent season, Season 6. Use the same function but with another argument, trait.

```{r}
canopy_height <- betydb_query(trait = "canopy_height",
                              sitename = "~Season 6",
                              limit = 250)
```

Want to plot canopy height across time, but first have to get the date into the correct format for plotting. Use ymd_hms from the lubridate package to create a new column with the correctly formatted date.

```{r}
library(lubridate)
canopy_height <- canopy_height %>%
  mutate(formatted_date = ymd_hms(raw_date))
```
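
A minimal sketch of what ymd_hms does, using made-up timestamp strings rather than the real raw_date column (the exact format of raw_date isn't shown here, so these inputs are assumptions). Strings that can't be parsed come back as NA, so counting NAs is a quick check before plotting.

```r
library(lubridate)

# Made-up timestamps for illustration; not taken from the TERRA REF data
raw_dates <- c("2018-05-01 12:00:00", "2018-06-15 08:30:00", "not a date")

# quiet = TRUE suppresses the warning about strings that fail to parse
parsed <- ymd_hms(raw_dates, quiet = TRUE)

# Parsed values are date-times (POSIXct); failures are NA
class(parsed)
sum(is.na(parsed))
```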

Plot canopy data using the ggplot2 package.

Plot the newly formatted date column on the x-axis and the canopy height value, in the mean column, on the y-axis.

```{r}
library(ggplot2)
ggplot(data = canopy_height, aes(x = formatted_date, y = mean)) +
  geom_point()
```

Add axis labels, finding the units in the dataframe.

```{r}
ggplot(data = canopy_height, aes(x = formatted_date, y = mean)) +
  geom_point() +
  labs(x = "Date", y = "Plant height (cm)")
```

How to get API key:

1. Log in to betydb.org
2. Go to data/users
3. Find your account there, with the API key listed
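
Once you have your own key, one possible way to use it without pasting it into the script is to read it from an environment variable. This is only a sketch: the variable name BETYDB_KEY is made up for illustration, and the fallback is the all-9s key from the options chunk earlier.

```r
# BETYDB_KEY is a hypothetical variable name; you could set it in ~/.Renviron
# The fallback is the all-9s key used earlier in this walkthrough
public_key <- "9999999999999999999999999999999999999999"
my_key <- Sys.getenv("BETYDB_KEY", unset = public_key)

# Same options as before, with the key taken from the environment
options(betydb_url = "https://terraref.ncsa.illinois.edu/bety/",
        betydb_api_version = "beta",
        betydb_key = my_key)
```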

### Weather download

No special R package for getting weather data. Pull directly from Clowder.

Data is in JSON format, so use the jsonlite R package to pull down the data and turn it into an R data frame.

Create the URL based on what part of the data we want. stream_id specifies the weather station, and since and until set the date range. The URL below requests 2017-01-02 through 2017-01-31.

```{r}
library(jsonlite)
weather <- fromJSON('https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=46431&since=2017-01-02&until=2017-01-31', flatten = FALSE)
```
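
To make the pieces of that URL easier to change, it can be assembled from its parts. A sketch in base R: build_weather_url is a hypothetical helper written for this example, not a function from any package, and only the three query parameters seen in the URL above are assumed.

```r
# Hypothetical helper, not part of any package
build_weather_url <- function(stream_id, since, until) {
  base_url <- "https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints"
  sprintf("%s?stream_id=%s&since=%s&until=%s", base_url, stream_id, since, until)
}

# Rebuilds the same URL used in the chunk above
january_url <- build_weather_url(46431, "2017-01-02", "2017-01-31")
january_url
```

The result can be passed to fromJSON in place of the hard-coded string. Whether other stream IDs or date ranges return data has not been checked here.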

Pulling out the subset of the data called properties, which holds a handful of weather variables.

Then the same date reformatting as before, using the end_time column from the weather dataset.

```{r}
weather <- weather$properties %>%
  mutate(formatted_date = ymd_hms(weather$end_time))
```
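
A toy illustration of why properties can be pulled out with `$`: with flatten = FALSE, jsonlite keeps a nested JSON object as a data frame stored in a single column. The JSON below is invented to mimic the shape assumed in this walkthrough (an end_time field next to a properties object); the real Clowder response may have more fields.

```r
library(jsonlite)

# Toy JSON with invented values; only the shape is meant to match
toy_json <- '[
  {"end_time": "2017-01-02T00:05:00Z",
   "properties": {"air_temperature": 285.1, "wind_speed": 2.3}},
  {"end_time": "2017-01-02T00:10:00Z",
   "properties": {"air_temperature": 284.9, "wind_speed": 2.8}}
]'

toy <- fromJSON(toy_json, flatten = FALSE)

# properties is itself a data frame stored as one column of toy
names(toy)
names(toy$properties)

# Pull it out and carry the timestamp along, as in the chunk above
toy_weather <- toy$properties
toy_weather$end_time <- toy$end_time
toy_weather
```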

Plot a single variable, air temperature, across time. The data covers only the month of January, matching the since and until values in the URL.

```{r}
ggplot(data = weather, aes(x = formatted_date, y = air_temperature)) +
  geom_point() +
  labs(x = "Date", y = "Temperature (K)")
```
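
The axis label above gives the temperature in Kelvin. If Celsius is easier to read, the conversion is a subtraction of 273.15. A small sketch with invented readings, not values from the weather station:

```r
# Invented Kelvin readings for illustration
air_temperature_k <- c(273.15, 285.65, 300.15)

# Celsius = Kelvin - 273.15
air_temperature_c <- air_temperature_k - 273.15
air_temperature_c
```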

To easily plot all 8 of the weather variables, need to rearrange the data. It's in wide format; need it in long.

Remove a couple of unneeded columns. Then turn the variable headers into a weather_variable column and put their values in a weather_value column.

```{r}
library(tidyr)
weather_long <- weather %>%
  select(-source, -source_file) %>%
  gather(weather_variable, weather_value, -formatted_date)
```
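
The same wide-to-long step on a toy data frame, so the before and after shapes are easy to see. This sketch uses pivot_longer, which the tidyr documentation recommends over gather for new code; gather still works as used above. The column names mirror the walkthrough, but the values are invented.

```r
library(tidyr)

# Toy wide data: one row per timestamp, one column per weather variable
toy_wide <- data.frame(
  formatted_date = as.POSIXct(c("2017-01-02 00:05:00", "2017-01-02 00:10:00"),
                              tz = "UTC"),
  air_temperature = c(285.1, 284.9),
  wind_speed = c(2.3, 2.8)
)

# Long data: one row per timestamp and variable combination
toy_long <- pivot_longer(toy_wide,
                         cols = -formatted_date,
                         names_to = "weather_variable",
                         values_to = "weather_value")
toy_long
```

Two timestamps times two variables gives four rows, each with a formatted_date, a weather_variable name, and a weather_value.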

Can now easily plot all of them using ggplot2.

```{r}
ggplot(data = weather_long, aes(x = formatted_date, y = weather_value)) +
  geom_point() +
  facet_wrap(~weather_variable, scales = "free_y") +
  labs(x = "Date", y = "Weather variable")
```

videos/pilot_walkthrough.md

Lines changed: 243 additions & 0 deletions
@@ -0,0 +1,243 @@
Pilot Walkthrough
================
Kristina Riemer

### Intro

Using data from the TERRA REF project. Get and plot trait data, then do
the same for weather data.

Will be live coding, so you can follow along on your own machine if you
want.

Will be using R + RStudio and the following R packages: traits to get
data, dplyr and lubridate for data cleaning, jsonlite to read the
weather data, tidyr to reshape it, and ggplot2 for plotting.

Full tutorials are at terraref.github.io/tutorials/. I can also send the
specific code used here if people want it.

### Traits download

Set some global options for the function used to get data. Using the
subset of data that’s publicly available, so don’t need your own API
key. Will need to find and use your own API key to access other data.

``` r
options(betydb_url = "https://terraref.ncsa.illinois.edu/bety/",
        betydb_api_version = 'beta',
        betydb_key = '9999999999999999999999999999999999999999')
```

Using the traits R package. The function is betydb\_query, which works
for several datasets including TERRA REF.

Pulling data from Season 4, but only a subset (set with limit) because
there’s a lot of it.

``` r
library(traits)
```

    ## Registered S3 method overwritten by 'httr':
    ##   method                 from
    ##   as.character.form_file crul

    ## Registered S3 method overwritten by 'hoardr':
    ##   method           from
    ##   print.cache_info httr

``` r
season_4 <- betydb_query(sitename = "~Season 4", limit = 1000)
```

Look at dataframe.

Look at just the traits available; canopy\_height is one. Using the
dplyr data cleaning package.

``` r
library(dplyr)
```

    ##
    ## Attaching package: 'dplyr'

    ## The following objects are masked from 'package:stats':
    ##
    ##     filter, lag

    ## The following objects are masked from 'package:base':
    ##
    ##     intersect, setdiff, setequal, union

``` r
season_4 %>%
  distinct(trait) %>%
  print(n = Inf)
```

    ## # A tibble: 40 x 1
    ##    trait
    ##    <chr>
    ##  1 canopy_height
    ##  2 relative_chlorophyll
    ##  3 absorbance_730
    ##  4 leaf_temperature
    ##  5 vH+
    ##  6 light_intensity_PAR
    ##  7 SPAD_880
    ##  8 SPAD_850
    ##  9 SPAD_650
    ## 10 leaf_angle_clamp_position
    ## 11 ambient_humidity
    ## 12 leaf_thickness
    ## 13 SPAD_730
    ## 14 SPAD_605
    ## 15 SPAD_530
    ## 16 RFd
    ## 17 qP
    ## 18 qL
    ## 19 NPQt
    ## 20 Fs
    ## 21 absorbance_940
    ## 22 absorbance_880
    ## 23 absorbance_605
    ## 24 absorbance_530
    ## 25 PhiNPQ
    ## 26 PhiNO
    ## 27 roll
    ## 28 absorbance_850
    ## 29 SPAD_420
    ## 30 LEF
    ## 31 FoPrime
    ## 32 FmPrime
    ## 33 Phi2
    ## 34 leaf_temperature_differential
    ## 35 ECSt
    ## 36 gH+
    ## 37 FvP/FmP
    ## 38 proximal_air_temperature
    ## 39 pitch
    ## 40 absorbance_650

Want to look at just the values for this trait during a more recent
season, Season 6. Use the same function but with another argument,
trait.

``` r
canopy_height <- betydb_query(trait = "canopy_height",
                              sitename = "~Season 6",
                              limit = 250)
```

Want to plot canopy height across time, but first have to get the date
into the correct format for plotting. Use ymd\_hms from the lubridate
package to create a new column with the correctly formatted date.

``` r
library(lubridate)
```

    ##
    ## Attaching package: 'lubridate'

    ## The following object is masked from 'package:base':
    ##
    ##     date

``` r
canopy_height <- canopy_height %>%
  mutate(formatted_date = ymd_hms(raw_date))
```

Plot canopy data using the ggplot2 package.

Plot the newly formatted date column on the x-axis and the canopy
height value, in the mean column, on the y-axis.

``` r
library(ggplot2)
ggplot(data = canopy_height, aes(x = formatted_date, y = mean)) +
  geom_point()
```

![](pilot_walkthrough_files/figure-gfm/unnamed-chunk-6-1.png)<!-- -->

Add axis labels, finding the units in the dataframe.

``` r
ggplot(data = canopy_height, aes(x = formatted_date, y = mean)) +
  geom_point() +
  labs(x = "Date", y = "Plant height (cm)")
```

![](pilot_walkthrough_files/figure-gfm/unnamed-chunk-7-1.png)<!-- -->

How to get API key:

1. Log in to betydb.org
2. Go to data/users
3. Find your account there, with the API key listed

### Weather download

No special R package for getting weather data. Pull directly from
Clowder.

Data is in JSON format, so use the jsonlite R package to pull down the
data and turn it into an R data frame.

Create the URL based on what part of the data we want. stream\_id
specifies the weather station, and since and until set the date range.
The URL below requests 2017-01-02 through 2017-01-31.

``` r
library(jsonlite)
weather <- fromJSON('https://terraref.ncsa.illinois.edu/clowder/api/geostreams/datapoints?stream_id=46431&since=2017-01-02&until=2017-01-31', flatten = FALSE)
```

Pulling out the subset of the data called properties, which holds a
handful of weather variables.

Then the same date reformatting as before, using the end\_time column
from the weather dataset.

``` r
weather <- weather$properties %>%
  mutate(formatted_date = ymd_hms(weather$end_time))
```

Plot a single variable, air temperature, across time. The data covers
only the month of January, matching the since and until values in the
URL.

``` r
ggplot(data = weather, aes(x = formatted_date, y = air_temperature)) +
  geom_point() +
  labs(x = "Date", y = "Temperature (K)")
```

![](pilot_walkthrough_files/figure-gfm/unnamed-chunk-10-1.png)<!-- -->

To easily plot all 8 of the weather variables, need to rearrange the
data. It’s in wide format; need it in long.

Remove a couple of unneeded columns. Then turn the variable headers into
a weather\_variable column and put their values in a weather\_value
column.

``` r
library(tidyr)
weather_long <- weather %>%
  select(-source, -source_file) %>%
  gather(weather_variable, weather_value, -formatted_date)
```

Can now easily plot all of them using ggplot2.

``` r
ggplot(data = weather_long, aes(x = formatted_date, y = weather_value)) +
  geom_point() +
  facet_wrap(~weather_variable, scales = "free_y") +
  labs(x = "Date", y = "Weather variable")
```

![](pilot_walkthrough_files/figure-gfm/unnamed-chunk-12-1.png)<!-- -->
