-
Notifications
You must be signed in to change notification settings - Fork 3
Expand file tree
/
Copy pathsimulation_comparison.Rmd
More file actions
126 lines (85 loc) · 5.15 KB
/
Copy pathsimulation_comparison.Rmd
File metadata and controls
126 lines (85 loc) · 5.15 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
---
title: "Simulation Comparison"
author: "Allen Hurlbert"
date: "July 25, 2019"
output: html_document
theme: cerulean
---
<style type="text/css">
body{ /* Normal */
font-size: 18px;
}
</style>
## Simulations
Goals...
Simulation models being compared...
```{r, echo = FALSE, warning = FALSE, message = FALSE }
library(ape)
library(stringr)
library(dplyr)
library(geiger)
library(lessR)
library(gsheet)
library(dplyr)
source('code/treeMetrics.R')
source('code/pcaFunctions.r')
treeOutput = read.table("treeOutput.txt", header = T, sep = '\t')
```
## Conduct a PCA across trees from all models
An important step is deciding which variables to include. Here, we exclude `S` or species richness, but include everything else.
```{r}
varsForPCA = c("tree.length", "PD", "gamma", "beta", "Colless", "Sackin", "Yule.PDA.ratio", "MRD",
"VRD", "PSV", "mean.Iprime", "MPD", "MGL_principal_eigenvalue",
"MGL_asymmetry", "MGL_peakedness", "MGL_eigengap", "nLTT_stat")
limitedVars = c("tree.length", "PD", "gamma", "Colless", "Sackin", "Yule.PDA.ratio", "MRD",
"VRD", "PSV", "mean.Iprime", "nLTT_stat")
pcaOutput = treeMetricsPCA(treeOutput, models = 'all', vars = varsForPCA)
pcaOutputLtd = treeMetricsPCA(treeOutput, models = 'all', vars = limitedVars)
```
Here is a correlation matrix showing how all of the metrics relate to each other. There is clearly a large suite of metrics that are positively correlated with richness.
``` {r, echo = FALSE, warning = FALSE, message = FALSE, fig.align = 'center'}
varCor = cor(treeOutput[,!names(treeOutput) %in% c('model', 'simID', 'VPD')], use = "pairwise.complete.obs")
varCor2 = corReorder(varCor, bottom = 6, right = 6, diagonal_new = FALSE)
```
This correlation helps us interpret the different PC axes. Here are the loadings for the first 5 axes. PC 1 is basically species richness, even though richness itself was not in the PCA--all of the variables that load positively ~0.3 are positively correlated with richness.
PC2 and PC3 both are related to tree imbalance, with positive scores indicating positive mean.Iprime and negative beta.
PC4 loads positively with gamma, where negative values indicate trees with internal nodes closer to the root (i.e. slowdowns).
```{r}
round(pcaOutput$pcaLoadings[, 1:5], 2)
```
First, let's visualize where trees from all models fall in PCA space along the first 2 PC axes. The symbol reflects which of the 3 model families (statistical, conceptual, realistical) the the tree is from.
```{r, echo = FALSE, warning = FALSE, fig.align = 'center'}
par(mar = c(4, 4, 1, 1))
betweenModelPCAPlot(pcaOutput, xscore = 1, yscore = 2, colorBy = "model", pchBy = "ModelFamily", cex = 1.3)
```
Most of the variation in PC 1 is due to a set of models from Hurlbert-Stegen. Let's focus on some of the other PC axes.
```{r, echo = FALSE, warning = FALSE, message = FALSE, fig.align = 'center'}
par(mar = c(4, 4, 1, 1))
betweenModelPCAPlot(pcaOutput, xscore = 2, yscore = 3, colorBy = "model", pchBy = "ModelFamily", cex = 1.3)
betweenModelPCAPlot(pcaOutput, xscore = 3, yscore = 4, colorBy = "model", pchBy = "ModelFamily", cex = 1.3)
```
Because the calculation of certain metrics failed on certain trees, they are not included in the PCA plots above. If we limit the PCA to the subset of metrics available for all trees (i.e. excluding beta, MPD, and the RPANDA-MGL statistics), we get the following:
```{r, echo = FALSE, warning = FALSE, message = FALSE, fig.align = 'center'}
par(mar = c(4, 4, 1, 1))
betweenModelPCAPlot(pcaOutputLtd, xscore = 2, yscore = 3, colorBy = "model", pchBy = "ModelFamily", cex = 1.3)
betweenModelPCAPlot(pcaOutputLtd, xscore = 3, yscore = 4, colorBy = "model", pchBy = "ModelFamily", cex = 1.3)
```
## Within model variation
We can also explore what explains variation within a given model. Here is Thiago's model plotted in the overall PCA space, coded by the dispersal parameter and region of the founder species.
``` {r, echo = FALSE, warning = FALSE, message = FALSE, fig.align = 'center'}
par(mar = c(4,4, 1, 1))
withinModelPCAPlot(pcaOutput, "ra", xscore = 2, yscore = 3, colorBy = 'Dispersal', pchBy = 'Founder', cex = 2)
withinModelPCAPlot(pcaOutput, "ra", xscore = 1, yscore = 2, colorBy = 'Dispersal', pchBy = 'Founder', cex = 2)
```
Here is Florian's model (fh), coded by speciation rate (colors) and whether speciation is protracted or not.
``` {r, echo = FALSE, warning = FALSE, message = FALSE, fig.align = 'center'}
par(mar = c(4,4, 1, 1))
withinModelPCAPlot(pcaOutput, "fh", xscore = 2, yscore = 3, colorBy = 'speciationRate', pchBy = 'protracted', cex = 2)
withinModelPCAPlot(pcaOutput, "fh", xscore = 1, yscore = 2, colorBy = 'speciationRate', pchBy = 'protracted', cex = 2)
```
Here is Allen's model (hs), coded by whether there is diversity dependence (carry.cap, colors) and whether there is a gradient in speciation rates.
``` {r, echo = FALSE, warning = FALSE, message = FALSE, fig.align = 'center'}
par(mar = c(4,4, 1, 1))
withinModelPCAPlot(pcaOutputLtd, "hs", xscore = 2, yscore = 3, colorBy = 'carry.cap', pchBy = 'specn.gradient', cex = 2)
withinModelPCAPlot(pcaOutputLtd, "hs", xscore = 1, yscore = 2, colorBy = 'carry.cap', pchBy = 'specn.gradient', cex = 2)
```