tskitr.md
Lines changed: 90 additions & 9 deletions
@@ -12,37 +12,65 @@ kernelspec:
12
12
name: ir
13
13
---
14
14
15
+
```{currentmodule} tskit
16
+
```
17
+
15
18
(sec_tskit_r)=
16
19
17
20
# Tskit and R
18
21
19
-
To interface with `tskit` in R, we can use the [reticulate](https://rstudio.github.io/reticulate/) R package, which lets you call Python functions within an R session. In this short tutorial, we'll go through a couple of examples to show you how to get started. If you haven't done so already, you'll need to install `reticulate` in your R session via `install.packages("reticulate")`.
22
+
To interface with `tskit` in R, we can use the [reticulate](https://rstudio.github.io/reticulate/) R package, which lets you call Python functions within an R session. In this tutorial, we'll go through a couple of examples to show you how to get started. If you haven't done so already, you'll need to install `reticulate` in your R session via `install.packages("reticulate")`.
20
23
21
24
We'll begin by simulating a small tree sequence using `msprime`.
`reticulate` allows us to access a Python object's attributes or call its methods via the `$` operator. For example, we can access (and assign to a variable) the number of samples in the tree sequence:
33
+
## Attributes and methods
34
+
35
+
`reticulate` allows us to access a Python object's attributes or call its methods via
36
+
the `$` operator. For example, we can access (and assign to a variable) the number of
37
+
samples in the tree sequence:
31
38
32
39
```{code-cell}
33
40
n <- ts$num_samples
34
41
n
35
42
```
36
43
37
-
We can also use `tskit`'s powerful [Statistics](https://tskit.dev/tskit/docs/stable/stats.html) framework to efficiently compute many different summary statistics from a tree sequence. To illustrate this, we'll first add some mutations to our tree sequence with `msprime`'s `sim_mutations` function, and then compute the genetic diversity for each of the tree sequence's sample nodes:
44
+
We can also call tskit methods on this tree sequence, such as
45
+
{meth}`TreeSequence.simplify`. The parameters are given as native R objects
46
+
(but note that we still retain tskit's 0-based indexing system
As a final example, we can also use the tree sequence `genotype_matrix()` method to return the genotypes of the tree sequence as a matrix object in R.
55
+
56
+
## Analysis
57
+
58
+
From within R we can use `tskit`'s powerful
59
+
[Statistics](https://tskit.dev/tskit/docs/stable/stats.html) framework to efficiently
60
+
compute many different summary statistics from a tree sequence. To illustrate this,
61
+
we'll first add some mutations to our tree sequence with the
62
+
{func}`msprime:msprime.sim_mutations` function, and then compute the genetic diversity
Numerical arrays and matrices work as expected. For instance, we can use the tree
72
+
sequence {meth}`~TreeSequence.genotype_matrix()` method to return the genotypes of
73
+
the tree sequence as a matrix object in R.
46
74
47
75
```{code-cell}
48
76
G = ts_mut$genotype_matrix()
@@ -56,4 +84,57 @@ allele_frequency = rowMeans(G)
56
84
allele_frequency
57
85
```
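
To make the `rowMeans(G)` step concrete, here is a minimal sketch on a small hand-written matrix. The matrix `G_toy` is hypothetical and not taken from the simulation; it assumes biallelic sites coded `0` (ancestral) and `1` (derived), with one row per site and one column per sample, which is the layout `genotype_matrix()` returns.

```r
# Hypothetical genotype matrix: 3 sites (rows) by 4 samples (columns),
# coded 0 for the ancestral allele and 1 for the derived allele
G_toy <- matrix(
  c(0L, 1L, 1L, 0L,
    1L, 1L, 1L, 0L,
    0L, 0L, 0L, 1L),
  nrow = 3, byrow = TRUE
)

# The mean of each row is the fraction of samples carrying the derived allele
toy_frequency <- rowMeans(G_toy)
toy_frequency
```

If a site has more than two alleles, the row mean of the allele indexes is no longer a frequency, so this shortcut only applies under the biallelic assumption stated above.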
58
86
59
-
It's as simple as that! Be sure to check out the [reticulate](https://rstudio.github.io/reticulate/) documentation, in particular on [Calling Python from R](https://rstudio.github.io/reticulate/articles/calling_python.html), which includes important information on how R data types are converted to their equivalent Python types.
87
+
## Jupyter notebook use
88
+
89
+
If you are running R within a [Jupyter notebook](https://jupyter.org), then you can
90
+
define a few magic functions that will display tskit tables and plots within the notebook:
91
+
92
+
```{code-cell}
93
+
# Define some magic functions to allow objects to be displayed in R Jupyter notebooks
This also allows trees and tree sequences to be easily plotted:
106
+
107
+
```{code-cell}
108
+
ts_mut$draw_svg(y_axis=TRUE, y_ticks=0:10)
109
+
```
110
+
111
+
112
+
## Interaction with R libraries
113
+
114
+
R has a number of libraries to deal with genomic data and trees. Below we focus on the
115
+
phylogenetic tree representation defined in the popular
116
+
[ape](http://ape-package.ird.fr) package, taking all the trees
117
+
{meth}`exported in Nexus format<TreeSequence.write_nexus>`, or
118
+
individual trees {meth}`exported in Newick format<Tree.as_newick>`:
119
+
120
+
```{code-cell}
121
+
file = tempfile()
122
+
ts_mut$write_nexus(file)
123
+
# Warning - ape trees are stored independently, so this will use much more memory than tskit
124
+
trees <- ape::read.nexus(file, force.multi = TRUE) # return a set of trees
125
+
126
+
# Or simply read in a single tree
127
+
tree <- ape::read.tree(text=ts_mut$first()$as_newick())
128
+
129
+
# Now we can plot the tree in tskit style, but using the ape library
130
+
plot(tree, direction="downward", srt=90, adj=0.5) # or equivalently use trees[[1]]
131
+
```
132
+
133
+
Note that nodes are labelled with the prefix `n`, so that nodes `0`, `1`, `2`, ...
134
+
become `n0`, `n1`, `n2`, and so on. This helps to avoid
135
+
confusion between the zero-based counting system used natively
136
+
by `tskit`, and the one-based counting system used in `R`.
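
If you need to map these labels back to `tskit` node IDs, one possible approach is to strip the prefix in base R. The sketch below uses a hypothetical vector `tip_labels` in place of the `tip.label` element of an `ape` tree, and assumes that every label has the form `n<ID>`.

```r
# Hypothetical tip labels, in the style produced by the Newick/Nexus export
tip_labels <- c("n0", "n1", "n2", "n10")

# Strip the leading "n" to recover the zero-based tskit node IDs
tskit_ids <- as.integer(sub("^n", "", tip_labels))

# Add one if the IDs are to be used as positions in an R vector
r_positions <- tskit_ids + 1L

tskit_ids
r_positions
```

Keeping the two vectors separate makes it explicit which numbers are `tskit` IDs and which are R positions.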
137
+
138
+
## Further information
139
+
140
+
Be sure to check out the [reticulate](https://rstudio.github.io/reticulate/) documentation, in particular on [Calling Python from R](https://rstudio.github.io/reticulate/articles/calling_python.html), which includes important information on how R data types are converted to their equivalent Python types.