Skip to content

Commit c861415

Browse files
authored
Merge pull request #223 from hyanwong/R-tutorial-notebook
Add information about indexing in R
2 parents 3d6d371 + 4ad878b commit c861415

1 file changed

Lines changed: 50 additions & 16 deletions

File tree

tskitr.md

Lines changed: 50 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -27,12 +27,12 @@ We'll begin by simulating a small tree sequence using `msprime`.
2727
msprime <- reticulate::import("msprime")
2828
2929
ts <- msprime$sim_ancestry(80, sequence_length=1e4, recombination_rate=1e-4, random_seed=42)
30-
ts
30+
ts # See "Jupyter notebook tips", below for how to render this nicely
3131
```
3232

3333
## Attributes and methods
3434

35-
`reticulate` allows us to access a Python object's attributes or call its methods via
35+
`reticulate` allows us to access a Python object's attributes via
3636
the `$` operator. For example, we can access (and assign to a variable) the number of
3737
samples in the tree sequence:
3838

@@ -41,17 +41,51 @@ n <- ts$num_samples
4141
n
4242
```
4343

44-
We can also call tskit methods on this tree sequence, such as
45-
{meth}`TreeSequence.simplify`. The parameters are given as native R objects
46-
(but note that we still retain tskit's 0-based indexing system
44+
The `$` operator can also be used to call methods, for example, the
45+
{meth}`~TreeSequence.simplify` method associated with the tree sequence.
46+
The method parameters are given as native R objects
47+
(but note that object IDs are still use tskit's 0-based indexing system).
4748

4849
```{code-cell}
49-
reduced_ts <- ts$simplify(0:7) # only keep the first 8 samples
50-
reduced_ts <- reduced_ts$delete_intervals(list(c(6000, 10000))) # keep the first 6kb
51-
reduced_ts <- reduced_ts$trim() # remove the deleted end
52-
paste("Reduced from", ts$num_trees, "trees to", reduced_ts$num_trees, "trees")
50+
reduced_ts <- ts$simplify(0:7) # only keep samples with ids 0, 1, 2, 3, 4, 5, 6, 7
51+
reduced_ts <- reduced_ts$delete_intervals(list(c(6000, 10000))) # delete data after 6kb
52+
reduced_ts <- reduced_ts$trim() # remove the deleted region
53+
paste(
54+
"Reduced from", ts$num_trees, "trees over", ts$sequence_length/1e3, "kb to",
55+
reduced_ts$num_trees, "trees over", reduced_ts$sequence_length/1e3, "kb.")
5356
```
5457

58+
### IDs and indexes
59+
60+
Note that if a bare digit is provided to one of these methods, it will be treated as a
61+
floating point number. This is useful to know when calling `tskit` methods that
62+
require integers (e.g. object IDs). For example, the following will not work:
63+
64+
```{code-cell}
65+
:tags: [raises-exception, remove-output]
66+
ts$node(0) # Will raise an error
67+
```
68+
69+
In this case, to force the `0` to be passed as an integer, you can either coerce it
70+
using `as.integer` or simply prepend the letter `L`:
71+
72+
```{code-cell}
73+
ts$node(as.integer(0))
74+
# or
75+
ts$node(0L)
76+
```
77+
78+
Coercing in this way is only necessary when passing parameters to those underlying
79+
`tskit` methods that expect integers. It is not needed e.g. to index into numeric arrays.
80+
_However_, when using arrays, very careful attention must be paid to the fact that
81+
`tskit` IDs start at zero, whereas R indexes start at one:
82+
83+
```{code-cell}
84+
root_id <- ts$first()$root
85+
paste("Root time via tskit method:", ts$node(root_id)$time)
86+
# When indexing into tskit arrays in R, add 1 to the ID
87+
paste("Root time via array access:", ts$nodes_time[root_id + 1])
88+
```
5589

5690
## Analysis
5791

@@ -84,10 +118,10 @@ allele_frequency = rowMeans(G)
84118
allele_frequency
85119
```
86120

87-
## Jupyter notebook use
121+
## Jupyter notebook tips
88122

89-
If you are running R within a [Jupyter notebook](https://jupyter.org), then you can
90-
define a few magic functions that will display tskit tables and plots within the notebook:
123+
When running R within a [Jupyter notebook](https://jupyter.org), a few magic functions
124+
can be defined that allow tskit objects to be rendered within the notebook:
91125

92126
```{code-cell}
93127
# Define some magic functions to allow objects to be displayed in R Jupyter notebooks
@@ -96,13 +130,13 @@ repr_html.tskit.trees.Tree <- function(obj, ...){obj$`_repr_html_`()}
96130
repr_svg.tskit.drawing.SVGString <- function(obj, ...){obj$`__str__`()}
97131
```
98132

99-
This leads to much nicer summary tables:
133+
This leads to much nicer tabular summaries:
100134

101135
```{code-cell}
102136
ts_mut
103137
```
104138

105-
And also allows trees and tree sequences to be easily plotted
139+
It also allows trees and tree sequences to be plotted inline:
106140

107141
```{code-cell}
108142
ts_mut$draw_svg(y_axis=TRUE, y_ticks=0:10)
@@ -114,8 +148,8 @@ ts_mut$draw_svg(y_axis=TRUE, y_ticks=0:10)
114148
R has a number of libraries to deal with genomic data and trees. Below we focus on the
115149
phylogenetic tree representation defined in the the popular
116150
[ape](http://ape-package.ird.fr) package, taking all the trees
117-
:meth:`exported in Nexus format<TreeSequence.write_nexus>`, or
118-
individual trees :meth:`exported in Newick format<TreeSequence.as_newick>`:
151+
{meth}`exported in Nexus format<TreeSequence.write_nexus>`, or
152+
individual trees {meth}`exported in Newick format<Tree.as_newick>`:
119153

120154
```{code-cell}
121155
file = tempfile()

0 commit comments

Comments
 (0)