@@ -27,12 +27,12 @@ We'll begin by simulating a small tree sequence using `msprime`.
2727msprime <- reticulate::import("msprime")
2828
2929ts <- msprime$sim_ancestry(80, sequence_length=1e4, recombination_rate=1e-4, random_seed=42)
30- ts
30+ ts # See "Jupyter notebook tips", below for how to render this nicely
3131```
3232
3333## Attributes and methods
3434
35- ` reticulate ` allows us to access a Python object's attributes or call its methods via
35+ ` reticulate ` allows us to access a Python object's attributes via
3636the ` $ ` operator. For example, we can access (and assign to a variable) the number of
3737samples in the tree sequence:
3838
@@ -41,17 +41,51 @@ n <- ts$num_samples
4141n
4242```
4343
44- We can also call tskit methods on this tree sequence, such as
45- {meth}` TreeSequence.simplify ` . The parameters are given as native R objects
46- (but note that we still retain tskit's 0-based indexing system
44+ The ` $ ` operator can also be used to call methods, for example, the
45+ {meth}` ~TreeSequence.simplify ` method associated with the tree sequence.
46+ The method parameters are given as native R objects
47+ (but note that object IDs are still use tskit's 0-based indexing system).
4748
4849``` {code-cell}
49- reduced_ts <- ts$simplify(0:7) # only keep the first 8 samples
50- reduced_ts <- reduced_ts$delete_intervals(list(c(6000, 10000))) # keep the first 6kb
51- reduced_ts <- reduced_ts$trim() # remove the deleted end
52- paste("Reduced from", ts$num_trees, "trees to", reduced_ts$num_trees, "trees")
50+ reduced_ts <- ts$simplify(0:7) # only keep samples with ids 0, 1, 2, 3, 4, 5, 6, 7
51+ reduced_ts <- reduced_ts$delete_intervals(list(c(6000, 10000))) # delete data after 6kb
52+ reduced_ts <- reduced_ts$trim() # remove the deleted region
53+ paste(
54+ "Reduced from", ts$num_trees, "trees over", ts$sequence_length/1e3, "kb to",
55+ reduced_ts$num_trees, "trees over", reduced_ts$sequence_length/1e3, "kb.")
5356```
5457
58+ ### IDs and indexes
59+
60+ Note that if a bare digit is provided to one of these methods, it will be treated as a
61+ floating point number. This is useful to know when calling ` tskit ` methods that
62+ require integers (e.g. object IDs). For example, the following will not work:
63+
64+ ``` {code-cell}
65+ :tags: [raises-exception, remove-output]
66+ ts$node(0) # Will raise an error
67+ ```
68+
69+ In this case, to force the ` 0 ` to be passed as an integer, you can either coerce it
70+ using ` as.integer ` or simply prepend the letter ` L ` :
71+
72+ ``` {code-cell}
73+ ts$node(as.integer(0))
74+ # or
75+ ts$node(0L)
76+ ```
77+
78+ Coercing in this way is only necessary when passing parameters to those underlying
79+ ` tskit ` methods that expect integers. It is not needed e.g. to index into numeric arrays.
80+ _ However_ , when using arrays, very careful attention must be paid to the fact that
81+ ` tskit ` IDs start at zero, whereas R indexes start at one:
82+
83+ ``` {code-cell}
84+ root_id <- ts$first()$root
85+ paste("Root time via tskit method:", ts$node(root_id)$time)
86+ # When indexing into tskit arrays in R, add 1 to the ID
87+ paste("Root time via array access:", ts$nodes_time[root_id + 1])
88+ ```
5589
5690## Analysis
5791
@@ -84,10 +118,10 @@ allele_frequency = rowMeans(G)
84118allele_frequency
85119```
86120
87- ## Jupyter notebook use
121+ ## Jupyter notebook tips
88122
89- If you are running R within a [ Jupyter notebook] ( https://jupyter.org ) , then you can
90- define a few magic functions that will display tskit tables and plots within the notebook:
123+ When running R within a [ Jupyter notebook] ( https://jupyter.org ) , a few magic functions
124+ can be defined that allow tskit objects to be rendered within the notebook:
91125
92126``` {code-cell}
93127# Define some magic functions to allow objects to be displayed in R Jupyter notebooks
@@ -96,13 +130,13 @@ repr_html.tskit.trees.Tree <- function(obj, ...){obj$`_repr_html_`()}
96130repr_svg.tskit.drawing.SVGString <- function(obj, ...){obj$`__str__`()}
97131```
98132
99- This leads to much nicer summary tables :
133+ This leads to much nicer tabular summaries :
100134
101135``` {code-cell}
102136ts_mut
103137```
104138
105- And also allows trees and tree sequences to be easily plotted
139+ It also allows trees and tree sequences to be plotted inline:
106140
107141``` {code-cell}
108142ts_mut$draw_svg(y_axis=TRUE, y_ticks=0:10)
@@ -114,8 +148,8 @@ ts_mut$draw_svg(y_axis=TRUE, y_ticks=0:10)
114148R has a number of libraries to deal with genomic data and trees. Below we focus on the
115149phylogenetic tree representation defined in the the popular
116150[ ape] ( http://ape-package.ird.fr ) package, taking all the trees
117- : meth: ` exported in Nexus format<TreeSequence.write_nexus> ` , or
118- individual trees : meth: ` exported in Newick format<TreeSequence .as_newick> ` :
151+ { meth} ` exported in Nexus format<TreeSequence.write_nexus> ` , or
152+ individual trees { meth} ` exported in Newick format<Tree .as_newick> ` :
119153
120154``` {code-cell}
121155file = tempfile()
0 commit comments