@@ -291,13 +291,13 @@ its simplified version:
 ```{code-cell}
 large_sim_parameters = parameters.copy()
 large_sim_parameters["sequence_length"] *= 1000
-ts_arg = msprime.sim_ancestry(**large_sim_parameters, record_full_arg=True)
-ts = ts_arg.simplify()
+large_ts_arg = msprime.sim_ancestry(**large_sim_parameters, record_full_arg=True)
+large_ts = large_ts_arg.simplify()
 
 print(
     "Non-coalescent nodes take up "
-    f"{(1-ts.num_nodes/ts_arg.num_nodes) * 100:0.2f}% "
-    f"of this {ts.sequence_length/1e6:g} megabase {ts.num_samples}-tip ARG"
+    f"{(1-large_ts.num_nodes/large_ts_arg.num_nodes) * 100:0.2f}% "
+    f"of this {large_ts.sequence_length/1e6:g} megabase {large_ts.num_samples}-tip ARG"
 )
 ```
 
@@ -314,18 +314,21 @@ vastly larger number of nodes than even the ARGs simulated here, and using such
 structures for simulation or inference is therefore infeasible.
 :::
 
-## Working with the ARG
+## Working with the tree sequence graph
 
-All the normal tskit functions can be used to analyse an ARG stored in tskit form. However, some
-operations are naturally though of in terms of the tree sequence as a graph.
+All tree sequences, including but not limited to full ARGs, can be treated as
+directed (acyclic) graphs. Although many tree sequence operations operate from left to
+right along the genome, some are more naturally thought of as passing from node
+to node via the edges, regardless of the genomic position of the edge. This section
+describes some of these fundamental graph operations.
 
 ### Graph traversal
 
 The standard edge iterator, {meth}`TreeSequence.edge_diffs()`, goes from left to
-right along the genome, matching the {meth}`TreeSequence.trees()` iterator. This
-means that unlike most conventional graph traversal methods, the returned edges
-are *not* necessarily grouped by node ID (either
-the edge's parent node or the edge's child node).
+right along the genome, matching the {meth}`TreeSequence.trees()` iterator. Although
+this will visit all the edges in the graph, these will *not* necessarily be grouped
+by the node ID of either the edge parent or the edge child. To group edges by node, an
+alternative traversal (from top-to-bottom or bottom-to-top of the tree sequence) is required.
 
 To traverse the graph by node ID, the {meth}`TreeSequence.nodes()` iterator can be
 used. In particular, because parents are required to be strictly older than their
@@ -334,10 +337,9 @@ are always visited before their parents (similar to a breadth-first or "level or
 search). However, using {meth}`TreeSequence.nodes()` is inefficient if you also
 want to access the *edges* associated with each node.
 
-The examples below show how to efficiently traverse the connected nodes in a tree
-sequence graph, visiting each
-only once, ensuring children are visited before parents (or vice versa), while
-simultaneously giving access to the edges associated with each node.
+The examples below show how to efficiently visit all the edges in a
+tree sequence, grouped by the nodes to which they are connected, while
+also ensuring that children are visited before parents (or vice versa).
 
 #### Traversing parent nodes
 
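The distinction drawn in the new text, between edges in left-to-right genomic order and edges grouped by node, can be illustrated with a minimal plain-Python sketch. It is not the tutorial's code and does not use the tskit API: the `node_time` mapping, the `edges` tuples and the `edges_grouped_by_child` helper are all hypothetical names invented for this illustration. It relies only on the property stated above, that parents are strictly older than their children.

```python
from itertools import groupby

# Hypothetical toy data (not produced by tskit): node ID -> time, 0 = present day.
node_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 2.0}

# Hypothetical edges as (left, right, parent, child), ordered by left coordinate
# to mimic a left-to-right pass along the genome. The two edges whose child is
# node 0 are not adjacent in this ordering.
edges = [
    (0, 5, 3, 0),
    (0, 10, 3, 1),
    (0, 10, 4, 2),
    (0, 10, 4, 3),
    (5, 10, 4, 0),
]


def edges_grouped_by_child(edges, node_time):
    """Yield (child, [edges]) pairs, ordered from youngest to oldest child."""

    def key(edge):
        child = edge[3]
        # Sort by child time first, then by child ID to keep each group contiguous.
        return (node_time[child], child)

    ordered = sorted(edges, key=key)
    for (_, child), group in groupby(ordered, key=key):
        yield child, list(group)


grouped = list(edges_grouped_by_child(edges, node_time))
for child, child_edges in grouped:
    print(child, child_edges)
```

Because the groups are ordered by child time, all the edges below a node are yielded before any edge in which that node is itself the child. Sorting on the parent's time and ID instead would give the corresponding grouping by parent node.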