@@ -316,6 +316,84 @@ structures for simulation or inference is therefore infeasible.
316316
317317## Working with the ARG
318318
319+ All the normal tskit functions can be used to analyse an ARG stored in tskit form. However, some
320+ operations are naturally though of in terms of the tree sequence as a graph.
321+
322+ ### Graph traversal
323+
324+ The standard edge iterator, {meth}` TreeSequence.edge_diffs() ` , goes from left to
325+ right along the genome, matching the {meth}` TreeSequence.trees() ` iterator. This
326+ means that unlike most conventional graph traversal methods, the returned edges
327+ are * not* necessarily grouped by node ID (either
328+ the edge's parent node or the edge's child node).
329+
330+ To traverse the graph by node ID, the {meth}` TreeSequence.nodes() ` iterator can be
331+ used. In particular, because parents are required to be strictly older than their
332+ children, iterating through nodes using ` order="timeasc" ` will ensure that children
333+ are always visited before their parents (similar to a breadth-first or "level order"
334+ search). However, using {meth}` TreeSequence.nodes() ` is inefficient if you also
335+ want to access the * edges* associated with each node.
336+
337+ The examples below show how to efficiently traverse the connected nodes in a tree
338+ sequence graph, visiting each
339+ only once, ensuring children are visited before parents (or vice versa), while
340+ simultaneously giving access to the edges associated with each node.
341+
342+ #### Traversing parent nodes
343+
344+ The most efficient graph traversal method visits all the parent nodes in the
345+ tree sequence, grouping edges for which that node is a parent. This is simple
346+ because edges in a tree sequence are {ref}` ordered<tskit:sec_edge_requirements> `
347+ firstly by the time of the parent node, then by node ID.
348+
349+ ``` {code-cell}
350+ import itertools
351+ import operator
352+ def edges_by_parent_timeasc(ts):
353+ return itertools.groupby(ts.edges(), operator.attrgetter("parent"))
354+
355+ for parent_id, edges in edges_by_parent_timeasc(ts):
356+ t = ts.node(parent_id).time
357+ children = {e.child for e in edges}
358+ print(f"Node {parent_id} at time {t} has these child node IDs: {children}")
359+ ```
360+
361+ This visits children before parents. To visit parents before children, you can
362+ simply use `` reversed(ts.edges()) `` rather than `` ts.edges() `` within the
363+ `` groupby `` function. Note that terminal nodes (i.e. which are either isolated
364+ or leaf nodes in all local trees) are not visited by this function: therefore
365+ the method above will omit all the terminal sample nodes.
366+
367+ #### Traversing child nodes
368+
369+ Sometimes you may wish to iterate over all the edges for which a node is a
370+ child (rather than a parent). This can be done by sorting the edges by
371+ child node time (then child node id). This is a little slower, but can be
372+ done relatively efficiently as follows:
373+
374+ ``` {code-cell}
375+ import itertools
376+ import operator
377+ import numpy as np
378+ def edges_by_child_timeasc(ts):
379+ # edges sorted by child node time, then child node id using np.lexsort
380+ it = (ts.edge(u) for u in np.lexsort((ts.edges_child, ts.nodes_time[ts.edges_child])))
381+ return itertools.groupby(it, operator.attrgetter("child"))
382+
383+ for child_id, edges in edges_by_child_timeasc(ts):
384+ t = ts.node(child_id).time
385+ parents = {e.parent for e in edges}
386+ print(f"Node {child_id} at time {t} has these parent node IDs: {parents}")
387+ ```
388+
389+ To visit parents before children, the `` lexsort `` can take the negated
390+ `` -ts.nodes_time `` rather than simply using `` ts.nodes_time `` . Note that nodes
391+ which are never children of an edge are not visited by this algorithm. Such
392+ nodes are either {ref}` isolated<sec_data_model_tree_isolated_nodes> ` or a
393+ {ref}`root<sec_data_model_tree_roots> in each local tree.
394+
395+ ### Misc
396+
319397:::{todo}
320398Add extra content as per [ https://github.com/tskit-dev/tutorials/issues/43 ] ( https://github.com/tskit-dev/tutorials/issues/43 )
321399:::
0 commit comments