Skip to content

Commit c3e1686

Browse files
authored
Merge pull request #260 from hyanwong/arg-viz
Arg viz
2 parents 1587d0d + eb38330 commit c3e1686

7 files changed

Lines changed: 340 additions & 158 deletions

File tree

.github/workflows/build.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,10 @@ jobs:
2121
uses: actions/checkout@v4
2222

2323
# Install dependencies
24+
- name: Install Graphviz
25+
run: |
26+
sudo apt-get install graphviz {lib,}graphviz-dev
27+
2428
- name: Set up Python 3.11
2529
uses: actions/setup-python@v4
2630
with:

_toc.yml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,13 +25,14 @@ parts:
2525
chapters:
2626
- file: simulation_overview
2727
- file: no_mutations
28-
- file: completing_forward_sims
2928
- file: advanced_msprime
3029
sections:
3130
- file: demography
3231
- file: bottlenecks
3332
- file: introgression
33+
- file: completing_forward_sims
3434
- file: forward_sims
35+
- file: more_forward_sims
3536
- caption: Other languages
3637
# TODO: add basic C and maybe Rust tutes
3738
chapters:

args.md

Lines changed: 202 additions & 142 deletions
Large diffs are not rendered by default.

forward_sims.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,7 @@ as rows of the the {ref}`sec_individual_table_definition` (a node can then be as
5959
individual by storing that individual's id in the appropriate row of the node table).
6060

6161
An *edge* reflects a transmission event between nodes. An edge is a tuple `(Left, Right, Parent, Child)`
62-
whose meaning is "Child genome $C$ inherited the genomic interval $[L, R)$ from $Parent genome $P$".
62+
whose meaning is "The `Child` genome inherited the genomic interval [`Left`, `Right`) from the `Parent` genome".
6363
In a tree sequence this is stored in a row of the {ref}`sec_edge_table_definition`.
6464

6565
The *time*, in the discrete-time Wright-Fisher (WF) model which we will simulate, is measured in
@@ -90,7 +90,7 @@ import numpy as np
9090
random_seed = 2
9191
random = np.random.default_rng(random_seed) # A random number generator for general use
9292
93-
L = 50_000 # The sequence length = 50 Kb
93+
L = 50_000 # The sequence length: 50 Kb
9494
```
9595

9696
### Core steps

requirements-CI.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,9 +6,11 @@ jupyter-cache==0.6.1
66
msprime==1.2.0
77
networkx==3.1
88
pandas==2.1.1
9+
pygraphviz==1.10
910
scikit-allel==1.3.7
1011
stdpopsim==0.2.0
1112
tqdm==4.66.1
1213
tskit==0.5.5
14+
tskit_arg_visualizer==0.0.1
1315
tszip==0.2.2
1416
jsonschema==4.18.6 # Pinned due to 4.19 "AttributeError module jsonschema has no attribute _validators"

requirements.txt

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,8 +6,10 @@ jupyter-cache
66
msprime>=1.0
77
networkx
88
pandas
9+
pygraphviz
910
scikit-allel
1011
stdpopsim
1112
tqdm
1213
tskit>=0.5.4
14+
tskit_arg_visualizer
1315
tszip

viz.md

Lines changed: 126 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1503,22 +1503,135 @@ of which are outlined below.
15031503

15041504
A tree sequence can be treated as a specific form of (directed)
15051505
[graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) consisting
1506-
of nodes connected by edges. Standard graph visualization software, such as
1507-
[graphviz](https://graphviz.org) can therefore be used to represent tree sequence
1508-
topologies. This is a relatively common approach to visualizing the full
1509-
"Ancestral Recombination Graph" or ARG (a structure in which some nodes are
1510-
"recombination nodes", and which is possible to
1511-
{ref}`represent as a tree sequence <msprime:sec_ancestry_full_arg>`).
1506+
of nodes connected by edges. Standard graph visualization software,
1507+
such as [graphviz](https://graphviz.org) can therefore be used to depict tree sequence
1508+
topologies. Alternatively, the [tskit_arg_visualizer](https://github.com/kitchensjn/tskit_arg_visualizer)
1509+
project will draw a interactive `tskit` graph directly. Below is an example, which uses the
1510+
`variable_edge_width` option to highlight the spans of genome inherited through different routes.
1511+
Nodes can be dragged horizontally and embedded trees highlighted:
15121512

1513+
```{code-cell} ipython3
1514+
:"tags": ["hide-input"]
1515+
%%javascript
1516+
require.config({paths: {d3: 'https://d3js.org/d3.v7.min'}});
1517+
require(["d3"], function(d3) {window.d3 = d3;});
1518+
```
15131519

1514-
:::{todo}
1515-
Link to the ARG tutorial,
1516-
[once it is created](https://github.com/tskit-dev/tutorials/issues/43), and show a
1517-
picture like this:
1520+
```{code-cell} ipython3
1521+
import msprime
1522+
import tskit_arg_visualizer
1523+
ts = msprime.sim_ancestry(4, sequence_length=1000, recombination_rate=0.001, random_seed=3)
1524+
d3arg = tskit_arg_visualizer.D3ARG.from_ts(ts=ts)
1525+
tip_order = [0, 6, 3, 1, 2, 7, 4, 5] # Found by trial and error for this seed
1526+
d3arg.draw(width=500, height=500, variable_edge_width=True, sample_order=tip_order)
1527+
```
15181528

1519-
![A tree sequence (ARG) as a graph](https://user-images.githubusercontent.com/36134434/109398193-2ec6b700-7933-11eb-9cbf-99fdfab46df0.png)
1520-
(from [here](https://github.com/tskit-dev/tutorials/issues/43#issuecomment-787124425))
1521-
:::
1529+
For an `msprime` "full ARG" tree sequence, `edge_type="ortho"` can be used to draw a
1530+
traditional "[Ancestral Recombination Graph](sec_args)" style plot (variable edge widths
1531+
turned off for clarity):
1532+
1533+
```{code-cell} ipython3
1534+
import msprime
1535+
import tskit_arg_visualizer
1536+
full_arg_ts = msprime.sim_ancestry(
1537+
4, sequence_length=1000, recombination_rate=0.001, random_seed=3, record_full_arg=True)
1538+
d3arg = tskit_arg_visualizer.D3ARG.from_ts(ts=full_arg_ts)
1539+
d3arg.draw(width=500, height=500, edge_type="ortho", sample_order=tip_order)
1540+
```
1541+
1542+
For more general graph plots, it can be helpful convert the tree sequence to a
1543+
[networkx](https://networkx.org) graph first, as described in the
1544+
{ref}`sec_args_other_analysis` section of the {ref}`sec_args` tutorial.
1545+
This provides interfaces to graph plotting software such as
1546+
[graphviz](https://graphviz.org), which provides the `dot` layout engine for
1547+
directed graphs:
1548+
1549+
```{code-cell} ipython3
1550+
:"tags": ["hide-input"]
1551+
## Networkx conversion code taken from the ARG tutorial
1552+
1553+
import networkx as nx
1554+
import pandas as pd
1555+
import tskit
1556+
1557+
def to_networkx_graph(ts, interval_lists=False):
1558+
"""
1559+
Make an nx graph from a tree sequence. If `intervals_lists` is True, then
1560+
each graph edge will have an ``intervals`` attribute containing a *list*
1561+
of tskit.Intervals per parent/child combination. Otherwise each graph edge
1562+
will correspond to a tskit edge, with a ``left`` and ``right`` attribute.
1563+
"""
1564+
D = dict(source=ts.edges_parent, target=ts.edges_child, left=ts.edges_left, right=ts.edges_right)
1565+
G = nx.from_pandas_edgelist(pd.DataFrame(D), edge_attr=True, create_using=nx.MultiDiGraph)
1566+
if interval_lists:
1567+
GG = nx.DiGraph() # Mave a new graph with one edge that can contai
1568+
for parent, children in G.adjacency():
1569+
for child, edict in children.items():
1570+
ilist = [tskit.Interval(v['left'], v['right']) for v in edict.values()]
1571+
GG.add_edge(parent, child, intervals=ilist)
1572+
G = GG
1573+
nx.set_node_attributes(G, {n.id: {'flags':n.flags, 'time': n.time} for n in ts.nodes()})
1574+
return G
1575+
```
1576+
1577+
```{code-cell} ipython3
1578+
import networkx as nx
1579+
from IPython.display import SVG
1580+
1581+
def graphviz_svg(networkx_graph):
1582+
AG = nx.drawing.nx_agraph.to_agraph(networkx_graph) # Convert to graphviz "agraph"
1583+
nodes_at_time_0 = [k for k, v in networkx_graph.nodes(data=True) if v['time'] == 0]
1584+
AG.add_subgraph(nodes_at_time_0, rank='same') # put time=0 at same rank
1585+
return AG.draw(prog="dot", format="svg")
1586+
1587+
G = to_networkx_graph(ts, interval_lists=True) # Function from the ARG tutorial
1588+
print("Converted `ts` to a networkx graph named `G`")
1589+
print("Plotting using graphviz...")
1590+
SVG(graphviz_svg(G))
1591+
```
1592+
1593+
Alternatively, you can read the `graphviz` positions back into `networkx`
1594+
and use the `networkx` drawing functionality, which
1595+
relies upon the [matplotlib](https://matplotlib.org) library. This
1596+
allows modification of node colours and symbols, labels, rotations,
1597+
annotations, etc., as shown below:
1598+
1599+
```{code-cell} ipython3
1600+
:"tags": ["hide-input"]
1601+
from matplotlib import pyplot as plt
1602+
import string
1603+
1604+
def get_graphviz_positions(networkx_graph):
1605+
AG = nx.drawing.nx_agraph.to_agraph(networkx_graph) # Convert to graphviz "agraph"
1606+
nodes_at_time_0 = [k for k, v in networkx_graph.nodes(data=True) if v['time'] == 0]
1607+
AG.add_subgraph(nodes_at_time_0, rank='same') # put time=0 at same rank
1608+
AG.layout(prog="dot") # create the layout, storing positions in the "pos" attribute
1609+
return {n: [float(x) for x in AG.get_node(n).attr["pos"].split(",")] for n in G.nodes()}
1610+
1611+
pos=get_graphviz_positions(G)
1612+
edge_labels = {
1613+
edge[0:2]: "\n".join([f"[{int(i.left)},{int(i.right)})" for i in edge[2]["intervals"]])
1614+
for edge in G.edges(data=True)
1615+
}
1616+
1617+
samples = set(ts.samples())
1618+
nonsamples = set(range(ts.num_nodes)) - samples
1619+
1620+
plt.figure(figsize=(10, 4))
1621+
# Sample nodes as dark green squares with white text
1622+
nx.draw(G, pos, node_color="#007700", font_size=9, node_size=250, node_shape="s", nodelist=samples, edgelist=[])
1623+
nx.draw_networkx_labels(G, pos, font_size=9, labels={u: u for u in samples}, font_color="white")
1624+
# Others as blue circles with alphabetic labels
1625+
nx.draw(G, pos, node_color="#22CCFF", font_size=9, edgecolors="black", nodelist=nonsamples, edgelist=[])
1626+
nx.draw_networkx_labels(G, pos, font_size=9, labels={u: string.ascii_lowercase[u] for u in nonsamples})
1627+
1628+
nx.draw_networkx_edges(G, pos, edge_color="lightgrey", arrows=False, width=2);
1629+
nx.draw_networkx_edge_labels(G, pos, edge_labels, font_size=7, rotate=False);
1630+
```
1631+
1632+
Note, however, that finding node and edge layout positions that avoid too much overlap
1633+
can be tricky, even for the graphviz layout engine, and there is no easy functionality
1634+
to place nodes at specific vertical (time) positions.
15221635

15231636
(sec_tskit_viz_other_demographic)=
15241637

0 commit comments

Comments
 (0)