34 changes: 30 additions & 4 deletions doc/Estimating-amino-acid-substitution-models.md
@@ -1,8 +1,8 @@
---
layout: userdoc
title: "Estimating amino acid substitution models"
author: Hector Banos, Cuong Cao Dang, Minh Bui, Thomas Wong
date: 2024-06-28
author: Hector Banos, Cuong Cao Dang, Minh Bui, Thomas Wong, Ryo Harada
date: 2026-05-26
docid: 12
icon: info-circle
doctype: manual
@@ -18,6 +18,8 @@ sections:
    url: estimating-a-non-reversible-model
  - name: Estimating linked exchangeabilities
    url: estimating-linked-exchangeabilities
  - name: Estimating a multiple exchangeability and profile model
    url: estimating-a-multiple-exchangeability-and-profile-model
---


@@ -172,7 +174,7 @@ To estimate a non-reversible model from a folder of alignments:
Estimating linked exchangeabilities
-----------------------------------

Starting with version 2.3.5, IQ-TREE allows users to estimate linked exchangeabilities under [profile mixture models](Substitution-Models#protein-mixture-models).
Starting with version 2.3.5, IQ-TREE allows users to estimate linked exchangeabilities under [profile mixture models](Substitution-Models#protein-mixture-models), an approach called GTRpmix ([Banos et al., 2024]).

To start with, we show an example:

@@ -204,7 +206,30 @@ Because these routines can be computationally expensive, two exchangeability mat

If you use this routine in a publication please cite:

> __H. Banos et al.__ (2024) GTRpmix: A linked general-time reversible model for profile mixture models. _BioRxiv_. <https://doi.org/10.1101/2024.03.29.587376>
> __H. Banos et al.__ (2024) GTRpmix: A linked general time-reversible model for profile mixture models. _Molecular Biology and Evolution_ 41:msae174. <https://doi.org/10.1093/molbev/msae174>


Estimating a multiple exchangeability and profile model
-------------------------------------------------------

GTRspmix is a protein mixture model that combines multiple exchangeability matrices (GTRs) with multiple profiles. The profiles are clustered into groups, and each profile cluster is linked to its own exchangeability matrix.
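
As a rough illustration of this structure (a toy sketch, not IQ-TREE's implementation), the snippet below uses a hypothetical 3-state alphabet in place of the 20 amino acids. It assumes the usual GTR construction, in which a mixture class has off-diagonal rates `q_ij = r_ij * pi_j`, with the exchangeabilities `r` taken from the cluster that the class's profile belongs to. All names and numbers are invented for illustration.

```python
# Toy GTRspmix-style structure: profiles are grouped into clusters,
# and every cluster owns one symmetric exchangeability matrix.
# Hypothetical 3-state alphabet and made-up numbers, for illustration only.

def rate_matrix(exch, profile):
    """Build Q with q_ij = r_ij * pi_j for i != j; each row sums to zero."""
    n = len(profile)
    q = [[exch[i][j] * profile[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        # The diagonal is still 0.0 here, so sum(q[i]) is the off-diagonal total.
        q[i][i] = -sum(q[i])
    return q

# Two exchangeability matrices, one per cluster (the diagonal is unused).
exchangeabilities = {
    "cluster_a": [[0, 1.0, 2.0], [1.0, 0, 0.5], [2.0, 0.5, 0]],
    "cluster_b": [[0, 0.2, 0.2], [0.2, 0, 3.0], [0.2, 3.0, 0]],
}

# Three profiles (stationary frequencies), each assigned to one cluster.
profiles = {
    "p1": ([0.6, 0.3, 0.1], "cluster_a"),
    "p2": ([0.2, 0.2, 0.6], "cluster_a"),
    "p3": ([0.1, 0.8, 0.1], "cluster_b"),
}

# One rate matrix per mixture class: the profile plus its cluster's matrix.
class_matrices = {
    name: rate_matrix(exchangeabilities[cluster], freqs)
    for name, (freqs, cluster) in profiles.items()
}
```

In this toy setup, `p1` and `p2` share the exchangeabilities of `cluster_a`, while `p3` uses those of `cluster_b`, which mirrors the linking described above.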

IQ-TREE version 3.1.3 provides general GTRspmix models (the SXXpfamCYY series).
For example, to have ModelFinder consider three of these models as additional candidates:

    iqtree3 -s <alignment> -m MFP --madd S10pfamC60+G4,S28pfamC59+G4,S28pfamC60+G4
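
For scripted runs, the same command line can be assembled programmatically. The sketch below is a hypothetical helper (`build_modelfinder_command` is not part of IQ-TREE). It only builds the argument list, and it assumes that `iqtree3` is on the `PATH` and that `example.phy` stands in for your alignment file.

```python
import shlex

def build_modelfinder_command(alignment, extra_models, binary="iqtree3"):
    """Return the argument list for a ModelFinder run with extra candidate models."""
    if not extra_models:
        raise ValueError("at least one extra model is required")
    # --madd takes a single comma-separated list of model names.
    return [binary, "-s", alignment, "-m", "MFP", "--madd", ",".join(extra_models)]

cmd = build_modelfinder_command(
    "example.phy",  # hypothetical alignment file name
    ["S10pfamC60+G4", "S28pfamC59+G4", "S28pfamC60+G4"],
)

# Print the command for inspection; nothing is executed here.
print(shlex.join(cmd))
```

To launch the analysis from Python, the list can be passed to `subprocess.run(cmd, check=True)`.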

In addition, users can optimize their own GTRspmix models with a Python wrapper for IQ-TREE.
The tool guides you through the estimation workflow and prepares the model files that IQ-TREE needs. For detailed instructions, prerequisites, and usage examples, see the repository:
<https://github.com/HRD-Ryo/GTRspmix>

To use the custom model generated by this tool, run:

    iqtree3 -s <alignment> -mdef <model_best.nex> -m SXXFYY+G4

If you use the GTRspmix model, please cite the following paper:

> __R. Harada et al.__ (2026) GTRspmix: Capturing Heterogeneity of Exchangeabilities Across Sites to Improve Protein Phylogenetics. (In preparation)


[Dang et al., 2022]: https://doi.org/10.1093/sysbio/syac007
@@ -213,3 +238,4 @@ If you use this routine in a publication please cite:
[El-Gebali et al., 2018]: https://doi.org/10.1093/nar/gky995
[Duchêne et al., 2019]: https://doi.org/10.1093/molbev/msz291
[Ran et al., 2018]: https://doi.org/10.1098/rspb.2018.1012
[Banos et al., 2024]: https://doi.org/10.1093/molbev/msae174
13 changes: 9 additions & 4 deletions doc/Substitution-Models.md
@@ -1,8 +1,8 @@
---
layout: userdoc
title: "Substitution Models"
author: Hector Banos, Cuong Cao Dang, Heiko Schmidt, Jana Trifinopoulos, Minh Bui, Nhan Ly-Trong, Hiroaki Sato
date: 2025-06-10
author: Hector Banos, Cuong Cao Dang, Heiko Schmidt, Jana Trifinopoulos, Minh Bui, Nhan Ly-Trong, Hiroaki Sato, Ryo Harada
date: 2026-05-19
docid: 10
icon: book
doctype: manual
@@ -170,6 +170,7 @@ IQ-TREE supports all common empirical amino-acid exchange rate matrices (alphabe
| FLAVI | viral | Flavivirus ([Le and Vinh, 2020]). |
| FLU | viral | Influenza virus ([Dang et al., 2010]). |
| GTR20 | general | General time reversible model with 190 rate parameters. *WARNING: Be careful when using this parameter-rich model, as parameter estimates might not be stable, especially when the data contain too little phylogenetic information (e.g. alignments that are too short).* |
| G.pfam | nuclear | General GTRpmix exchangeability matrix estimated from the Pfam version 31 database ([El-Gebali et al., 2018]). To be used with profile mixture models (e.g. `G.pfam+C60`)<!-- ([Harada et al., 2026])-->. |
| HIVb | viral | HIV between-patient matrix HIV-B<sub>m</sub> ([Nickle et al., 2007]). |
| HIVw | viral | HIV within-patient matrix HIV-W<sub>m</sub> ([Nickle et al., 2007]). |
| JTT | nuclear | General matrix ([Jones et al., 1992]). |
@@ -215,12 +216,16 @@ IQ-TREE also supports a series of protein mixture models:
| LG4M | Four-matrix model fused with [Gamma rate heterogeneity](#rate-heterogeneity-across-sites) ([Le et al., 2012]).
| LG4X | Four-matrix model fused with [FreeRate heterogeneity](#rate-heterogeneity-across-sites) ([Le et al., 2012]).
| CF4 | Five-profile mixture model ([Wang et al., 2008]).
| S10pfamC60, S28pfamC59, S28pfamC60 | General GTRspmix models estimated from the Pfam version 31 database ([El-Gebali et al., 2018]). Profiles from the C60 set are grouped into 10 or 28 clusters using Site Posterior Probability Co-occurrence (SPPC), with each profile cluster linked to a unique exchangeability matrix. `S28pfamC59` and `S28pfamC60` are variants optimized by adjusting a low-weight profile<!-- ([Harada et al., 2026])-->.
| S10pfamC10, S20pfamC20, S30pfamC30 | Computationally efficient general GTRspmix models estimated from the Pfam version 31 database ([El-Gebali et al., 2018]). Each profile in the C10, C20, or C30 sets is directly linked to its own unique exchangeability matrix, designed to minimize computational cost for massive datasets<!-- ([Harada et al., 2026])-->.
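
The `SXXpfamCYY` names in the table follow a regular pattern. The sketch below splits such a name into its parts. It assumes, based only on the descriptions in the table, that the number after `S` is the number of exchangeability matrices and that `CYY` names the profile set. The function `parse_gtrspmix_name` is hypothetical and not part of IQ-TREE.

```python
import re

# Pattern for names such as S10pfamC60. The reading of the two numbers is
# inferred from the table descriptions, not taken from an IQ-TREE API.
_GTRSPMIX_NAME = re.compile(r"^S(\d+)pfam(C\d+)$")

def parse_gtrspmix_name(model):
    """Split e.g. 'S28pfamC59+G4' into matrix count, profile set, and modifiers."""
    base, _, modifiers = model.partition("+")
    match = _GTRSPMIX_NAME.match(base)
    if match is None:
        raise ValueError(f"not an SXXpfamCYY model name: {model!r}")
    return {
        "exchangeability_matrices": int(match.group(1)),
        "profile_set": match.group(2),
        "modifiers": modifiers.split("+") if modifiers else [],
    }

for name in ("S10pfamC60+G4", "S28pfamC59", "S30pfamC30+G"):
    print(name, parse_gtrspmix_name(name))
```

A helper like this can be handy when looping over candidate models in a script, for example to group results by the number of exchangeability matrices.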


One can even combine a protein matrix with a profile mixture model like:

* `LG+C20`: Applying the `LG` matrix instead of `Poisson` for all 20 classes of AA profiles and Gamma rate heterogeneity.
* `LG+C20+F`: Applying the `LG` matrix for 20 classes plus a 21st class with the empirical AA profile (counted from the current data) and Gamma rate heterogeneity.
* `JTT+CF4+G`: Applying the `JTT` matrix for all 5 classes of AA profiles and Gamma rate heterogeneity.
* `S28pfamC60+G`: Applying the `S28pfamC60` mixture model with Gamma rate heterogeneity.

Moreover, one can override the Gamma rate by FreeRate heterogeneity:

@@ -435,7 +440,7 @@ Users can fix the parameters of the model. For example, `+I{0.2}` will fix the p
[Abascal et al., 2007]: https://doi.org/10.1093/molbev/msl136
[Adachi and Hasegawa, 1996]: https://doi.org/10.1007/BF02498640
[Adachi et al., 2000]: https://doi.org/10.1007/s002399910038
[Banos et al., 2024]: https://doi.org/10.1101/2024.03.29.587376
[Banos et al., 2024]: https://doi.org/10.1093/molbev/msae174
[Bielawski and Gold, 2002]: https://doi.org/10.1093/genetics/161.4.1589
[Dang et al., 2010]: https://doi.org/10.1186/1471-2148-10-99
[Dang et al., 2022]: https://doi.org/10.1093/sysbio/syac007
@@ -495,4 +500,4 @@ Users can fix the parameters of the model. For example, `+I{0.2}` will fix the p
[ej91016/MorphoParse]: https://github.com/ej91016/MorphoParse
[davidcerny/GEOS26100-Fall2022]: https://github.com/davidcerny/GEOS26100-Fall2022
[Černý & Simonoff (2023)]: https://doi.org/10.1038/s41598-023-35784-3

<!--[Harada et al., 2026]: https://doi.org/XXXXX-->