diff --git a/doc/Estimating-amino-acid-substitution-models.md b/doc/Estimating-amino-acid-substitution-models.md
index 7e8cc8b..877dbf8 100644
--- a/doc/Estimating-amino-acid-substitution-models.md
+++ b/doc/Estimating-amino-acid-substitution-models.md
@@ -1,8 +1,8 @@
 ---
 layout: userdoc
 title: "Estimating amino acid substitution models"
-author: Hector Banos, Cuong Cao Dang, Minh Bui, Thomas Wong
-date: 2024-06-28
+author: Hector Banos, Cuong Cao Dang, Minh Bui, Thomas Wong, Ryo Harada
+date: 2026-05-26
 docid: 12
 icon: info-circle
 doctype: manual
@@ -18,6 +18,8 @@ sections:
   url: estimating-a-non-reversible-model
 - name: Estimating linked exchangeabilities
   url: estimating-linked-exchangeabilities
+- name: Estimating a multiple exchangeability and profile model
+  url: estimating-a-multiple-exchangeability-and-profile-model
 ---
 
 
@@ -172,7 +174,7 @@ To estimate a non-reversible model from a folder of alignments:
 Estimating linked exchangeabilities
 -----------------------------------
 
-Starting with version 2.3.5, IQ-TREE allows users to estimate linked exchangeabilities under [profile mixture models](Substitution-Models#protein-mixture-models).
+Starting with version 2.3.5, IQ-TREE allows users to estimate linked exchangeabilities under [profile mixture models](Substitution-Models#protein-mixture-models), an approach called GTRpmix ([Banos et al., 2024]).
 
 To start with, we show an example:
 
@@ -204,7 +206,30 @@ Because these routines can be computationally expensive, two exchangeability mat
 
 If you use this routine in a publication please cite:
 
-> __H. Banos et al.__ (2024) GTRpmix: A linked general-time reversible model for profile mixture models. _BioRxiv_.
+> __H. Banos et al.__ (2024) GTRpmix: A linked general time-reversible model for profile mixture models. _Molecular Biology and Evolution_ 41:msae174.
+
+
+Estimating a multiple exchangeability and profile model
+-------------------------------------------------------
+
+GTRspmix is a protein mixture model with multiple exchangeability matrices (GTRs) and profiles. Profiles are clustered into several groups, with each profile cluster linked to a unique exchangeability matrix.
+
+IQ-TREE version 3.1.3 provides general GTRspmix models (the SXXpfamCYY series).
+For example, to use these models:
+
+    iqtree3 -s -m MFP --madd S10pfamC60+G4,S28pfamC59+G4,S28pfamC60+G4
+
+In addition, users can estimate their own GTRspmix models with a Python wrapper for IQ-TREE.
+This tool guides you through the estimation workflow and prepares the necessary model files for IQ-TREE. For detailed instructions, prerequisites, and usage examples, please visit the repository:
+https://github.com/HRD-Ryo/GTRspmix
+
+To use a custom model generated by this tool, run:
+
+    iqtree3 -s -mdef -m SXXFYY+G4
+
+If you use the GTRspmix model, please cite the following paper:
+
+> __R. Harada et al.__ (2026) GTRspmix: Capturing Heterogeneity of Exchangeabilities Across Sites to Improve Protein Phylogenetics. (In preparation)
 
 
 [Dang et al., 2022]: https://doi.org/10.1093/sysbio/syac007
@@ -213,3 +238,4 @@ If you use this routine in a publication please cite:
 [El-Gebali et al., 2018]: https://doi.org/10.1093/nar/gky995
 [Duchêne et al., 2019]: https://doi.org/10.1093/molbev/msz291
 [Ran et al., 2018]: https://doi.org/10.1098/rspb.2018.1012
+[Banos et al., 2024]: https://doi.org/10.1093/molbev/msae174
diff --git a/doc/Substitution-Models.md b/doc/Substitution-Models.md
index f0af4cd..a0fafda 100644
--- a/doc/Substitution-Models.md
+++ b/doc/Substitution-Models.md
@@ -1,8 +1,8 @@
 ---
 layout: userdoc
 title: "Substitution Models"
-author: Hector Banos, Cuong Cao Dang, Heiko Schmidt, Jana Trifinopoulos, Minh Bui, Nhan Ly-Trong, Hiroaki Sato
-date: 2025-06-10
+author: Hector Banos, Cuong Cao Dang, Heiko Schmidt, Jana Trifinopoulos, Minh Bui, Nhan Ly-Trong, Hiroaki Sato, Ryo Harada
+date: 2026-05-19
 docid: 10
 icon: book
 doctype: manual
@@ -170,6 +170,7 @@ IQ-TREE supports all common empirical amino-acid exchange rate matrices (alphabe
 | FLAVI | viral | Flavivirus ([Le and Vinh, 2020]). |
 | FLU | viral | Influenza virus ([Dang et al., 2010]). |
 | GTR20 | general | General time reversible models with 190 rate parameters. *WARNING: Be careful when using this parameter-rich model as parameter estimates might not be stable, especially when not having enough phylogenetic information (e.g. not long enough alignments).* |
+| G.pfam | nuclear | General GTRpmix exchangeability matrix estimated from the Pfam version 31 database ([El-Gebali et al., 2018]). To be used with profile mixture models (e.g. `G.pfam+C60`). |
 | HIVb | viral | HIV between-patient matrix HIV-Bm ([Nickle et al., 2007]). |
 | HIVw | viral | HIV within-patient matrix HIV-Wm ([Nickle et al., 2007]). |
 | JTT | nuclear | General matrix ([Jones et al., 1992]). |
@@ -215,12 +216,16 @@ IQ-TREE also supports a series of protein mixture models:
 | LG4M | Four-matrix model fused with [Gamma rate heterogeneity](#rate-heterogeneity-across-sites) ([Le et al., 2012]).
 | LG4X | Four-matrix model fused with [FreeRate heterogeneity](#rate-heterogeneity-across-sites) ([Le et al., 2012]).
 | CF4 | Five-profile mixture model ([Wang et al., 2008]).
+| S10pfamC60, S28pfamC59, S28pfamC60 | General GTRspmix models estimated from the Pfam version 31 database ([El-Gebali et al., 2018]). Profiles from the C60 set are grouped into 10 or 28 clusters using Site Posterior Probability Co-occurrence (SPPC), with each profile cluster linked to a unique exchangeability matrix. `S28pfamC59` and `S28pfamC60` are variants optimized by adjusting a low-weight profile.
+| S10pfamC10, S20pfamC20, S30pfamC30 | Computationally efficient general GTRspmix models estimated from the Pfam version 31 database ([El-Gebali et al., 2018]). Each profile in the C10, C20, or C30 sets is directly linked to its own unique exchangeability matrix; these models are designed to minimize computational cost for massive datasets.
+
 
 One can even combine a protein matrix with a profile mixture model like:
 
 * `LG+C20`: Applying `LG` matrix instead of `Poisson` for all 20 classes of AA profiles and a Gamma rate heterogeneity.
 * `LG+C20+F`: Applying `LG` matrix for 20 classes plus the 21th class of empirical AA profile (counted from the current data) and Gamma rate heterogeneity.
 * `JTT+CF4+G`: Applying `JTT` matrix for all 5 classes of AA profiles and Gamma rate heteorogeneity.
+* `S28pfamC60+G`: Applying the `S28pfamC60` mixture model with Gamma rate heterogeneity.
 
 Moreover, one can override the Gamma rate by FreeRate heterogeneity:
 
@@ -435,7 +440,7 @@ Users can fix the parameters of the model. For example, `+I{0.2}` will fix the p
 [Abascal et al., 2007]: https://doi.org/10.1093/molbev/msl136
 [Adachi and Hasegawa, 1996]: https://doi.org/10.1007/BF02498640
 [Adachi et al., 2000]: https://doi.org/10.1007/s002399910038
-[Banos et al., 2024]: https://doi.org/10.1101/2024.03.29.587376
+[Banos et al., 2024]: https://doi.org/10.1093/molbev/msae174
 [Bielawski and Gold, 2002]: https://doi.org/10.1093/genetics/161.4.1589
 [Dang et al., 2010]: https://doi.org/10.1186/1471-2148-10-99
 [Dang et al., 2022]: https://doi.org/10.1093/sysbio/syac007
@@ -495,4 +500,4 @@ Users can fix the parameters of the model. For example, `+I{0.2}` will fix the p
 [ej91016/MorphoParse]: https://github.com/ej91016/MorphoParse
 [davidcerny/GEOS26100-Fall2022]: https://github.com/davidcerny/GEOS26100-Fall2022
 [Černý & Simonoff (2023)]: https://doi.org/10.1038/s41598-023-35784-3
-
+