GTRspmix is a protein mixture model with multiple exchangeability matrices (GTRs) and profiles.
Profiles are clustered into several groups, and each group has its own GTR matrix.
Details are described in our paper. Please cite it if you find this useful.
Ryo Harada et al. (2026). GTRspmix: Capturing Heterogeneity of Exchangeability Across Sites to Improve Protein Phylogenetics. (In preparation).
Note: We plan to update the scripts to use a true EM algorithm instead of the current approximation.
This script iteratively optimizes GTRspmix model parameters by repeating specific steps (EM approximation). Use `--opt-gtr` and/or `--opt-profile-FO`/`--opt-profile-F` to enable the corresponding optimization phases.
Each iteration consists of two main phases. Each phase is only executed if its corresponding flag is enabled.
| Phase | Step | Optimized params | Fixed params | Required flags |
|---|---|---|---|---|
| GTR Optimization | 1 | $\theta$ | GTR, Profile | `--opt-gtr` |
| | 2 | GTR | | `--opt-gtr` |
| Profile Optimization | 3 | $\theta$ | GTR, Profile | `--opt-profile-FO` or `--opt-profile-F` |
| | 4 | Profile | | `--opt-profile-FO` or `--opt-profile-F` |

- $\theta$ is a parameter set including branch lengths, weights, and rates.
- If only `--opt-gtr` is set: Steps 3 and 4 are skipped. The script optimizes only GTR matrices and global parameters.
- If only `--opt-profile-FO` or `--opt-profile-F` is set: Steps 1 and 2 are skipped. The script optimizes only frequency profiles and global parameters.
- If `--opt-gtr` and either `--opt-profile-FO` or `--opt-profile-F` are set: All steps (1–4) are executed in each iteration.

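The mapping from flags to executed steps can be sketched in Python. This is only an illustration of the control flow: `planned_steps` is a hypothetical helper, not a function from the GTRspmix script, and it assumes that `--opt-profile-FO` and `--opt-profile-F` enable the same two steps.

```python
from __future__ import annotations


def planned_steps(opt_gtr: bool, opt_profile: bool) -> list[int]:
    """Return the step numbers run in one iteration (hypothetical helper).

    opt_profile stands for either --opt-profile-FO or --opt-profile-F.
    """
    steps: list[int] = []
    if opt_gtr:
        steps += [1, 2]  # GTR optimization phase
    if opt_profile:
        steps += [3, 4]  # profile optimization phase
    return steps


if __name__ == "__main__":
    print(planned_steps(opt_gtr=True, opt_profile=False))  # [1, 2]
    print(planned_steps(opt_gtr=True, opt_profile=True))   # [1, 2, 3, 4]
```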
Profile optimization is experimental and may be unstable.
To optimize the GTR and profile parameters, soft partitioning is used to approximate the EM algorithm.
A sub-alignment is generated for each profile cluster or profile class.
Each site is allocated round(
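
The allocation formula is incomplete in the text above, so the following Python sketch is only a guess at the general idea of soft partitioning, not the script's actual rule. It assumes that each site is copied into the sub-alignment of class `k` a number of times equal to `round(scale * w)`, where `w` is that site's weight for class `k`. The names `replicate_counts`, `build_sub_alignments`, `posteriors`, and `scale` are all hypothetical, and any link between `scale` and the `--scale-gtr` or `--scale-profile` flags is an assumption.

```python
from __future__ import annotations


def replicate_counts(posteriors: list[list[float]], scale: int) -> list[list[int]]:
    """Return counts[i][k] = round(scale * posteriors[i][k]).

    ASSUMPTION: this rounding rule is a guess, not taken from the script.
    Note that Python's round() sends exact .5 ties to the nearest even integer.
    """
    return [[round(scale * p) for p in site] for site in posteriors]


def build_sub_alignments(
    columns: list[str], posteriors: list[list[float]], scale: int
) -> list[list[str]]:
    """Build one sub-alignment (a list of columns) per class.

    columns[i] is alignment column i as a string, one residue per taxon.
    A column appears in sub-alignment k as many times as its count says.
    """
    counts = replicate_counts(posteriors, scale)
    n_classes = len(posteriors[0])
    subs: list[list[str]] = [[] for _ in range(n_classes)]
    for column, site_counts in zip(columns, counts):
        for k, n in enumerate(site_counts):
            subs[k].extend([column] * n)
    return subs


if __name__ == "__main__":
    cols = ["AA", "LV"]                       # two sites, two taxa
    post = [[0.72, 0.28], [0.04, 0.96]]       # made-up class weights per site
    for k, sub in enumerate(build_sub_alignments(cols, post, scale=10)):
        print(k, len(sub))
```

Under this assumption, a larger `scale` makes the replicate counts track the weights more closely, at the cost of longer sub-alignments.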
Please download the source code from the latest release in this repository. Unzip the downloaded archive and navigate to the directory:
The GTRspmix script has been tested with the following specific versions:
Singularity is highly recommended for HPC clusters to ensure the reproducibility of the environment. The Singularity image includes both IQ-TREE and gotree.

    sudo singularity build gtrspmix.sif singularity.def
    singularity run gtrspmix.sif -h

If you prefer not to use Singularity, you must install IQ-TREE and gotree on your system.
Then install the Python requirements with:

    pip install -r requirements.txt

If IQ-TREE and gotree are not available in your system `$PATH`, you must specify their absolute paths using the following flags during execution:

- `--iqtree /path/to/iqtree3`
- `--gotree /path/to/gotree`

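
To check in advance whether these flags are needed, a small Python sketch using the standard library's `shutil.which` can help. The executable names `iqtree3` and `gotree` are taken from the example paths and may differ on your system. `resolve_tool` and `flags_needed` are hypothetical helpers, not part of GTRspmix.

```python
from __future__ import annotations

import shutil


def resolve_tool(name: str, explicit_path: str | None = None) -> str | None:
    """Return explicit_path if given, else the location of name on PATH.

    Returns None when no explicit path is given and name is not on PATH.
    """
    if explicit_path:
        return explicit_path
    return shutil.which(name)


def flags_needed(tools: dict[str, str]) -> list[str]:
    """Map {flag: executable} to the flags whose executable is not on PATH."""
    return [f"--{flag}" for flag, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    # Executable names are assumptions based on the example paths.
    missing = flags_needed({"iqtree": "iqtree3", "gotree": "gotree"})
    print("Flags to pass explicitly:", missing or "none")
```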
In addition to an alignment and a guide tree, GTRspmix requires an initial profile mixture model, supplied either as external nexus files or as a pre-installed empirical model.

- Empirical Models: Standard empirical models (C60 series and SXXCYY series) are pre-installed. You can specify these names (e.g., `--nexus-few C10 --nexus-many C60`, `--nexus C60`, and `--model S10C60`) without providing external files.

Select a mode based on your starting input.
| Mode | Input Requirements | Use Case |
|---|---|---|
| FromScratch (SPPC) | `--nexus-few` & `--nexus-many` | Cluster a large profile set (e.g. MEOW60) into fewer groups (e.g. 10). |
| FromScratch (K-means) | `--nexus` & `-km` | Directly cluster profiles using K-means. |
| ReStart | `--nexus` & `--json` | Resume an interrupted run using a model and clustering JSON from a previous run. |
| PreDefined | `--model` | Fine-tune an empirical GTRspmix model (e.g., S10pfamC60). |

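The table can be read as a lookup from flag combinations to modes. The Python sketch below expresses that lookup. `infer_mode` is a hypothetical helper, the order in which combinations are checked is an assumption, and the actual script may validate its arguments differently. Both `-km` and `--kmeans` are accepted here because both spellings appear in this document.

```python
from __future__ import annotations


def infer_mode(flags: set[str]) -> str:
    """Guess the run mode from the set of flags given (hypothetical helper)."""
    if "--model" in flags:
        return "PreDefined"
    if {"--nexus", "--json"} <= flags:
        return "ReStart"
    if {"--nexus-few", "--nexus-many"} <= flags:
        return "FromScratch (SPPC)"
    if "--nexus" in flags and flags & {"-km", "--kmeans"}:
        return "FromScratch (K-means)"
    raise ValueError("flags do not match any mode in the table")


if __name__ == "__main__":
    print(infer_mode({"--nexus", "--json", "--opt-gtr"}))
```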
You can control which parameters are optimized using the following flags:

- `--opt-gtr`: Optimize GTRs.
- `--opt-profile-FO`: Optimize frequency profiles by ML estimation. (Note: This function is experimental.)
- `--opt-profile-F`: Optimize frequency profiles by weighted observed frequency. (Note: This function is experimental.)
- `--opt-gtr` and either `--opt-profile-FO` or `--opt-profile-F`: Perform full optimization of GTRspmix model parameters.

| File | Description |
|---|---|
| `model_best.nex` | The final optimized model. |
| `GTRspmix_maker.log` | The main execution log. Check this file to monitor the optimization progress. |
| `d_cluster.json` | Cluster mapping information. This JSON file records which Profile classes belong to which GTR cluster. Essential for restarting runs or manually inspecting the model structure. |
| `model_X.nex` | The model file generated at iteration X. |

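When inspecting an output directory, it can be handy to find the most recent per-iteration model file. The Python sketch below assumes that `X` in `model_X.nex` is an integer iteration number. `latest_iteration_model` is a hypothetical helper, and it deliberately ignores `model_best.nex`, which may be the file you actually want.

```python
from __future__ import annotations

import re
from pathlib import Path

# ASSUMPTION: per-iteration files are named model_<integer>.nex
_MODEL_RE = re.compile(r"model_(\d+)\.nex")


def latest_iteration_model(out_dir: str | Path) -> Path | None:
    """Return the model_X.nex file with the largest X, or None if absent."""
    best: tuple[int, Path] | None = None
    for path in Path(out_dir).iterdir():
        match = _MODEL_RE.fullmatch(path.name)
        if match is None:
            continue  # skips model_best.nex and unrelated files
        iteration = int(match.group(1))
        if best is None or iteration > best[0]:
            best = (iteration, path)
    return None if best is None else best[1]


if __name__ == "__main__":
    print(latest_iteration_model("."))
```

Comparing the parsed integers rather than the file names avoids `model_10.nex` sorting before `model_2.nex`.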
Detailed flags can be found via gtrspmix.py --help. Below are typical command examples.

FromScratch (SPPC), GTR optimization:

    gtrspmix.py \
        --opt-gtr \
        -s alignment.fasta \
        -t guide.treefile \
        --nexus-few meow_10.nex \
        --nexus-many meow_60.nex \
        -m-gtr20 ELM \
        -m-rate G4 \
        --scale-gtr 10 \
        -me-theta 0.01 \
        -me-gtr 0.99 \
        -me 10 \
        -nt 8 \
        -mem 100G \
        -o GTRspmix_out

FromScratch (K-means), GTR optimization:

    gtrspmix.py \
        --opt-gtr \
        -s alignment.fasta \
        -te guide.treefile \
        --kmeans 10 \
        --nexus meow_60.nex \
        -m-gtr20 ELM \
        -m-rate G4 \
        --scale-gtr 10 \
        -me-theta 0.01 \
        -me-gtr 0.99 \
        -me 10 \
        -nt 8 \
        -mem 100G \
        -o GTRspmix_out

ReStart: when restarting a run, copy the latest nexus file and the `d_cluster.json` file, then specify a new output directory.

    gtrspmix.py \
        --opt-gtr \
        -s alignment.fasta \
        -te guide.treefile \
        --json d_cluster.json \
        --nexus model_best.nex \
        -m-rate G4 \
        --scale-gtr 10 \
        -me-theta 0.01 \
        -me-gtr 0.99 \
        -me 10 \
        -nt 8 \
        -mem 100G \
        -o GTRspmix_out_restart

PreDefined, GTR optimization:

    gtrspmix.py \
        --opt-gtr \
        -s alignment.fasta \
        -te guide.treefile \
        --model S10pfamC60 \
        -m-rate G4 \
        --scale-gtr 10 \
        -me-theta 0.01 \
        -me-gtr 0.99 \
        -me 10 \
        -nt 8 \
        -mem 100G \
        -o GTRspmix_out

FromScratch (SPPC), GTR and profile optimization:

    gtrspmix.py \
        --opt-gtr \
        --opt-profile-FO \
        -s alignment.fasta \
        -t guide.treefile \
        --nexus-few meow_10.nex \
        --nexus-many meow_60.nex \
        -m-gtr20 ELM \
        -m-rate G4 \
        --scale-gtr 10 \
        --scale-profile 100 \
        -me-theta 0.01 \
        -me-gtr 0.99 \
        -me-pro 0.01 \
        -me 10 \
        -nt 8 \
        -mem 100G \
        -o GTRspmix_out

Profile optimization only:

    gtrspmix.py \
        --opt-profile-FO \
        -s alignment.fasta \
        -te guide.treefile \
        --nexus meow_60.nex \
        -m-gtr20 ELM \
        -m-rate G4 \
        --scale-profile 100 \
        -me-theta 0.01 \
        -me-pro 0.01 \
        -me 10 \
        -nt 8 \
        -mem 100G \
        -o GTRspmix_out

This project is licensed under the GPL-3.0 License. See the LICENSE file for details.