vaxrank

Vaxrank is the vaccine peptide ranking component of the OpenVax pipeline for designing personalized cancer vaccines. Given a patient's somatic mutations, tumor RNA sequencing data, and HLA type, Vaxrank selects and ranks the mutant peptides most likely to elicit a T-cell response, producing a report suitable for guiding vaccine manufacture.

Overview

Personalized cancer vaccines (also called neoantigen vaccines) work by training the immune system to recognise peptides that arise from somatic mutations unique to a patient's tumor. Designing such a vaccine requires a computational pipeline that bridges raw sequencing data and the peptide synthesiser:

Variant calling — Whole-exome or whole-genome sequencing of the tumor and matched normal identifies somatic mutations. This is typically done with tools such as MuTect or Strelka, upstream of Vaxrank.
Mutant transcript assembly — Tumor RNA-seq reads overlapping each mutation are assembled by Isovar to determine the true mutant protein sequence. This step phases nearby germline variants and captures any mutation-associated splicing differences, producing a more accurate reading frame than DNA-only prediction.
MHC binding prediction — Candidate epitopes (short peptide subsequences spanning the mutation) are scored for predicted binding to the patient's HLA class I molecules using mhctools, which wraps predictors such as MHCflurry, NetMHCpan, and BigMHC.
Vaccine peptide selection — Vaxrank assembles synthetic long peptides (SLPs, typically 25-mers) around the mutation, scores them by the number and strength of their predicted MHC-binding epitopes, filters out peptides that appear in the reference proteome, annotates known cancer hotspot mutations, and ranks candidates by predicted immunogenicity, using manufacturability to break ties.

Vaxrank outputs ranked reports in ASCII, HTML, PDF, and XLSX formats. Each report lists the top vaccine peptide candidates per variant, their predicted epitopes, and supporting evidence from the RNA data.

Clinical Use

Vaxrank is the ranking engine behind the OpenVax neoantigen vaccine pipeline, which has been used in several clinical trials of personalized cancer vaccines at Mount Sinai:

PGV001 (NCT02721043) — A phase I study of personalised neoantigen vaccines in patients with solid and haematologic malignancies. All 11 treated patients developed neoantigen-specific T-cell responses (Bortman et al., Cancer Discovery 2025).
PGV001 + atezolizumab in urothelial cancer (NCT03359239) — A phase I trial combining PGV001 with checkpoint inhibition. The combination was safe and induced neoantigen-specific CD4+ and CD8+ T-cell responses in all evaluated patients (Galsky et al., Nature Cancer 2025).
PGV001 + TTFields in newly diagnosed glioblastoma (NCT03223103) — A phase I trial combining PGV001 with tumor treating fields and standard-of-care temozolomide (paper in preparation).

The computational pipeline used in these trials is described in Kodysh & Rubinsteyn, Methods Mol. Biol. 2020.

Quick Start

vaxrank \
    --vcf tests/data/b16.f10/b16.vcf \
    --bam tests/data/b16.f10/b16.combined.bam \
    --vaccine-peptide-length 25 \
    --mhc-predictor netmhc \
    --mhc-alleles H2-Kb,H2-Db \
    --padding-around-mutation 5 \
    --output-ascii-report vaccine-peptides.txt \
    --output-pdf-report vaccine-peptides.pdf \
    --output-html-report vaccine-peptides.html

Inputs:

--vcf — Somatic variants (VCF from any variant caller)
--bam — Tumor RNA-seq alignments (used by Isovar to assemble mutant transcripts)
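--mhc-alleles — Patient MHC alleles, comma-separated (e.g. HLA-A*02:01,HLA-B*07:02; the example above uses the mouse alleles H2-Kb and H2-Db to match the B16 test data)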
--mhc-predictor — Which MHC binding predictor to use (see table below)

Installation

pip install vaxrank

Requirements: Python 3.9+

Vaxrank uses PyEnsembl for reference genome annotation. Install an Ensembl release matching your reference genome:

# GRCh38
pyensembl install --release 113 --species human
# GRCh37 (legacy)
pyensembl install --release 75 --species human

PDF report generation uses wkhtmltopdf by default. On macOS, install it with Homebrew:

brew install --cask wkhtmltopdf

Alternatively, pass --pdf-backend=weasyprint to use WeasyPrint (experimental), which has no external binary dependency:

pip install weasyprint
# macOS also needs: brew install pango

On Apple Silicon, WeasyPrint loads Pango via dyld, which doesn't search Homebrew's /opt/homebrew/lib by default. Add this to your shell profile:

export DYLD_FALLBACK_LIBRARY_PATH="/opt/homebrew/lib:$DYLD_FALLBACK_LIBRARY_PATH"

(Intel macOS doesn't need this — Homebrew's /usr/local/lib is in dyld's default fallback path.)

Configuration

YAML config file

Common parameters can be stored in a YAML file to avoid repeating them on every run:

vaxrank --config my_config.yaml --vcf variants.vcf --bam tumor.bam

Example my_config.yaml:

epitopes:
  min_score: 0.00001                        # drop epitopes below this score
  scoring_mode: affinity                    # "affinity" or "percentile_rank"
  logistic_midpoint: 350.0                  # IC50 (nM) at which score = 0.5
  logistic_width: 150.0                     # steepness of logistic curve
  affinity_cutoff: 5000.0                   # IC50 >= this → score 0
  percentile_rank_cutoff: 10.0              # rank >= this → score 0 (percentile mode)
  top_epitopes_per_candidate: 1000          # 0 = keep all

vaccine_peptides:
  preferred_length: 25                      # target amino acids per vaccine peptide
  min_length: 25                            # minimum vaccine peptide length
  max_length: 25                            # maximum vaccine peptide length
  padding_around_mutation: 5                # off-centre windows to consider
  per_mutation: 1                           # peptides to keep per variant
  max_epitopes_per_candidate: 1000          # 0 = keep all
  score_fraction_of_best: 0.99              # drop candidates scoring < 99% of best
  manufacturability:                        # GRAVY = mean hydropathy
    max_c_terminal_hydropathy: 1.5          # max GRAVY of C-terminal 7-mer
    min_kmer_hydropathy: 0.0                # min max-7mer GRAVY (floor)
    max_kmer_hydropathy_low_priority: 1.5   # low-priority max-7mer GRAVY cap
    max_kmer_hydropathy_high_priority: 2.5  # high-priority max-7mer GRAVY cap

Custom filtering and scoring with the topiary DSL

For anything beyond the scalar logistic / percentile-rank defaults, set epitopes.filter_expr and/or epitopes.score_expr to a topiary DSL string. Both accept the full topiary 5.0 expression grammar (kind accessors like affinity / presentation, arithmetic, & / |, .logistic(...) / .clip(...) transforms, column(col_name) for raw DataFrame columns, etc.).

epitopes:
  # Drop rows wholesale before scoring
  filter_expr: "affinity <= 500 & affinity.rank <= 2.0"
  # Compute a per-(peptide, allele) score in [0, 1] (binder-quality score)
  score_expr:  "affinity.logistic_normalized(350, 150)"

When filter_expr is omitted, no rows are dropped up front. When score_expr is omitted, a default expression is synthesized from the scalar fields above (binding_affinity_cutoff, logistic_midpoint, logistic_width, etc.) and masked so that ic50 >= affinity_cutoff scores 0, reproducing the pre-5.0 behavior byte-for-byte.

Use affinity.logistic_normalized(m, w) for a [0, 1] binder-quality score (the topiary 5.1+ primitive); the plain affinity.logistic(m, w) is the raw sigmoid and caps below 1 (≈0.912 at default m=350, w=150).
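To make the difference between the two primitives concrete, here is a minimal plain-Python sketch. It assumes that logistic_normalized divides the raw sigmoid by its value at IC50 = 0 (so a 0 nM binder scores exactly 1) and that IC50 >= affinity_cutoff is masked to 0; both are assumptions about the internals, not topiary's code. The function names and prediction rows are hypothetical, and the filter step mirrors the example filter_expr on plain dicts rather than a DataFrame.

```python
import math


def raw_logistic(ic50: float, midpoint: float = 350.0, width: float = 150.0) -> float:
    """Decreasing sigmoid of IC50 (nM): 0.5 at the midpoint, below 1 even at IC50 = 0."""
    return 1.0 / (1.0 + math.exp((ic50 - midpoint) / width))


def normalized_logistic(ic50: float, midpoint: float = 350.0, width: float = 150.0,
                        affinity_cutoff: float = 5000.0) -> float:
    """Assumed normalization: divide by the raw value at IC50 = 0 so the best
    possible binder scores 1.0; IC50 >= affinity_cutoff is masked to 0."""
    if ic50 >= affinity_cutoff:
        return 0.0
    return raw_logistic(ic50, midpoint, width) / raw_logistic(0.0, midpoint, width)


# Hypothetical (peptide, allele) prediction rows; the numbers are made up.
rows = [
    {"peptide": "SIINFEKL", "allele": "H2-Kb", "affinity": 20.0, "rank": 0.1},
    {"peptide": "AAAAAAAA", "allele": "H2-Kb", "affinity": 450.0, "rank": 3.5},
    {"peptide": "KVVRFDKL", "allele": "H2-Kb", "affinity": 800.0, "rank": 1.0},
]

# Plain-Python equivalent of filter_expr "affinity <= 500 & affinity.rank <= 2.0"
kept = [r for r in rows if r["affinity"] <= 500 and r["rank"] <= 2.0]
scores = {(r["peptide"], r["allele"]): normalized_logistic(r["affinity"]) for r in kept}

print(round(raw_logistic(0.0), 3))         # 0.912
print(round(normalized_logistic(0.0), 3))  # 1.0
print(list(scores))                        # [('SIINFEKL', 'H2-Kb')]
```

Under this assumed normalization, the score at the midpoint (350 nM) comes out at about 0.55 rather than exactly 0.5, because every raw value is divided by roughly 0.912.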

Invalid DSL strings are rejected at config load (not mid-pipeline), so typos in the YAML surface before any predictions run.

CLI overrides

CLI arguments override YAML values. You can also use --config-value to override individual keys without editing the file:

vaxrank --config my_config.yaml \
  --config-value vaccine_peptides.score_fraction_of_best=0.95 \
  --config-value epitopes.percentile_rank_cutoff=5.0

Use --config-text when the right-hand side should be kept as a raw string instead of being YAML-parsed.

Resolution order

Config values are resolved in order (later wins):

Compiled-in defaults (see vaxrank/config/defaults.py)
YAML config file (--config)
--config-value / --config-text overrides
Dedicated CLI flags (e.g. --vaccine-peptide-length)
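The layering can be pictured as a series of nested-dict merges, where each later layer overwrites the earlier ones key by key. The sketch below is a hypothetical illustration, not vaxrank's implementation: the helper names are invented, values are assumed to be already typed (in practice a --config-value string would be YAML-parsed first, while --config-text would keep it as a raw string), and mapping --vaccine-peptide-length onto vaccine_peptides.preferred_length is an assumption.

```python
import copy
from typing import Any, Dict


def set_dotted(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config["a"]["b"] = value for dotted_key "a.b", creating sections as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied recursively (override wins)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve(defaults, yaml_config, config_values, cli_flags):
    """Apply the four layers in order; later layers win."""
    resolved = deep_merge(defaults, yaml_config)          # 1-2: defaults, then YAML
    for dotted_key, value in config_values.items():       # 3: --config-value overrides
        set_dotted(resolved, dotted_key, value)
    for dotted_key, value in cli_flags.items():           # 4: dedicated CLI flags
        set_dotted(resolved, dotted_key, value)
    return resolved


defaults = {"vaccine_peptides": {"preferred_length": 25, "score_fraction_of_best": 0.99}}
yaml_config = {"vaccine_peptides": {"score_fraction_of_best": 0.95}}
config_values = {"vaccine_peptides.score_fraction_of_best": 0.9}
cli_flags = {"vaccine_peptides.preferred_length": 27}    # e.g. --vaccine-peptide-length 27

print(resolve(defaults, yaml_config, config_values, cli_flags))
# {'vaccine_peptides': {'preferred_length': 27, 'score_fraction_of_best': 0.9}}
```

Copying before merging keeps the defaults untouched, so the same defaults can be resolved against different YAML files in one process.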

Config reference

`EpitopeConfig` — epitope scoring and filtering

Field	Default	Description
`logistic_epitope_score_midpoint`	350.0	IC50 (nM) at which epitope score = 0.5
`logistic_epitope_score_width`	150.0	Steepness of logistic scoring curve
`min_epitope_score`	0.00001	Epitopes scoring below this are dropped
`binding_affinity_cutoff`	5000.0	IC50 >= this → score 0
`scoring_mode`	`"affinity"`	`"affinity"` (IC50-based) or `"percentile_rank"`
`percentile_rank_cutoff`	10.0	Rank >= this → score 0 (percentile mode)
`filter_expr`	`None`	Topiary DSL string; drops rows where the expression is false. Parsed eagerly at config load.
`score_expr`	`None`	Topiary DSL string; overrides the default per-`(peptide, allele)` score.

`VaccineConfig` — peptide assembly and manufacturability

Field	Default	Description
`preferred_peptide_length`	25	Preferred amino acids per vaccine peptide
`min_peptide_length`	25	Minimum vaccine peptide length
`max_peptide_length`	25	Maximum vaccine peptide length
`padding_around_mutation`	5	Off-centre window positions to consider
`max_vaccine_peptides_per_variant`	1	Peptides to keep per variant
`num_mutant_epitopes_to_keep`	1000	Max epitope predictions per peptide (0 = all)
`score_fraction_of_best`	0.99	Drop candidates scoring below this fraction of the best
`max_c_terminal_hydropathy`	1.5	Max GRAVY score of the C-terminal 7-mer
`min_kmer_hydropathy`	0.0	Minimum max-7mer GRAVY (floor)
`max_kmer_hydropathy_low_priority`	1.5	Low-priority max-7mer GRAVY cap
`max_kmer_hydropathy_high_priority`	2.5	High-priority max-7mer GRAVY cap

The four *_hydropathy* fields control the manufacturability tie-breaking in vaccine peptide ranking. See VaccinePeptide.peptide_synthesis_difficulty_score_tuple for details on how each threshold is applied.
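As a rough sketch of how GRAVY-based thresholds can act as a lexicographic tie-breaker, the code below computes the C-terminal 7-mer GRAVY and the maximum 7-mer GRAVY with the Kyte-Doolittle scale, then turns the four thresholds into a tuple of penalties where smaller is better. This is a hypothetical simplification: vaxrank's actual tuple may include other criteria and a different ordering, so consult the method named above for the real logic.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """GRAVY: mean Kyte-Doolittle hydropathy of the residues in seq."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def max_7mer_gravy(seq: str) -> float:
    """Highest GRAVY over all 7-residue windows (assumes len(seq) >= 7)."""
    return max(gravy(seq[i:i + 7]) for i in range(len(seq) - 6))


def hydropathy_penalties(seq: str,
                         max_c_terminal_hydropathy: float = 1.5,
                         min_kmer_hydropathy: float = 0.0,
                         max_kmer_hydropathy_low_priority: float = 1.5,
                         max_kmer_hydropathy_high_priority: float = 2.5) -> tuple:
    """Hypothetical simplified sort key (smaller is better, compared left to right)."""
    cterm = gravy(seq[-7:])        # C-terminal 7-mer
    kmax = max_7mer_gravy(seq)     # most hydrophobic 7-mer anywhere
    return (
        max(0.0, cterm - max_c_terminal_hydropathy),
        max(0.0, kmax - max_kmer_hydropathy_high_priority),
        max(0.0, kmax - max_kmer_hydropathy_low_priority),
        max(0.0, min_kmer_hydropathy - kmax),
    )


# Toy 25-mers assumed to tie on epitope score
candidates = {
    "c_terminal_hydrophobic": "G" * 18 + "I" * 7,
    "n_terminal_hydrophobic": "I" * 7 + "G" * 18,
    "polar": "S" * 25,
}
ranked = sorted(candidates, key=lambda name: hydropathy_penalties(candidates[name]))
print(ranked)  # ['polar', 'n_terminal_hydrophobic', 'c_terminal_hydrophobic']
```

Using max(0, value - threshold) means every candidate under a threshold ties at 0 for that criterion, so the next criterion in the tuple decides between them.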

MHC Binding Predictors

Vaxrank integrates with MHC binding predictors via mhctools. Use --mhc-predictor <name> to select one:

`--mhc-predictor`	Tool	MHC Class	Notes
`mhcflurry`	MHCflurry	I	Open-source neural network; installed with mhctools
`bigmhc`	BigMHC	I	Auto-detects EL or IM model
`bigmhc-el`	BigMHC EL	I	Presentation (eluted ligand) model
`bigmhc-im`	BigMHC IM	I	Immunogenicity model
`pepsickle`	Pepsickle	I	Proteasomal cleavage predictor
`netmhc`	NetMHC	I	Auto-detects NetMHC3 or NetMHC4
`netmhc3`	NetMHC 3.x	I	Requires local install
`netmhc4`	NetMHC 4.0	I	Requires local install
`netmhcpan`	NetMHCpan	I	Auto-detects installed version
`netmhcpan28`	NetMHCpan 2.8	I	Requires local install
`netmhcpan3`	NetMHCpan 3.x	I	Requires local install
`netmhcpan4`	NetMHCpan 4.0	I	Default mode (EL + BA)
`netmhcpan4-ba`	NetMHCpan 4.0	I	Binding affinity mode only
`netmhcpan4-el`	NetMHCpan 4.0	I	Eluted ligand mode only
`netmhcpan41`	NetMHCpan 4.1	I	Default mode (EL + BA)
`netmhcpan41-ba`	NetMHCpan 4.1	I	Binding affinity mode only
`netmhcpan41-el`	NetMHCpan 4.1	I	Eluted ligand mode only
`netmhcpan42`	NetMHCpan 4.2	I	Default mode (EL + BA)
`netmhcpan42-ba`	NetMHCpan 4.2	I	Binding affinity mode only
`netmhcpan42-el`	NetMHCpan 4.2	I	Eluted ligand mode only
`netmhccons`	NetMHCcons	I	Requires local install
`netmhcstabpan`	NetMHCstabpan	I	Stability predictor; requires local install
`netchop`	NetChop	--	Proteasomal cleavage predictor
`netmhciipan`	NetMHCIIpan	II	Auto-detects installed version
`netmhciipan3`	NetMHCIIpan 3.x	II	Requires local install
`netmhciipan4`	NetMHCIIpan 4.0	II	Default mode (EL + BA)
`netmhciipan4-ba`	NetMHCIIpan 4.0	II	Binding affinity mode only
`netmhciipan4-el`	NetMHCIIpan 4.0	II	Eluted ligand mode only
`netmhciipan43`	NetMHCIIpan 4.3	II	Default mode (EL + BA)
`netmhciipan43-ba`	NetMHCIIpan 4.3	II	Binding affinity mode only
`netmhciipan43-el`	NetMHCIIpan 4.3	II	Eluted ligand mode only
`mixmhcpred`	MixMHCpred	I	Requires local install
`netmhcpan-iedb`	NetMHCpan via IEDB	I	Uses IEDB web API
`netmhccons-iedb`	NetMHCcons via IEDB	I	Uses IEDB web API
`netmhciipan-iedb`	NetMHCIIpan via IEDB	II	Uses IEDB web API
`smm-iedb`	SMM via IEDB	I	Uses IEDB web API
`smm-pmbec-iedb`	SMM-PMBEC via IEDB	I	Uses IEDB web API
`random`	Random	--	Returns random scores; for testing only

How It Works

Upstream inputs

Vaxrank does not perform variant calling or read alignment itself. Those steps happen upstream, typically as part of a larger bioinformatics pipeline (e.g. neoantigen-vaccine-pipeline):

Tumor and matched-normal DNA are sequenced and aligned; a variant caller (MuTect, Strelka, etc.) produces a VCF of somatic mutations.
Tumor RNA is sequenced and aligned to produce a BAM file.
The patient's HLA class I alleles are typed (from sequencing data or clinical records).

Vaxrank takes these three inputs — the VCF, the tumor RNA BAM, and the HLA alleles — and produces a ranked list of vaccine peptide candidates.

Mutant transcript assembly (Isovar)

For each somatic variant, Isovar extracts RNA-seq reads overlapping the mutant locus and assembles them into a mutant protein fragment. This is more accurate than simply applying the DNA variant to the reference transcript because it:

Phases adjacent germline and somatic variants that fall on the same read, producing the true amino acid sequence
Captures splicing differences such as intron retention events that may alter the reading frame near the mutation
Confirms expression — variants with no supporting RNA reads are filtered out
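The toy example below shows why phasing matters. It is not Isovar's algorithm; it only translates a made-up transcript three ways: as the reference, with the somatic SNV applied alone (the DNA-only view), and as a hypothetical RNA read that carries the somatic SNV together with a germline SNV in the same codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cdna: str) -> str:
    """Translate an in-frame coding sequence codon by codon."""
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(0, len(cdna) - 2, 3))


def apply_snvs(seq: str, snvs: dict) -> str:
    """Apply {0-based position: alt base} substitutions to seq."""
    bases = list(seq)
    for pos, alt in snvs.items():
        bases[pos] = alt
    return "".join(bases)


reference = "ATGGCTAAAGGT"   # toy transcript: M A K G
somatic = {6: "G"}           # AAA -> GAA (K -> E) when applied alone
germline = {7: "T"}          # germline SNV in the same codon

dna_only = translate(apply_snvs(reference, somatic))
# A tumor RNA read carrying both alleles on the same haplotype
phased = translate(apply_snvs(reference, {**somatic, **germline}))

print(translate(reference), dna_only, phased)  # MAKG MAEG MAVG
```

The DNA-only prediction yields a K-to-E mutant, but the read-backed sequence encodes V at that position, so any epitope spanning it would differ.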

Epitope scoring

Each mutant protein fragment is sliced into overlapping subsequences of epitope length (typically 8–15 amino acids). These candidate epitopes are scored for predicted MHC binding affinity using the selected predictor. Binding predictions are converted to a score between 0 and 1 via a logistic function parameterised by the EpitopeConfig settings.
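A minimal sketch of the slicing step follows, assuming only windows that overlap the mutated residues are of interest (windows entirely outside them would be reference sequence for a point mutation; for a frameshift the mutated span can be set to cover the whole novel tail). The function name, the default 8 to 11 residue lengths, and the fragment are hypothetical; scoring each window with an MHC predictor is left out.

```python
from typing import Iterator, Tuple


def mutant_epitopes(protein: str, mut_start: int, mut_end: int,
                    lengths=range(8, 12)) -> Iterator[Tuple[int, str]]:
    """Yield (offset, peptide) for every window of each length that overlaps
    the mutated residues protein[mut_start:mut_end]."""
    for k in lengths:
        # A window [start, start + k) overlaps the mutation iff
        # start > mut_start - k and start < mut_end; it must also fit in the protein.
        for start in range(max(0, mut_start - k + 1), min(mut_end, len(protein) - k + 1)):
            yield start, protein[start:start + k]


# Hypothetical 41-residue mutant fragment with a point mutation (W) at index 20
fragment = "ACDEFGHIKL" * 2 + "W" + "ACDEFGHIKL" * 2
epitopes = list(mutant_epitopes(fragment, 20, 21))
print(len(epitopes))  # 38 windows: 8 + 9 + 10 + 11
```

For a point mutation far from either end, each epitope length k contributes exactly k windows, which is why the count is 8 + 9 + 10 + 11.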

Vaccine peptide ranking

Candidate vaccine peptides (longer SLPs, typically 25-mers) are constructed around each mutation. Each candidate is scored by the combined immunogenicity of the epitopes it contains. Candidates are then filtered and ranked by:

Epitope content — total predicted immunogenicity score
Reference proteome filtering — peptides matching the human reference proteome are removed to ensure only truly novel sequences are selected
Cancer hotspot annotation — variants at known recurrently mutated positions (bundled data from cancerhotspots.org, ~2,700 mutations across cancer types) are flagged
Manufacturability — tie-breaking by hydropathy-based synthesis difficulty (C-terminal and 7-mer window GRAVY scores)
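The epitope-content step can be sketched as follows, under stated assumptions: every 25-residue window containing the whole mutation is a candidate, its score is the sum of the scores of epitopes lying entirely inside it, and candidates below score_fraction_of_best times the best score are dropped. The helper names, the fragment, and the epitope scores are hypothetical; reference proteome filtering, hotspot annotation, and manufacturability are not modeled.

```python
from typing import Dict, List, Tuple


def candidate_windows(protein_len: int, mut_start: int, mut_end: int, length: int = 25) -> range:
    """Start offsets of windows of `length` residues that contain every mutated residue."""
    return range(max(0, mut_end - length), min(mut_start, protein_len - length) + 1)


def rank_vaccine_peptides(protein: str, mut_start: int, mut_end: int,
                          epitope_scores: Dict[Tuple[int, str], float],
                          length: int = 25,
                          score_fraction_of_best: float = 0.99) -> List[Tuple[float, int, str]]:
    """Score each window by the summed scores of epitopes lying entirely inside it,
    keep windows scoring at least score_fraction_of_best * best, best first."""
    scored = []
    for start in candidate_windows(len(protein), mut_start, mut_end, length):
        total = sum(score for (offset, peptide), score in epitope_scores.items()
                    if start <= offset and offset + len(peptide) <= start + length)
        scored.append((total, start, protein[start:start + length]))
    if not scored:
        return []
    best = max(total for total, _, _ in scored)
    kept = [item for item in scored if item[0] >= score_fraction_of_best * best]
    return sorted(kept, key=lambda item: (-item[0], item[1]))


fragment = "ACDEFGHIKL" * 2 + "W" + "ACDEFGHIKL" * 2   # hypothetical, mutation at index 20
epitope_scores = {                                    # made-up per-epitope scores
    (13, fragment[13:22]): 0.9,
    (18, fragment[18:27]): 0.6,
}
ranked = rank_vaccine_peptides(fragment, 20, 21, epitope_scores)
print(len(ranked), ranked[0][1])  # 12 2
```

Here twelve windows (starts 2 to 13) tie at the best score because each contains both epitopes; ties like these are exactly where manufacturability tie-breaking would decide the final order.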

Key modules

core_logic.py: Main vaccine peptide selection algorithm
epitope_logic.py: Epitope scoring and filtering
reference_proteome.py: Set-based kmer index for reference proteome filtering (O(1) lookup, built once and cached)
cancer_hotspots.py: Cancer mutation hotspot annotation
vaccine_peptide.py: Vaccine peptide scoring and manufacturability
report.py: Report generation (ASCII, HTML, PDF, XLSX)
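The idea behind a set-based k-mer index fits in a few lines: collect every substring of the relevant lengths from the reference proteins into a hash set, after which each membership check is an average O(1) lookup. The sketch below is hypothetical. The k range (8 to 11), the two stand-in protein sequences, and the in-memory lru_cache are assumptions; vaxrank builds its index from the full reference proteome and, per the Development notes, reuses a cached copy across runs.

```python
from functools import lru_cache
from typing import FrozenSet, Iterable


def build_kmer_index(proteins: Iterable[str], min_k: int = 8, max_k: int = 11) -> FrozenSet[str]:
    """Collect every substring of length min_k..max_k from the reference proteins."""
    kmers = set()
    for seq in proteins:
        for k in range(min_k, max_k + 1):
            for i in range(len(seq) - k + 1):
                kmers.add(seq[i:i + k])
    return frozenset(kmers)


# Hypothetical stand-in for the reference proteome
REFERENCE_PROTEINS = (
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "MSIINFEKLTEWTSSNVMEERKIKV",
)


@lru_cache(maxsize=None)
def reference_kmers() -> FrozenSet[str]:
    """Build the index once per process; later calls reuse the cached frozenset."""
    return build_kmer_index(REFERENCE_PROTEINS)


def occurs_in_reference(peptide: str) -> bool:
    """Average O(1) membership test against the hashed k-mer set."""
    return peptide in reference_kmers()


print(occurs_in_reference("SIINFEKL"), occurs_in_reference("SIINFEKW"))  # True False
```

The trade-off is memory for speed: the set holds every k-mer of every indexed length, but lookups no longer scan the proteome, which is why building it once and caching it pays off.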

Papers & Citations

Vaxrank algorithm:

Rubinsteyn, A., Hodes, I., Kodysh, J. & Hammerbacher, J. Vaxrank: A Computational Tool For Designing Personalized Cancer Vaccines. bioRxiv (2017).

OpenVax pipeline (methods):

Kodysh, J. & Rubinsteyn, A. OpenVax: An Open-Source Computational Pipeline for Cancer Neoantigen Prediction. Methods Mol. Biol. 2120, 147–160 (2020).

PGV001 clinical results:

Bortman et al. PGV001, a Multi-Peptide Personalized Neoantigen Vaccine Platform: Phase I Study in Patients with Solid and Hematologic Malignancies in the Adjuvant Setting. Cancer Discovery 15(5), 930–945 (2025).

Galsky et al. Atezolizumab plus personalized neoantigen vaccination in urothelial cancer: a phase 1 trial. Nature Cancer (2025).

BibTeX for the Vaxrank paper:

@article {Rubinsteyn142919,
    author = {Rubinsteyn, Alex and Hodes, Isaac and Kodysh, Julia and Hammerbacher, Jeffrey},
    title = {Vaxrank: A Computational Tool For Designing Personalized Cancer Vaccines},
    year = {2017},
    doi = {10.1101/142919},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2017/05/27/142919},
    journal = {bioRxiv}
}

Dependencies

Vaxrank is built on the OpenVax ecosystem:

pyensembl: Reference genome annotation
varcode: Variant effect prediction from DNA
isovar: RNA-based mutant transcript assembly and variant phasing
mhctools: Unified interface to MHC binding predictors

Other key dependencies:

msgspec: Configuration serialization (YAML/JSON)
pandas, numpy: Data processing
jinja2, pdfkit/weasyprint: Report generation

Development

To install Vaxrank for local development:

git clone git@github.com:openvax/vaxrank.git
cd vaxrank
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .
# Examples; adjust release to match your reference
pyensembl install --release 113 --species human
pyensembl install --release 113 --species mouse

Run linting and tests:

./lint.sh && ./test.sh

The first run of the tests may take a while to build the reference proteome kmer index, but subsequent runs will use the cached index.

Scripts

develop.sh: installs the package in editable mode and sets PYTHONPATH to the repo root.
lint.sh: runs ruff on vaxrank and tests.
test.sh: runs pytest with coverage.
deploy.sh: runs lint/tests, builds a distribution with build, uploads via twine, and tags the release (vX.Y.Z). Deploy is restricted to the main/master branch.
