This repository contains the code used to generate the results reported in "Bidirectional genetic and phenotypic links between smoking and striatal iron content involving dopaminergic and inflammatory pathways" (Addiction, 2026). The study investigates phenotypic, genetic, and causal relationships between tobacco smoking and MRI-derived markers of iron content in the striatum using UK Biobank data.
The aim of this project is to examine the relationship between smoking behaviour (smoking initiation, smoking status, pack-years, and years since cessation) and brain iron accumulation in striatal regions implicated in reward processing (putamen, caudate, and nucleus accumbens). To this end, the project integrates:
- Phenotypic association analyses using individual-level UK Biobank data
- Genome-wide association study (GWAS) summary statistics
- Genetic correlation analyses
- Cross-GWAS gene-wise coherence and causal inference analyses
- Mendelian randomisation (MR)
The analyses combine MRI-derived markers of iron content (T2* and quantitative susceptibility mapping, QSM) with smoking-related phenotypes to investigate shared genetic architecture and potential bidirectional causal mechanisms.
The repository is organised into the following main directories:
.
├── gwas/
├── phenotypic/
├── ldsc/
├── pascalx/
└── MR/
gwas/: Download and format GWAS summary statistics used across downstream genetic analyses.
- Download GWAS summary statistics from public repositories (download_gwas_stats.sh)
- Merge UK Biobank brain GWAS summary statistics from discovery and replication samples using an inverse-variance weighted estimator (gwas_qsm_stats_merge.R)
- Generate analysis-ready GWAS files compatible with LD Score Regression (gwas_stats_format_for_ldsc.py)
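The inverse-variance weighted merge of discovery and replication estimates can be illustrated with a minimal Python sketch. It is not the code in gwas_qsm_stats_merge.R: the function name `ivw_merge` and its arguments are hypothetical, and it assumes a standard fixed-effect combination of two estimates for a single SNP whose effect alleles have already been aligned.

```python
import math


def ivw_merge(beta_disc, se_disc, beta_rep, se_rep):
    """Combine two effect estimates for one SNP with inverse-variance weights.

    Hypothetical helper: assumes both estimates refer to the same effect allele.
    """
    # Each estimate is weighted by the inverse of its variance.
    w_disc = 1.0 / se_disc ** 2
    w_rep = 1.0 / se_rep ** 2

    beta = (w_disc * beta_disc + w_rep * beta_rep) / (w_disc + w_rep)
    se = math.sqrt(1.0 / (w_disc + w_rep))

    # Two-sided p-value from the normal approximation to the Z statistic.
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Illustrative values only, not taken from any GWAS.
merged_beta, merged_se, merged_p = ivw_merge(0.10, 0.02, 0.06, 0.02)
```

With equal standard errors the merged effect is the plain average of the two estimates, and the merged standard error shrinks by a factor of the square root of two.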
phenotypic/: Phenotypic association analyses between smoking-related measures and striatal iron markers using individual-level UK Biobank data.
- Custom dataset with variables of interest (dataset_creation.py)
- Data preprocessing (data_preprocessing.py)
- Linear regression models (linear_models.py)
- Correlation analyses (correlations.py)
- Visualisation and plotting (plot_phen_gen_corr.py)
- Robustness metrics for sensitivity analyses (robustness_metrics.py)
ldsc/: Genetic correlations between smoking phenotypes and striatal iron traits using Linkage Disequilibrium Score Regression.
- Cross-trait genetic correlation analyses (ldsc_gcorr.sh)
- False discovery rate correction (ldsc_fdr.py)
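As an illustration of false discovery rate correction over a set of genetic correlation p-values, the sketch below implements the Benjamini-Hochberg procedure in plain Python. Which FDR procedure ldsc_fdr.py actually applies is not stated here, so the choice of Benjamini-Hochberg is an assumption, and the function name and input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used for illustration; the repository script may differ.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values only, not results from the study.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

A trait pair would then be called significant when its adjusted value falls below the chosen FDR threshold.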
pascalx/: Cross-GWAS gene-wise coherence testing and causal relationship analyses using PascalX.
- Cross-trait coherence test (xscorer.py)
- Cross-trait ratio test (causal analysis) (xscorer_ratio.py)
- Short script to create the anti-coherence result file from the coherence file ($p_{anticoherence} = 1 - p_{coherence}$) to save time (make_anticohe_from_cohe.py)
- Visualisation and plotting (gene_tables_heatmaps.py)
Supporting files:
- xscorer_config.py: PascalX configuration for xscorer.py and xscorer_ratio.py
- cluster_loc.csv: names and locations of genes that are part of a cluster
- confounder_genes.csv: genes previously associated with possible confounders (weekly alcohol consumption and serum iron)
- genes2remove.csv: non-coding genes and duplicates with different names
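The anti-coherence shortcut ($p_{anticoherence} = 1 - p_{coherence}$) amounts to a one-line transformation of each gene's p-value. The sketch below shows the idea on an in-memory table. It is not the code in make_anticohe_from_cohe.py: the column names `gene` and `pvalue`, the gene labels, and the function name are hypothetical placeholders, since the actual layout of the coherence result file is not described here.

```python
def make_anticoherence(rows, p_column="pvalue"):
    """Return a copy of the coherence rows with p replaced by 1 - p.

    Hypothetical helper: column names are placeholders for illustration.
    """
    anti_rows = []
    for row in rows:
        new_row = dict(row)  # leave the coherence rows untouched
        new_row[p_column] = 1.0 - float(row[p_column])
        anti_rows.append(new_row)
    return anti_rows


# Illustrative rows only; gene labels and p-values are made up.
coherence_rows = [
    {"gene": "GENE_A", "pvalue": "0.25"},
    {"gene": "GENE_B", "pvalue": "0.5"},
]
anticoherence_rows = make_anticoherence(coherence_rows)
```

Deriving the anti-coherence file this way avoids rerunning the gene-wise test in the opposite direction, which is the time saving the script description refers to.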
MR/: Mendelian randomisation analyses assessing potential causal relationships between smoking and striatal iron measures.
- MR forward (MR_forward.R)
- MR reverse (MR_reverse.R)
- Sensitivity analysis (MR_sensitivity.R)
- Visualisation and plotting (MR_figure.py)
Supporting files:
- MR_functions.R: Functions for MR and sensitivity analysis
- config_forward.R: Configuration for MR forward
- config_reverse.R: Configuration for MR reverse
- UK Biobank data: Access requires an approved UK Biobank application. Individual-level data are not included in this repository.
- GWAS summary statistics: Publicly available sources were used (see scripts and comments in gwas/ for details).
Analyses were conducted using a combination of Python, R, and external genetic analysis tools.
Core Python and R package requirements are listed in requirements.txt and include:
- Python 3.7 with standard scientific computing libraries
- R 4.2.2 with the TwoSampleMR package (v0.5.7)
The following external tools were used and must be installed separately:
- LD Score Regression v1.0.1 (Python 2.7)
- PascalX v0.0.3 (Python 3.8.19)
Due to data access restrictions, full reproduction of results requires authorised access to UK Biobank data.
For questions regarding the code or analyses, please contact the corresponding author of the paper.