Author: Joshua Siegel & Matthew Singh.
Modified and documented by Ruiqi Chen.
This repo contains the fMRI preprocessing scripts for MINDy resting state & task state modeling. The pipeline is similar to (Singh et al., 2020) with the differences in bold text:
- HCP minimal preprocessing pipeline (WITHOUT ICA-FIX) were applied.
- Framewise displacement (FD) and temporal derivative of variation (DVARS) were calculated (see below) and saved.
- FD and DVARS were filtered for respiratory artifact with a 40th-order 0.06-0.14 Hz band-stop filter and then saved in another file. (Note: DVARS was NOT used; only the UNFILTERED FD was used later.)
- Data were demeaned and standardized per vertex using the vertex mean & std from corresponding resting state scanning run (for resting state, just the current run; for task, rfMRI session one run one for RL or session one run two for LR).
- The mean and linear trend in the data was removed with MATLAB's
detrend(). - EV files were loaded and saved along with the data. Nuisance regressors for (Siegel et al., 2017) pipeline were also loaded and saved with the data.
- Confound regression (four version, the last one is the default):
GSR = 0: 12 HCP movement regressors and their quadratics.GSR = 1:GSR = 0+ mean signal from white matter, CSF, and gray matter, and their derivatives.GSR = 2:GSR = 0+ top 5 principal components of white matter and CSF (CompCor).GSR = 3:GSR = 2+ mean signal from gray matter.- Note: in all cases, the intercept was included in regression and retained.
- Motion scrubbing with UNFILTERED FD but NOT DVARS:
- Frames with FD > .2mm for rest and FD > .9mm for task (Etzel, 2023) were censored.
- Censored frames were linearly interpolated with MATLAB's
interp1()after parcellation.- Note: the first frame will always be censored since we don't know whether it has high motion. Besides, if frame 2, 3, ... are all bad, it is necessary to censor frame 1 too.
- Note: we don't use extrapolation since that might be very wrong for long sequences at both ends. Therefore, censored frames at both ends (e.g., the first frame) will be replaced with NaN.
- Data were averaged within each parcel according to Schaefer atlas (Schaefer et al., 2018). There would be one output file for each level of granularity (100-1000 parcel) and each subject.
- Combined with the previous standardization step, we are actually computing a weighted average across the vertices, with weights being the precision of the vertex timeseries in resting state. The intercept would not matter as it will be removed later during Z-scoring in MINDy fitting.
- Also, unlike in
Singh2020PreProc, no frame was removed during preprocessing. - Some more preprocessing steps (e.g., HRF deconvolution, subsampling the derivatives, etc.) will be applied in MINDy functions (not included here).
IMPORTANT: The parcellation files used in the repo is an outdated version of the Schaefer atlas released on Jan 29, 2018 in commit 4c32a6b of the official repo (CBIG/stable_projects/brain_parcellation/Schaefer2018_LocalGlobal/Parcellations/HCP/fslr32k/cifti at 4c32a6b). The Schaefer parcellations have since been updated, including name changes (v0.14.3 update in Sep 2019) and the change of medial wall vertices (v0.17.1 update in May 2020). For some parcellations, even the order of ROI has changed.
Also note that although the scripts can be used to preprocessed both "minimally preprocessed" and FIX-ICA versions of data, the DVARS is always calculated using the "minimally preprocessed" (no FIX-ICA) data. This is more conservative because more high-motion frames will be identified and interpolated when using this more noisy version of data.
The main entry point to the scripts is MINDyRestTask/MyStartServerDT_HCP_All.m. MINDyRestTask/MyStartServerDT_HCP_Simple_00.m is a wrapper that calls it. You can include MINDyRestTask in MATLAB path to run it, or you can use a even higher level wrapper MyWrapper.m in the root directory. See the comments in MyWrapper.m for details.
The original (Singh et. al., 2020) paper uses FSL and a bash script (MINDyRestTask/utilities/dvars_nichols.sh) from Thomas Nichols to compute "standardized DVARS". Seems like this script came from here as given by Nichols' 2013 paper.
However, we now use a package developed by Nichols and Afyouni in 2017 called DVARS associated with their NeuroImage paper to calculate the "relative DVARS". For whatever reason, the results from the two methods are different by ~0.1% even after scaling (note that scaling DVARS will not influence our results). We still decided to use this one since it's newer and listed on Thomas Nichols' website. It also does not require FSL (though it requires MATLAB's statistical toolbox). And most importantly, it has much fewer file IO, making it much faster than the old one on our server (10 min per run -> 100s per run).
However, if you want to use the identical pipeline as Singh2020, you can set use_old_fslDVARS = true in calc_DVAR_mod.m.
The directory MINDyRestTask contains the main scripts. You'll need to add this folder (not its subfolders) to MATLAB path. Alternatively, you can call the wrapper MyWrapper.m from the root which will identify the unprocessed data folders, manage the paths and run the codes.
The subdirectory MINDyRestTask/utilities contains the external package DVARS and some atlas files.
Apart from those, you need to download fieldtrip and specify the fieldtrip directory at the beginning of MINDyRestTask/MyStartServerDT_HCP_All.m.
If you want to compute DVARS with the old bash scripts instead of the MATLAB package, you'll also need to set up FSL.
It took about 1000 seconds to process one subject's data (resting state & all tasks) on an Intel Xeon E5-2670 CPU (2.60 GHz) with single thread. Since the main process is just a giant FOR-loop over the subjects (in MINDyRestTask/MyStartServerDT_HCP_All.m), it could be massively paralleled if needed.
The input and output folders are specified in MINDyRestTask/MyStartServerDT_HCP_Simple_00.m. The input folder should be the HCP folder (containing one subfolder for each participant) and the output folder will contain one MAT file for each participant and each level of parcellation, entitled like sub[HCP_ID]Y[xx]_All.mat where [xx] is the number of parcels divided by 100, e.g., sub100206Y01_All.mat. Each file contains a single structure X with the following fields:
TaskName: The session names.(1, 1 + nTask)cellstr, one for resting state followed by one for each task.Dat: The data.(1, 1 + nTask)cell array (the first one being the resting state), each containing a(1, nRuns)cell array. The cells (runs) were ordered according toMINDyRestTask/RestNameSetup.mandMINDyRestTask/TaskNameSetup.m(currently, for rest: day1_LR, day1_RL, day2_LR, day2_RL; for task: LR, RL; note that this might be different from the actual scanning order for each participant). Each cell contains a(nParcels + 19, nTRs)matrix. The parcels are in the order of Schaefer atlas (Schaefer et al., 2018), followed by the 19 FreeSurfer subcortical regions.regs: The nuisance regressors.(1, 1 + nTask)structure array, one for resting state followed by one for each task. Each structure contains a single fieldrun, which is a(1, nRuns)structure array (one for each run). Each structure contains the following field:MV:(nTRs, 12), timeseries of six rigid-body movement regressors followed by their derivatives.CompCor:(nTRs, 10), the top five principal components of white matter timeseries, followed by the top five principal components of cerebrospinal fluid timeseries.GSR:(nTRs, 1), the mean over gray matter.
RegW: The regression slopes used during preprocessing, averaged within each parcel.(1, 1 + nTask)cell array, each containing a(1, nRuns)cell array containing one structure with the following fields (depending on preprocessing settings, some might be missing):MV,MV2(corresponding to squaredMV),CompCor,GSR(ifGSR = 3, the first two columns will be zeros), andGSR_prime(corresponding to[[0 0 0]; diff(GSR)]).switches: Preprocessing settings.(1, 1 + nTask)cell array containing one structure in each cell. SeeMINDyRestTask/SwitchSetup.mfor details. Some fields of the structures that worth mentioning:FIX: Whether to use FIX-ICA data. Note that the DVARS and nuisance regressors are all calculated using the "minimally preprocessed" (no FIX-ICA) data.nGSR: The level of nuisance regression. Here it will only influence the regressors to save inregs. Default is 3.DVARcut: The threshold for motion scrubbing (ratio over the median of each run). Here we turn it off in the scripts.
EV: The explanatory variables directly loaded from HCPEVs/files.(1, 1 + nTask)cell array containing(1, nRuns)cell array (except for resting state, which is empty). Each cell contains a structure with the field names being the EVs and the field values being(N, 3)matrices in FSL format ([onset, duration, value]). Note that the EVs are not convolved with HRF and the units are seconds.QC: Quality control statistics.(1, 1 + nTask)cell array containing one structure in each cell, with the following fields that worth mentioning:frames:(1, nRuns)matrix, number of motion-free frames in each run.excluded:(1, nRuns)binary matrix, whether each run should be excluded. Basicallyframes < switches.minframes.run:(1, nRuns)structure array, with the following fields:tmask:(1, nTRs)binary vector, whether each frame should be excluded based on movement.FD:(1, nTRs)vector, the framewise displacement.DV:(1, nTRs)vector, the DVARS.DVcut: The DVARS cut threshold for this run.switches.DVARcut * median(DVARS).
TaskMatch: Indices of the resting state run that were used to standardize (i.e., using its mean and std) the task data.(1, nTask)cell array containing(1, nRuns)cell array. Each cell contains a single number, the index of the resting state run that was used to standardize the corresponding task run.
HCP data on the WUSTL NIL cluster (need WUSTL VPN to access):
/net/10.27.136.121/hcpdb/packages/unzip/HCP_1200HCP data on the CHPC cluster:
/ceph/hcpdb/packages/unzip/HCP_1200They contain all 1113 subjects' neuroimaging data.
Singh, M. F., Braver, T. S., Cole, M. W., & Ching, S. (2020). Estimation and validation of individualized dynamic brain models with resting state fMRI. NeuroImage, 221, 117046. https://doi.org/10.1016/j.neuroimage.2020.117046
Siegel, J. S., Mitra, A., Laumann, T. O., Seitzman, B. A., Raichle, M., Corbetta, M., & Snyder, A. Z. (2017). Data Quality Influences Observed Links Between Functional Connectivity and Behavior. Cerebral Cortex, 27(9), 4492–4502. https://doi.org/10.1093/cercor/bhw253
Etzel, J. A. (2023). Efficient evaluation of the Open QC task fMRI dataset. Frontiers in Neuroimaging, 2. https://www.frontiersin.org/articles/10.3389/fnimg.2023.1070274
Schaefer, A., Kong, R., Gordon, E. M., Laumann, T. O., Zuo, X.-N., Holmes, A. J., Eickhoff, S. B., & Yeo, B. T. T. (2018). Local-Global Parcellation of the Human Cerebral Cortex from Intrinsic Functional Connectivity MRI. Cerebral Cortex, 28(9), 3095–3114. https://doi.org/10.1093/CERCOR/BHX179