1 | 1 | ## Build Instructions for MPNST Dataset |
2 | 2 |
3 | 3 | To build the MPNST dataset, follow these steps from the coderdata root |
4 | | -directory. Currently using the test files as input. |
| 4 | +directory. |
5 | 5 |
6 | | -1. Build the Docker image: |
| 6 | +### Step 1: Set the SYNAPSE_AUTH_TOKEN Environment Variable. |
| 7 | +This is required to download the data. |
| 8 | +``` |
| 9 | +export SYNAPSE_AUTH_TOKEN="Your Synapse Token" |
| 10 | +``` |
| 11 | +### Step 2: Choose an option below depending on your needs. |
| 12 | +--- |
| 13 | +### Option 1: QuickBuild the test dataset using build_dataset.py |
| 14 | + |
| 15 | +This quick build process does not map sample identifiers to previous data versions and is only for personal use.
| 16 | +``` |
| 17 | +python build/build_dataset.py --dataset mpnst --build |
| 18 | +``` |
| 19 | +--- |
| 20 | +### Option 2: Build the test dataset using build_dataset.py with a previous dataset. |
| 21 | + |
| 22 | +This build process assumes you have already built, or have access to, a previously built dataset, which must be located in `$PWD/local`. The `--validate` argument checks that the output conforms to the schema.
| 23 | +``` |
| 24 | +python build/build_dataset.py --dataset mpnst --build --validate --use_prev_dataset beataml |
| 25 | +``` |
| 26 | +--- |
| 27 | +### Option 3: Build each test file one at a time. |
| 28 | +This process does not map sample identifiers to previous data versions and is only for personal use.
| 29 | + |
| 30 | +1. Create an empty local directory in the coderdata root directory. |
| 31 | + ``` |
| 32 | + mkdir local |
| 33 | + ``` |
| 34 | +2. Build the Docker image with the optional HTTPS_PROXY argument: |
7 | 35 | ``` |
8 | 36 | docker build -f build/docker/Dockerfile.mpnst -t mpnst . --build-arg HTTPS_PROXY=$HTTPS_PROXY |
9 | 37 | ``` |
10 | 38 |
11 | | -2. Generate new identifiers for these samples to create a |
| 39 | +3. Generate new identifiers for these samples to create a |
12 | 40 | `mpnst_samples.csv` file. This pulls from the latest synapse |
13 | 41 | project metadata table. |
14 | 42 | ``` |
15 | | - docker run -v $PWD:/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst sh build_samples.sh /tmp/build/build_test/test_samples.csv |
| 43 | + docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_samples.sh [Previous Samples file or Empty Quotes ("")] |
16 | 44 | ``` |
17 | 45 |
18 | | -3. Pull the data and map it to the samples. This uses the metadata |
| 46 | +4. Pull the data and map it to the samples. This uses the metadata |
19 | 47 | table pulled above. |
20 | 48 | ``` |
21 | | - docker run -v $PWD:/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst sh build_omics.sh /tmp/build/build_test/test_genes.csv /tmp/mpnst_samples.csv |
| 49 | + docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_omics.sh /tmp/genes.csv /tmp/mpnst_samples.csv |
22 | 50 | ``` |
23 | 51 |
24 | | -4. Process drug data |
| 52 | +5. Process drug data:
25 | 53 | ``` |
26 | | - docker run -v $PWD:/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst sh build_drugs.sh /tmp/build/build_test/test_drugs.tsv |
| 54 | + docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_drugs.sh [Previous Drugs file or Empty Quotes ("")] |
27 | 55 | ``` |
28 | 56 |
29 | | -5. Process experiment data. This uses the metadata from above as well as the file directory on synapse: |
| 57 | +6. Process experiment data. This uses the metadata from above as well as the file directory on synapse: |
30 | 58 | ``` |
31 | | - docker run -v $PWD:/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst sh build_exp.sh /tmp/mpnst_samples.csv /tmp/mpnst_drugs.tsv.gz |
| 59 | + docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_exp.sh /tmp/mpnst_samples.csv /tmp/mpnst_drugs.tsv |
32 | 60 | ``` |
33 | 61 |
34 | 62 | Please ensure that each step is followed in order for correct dataset compilation. |
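
Because the Option 3 steps depend on each other's outputs, it may help to wrap them in a small script that checks the token first and stops at the first failure. The sketch below is one possible way to do that, not part of the repository: the helper names `require_token`, `run_step`, and `build_mpnst_by_steps` and the `DRY_RUN` variable are hypothetical. It assumes the `local` directory exists and the `mpnst` image has already been built (steps 1 and 2), and it passes the token with `-e SYNAPSE_AUTH_TOKEN` (no value), which makes Docker forward the variable from the calling environment.

```shell
#!/bin/sh
# Sketch only: all helper names below are hypothetical, not part of coderdata.

# Fail early if the Synapse token is missing or empty.
require_token() {
    if [ -z "${SYNAPSE_AUTH_TOKEN:-}" ]; then
        echo "SYNAPSE_AUTH_TOKEN is not set" >&2
        return 1
    fi
}

# Run one command, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run steps 3-6 of Option 3 in order; the && chain stops at the first failure.
# $1: previous samples file (or ""), $2: previous drugs file (or "").
build_mpnst_by_steps() {
    prev_samples="${1:-}"
    prev_drugs="${2:-}"
    require_token || return 1
    run_step docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN mpnst \
        bash build_samples.sh "$prev_samples" &&
    run_step docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN mpnst \
        bash build_omics.sh /tmp/genes.csv /tmp/mpnst_samples.csv &&
    run_step docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN mpnst \
        bash build_drugs.sh "$prev_drugs" &&
    run_step docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN mpnst \
        bash build_exp.sh /tmp/mpnst_samples.csv /tmp/mpnst_drugs.tsv
}
```

After sourcing the script, `DRY_RUN=1 build_mpnst_by_steps "" ""` prints the four `docker run` commands without executing them, which is a cheap way to review the arguments before a real run.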