Commit 1c8c988: Update README.md (parent 0054cab)

1 file changed: `build/mpnst/README.md` (38 additions, 10 deletions)

## Build Instructions for MPNST Dataset

To build the MPNST dataset, follow these steps from the coderdata root directory.

### Step 1: Set the SYNAPSE_AUTH_TOKEN Environment Variable
A Synapse auth token is required to download the data. Export it so the build scripts and Docker commands can read it:
```
export SYNAPSE_AUTH_TOKEN="Your Synapse Token"
```
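A missing or unexported token is easy to overlook. The following is a minimal sketch, assuming a bash shell. The hypothetical helper `require_synapse_token` is not part of coderdata. It fails fast when `SYNAPSE_AUTH_TOKEN` is unset, empty, or set without `export`:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of coderdata): stop early if the
# Synapse token is missing, rather than partway through a build.
require_synapse_token() {
  if [ -z "${SYNAPSE_AUTH_TOKEN:-}" ]; then
    echo "error: SYNAPSE_AUTH_TOKEN is empty or unset" >&2
    return 1
  fi
  # Child processes such as `python build/build_dataset.py` only inherit
  # exported variables, so check that a child shell can see the token too.
  if ! sh -c '[ -n "${SYNAPSE_AUTH_TOKEN:-}" ]'; then
    echo "error: SYNAPSE_AUTH_TOKEN is set but not exported; use export" >&2
    return 1
  fi
}
```

Running `require_synapse_token && python build/build_dataset.py --dataset mpnst --build` starts the build only after the token is visible to child processes.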

### Step 2: Choose an option below depending on your needs
---
### Option 1: Quickly build the test dataset using build_dataset.py

This quick build does not map sample identifiers to previous data versions and is intended for personal use only.
```
python build/build_dataset.py --dataset mpnst --build
```
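Every command in this README uses paths relative to the coderdata root. The following is a minimal sketch of a guard for that, assuming bash. `in_coderdata_root` is a hypothetical helper. It only checks for the two repository files this README references:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of coderdata): confirm the working directory
# is the coderdata root by checking for files this README refers to.
in_coderdata_root() {
  local required
  for required in build/build_dataset.py build/docker/Dockerfile.mpnst; do
    if [ ! -f "$required" ]; then
      echo "error: $required not found; run from the coderdata root" >&2
      return 1
    fi
  done
}

# Example: only start the quick build when the check passes.
#   in_coderdata_root && python build/build_dataset.py --dataset mpnst --build
```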

---
### Option 2: Build the test dataset using build_dataset.py with a previous dataset

This build assumes you have already built, or have access to, a previously built dataset, which must be located in `$PWD/local`. The `--validate` argument checks that the output conforms to the schema, and `--use_prev_dataset` names the previous dataset to build on (here `beataml`).
```
python build/build_dataset.py --dataset mpnst --build --validate --use_prev_dataset beataml
```
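Before running Option 2, it can help to confirm that the previous dataset is actually in `local`. The sketch below is hypothetical. It rests on an assumption this README does not state: that a previous build named `<prefix>` left a `<prefix>_samples.csv` file there, mirroring the `mpnst_samples.csv` naming used in Option 3. Adjust the file name if your previous build differs.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for Option 2 (not part of coderdata).
# ASSUMPTION: a previous build named <prefix> left <prefix>_samples.csv in
# the local directory, following the <dataset>_samples.csv naming.
check_prev_dataset() {
  local prefix="$1" dir="${2:-$PWD/local}"
  if [ -z "$prefix" ]; then
    echo "usage: check_prev_dataset <prefix> [dir]" >&2
    return 2
  fi
  if [ ! -d "$dir" ]; then
    echo "error: $dir does not exist" >&2
    return 1
  fi
  if [ ! -f "$dir/${prefix}_samples.csv" ]; then
    echo "error: $dir/${prefix}_samples.csv not found" >&2
    return 1
  fi
}

# Example:
#   check_prev_dataset beataml && python build/build_dataset.py \
#     --dataset mpnst --build --validate --use_prev_dataset beataml
```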

---
### Option 3: Build each test file one at a time

Like Option 1, this process does not map sample identifiers to previous data versions and is intended for personal use only.

1. Create an empty `local` directory in the coderdata root directory. The `docker run` commands below mount it at `/tmp` inside the container.
```
mkdir local
```
2. Build the Docker image. The `HTTPS_PROXY` build argument is optional and only needed if you are behind a proxy:
```
docker build -f build/docker/Dockerfile.mpnst -t mpnst . --build-arg HTTPS_PROXY=$HTTPS_PROXY
```

3. Generate new identifiers for the MPNST samples to create an `mpnst_samples.csv` file. This pulls from the latest Synapse project metadata table. Pass a previous samples file as the argument, or empty quotes (`""`) if there is none. A file placed in `local` is visible inside the container under `/tmp`:
```
docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_samples.sh [Previous Samples file or Empty Quotes ("")]
```

4. Pull the omics data and map it to the samples, using the `mpnst_samples.csv` table generated in step 3. This step also reads a `genes.csv` file from `local`:
```
docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_omics.sh /tmp/genes.csv /tmp/mpnst_samples.csv
```

5. Process the drug data. As in step 3, pass a previous drugs file as the argument, or empty quotes (`""`) if there is none:
```
docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_drugs.sh [Previous Drugs file or Empty Quotes ("")]
```

6. Process the experiment data. This uses the sample and drug files generated above as well as the file directory on Synapse:
```
docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_exp.sh /tmp/mpnst_samples.csv /tmp/mpnst_drugs.tsv
```
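Taken together, steps 1 through 6 can be chained in a single script. The sketch below is a hypothetical wrapper, not part of coderdata. The function names and the `DRY_RUN` switch are illustrative assumptions, and the commands are copied from the steps above. With `DRY_RUN=1` it prints each command instead of running it, which is a cheap way to check paths before pulling data from Synapse.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of coderdata) that runs the Option 3 steps
# in order from the coderdata root. Set DRY_RUN=1 to print each command
# instead of executing it.

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Shared prefix for steps 3-6: mount ./local at /tmp in the container and
# forward the exported SYNAPSE_AUTH_TOKEN from the host environment.
mpnst_docker() {
  run docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN mpnst "$@"
}

# Usage: build_mpnst_stepwise PREV_SAMPLES PREV_DRUGS
# Pass "" for either argument when there is no previous file.
build_mpnst_stepwise() {
  local prev_samples="$1" prev_drugs="$2"
  # Step 1 (mkdir -p does not fail if local already exists).
  run mkdir -p local || return 1
  # Step 2: build the image.
  run docker build -f build/docker/Dockerfile.mpnst -t mpnst . \
    --build-arg HTTPS_PROXY="$HTTPS_PROXY" || return 1
  # Steps 3-6: samples, omics, drugs, experiments.
  mpnst_docker bash build_samples.sh "$prev_samples" || return 1
  mpnst_docker bash build_omics.sh /tmp/genes.csv /tmp/mpnst_samples.csv || return 1
  mpnst_docker bash build_drugs.sh "$prev_drugs" || return 1
  mpnst_docker bash build_exp.sh /tmp/mpnst_samples.csv /tmp/mpnst_drugs.tsv
}
```

Unlike the README commands, this sketch passes `-e SYNAPSE_AUTH_TOKEN` without a value. That form tells Docker to copy the variable from the exported host environment, and it keeps the token out of the printed command line.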
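
Run the steps in order. Each step reads files written by earlier ones (for example, step 6 reads the sample and drug files generated in steps 3 and 5), so skipping or reordering steps will not compile the dataset correctly.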
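Because each step reads files written earlier, a missing input usually means a step was skipped. The following is a minimal bash sketch of a guard. Its inputs are taken from the step 4 and step 6 commands above. `require_inputs` and its step names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of coderdata): before a step, confirm the
# files it reads from ./local exist. Inputs come from the commands above:
#   omics (step 4, build_omics.sh): genes.csv, mpnst_samples.csv
#   exp   (step 6, build_exp.sh):   mpnst_samples.csv, mpnst_drugs.tsv
# Note: genes.csv is not created by any of the steps above.
require_inputs() {
  local step="$1" dir="${2:-$PWD/local}" f missing=0
  case "$step" in
    omics) set -- genes.csv mpnst_samples.csv ;;
    exp)   set -- mpnst_samples.csv mpnst_drugs.tsv ;;
    *)     echo "error: unknown step '$step'" >&2; return 2 ;;
  esac
  # Report every missing file, not just the first one.
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f: run the earlier steps first" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `require_inputs exp` before step 6 reports every missing file at once, instead of letting the step fail inside the container.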
