Commit bc7d75a

Merge pull request #250 from PNNL-CompBio/mpnst-readme-update
Update MPNST and build_dataset.py README.md files
2 parents f2120e4 + 7b5024b commit bc7d75a

2 files changed

Lines changed: 46 additions & 16 deletions

build/README.md

Lines changed: 6 additions & 5 deletions
@@ -56,21 +56,22 @@ It requires the following authorization tokens to be set in the local environmen
5656
`SYNAPSE_AUTH_TOKEN`: Required for beataml and mpnst datasets. Follow the directions above to gain access.
5757

5858
Available arguments:
59-
- `--dataset`: Required. Name of the dataset to build.
59+
- `--dataset`: Required. Name of the dataset to build. At a minimum, this will build the docker images.
6060
- `--use_prev_dataset`: Optional. Prefix of the previous dataset for sample and drug ID continuation. The previous dataset files must be in the "local" directory.
61-
- `--validate`: Optional. Runs the schema checker on the built files.
61+
- `--build`: Optional. Builds the specified dataset.
62+
- `--validate`: Optional. Run the schema checker on the built files.
6263
- `--continue`: Optional. Continues from where the build left off by skipping existing files in "local" directory.
6364
Example usage:
6465

6566
Build the broad_sanger dataset:
6667
```bash
67-
python build/build_dataset.py --dataset broad_sanger
68+
python build/build_dataset.py --build --dataset broad_sanger
6869
```
6970
Build the mpnst dataset continuing from broad_sanger sample and drug IDs:
7071
```bash
71-
python build/build_dataset.py --dataset mpnst --use_prev_dataset broad_sanger
72+
python build/build_dataset.py --build --dataset mpnst --use_prev_dataset broad_sanger
7273
```
73-
Build the hcmi dataset and run validation:
74+
Run schema validation on the hcmi dataset:
7475
```bash
7576
python build/build_dataset.py --dataset hcmi --validate
7677
```
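
Both the beataml and mpnst builds depend on `SYNAPSE_AUTH_TOKEN`, so it can help to fail early when the variable is missing. The following is a minimal sketch, assuming a POSIX-compatible shell; `require_env` is a hypothetical helper and is not part of the repository:

```bash
# Hypothetical helper (not in the repository): succeed only if the named
# environment variable is set and non-empty.
require_env() {
    # $1 is the variable name; eval performs the indirect lookup portably.
    eval "_require_env_value=\${$1:-}"
    if [ -z "$_require_env_value" ]; then
        echo "error: $1 is not set" >&2
        return 1
    fi
}

# Usage sketch (left commented out so nothing runs on definition):
# require_env SYNAPSE_AUTH_TOKEN && python build/build_dataset.py --build --dataset mpnst
```

Chaining the check with `&&` means the build command only starts once the token is present.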

build/mpnst/README.md

Lines changed: 40 additions & 11 deletions
@@ -1,34 +1,63 @@
11
## Build Instructions for MPNST Dataset
22

33
To build the MPNST dataset, follow these steps from the coderdata root
4-
directory. Currently using the test files as input.
4+
directory.
55

6-
1. Build the Docker image:
6+
### Step 1: Set the SYNAPSE_AUTH_TOKEN Environment Variable.
7+
This is required to download the data.
8+
```
9+
export SYNAPSE_AUTH_TOKEN="Your Synapse Token"
10+
```
11+
### Step 2: Choose an option below depending on your needs.
12+
---
13+
### Option 1: Quick-build the test dataset using build_dataset.py
14+
15+
This quick build process does not map sample identifiers to previous data versions and is only for personal use.
16+
```
17+
python build/build_dataset.py --dataset mpnst --build
18+
```
19+
---
20+
### Option 2: Build the test dataset using build_dataset.py with a previous dataset.
21+
22+
This build process assumes you already built or have access to a previously built dataset. This previous dataset must be located in `$PWD/local`. The validate argument ensures the output aligns with the schema.
23+
```
24+
python build/build_dataset.py --dataset mpnst --build --validate --use_prev_dataset beataml
25+
```
26+
---
27+
### Option 3: Build each test file one at a time.
28+
This process does not map sample identifiers to previous data versions and is only for personal use.
29+
30+
1. Create an empty local directory in the coderdata root directory.
31+
```
32+
mkdir local
33+
```
34+
2. Build the Docker image with the optional HTTPS_PROXY argument:
735
```
836
docker build -f build/docker/Dockerfile.mpnst -t mpnst . --build-arg HTTPS_PROXY=$HTTPS_PROXY
937
```
1038

11-
2. Generate new identifiers for these samples to create a
39+
3. Generate new identifiers for these samples to create a
1240
`mpnst_samples.csv` file. This pulls from the latest synapse
1341
project metadata table.
1442
```
15-
docker run -v $PWD:/tmp -e -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst sh build_samples.sh /tmp/build/build_test/test_samples.csv
16-
```
1743
18-
3. Pull the data and map it to the samples. This uses the metadata
44+
docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_samples.sh [Previous Samples file or Empty Quotes ("")]
45+
46+
47+
4. Pull the data and map it to the samples. This uses the metadata
1948
table pulled above.
2049
```
21-
docker run -v $PWD:/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst sh build_omics.sh /tmp/build/build_test/test_genes.csv /tmp/mpnst_samples.csv
50+
docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_omics.sh /tmp/genes.csv /tmp/mpnst_samples.csv
2251
```
2352
24-
4. Process drug data
53+
5. Process drug data
2554
```
26-
docker run -v $PWD:/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst sh build_drugs.sh /tmp/build/build_test/test_drugs.tsv
55+
docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_drugs.sh [Previous Drugs file or Empty Quotes ("")]
2756
```
2857
29-
5. Process experiment data. This uses the metadata from above as well as the file directory on synapse:
58+
6. Process experiment data. This uses the metadata from above as well as the file directory on synapse:
3059
```
31-
docker run -v $PWD:/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst sh build_exp.sh /tmp/mpnst_samples.csv /tmp/mpnst_drugs.tsv.gz
60+
docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN mpnst bash build_exp.sh /tmp/mpnst_samples.csv /tmp/mpnst_drugs.tsv
3261
```
3362
3463
Please ensure that each step is followed in order for correct dataset compilation.
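
Because the Option 3 steps must run in order, one way to enforce that is a small wrapper that stops at the first failing step. The following is a sketch only, not a script shipped with the repository: `run_step`, `build_mpnst`, and the `DRY_RUN` variable are hypothetical names, and it assumes `genes.csv` is already present in `local`. It deviates from the commands above in two places: it uses `mkdir -p` so re-runs do not fail, and it passes `-e SYNAPSE_AUTH_TOKEN` without a value, which tells Docker to forward the host's value and keeps the token out of the printed commands.

```bash
# Hypothetical wrapper (not in the repository) around the Option 3 steps.

run_step() {
    # Print the command, then execute it unless DRY_RUN=1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

build_mpnst() {
    # $1: previous samples file, or empty; $2: previous drugs file, or empty.
    prev_samples="${1:-}"
    prev_drugs="${2:-}"

    # Each step runs only if the previous one succeeded.
    run_step mkdir -p local &&
    run_step docker build -f build/docker/Dockerfile.mpnst -t mpnst . \
        --build-arg HTTPS_PROXY="${HTTPS_PROXY:-}" &&
    run_step docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN \
        mpnst bash build_samples.sh "$prev_samples" &&
    run_step docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN \
        mpnst bash build_omics.sh /tmp/genes.csv /tmp/mpnst_samples.csv &&
    run_step docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN \
        mpnst bash build_drugs.sh "$prev_drugs" &&
    run_step docker run -v "$PWD/local":/tmp -e SYNAPSE_AUTH_TOKEN \
        mpnst bash build_exp.sh /tmp/mpnst_samples.csv /tmp/mpnst_drugs.tsv
}

# Usage sketch (commented out so nothing runs on definition):
# DRY_RUN=1 build_mpnst "" ""   # print the six commands without running them
# build_mpnst "" ""             # run the full sequence from the coderdata root
```

Running with `DRY_RUN=1` first is a cheap way to review the exact commands before any image is built or data is downloaded.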
