build/README.md: 57 additions & 14 deletions (+57 -14)
@@ -10,32 +10,76 @@ are added.
 ## build_all.py script

-This script initializes all docker containers, builds all datasets, validates them, and uploads them to figshare and pypi.
+This script initializes all docker containers, builds all datasets, validates them, and uploads them to figshare.

-It requires the following authorization tokens to be set in the local environment depending on the use case:
-`SYNAPSE_AUTH_TOKEN`: Required for beataml and mpnst datasets. Join the [CoderData team](https://www.synapse.org/#!Team:3503472) on Synapse and generate an access token.
-`PYPI_TOKEN`: This token is required to upload to PyPI.
-`FIGSHARE_TOKEN`: This token is required to upload to Figshare.
+It requires the following authorization tokens to be set in the local environment depending on the use case:
+`SYNAPSE_AUTH_TOKEN`: Required for beataml and mpnst datasets. Join the [CoderData team](https://www.synapse.org/#!Team:3503472) on Synapse and generate an access token.
+`FIGSHARE_TOKEN`: This token is required to upload to Figshare.
+`GITHUB_TOKEN`: This token is required to upload to GitHub.
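A pre-flight check for the tokens listed above could look like the following minimal sketch. The helper name `missing_tokens` is hypothetical and not part of `build_all.py`; the sketch only assumes the tokens are read from environment variables, as the README states.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_tokens(required: Iterable[str],
                   env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the current process environment (os.environ).
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Tokens named in the README for a full build plus upload.
    needed = ["SYNAPSE_AUTH_TOKEN", "FIGSHARE_TOKEN", "GITHUB_TOKEN"]
    missing = missing_tokens(needed)
    if missing:
        print("Missing tokens:", ", ".join(missing))
    else:
        print("All tokens set.")
```

Passing an explicit mapping as `env` keeps the check easy to test without touching the real environment.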

-Available arguments:
+**Available arguments**:

 - `--docker`: Initializes and builds all docker containers.
 - `--samples`: Processes and builds the sample data files.
 - `--omics`: Processes and builds the omics data files.
 - `--drugs`: Processes and builds the drug data files.
 - `--exp`: Processes and builds the experiment data files.
-- `--all`: Executes all available processes above (docker, samples, omics, drugs, exp).
-- `--validate`: Validates the generated datasets using the schema check scripts.
-- `--figshare`: Uploads the datasets to Figshare.
-- `--pypi`: Uploads the package to PyPI.
-- `--high_mem`: Utilizes high memory mode for concurrent data processing.
+- `--all`: Executes all available processes above (docker, samples, omics, drugs, exp). This does not run the validate or figshare commands.
+- `--validate`: Validates the generated datasets using the schema check scripts. This is automatically included if data upload occurs.
+- `--figshare`: Uploads the datasets to Figshare. FIGSHARE_TOKEN must be set in local environment.
+- `--high_mem`: Utilizes high memory mode for concurrent data processing. This has been successfully tested using 32 or more vCPUs.
 - `--dataset`: Specifies the datasets to process (default='broad_sanger,hcmi,beataml,mpnst,cptac').
-- `--version`: Specifies the version number for the package and data upload title. This is required to upload to figshare and PyPI
+- `--version`: Specifies the version number for the Figshare upload title (e.g., "0.1.29"). This must be a higher version than previously published versions.
+- `--github-username`: GitHub username matching the GITHUB_TOKEN. Required to push the new Tag to the GitHub Repository.
+- `--github-email`: GitHub email matching the GITHUB_TOKEN. Required to push the new Tag to the GitHub Repository.
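The `--version` rule above ("higher than previously published versions") can be illustrated with a small comparison sketch. The README does not say how the script compares versions, so this assumes plain dotted-integer versions such as "0.1.29"; `parse_version` and `is_newer` are hypothetical helper names.

```python
from typing import Iterable, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.1.29' into a tuple of ints: (0, 1, 29)."""
    return tuple(int(part) for part in version.split("."))


def is_newer(candidate: str, published: Iterable[str]) -> bool:
    """True if `candidate` is strictly higher than every published version.

    Comparison is numeric per component, so '0.1.30' > '0.1.9'.
    """
    cand = parse_version(candidate)
    return all(cand > parse_version(p) for p in published)
```

Comparing integer tuples rather than raw strings avoids the pitfall where "0.1.9" sorts above "0.1.29" lexicographically.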
+
+**Example usage**:
+- Build all datasets and upload to Figshare and GitHub.
+Required tokens for the following command: `SYNAPSE_AUTH_TOKEN`, `FIGSHARE_TOKEN`, `GITHUB_TOKEN`.
+**Note**: Preceding steps will not automatically be run. This assumes that docker images, samples, omics, and drugs were all previously built. Ensure all required tokens are set.
+```bash
+python build/build_all.py --exp
+```
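For scripted runs, the command line shown above could be assembled programmatically. The following is a minimal sketch: `build_all_command` and `KNOWN_STEPS` are hypothetical names, and only flags listed in this README are used. It builds the argument list without running anything.

```python
import sys
from typing import List, Optional, Sequence

# Step flags documented in the README for build_all.py.
KNOWN_STEPS = {"docker", "samples", "omics", "drugs", "exp",
               "all", "validate", "figshare"}


def build_all_command(steps: Sequence[str] = (),
                      datasets: Optional[Sequence[str]] = None,
                      version: Optional[str] = None,
                      high_mem: bool = False,
                      script: str = "build/build_all.py") -> List[str]:
    """Assemble an argv list for build_all.py from documented flags only."""
    unknown = [s for s in steps if s not in KNOWN_STEPS]
    if unknown:
        raise ValueError(f"Unknown step(s): {', '.join(unknown)}")
    cmd = [sys.executable, script] + [f"--{s}" for s in steps]
    if high_mem:
        cmd.append("--high_mem")
    if datasets:
        # The README shows --dataset as a comma-separated list.
        cmd += ["--dataset", ",".join(datasets)]
    if version:
        cmd += ["--version", version]
    return cmd


# To actually run it (not executed here):
#   import subprocess
#   subprocess.run(build_all_command(["exp"]), check=True)
```

Building a list rather than a shell string avoids quoting problems when the result is handed to `subprocess.run`.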

+## build_dataset.py script
+This script builds a single dataset for **debugging purposes only**. It can help determine if a dataset will build correctly in isolation. Note that the sample and drug identifiers generated may not align with those from other datasets, so this script is not suitable for building production datasets.
+
+It requires the following authorization tokens to be set in the local environment depending on the dataset:
+
+`SYNAPSE_AUTH_TOKEN`: Required for beataml and mpnst datasets. Follow the directions above to gain access.
+
+Available arguments:
+- `--dataset`: Required. Name of the dataset to build. At a minimum, this will build the docker images.
+- `--use_prev_dataset`: Optional. Prefix of the previous dataset for sample and drug ID continuation. The previous dataset files must be in the "local" directory.
+- `--build`: Optional. Build the desired Dataset.
+- `--validate`: Optional. Run the schema checker on the built files.
+- `--continue`: Optional. Continues from where the build left off by skipping existing files in "local" directory.
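The `--continue` behaviour described above (skipping files that already exist in the "local" directory) might be approximated as in the sketch below. This is an assumption about the general idea, not the script's actual implementation; `files_still_needed` and the file names in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def files_still_needed(expected: Iterable[str],
                       local_dir: str = "local") -> List[str]:
    """Return the expected output files that are not yet present in `local_dir`.

    A file counts as present only if it exists as a regular file.
    """
    local = Path(local_dir)
    return [name for name in expected if not (local / name).is_file()]


# Hypothetical usage: only the missing outputs would be rebuilt.
#   todo = files_still_needed(["beataml_samples.csv", "beataml_drugs.tsv"])
```

A check like this treats any existing file as complete, so a partially written file from an interrupted run would also be skipped.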
| BeatAML | NCI Proteomic Data Commons | Mapping the proteogenomic landscape enables prediction of drug response in acute myeloid leukemia | James Pino et al. | 23
| MPNST | NF Data Portal | Chromosome 8 gain is associated with high-grade transformation in MPNST | David P Nusinow et al. | 24