There has been a recent explosion of deep learning algorithms that tackle the computational problem of predicting drug treatment outcome from baseline molecular measurements. To support this, we have built a benchmark dataset that harmonizes diverse datasets to better assess algorithm performance.
This package collects diverse sets of paired molecular datasets with corresponding drug sensitivity data. All data are reprocessed and standardized so they can be easily used as a benchmark dataset for AI applications.

This repository leverages existing datasets to collect the data.
| Script | Arguments | Description |
|---|---|---|
|`build_samples.sh`|[latest_samples]| Latest version of samples generated by the coderdata build. |
|`build_omics.sh`|[gene file] [sample file]| Takes the `genes.csv` that was generated in the original build as well as the sample file generated above. |
|`build_drugs.sh`|[drugfile1,drugfile2,...]| Takes a comma-delimited list of all drug files generated from previous builds. |
|`build_exp.sh`|[sample file] [drug file]| Takes the sample file and drug file generated by the previous scripts. |
5. Put the Docker container file inside the [Docker directory](./build/docker) with the name `Dockerfile.[datasetname]`.

6. Run `build_all.py` from the root directory, which should now add your Dockerfile into the mix and call the scripts in your Docker container.
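The sequence of build scripts in the table above can be sketched as a shell run. The file names and the `run_step` guard below are illustrative assumptions, not part of the repository; each script consumes outputs of the previous step inside the dataset's Docker container:

```shell
# Illustrative per-dataset build sequence; file names are assumptions.
# run_step only invokes a script if it exists in the current directory,
# so the sketch is safe to run outside a dataset's Docker container.
run_step() {
  if [ -f "$1" ]; then
    sh "$@"
  else
    echo "skipping $1 (not found)"
  fi
}

run_step build_samples.sh latest_samples.csv      # writes a samples file
run_step build_omics.sh genes.csv samples.csv     # uses genes.csv + samples file
run_step build_drugs.sh drugs_a.tsv,drugs_b.tsv   # comma-delimited drug files
run_step build_exp.sh samples.csv drugs.tsv       # needs samples + drugs files
```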
## build_all.py script
This script initializes all Docker containers, builds all datasets, validates them, and uploads them to Figshare and PyPI.
It requires the following authorization tokens to be set in the local environment depending on the use case:
`SYNAPSE_AUTH_TOKEN`: Required for beataml and mpnst datasets. Join the [CoderData team](https://www.synapse.org/#!Team:3503472) on Synapse and generate an access token.
`PYPI_TOKEN`: This token is required to upload to PyPI.
`FIGSHARE_TOKEN`: This token is required to upload to Figshare.
`GITHUB_TOKEN`: This token is required to upload to GitHub.
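For example, the tokens can be exported in the shell before running the build. The values below are placeholders; generate real tokens on Synapse, PyPI, Figshare, and GitHub:

```shell
# Placeholder values only; substitute tokens generated on each service.
export SYNAPSE_AUTH_TOKEN="replace-with-synapse-token"
export PYPI_TOKEN="replace-with-pypi-token"
export FIGSHARE_TOKEN="replace-with-figshare-token"
export GITHUB_TOKEN="replace-with-github-token"
```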
**Available arguments**:
-`--docker`: Initializes and builds all Docker containers.
-`--github-username`: GitHub username matching the GITHUB_TOKEN. Required to push the new Tag to the GitHub Repository.
-`--github-email`: GitHub email matching the GITHUB_TOKEN. Required to push the new Tag to the GitHub Repository.
**Example usage**:
- Build all datasets and upload them to Figshare, PyPI, and GitHub.
Required tokens for the following command: `SYNAPSE_AUTH_TOKEN`, `PYPI_TOKEN`, `FIGSHARE_TOKEN`, `GITHUB_TOKEN`.
**Note**: Preceding steps will not automatically be run. This assumes that docker images, samples, omics, and drugs were all previously built. Ensure all required tokens are set.
- Build only the experiment files.
**Note**: Preceding steps will not automatically be run. This assumes that docker images, samples, omics, and drugs were all previously built. Ensure all required tokens are set.
```bash
python build/build_all.py --exp
```
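Before a run that uploads, a quick pre-flight check can confirm the required tokens are present in the environment. This helper is a suggestion, not part of the repository:

```shell
# List any required tokens missing from the environment before a full
# build-and-upload run; "missing tokens: none" means all four are set.
missing=""
for v in SYNAPSE_AUTH_TOKEN PYPI_TOKEN FIGSHARE_TOKEN GITHUB_TOKEN; do
  [ -n "$(printenv "$v")" ] || missing="$missing $v"
done
echo "missing tokens:${missing:- none}"
```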
## build_dataset.py script
This script builds a single dataset for **debugging purposes only**. It can help determine if a dataset will build correctly in isolation. Note that the sample and drug identifiers generated may not align with those from other datasets, so this script is not suitable for building production datasets.
It requires the following authorization tokens to be set in the local environment depending on the dataset: