Commit 3414aea: update documentation
Parent: b2725c2

1 file changed: dataset/README.md
Lines changed: 9 additions & 9 deletions
@@ -24,7 +24,7 @@ This document provides instructions on downloading and preparing all datasets ut
 *TL;DR to download and prepare a dataset, run `dataset_setup.py`:*
 
 ```bash
-python3 datasets/dataset_setup.py \
+python3 dataset/dataset_setup.py \
 --data_dir=~/data \
 --<dataset_name>
 --<optional_flags>
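As a concrete instantiation of the TL;DR template above, the placeholders can be filled with `ogbg` (one of the dataset flags used later in this README). This sketch only assembles and prints the command; actually running it requires the repository checkout:

```bash
# Concrete instantiation of the TL;DR template: <dataset_name> = ogbg,
# no optional flags. Printing only, not executing.
data_dir="$HOME/data"
cmd="python3 dataset/dataset_setup.py --data_dir=$data_dir --ogbg"
echo "$cmd"
```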
@@ -88,7 +88,7 @@ By default, a user will be prompted before any files are deleted. If you do not
 From `algorithmic-efficiency` run:
 
 ```bash
-python3 datasets/dataset_setup.py \
+python3 dataset/dataset_setup.py \
 --data_dir $DATA_DIR \
 --ogbg
 ```
@@ -124,7 +124,7 @@ In total, it should contain 13 files (via `find -type f | wc -l`) for a total of
 From `algorithmic-efficiency` run:
 
 ```bash
-python3 datasets/dataset_setup.py \
+python3 dataset/dataset_setup.py \
 --data_dir $DATA_DIR \
 --wmt
 ```
@@ -194,7 +194,7 @@ you should get an email containing the URLS for "knee_singlecoil_train",
 "knee_singlecoil_val" and "knee_singlecoil_test".
 
 ```bash
-python3 datasets/dataset_setup.py \
+python3 dataset/dataset_setup.py \
 --data_dir $DATA_DIR \
 --fastmri \
 --fastmri_knee_singlecoil_train_url '<knee_singlecoil_train_url>' \
@@ -229,13 +229,13 @@ In total, it should contain 1280 files (via `find -type f | wc -l`) for a total
229229

230230
Register on <https://image-net.org/> and follow directions to obtain the
231231
URLS for the ILSVRC2012 train and validation images.
232-
The script will additionally automatically download the `matched-frequency` version of [ImageNet v2](https://www.tensorflow.org/datasets/catalog/imagenet_v2#imagenet_v2matched-frequency_default_config), which is used as the test set of the ImageNet workloads.
232+
The script will additionally automatically download the `matched-frequency` version of [ImageNet v2](https://www.tensorflow.org/dataset/catalog/imagenet_v2#imagenet_v2matched-frequency_default_config), which is used as the test set of the ImageNet workloads.
233233

234234
The ImageNet data pipeline differs between the PyTorch and JAX workloads.
235235
Therefore, you will have to specify the framework (either `pytorch` or `jax`) through the framework flag.
236236

237237
```bash
238-
python3 datasets/dataset_setup.py \
238+
python3 dataset/dataset_setup.py \
239239
--data_dir $DATA_DIR \
240240
--imagenet \
241241
--temp_dir $DATA_DIR/tmp \
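The ImageNet hunk is truncated before the framework flag that the surrounding prose requires. A hedged sketch of the full invocation follows, assuming the flag is spelled `--framework` (the exact name is not visible in this diff; check `dataset_setup.py --help` locally). The sketch only assembles and prints the command:

```bash
# Sketch only: --framework is an assumed flag name, inferred from the prose
# saying the framework (pytorch or jax) must be specified; verify locally.
framework="jax"   # or "pytorch"
cmd="python3 dataset/dataset_setup.py --data_dir \$DATA_DIR --imagenet --temp_dir \$DATA_DIR/tmp --framework $framework"
echo "$cmd"
```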
@@ -349,7 +349,7 @@ In total, it should contain 20 files (via `find -type f | wc -l`) for a total of
 ### Criteo1TB
 
 ```bash
-python3 datasets/dataset_setup.py \
+python3 dataset/dataset_setup.py \
 --data_dir $DATA_DIR \
 --temp_dir $DATA_DIR/tmp \
 --criteo1tb
@@ -378,7 +378,7 @@ In total, it should contain 885 files (via `find -type f | wc -l`) for a total o
 To download, train a tokenizer and preprocess the librispeech dataset:
 
 ```bash
-python3 datasets/dataset_setup.py \
+python3 dataset/dataset_setup.py \
 --data_dir $DATA_DIR \
 --temp_dir $DATA_DIR/tmp \
 --librispeech
@@ -458,7 +458,7 @@ python3 librispeech_preprocess.py --data_dir=$DATA_DIR/librispeech --tokenizer_v
 From `algorithmic-efficiency` run:
 
 ```bash
-python3 datasets/dataset_setup.py \
+python3 dataset/dataset_setup.py \
 --data_dir $DATA_DIR \
 --temp_dir $DATA_DIR/tmp \
 --fineweb_edu
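Every hunk in this commit rewrites `datasets/dataset_setup.py` to `dataset/dataset_setup.py`. One way to confirm no stale references to the old path remain is a simple grep; this self-contained sketch checks a temporary sample file (in a real checkout you would grep `dataset/README.md` directly):

```bash
# Write a sample README line, then search it for the old datasets/ path.
tmp=$(mktemp)
printf 'python3 dataset/dataset_setup.py \\\n--data_dir $DATA_DIR \\\n--ogbg\n' > "$tmp"
if grep -q 'datasets/dataset_setup.py' "$tmp"; then
  result="stale path found"
else
  result="all paths updated"
fi
echo "$result"
rm -f "$tmp"
```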

0 commit comments