
Commit 93d2426

WIP instructions for running with S3.
1 parent 2159bef commit 93d2426

3 files changed

Lines changed: 64 additions & 2 deletions


README.md

Lines changed: 62 additions & 0 deletions
@@ -60,6 +60,11 @@ cd storage
 pip3 install -e .
 ```
 
+Install optional S3 libraries if necessary:
+```bash
+pip3 install s3torchconnector
+```
+
 The working directory structure is as follows
 
 ```
@@ -575,3 +580,60 @@ In addition to what can be changed in the CLOSED category, the following paramet
 ## Submission Rules
 
 MLPerf™ Storage Benchmark submission rules are described in this [doc](https://github.com/mlcommons/storage/blob/main/Submission_guidelines.md). If you have questions, please contact [Storage WG chairs](https://mlcommons.org/en/groups/research-storage/).
+
+## S3 DLIO Benchmark
+**WIP readme**
+Required information:
+- Endpoint URL
+  - Must start with `http://` or `https://`
+  - May include a port number
+  - Example: `http://s3.ml.perf:1337`
+- AWS Access Key ID and Secret Access Key
+- Bucket
+Optional information:
+- Region string
+  - Default: `us-east-1`
+- Virtual-hosted buckets
+  - If your object store only supports path-style bucket addressing, you must set `s3_force_path_style` to `True`
+  - Default: `False`
+
+The `mlpstorage training datagen` and `mlpstorage training run` commands currently work.
+The `--data-dir`/`-dd` argument acts as an object key prefix. To run without a prefix, pass an empty string (`""`).
+A prefix can help when storing different datasets or checkpoints in the same bucket.
+
+Currently the easiest way to configure S3 is with the `--param` argument.
+Use the following values with the `--param` argument:
+```
+storage.storage_type=s3 storage.storage_options.endpoint_url="${AWS_ENDPOINT_URL}" storage.storage_options.access_key_id="${AWS_ACCESS_KEY_ID}" storage.storage_options.secret_access_key="${AWS_SECRET_ACCESS_KEY}" storage.storage_root=my-bucket
+```
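
For wrapper scripts that launch `mlpstorage` from Python, the same `--param` values can be assembled from the environment. This is a sketch under stated assumptions: `build_s3_params` is a hypothetical helper, and the only keys it emits are the ones shown in the README text above.

```python
import os
import shlex


def build_s3_params(bucket: str, env=None) -> list[str]:
    """Build the key=value strings that follow --param (hypothetical helper)."""
    env = os.environ if env is None else env
    return [
        "storage.storage_type=s3",
        f"storage.storage_options.endpoint_url={env['AWS_ENDPOINT_URL']}",
        f"storage.storage_options.access_key_id={env['AWS_ACCESS_KEY_ID']}",
        f"storage.storage_options.secret_access_key={env['AWS_SECRET_ACCESS_KEY']}",
        f"storage.storage_root={bucket}",
    ]


def as_shell_args(params: list[str]) -> str:
    """Quote the values so they can be pasted after a command in a shell."""
    return shlex.join(["--param", *params])
```

A missing environment variable raises `KeyError` here, which surfaces a configuration mistake before the benchmark starts. Passing the list directly to `subprocess.run` avoids shell quoting altogether; `as_shell_args` is only for printing a copy-pasteable command.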
+
+The new parameters have not yet been added to the allowlist in mlpstorage. For the time being, you must use the `-aip` argument to run the `CLOSED` category.
+
+Complete example:
+```bash
+export AWS_ENDPOINT_URL=http://s3.ml.perf:1337 \
+AWS_ACCESS_KEY_ID=123456789 \
+AWS_SECRET_ACCESS_KEY="123/abc" \
+S3_BUCKET=my-bucket \
+DATA_DIR="my-run-123/"
+
+s3_params="storage.storage_type=s3 storage.storage_options.endpoint_url=${AWS_ENDPOINT_URL} storage.storage_options.access_key_id=${AWS_ACCESS_KEY_ID} storage.storage_options.secret_access_key=${AWS_SECRET_ACCESS_KEY} storage.storage_root=${S3_BUCKET}"
+
+# Generate Data
+mlpstorage training datagen --model unet3d -np 8 -dd "${DATA_DIR}" --param dataset.num_files_train=100 $s3_params
+
+# Run the benchmark
+mlpstorage training run --model unet3d --client-host-memory-in-gb 32 --num-accelerators 1 --accelerator-type h100 --results-dir results -dd "${DATA_DIR}" --closed -aip --param dataset.num_files_train=100 $s3_params
+```
+
+### Optional Parameters
+- storage.s3_force_path_style
+- storage.region
+- reader.read_threads
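
To make the `s3_force_path_style` option concrete, the sketch below shows how the two S3 addressing styles place the bucket name in a request URL. It illustrates the general S3 convention only; `object_url` is a hypothetical function, the object key is made up, and nothing here claims to be how DLIO or `s3torchconnector` builds requests internally.

```python
from urllib.parse import urlsplit


def object_url(endpoint: str, bucket: str, key: str, force_path_style: bool = False) -> str:
    """Illustrate where the bucket name goes in each addressing style."""
    parts = urlsplit(endpoint)
    if force_path_style:
        # Path style: the bucket is the first path segment.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket is a subdomain of the endpoint host,
    # so DNS must resolve <bucket>.<host> for this to work.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"
```

An object store reachable only under a single hostname cannot resolve the per-bucket subdomain, which is the case where path-style addressing is required.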
+
+
+### Known Limitations
+- Training data and checkpoints are stored in the same bucket.
+- `mlpstorage checkpointing` does not work.
+- HTTPS endpoints must use a certificate issued by a certificate authority that is trusted by the OS certificate store.
+  - Note: a self-signed certificate may be used if it is trusted by the OS certificate store.

mlpstorage/benchmarks/dlio.py

Lines changed: 1 addition & 1 deletion
@@ -144,7 +144,7 @@ def __init__(self, args, **kwargs):
         if self.args.command not in ("datagen", "datasize"):
             self.verify_benchmark()
 
-        if self.args.command != "datasize":
+        if self.args.command != "datasize" and self.args.data_dir:
             # The datasize command uses --data-dir and needs to generate a command that also calls --data-dir
             # The add_datadir_param would convert --data-dir to --dataset.data_folder which is invalid to
             # mlpstorage.
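
The added `and self.args.data_dir` clause relies on Python truthiness: an empty string or `None` is falsy, so passing `-dd ""` now skips this branch. A minimal sketch of the condition in isolation follows; `should_add_datadir_param` is a hypothetical stand-in, and `SimpleNamespace` stands in for the parsed arguments.

```python
from types import SimpleNamespace


def should_add_datadir_param(args) -> bool:
    """Mirror the condition from the changed line (hypothetical helper)."""
    # bool() is needed because `x and y` returns one of its operands, not True/False.
    return bool(args.command != "datasize" and args.data_dir)


with_prefix = SimpleNamespace(command="run", data_dir="my-run-123/")
no_prefix = SimpleNamespace(command="run", data_dir="")
datasize = SimpleNamespace(command="datasize", data_dir="my-run-123/")
```

Here `with_prefix` satisfies the condition, while `no_prefix` and `datasize` do not.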

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ authors = [
 ]
 requires-python = ">=3.10.0"
 dependencies = [
-    "dlio-benchmark @ git+https://github.com/argonne-lcf/dlio_benchmark.git@mlperf_storage_v2.0",
+    "dlio-benchmark @ git+https://github.com/dpsi/dlio_benchmark.git@darien-s3-refactor",
     "psutil>=5.9",
     "pyarrow"
 ]
