Commit 8733748

Merge pull request #131 from geoadmin/update_large_asset_download_doc
Update large-assets.md
2 parents: ea81d82 + b7c66ef

1 file changed: `download-data/stac-api/large-assets.md` (16 additions, 6 deletions)
````diff
@@ -4,19 +4,23 @@ Assets larger than **50 GB** cannot be downloaded with a regular HTTP `GET` or `
 
 The workaround is to use **HTTP range requests**, which bypass the CloudFront limit by fetching the file in sequential chunks directly from the S3 origin.
 
-## How It Works
+Downloading a large asset involves three steps that we detail in the following subsections:
 
-A `GET` request with the header `Range: bytes=0-0` is sent first to probe the asset.
+1. Probe the asset
+2. Download the file in chunks
+3. Optional: Verify SHA‑256 checksum
+
+## 1. Probe the asset
+
+Send a `GET` request with the header `Range: bytes=0-0` to probe the asset.
 The S3 origin responds with `HTTP 206 Partial Content` and includes two useful headers:
 
 | Header              | Value                                                             |
 | ------------------- | ----------------------------------------------------------------- |
 | `Content-Range`     | `bytes 0-0/<total_size>` — the total size of the object           |
 | `x-amz-meta-sha256` | SHA-256 hex digest of the full object (when set by the publisher) |
 
-The file is then downloaded chunk by chunk using `Range: bytes=<start>-<end>`, and the final file is verified against the expected size and checksum.
-
-You can probe an asset manually with `curl`:
+Example to probe an asset manually with `curl`:
 
 ```bash
 curl --silent --show-error --location \
````
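The probe step in the hunk above can also be sketched with Python's standard library; this is only an illustration of the technique, and the helper names are not part of the documented script:

```python
import urllib.request


def parse_probe_headers(headers):
    """Extract (total_size, sha256_or_None) from the probe response headers."""
    # Content-Range looks like "bytes 0-0/<total_size>".
    total_size = int(headers["Content-Range"].split("/")[1])
    # Optional checksum header, present only when the publisher set it.
    sha256 = headers.get("x-amz-meta-sha256")
    return total_size, sha256


def probe(url):
    """Send a one-byte range request; CloudFront allows this even for
    objects larger than 50 GB because only byte 0 is transferred."""
    req = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(req) as resp:
        # S3 answers a satisfiable range request with 206 Partial Content.
        if resp.status != 206:
            raise RuntimeError("expected 206 Partial Content, got %s" % resp.status)
        return parse_probe_headers(resp.headers)
```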
````diff
@@ -39,7 +43,7 @@ x-amz-meta-sha256: <hex>
 `HEAD` requests are **also blocked** by CloudFront for objects > 50 GB. Always use `GET` with a `Range` header to probe asset metadata.
 :::
 
-## Download Script
+## 2. Download the file in chunks
 
 The script below requires **Python 3.6+ and no third-party packages** (stdlib only). It works on Linux, macOS, and Windows.
 
````
````diff
@@ -297,6 +301,12 @@ if __name__ == '__main__':
     main()
 ```
 
+## 3. Optional: Verify SHA‑256 checksum
+
+If the asset publisher provided a checksum, the download script automatically verifies it after the download completes. The expected SHA‑256 is read from the `x-amz-meta-sha256` response header during the probe step and compared against a hash of the downloaded file.
+
+If the values do not match, the script exits with an error so you can detect a corrupted or incomplete download before using the file.
+
 ::: tip Parallel Downloads
 The script above downloads chunks sequentially, which is simple and reliable. For faster downloads on high-bandwidth connections, you can parallelize by downloading multiple chunks simultaneously using threads or asyncio.
 
````