- Fix intro sentence to use active voice
- Remove "on Linux" qualifier from curl example
- Make section headings consistent with numbered list
- Add section for optional SHA-256 checksum verification
File changed: download-data/stac-api/large-assets.md (10 additions, 4 deletions)
@@ -4,13 +4,13 @@ Assets larger than **50 GB** cannot be downloaded with a regular HTTP `GET` or `
 
 The workaround is to use **HTTP range requests**, which bypass the CloudFront limit by fetching the file in sequential chunks directly from the S3 origin.
 
-The actual download is completed in this steps:
+Downloading a large asset involves three steps that we detail in the following subsections:
 
 1. Probe the asset
 2. Download the file in chunks
 3. Optional: Verify SHA‑256 checksum
 
-## Probe the asset
+## 1. Probe the asset
 
 Send a `GET` request with the header `Range: bytes=0-0` to probe the asset.
 The S3 origin responds with `HTTP 206 Partial Content` and includes two useful headers:
@@ -20,7 +20,7 @@ The S3 origin responds with `HTTP 206 Partial Content` and includes two useful h
 |`Content-Range`|`bytes 0-0/<total_size>` — the total size of the object |
 |`x-amz-meta-sha256`| SHA-256 hex digest of the full object (when set by the publisher) |
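
For readers who prefer Python to `curl`, the probe described here could be sketched roughly as follows, using only the standard library. This is not the script from the changed file: the function names `parse_total_size` and `probe` are illustrative assumptions, and the sketch assumes the origin always reports a numeric total size in `Content-Range`.

```python
import re
import urllib.request


def parse_total_size(content_range):
    """Return the total object size from a header such as 'bytes 0-0/123'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", content_range.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {content_range!r}")
    return int(match.group(3))


def probe(url):
    """Request only the first byte and return (total_size, sha256_or_None)."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            raise RuntimeError(f"expected HTTP 206, got {response.status}")
        total_size = parse_total_size(response.headers.get("Content-Range", ""))
        # The checksum header is only present when the publisher set it.
        sha256 = response.headers.get("x-amz-meta-sha256")
    return total_size, sha256
```

Calling `probe(url)` with an asset URL of your own should return the size in bytes and, when available, the expected hex digest.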
 
-Example to probe an asset manually with `curl` on Linux:
+Example to probe an asset manually with `curl`:
 
 ```bash
 curl --silent --show-error --location \
@@ -43,7 +43,7 @@ x-amz-meta-sha256: <hex>
 `HEAD` requests are **also blocked** by CloudFront for objects > 50 GB. Always use `GET` with a `Range` header to probe asset metadata.
 :::
 
-## Download
+## 2. Download the file in chunks
 
 The script below requires **Python 3.6+ and no third-party packages** (stdlib only). It works on Linux, macOS, and Windows.
 
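
The download script itself falls outside this hunk, so the following is only a minimal sketch of the underlying idea, not the author's implementation: split the object into inclusive byte ranges and fetch them one after another with `Range` headers. The names `byte_ranges` and `download_in_chunks`, and the 64 MiB default chunk size, are assumptions chosen for illustration.

```python
import shutil
import urllib.request


def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) offsets that cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_size, chunk_size):
        yield start, min(start + chunk_size, total_size) - 1


def download_in_chunks(url, destination, total_size, chunk_size=64 * 1024 * 1024):
    """Fetch the ranges sequentially and append each one to destination."""
    with open(destination, "wb") as output:
        for start, end in byte_ranges(total_size, chunk_size):
            request = urllib.request.Request(
                url, headers={"Range": f"bytes={start}-{end}"}
            )
            with urllib.request.urlopen(request) as response:
                if response.status != 206:
                    raise RuntimeError(f"expected HTTP 206, got {response.status}")
                # Stream the chunk to disk without holding it all in memory.
                shutil.copyfileobj(response, output)
```

HTTP byte ranges are inclusive on both ends, which is why each `end` is one less than the next `start`.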
@@ -301,6 +301,12 @@ if __name__ == '__main__':
     main()
 ```
 
+## 3. Optional: Verify SHA‑256 checksum
+
+If the asset publisher provided a checksum, the download script automatically verifies it after the download completes. The expected SHA‑256 is read from the `x-amz-meta-sha256` response header during the probe step and compared against a hash of the downloaded file.
+
+If the values do not match, the script exits with an error so you can detect a corrupted or incomplete download before using the file.
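
The comparison described in this new section could look roughly like the sketch below. It is an illustration under assumptions, not the code the script actually runs: `sha256_of_file` and `verify_sha256` are hypothetical names, and the file is hashed in 1 MiB blocks so that a multi-gigabyte download never has to fit in memory.

```python
import hashlib


def sha256_of_file(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True when the file's digest matches the published one."""
    # Header values may differ in case or carry stray whitespace.
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A caller would pass the value of `x-amz-meta-sha256` as `expected_hex` and treat a `False` result as a failed download.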
+
 ::: tip Parallel Downloads
 The script above downloads chunks sequentially, which is simple and reliable. For faster downloads on high-bandwidth connections, you can parallelize by downloading multiple chunks simultaneously using threads or asyncio.
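
One possible shape for the thread-based variant mentioned in this tip is sketched below. It is an assumption-laden illustration rather than part of the documented script: `download_parallel` is a hypothetical name, and `fetch(start, end)` stands for a caller-supplied function that returns the bytes of one inclusive range (for example, a range request against the asset URL).

```python
from concurrent.futures import ThreadPoolExecutor


def download_parallel(ranges, fetch, destination, total_size, workers=4):
    """Fetch byte ranges concurrently and write each at its own offset."""
    # Pre-size the file so every worker can seek to its final position.
    with open(destination, "wb") as output:
        output.truncate(total_size)

    def worker(byte_range):
        start, end = byte_range
        data = fetch(start, end)
        if len(data) != end - start + 1:
            raise IOError(f"short read for bytes {start}-{end}")
        # Each worker uses its own file handle, so seeks never interfere.
        with open(destination, "r+b") as output:
            output.seek(start)
            output.write(data)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the results so that worker exceptions surface here.
        list(pool.map(worker, ranges))
```

Each chunk is held in memory while it is written, so the chunk size multiplied by the number of workers bounds the memory this approach needs.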