File: download-data/stac-api/large-assets.md (+16 -6: 16 additions, 6 deletions)
@@ -4,19 +4,23 @@ Assets larger than **50 GB** cannot be downloaded with a regular HTTP `GET` or `
The workaround is to use **HTTP range requests**, which bypass the CloudFront limit by fetching the file in sequential chunks directly from the S3 origin.
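
To make the chunking idea concrete, the following is a minimal Python sketch of how an object of known size could be split into inclusive byte ranges, one per `Range` header. The names `chunk_ranges` and `range_header` and the sizes in the demo are illustrative assumptions; they are not taken from the download script in this document.

```python
from typing import Iterator, Tuple


def chunk_ranges(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets covering total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < total_size:
        # HTTP byte ranges are inclusive on both ends, hence the -1.
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start: int, end: int) -> str:
    """Format one inclusive byte range as a Range header value."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    # Hypothetical 150-byte object fetched in 64-byte chunks.
    for start, end in chunk_ranges(150, 64):
        print(range_header(start, end))
```

Because the ranges are contiguous and non-overlapping, writing each response body to the output file in order reproduces the original object.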

-## How It Works
+Downloading a large asset involves three steps that we detail in the following subsections:

-A `GET` request with the header `Range: bytes=0-0` is sent first to probe the asset.
+1. Probe the asset
+2. Download the file in chunks
+3. Optional: Verify SHA‑256 checksum
+
+## 1. Probe the asset
+
+Send a `GET` request with the header `Range: bytes=0-0` to probe the asset.
The S3 origin responds with `HTTP 206 Partial Content` and includes two useful headers:
|`Content-Range`|`bytes 0-0/<total_size>` — the total size of the object |
|`x-amz-meta-sha256`| SHA-256 hex digest of the full object (when set by the publisher) |
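
As an illustration of how the `Content-Range` value from the probe could be turned into a total size, here is a small Python sketch. The function name `total_size_from_content_range` is hypothetical, and the parsing shown is an assumption about one reasonable approach rather than the logic used by the download script.

```python
import re
from typing import Optional

# Matches values such as "bytes 0-0/53687091200" or "bytes 0-0/*".
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def total_size_from_content_range(value: str) -> Optional[int]:
    """Return the total object size from a Content-Range value, or None if unknown."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range value: {value!r}")
    total = match.group(3)
    # The HTTP specification allows "*" when the complete length is unknown.
    return None if total == "*" else int(total)
```

A caller would typically treat a `None` result or a `ValueError` as a failed probe, since the total size is needed to plan the chunked download.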

-The file is then downloaded chunk by chunk using `Range: bytes=<start>-<end>`, and the final file is verified against the expected size and checksum.
-
-You can probe an asset manually with `curl`:
+Example to probe an asset manually with `curl`:

```bash
curl --silent --show-error --location \
@@ -39,7 +43,7 @@ x-amz-meta-sha256: <hex>
`HEAD` requests are **also blocked** by CloudFront for objects > 50 GB. Always use `GET` with a `Range` header to probe asset metadata.
:::

-## Download Script
+## 2. Download the file in chunks

The script below requires **Python 3.6+ and no third-party packages** (stdlib only). It works on Linux, macOS, and Windows.
@@ -297,6 +301,12 @@ if __name__ == '__main__':
    main()
```

+## 3. Optional: Verify SHA‑256 checksum
+
+If the asset publisher provided a checksum, the download script automatically verifies it after the download completes. The expected SHA‑256 is read from the `x-amz-meta-sha256` response header during the probe step and compared against a hash of the downloaded file.
+
+If the values do not match, the script exits with an error so you can detect a corrupted or incomplete download before using the file.
+
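
For readers who want to repeat the check by hand, the comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script's actual code: the helper names `sha256_of_file` and `checksum_matches` are hypothetical, and the expected digest is assumed to be the hex string from `x-amz-meta-sha256`.

```python
import hashlib


def sha256_of_file(path: str, block_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size blocks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the expected hex digest, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in blocks matters here because the assets in question exceed 50 GB; hashing the whole file in one `read()` call would not be practical.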
::: tip Parallel Downloads
The script above downloads chunks sequentially, which is simple and reliable. For faster downloads on high-bandwidth connections, you can parallelize by downloading multiple chunks simultaneously using threads or asyncio.