|
12 | 12 | "[Introduction](#introduction)\n", |
13 | 13 | "\n", |
14 | 14 | "[Query-only modes](#query-only-modes)\n", |
15 | | - "- [`ls`/`query` mode](#ls-query)\n", |
16 | | - "- [`s3_ls`/`s3_query` mode](#s3-ls-s3-query)\n", |
| 15 | + "- [`ls`/`query` mode](#ls/query-mode)\n", |
| 16 | + "- [`s3_ls`/`s3_query` mode](#s3-ls/s3-query-mode)\n", |
17 | 17 | "\n", |
18 | 18 | "[Direct download modes](#direct-download-modes)\n", |
19 | | - "- [`download` mode](#download)\n", |
20 | | - "- [`download_ifspace` mode](#download-ifspace)\n", |
21 | | - "- [`download_subset` mode](#download-subset)\n", |
| 19 | + "- [`download` mode](#download-mode)\n", |
| 20 | + "- [`download_ifspace` mode](#download-ifspace-mode)\n", |
| 21 | + "- [`download_subset` mode](#download-subset-mode)\n", |
22 | 22 | "\n", |
23 | 23 | "[In-cloud only access modes](#in-cloud-only-access-modes)\n", |
24 | | - "- [`s3_open` mode](#s3-open)\n", |
25 | | - "- [`s3_open_fsspec` mode](#s3-open-fsspec)\n", |
26 | | - "- [`s3_get` mode](#s3-get)\n", |
27 | | - "- [`s3_get_ifspace` mode](#s3-get-ifspace)\n", |
| 24 | + "- [`s3_open` mode](#s3-open-mode)\n", |
| 25 | + "- [`s3_open_fsspec` mode](#s3-open-fsspec-mode)\n", |
| 26 | + "- [`s3_get` mode](#s3-get-mode)\n", |
| 27 | + "- [`s3_get_ifspace` mode](#s3-get-ifspace-mode)\n", |
28 | 28 | "\n", |
29 | 29 | "[Time comparison of access modes](#time-comparison-of-access-modes)\n", |
30 | 30 | "\n", |
|
62 | 62 | "\n", |
63 | 63 | "These modes return the URLs (`ls`/`query`) or S3 file paths (`s3_ls`/`s3_query`) to access the ECCO output. These modes only work with `ecco_podaac_access` (not `ecco_podaac_to_xrdataset`), since we are not opening a dataset, just querying the location of the data.\n", |
64 | 64 | "\n", |
65 | | - "```{note}\n", |
66 | | - ">The `ls` and `query` modes are interchangeable and have the same functionality, just by different names. The same is true for `s3_ls` and `s3_query`.\n", |
67 | | - "```\n", |
68 | 65 | "\n", |
69 | | - "(ls-query)=\n", |
| 66 | + "> The `ls` and `query` modes are interchangeable and have the same functionality, just by different names. The same is true for `s3_ls` and `s3_query`.\n", |
| 67 | + "\n", |
| 68 | + "\n", |
70 | 69 | "### `ls`/`query` mode" |
71 | 70 | ] |
72 | 71 | }, |
|
139 | 138 | "id": "91024eb5-5693-4561-9faf-c5d98fee8058", |
140 | 139 | "metadata": {}, |
141 | 140 | "source": [ |
142 | | - "(s3-ls-s3-query)=\n", |
143 | 141 | "### `s3_ls`/`s3_query` mode\n", |
144 | 142 | "\n", |
145 | 143 | "You can use the `s3_ls`/`s3_query` mode to find the `S3` bucket file paths for AWS in-cloud access:" |
|
1701 | 1699 | "id": "835394df-0193-4f61-bb86-2ec305a7201e", |
1702 | 1700 | "metadata": {}, |
1703 | 1701 | "source": [ |
1704 | | - "(direct-download-modes)=\n", |
1705 | 1702 | "## Direct download modes\n", |
1706 | 1703 | "\n", |
1707 | | - "(download)=\n", |
1708 | 1704 | "### `download` mode\n", |
1709 | 1705 | "\n", |
1710 | 1706 | "The `download` mode directly downloads the queried files under a root directory of your choosing, creating the directory if needed. If `ecco_podaac_access` is called using this mode, the dictionary returned includes list(s) of the downloaded files that can be passed to `xarray.open_mfdataset` (or `xarray.open_dataset`, one file at a time). If `ecco_podaac_to_xrdataset` is used, the `xarray.open_mfdataset` step is included and an `xarray` Dataset is returned." |
|
3317 | 3313 | "id": "1f741975-9f61-4980-9fa8-2e34e2ecc880", |
3318 | 3314 | "metadata": {}, |
3319 | 3315 | "source": [ |
3320 | | - "(download-ifspace)=\n", |
3321 | 3316 | "### `download_ifspace` mode\n", |
3322 | 3317 | "\n", |
3323 | 3318 | "This mode is similar to `download`, but it will also query how much storage is available at the target download location before carrying out downloads, and returns an error if the space to be occupied by the downloaded files is more than a specified fraction of available storage. The function also takes into account if some or all of the queried files are already on disk, and therefore do not need to be downloaded again." |
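The `download_ifspace` check described above can be sketched as follows. This is a minimal illustration using only the standard library. It is not the `ecco_access` implementation. The helper name `check_download_space`, the `max_avail_frac` parameter, and the `file_sizes` input (local path mapped to expected size in bytes) are hypothetical. The sketch also simplifies "already on disk" to mean that the path exists.

```python
import os
import shutil


def check_download_space(file_sizes, download_dir, max_avail_frac=0.5):
    """Hypothetical sketch of a download_ifspace-style storage check.

    file_sizes: dict mapping local file path -> expected size in bytes.
    Files already on disk are skipped (simplification: existence only).
    Raises RuntimeError if the bytes still to download exceed
    max_avail_frac of the free space at download_dir.
    Returns the list of paths that still need downloading.
    """
    # Only files not already present need to be downloaded
    to_download = [p for p in file_sizes if not os.path.exists(p)]
    needed = sum(file_sizes[p] for p in to_download)

    # Create the target directory if needed, then query its free space
    os.makedirs(download_dir, exist_ok=True)
    free = shutil.disk_usage(download_dir).free

    if needed > max_avail_frac * free:
        raise RuntimeError(
            f"Download needs {needed} bytes, more than "
            f"{max_avail_frac:.0%} of the {free} bytes available"
        )
    return to_download
```

Raising an error, instead of silently downloading a partial set of files, matches the behavior described for this mode. The caller can then narrow the query or free up space.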
|
3416 | 3411 | "id": "9115627b-2406-49f6-9780-e0cf1fbf5cfe", |
3417 | 3412 | "metadata": {}, |
3418 | 3413 | "source": [ |
3419 | | - "(download-subset)=\n", |
3420 | 3414 | "### `download_subset` mode\n", |
3421 | 3415 | "\n", |
3422 | 3416 | "The `download_subset` mode is essentially a wrapper for the `ecco_podaac_download_subset` function, which uses Opendap to allow spatial, temporal, and variable-based subsetting of ECCO datasets and granules at the download stage. Depending on the size of the source dataset (e.g., whether the dataset has a depth dimension or not), this mode may be faster or slower than downloading the full granule files; it will almost certainly be slower than using mode = `s3_open_fsspec` when you have the `json` files available. But it can be a space- and time-saver when you are working on your local machine and not in the cloud.\n", |
|
4951 | 4945 | "source": [ |
4952 | 4946 | "## In-cloud only access modes\n", |
4953 | 4947 | "\n", |
4954 | | - "(s3-open)=\n", |
4955 | 4948 | "### `s3_open` mode\n", |
4956 | 4949 | "\n", |
4957 | 4950 | "If you are working in the AWS cloud (in region `us-west-2`), you can open files from `S3` storage without downloading them; this is called \"direct access\"." |
|
4991 | 4984 | "id": "55e7d4c8-5e72-400c-8988-fe1367af361d", |
4992 | 4985 | "metadata": {}, |
4993 | 4986 | "source": [ |
4994 | | - "(s3-open-fsspec)=\n", |
4995 | 4987 | "### `s3_open_fsspec` mode\n", |
4996 | 4988 | "\n", |
4997 | 4989 | "The `s3_open` mode allows you to access data \"remotely\" from `S3`, but it is usually slower than downloading the data. However, the [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) and [kerchunk](https://fsspec.github.io/kerchunk/) libraries provide an [efficient way to access data](https://medium.com/pangeo/fake-it-until-you-make-it-reading-goes-netcdf4-data-on-aws-s3-as-zarr-for-rapid-data-access-61e33f8fe685) by storing pointers to data chunks in `json` files. These files have been produced for the ECCO datasets, and by using mode = `s3_open_fsspec` we can access the data much more quickly without downloading it! \n", |
|
5786 | 5778 | "id": "cef69c04-7115-4712-b55b-f60aa0f7d9b9", |
5787 | 5779 | "metadata": {}, |
5788 | 5780 | "source": [ |
5789 | | - "(s3-get)=\n", |
5790 | 5781 | "### `s3_get` mode\n", |
5791 | 5782 | "\n", |
5792 | 5783 | "The `s3_get` mode functions much like the `download` mode, except files are accessed in-cloud and downloaded to your local instance. If used with `ecco_podaac_access`, a dictionary containing the file paths/names is returned that can then be used to open an `xarray` Dataset." |
|
5947 | 5938 | "id": "e114f7de-a9da-40e3-9bea-3aeeeeb54b7b", |
5948 | 5939 | "metadata": {}, |
5949 | 5940 | "source": [ |
5950 | | - "(s3-get-ifspace)=\n", |
5951 | 5941 | "### `s3_get_ifspace` mode\n", |
5952 | 5942 | "\n", |
5953 | 5943 | "This mode is similar to `s3_get`, but it will also query how much storage is available at the target download location before carrying out downloads. If the space to be occupied by the downloaded files is more than a specified fraction of available storage, the files are opened remotely (using `s3_open`), rather than using `s3_get`." |
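The fallback logic in `s3_get_ifspace` differs from `download_ifspace` in one respect: it switches modes instead of raising an error. A minimal stdlib sketch of that decision follows. The function name `choose_s3_mode` and the `max_avail_frac` parameter are hypothetical and do not come from `ecco_access`.

```python
import shutil


def choose_s3_mode(total_bytes, download_dir, max_avail_frac=0.5):
    """Hypothetical sketch of the s3_get_ifspace decision.

    Returns "s3_get" (copy files to the local instance) if total_bytes fits
    within max_avail_frac of the free space at download_dir; otherwise
    returns "s3_open" (open the files remotely from S3 instead).
    """
    free = shutil.disk_usage(download_dir).free
    return "s3_get" if total_bytes <= max_avail_frac * free else "s3_open"
```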
|
5999 | 5989 | "source": [ |
6000 | 5990 | "## Time comparison of access modes\n", |
6001 | 5991 | "\n", |
6002 | | - "Based on the examples above, here are the wall times for generating `ds_dict` for each function (except `download_subset` which used a different set of data):\n", |
| 5992 | + "Based on the examples above, here are some wall times for generating `ds_dict` using different modes, on a `large` instance in the AWS Cloud:\n", |
6003 | 5993 | "\n", |
6004 | 5994 | "- `download`: 5.52 s\n", |
6005 | 5995 | "\n", |
|