From 5696ab2cb58c40b06fe8df9ad8dce8f6483e29da Mon Sep 17 00:00:00 2001 From: aboydnw Date: Wed, 20 May 2026 19:41:54 +0000 Subject: [PATCH 1/5] docs: add obstore tutorial under overview Adds a new tutorial walking through reading Planetary Computer data with obstore (auto-refreshing SAS tokens, range reads, async, library composability). Companion notebook lives in PlanetaryComputerExamples at quickstarts/obstore.ipynb and is wired in via external_docs_config. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/index.md | 1 + docs/overview/obstore.md | 171 ++++++++++++++++++++++++++++ etl/config/external_docs_config.yml | 1 + 3 files changed, 173 insertions(+) create mode 100644 docs/overview/obstore.md diff --git a/docs/index.md b/docs/index.md index 65c87cbe..c0bcc4c8 100644 --- a/docs/index.md +++ b/docs/index.md @@ -16,6 +16,7 @@ Explorer Use VS Code Use GitHub Codespaces Using QGIS +Reading data with obstore Changelog ``` diff --git a/docs/overview/obstore.md b/docs/overview/obstore.md new file mode 100644 index 00000000..1ca602b7 --- /dev/null +++ b/docs/overview/obstore.md @@ -0,0 +1,171 @@ +# Reading Planetary Computer data with obstore + +[obstore](https://developmentseed.org/obstore/) is a Python library for reading and writing cloud object stores (Azure Blob, Amazon S3, Google Cloud Storage) directly through their native APIs. While much of the Planetary Computer ecosystem supports `planetary_computer.sign()` and fsspec, obstore offers a more modern path: SAS tokens refresh automatically, async I/O is built in, and the same store you build for reading bytes can be handed to higher-level libraries like [async-geotiff](https://github.com/developmentseed/async-geotiff), [Lonboard](https://developmentseed.org/lonboard/), and [zarr-python](https://zarr.dev/) without re-authenticating. + +A companion notebook walks through every step end-to-end with live timings. 
[Open in Planetary Computer Hub][nb-hub] · [Open in Colab][nb-colab] · [View on GitHub][nb-github] + +## Install obstore + +obstore works in any Python project — a script, a Jupyter notebook, a FastAPI backend, a Dagster or Airflow pipeline. To get started, install obstore alongside `pystac-client` (for searching the Planetary Computer's STAC API) and the HTTP libraries that power its credential providers: + +```bash +uv add obstore pystac-client requests aiohttp aiohttp_retry +``` + +`requests` powers the sync credential provider; `aiohttp` and `aiohttp_retry` power the async one. Install both unless you know you only need one path. If you already have a project, you can substitute `pip install`, `poetry add`, or whatever your project uses. + +## Connect to a Planetary Computer asset + +The most common starting point is a STAC asset returned from a search. obstore's `PlanetaryComputerCredentialProvider` reads the asset's blob URL and handles SAS token acquisition and refresh for you. + +1. Open the Planetary Computer STAC catalog and pick a scene to work with. + + ```python + import pystac_client + from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider + + catalog = pystac_client.Client.open( + "https://planetarycomputer.microsoft.com/api/stac/v1" + ) + item = next(catalog.search(collections=["naip"], max_items=1).items()) + asset = item.assets["image"] + ``` + +2. Build a credential provider from the asset. + + ```python + provider = PlanetaryComputerCredentialProvider.from_asset(asset) + ``` + +3. Build a store using that provider. The store is your reusable connection to that asset. + + ```python + from obstore.store import AzureStore + + store = AzureStore(credential_provider=provider) + ``` + +That's the full setup. Every read, write, or library handoff below reuses the same `store`. + +## Read bytes from the store + +Once you have a working store, obstore exposes three read operations that map directly to native Azure Blob API calls. 
+ +A note before you read: `from_asset()` scopes the store to that *specific blob* — the asset URL becomes the store's prefix. Reads use `""` as the path. Passing the asset href on top of the prefix would double it up and fail with `BlobNotFound`. + +1. **Read a byte range.** Useful when you only need part of the file — for example, the first ~16 KB of a Cloud Optimized GeoTIFF (the header). Most libraries (async-geotiff, GDAL, rasterio) only need the header to start working. + + ```python + import obstore + + header = obstore.get_range(store, "", start=0, end=16384) + ``` + +2. **Read multiple byte ranges in a single call.** obstore merges nearby ranges and fetches the rest concurrently, which cuts round-trip latency when you need several non-contiguous slices of the same file (e.g. multiple COG tiles). + + ```python + ranges = obstore.get_ranges( + store, "", starts=[0, 65536], ends=[16384, 81920] + ) + ``` + +3. **Read the entire file.** Avoid this for large rasters — NAIP scenes can be 100–500 MB and Azure caps single-stream downloads at ~8–15 MB/s. Range reads and async (below) exist precisely to avoid this scenario. + + ```python + buf = obstore.get(store, "").bytes() + ``` + +## Run reads in parallel + +For multi-file workloads — building a mosaic, fetching all bands across all scenes in an AOI — running reads in parallel is dramatically faster than serial. obstore exposes async equivalents of every read function (`get_async`, `get_range_async`, etc.) that you can compose with `asyncio.gather`. + +Async needs its own credential provider class, `PlanetaryComputerAsyncCredentialProvider`, backed by `aiohttp` instead of `requests`. Same `from_asset()` signature. 
+ +```python +import asyncio +from obstore.auth.planetary_computer import PlanetaryComputerAsyncCredentialProvider + +async_provider = PlanetaryComputerAsyncCredentialProvider.from_asset(asset) +async_store = AzureStore(credential_provider=async_provider) + +async def fetch(start, end): + return await obstore.get_range_async(async_store, "", start=start, end=end) + +results = await asyncio.gather(*[fetch(i * 4096, (i + 1) * 4096) for i in range(8)]) +``` + +The companion notebook benchmarks the speedup against serial reads — typically 3–5× faster in practice. + +## List objects across a container + +The asset-scoped pattern above is the right default, but it doesn't grant `List` permission on the container. To enumerate objects under a prefix ("show me every NAIP scene in Montana in 2023"), build a fresh provider against the container URL instead. + +```python +container_provider = PlanetaryComputerCredentialProvider( + "https://naipeuwest.blob.core.windows.net/naip/" +) +container_store = AzureStore( + account_name="naipeuwest", + container_name="naip", + credential_provider=container_provider, +) + +for batch in obstore.list(container_store, prefix="v002/mt/2023/"): + for entry in batch: + print(entry["path"], entry["size"]) +``` + +## Hand the store to other libraries + +obstore really shines as a foundation. Any library that accepts an [obspec](https://github.com/developmentseed/obspec)-compatible store reads through your authenticated connection without re-doing auth. Open the same NAIP scene as a Cloud Optimized GeoTIFF using async-geotiff: + +```python +from async_geotiff import GeoTIFF + +geotiff = await GeoTIFF.open("", store=async_store) +print(geotiff.transform, geotiff.crs.name) +``` + +The same pattern works for [Lonboard](https://developmentseed.org/lonboard/) visualization and [zarr-python](https://zarr.dev/) datasets. 
See the [obstore Zarr example](https://developmentseed.org/obstore/latest/examples/zarr/) for a Planetary Computer Daymet walkthrough. + +## Migrate from `planetary_computer.sign()` + fsspec + +If you're updating an existing project, here's the side-by-side. The old pattern: + +```python +import planetary_computer +import fsspec + +signed = planetary_computer.sign(asset.href) +with fsspec.open(signed) as f: + data = f.read() +``` + +The obstore equivalent: + +```python +from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider +from obstore.store import AzureStore +import obstore + +provider = PlanetaryComputerCredentialProvider.from_asset(asset) +store = AzureStore(credential_provider=provider) +data = obstore.get(store, "").bytes() +``` + +obstore handles re-signing on expiry, talks to Azure's native blob API instead of going through fsspec's generic HTTP filesystem, and exposes async I/O for parallel reads — all without changing your auth code per request. + +## Use the same code against other clouds + +obstore implements the [obspec](https://github.com/developmentseed/obspec) protocol, so the same read and write calls work against S3 or GCS — only the store constructor changes. Any library built on obspec inherits this portability automatically. 
+ +```python +from obstore.store import S3Store + +s3_store = S3Store(bucket="my-bucket", region="us-west-2") +buf = obstore.get(s3_store, "path/to/object").bytes() +``` + +[nb-hub]: # "TODO: link to notebook on Planetary Computer Hub" +[nb-colab]: # "TODO: link to notebook on Colab" +[nb-github]: # "TODO: link to notebook in companion repo" diff --git a/etl/config/external_docs_config.yml b/etl/config/external_docs_config.yml index 9ef8dd49..96326b5e 100644 --- a/etl/config/external_docs_config.yml +++ b/etl/config/external_docs_config.yml @@ -28,3 +28,4 @@ - file_url: quickstarts/reading-tabular-data.ipynb - file_url: quickstarts/reading-zarr-data.ipynb - file_url: quickstarts/storage.ipynb +- file_url: quickstarts/obstore.ipynb From 863040a0dc26c81a62cd3b96077c1ba1e2b44999 Mon Sep 17 00:00:00 2001 From: aboydnw Date: Wed, 20 May 2026 19:53:42 +0000 Subject: [PATCH 2/5] docs: fill in obstore notebook badge links Drops the Colab badge (off-brand for PC; Hub is the canonical JupyterLab environment) and replaces the TODO placeholders with real URLs: nbgitpuller deep link to PC Hub and a github.com blob link to the companion notebook. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/overview/obstore.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/docs/overview/obstore.md b/docs/overview/obstore.md index 1ca602b7..c1b1d5ae 100644 --- a/docs/overview/obstore.md +++ b/docs/overview/obstore.md @@ -2,7 +2,7 @@ [obstore](https://developmentseed.org/obstore/) is a Python library for reading and writing cloud object stores (Azure Blob, Amazon S3, Google Cloud Storage) directly through their native APIs. 
While much of the Planetary Computer ecosystem supports `planetary_computer.sign()` and fsspec, obstore offers a more modern path: SAS tokens refresh automatically, async I/O is built in, and the same store you build for reading bytes can be handed to higher-level libraries like [async-geotiff](https://github.com/developmentseed/async-geotiff), [Lonboard](https://developmentseed.org/lonboard/), and [zarr-python](https://zarr.dev/) without re-authenticating. -A companion notebook walks through every step end-to-end with live timings. [Open in Planetary Computer Hub][nb-hub] · [Open in Colab][nb-colab] · [View on GitHub][nb-github] +A companion notebook walks through every step end-to-end with live timings. [Open in Planetary Computer Hub][nb-hub] · [View on GitHub][nb-github] ## Install obstore @@ -166,6 +166,5 @@ s3_store = S3Store(bucket="my-bucket", region="us-west-2") buf = obstore.get(s3_store, "path/to/object").bytes() ``` -[nb-hub]: # "TODO: link to notebook on Planetary Computer Hub" -[nb-colab]: # "TODO: link to notebook on Colab" -[nb-github]: # "TODO: link to notebook in companion repo" +[nb-hub]: https://pccompute.westeurope.cloudapp.azure.com/compute/hub/user-redirect/git-pull?repo=https://github.com/microsoft/PlanetaryComputerExamples&urlpath=lab/tree/PlanetaryComputerExamples/quickstarts/obstore.ipynb&branch=main +[nb-github]: https://github.com/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/obstore.ipynb From 6d4c423619c942116e8f8fead89e7453777d28a7 Mon Sep 17 00:00:00 2001 From: aboydnw Date: Wed, 20 May 2026 20:16:42 +0000 Subject: [PATCH 3/5] docs: inline notebook badge URLs and tighten copy Inlines the Hub and GitHub URLs on the badge line and drops the reference-style defs at the bottom. Also picks up the inline copy edits across the body. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/overview/obstore.md | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/docs/overview/obstore.md b/docs/overview/obstore.md index c1b1d5ae..b72db36c 100644 --- a/docs/overview/obstore.md +++ b/docs/overview/obstore.md @@ -1,18 +1,18 @@ # Reading Planetary Computer data with obstore -[obstore](https://developmentseed.org/obstore/) is a Python library for reading and writing cloud object stores (Azure Blob, Amazon S3, Google Cloud Storage) directly through their native APIs. While much of the Planetary Computer ecosystem supports `planetary_computer.sign()` and fsspec, obstore offers a more modern path: SAS tokens refresh automatically, async I/O is built in, and the same store you build for reading bytes can be handed to higher-level libraries like [async-geotiff](https://github.com/developmentseed/async-geotiff), [Lonboard](https://developmentseed.org/lonboard/), and [zarr-python](https://zarr.dev/) without re-authenticating. +[obstore](https://developmentseed.org/obstore/) is a Python library for reading and writing cloud object stores (Azure Blob, Amazon S3, Google Cloud Storage) directly through their native APIs. Using obstore, SAS tokens refresh automatically, async I/O is built in, and the same store you build for reading bytes can be handed to higher-level libraries like [async-geotiff](https://github.com/developmentseed/async-geotiff), [Lonboard](https://developmentseed.org/lonboard/), and [zarr-python](https://zarr.dev/) without re-authenticating. -A companion notebook walks through every step end-to-end with live timings. [Open in Planetary Computer Hub][nb-hub] · [View on GitHub][nb-github] +A companion notebook walks through every step end-to-end with live timings. 
[Open in Planetary Computer Hub](https://pccompute.westeurope.cloudapp.azure.com/compute/hub/user-redirect/git-pull?repo=https://github.com/microsoft/PlanetaryComputerExamples&urlpath=lab/tree/PlanetaryComputerExamples/quickstarts/obstore.ipynb&branch=main) · [View on GitHub](https://github.com/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/obstore.ipynb) ## Install obstore -obstore works in any Python project — a script, a Jupyter notebook, a FastAPI backend, a Dagster or Airflow pipeline. To get started, install obstore alongside `pystac-client` (for searching the Planetary Computer's STAC API) and the HTTP libraries that power its credential providers: +obstore works in any Python project. To get started, install obstore alongside `pystac-client` (for searching the Planetary Computer's STAC API) and the HTTP libraries that power its credential providers: ```bash uv add obstore pystac-client requests aiohttp aiohttp_retry ``` -`requests` powers the sync credential provider; `aiohttp` and `aiohttp_retry` power the async one. Install both unless you know you only need one path. If you already have a project, you can substitute `pip install`, `poetry add`, or whatever your project uses. +`requests` powers the sync credential provider; `aiohttp` and `aiohttp_retry` power the async one. Install both unless you know you only need one path. ## Connect to a Planetary Computer asset @@ -45,15 +45,11 @@ The most common starting point is a STAC asset returned from a search. obstore's store = AzureStore(credential_provider=provider) ``` -That's the full setup. Every read, write, or library handoff below reuses the same `store`. - ## Read bytes from the store Once you have a working store, obstore exposes three read operations that map directly to native Azure Blob API calls. -A note before you read: `from_asset()` scopes the store to that *specific blob* — the asset URL becomes the store's prefix. Reads use `""` as the path. 
Passing the asset href on top of the prefix would double it up and fail with `BlobNotFound`. - -1. **Read a byte range.** Useful when you only need part of the file — for example, the first ~16 KB of a Cloud Optimized GeoTIFF (the header). Most libraries (async-geotiff, GDAL, rasterio) only need the header to start working. +1. **Read a byte range.** Useful when you only need part of the file. For example, the first ~16 KB of a Cloud Optimized GeoTIFF. ```python import obstore @@ -69,7 +65,7 @@ A note before you read: `from_asset()` scopes the store to that *specific blob* ) ``` -3. **Read the entire file.** Avoid this for large rasters — NAIP scenes can be 100–500 MB and Azure caps single-stream downloads at ~8–15 MB/s. Range reads and async (below) exist precisely to avoid this scenario. +3. **Read the entire file.** Avoid this for large rasters. Range reads and async (below) exist to avoid this scenario. ```python buf = obstore.get(store, "").bytes() @@ -77,7 +73,7 @@ A note before you read: `from_asset()` scopes the store to that *specific blob* ## Run reads in parallel -For multi-file workloads — building a mosaic, fetching all bands across all scenes in an AOI — running reads in parallel is dramatically faster than serial. obstore exposes async equivalents of every read function (`get_async`, `get_range_async`, etc.) that you can compose with `asyncio.gather`. +For multi-file workloads like building a mosaic or fetching all bands across all scenes in an AOI, running reads in parallel is faster. obstore exposes async equivalents of every read function (`get_async`, `get_range_async`, etc.) that you can compose with `asyncio.gather`. Async needs its own credential provider class, `PlanetaryComputerAsyncCredentialProvider`, backed by `aiohttp` instead of `requests`. Same `from_asset()` signature. 
@@ -94,11 +90,11 @@ async def fetch(start, end): results = await asyncio.gather(*[fetch(i * 4096, (i + 1) * 4096) for i in range(8)]) ``` -The companion notebook benchmarks the speedup against serial reads — typically 3–5× faster in practice. +This is typically 3–5× faster in practice. ## List objects across a container -The asset-scoped pattern above is the right default, but it doesn't grant `List` permission on the container. To enumerate objects under a prefix ("show me every NAIP scene in Montana in 2023"), build a fresh provider against the container URL instead. +To enumerate objects under a prefix ("show me every NAIP scene in Montana in 2023"), build a fresh provider against the container URL instead. ```python container_provider = PlanetaryComputerCredentialProvider( @@ -117,7 +113,7 @@ for batch in obstore.list(container_store, prefix="v002/mt/2023/"): ## Hand the store to other libraries -obstore really shines as a foundation. Any library that accepts an [obspec](https://github.com/developmentseed/obspec)-compatible store reads through your authenticated connection without re-doing auth. Open the same NAIP scene as a Cloud Optimized GeoTIFF using async-geotiff: +Any library that accepts an [obspec](https://github.com/developmentseed/obspec)-compatible store reads through your authenticated connection without re-doing auth. Open the same NAIP scene as a Cloud Optimized GeoTIFF using async-geotiff: ```python from async_geotiff import GeoTIFF @@ -157,7 +153,7 @@ obstore handles re-signing on expiry, talks to Azure's native blob API instead o ## Use the same code against other clouds -obstore implements the [obspec](https://github.com/developmentseed/obspec) protocol, so the same read and write calls work against S3 or GCS — only the store constructor changes. Any library built on obspec inherits this portability automatically. 
+obstore implements the [obspec](https://github.com/developmentseed/obspec) protocol, so the same read and write calls work against S3 or GCS. Any library built on obspec inherits this portability automatically. ```python from obstore.store import S3Store @@ -166,5 +162,3 @@ s3_store = S3Store(bucket="my-bucket", region="us-west-2") buf = obstore.get(s3_store, "path/to/object").bytes() ``` -[nb-hub]: https://pccompute.westeurope.cloudapp.azure.com/compute/hub/user-redirect/git-pull?repo=https://github.com/microsoft/PlanetaryComputerExamples&urlpath=lab/tree/PlanetaryComputerExamples/quickstarts/obstore.ipynb&branch=main -[nb-github]: https://github.com/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/obstore.ipynb From bee26a1d866fd576466133584bfc9310d1da3298 Mon Sep 17 00:00:00 2001 From: aboydnw Date: Wed, 20 May 2026 20:17:49 +0000 Subject: [PATCH 4/5] docs: drop GitHub badge from obstore tutorial Hub link is the canonical way to open the notebook; the GitHub view duplicates what the docs site already renders. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/overview/obstore.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/overview/obstore.md b/docs/overview/obstore.md index b72db36c..fc76a867 100644 --- a/docs/overview/obstore.md +++ b/docs/overview/obstore.md @@ -2,7 +2,7 @@ [obstore](https://developmentseed.org/obstore/) is a Python library for reading and writing cloud object stores (Azure Blob, Amazon S3, Google Cloud Storage) directly through their native APIs. Using obstore, SAS tokens refresh automatically, async I/O is built in, and the same store you build for reading bytes can be handed to higher-level libraries like [async-geotiff](https://github.com/developmentseed/async-geotiff), [Lonboard](https://developmentseed.org/lonboard/), and [zarr-python](https://zarr.dev/) without re-authenticating. -A companion notebook walks through every step end-to-end with live timings. 
[Open in Planetary Computer Hub](https://pccompute.westeurope.cloudapp.azure.com/compute/hub/user-redirect/git-pull?repo=https://github.com/microsoft/PlanetaryComputerExamples&urlpath=lab/tree/PlanetaryComputerExamples/quickstarts/obstore.ipynb&branch=main) · [View on GitHub](https://github.com/microsoft/PlanetaryComputerExamples/blob/main/quickstarts/obstore.ipynb) +A companion notebook walks through every step end-to-end with live timings. [Open in Planetary Computer Hub](https://pccompute.westeurope.cloudapp.azure.com/compute/hub/user-redirect/git-pull?repo=https://github.com/microsoft/PlanetaryComputerExamples&urlpath=lab/tree/PlanetaryComputerExamples/quickstarts/obstore.ipynb&branch=main) ## Install obstore From 0e839a7a1d8a90c088b57221b8482f50cb9fc75e Mon Sep 17 00:00:00 2001 From: aboydnw Date: Fri, 22 May 2026 19:30:08 +0000 Subject: [PATCH 5/5] docs: tighten obstore composability claim in tutorial Drops Lonboard reference (no obstore integration in Lonboard) and notes that zarr-python access goes through the zarr.storage.ObjectStore adapter rather than direct hand-off. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/overview/obstore.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/overview/obstore.md b/docs/overview/obstore.md index fc76a867..ddef904b 100644 --- a/docs/overview/obstore.md +++ b/docs/overview/obstore.md @@ -122,7 +122,7 @@ geotiff = await GeoTIFF.open("", store=async_store) print(geotiff.transform, geotiff.crs.name) ``` -The same pattern works for [Lonboard](https://developmentseed.org/lonboard/) visualization and [zarr-python](https://zarr.dev/) datasets. See the [obstore Zarr example](https://developmentseed.org/obstore/latest/examples/zarr/) for a Planetary Computer Daymet walkthrough. +[zarr-python](https://zarr.dev/) works through a thin adapter (`zarr.storage.ObjectStore` wraps your obstore store). 
See the [obstore Zarr example](https://developmentseed.org/obstore/latest/examples/zarr/) for a Planetary Computer Daymet walkthrough. ## Migrate from `planetary_computer.sign()` + fsspec
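
Note on the "Run reads in parallel" section added by this series: the pattern it describes (eight 4096-byte range reads composed with `asyncio.gather`) can be tried without Azure access. The following is a minimal sketch using only the standard library. `BLOB`, `fake_get_range`, and `chunk_ranges` are hypothetical stand-ins invented for illustration; they are not part of obstore, and the fixed 50 ms delay is an assumed placeholder for network latency, not a measurement.

```python
import asyncio
import time

# Hypothetical stand-in for a remote blob (32 KiB of fake data); not an obstore object.
BLOB = bytes(range(256)) * 128


async def fake_get_range(start: int, end: int) -> bytes:
    """Simulate one ranged read: wait for an assumed latency, then return the slice."""
    await asyncio.sleep(0.05)
    return BLOB[start:end]


def chunk_ranges(total: int, size: int) -> list[tuple[int, int]]:
    """Split [0, total) into half-open (start, end) ranges of at most `size` bytes."""
    return [(s, min(s + size, total)) for s in range(0, total, size)]


async def read_serial(ranges):
    # Each read waits for the previous one to finish.
    return [await fake_get_range(s, e) for s, e in ranges]


async def read_parallel(ranges):
    # All reads are scheduled at once; gather returns results in input order.
    return await asyncio.gather(*(fake_get_range(s, e) for s, e in ranges))


async def main():
    ranges = chunk_ranges(len(BLOB), 4096)
    t0 = time.perf_counter()
    serial = await read_serial(ranges)
    t1 = time.perf_counter()
    parallel = await read_parallel(ranges)
    t2 = time.perf_counter()
    return serial, parallel, t1 - t0, t2 - t1


if __name__ == "__main__":
    serial, parallel, serial_s, parallel_s = asyncio.run(main())
    print(f"serial: {serial_s:.2f}s, parallel: {parallel_s:.2f}s")
```

With a real store, `fake_get_range` would presumably be replaced by the tutorial's `obstore.get_range_async(async_store, "", start=..., end=...)` call. The speedup seen here comes from the simulated latency alone, so it says nothing about the 3-5x figure quoted in the tutorial.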