---
layout: single
author_profile: false
title: Zarr Components
sidebar:
  title: Components
  nav: sidebar
---

Zarr consists of several components, both abstract and concrete, spanning both the physical storage layer and the conceptual structural layer. All Zarr-related projects use the Zarr Protocol (and hence its data model), described by the Zarr Specification, but are otherwise free to implement the other layers however they wish.

Abstract components

These abstract components together describe what kind of data can be stored in Zarr and how to store it, without assuming a particular programming language or storage system.

Protocol: All Zarr-related projects use the Zarr Protocol, described in the Zarr Specification, which enables the transfer of chunked array data and metadata between devices (or between memory regions of the same device). The protocol works by serializing and deserializing array data as byte streams and storing both this data and the accompanying metadata via an Abstract Key-Value Store Interface. A system of Codecs describes the encoding and serialization steps.
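To make the protocol concrete, here is a minimal sketch assuming a plain Python `dict` as the key-value store and a two-step codec chain (int32 values to little-endian bytes, then gzip). The metadata fields and the `c/<index>` chunk keys are loosely modeled on Zarr v3 conventions; the helper names `encode_chunk` and `decode_chunk` are hypothetical, and the sketch skips much of what the specification requires (fill values for missing chunks, multidimensional grids, validation).

```python
import gzip
import json
import struct

# A plain dict stands in for the abstract key-value store interface.
store: dict[str, bytes] = {}

CHUNK_SIZE = 3
GZIP_LEVEL = 5


def encode_chunk(values: list[int]) -> bytes:
    """Codec chain, applied in order: int32 values -> little-endian bytes -> gzip."""
    raw = struct.pack(f"<{len(values)}i", *values)
    return gzip.compress(raw, compresslevel=GZIP_LEVEL)


def decode_chunk(data: bytes) -> list[int]:
    """The same chain in reverse: gunzip -> little-endian bytes -> int32 values."""
    raw = gzip.decompress(data)
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


values = [10, 20, 30, 40, 50, 60]

# Array metadata, loosely modeled on a Zarr v3 zarr.json document.
metadata = {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [len(values)],
    "data_type": "int32",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [CHUNK_SIZE]}},
    "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
    "fill_value": 0,
    "codecs": [
        {"name": "bytes", "configuration": {"endian": "little"}},
        {"name": "gzip", "configuration": {"level": GZIP_LEVEL}},
    ],
}
store["zarr.json"] = json.dumps(metadata).encode("utf-8")

# Each encoded chunk is stored under "c/<index>".
for start in range(0, len(values), CHUNK_SIZE):
    store[f"c/{start // CHUNK_SIZE}"] = encode_chunk(values[start : start + CHUNK_SIZE])

# Reading back needs only the store: metadata first, then the chunks it describes.
meta = json.loads(store["zarr.json"])
chunk_len = meta["chunk_grid"]["configuration"]["chunk_shape"][0]
n_chunks = -(-meta["shape"][0] // chunk_len)  # ceiling division
restored = [v for j in range(n_chunks) for v in decode_chunk(store[f"c/{j}"])]
```

Because both data and metadata pass through the same key-value interface, any backend that can store bytes under string keys can hold the array.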

Data Model: The specification's description of the Stored Representation implies a particular data model, based on the HDF Abstract Data Model. It consists of a hierarchical tree of groups and arrays, with optional arbitrary metadata at every node. This model is completely domain-agnostic.
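The hierarchy can be pictured as a tree of nodes. The sketch below is a hypothetical in-memory model (the `Group`, `Array`, and `walk` names are illustrative and not part of any Zarr library): groups hold named children, arrays carry a shape and data type, and every node has a free-form `attributes` dict for arbitrary metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Array:
    shape: tuple[int, ...]
    dtype: str
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    members: dict[str, Group | Array] = field(default_factory=dict)
    attributes: dict[str, Any] = field(default_factory=dict)


def walk(node: Group | Array, path: str = "") -> Iterator[tuple[str, Group | Array]]:
    """Yield (path, node) for every node in the tree, depth-first."""
    yield path or "/", node
    if isinstance(node, Group):
        for name, child in node.members.items():
            yield from walk(child, f"{path}/{name}")


# Nothing here is domain-specific: attributes are arbitrary key-value metadata.
root = Group(attributes={"title": "example dataset"})
root.members["temperature"] = Array(shape=(365, 180, 360), dtype="float32", attributes={"units": "K"})
masks = Group()
masks.members["land"] = Array(shape=(180, 360), dtype="bool")
root.members["masks"] = masks

for path, node in walk(root):
    print(path, type(node).__name__)
# / Group
# /temperature Array
# /masks Group
# /masks/land Array
```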

Format: If the keys in the abstract key-value store interface are mapped unaltered to paths in a POSIX filesystem or to prefixes in object storage, the data written to storage will follow the "Native Zarr Format". Most, but not all, Zarr implementations serialize to this format.
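In other words, "unaltered" means a store key such as `temperature/c/0/0` becomes the relative file path or object key verbatim. A minimal sketch of both mappings follows; the function names and the `/data/example.zarr` and `my-bucket/example.zarr` locations are hypothetical.

```python
from pathlib import PurePosixPath


def key_to_posix_path(root: str, key: str) -> PurePosixPath:
    """Filesystem: each '/'-separated component of the key becomes a path component."""
    return PurePosixPath(root, *key.split("/"))


def key_to_object_key(prefix: str, key: str) -> str:
    """Object storage: the key is appended verbatim to a prefix."""
    return f"{prefix.rstrip('/')}/{key}"


for key in ["zarr.json", "temperature/zarr.json", "temperature/c/0/0"]:
    print(key_to_posix_path("/data/example.zarr", key), key_to_object_key("my-bucket/example.zarr", key))
```

A store that instead packed all keys into a single archive or database would still satisfy the key-value interface, but its on-disk layout would not be the Native Zarr Format.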

Extensions: Zarr provides a core set of generally useful features, but extensions to this core are encouraged. These might take the form of domain-specific metadata conventions, new codecs, or additions to the data model via extension points. Extensions may remain purely abstract conventions or be enforced by implementations or client libraries as they see fit, but they should generally be opt-in.
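As one illustration of the opt-in principle applied to new codecs, the sketch below shows a hypothetical codec registry in which an extension codec becomes usable only after it is explicitly registered. The `CodecRegistry` class and the `example.zlib` codec name are invented for this example and do not reproduce any particular library's registry API.

```python
import zlib
from typing import Callable, Dict, Tuple

Transform = Callable[[bytes], bytes]


class CodecRegistry:
    """Hypothetical registry: a codec is usable only after it is explicitly registered."""

    def __init__(self) -> None:
        self._codecs: Dict[str, Tuple[Transform, Transform]] = {}

    def register(self, name: str, encode: Transform, decode: Transform) -> None:
        self._codecs[name] = (encode, decode)

    def _lookup(self, name: str) -> Tuple[Transform, Transform]:
        if name not in self._codecs:
            # Fail loudly instead of guessing how unknown bytes were encoded.
            raise KeyError(f"codec {name!r} is not registered; opt in to the extension first")
        return self._codecs[name]

    def encode(self, name: str, data: bytes) -> bytes:
        return self._lookup(name)[0](data)

    def decode(self, name: str, data: bytes) -> bytes:
        return self._lookup(name)[1](data)


registry = CodecRegistry()

# Before opting in, data that names the extension codec cannot be decoded.
try:
    registry.decode("example.zlib", b"")
    opted_out_error = None
except KeyError as exc:
    opted_out_error = exc

# Opting in: register the hypothetical "example.zlib" codec.
registry.register("example.zlib", zlib.compress, zlib.decompress)
roundtrip = registry.decode("example.zlib", registry.encode("example.zlib", b"hello zarr"))
```

Raising an error for unregistered codecs keeps a reader that has not opted in from silently misinterpreting extension data.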

Concrete components

The abstract components can be implemented in any language. The canonical reference implementation is Zarr-Python, but many other implementations exist. Zarr-Python contains reference examples of useful constructs that can be re-implemented in other languages.

Abstract Base Classes: Zarr-Python's zarr.abc module contains abstract base classes that enforce a particular Python realization of the specification's Key-Value Store interface through a Store ABC based on a MutableMapping-like API. This component is concrete in the sense that it is implemented in a specific programming language and enforces a particular syntax for getting and setting values in a key-value store.
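The idea of an ABC that enforces a key-value interface can be sketched with Python's standard `collections.abc.MutableMapping`. The `Store` and `MemoryStore` classes below are a hypothetical, simplified stand-in and not the actual classes in `zarr.abc` or `zarr.storage`, whose methods and signatures may differ.

```python
from collections.abc import Iterator, MutableMapping


class Store(MutableMapping):
    """Hypothetical Store ABC: keys are '/'-separated strings, values are bytes.

    MutableMapping leaves __getitem__, __setitem__, __delitem__, __iter__ and
    __len__ abstract, so Python refuses to instantiate a subclass until all five
    are defined; get, pop, update, keys, and friends are then inherited.
    """

    def list_prefix(self, prefix: str) -> list[str]:
        # Built only on __iter__, so every concrete store gets it automatically.
        return sorted(key for key in self if key.startswith(prefix))


class MemoryStore(Store):
    """Hypothetical in-memory store backed by a dict."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def __getitem__(self, key: str) -> bytes:
        return self._data[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("store values must be bytes")
        self._data[key] = bytes(value)

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


store = MemoryStore()
store["zarr.json"] = b"{}"
store["c/0"] = b"\x01\x02"
store["c/1"] = b"\x03\x04"
print(store.list_prefix("c/"))  # ['c/0', 'c/1']
print(store.get("missing"))     # None (inherited from Mapping)
```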

Store Implementations: Zarr-Python's zarr.storage module contains concrete implementations of the Store ABC for interacting with particular storage systems. The Zarr-Python stores that write to local filesystems or object storage write data in the Native Zarr Format. Most Python users of Zarr are expected to simply use one of these implementations.
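A local-filesystem store of this kind can be sketched by mapping each key verbatim onto a file path below a root directory, which is what makes its output follow the Native Zarr Format. The `DirectoryStore` class below is hypothetical, subclasses `collections.abc.MutableMapping` directly, and is not Zarr-Python's own local store; it also performs no key validation (for example, it does not reject `..` components).

```python
import tempfile
from collections.abc import Iterator, MutableMapping
from pathlib import Path


class DirectoryStore(MutableMapping):
    """Hypothetical local store: key "a/b/c" is stored as the file <root>/a/b/c."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        # Keys map onto paths unaltered (no validation of ".." or empty parts).
        return self.root.joinpath(*key.split("/"))

    def __getitem__(self, key: str) -> bytes:
        path = self._path(key)
        if not path.is_file():
            raise KeyError(key)
        return path.read_bytes()

    def __setitem__(self, key: str, value: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(value)

    def __delitem__(self, key: str) -> None:
        path = self._path(key)
        if not path.is_file():
            raise KeyError(key)
        path.unlink()

    def __iter__(self) -> Iterator[str]:
        if not self.root.is_dir():
            return iter([])
        files = (p for p in self.root.rglob("*") if p.is_file())
        return iter(sorted(p.relative_to(self.root).as_posix() for p in files))

    def __len__(self) -> int:
        return sum(1 for _ in self)


with tempfile.TemporaryDirectory() as tmp:
    store = DirectoryStore(Path(tmp) / "example.zarr")
    store["zarr.json"] = b'{"zarr_format": 3, "node_type": "group"}'
    store["temperature/c/0/0"] = b"\x00\x01"
    keys_on_disk = list(store)
    chunk_file_exists = (Path(tmp) / "example.zarr" / "temperature" / "c" / "0" / "0").is_file()
    chunk = store["temperature/c/0/0"]
```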

User API: Zarr-Python's zarr.api module contains functions and classes for interacting with any concrete implementation of the zarr.abc.Store interface. This lets user applications use a standard Zarr API to read from and write to a variety of common storage systems.
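The benefit of coding against the store interface rather than a specific backend can be sketched as follows: the hypothetical `save_array` and `load_array` functions touch the store only through mapping operations, so they work unchanged with a plain `dict`, a `collections.UserDict`, or any other `MutableMapping`. These are not functions from `zarr.api`, and their `meta.json` layout is a simplified invention rather than the Zarr metadata schema.

```python
import json
import struct
from collections import UserDict
from collections.abc import MutableMapping


def save_array(store: MutableMapping, name: str, values: list[float], chunk_size: int) -> None:
    """Write a 1-D float64 array as fixed-size chunks using only mapping operations."""
    meta = {"length": len(values), "chunk_size": chunk_size}
    store[f"{name}/meta.json"] = json.dumps(meta).encode("utf-8")
    for start in range(0, len(values), chunk_size):
        chunk = values[start : start + chunk_size]
        store[f"{name}/c/{start // chunk_size}"] = struct.pack(f"<{len(chunk)}d", *chunk)


def load_array(store: MutableMapping, name: str) -> list[float]:
    """Read the array back, again using only mapping operations."""
    meta = json.loads(store[f"{name}/meta.json"])
    n_chunks = -(-meta["length"] // meta["chunk_size"])  # ceiling division
    out: list[float] = []
    for index in range(n_chunks):
        raw = store[f"{name}/c/{index}"]
        out.extend(struct.unpack(f"<{len(raw) // 8}d", raw))
    return out


values = [1.5, 2.5, 3.5, 4.5, 5.5]
results = {}
# The same user-level code runs against two different backends.
for backend in (dict(), UserDict()):
    save_array(backend, "temperature", values, chunk_size=2)
    results[type(backend).__name__] = load_array(backend, "temperature")
```

Swapping in a filesystem- or object-storage-backed mapping would require no change to `save_array` or `load_array`, which is the separation the User API relies on.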

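Together, these components allow a great deal of flexibility: projects share one protocol and data model while remaining free to choose their own language, storage backend, and user-facing API.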