@@ -779,6 +779,8 @@ If your strings are all ASCII strings, and you know the maximum length of the st
your dataset, then you can use an array with a fixed-length bytes dtype. E.g.::

 >>> z = zarr.zeros(10, dtype='S6')
782+ >>> z
783+ <zarr.core.Array (10,) |S6>
 >>> z[0] = b'Hello'
 >>> z[1] = b'world!'
 >>> z[:]
@@ -793,37 +795,68 @@ A fixed-length unicode dtype is also available, e.g.::
 ... 'เฮลโลเวิลด์']
 >>> text_data = greetings * 10000
 >>> z = zarr.array(text_data, dtype='U20')
798+ >>> z
799+ <zarr.core.Array (120000,) <U20>
 >>> z[:]
 array(['¡Hola mundo!', 'Hej Världen!', 'Servus Woid!', ...,
 'Helló, világ!', 'Zdravo svete!', 'เฮลโลเวิลด์'],
 dtype='<U20')

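Fixed-length dtypes give every element the same item size, which is why you need to know the maximum length in advance. The following is a minimal NumPy-only sketch of what that means. It shows NumPy's own behaviour and does not describe how any particular Zarr version handles over-long values on assignment. NumPy silently truncates bytes beyond the declared width, and a ``'U20'`` item always takes 4 bytes per code point.

```python
import numpy as np

# 'S6': every element occupies exactly 6 bytes.
a = np.zeros(3, dtype='S6')
a[0] = b'Hello'
a[1] = b'world!'
a[2] = b'too long!'  # 9 bytes: NumPy keeps only the first 6
print(a)  # [b'Hello' b'world!' b'too lo']

# 'U20': up to 20 code points, stored by NumPy as 4 bytes each.
u = np.array(['Servus Woid!'], dtype='U20')
print(u.itemsize)  # 80
```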
- For variable-length strings, the "object" dtype can be used, but a codec must be
+ For variable-length strings, the ``object`` dtype can be used, but a codec must be
provided to encode the data (see also :ref:`tutorial_objects` below). At the time of
- writing there are three codecs available that can encode variable length string
- objects, :class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`. and
- :class:`numcodecs.Pickle`. E.g. using JSON::
+ writing there are four codecs available that can encode variable-length string
+ objects: :class:`numcodecs.VLenUTF8`, :class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`,
+ and :class:`numcodecs.Pickle`. E.g. using ``VLenUTF8``::

 >>> import numcodecs
808- >>> z = zarr.array(text_data, dtype=object, object_codec=numcodecs.JSON())
812+ >>> z = zarr.array(text_data, dtype=object, object_codec=numcodecs.VLenUTF8())
813+ >>> z
814+ <zarr.core.Array (120000,) object>
815+ >>> z.filters
816+ [VLenUTF8()]
 >>> z[:]
 array(['¡Hola mundo!', 'Hej Världen!', 'Servus Woid!', ...,
 'Helló, világ!', 'Zdravo svete!', 'เฮลโลเวิลด์'], dtype=object)

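Variable-length strings need a codec because each item has a different byte length. Here is a stdlib-only sketch of one way to serialize them, using a length-prefixed layout: an item count, followed by each string's UTF-8 byte length and bytes. The helpers ``encode_vlen_utf8`` and ``decode_vlen_utf8`` are hypothetical and only illustrate the idea. They are not claimed to reproduce the byte format of :class:`numcodecs.VLenUTF8`.

```python
import struct


def encode_vlen_utf8(items):
    """Hypothetical helper: pack strings as a uint32 item count followed by
    (uint32 byte length, UTF-8 bytes) for each item, all little-endian."""
    parts = [struct.pack('<I', len(items))]
    for s in items:
        data = s.encode('utf-8')
        parts.append(struct.pack('<I', len(data)))
        parts.append(data)
    return b''.join(parts)


def decode_vlen_utf8(buf):
    """Hypothetical inverse of encode_vlen_utf8."""
    (n,) = struct.unpack_from('<I', buf, 0)
    pos = 4
    out = []
    for _ in range(n):
        (length,) = struct.unpack_from('<I', buf, pos)
        pos += 4
        out.append(buf[pos:pos + length].decode('utf-8'))
        pos += length
    return out


greetings = ['¡Hola mundo!', 'Hej Världen!', 'Servus Woid!']
buf = encode_vlen_utf8(greetings)
print(len(buf))  # 54: 4-byte count + 3 * 4-byte lengths + 38 bytes of text
print(decode_vlen_utf8(buf) == greetings)  # True
```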
- ...or alternatively using msgpack (requires `msgpack-python
- <https://github.com/msgpack/msgpack-python>`_ to be installed)::
+ As a convenience, ``dtype=str`` (or ``dtype=unicode`` on Python 2.7) can be used, which
+ is a shorthand for ``dtype=object, object_codec=numcodecs.VLenUTF8()``, e.g.::

- >>> z = zarr.array(text_data, dtype=object, object_codec=numcodecs.MsgPack())
824+ >>> z = zarr.array(text_data, dtype=str)
825+ >>> z
826+ <zarr.core.Array (120000,) object>
827+ >>> z.filters
828+ [VLenUTF8()]
 >>> z[:]
 array(['¡Hola mundo!', 'Hej Världen!', 'Servus Woid!', ...,
 'Helló, világ!', 'Zdravo svete!', 'เฮลโลเวิลด์'], dtype=object)

- If you know ahead of time all the possible string values that can occur, then you could
- also use the :class:`numcodecs.Categorize` codec to encode each unique value as an
+ Variable-length byte strings are also supported via ``dtype=object``. Again an
+ ``object_codec`` is required, which can be one of :class:`numcodecs.VLenBytes` or
+ :class:`numcodecs.Pickle`. For convenience, ``dtype=bytes`` (or ``dtype=str`` on Python
+ 2.7) can be used as a shorthand for ``dtype=object, object_codec=numcodecs.VLenBytes()``,
+ e.g.::
838+
839+ >>> bytes_data = [g.encode('utf-8') for g in greetings] * 10000
840+ >>> z = zarr.array(bytes_data, dtype=bytes)
841+ >>> z
842+ <zarr.core.Array (120000,) object>
843+ >>> z.filters
844+ [VLenBytes()]
845+ >>> z[:]
846+ array([b'\xc2\xa1Hola mundo!', b'Hej V\xc3\xa4rlden!', b'Servus Woid!',
847+ ..., b'Hell\xc3\xb3, vil\xc3\xa1g!', b'Zdravo svete!',
848+ b'\xe0\xb9\x80\xe0\xb8\xae\xe0\xb8\xa5\xe0\xb9\x82\xe0\xb8\xa5\xe0\xb9\x80\xe0\xb8\xa7\xe0\xb8\xb4\xe0\xb8\xa5\xe0\xb8\x94\xe0\xb9\x8c'], dtype=object)
849+
+ If you know ahead of time all the possible string values that can occur, you could
+ also use the :class:`numcodecs.Categorize` codec to encode each unique string value as an
integer. E.g.::

 >>> categorize = numcodecs.Categorize(greetings, dtype=object)
 >>> z = zarr.array(text_data, dtype=object, object_codec=categorize)
856+ >>> z
857+ <zarr.core.Array (120000,) object>
858+ >>> z.filters
859+ [Categorize(dtype='|O', astype='|u1', labels=['¡Hola mundo!', 'Hej Världen!', 'Servus Woid!', ...])]
 >>> z[:]
 array(['¡Hola mundo!', 'Hej Världen!', 'Servus Woid!', ...,
 'Helló, világ!', 'Zdravo svete!', 'เฮลโลเวิลด์'], dtype=object)
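Categorical encoding replaces each string with a small integer code plus a shared list of labels. Below is a minimal NumPy sketch of that idea. It assumes the label list covers every value. The 0-based code assignment is illustrative only and may differ from what :class:`numcodecs.Categorize` does internally.

```python
import numpy as np

# Assumed label set: every value that can occur (taken from the greetings above).
labels = ['¡Hola mundo!', 'Hej Världen!', 'Servus Woid!']
data = np.array(labels * 4, dtype=object)

# Encode: map each string to its position in the label list, stored as uint8.
index = {label: i for i, label in enumerate(labels)}
codes = np.array([index[v] for v in data], dtype='u1')

# Decode: index the label array with the codes.
decoded = np.array(labels, dtype=object)[codes]
print(codes[:4])     # [0 1 2 0]
print(codes.nbytes)  # 12
```

With only three labels, each code fits in one byte, which is consistent with the ``astype='|u1'`` shown in the ``Categorize`` repr above.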
@@ -835,13 +868,14 @@ Object arrays
-------------

Zarr supports arrays with an "object" dtype. This allows arrays to contain any type of
- object, such as variable length unicode strings, or variable length lists, or other
- possibilities. When creating an object array, a codec must be provided via the
+ object, such as variable-length unicode strings, or variable-length arrays of numbers, or
+ other possibilities. When creating an object array, a codec must be provided via the
``object_codec`` argument. This codec handles encoding (serialization) of Python objects.
- At the time of writing there are three codecs available that can serve as a
- general purpose object codec and support encoding of a variety of
- object types: :class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`. and
- :class:`numcodecs.Pickle`.
874+ The best codec to use will depend on what type of objects are present in the array.
875+
+ At the time of writing there are three codecs available that can serve as a general-purpose
+ object codec and support encoding of a mixture of object types:
+ :class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`, and :class:`numcodecs.Pickle`.

For example, using the JSON codec::

@@ -861,6 +895,40 @@ code can be embedded within pickled data. The JSON and MsgPack codecs do not hav
security issues and support encoding of unicode strings, lists and dictionaries.
MsgPack is usually faster for both encoding and decoding.

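Which codec fits best depends on the objects being stored. As a rough illustration, this example uses the stdlib ``json`` and ``pickle`` modules as stand-ins, not the numcodecs codecs themselves. A JSON round trip turns tuples into lists, while pickle restores the original Python types.

```python
import json
import pickle

# A mixture of object types, as might appear in an object array.
items = ['text', [1, 2, 3], {'a': 1}, (4, 5)]

# JSON has a single array type, so the tuple comes back as a list.
via_json = json.loads(json.dumps(items))
print(via_json)  # ['text', [1, 2, 3], {'a': 1}, [4, 5]]

# Pickle restores the original Python types, but unpickling untrusted
# data can execute arbitrary code, so only load pickles you trust.
via_pickle = pickle.loads(pickle.dumps(items))
print(via_pickle)  # ['text', [1, 2, 3], {'a': 1}, (4, 5)]
```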
898+ Ragged arrays
899+ ~~~~~~~~~~~~~
900+
901+ If you need to store an array of arrays, where each member array can be of any length
902+ and stores the same primitive type (a.k.a. a ragged array), the
+ :class:`numcodecs.VLenArray` codec can be used, e.g.::
904+
905+ >>> z = zarr.empty(4, dtype=object, object_codec=numcodecs.VLenArray(int))
906+ >>> z
907+ <zarr.core.Array (4,) object>
908+ >>> z.filters
909+ [VLenArray(dtype='<i8')]
910+ >>> z[0] = np.array([1, 3, 5])
911+ >>> z[1] = np.array([4])
912+ >>> z[2] = np.array([7, 9, 14])
913+ >>> z[:]
914+ array([array([1, 3, 5]), array([4]), array([ 7, 9, 14]),
915+ array([], dtype=int64)], dtype=object)
916+
+ As a convenience, ``dtype='array:T'`` can be used as a shorthand for
+ ``dtype=object, object_codec=numcodecs.VLenArray('T')``, where 'T' can be any NumPy
+ primitive dtype such as 'i4' or 'f8'. E.g.::
920+
921+ >>> z = zarr.empty(4, dtype='array:i8')
922+ >>> z
923+ <zarr.core.Array (4,) object>
924+ >>> z.filters
925+ [VLenArray(dtype='<i8')]
926+ >>> z[0] = np.array([1, 3, 5])
927+ >>> z[1] = np.array([4])
928+ >>> z[2] = np.array([7, 9, 14])
929+ >>> z[:]
930+ array([array([1, 3, 5]), array([4]), array([ 7, 9, 14]),
931+ array([], dtype=int64)], dtype=object)
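A common way to lay out a ragged array is one flat buffer of values plus an offsets array that marks where each member starts. The NumPy sketch below shows that layout for the data used above. It illustrates the general technique only. Whether :class:`numcodecs.VLenArray` uses this exact layout internally is not claimed here, and ``member`` is a hypothetical helper.

```python
import numpy as np

# Ragged data: member arrays of different lengths, all int64.
ragged = [np.array(x, dtype=np.int64) for x in ([1, 3, 5], [4], [7, 9, 14], [])]

# Flatten into one contiguous values array plus offsets into it.
lengths = np.array([len(r) for r in ragged])
offsets = np.concatenate([[0], np.cumsum(lengths)])
values = np.concatenate(ragged)


def member(i):
    """Hypothetical helper: slice out the i-th member using consecutive offsets."""
    return values[offsets[i]:offsets[i + 1]]


print(offsets)    # [0 3 4 7 7]
print(member(2))  # [ 7  9 14]
```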
.. _tutorial_chunks:

@@ -1079,25 +1147,19 @@ E.g., pickle/unpickle an array stored on disk::
Datetimes and timedeltas
------------------------

- Please note that NumPy's ``datetime64`` and ``timedelta64`` dtypes are **not** currently
- supported for Zarr arrays. If you would like to store datetime or timedelta data, you
- can store the data in an array with an integer dtype, e.g.::
+ NumPy's ``datetime64`` ('M8') and ``timedelta64`` ('m8') dtypes are supported for Zarr
+ arrays, as long as the units are specified. E.g.::

1086- >>> a = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')
1087- >>> z = zarr.array(a.view('i8'))
1153+ >>> z = zarr.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='M8[D]')
 >>> z
- <zarr.core.Array (3,) int64>
+ <zarr.core.Array (3,) datetime64[D]>
 >>> z[:]
1091- array([13707, 13161, 14834])
1092- >>> z[:].view(a.dtype)
1093- array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')
1094-
1095- If you would like a convenient way to retrieve the data from this array viewed as the
- original datetime64 dtype, try the :func:`zarr.core.Array.astype` method, e.g.::
1097-
1098- >>> zv = z.astype(a.dtype)
1099- >>> zv[:]
 array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')
1158+ >>> z[0]
1159+ numpy.datetime64('2007-07-13')
1160+ >>> z[0] = '1999-12-31'
1161+ >>> z[:]
1162+ array(['1999-12-31', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')
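A ``datetime64[D]`` value is stored as a 64-bit integer count of days since 1970-01-01, so its integer view gives the same numbers as the earlier integer-dtype workaround. This NumPy-only sketch also shows why the unit matters: a bare ``'M8'`` dtype is generic and carries no unit.

```python
import numpy as np

a = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')

# The underlying storage is int64: days since 1970-01-01 for unit 'D'.
days = a.view('i8')
print(days)  # [13707 13161 14834]

# np.datetime_data returns (unit, count); a bare 'M8' has a generic unit.
print(np.datetime_data(a.dtype))         # ('D', 1)
print(np.datetime_data(np.dtype('M8')))  # ('generic', 1)

# Subtracting datetimes gives a timedelta64 ('m8') in the same unit.
gap = a[2] - a[1]
```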
.. _tutorial_tips:
