src/pyDataverse/docs/source/user/basic-usage.rst
59 additions & 3 deletions
@@ -72,10 +72,21 @@ can then be used (e. g. :meth:`json() <requests.Response.json>`).
 Create Dataverse Collection
 -----------------------------
 
-The top-level data-type in the Dataverse software is called a Dataverse collection, so we will
-start with that.
+The top-level data-type in the Dataverse software is called a Dataverse collection, so we will start with that.
+The figure below shows the relationship between a Dataverse collection, a dataset, and a datafile.
 
-First, instantiate a :class:`Dataverse <pyDataverse.models.Dataverse>`
+.. figure:: ../_images/collection_dataset.png
+   :align: center
+   :alt: collection dataset datafile
+
+A Dataverse collection (also known as a :class:`Dataverse <pyDataverse.models.Dataverse>`) acts as a container for your :class:`Datasets <pyDataverse.models.Dataset>`.
+It can also contain other collections (:class:`Dataverses <pyDataverse.models.Dataverse>`).
+You can create your own Dataverse collections, but doing so is not required.
+A Dataset is a container for :class:`Datafiles <pyDataverse.models.Datafile>`, such as data, documentation, code, and metadata.
+You need to create a Dataset to deposit your files. Every Dataset in Dataverse is uniquely identified by a DOI.
+For more detailed explanations, see `the Dataverse User Guide <https://guides.dataverse.org/en/latest/user/dataset-management.html>`_.
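The containment hierarchy described above can be sketched with plain Python dataclasses. This is only an illustration: the class names ``Collection``, ``DatasetNode``, and ``DatafileNode`` and the DOI value are hypothetical placeholders, not pyDataverse's model classes or a real dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatafileNode:
    """Hypothetical stand-in for a single file inside a dataset."""
    filename: str


@dataclass
class DatasetNode:
    """Hypothetical stand-in for a dataset: holds files and carries a DOI."""
    doi: str
    datafiles: List[DatafileNode] = field(default_factory=list)


@dataclass
class Collection:
    """Hypothetical stand-in for a Dataverse collection."""
    alias: str
    datasets: List[DatasetNode] = field(default_factory=list)
    subcollections: List["Collection"] = field(default_factory=list)

    def count_datafiles(self) -> int:
        """Count the files in this collection and in all nested collections."""
        own = sum(len(ds.datafiles) for ds in self.datasets)
        nested = sum(sub.count_datafiles() for sub in self.subcollections)
        return own + nested


# A root collection that contains one sub-collection with one dataset.
root = Collection(alias="root")
lab = Collection(alias="my-lab")
root.subcollections.append(lab)
lab.datasets.append(
    DatasetNode(
        doi="doi:10.5072/FK2/EXAMPLE",  # placeholder identifier
        datafiles=[DatafileNode("data.csv"), DatafileNode("README.md")],
    )
)
print(root.count_datafiles())
```

A collection may hold both datasets and further collections, which is why the count recurses; files only ever live inside a dataset.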
+
+Returning to the example, first instantiate a :class:`Dataverse <pyDataverse.models.Dataverse>`
 object and import the metadata from the Dataverse Software's own JSON format with
@@ -287,6 +298,51 @@ always leads to a major version change:
 Dataset doi:10.5072/FK2/EO7BNB published
 
 
+.. _user_basic-usage_download-data:
+
+Download and save a dataset to disk
+----------------------------------------
+
+You may want to download and explore an existing dataset from Dataverse. The following code snippets show how to retrieve a dataset and save it to your machine.
+
+Note that if the dataset is public, you do not need an API_TOKEN, or even a Dataverse account, to use this functionality. The setup therefore looks as follows:
+
+::
+
+    >>> from pyDataverse.api import NativeApi, DataAccessApi
+    >>> from pyDataverse.models import Dataverse
+
+    >>> base_url = 'https://dataverse.harvard.edu/'
+
+    >>> api = NativeApi(base_url)
+    >>> data_api = DataAccessApi(base_url)
+
+However, you need to know the DOI of the dataset that you want to download. In this example, we use ``doi:10.7910/DVN/KBHLOD``, which is hosted on Harvard's Dataverse instance that we specified as ``base_url``. The code looks as follows:
+
+::
+
+    >>> DOI = "doi:10.7910/DVN/KBHLOD"
+    >>> dataset = api.get_dataset(DOI)
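DOIs are often copied from a web page as a bare identifier or as a ``https://doi.org/`` link rather than in the ``doi:`` form used above. The following is a minimal sketch of how such input could be normalized before it is passed to ``get_dataset()``. The helper ``normalize_doi`` is hypothetical and not part of pyDataverse.

```python
def normalize_doi(identifier: str) -> str:
    """Return the identifier in ``doi:<prefix>/<suffix>`` form.

    Hypothetical helper for illustration; it only handles the common
    ``doi:`` and ``doi.org`` URL spellings.
    """
    value = identifier.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    # Every DOI name starts with the directory indicator "10.".
    if not value.startswith("10."):
        raise ValueError("not a DOI: {!r}".format(identifier))
    return "doi:" + value


print(normalize_doi("https://doi.org/10.7910/DVN/KBHLOD"))
```

The result can then be used in place of the hard-coded ``DOI`` string from the snippet above.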
+
+As previously mentioned, every dataset consists of datafiles, so we need to get the list of datafiles by ID and save them to disk. This is done in the following code snippet:
+    >>> print("File name {}, id {}".format(filename, file_id))
+
+    >>> response = data_api.get_datafile(file_id)
+    >>> with open(filename, "wb") as f:
+    >>>     f.write(response.content)
+    File name cat.jpg, id 2456195
+
+Please note that in this example, the files are saved in the directory from which the code is executed. You can change that by passing the desired path to the ``open()`` function above.
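As a sketch of how the download steps could be combined and pointed at a directory of your choice, consider the helpers below. The function names ``list_datafiles``, ``save_datafile``, and ``download_dataset`` are hypothetical. The JSON layout (``data`` → ``latestVersion`` → ``files`` → ``dataFile``) is an assumption about the structure returned for a dataset, and the pyDataverse calls simply mirror the ones used in the snippets above.

```python
from pathlib import Path


def list_datafiles(dataset_json):
    """Return (filename, file_id) pairs from a dataset JSON document.

    Assumes the layout data -> latestVersion -> files -> dataFile.
    """
    files = dataset_json["data"]["latestVersion"]["files"]
    return [(f["dataFile"]["filename"], f["dataFile"]["id"]) for f in files]


def save_datafile(content, filename, target_dir):
    """Write raw bytes to target_dir/filename and return the resulting path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so a file name cannot escape target_dir.
    path = target / Path(filename).name
    path.write_bytes(content)
    return path


def download_dataset(base_url, doi, target_dir):
    """Download all datafiles of a public dataset into target_dir.

    Requires pyDataverse and network access; the import is local so that
    the two helpers above can be used without either.
    """
    from pyDataverse.api import NativeApi, DataAccessApi

    api = NativeApi(base_url)
    data_api = DataAccessApi(base_url)
    dataset = api.get_dataset(doi)
    saved = []
    for filename, file_id in list_datafiles(dataset.json()):
        response = data_api.get_datafile(file_id)
        saved.append(save_datafile(response.content, filename, target_dir))
    return saved
```

With these helpers, a call such as ``download_dataset("https://dataverse.harvard.edu/", "doi:10.7910/DVN/KBHLOD", "downloads")`` would place the files in a ``downloads`` directory instead of the current one.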