_episodes/02-starting-with-data.md
30 additions & 13 deletions
@@ -36,19 +36,19 @@ This should help us avoid path and file name issues. At this time please
 navigate to the workshop directory. If you working in IPython Notebook be sure
 that you start your notebook in the workshop directory.
 
-A quick aside that there are Python libraries like [OS
-Library](https://docs.python.org/3/library/os.html) that can work with our
+A quick aside that there are Python libraries like [OS Library][os-lib] that can work with our
 directory structure, however, that is not our focus today.
 
 ### Our Data
 
 For this lesson, we will be using the Portal Teaching data, a subset of the data
 from Ernst et al
-[Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal, Arizona, USA](http://www.esapubs.org/archive/ecol/E090/118/default.htm)
+[Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal,
+Arizona, USA][ernst].
 
-We will be using files from the [Portal Project Teaching Database](https://figshare.com/articles/Portal_Project_Teaching_Database/1314459).
+We will be using files from the [Portal Project Teaching Database][pptd].
 This section will use the `surveys.csv` file that can be downloaded here:
 We are studying the species and weight of animals caught in sites in our study
 area. The dataset is stored as a `.csv` file: each row holds information for a
@@ -93,10 +93,10 @@ Once a library is set up, it can be used or called to perform many tasks.
 
 ## Pandas in Python
 One of the best options for working with tabular data in Python is to use the
-[Python Data Analysis Library](http://pandas.pydata.org/) (a.k.a. Pandas). The
+[Python Data Analysis Library][pandas] (a.k.a. Pandas). The
 Pandas library provides data structures, produces high quality plots with
-[matplotlib](http://matplotlib.org/) and integrates nicely with other libraries
-that use [NumPy](http://www.numpy.org/) (which is another Python library) arrays.
+[matplotlib][matplotlib] and integrates nicely with other libraries
+that use [NumPy][numpy] (which is another Python library) arrays.
 
 Python doesn't load all of the libraries available to it by default. We have to
 add an `import` statement to our code in order to use library functions. To import
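
Note on the hunk above: the point about `import` can be illustrated with the standard-library `os` module that the lesson mentions in passing. This is only a minimal sketch, not part of the commit. The file name `surveys.csv` comes from the lesson text, but the assumption that it sits in the current working directory is ours.

```python
import os               # the module's functions are now reachable as os.<name>
import os.path as osp   # `as` gives the imported module a shorter alias

# Report the directory Python is running in (the lesson asks learners to
# start in the workshop directory).
cwd = os.getcwd()
print("Working directory:", cwd)

# Assumption: the data file would be in the working directory.
has_surveys = osp.exists("surveys.csv")
print("surveys.csv found:", has_surveys)
```

Without the `import os` line, the call to `os.getcwd()` would raise a `NameError`, which is the behaviour the lesson text describes.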
@@ -119,9 +119,14 @@ time we call a Pandas function.
 
 # Reading CSV Data Using Pandas
 
-We will begin by locating and reading our survey data which are in CSV format. CSV stands for Comma-Separated Values and is a common way store formatted data. Other symbols may also be used, so you might see tab-separated, colon-separated or space separated files. It is quite easy to replace one separator with another, to match your application. The first line in the file often has headers to explain what is in each column. CSV (and other separators) make it easy to share data, and can be imported and exported from many applications, including Microsoft Excel. For more details on CSV files, see the [Data Organisation in Spreadsheets](http://www.datacarpentry.org/spreadsheet-ecology-lesson/05-exporting-data/) lesson.
-We can use Pandas' `read_csv` function to pull the file directly into a
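
Note on the hunk above: the removed paragraph describes header rows and interchangeable separators. A small sketch of how `read_csv` handles both might look like the following. It is not part of the commit, and the column names and values are invented stand-ins rather than the real contents of `surveys.csv`. The text is read from memory via `io.StringIO` so the example runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data; the first line is the header row.
csv_text = "record_id,species_id,weight\n1,NL,40\n2,DM,35\n3,PF,8\n"

# Comma is the default separator for read_csv.
comma_df = pd.read_csv(io.StringIO(csv_text))

# The same data with tabs instead of commas: only `sep` has to change.
tsv_text = csv_text.replace(",", "\t")
tab_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(comma_df.head())
print("Same table from both separators:", comma_df.equals(tab_df))
```

For a file on disk, the first argument would be a path such as `"surveys.csv"` instead of the in-memory buffer.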