@@ -30,8 +30,8 @@ keypoints:
3030We can automate the process of performing data manipulations in Python. It's efficient to spend time
3131building the code to perform these tasks because once it's built, we can use it
3232over and over on different datasets that use a similar format. This makes our
33- methods easily reproducible. We can also easily share our code with colleagues
34- and they can replicate the same analysis.
33+ data manipulation processes easily reproducible. We can also easily share our code with
34+ colleagues and they can replicate the same analysis starting with the same original data.
3535
3636### Starting in the same spot
3737
@@ -40,13 +40,13 @@ This should help us avoid path and file name issues. At this time please
4040navigate to the workshop directory. If you are working in IPython Notebook be sure
4141that you start your notebook in the workshop directory.
4242
43- A quick aside that there are Python libraries like [ OS Library] [ os-lib ] that can work with our
44- directory structure, however, that is not our focus today.
43+ Note: while there are Python libraries like [ OS Library] [ os-lib ] that can work with our
44+ directory structure, that is not our focus today.
4545
4646### Our Data
4747
4848For this lesson, we will be using the Portal Teaching data, a subset of the data
49- from Ernst et al
49+ from Ernst et al.
5050[ Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal,
5151Arizona, USA] [ ernst ] .
5252
@@ -126,10 +126,11 @@ time we call a Pandas function.
126126
127127We will begin by locating and reading our survey data which are in CSV format. CSV stands for
128128Comma-Separated Values and is a common way to store formatted data. Other symbols may also be used, so
129- you might see tab-separated, colon-separated or space separated files. It is quite easy to replace
130- one separator with another, to match your application. The first line in the file often has headers
131- to explain what is in each column. CSV (and other separators) make it easy to share data, and can be
132- imported and exported from many applications, including Microsoft Excel. For more details on CSV
129+ you might see tab-separated, colon-separated or space-separated files. pandas can work with each of these
130+ types of separators, as it allows you to specify the appropriate separator for your data.
131+ The first line in the file often has headers to explain what is in each column.
132+ CSV files (and other delimiter-separated value file types) make it easy to share data, and can be imported and exported
133+ from many applications, including Microsoft Excel. For more details on CSV
133134files, see the [ Data Organisation in Spreadsheets] [ spreadsheet-lesson5 ] lesson.
134135We can use Pandas' ` read_csv ` function to pull the file directly into a [ DataFrame] [ pd-dataframe ] .
135136
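As a minimal sketch of the separator point made in this hunk: `read_csv` accepts a `sep` argument, so the same function can read tab-separated text. The two small in-memory tables below are hypothetical stand-ins for real files (they are not the workshop's `surveys.csv`), and the column names are borrowed from the lesson purely for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data; a real file path could be passed instead of StringIO.
comma_text = "record_id,species_id,weight\n1,DM,40.0\n2,DO,35.5\n"
tab_text = comma_text.replace(",", "\t")

# The default separator is a comma.
comma_df = pd.read_csv(io.StringIO(comma_text))

# For tab-separated text, name the separator explicitly.
tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Both calls should produce the same table.
print(comma_df.equals(tab_df))
```

The same pattern should apply to other single-character separators, such as `sep=":"` for colon-separated files.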
@@ -182,7 +183,7 @@ surveys_df = pd.read_csv("data/surveys.csv")
182183~~~
183184{: .language-python}
184185
185- Notice when you assign the imported DataFrame to a variable, Python does not
186+ Note: when you assign the imported DataFrame to a variable, Python does not
186187produce any output on the screen. We can view the value of the ` surveys_df `
187188object by typing its name into the Python command prompt.
188189
@@ -246,9 +247,12 @@ of data:
246247~~~
247248{: .output}
248249
249- Never fear, all the data is there, if you scroll up. Selecting just a few rows, so it is
250- easier to fit on one window, you can see that pandas has neatly formatted the data to fit
251- our screen:
250+ Don't worry: all the data is there! You can confirm this by scrolling upwards, or by
251+ looking at the ` [# of rows x # of columns] ` block at the end of the output.
252+
253+ If you use ` .head() ` to view only a subset of rows, you will observe an output
254+ that is easier to fit in one window. After doing this, you can see that pandas has neatly formatted
255+ the data to fit our screen:
252256
253257~~~
254258surveys_df.head() # The head() method displays the first several lines of a file. It
@@ -309,9 +313,9 @@ dtype: object
309313~~~
310314{: .output}
311315
312- All the values in a column have the same type. For example, months have type
313- ` int64 ` , which is a kind of integer. Cells in the month column cannot have
314- fractional values, but the weight and hindfoot_length columns can, because they
316+ All the values in a single column have the same type. For example, values in the month
317+ column have type ` int64 ` , which is a kind of integer. Cells in the month column cannot have
318+ fractional values, but values in the weight and hindfoot_length columns can, because they
315319have type ` float64 ` . The ` object ` type doesn't have a very helpful name, but in
316320this case it represents strings (such as 'M' and 'F' in the case of sex).
317321
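To illustrate the one-type-per-column rule described in this hunk, here is a small sketch using a hypothetical three-row table (not the survey data itself). It assumes only that pandas infers integer, float, and string columns from plain Python lists. Depending on the pandas version, the string column may be reported as `object` or as a dedicated string type.

```python
import pandas as pd

# Hypothetical miniature table; column names mirror the survey data.
example_df = pd.DataFrame({
    "month": [7, 7, 8],           # whole numbers only
    "weight": [40.0, 35.5, None], # fractional values, one missing
    "sex": ["M", "F", "M"],       # strings
})

# Each column reports a single type for all of its values.
print(example_df.dtypes)
```

Note that the missing weight does not prevent the column from being numeric: pandas stores it as `NaN` inside a float column.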
@@ -543,9 +547,9 @@ surveys_df.groupby('species_id')['record_id'].count()['DO']
543547
544548## Basic Math Functions
545549
546- If we wanted to, we could perform math on an entire column of our data. For
547- example let's multiply all weight values by 2. A more practical use of this might
548- be to normalize the data according to a mean, area, or some other value
550+ If we wanted to, we could apply a mathematical operation like addition or division
551+ to an entire column of our data. For example, let's multiply all weight values by 2.
552+ A more practical use of this might be to normalize the data according to a mean, area, or some other value
549553calculated from our data.
550554
551555~~~