@@ -30,8 +30,8 @@ keypoints:
3030We can automate the process of performing data manipulations in Python. It's efficient to spend time
3131building the code to perform these tasks because once it's built, we can use it
3232over and over on different datasets that use a similar format. This makes our
33- methods easily reproducible. We can also easily share our code with colleagues
34- and they can replicate the same analysis.
33+ data manipulation processes reproducible. We can also share our code with
34+ colleagues and they can replicate the same analysis starting with the same original data.
3535
3636### Starting in the same spot
3737
@@ -40,13 +40,11 @@ This should help us avoid path and file name issues. At this time please
4040navigate to the workshop directory. If you are working in Jupyter Notebook be sure
4141that you start your notebook in the workshop directory.
4242
43- A quick aside that there are Python libraries like [ OS Library] [ os-lib ] that can work with our
44- directory structure, however, that is not our focus today.
4543
4644### Our Data
4745
4846For this lesson, we will be using the Portal Teaching data, a subset of the data
49- from Ernst et al
47+ from Ernst et al.
5048[Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal,
5149Arizona, USA][ernst].
5250
@@ -126,10 +124,10 @@ time we call a Pandas function.
126124
127125We will begin by locating and reading our survey data which are in CSV format. CSV stands for
128126Comma-Separated Values and is a common way to store formatted data. Other symbols may also be used, so
129- you might see tab-separated, colon-separated or space separated files. It is quite easy to replace
130- one separator with another, to match your application. The first line in the file often has headers
131- to explain what is in each column. CSV (and other separators ) make it easy to share data, and can be
132- imported and exported from many applications, including Microsoft Excel. For more details on CSV
127+ you might see tab-separated, colon-separated or space-separated files. Pandas can work with each of these
128+ types of separators, as it allows you to specify the appropriate separator for your data.
129+ CSV files (and other delimiter-separated file types) make it easy to share data, and can be imported and exported
130+ from many applications, including Microsoft Excel. For more details on CSV
133131files, see the [Data Organisation in Spreadsheets][spreadsheet-lesson5] lesson.
134132We can use Pandas' `read_csv` function to pull the file directly into a [DataFrame][pd-dataframe].
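
As a minimal sketch of specifying a separator, the example below reads the same small table twice, once tab-separated and once semicolon-separated, using the `sep` parameter of `read_csv`. The sample text and the variable names are made up for illustration and are not part of the lesson's data files.

```python
import io

import pandas as pd

# Hypothetical miniature of the survey data, written with tabs instead of commas.
tab_text = "record_id\tspecies_id\tweight\n1\tNL\t40.0\n2\tDM\t48.0\n"

# sep tells read_csv which character separates the columns (the default is a comma).
tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")

# The same rows separated by semicolons parse to an identical DataFrame.
semicolon_text = tab_text.replace("\t", ";")
semicolon_df = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(tab_df)
```

With a real file, the call would take a path in place of the `io.StringIO` object, for example a tab-separated copy of the survey data if one were available.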
135133
@@ -182,8 +180,8 @@ surveys_df = pd.read_csv("data/surveys.csv")
182180~~~
183181{: .language-python}
184182
185- Notice when you assign the imported DataFrame to a variable, Python does not
186- produce any output on the screen. We can view the value of the ` surveys_df `
183+ Note that Python does not produce any output on the screen when you assign the imported DataFrame to a variable.
184+ We can view the value of the `surveys_df`
187185object by typing its name into the Python command prompt.
188186
189187~~~
@@ -246,9 +244,12 @@ of data:
246244~~~
247245{: .output}
248246
249- Never fear, all the data is there, if you scroll up. Selecting just a few rows, so it is
250- easier to fit on one window, you can see that pandas has neatly formatted the data to fit
251- our screen:
247+ Don't worry: all the data is there! You can confirm this by scrolling upwards, or by
248+ looking at the `[# of rows x # of columns]` block at the end of the output.
249+
250+ You can also use `surveys_df.head()` to view only the first few rows of the dataset in an output
251+ that is easier to fit in one window. After doing this, you can see that pandas has neatly formatted
252+ the data to fit our screen:
252253
253254~~~
254255surveys_df.head() # The head() method displays the first several lines of a file. It
@@ -309,9 +310,9 @@ dtype: object
309310~~~
310311{: .output}
311312
312- All the values in a column have the same type. For example, months have type
313- ` int64 ` , which is a kind of integer. Cells in the month column cannot have
314- fractional values, but the weight and hindfoot_length columns can, because they
313+ All the values in a single column have the same type. For example, values in the month
314+ column have type `int64`, which is a kind of integer. Cells in the month column cannot have
315+ fractional values, but values in the weight and hindfoot_length columns can, because they
315316have type `float64`. The `object` type doesn't have a very helpful name, but in
316317this case it represents strings (such as 'M' and 'F' in the case of sex).
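
The difference between integer and floating-point columns can be sketched with a small, made-up DataFrame. The column names mirror the survey data, but the values and the `sample_df` name are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows that mimic two of the survey columns.
sample_df = pd.DataFrame({
    "month": [7, 7, 8],            # whole numbers only, so pandas infers an integer type
    "weight": [40.0, 48.5, None],  # fractional and missing values, so pandas infers a float type
})

# One type is reported per column, not per cell.
print(sample_df.dtypes)

# Arithmetic that can produce fractions returns a float column.
half_month = sample_df["month"] / 2
print(half_month.dtype)
```

The missing weight is stored as `NaN`, which is itself a floating-point value, so a numeric column with gaps ends up as a float column.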
317318
@@ -543,17 +544,18 @@ surveys_df.groupby('species_id')['record_id'].count()['DO']
543544
544545## Basic Math Functions
545546
546- If we wanted to, we could perform math on an entire column of our data. For
547- example let's multiply all weight values by 2. A more practical use of this might
548- be to normalize the data according to a mean, area, or some other value
549- calculated from our data.
547+ If we wanted to, we could apply a mathematical operation like addition or division
548+ to an entire column of our data. For example, let's multiply all weight values by 2.
550549
551550~~~
552551# Multiply all weight values by 2
553552surveys_df['weight'] * 2
554553~~~
555554{: .language-python}
556555
556+ A more practical use of this might be to normalize the data according to a mean, area,
557+ or some other value calculated from our data.
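
One possible way to normalize by a mean is sketched below, assuming a short made-up Series in place of `surveys_df['weight']`. The `weights` and `normalized` names are hypothetical.

```python
import pandas as pd

# Hypothetical weights standing in for surveys_df['weight'].
weights = pd.Series([40.0, 48.0, 32.0], name="weight")

# Dividing by the column mean rescales every value relative to the average,
# so a result above 1 is heavier than average and below 1 is lighter.
normalized = weights / weights.mean()

print(normalized)
```

The operation is applied element by element, just like the multiplication by 2, and it returns a new Series rather than changing the original column.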
558+
557559# Quick & Easy Plotting Data Using Pandas
558560
559561We can plot our summary stats using Pandas, too.