@@ -30,7 +30,7 @@ keypoints:
3030We can automate the process of performing data manipulations in Python. It's efficient to spend time
3131building the code to perform these tasks because once it's built, we can use it
3232over and over on different datasets that use a similar format. This makes our
33- data manipulation processes easily reproducible. We can also easily share our code with
33+ data manipulation processes reproducible. We can also share our code with
3434colleagues and they can replicate the same analysis starting with the same original data.
3535
3636### Starting in the same spot
@@ -40,8 +40,6 @@ This should help us avoid path and file name issues. At this time please
4040navigate to the workshop directory. If you are working in IPython Notebook be sure
4141that you start your notebook in the workshop directory.
4242
43- Note: while there are Python libraries like [OS Library][os-lib] that can work with our
44- directory structure, that is not our focus today.
4543
4644### Our Data
4745
@@ -128,7 +126,6 @@ We will begin by locating and reading our survey data which are in CSV format. C
128126Comma-Separated Values and is a common way to store formatted data. Other symbols may also be used, so
129127you might see tab-separated, colon-separated or space-separated files. pandas can work with each of these
130128types of separators, as it allows you to specify the appropriate separator for your data.
131- The first line in the file often has headers to explain what is in each column.
132129CSV files (and other -separated value file types) make it easy to share data, and can be imported and exported
133130from many applications, including Microsoft Excel. For more details on CSV
134131files, see the [Data Organisation in Spreadsheets][spreadsheet-lesson5] lesson.
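
As a minimal sketch of how a separator can be specified, the example below reads the same small table twice: once comma-separated and once tab-separated, using the `sep` parameter of `pd.read_csv`. The in-memory text and the variable names `comma_text`, `tab_text`, `comma_df` and `tab_df` are assumptions made for illustration; they are not part of the lesson's survey data.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for files on disk.
comma_text = "record_id,species_id,weight\n1,NL,40\n2,DM,48\n"
tab_text = comma_text.replace(",", "\t")

# The comma is the default separator, so no extra argument is needed.
comma_df = pd.read_csv(io.StringIO(comma_text))

# For other separators, pass the character explicitly with `sep`.
tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Both reads should produce the same table.
print(comma_df.equals(tab_df))
```

With a real file, the first argument would be a path such as the lesson's `"data/surveys.csv"` instead of an `io.StringIO` object.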
@@ -183,8 +180,8 @@ surveys_df = pd.read_csv("data/surveys.csv")
183180~~~
184181{: .language-python}
185182
186- Note: when you assign the imported DataFrame to a variable, Python does not
187- produce any output on the screen. We can view the value of the `surveys_df`
183+ Note that Python does not produce any output on the screen when you assign the imported DataFrame to a variable.
184+ We can view the value of the `surveys_df`
188185object by typing its name into the Python command prompt.
189186
190187~~~
@@ -250,7 +247,7 @@ of data:
250247Don't worry: all the data is there! You can confirm this by scrolling upwards, or by
251248looking at the `[# of rows x # of columns]` block at the end of the output.
252249
253- If you use `.head()` to view only a subset of rows, you will observe an output
250+ You can also use `surveys_df.head()` to view only the first few rows of the dataset in an output
254251that is easier to fit in one window. After doing this, you can see that pandas has neatly formatted
255252the data to fit our screen:
256253
@@ -548,7 +545,7 @@ surveys_df.groupby('species_id')['record_id'].count()['DO']
548545## Basic Math Functions
549546
550547If we wanted to, we could apply a mathematical operation like addition or division
551- on an entire column of our data. For example let's multiply all weight values by 2.
548+ on an entire column of our data. For example, let's multiply all weight values by 2.
552549
553550~~~
554551# Multiply all weight values by 2