Commit 1a1ab58

Apply suggestions from code review
1 parent 15739dc commit 1a1ab58

1 file changed

Lines changed: 5 additions & 8 deletions

_episodes/02-starting-with-data.md

@@ -30,7 +30,7 @@ keypoints:
 We can automate the process of performing data manipulations in Python. It's efficient to spend time
 building the code to perform these tasks because once it's built, we can use it
 over and over on different datasets that use a similar format. This makes our
-data manipulation processes easily reproducible. We can also easily share our code with
+data manipulation processes reproducible. We can also share our code with
 colleagues and they can replicate the same analysis starting with the same original data.

 ### Starting in the same spot
@@ -40,8 +40,6 @@ This should help us avoid path and file name issues. At this time please
 navigate to the workshop directory. If you are working in IPython Notebook be sure
 that you start your notebook in the workshop directory.

-Note: while there are Python libraries like [OS Library][os-lib] that can work with our
-directory structure, that is not our focus today.

 ### Our Data

@@ -128,7 +126,6 @@ We will begin by locating and reading our survey data which are in CSV format. C
 Comma-Separated Values and is a common way to store formatted data. Other symbols may also be used, so
 you might see tab-separated, colon-separated or space separated files. pandas can work with each of these
 types of separators, as it allows you to specify the appropriate separator for your data.
-The first line in the file often has headers to explain what is in each column.
 CSV files (and other -separated value file types) make it easy to share data, and can be imported and exported
 from many applications, including Microsoft Excel. For more details on CSV
 files, see the [Data Organisation in Spreadsheets][spreadsheet-lesson5] lesson.
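
Reviewer note on the hunk above: the context lines say pandas lets you specify the separator. A minimal sketch of that, assuming small in-memory samples in place of real files (the sample text and the `comma_text` / `tab_text` names are hypothetical, not from the lesson):

```python
import io

import pandas as pd

# Hypothetical in-memory samples standing in for files on disk.
comma_text = "record_id,species_id,weight\n1,NL,40\n2,DM,48\n"
tab_text = comma_text.replace(",", "\t")

# read_csv assumes a comma by default; other delimiters are passed via `sep`.
comma_df = pd.read_csv(io.StringIO(comma_text))
tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Both calls should yield the same table: same columns, same values.
print(tab_df)
print(comma_df.equals(tab_df))
```

The same `sep` argument would apply when reading from a path such as `data/surveys.csv`; only the source of the text changes.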
@@ -183,8 +180,8 @@ surveys_df = pd.read_csv("data/surveys.csv")
 ~~~
 {: .language-python}

-Note: when you assign the imported DataFrame to a variable, Python does not
-produce any output on the screen. We can view the value of the `surveys_df`
+Note that Python does not produce any output on the screen when you assign the imported DataFrame to a variable.
+We can view the value of the `surveys_df`
 object by typing its name into the Python command prompt.

 ~~~
@@ -250,7 +247,7 @@ of data:
 Don't worry: all the data is there! You can confirm this by scrolling upwards, or by
 looking at the `[# of rows x # of columns]` block at the end of the output.

-If you use `.head()` to view only a subset of rows, you will observe an output
+You can also use `surveys_df.head()` to view only the first few rows of the dataset in an output
 that is easier to fit in one window. After doing this, you can see that pandas has neatly formatted
 the data to fit our screen:

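
Reviewer note on the hunk above: the reworded sentence points readers at `surveys_df.head()`. A minimal sketch of what that call returns, assuming a small made-up DataFrame (`demo_df` is hypothetical and only stands in for `surveys_df`):

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the lesson itself loads data/surveys.csv.
demo_df = pd.DataFrame({"record_id": range(1, 101), "weight": [40.0] * 100})

first_rows = demo_df.head()    # first 5 rows by default
first_three = demo_df.head(3)  # or request a specific number of rows

print(first_rows)
print(demo_df.shape)           # (number of rows, number of columns)
```

`head()` returns a new, shorter DataFrame and leaves the original untouched, which is why the full row count is still available through `shape`.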
@@ -548,7 +545,7 @@ surveys_df.groupby('species_id')['record_id'].count()['DO']
 ## Basic Math Functions

 If we wanted to, we could apply a mathmatical operation like addition or division
-on an entire column of our data. For example let's multiply all weight values by 2.
+on an entire column of our data. For example, let's multiply all weight values by 2.

 ~~~
 # Multiply all weight values by 2
