Commit 34fd331

Merge pull request #395 from maxim-belkin/fix-02
02-starting-with-data.md: wrap long lines, move links to the end
2 parents da5e49f + 0de7c94 commit 34fd331

1 file changed

Lines changed: 30 additions & 13 deletions

_episodes/02-starting-with-data.md

@@ -36,19 +36,19 @@ This should help us avoid path and file name issues. At this time please
 navigate to the workshop directory. If you working in IPython Notebook be sure
 that you start your notebook in the workshop directory.
 
-A quick aside that there are Python libraries like [OS
-Library](https://docs.python.org/3/library/os.html) that can work with our
+A quick aside that there are Python libraries like [OS Library][os-lib] that can work with our
 directory structure, however, that is not our focus today.
 
 ### Our Data
 
 For this lesson, we will be using the Portal Teaching data, a subset of the data
 from Ernst et al
-[Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal, Arizona, USA](http://www.esapubs.org/archive/ecol/E090/118/default.htm)
+[Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal,
+Arizona, USA][ernst].
 
-We will be using files from the [Portal Project Teaching Database](https://figshare.com/articles/Portal_Project_Teaching_Database/1314459).
+We will be using files from the [Portal Project Teaching Database][pptd].
 This section will use the `surveys.csv` file that can be downloaded here:
-[https://ndownloader.figshare.com/files/2292172](https://ndownloader.figshare.com/files/2292172)
+[https://ndownloader.figshare.com/files/2292172][figshare-ndownloader]
 
 We are studying the species and weight of animals caught in sites in our study
 area. The dataset is stored as a `.csv` file: each row holds information for a
@@ -93,10 +93,10 @@ Once a library is set up, it can be used or called to perform many tasks.
 
 ## Pandas in Python
 One of the best options for working with tabular data in Python is to use the
-[Python Data Analysis Library](http://pandas.pydata.org/) (a.k.a. Pandas). The
+[Python Data Analysis Library][pandas] (a.k.a. Pandas). The
 Pandas library provides data structures, produces high quality plots with
-[matplotlib](http://matplotlib.org/) and integrates nicely with other libraries
-that use [NumPy](http://www.numpy.org/) (which is another Python library) arrays.
+[matplotlib][matplotlib] and integrates nicely with other libraries
+that use [NumPy][numpy] (which is another Python library) arrays.
 
 Python doesn't load all of the libraries available to it by default. We have to
 add an `import` statement to our code in order to use library functions. To import
@@ -119,9 +119,14 @@ time we call a Pandas function.
 
 # Reading CSV Data Using Pandas
 
-We will begin by locating and reading our survey data which are in CSV format. CSV stands for Comma-Separated Values and is a common way store formatted data. Other symbols may also be used, so you might see tab-separated, colon-separated or space separated files. It is quite easy to replace one separator with another, to match your application. The first line in the file often has headers to explain what is in each column. CSV (and other separators) make it easy to share data, and can be imported and exported from many applications, including Microsoft Excel. For more details on CSV files, see the [Data Organisation in Spreadsheets](http://www.datacarpentry.org/spreadsheet-ecology-lesson/05-exporting-data/) lesson.
-We can use Pandas' `read_csv` function to pull the file directly into a
-[DataFrame](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe).
+We will begin by locating and reading our survey data which are in CSV format. CSV stands for
+Comma-Separated Values and is a common way store formatted data. Other symbols may also be used, so
+you might see tab-separated, colon-separated or space separated files. It is quite easy to replace
+one separator with another, to match your application. The first line in the file often has headers
+to explain what is in each column. CSV (and other separators) make it easy to share data, and can be
+imported and exported from many applications, including Microsoft Excel. For more details on CSV
+files, see the [Data Organisation in Spreadsheets][spreadsheet-lesson5] lesson.
+We can use Pandas' `read_csv` function to pull the file directly into a [DataFrame][pd-dataframe].
 
 ## So What's a DataFrame?
 
@@ -333,7 +338,7 @@ Let's look at the data using these.
 > 2. `surveys_df.shape` Take note of the output of `shape` - what format does it
 > return the shape of the DataFrame in?
 >
-> HINT: [More on tuples, here](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences).
+> HINT: [More on tuples, here][python-datastructures].
 > 3. `surveys_df.head()` Also, what does `surveys_df.head(15)` do?
 > 4. `surveys_df.tail()`
 {: .challenge}
@@ -580,7 +585,7 @@ total_count.plot(kind='bar');
 > being sex. The plot should show total weight by sex for each site. Some
 > tips are below to help you solve this challenge:
 >
-> * [For more on Pandas plots, visit this link.](http://pandas.pydata.org/pandas-docs/stable/visualization.html#basic-plotting-plot)
+> * For more on Pandas plots, visit this [link][pandas-plot].
 > * You can use the code that follows to create a stacked bar plot but the data to stack
 > need to be in individual columns. Here's a simple example with some data where
 > 'a', 'b', and 'c' are the groups, and 'one' and 'two' are the subgroups.
@@ -688,5 +693,17 @@ total_count.plot(kind='bar');
 > {: .solution}
 {: .challenge}
 
+[ernst]: http://www.esapubs.org/archive/ecol/E090/118/default.htm
+[figshare-ndownloader]: https://ndownloader.figshare.com/files/2292172
+[os-lib]: https://docs.python.org/3/library/os.html
+[matplotlib]: https://matplotlib.org
+[numpy]: https://www.numpy.org/
+[pandas]: https://pandas.pydata.org
+[pandas-plot]: http://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#basic-plotting-plot
+[pd-dataframe]: https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dataframe
+[pptd]: https://figshare.com/articles/Portal_Project_Teaching_Database/1314459
+[python-datastructures]: https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences
+[spreadsheet-lesson5]: http://www.datacarpentry.org/spreadsheet-ecology-lesson/05-exporting-data
+
 {% include links.md %}
 
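
The CSV paragraph rewrapped in the hunk at line 122 states that it is quite easy to replace one separator with another. As a rough illustration of that claim only, here is a minimal sketch using Python's standard `csv` module rather than Pandas. The function name `convert_separator` and the sample rows are hypothetical and are not part of the lesson or of this commit.

```python
import csv
import io


def convert_separator(text, old_sep="\t", new_sep=","):
    """Re-write delimited text, swapping old_sep for new_sep.

    Hypothetical helper for illustration; not part of the lesson.
    """
    reader = csv.reader(io.StringIO(text), delimiter=old_sep)
    out = io.StringIO()
    # lineterminator="\n" keeps the output newline style predictable.
    writer = csv.writer(out, delimiter=new_sep, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Made-up tab-separated sample; the values are NOT taken from surveys.csv.
tsv_sample = "record_id\tspecies_id\tweight\n1\tNL\t40\n2\tDM\t48\n"
csv_text = convert_separator(tsv_sample)
print(csv_text)
```

Going through `csv.reader` and `csv.writer` instead of a plain string replace means a field that already contains the new separator is quoted on output rather than silently split. When reading with Pandas, `read_csv` also accepts a `sep` argument, so converting the file first is often unnecessary.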
