
Commit 0750fc7

committed
added suggested examples and updated guide
1 parent b4773dd commit 0750fc7

2 files changed

Lines changed: 28 additions & 9 deletions


_episodes/03-index-slice-subset.md

Lines changed: 12 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -358,8 +358,18 @@ gives the **output**
358358
Remember that Python indexing begins at 0. So, the index location [2, 6]
359359
selects the element that is 3 rows down and 7 columns over in the DataFrame.
360360
361-
It is worth noting that rows are selected when using `loc` with a single list of labels (or `iloc` with a single list of integers). However, unlike `loc` or `iloc`, indexing a data frame directly with labels will select columns, while ranges of integers will select rows. Direct indexing of rows is redundant with using `iloc`, and will raise a `KeyError` if a single integer or list is used; the error will also occur if index labels are used without `loc` (or column labels used with it).
362-
A useful rule of thumb is the following: integer-based slicing is best done with `iloc` and will avoid errors (and is generally consistent with indexing of Numpy arrays), label-based slicing of rows is done with `loc`, and slicing of columns by directly indexing column names.
361+
It is worth noting that rows are selected when using `loc` with a single list of
362+
labels (or `iloc` with a single list of integers). However, unlike `loc` or `iloc`,
363+
indexing a data frame directly with labels will select columns (e.g.
364+
`surveys_df[['species_id', 'plot_id', 'weight']]`), while ranges of integers will
365+
select rows (e.g. `surveys_df[0:13]`). Direct indexing of rows is redundant with
366+
using `iloc`, and will raise a `KeyError` if a single integer or a list of integers is used; the
367+
error will also occur if index labels are used without `loc` (or column labels used
368+
with it).
369+
A useful rule of thumb is the following: integer-based slicing is best done with
370+
`iloc` and will avoid errors (and is generally consistent with indexing of NumPy
371+
arrays), label-based slicing of rows is done with `loc`, and columns are selected by
372+
directly indexing column names.
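
The rule of thumb above can be tried out on a small table. The following is a minimal sketch, assuming a made-up stand-in for `surveys_df` (the values are invented for illustration and are not the lesson's data); it shows direct indexing with a list of column labels, direct indexing with an integer range, the `KeyError` from a single integer, and the `iloc`/`loc` equivalents.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the values are invented for illustration.
surveys_df = pd.DataFrame(
    {
        "record_id": [1, 2, 3, 4, 5],
        "species_id": ["NL", "NL", "DM", "DM", "DM"],
        "plot_id": [2, 3, 2, 7, 3],
        "weight": [40.0, 42.5, 38.0, 41.0, 39.5],
    }
)

# Direct indexing with a list of labels selects columns (note the double brackets).
columns_subset = surveys_df[["species_id", "plot_id", "weight"]]

# Direct indexing with a range of integers selects rows by position.
rows_subset = surveys_df[0:3]

# A single integer is treated as a column label, so it raises a KeyError here.
try:
    surveys_df[0]
    single_int_error = None
except KeyError as err:
    single_int_error = type(err).__name__

# Rule of thumb: iloc for integer positions, loc for row labels.
by_position = surveys_df.iloc[0:3]  # end position is excluded
by_label = surveys_df.loc[0:3]      # end label is included
```

With the default integer index, `iloc[0:3]` and `loc[0:3]` differ by one row, because label-based slices include the end label.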
363373
364374
365375
> ## Challenge - Range

_extras/guide.md

Lines changed: 16 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -192,19 +192,28 @@ previous steps visible.
192192

193193
* What happens when you execute:
194194

195-
`surveys_df[0:3]`
196-
`surveys_df[0:1]` slicing only the first element
197-
`surveys_df[:5]` slicing from first element makes 0 redundant
198-
`surveys_df[-1:]` you can count backwards
195+
- `surveys_df[0:3]`
196+
- `surveys_df[0]` results in a `KeyError`, since a single integer is looked up as a column label; selecting a row by position is done with `iloc`
197+
- `surveys_df[0:1]` slicing only the first element
198+
- `surveys_df[:5]` slicing from first element makes 0 redundant
199+
- `surveys_df[-1:]` you can count backwards
199200

200201
*Suggestion*: You can also select every Nth row: `surveys_df[1:10:2]`. So, how to interpret
201202
`surveys_df[::-1]`?
202203

204+
* What happens when you call:
205+
206+
- `surveys_df.iloc[0:1]` returns the first row
207+
- `surveys_df.iloc[0]` returns the first row as a Series (a named list of values)
208+
- `surveys_df.iloc[:4, :]` returns all columns of the first four rows
209+
- `surveys_df.iloc[0:4, 1:4]` selects specified columns of the first four rows
210+
- `surveys_df.loc[0:4, 1:4]` results in a 'TypeError'
211+
203212
* What is the difference between `surveys_df.iloc[0:4, 1:4]` and `surveys_df.loc[0:4, 1:4]`?
204213

205-
Check the position, or the name. Cfr. the second is like it would be in a dictionary, asking for
206-
the key-names. Column names 1:4 do not exist, resulting in an error. Check also the difference
207-
between `surveys_df.loc[0:4]` and `surveys_df.iloc[0:4]`
214+
While `iloc` uses integers as indices and slices accordingly, `loc` works with labels. It is
215+
like it would be in a dictionary, asking for the key names. Column names 1:4 do not exist,
216+
resulting in an error. Check also the difference between `surveys_df.loc[0:4]` and `surveys_df.iloc[0:4]`.
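
The difference between the two accessors can be checked directly. This is a minimal sketch, assuming a small invented stand-in for `surveys_df` with string column names and the default integer row index; the column-label slice `"species_id":"weight"` is an illustrative alternative and is not part of the guide.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the values are invented for illustration.
surveys_df = pd.DataFrame(
    {
        "record_id": [1, 2, 3, 4, 5, 6],
        "species_id": ["NL", "NL", "DM", "DM", "DM", "PF"],
        "plot_id": [2, 3, 2, 7, 3, 1],
        "weight": [40.0, 42.5, 38.0, 41.0, 39.5, 8.0],
    }
)

first_row_frame = surveys_df.iloc[0:1]  # DataFrame containing one row
first_row_series = surveys_df.iloc[0]   # Series indexed by column name
block = surveys_df.iloc[0:4, 1:4]       # rows 0-3 and columns 1-3, by position

# loc expects labels; the integers 1 and 4 are not column names in this frame.
try:
    surveys_df.loc[0:4, 1:4]
    loc_error = None
except TypeError as err:
    loc_error = type(err).__name__

# A label-based equivalent slices the columns by name instead.
labelled = surveys_df.loc[0:4, "species_id":"weight"]

# Row slices also differ: loc includes the end label, iloc excludes the end position.
rows_by_label = surveys_df.loc[0:4]
rows_by_position = surveys_df.iloc[0:4]
```

Here `rows_by_label` has five rows and `rows_by_position` has four, which is the difference the last sentence of the guide asks learners to check.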
208217

209218
### Advanced Selection Challenges
210219
