
Commit 85bfd0c

Updated readme, citation, and contributing
1 parent d7ea41f commit 85bfd0c

3 files changed

Lines changed: 82 additions & 118 deletions


CITATION.cff

Lines changed: 26 additions & 11 deletions
@@ -1,22 +1,37 @@
-# This template CITATION.cff file was generated with cffinit.
-# Visit https://bit.ly/cffinit to replace its contents
-# with information about your lesson.
-# Remember to update this file periodically,
-# ensuring that the author list and other fields remain accurate.
+# This CITATION.cff file was generated with cffinit.
+# Visit https://bit.ly/cffinit to generate yours today!
 
 cff-version: 1.2.0
-title: FIXME
+title: Web Scraping with Python
 message: >-
   Please cite this lesson using the information in this file
   when you refer to it in publications, and/or if you
   re-use, adapt, or expand on the content in your own
   training material.
 type: dataset
 authors:
-  - given-names: FIXME
-    family-names: FIXME
+  - given-names: Jose David
+    family-names: Niño Muriel
+    email: jose_nino@ucsb.edu
+    affiliation: 'University of California, Santa Barbara'
+  - name: DREAM Lab - UCSB Library
+    address: Bldg. 525 UCEN Road
+    city: Santa Barbara
+    country: US
+    post-code: '93106'
+    email: dreamlab@library.ucsb.edu
+    website: 'https://www.library.ucsb.edu/dreamlab'
 abstract: >-
-  FIXME Replace this with a short abstract describing the
-  lesson, e.g. its target audience and main intended
-  learning objectives.
+  This lesson teaches people with basic Python knowledge the
+  tools and libraries to do web scraping, which means
+  extracting data from websites.
+keywords:
+  - web scraping
+  - python
+  - BeautifulSoup
+  - bs4
+  - Selenium
+  - requests
 license: CC-BY-4.0
+version: '1.0'
+date-released: '2025-06-11'
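
The `CITATION.cff` hunk above swaps every template `FIXME` placeholder for real lesson metadata. One way to confirm that no placeholders survive in a lesson repository is a small scan like the sketch below. It is a hypothetical helper, not part of the lesson repository; the function names `find_placeholders` and `check_files`, and the list of files it scans, are assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical helper, not part of the lesson repository: it assumes the
# template marks unfinished fields with the literal word "FIXME".
PLACEHOLDER = re.compile(r"\bFIXME\b")


def find_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line that mentions FIXME."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if PLACEHOLDER.search(line)
    ]


def check_files(paths: list[str]) -> dict[str, list[tuple[int, str]]]:
    """Map each existing file that still has placeholders to its offending lines."""
    report: dict[str, list[tuple[int, str]]] = {}
    for path in paths:
        candidate = Path(path)
        if not candidate.is_file():
            continue  # skip files missing from this working copy
        hits = find_placeholders(candidate.read_text(encoding="utf-8"))
        if hits:
            report[path] = hits
    return report


if __name__ == "__main__":
    # Run from the repository root; prints "file:line: text" for each leftover FIXME.
    for name, hits in check_files(["CITATION.cff", "CONTRIBUTING.md", "README.md"]).items():
        for number, line in hits:
            print(f"{name}:{number}: {line}")
```

Run against the pre-commit version of `CITATION.cff`, this scan would have reported lines such as `title: FIXME`.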

CONTRIBUTING.md

Lines changed: 9 additions & 24 deletions
@@ -59,32 +59,18 @@ are particularly valuable**: it's easy for people who have been using these
 lessons for a while to forget how impenetrable some of this material can be, so
 fresh eyes are always welcome.
 
-### What *Not* to Contribute
-
-Our lessons already contain more material than we can cover in a typical
-workshop, so we are usually *not* looking for more concepts or tools to add to
-them. As a rule, if you want to introduce a new idea, you must (a) estimate how
-long it will take to teach and (b) explain what you would take out to make room
-for it. The first encourages contributors to be honest about requirements; the
-second, to think hard about priorities.
-
-We are also not looking for exercises or other material that only run on one
-platform. Our workshops typically contain a mixture of Windows, macOS, and
-Linux users; in order to be usable, our lessons must run equally well on all
-three.
-
 ### Using GitHub
 
 If you choose to contribute via GitHub, you may want to look at [How to
 Contribute to an Open Source Project on GitHub][how-contribute]. In brief, we
 use [GitHub flow][github-flow] to manage changes:
 
-1. Create a new branch in your desktop copy of this repository for each
-   significant change.
-2. Commit the change in that branch.
-3. Push that branch to your fork of this repository on GitHub.
-4. Submit a pull request from that branch to the [upstream repository][repo].
-5. If you receive feedback, make changes on your desktop and push to your
+1. Create a fork of this repo and make a desktop copy of it.
+2. Create a new branch in your desktop copy.
+3. Commit the change in that branch.
+4. Push that branch to your fork of this repository on GitHub.
+5. Submit a pull request from that branch to the [upstream repository][repo].
+6. If you receive feedback, make changes on your desktop and push to your
    branch on GitHub: the pull request will update automatically.
 
 NB: The published copy of the lesson is usually in the `main` branch.
@@ -102,14 +88,13 @@ community listed at <https://carpentries.org/connect/> including via social
 media, slack, newsletters, and email lists. You can also [reach us by
 email][contact].
 
-[repo]: https://example.com/FIXME
-[repo-issues]: https://example.com/FIXME/issues
-[contact]: mailto:team@carpentries.org
+[repo]: https://github.com/carpentries-incubator/web-scraping-python
+[repo-issues]: https://github.com/carpentries-incubator/web-scraping-python/issues
+[contact]: mailto:dreamlab@library.ucsb.edu
 [cp-site]: https://carpentries.org/
 [dc-issues]: https://github.com/issues?q=user%3Adatacarpentry
 [dc-lessons]: https://datacarpentry.org/lessons/
 [dc-site]: https://datacarpentry.org/
-[discuss-list]: https://carpentries.topicbox.com/groups/discuss
 [github]: https://github.com
 [github-flow]: https://guides.github.com/introduction/flow/
 [github-join]: https://github.com/join

README.md

Lines changed: 47 additions & 83 deletions
@@ -1,83 +1,47 @@
-# The Carpentries Workbench Template Markdown Lesson
-
-This lesson is a template lesson that uses [The Carpentries Workbench][workbench].
-
-## Note about lesson life cycle stage
-Although the `config.yaml` states the life cycle stage as pre-alpha, **the template is stable and ready to use**. The life cycle stage is preset to `"pre-alpha"` as this setting is appropriate for new lessons initialised using the template.
-
-## Create a new repository from this template
-
-To use this template to start a new lesson repository,
-make sure you're logged into Github.
-Visit https://github.com/carpentries/workbench-template-md/generate
-and follow the instructions.
-Checking the 'Include all branches' option will save some time waiting for the first website build
-when your new repository is initialised.
-
-If you have any questions, contact [@tobyhodges](https://github.com/tobyhodges)
-
-## Configure a new lesson
-
-Follow the steps below to
-complete the initial configuration of a new lesson repository built from this template:
-
-1. **Make sure GitHub Pages is activated:**
-   navigate to _Settings_,
-   select _Pages_ from the left sidebar,
-   and make sure that `gh-pages` is selected as the branch to build from.
-   If no `gh-pages` branch is available, check _Actions_ to see if the first
-   website build workflows are still running.
-   The branch should become available when those have completed.
-1. **Adjust the `config.yaml` file:**
-   this file contains global parameters for your lesson site.
-   Individual fields within the file are documented with comments (beginning with `#`)
-   At minimum, you should adjust all the fields marked 'FIXME':
-   - `title`
-   - `created`
-   - `keywords`
-   - `life_cycle` (the default, _pre-alpha_, is the appropriate for brand new lessons)
-   - `contact`
-1. **Annotate the repository** with site URL and topic tags:
-   navigate back to the repository landing page and
-   click on the gear wheel/cog icon (similar to ⚙️)
-   at the top-right of the _About_ box.
-   Check the "Use your GitHub Pages website" option,
-   and [add some keywords and other annotations to describe your lesson](https://cdh.carpentries.org/the-carpentries-incubator.html#topic-tags)
-   in the _Topics_ field.
-   At minimum, these should include:
-   - `lesson`
-   - the life cycle of the lesson (e.g. `pre-alpha`)
-   - the human language the lesson is written in (e.g. `deutsch`)
-1. **Adjust the
-   `CITATION.cff`, `CODE_OF_CONDUCT.md`, `CONTRIBUTING.md`, and `LICENSE.md` files**
-   as appropriate for your project.
-   - `CITATION.cff`:
-     this file contains information that people can use to cite your lesson,
-     for example if they publish their own work based on it.
-     You should [update the CFF][cff-sandpaper-docs] now to include information about your lesson,
-     and remember to return to it periodicallt, keeping it updated as your
-     author list grows and other details become available or need to change.
-     The [Citation File Format home page][cff-home] gives more information about the format,
-     and the [`cffinit` webtool][cffinit] can be used to create new and update existing CFF files.
-   - `CODE_OF_CONDUCT.md`:
-     if you are using this template for a project outside The Carpentries,
-     you should adjust this file to describe
-     who should be contacted with Code of Conduct reports,
-     and how those reports will be handled.
-   - `CONTRIBUTING.md`:
-     depending on the current state and maturity of your project,
-     the contents of the template Contributing Guide may not be appropriate.
-     You should adjust the file to help guide contributors on how best
-     to get involved and make an impact on your lesson.
-   - `LICENSE.md`:
-     in line with the terms of the CC-BY license,
-     you should ensure that the copyright information
-     provided in the license file is accurate for your project.
-1. **Update this README with
-   [relevant information about your lesson](https://carpentries.github.io/lesson-development-training/collaborating-newcomers.html#readme)**
-   and delete this section.
-
-[cff-home]: https://citation-file-format.github.io/
-[cff-sandpaper-docs]: https://carpentries.github.io/sandpaper-docs/editing.html#making-your-lesson-citable
-[cffinit]: https://citation-file-format.github.io/cff-initializer-javascript/
-[workbench]: https://carpentries.github.io/sandpaper-docs/
+# Web Scraping with Python
+
+An introduction to web scraping with Python.
+
+## Teaching and contributing
+
+We'd love to know if you are teaching this lesson, and to hear any suggestions you have for improving it!
+
+You can let us know by submitting an [issue](https://github.com/carpentries-incubator/web-scraping-python/issues) in this repo, or by sending an email to [dreamlab\@library.ucsb.edu](mailto:dreamlab@library.ucsb.edu) or [jose_nino\@ucsb.edu](mailto:jose_nino@ucsb.edu).
+
+If you want to know more about contributing to this lesson and other Carpentries efforts, please read the [CONTRIBUTING](./CONTRIBUTING.md) guide.
+
+## About the Lesson
+
+This lesson teaches people with basic Python knowledge the tools and libraries to do web scraping, which means extracting data from websites. It has three episodes.
+
+Episode 1 begins with an introduction to how websites are structured using HTML. You’ll learn how to explore this structure using your browser and how to extract information from it using the BeautifulSoup package.
+
+In Episode 2, you’ll learn how to retrieve the HTML of a webpage using the requests package and continue practicing how to parse and extract specific content with BeautifulSoup.
+
+Toward the end of the workshop, in Episode 3, we’ll explore the difference between static and dynamic webpages, and how to scrape dynamic content using Selenium.
+
+This workshop is intended for learners who already have a basic understanding of Python. In particular, you should be comfortable with:
+
+- Installing and importing packages and modules
+- Using lists and dictionaries
+- Using conditional statements (`if`, `elif`, `else`)
+- Using `for` loops
+- Calling functions, and understanding parameters/arguments and return values
+
+The rendered version of the lesson is available at: <https://ucsbcarpentry.github.io/web-scraping-python/>
+
+## Maintainer
+
+Current maintainer of this lesson: [Jose Niño Muriel](https://github.com/josenino95)
+
+## Acknowledgements
+
+Thanks to Noah Spahn, Ronald Lencevičius, and Seth Erickson for their feedback the first time this workshop was taught at UCSB.
+
+## Citation
+
+Please cite this lesson using the information in the [CITATION.cff file](./CITATION.cff) when you refer to it in publications, and/or if you re-use, adapt, or expand on the content in your own training material.
+
+## License
+
+The use and adaptation of this instructional content is made available under the [Creative Commons Attribution license - CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Review the [LICENSE.md](./LICENSE.md) file for additional information.
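
The new README names three packages the episodes rely on: BeautifulSoup (Episodes 1 and 2), requests (Episode 2), and Selenium (Episode 3). A learner could confirm they are installed before the workshop with a check like the sketch below. It is a hypothetical helper, not part of the lesson materials; it assumes the usual PyPI names `requests`, `beautifulsoup4`, and `selenium`, assumes BeautifulSoup is imported as `bs4`, and only checks that each module can be found, not which version is installed.

```python
import importlib.util

# Assumed mapping from import name to the name passed to "pip install".
LESSON_PACKAGES = {
    "bs4": "beautifulsoup4",  # Episodes 1 and 2: parsing HTML with BeautifulSoup
    "requests": "requests",   # Episode 2: retrieving the HTML of a webpage
    "selenium": "selenium",   # Episode 3: scraping dynamic content
}


def missing_packages(packages: dict[str, str] = LESSON_PACKAGES) -> list[str]:
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for module_name, pip_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install them with: pip install " + " ".join(missing))
    else:
        print("All lesson packages are importable.")
```

`importlib.util.find_spec` locates a module without importing it, so the check stays quick even for heavier packages such as Selenium.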
