1 | | -# The Carpentries Workbench Template Markdown Lesson |
2 | | - |
3 | | -This lesson is a template lesson that uses [The Carpentries Workbench][workbench]. |
4 | | - |
5 | | -## Note about lesson life cycle stage |
6 | | -Although the `config.yaml` states the life cycle stage as pre-alpha, **the template is stable and ready to use**. The life cycle stage is preset to `"pre-alpha"` as this setting is appropriate for new lessons initialised using the template. |
7 | | - |
8 | | -## Create a new repository from this template |
9 | | - |
10 | | -To use this template to start a new lesson repository, |
11 | | -make sure you're logged into GitHub. |
12 | | -Visit https://github.com/carpentries/workbench-template-md/generate |
13 | | -and follow the instructions. |
14 | | -Checking the 'Include all branches' option will save some time waiting for the first website build |
15 | | -when your new repository is initialised. |
16 | | - |
17 | | -If you have any questions, contact [@tobyhodges](https://github.com/tobyhodges) |
18 | | - |
19 | | -## Configure a new lesson |
20 | | - |
21 | | -Follow the steps below to |
22 | | -complete the initial configuration of a new lesson repository built from this template: |
23 | | - |
24 | | -1. **Make sure GitHub Pages is activated:** |
25 | | - navigate to _Settings_, |
26 | | - select _Pages_ from the left sidebar, |
27 | | - and make sure that `gh-pages` is selected as the branch to build from. |
28 | | - If no `gh-pages` branch is available, check _Actions_ to see if the first |
29 | | - website build workflows are still running. |
30 | | - The branch should become available when those have completed. |
31 | | -1. **Adjust the `config.yaml` file:** |
32 | | - this file contains global parameters for your lesson site. |
33 | | - Individual fields within the file are documented with comments (beginning with `#`). |
34 | | - At minimum, you should adjust all the fields marked 'FIXME': |
35 | | - - `title` |
36 | | - - `created` |
37 | | - - `keywords` |
38 | | - - `life_cycle` (the default, _pre-alpha_, is appropriate for brand new lessons) |
39 | | - - `contact` |
40 | | -1. **Annotate the repository** with site URL and topic tags: |
41 | | - navigate back to the repository landing page and |
42 | | - click on the gear wheel/cog icon (similar to ⚙️) |
43 | | - at the top-right of the _About_ box. |
44 | | - Check the "Use your GitHub Pages website" option, |
45 | | - and [add some keywords and other annotations to describe your lesson](https://cdh.carpentries.org/the-carpentries-incubator.html#topic-tags) |
46 | | - in the _Topics_ field. |
47 | | - At minimum, these should include: |
48 | | - - `lesson` |
49 | | - - the life cycle of the lesson (e.g. `pre-alpha`) |
50 | | - - the human language the lesson is written in (e.g. `deutsch`) |
51 | | -1. **Adjust the |
52 | | - `CITATION.cff`, `CODE_OF_CONDUCT.md`, `CONTRIBUTING.md`, and `LICENSE.md` files** |
53 | | - as appropriate for your project. |
54 | | - - `CITATION.cff`: |
55 | | - this file contains information that people can use to cite your lesson, |
56 | | - for example if they publish their own work based on it. |
57 | | - You should [update the CFF][cff-sandpaper-docs] now to include information about your lesson, |
58 | | - and remember to return to it periodically, keeping it updated as your |
59 | | - author list grows and other details become available or need to change. |
60 | | - The [Citation File Format home page][cff-home] gives more information about the format, |
61 | | - and the [`cffinit` webtool][cffinit] can be used to create new and update existing CFF files. |
62 | | - - `CODE_OF_CONDUCT.md`: |
63 | | - if you are using this template for a project outside The Carpentries, |
64 | | - you should adjust this file to describe |
65 | | - who should be contacted with Code of Conduct reports, |
66 | | - and how those reports will be handled. |
67 | | - - `CONTRIBUTING.md`: |
68 | | - depending on the current state and maturity of your project, |
69 | | - the contents of the template Contributing Guide may not be appropriate. |
70 | | - You should adjust the file to help guide contributors on how best |
71 | | - to get involved and make an impact on your lesson. |
72 | | - - `LICENSE.md`: |
73 | | - in line with the terms of the CC-BY license, |
74 | | - you should ensure that the copyright information |
75 | | - provided in the license file is accurate for your project. |
76 | | -1. **Update this README with |
77 | | - [relevant information about your lesson](https://carpentries.github.io/lesson-development-training/collaborating-newcomers.html#readme)** |
78 | | - and delete this section. |
79 | | - |
80 | | -[cff-home]: https://citation-file-format.github.io/ |
81 | | -[cff-sandpaper-docs]: https://carpentries.github.io/sandpaper-docs/editing.html#making-your-lesson-citable |
82 | | -[cffinit]: https://citation-file-format.github.io/cff-initializer-javascript/ |
83 | | -[workbench]: https://carpentries.github.io/sandpaper-docs/ |
| 1 | +# Web Scraping with Python |
| 2 | + |
| 3 | +An introduction to web scraping with Python. |
| 4 | + |
| 5 | +## Teaching and contributing |
| 6 | + |
| 7 | +We'd love to know if you are teaching this lesson and to hear any suggestions you have for improving it! |
| 8 | + |
| 9 | +You can do this by submitting an [issue](https://github.com/carpentries-incubator/web-scraping-python/issues) in this repo, or sending an email to [dreamlab\@library.ucsb.edu](mailto:dreamlab@library.ucsb.edu) or [jose_nino\@ucsb.edu](mailto:jose_nino@ucsb.edu). |
| 10 | + |
| 11 | +If you want to know more about contributing to this lesson and other Carpentries efforts, please read the [CONTRIBUTING](./CONTRIBUTING.md) guide. |
| 12 | + |
| 13 | +## About the Lesson |
| 14 | + |
| 15 | +This lesson introduces people with basic Python knowledge to the tools and libraries used for web scraping, that is, extracting data from websites. It has three episodes. |
| 16 | + |
| 17 | +Episode 1 begins with an introduction to how websites are structured using HTML. You’ll learn how to explore this structure using your browser and how to extract information from it using the BeautifulSoup package. |
| 18 | + |
| 19 | +In Episode 2, you’ll learn how to retrieve the HTML of a webpage using the `requests` package and continue practicing how to parse and extract specific content with BeautifulSoup. |
| 20 | + |
| 21 | +Toward the end of the workshop, in Episode 3, we’ll explore the difference between static and dynamic webpages, and how to scrape dynamic content using Selenium. |
| 22 | + |
| 23 | +This workshop is intended for learners who already have a basic understanding of Python. In particular, you should be comfortable with: |
| 24 | + |
| 25 | +- Installing and importing packages and modules |
| 26 | +- Using lists and dictionaries |
| 27 | +- Using conditional statements (`if`, `else`, `elif`) |
| 28 | +- Using `for` loops |
| 29 | +- Calling functions and understanding parameters/arguments and return values |
| 30 | + |
| 31 | +The rendered version of the lesson is available at: <https://ucsbcarpentry.github.io/web-scraping-python/> |
| 32 | + |
| 33 | +## Maintainer |
| 34 | + |
| 35 | +Current maintainer of this lesson: [Jose Niño Muriel](https://github.com/josenino95) |
| 36 | + |
| 37 | +## Acknowledgements |
| 38 | + |
| 39 | +Thanks to Noah Spahn, Ronald Lencevičius, and Seth Erickson for their feedback when this workshop was first taught at UCSB. |
| 40 | + |
| 41 | +## Citation |
| 42 | + |
| 43 | +Please cite this lesson using the information in the [CITATION.cff file](./CITATION.cff) when you refer to it in publications, and/or if you re-use, adapt, or expand on the content in your own training material. |
| 44 | + |
| 45 | +## License |
| 46 | + |
| 47 | +The use and adaptation of this instructional content is made available under the [Creative Commons Attribution license - CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Review the [LICENSE.md](./LICENSE.md) file for additional information. |