|
2 | 2 | title: Setup |
3 | 3 | --- |
4 | 4 |
|
5 | | -FIXME: |
6 | | -- Python, Jupyter notebook, libraries: b4s, requests, selenium, pandas |
7 | | -- Google Chrome |
| 5 | +In this workshop you will learn how to extract data from websites, a technique known as web scraping, using Python. In Episode 1 we begin by reviewing how websites are structured in HTML and how to retrieve information from them using your browser and the `BeautifulSoup` package. In Episode 2 we take a closer look at how to get the HTML behind any website using the `requests` package and how to parse and find information with `BeautifulSoup`. Finally, you'll learn about the differences between static and dynamic webpages, and how to scrape the latter with the `Selenium` package. |
8 | 6 |
|
9 | | -## Data Sets |
| 7 | +This workshop is designed for participants who already have a basic understanding of Python programming. In particular, it's best to know how to: |
10 | 8 |
|
11 | | -<!-- |
12 | | -FIXME: place any data you want learners to use in `episodes/data` and then use |
13 | | - a relative link ( [data zip file](data/lesson-data.zip) ) to provide a |
14 | | - link to it, replacing the example.com link. |
15 | | ---> |
16 | | -Download the [data zip file](https://example.com/FIXME) and unzip it to your Desktop |
| 9 | +- Install and import packages and modules |
| 10 | +- Use lists and dictionaries |
| 11 | +- Use conditional statements (`if`, `else`, `elif`) |
| 12 | +- Use `for` loops |
| 13 | +- Call functions and understand parameters/arguments and return values |
17 | 14 |
|
18 | 15 | ## Software Setup |
19 | 16 |
|
20 | | -::::::::::::::::::::::::::::::::::::::: discussion |
21 | | - |
22 | | -### Details |
23 | | - |
24 | | -Setup for different systems can be presented in dropdown menus via a `spoiler` |
25 | | -tag. They will join to this discussion block, so you can give a general overview |
26 | | -of the software used in this lesson here and fill out the individual operating |
27 | | -systems (and potentially add more, e.g. online setup) in the solutions blocks. |
28 | | - |
29 | | -::::::::::::::::::::::::::::::::::::::::::::::::::: |
30 | | - |
31 | | -:::::::::::::::: spoiler |
32 | | - |
33 | | -### Windows |
34 | | - |
35 | | -Use PuTTY |
36 | | - |
37 | | -:::::::::::::::::::::::: |
38 | | - |
39 | | -:::::::::::::::: spoiler |
40 | | - |
41 | | -### MacOS |
42 | | - |
43 | | -Use Terminal.app |
44 | | - |
45 | | -:::::::::::::::::::::::: |
46 | | - |
47 | | - |
48 | | -:::::::::::::::: spoiler |
49 | | - |
50 | | -### Linux |
51 | | - |
52 | | -Use Terminal |
53 | | - |
54 | | -:::::::::::::::::::::::: |
| 17 | +Steps: |
55 | 18 |
|
| 19 | +1. If you already have Anaconda, JupyterLab or Jupyter Notebook installed on your computer, skip to step 2. Otherwise, follow Miniforge's [download](https://github.com/conda-forge/miniforge?tab=readme-ov-file#download) and [installation](https://github.com/conda-forge/miniforge?tab=readme-ov-file#install) instructions for your operating system. If you are using a Windows machine, make sure you select the option "Add Miniforge3 to my PATH environment variable". |
| 20 | +2. If you are using Mac or Linux, open the 'Terminal'. If you are using Windows, open the 'Command Prompt' or 'Miniforge Prompt'. |
| 21 | +3. Activate the base conda environment by typing and running the `conda activate` command. |
| 22 | +4. Install the necessary packages by running `pip install requests beautifulsoup4 selenium webdriver-manager pandas tqdm jupyterlab`. |
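
Once the steps above are done, a quick check along the following lines may help confirm that the packages can be found by Python. This is only a sketch and not part of the lesson itself: the names `REQUIRED` and `find_missing` are hypothetical, and the mapping reflects the fact that some packages are imported under a different name than the one given to `pip` (`beautifulsoup4` is imported as `bs4`, `webdriver-manager` as `webdriver_manager`).

```python
import importlib.util

# Hypothetical helper data: pip package name -> module name used in `import`.
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "selenium": "selenium",
    "webdriver-manager": "webdriver_manager",
    "pandas": "pandas",
    "tqdm": "tqdm",
    "jupyterlab": "jupyterlab",
}


def find_missing(packages):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = find_missing(REQUIRED)
if missing:
    # Suggest the same pip command used in the setup steps, limited to what is missing.
    print("Missing packages:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All workshop packages appear to be installed.")
```

Run it from the same environment you activated in step 3, for example in a Jupyter notebook cell or by saving it to a file and running it with `python`. It only checks that each module can be located, not that it works correctly.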