learners/setup.md
+41 -17 (41 additions & 17 deletions)
@@ -21,35 +21,59 @@ This workshop is intended for learners who already have a basic understanding of
 ## Software Setup

-Steps:
+To run the code in this workshop, you will need to install:

-1. If you already have Anaconda, Jupyter Lab or Jupyter Notebooks installed in your computer, skip to step 2. Follow Miniforge's [download](https://github.com/conda-forge/miniforge?tab=readme-ov-file#download) and [installation](https://github.com/conda-forge/miniforge?tab=readme-ov-file#install) instructions for your respective operating system. If you are using a Windows machine, make sure you mark the option to "Add Miniforge3 to my PATH environment variable".
-2. If you are using Mac or Linux, open the 'Terminal'. If you are using Windows, open the 'Command Prompt' or 'Miniforge Prompt'.
-3. Activate the base conda environment by typing and running the code below to activate your environment.
+- **The following Python libraries:** `requests, beautifulsoup4, selenium, webdriver-manager, pandas, tqdm, jupyterlab`.
+- **Google Chrome:** Please install the latest version of the Google Chrome web browser, as we'll use its web developer tools. If you already have it, please check for updates by visiting `chrome://settings/help` in Chrome.

-```terminal
-conda activate
+If you already have a preferred workflow for managing Python environments (e.g., Conda or venv), you may proceed as you normally do. However, if you are new to this or want a hassle-free setup, we highly recommend following the `pixi` instructions below.
+
+
+### Setting up your environment with `pixi`
+
+As described on its website, `pixi` is a cross-platform, multi-language (including Python and R) package manager and workflow tool built on the foundation of the conda ecosystem. In short, it is a tool that simplifies installing software and managing libraries (packages).
+
+Steps to configure your workshop environment:
+
+1. **Install `pixi`:** Follow the instructions for your operating system at [https://pixi.prefix.dev/latest/installation/](https://pixi.prefix.dev/latest/installation/).
+
+   - Note: Once the installation finishes, restart your Terminal (close it and open it again) to make sure the `pixi` command is recognized.
+
+2. **Navigate to your folder:** In your Terminal, use the `cd` command to move to the folder where you want to keep your workshop files (e.g., `cd Desktop` or `cd Documents`).
+
+3. **Initialize the project:** Run the following command to create a new folder named `webscraping` with the necessary configuration files:
+
+```bash
+pixi init webscraping
+```
+
+4. **Enter the folder:** Move into the newly created project folder
+6. **Start JupyterLab:** Launch the notebook interface by running:
+
+```bash
+pixi run jupyter lab
 ```

-6. In a new Jupyter Notebook run the following code in a cell to check the necessary libraries can be loaded:
+7. **Verify your setup:** Inside JupyterLab, create a new Notebook (File > New > Notebook), copy the code below into a cell, and run it by pressing <kbd>Shift</kbd>+<kbd>Enter</kbd>:
+
 ```python
-from bs4 import BeautifulSoup
-import requests
 from selenium import webdriver
-from selenium.webdriver.common.by import By
-import pandas as pd
+driver = webdriver.Chrome()
 ```

+You are now ready for the workshop! Learn more about `pixi` by reading its [documentation](https://pixi.prefix.dev/latest/).
+
 ## Additional resources
 - Mitchell, R. E. (2024). Web Scraping with Python: Data Extraction from the Modern Web (3rd ed.). O’Reilly Media, Inc.
 - Chapagain, A. (2023). Hands-On Web Scraping with Python: Extract Quality Data from the Web Using Effective Python Techniques (2nd ed.). Packt Publishing.
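
The verification cell in the updated step 7 imports only `selenium`, so it does not confirm that the other libraries listed under Software Setup are installed. As a minimal sketch (not part of the workshop materials), the cell below looks up each library without importing it. The import names `bs4` and `webdriver_manager` differ from the package names `beautifulsoup4` and `webdriver-manager`; `WORKSHOP_MODULES` and `find_missing` are hypothetical names introduced here for illustration.

```python
import importlib.util

# Import names for the libraries listed under "Software Setup".
# Assumption: these are the top-level module names each package installs
# (beautifulsoup4 -> bs4, webdriver-manager -> webdriver_manager).
WORKSHOP_MODULES = [
    "requests",
    "bs4",
    "selenium",
    "webdriver_manager",
    "pandas",
    "tqdm",
    "jupyterlab",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    # find_spec returns None for a top-level module that is not installed,
    # and it does not execute the module's code.
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(WORKSHOP_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All workshop libraries were found.")
```

Any name the cell reports can be installed with the environment manager you chose, after which the cell can be run again.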