episodes/introduction.md
Lines changed: 29 additions & 7 deletions
Original file line number
Diff line number
Diff line change
@@ -25,7 +25,7 @@ We'll refresh some of the concepts covered there to have a practical understandi
25
25
26
26
## HTML quick overview
27
27
28
-
All websites have Hypertext Markup Language (HTML) behind them. The following text is HTML for a very simple website, with only three sentences. If you read it, can you imagine how that website looks?
28
+
All websites have a Hypertext Markup Language (HTML) document behind them. The following text is HTML for a very simple website, with only three sentences. If you read it, can you imagine how that website looks?
29
29
30
30
```html
31
31
<!DOCTYPE html>
@@ -34,13 +34,13 @@ All websites have Hypertext Markup Language (HTML) behind them. The following te
34
34
<title>Sample web page</title>
35
35
</head>
36
36
<body>
37
-
<h1>h1 Header #1</h1>
37
+
<h1 id="head1">h1 Header #1</h1>
38
38
<p>This is a paragraph tag</p>
39
-
<h2>h2 Sub-header</h2>
39
+
<h2 id="subhead1">h2 Sub-header</h2>
40
40
<p>A new paragraph, now in the <b>sub-header</b></p>
41
-
<h1>h1 Header #2</h1>
41
+
<h1 id="head2">h1 Header #2</h1>
42
42
<p>
43
-
This other paragraph has two hyperlinks,
43
+
This other paragraph has two hyperlinks,
44
44
one to <a href="https://carpentries.org/">The Carpentries homepage</a>,
@@ -53,12 +53,34 @@ Well, if you put that text in a file with a .html extension, the job of your web
53
53
54
54
{alt="Screenshot of a simple website with the previous HTML"}
55
55
56
-
HTML is composed of tags
56
+
An HTML document is composed of elements, which can be identified by tags written inside angle brackets (`<` and `>`). For example, the HTML root element, which delimits the beginning and end of an HTML document, is identified by the `<html>` tag.
57
57
58
+
Most elements have both an opening and a closing tag, which determine the span of the element. In the previous simple website, we see a head element that goes from the opening tag `<head>` up to the closing tag `</head>`. Given that an element can be inside another element, an HTML document has a tree structure, where every element is a node that can contain child nodes, as the following image shows.
58
59
60
+
{alt="Screenshot of a simple website with the previous HTML"}
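The tree structure described above can be made visible with a short script. The following is a minimal sketch, assuming Python's standard-library `html.parser` module; `SAMPLE_HTML` is a shortened, hypothetical version of the sample page, and `OutlineParser` is a name introduced here for illustration only.

```python
from html.parser import HTMLParser

# Hypothetical sample: a shortened version of the simple page shown earlier.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<head>
<title>Sample web page</title>
</head>
<body>
<h1>h1 Header #1</h1>
<p>This is a paragraph tag</p>
</body>
</html>"""


class OutlineParser(HTMLParser):
    """Record each element's tag name, indented by its nesting depth.

    Assumption: every element has a closing tag. Elements without one
    (for example <img>) would need extra handling to keep the depth right.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        # An opening tag starts a new node one level below its parent.
        self.outline.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        # A closing tag ends the current node, so we move back up the tree.
        self.depth -= 1


parser = OutlineParser()
parser.feed(SAMPLE_HTML)
parser.close()
outline = parser.outline
print("\n".join(outline))
```

In the printed outline, `head` and `body` appear indented under `html`, and `title`, `h1`, and `p` appear one level deeper, mirroring the parent and child nodes of the tree.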
59
61
62
+
Finally, we can define or modify the behavior, appearance, or functionality of an element by using attributes. Attributes are written inside the opening tag and consist of a name and a value, formatted as `name="value"`. For example, we can give a unique id to any element using the `id` attribute.
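To see how attributes travel with the opening tag, here is a small sketch, again assuming the standard-library `html.parser` module. `SNIPPET` reuses three elements from the sample page, and `AttributeCollector` is a hypothetical helper name, not part of any library.

```python
from html.parser import HTMLParser

# Three elements taken from the sample page, each with one attribute.
SNIPPET = """
<h1 id="head1">h1 Header #1</h1>
<h2 id="subhead1">h2 Sub-header</h2>
<a href="https://carpentries.org/">The Carpentries homepage</a>
"""


class AttributeCollector(HTMLParser):
    """Collect every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs read from the opening tag.
        self.found.append((tag, dict(attrs)))


collector = AttributeCollector()
collector.feed(SNIPPET)
collector.close()
found = collector.found
for tag, attributes in found:
    print(tag, attributes)
```

Each line of output pairs a tag name with a dictionary of its attributes, for instance `h1` with `{'id': 'head1'}`.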
63
+
64
+
Here is a non-exhaustive list of elements you'll find in HTML and their purpose:
65
+
66
+
- `<html>...</html>` The root, which contains the entirety of the document.
67
+
- `<head>...</head>` Contains metadata, for example, the title that the web browser displays.
68
+
- `<body>...</body>` The content that is going to be displayed.
69
+
- `<h1>...</h1>, <h2>...</h2>, <h3>...</h3>` Define headers of level 1, 2, 3, and so on.
70
+
- `<p>...</p>` A paragraph.
71
+
- `<a href="">...</a>` Creates a hyperlink, whose destination URL is given by the `href` attribute.
72
+
- `<img src="" alt="">` Embeds an image, giving a source to the image with the `src` attribute and specifying alternate text with `alt`.
73
+
- `<table>...</table>, <th>...</th>, <tr>...</tr>, <td>...</td>` Define a table, whose descendants include header cells (`th`), rows (`tr`), and the data cells inside each row (`td`).
74
+
- `<div>...</div>` Groups sections of HTML content.
75
+
- `<script>...</script>` Embeds or references JavaScript code.
76
+
77
+
To summarize, an *element* is identified by *tags*, and we can assign properties to an element by using *attributes*. Knowing this about HTML will make our lives easier when trying to get some specific data from a website.
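As an illustration of why these three ideas matter for getting specific data, the sketch below pulls the header text and link destinations out of a page by looking only at tag names and the `href` attribute. It assumes the standard-library `html.parser` module rather than any particular scraping library; `PAGE` is a shortened, hypothetical version of the sample page, and `HeaderAndLinkExtractor` is a name made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical sample: a shortened version of the simple page shown earlier.
PAGE = """<html><body>
<h1 id="head1">h1 Header #1</h1>
<p>This is a paragraph tag</p>
<h2 id="subhead1">h2 Sub-header</h2>
<p>one to <a href="https://carpentries.org/">The Carpentries homepage</a></p>
</body></html>"""


class HeaderAndLinkExtractor(HTMLParser):
    """Collect the text of h1-h3 headers and the href of every link."""

    HEADER_TAGS = ("h1", "h2", "h3")

    def __init__(self):
        super().__init__()
        self.headers = []            # (tag, text) pairs, in document order
        self.links = []              # href values, in document order
        self._current_header = None  # tag of the header being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADER_TAGS:
            self._current_header = tag
            self._text = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_data(self, data):
        # Keep text only while we are inside a header element.
        if self._current_header is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._current_header:
            self.headers.append((tag, "".join(self._text).strip()))
            self._current_header = None


extractor = HeaderAndLinkExtractor()
extractor.feed(PAGE)
extractor.close()
headers = extractor.headers
links = extractor.links
print(headers)
print(links)
```

The paragraph text is ignored entirely, because the extractor selects content by element and attribute rather than by position in the file.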
78
+
79
+
80
+
## Parsing HTML with BeautifulSoup
81
+
82
+
Now that we know how a website is structured, we can start extracting information from it.
60
83
61
-
## Introduction
62
84
63
85
This is a lesson created via The Carpentries Workbench. It is written in
64
86
[Pandoc-flavored Markdown](https://pandoc.org/MANUAL.txt) for static files and