
Commit 414e4cc

Update 09-wordEmbed_train-word2vec.md
1 parent 2adef83 commit 414e4cc

1 file changed

Lines changed: 4 additions & 4 deletions


episodes/09-wordEmbed_train-word2vec.md

@@ -49,14 +49,14 @@ Mounted at /content/drive
 ~~~
 {: .output}
 
-### Load in the data
-Create list of files we'll use for our analysis. We'll start by fitting a word2vec model to just one of the books in our list — Moby Dick.
-
 ```python
 # pip install necessary to access parse module (called from helpers.py)
 !pip install parse
 ```
 
+### Load in the data
+Create list of files we'll use for our analysis. We'll start by fitting a word2vec model to just one of the books in our list — Moby Dick.
+
 Get list of files available to analyze
 
 ```python
@@ -428,7 +428,7 @@ tokens_cleaned.shape
 {: .output}
 
 ### Train Word2Vec model using tokenized text
-We can now use this data to train a word2vec model. We'll start by importing the Word2Vec module from gensim. We'll then hand the Word2Vec function our list of tokenized sentences and set sg=0 to use the continuous bag of words (CBOW) training method.
+We can now use this data to train a word2vec model. We'll start by importing the Word2Vec module from gensim. We'll then hand the Word2Vec function our list of tokenized sentences and set sg=0 ("skip-gram") to use the continuous bag of words (CBOW) training method.
 
 **Set seed and workers for a fully deterministic run**: Next we'll set some parameters for reproducibility. We'll set the seed so that our vectors get randomly initialized the same way each time this code is run. For a fully deterministically-reproducible run, we'll also limit the model to a single worker thread (workers=1), to eliminate ordering jitter from OS thread scheduling — noted in [gensim's documentation](https://radimrehurek.com/gensim/models/word2vec.html)
 
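In gensim, the `sg` keyword of `gensim.models.Word2Vec` selects the training algorithm: `sg=1` trains skip-gram, and any other value (including the default `sg=0`) trains CBOW. The `("skip-gram")` parenthetical added in this commit refers to what the flag name stands for, not to the method that `sg=0` selects. The sketch below is a toy illustration, not gensim's implementation: it only builds the (input, target) examples each objective learns from, assuming a fixed symmetric window and ignoring gensim's random window shrinking, frequent-word subsampling, and negative sampling. The function names and the sample sentence are hypothetical.

```python
from typing import List, Tuple


def context_words(sentence: List[str], i: int, window: int) -> List[str]:
    """Up to `window` words on each side of position i, excluding the word at i."""
    return sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]


def cbow_pairs(sentence: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW (gensim sg=0): one example per position, predicting the center word
    from all of its context words together."""
    pairs = []
    for i, target in enumerate(sentence):
        context = context_words(sentence, i, window)
        if context:  # skip single-word sentences that have no context
            pairs.append((context, target))
    return pairs


def skipgram_pairs(sentence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram (gensim sg=1): one example per (center, context word) pair,
    predicting each context word from the center word."""
    pairs = []
    for i, center in enumerate(sentence):
        for word in context_words(sentence, i, window):
            pairs.append((center, word))
    return pairs


# Hypothetical tokenized sentence, standing in for one entry of the lesson's tokenized text
sentence = ["call", "me", "ishmael", "some", "years", "ago"]

for context, target in cbow_pairs(sentence, window=1)[:3]:
    print(f"CBOW:      {context} -> {target}")
for center, word in skipgram_pairs(sentence, window=1)[:3]:
    print(f"skip-gram: {center} -> {word}")
```

With a window of 1, this six-word sentence yields 6 CBOW examples but 10 skip-gram pairs, since skip-gram emits one pair per context word. That difference in the number of updates is part of why skip-gram is usually slower to train. The `seed` and `workers=1` settings described in the lesson are passed to the same `Word2Vec` constructor; they control initialization and thread ordering, not which of these two objectives is used.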
