Commit 5dcb901

Update 06-lsa.md
1 parent 08dbdc5 commit 5dcb901

1 file changed

Lines changed: 10 additions & 6 deletions

episodes/06-lsa.md

@@ -142,15 +142,19 @@ Let's take a look and see how much data each topic explains. We will visualize i
 
 ```python
 import matplotlib.pyplot as plt
+import numpy as np
 
 #this shows us the amount of dropoff in explanation we have in our sigma matrix.
 print(svdmodel.explained_variance_ratio_)
 
-plt.plot(range(maxDimensions), svdmodel.explained_variance_ratio_ * 100)
-plt.xlabel("Topic Number")
-plt.ylabel("% explained")
-plt.title("SVD dropoff")
-plt.show() # show first chart
+# Calculate cumulative sum of explained variance ratio
+cumulative_variance_ratio = np.cumsum(svdmodel.explained_variance_ratio_)
+
+plt.plot(range(1, maxDimensions + 1), cumulative_variance_ratio * 100)
+plt.xlabel("Number of Topics")
+plt.ylabel("Cumulative % of Information Retained")
+plt.ylim(0, 100) # Adjust y-axis limit to 0-100
+plt.grid(True) # Add grid lines
 ```
 
 ~~~
@@ -164,7 +168,7 @@ plt.show() # show first chart
 ~~~
 {: .output}
 
-![Image of drop-off of variance explained](../images/05-svd-dropoff.png)
+![Image of drop-off of variance explained](../images/cumulative_information_retained_plot.png)
 
 Often a heuristic used by researchers to determine a topic count is to look at the dropoff in percentage of data explained by each topic.
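
Since this commit switches the plot from per-topic to cumulative explained variance, a small sketch may help show how a cumulative curve can be read to pick a topic count. It is not part of the commit. The ratio values are made up for illustration and merely stand in for `svdmodel.explained_variance_ratio_`; the helper `topics_needed` and the 75% threshold are hypothetical. A cumulative threshold is one possible variant of the drop-off heuristic, not necessarily the one the lesson intends.

```python
from itertools import accumulate

# Hypothetical per-topic explained variance ratios (illustrative values only,
# standing in for svdmodel.explained_variance_ratio_ from the lesson).
explained_variance_ratio = [0.40, 0.25, 0.15, 0.10, 0.05]

# Running total of the ratios; this mirrors what np.cumsum does in the commit.
cumulative = list(accumulate(explained_variance_ratio))


def topics_needed(cumulative_ratios, threshold):
    """Return the smallest number of topics whose cumulative ratio
    reaches `threshold`, or None if the threshold is never reached."""
    # Count topics from 1, matching range(1, maxDimensions + 1) on the x-axis.
    for count, total in enumerate(cumulative_ratios, start=1):
        if total >= threshold:
            return count
    return None


# Assumed cut-off of 75%; the right value depends on the corpus and the task.
print(topics_needed(cumulative, 0.75))
```

With real data, the same idea would apply to `cumulative_variance_ratio` from the diff above. The threshold is a judgment call rather than a fixed rule.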