@@ -142,15 +142,19 @@ Let's take a look and see how much data each topic explains. We will visualize i
 
 ```python
 import matplotlib.pyplot as plt
+import numpy as np
 
 # this shows us the amount of dropoff in explanation we have in our sigma matrix.
 print(svdmodel.explained_variance_ratio_)
 
-plt.plot(range(maxDimensions), svdmodel.explained_variance_ratio_ * 100)
-plt.xlabel("Topic Number")
-plt.ylabel("% explained")
-plt.title("SVD dropoff")
-plt.show() # show first chart
+# Calculate cumulative sum of explained variance ratio
+cumulative_variance_ratio = np.cumsum(svdmodel.explained_variance_ratio_)
+
+plt.plot(range(1, maxDimensions + 1), cumulative_variance_ratio * 100)
+plt.xlabel("Number of Topics")
+plt.ylabel("Cumulative % of Information Retained")
+plt.ylim(0, 100) # Adjust y-axis limit to 0-100
+plt.grid(True) # Add grid lines
 ```
 
 ~~~
@@ -164,7 +168,7 @@ plt.show() # show first chart
 ~~~
 {: .output}
 
-![Image of drop-off of variance explained](../images/05-svd-dropoff.png)
+![Image of drop-off of variance explained](../images/cumulative_information_retained_plot.png)
 
 Often a heuristic used by researchers to determine a topic count is to look at the dropoff in percentage of data explained by each topic.
 
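The cumulative curve introduced in this change could also be read off numerically. The following is a minimal sketch, not part of the lesson: the function name `topics_for_threshold`, the 80% cutoff, and the sample ratios are all hypothetical. It swaps in a cumulative-threshold rule, which is a different heuristic from the per-topic dropoff that the context line above describes.

```python
import numpy as np


def topics_for_threshold(explained_variance_ratio, threshold=0.8):
    """Return the smallest number of topics whose cumulative explained
    variance reaches `threshold`, or None if it is never reached.

    `threshold` is an assumed cutoff chosen for illustration only.
    """
    cumulative = np.cumsum(explained_variance_ratio)
    # cumulative is non-decreasing, so searchsorted finds the first
    # index where cumulative >= threshold
    idx = int(np.searchsorted(cumulative, threshold, side="left"))
    if idx >= len(cumulative):
        return None
    return idx + 1  # convert the 0-based index to a topic count


# Made-up ratios standing in for svdmodel.explained_variance_ratio_
sample_ratios = np.array([0.5, 0.25, 0.125, 0.0625])
print(topics_for_threshold(sample_ratios, threshold=0.8))
```

With a fitted model, `svdmodel.explained_variance_ratio_` could be passed in place of `sample_ratios`. A return value of `None` would suggest that the chosen number of dimensions does not retain the requested share of variance.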