Commit 2353f4c

v0.16.3 (#2093)
1 parent e07be02 commit 2353f4c

5 files changed

Lines changed: 129 additions & 103 deletions

bertopic/_bertopic.py

Lines changed: 3 additions & 0 deletions
@@ -2387,6 +2387,7 @@ def visualize_topics(
         self,
         topics: List[int] = None,
         top_n_topics: int = None,
+        use_ctfidf: bool = False,
         custom_labels: bool = False,
         title: str = "<b>Intertopic Distance Map</b>",
         width: int = 650,
@@ -2403,6 +2404,7 @@ def visualize_topics(
                 For example, if you want to visualize only topics 1 through 5:
                 `topics = [1, 2, 3, 4, 5]`.
             top_n_topics: Only select the top n most frequent topics
+            use_ctfidf: Whether to use c-TF-IDF representations instead of the embeddings from the embedding model.
             custom_labels: Whether to use custom topic labels that were defined using
                 `topic_model.set_topic_labels`.
             title: Title of the plot.
@@ -2428,6 +2430,7 @@ def visualize_topics(
             self,
             topics=topics,
             top_n_topics=top_n_topics,
+            use_ctfidf=use_ctfidf,
             custom_labels=custom_labels,
             title=title,
             width=width,
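
To make the effect of the new `use_ctfidf` flag concrete, here is a minimal sketch of choosing between two topic representations before comparing topics. It is an illustration only, not BERTopic's implementation: the helper names `select_representation` and `cosine_similarity_matrix` and the toy matrices are assumptions made up for this example.

```python
import numpy as np


def select_representation(c_tf_idf, topic_embeddings, use_ctfidf=False):
    """Hypothetical helper: pick which topic representation to compare."""
    if use_ctfidf:
        return np.asarray(c_tf_idf, dtype=float)
    return np.asarray(topic_embeddings, dtype=float)


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero rows
    return unit @ unit.T


# Toy data (made up): 3 topics, a 4-term vocabulary, 2-dimensional embeddings.
c_tf_idf = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
])
topic_embeddings = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
])

sim_embeddings = cosine_similarity_matrix(
    select_representation(c_tf_idf, topic_embeddings, use_ctfidf=False)
)
sim_ctfidf = cosine_similarity_matrix(
    select_representation(c_tf_idf, topic_embeddings, use_ctfidf=True)
)
```

In this toy setup, topics 0 and 1 look identical under the embedding representation but share no terms under c-TF-IDF, so the same downstream comparison can give different pictures depending on the flag.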

docs/changelog.md

Lines changed: 27 additions & 0 deletions
@@ -5,6 +5,33 @@ hide:

 # Changelog

+
+## **Version 0.16.3**
+*Release date: 22 July, 2024*
+
+<h3><b>Highlights:</b></h3>
+
+* Simplify zero-shot topic modeling by [@ianrandman](https://github.com/ianrandman) in [#2060](https://github.com/MaartenGr/BERTopic/pull/2060)
+* Option to choose between c-TF-IDF and Topic Embeddings in many functions by [@azikoss](https://github.com/azikoss) in [#1894](https://github.com/MaartenGr/BERTopic/pull/1894)
+    * Use the `use_ctfidf` parameter in the following functions to choose between c-TF-IDF and topic embeddings:
+        * `hierarchical_topics`, `reduce_topics`, `visualize_hierarchy`, `visualize_heatmap`, `visualize_topics`
+* Linting with Ruff by [@afuetterer](https://github.com/afuetterer) in [#2033](https://github.com/MaartenGr/BERTopic/pull/2033)
+* Switch from setup.py to pyproject.toml by [@afuetterer](https://github.com/afuetterer) in [#1978](https://github.com/MaartenGr/BERTopic/pull/1978)
+* In multi-aspect context, allow Main model to be chained by [@ddicato](https://github.com/ddicato) in [#2002](https://github.com/MaartenGr/BERTopic/pull/2002)
+
+<h3><b>Fixes:</b></h3>
+
+* Added templates for [issues](https://github.com/MaartenGr/BERTopic/tree/master/.github/ISSUE_TEMPLATE) and [pull requests](https://github.com/MaartenGr/BERTopic/blob/master/.github/PULL_REQUEST_TEMPLATE.md)
+* Update River documentation example by [@Proteusiq](https://github.com/Proteusiq) in [#2004](https://github.com/MaartenGr/BERTopic/pull/2004)
+* Fix PartOfSpeech reproducibility by [@Greenpp](https://github.com/Greenpp) in [#1996](https://github.com/MaartenGr/BERTopic/pull/1996)
+* Fix PartOfSpeech ignoring first word by [@Greenpp](https://github.com/Greenpp) in [#2024](https://github.com/MaartenGr/BERTopic/pull/2024)
+* Make sklearn embedding backend auto-select more cautious by [@freddyheppell](https://github.com/freddyheppell) in [#1984](https://github.com/MaartenGr/BERTopic/pull/1984)
+* Fix typos by [@afuetterer](https://github.com/afuetterer) in [#1974](https://github.com/MaartenGr/BERTopic/pull/1974)
+* Fix hierarchical_topics(...) when the distances between three clusters are the same by [@azikoss](https://github.com/azikoss) in [#1929](https://github.com/MaartenGr/BERTopic/pull/1929)
+* Fixes to chain strategy example in outlier_reduction.md by [@reuning](https://github.com/reuning) in [#2065](https://github.com/MaartenGr/BERTopic/pull/2065)
+* Remove obsolete flake8 config and update line length by [@afuetterer](https://github.com/afuetterer) in [#2066](https://github.com/MaartenGr/BERTopic/pull/2066)
+
+
 ## **Version 0.16.2**
 *Release date: 12 May, 2024*

docs/getting_started/zeroshot/zeroshot.md

Lines changed: 5 additions & 9 deletions
@@ -1,21 +1,17 @@
 Zero-shot Topic Modeling is a technique that allows you to find predefined topics in large amounts of documents. When faced with many documents, you often have an idea of which topics will definitely be in there, whether that comes from simply knowing your data or from a domain expert being involved in defining those topics.

 This method allows you to not only find those specific topics but also create new topics for documents that would not fit with your predefined topics.
-This allows for extensive flexibility as there are three scenario's to explore.
+This allows for extensive flexibility as there are three scenarios to explore:

-First, both zero-shot topics and clustered topics were detected. This means that some documents would fit with the predefined topics where others would not. For the latter, new topics were found.
-
-Second, only zero-shot topics were detected. Here, we would not need to find additional topics since all original documents were assigned to one of the predefined topics.
-
-Third, no zero-shot topics were detected. This means that none of the documents would fit with the predefined topics and a regular BERTopic would be run.
+* First, both zero-shot topics and clustered topics were detected. This means that some documents would fit with the predefined topics whereas others would not. For the latter, new topics were found.
+* Second, only zero-shot topics were detected. Here, we would not need to find additional topics since all original documents were assigned to one of the predefined topics.
+* Third, no zero-shot topics were detected. This means that none of the documents would fit with the predefined topics and a regular BERTopic would be run.

 <div class="svg_image">
 --8<-- "docs/getting_started/zeroshot/zeroshot.svg"
 </div>

-This method works as follows. First, we create a number of labels for our predefined topics and embed them using any embedding model. Then, we compare the embeddings of the documents with the predefined labels using cosine similarity. If they pass a user-defined threshold, the zero-shot topic is assigned to a document. If it does not, then that document, along with others, will be put through a regular BERTopic model.
-
-This creates two models. One for the zero-shot topics and one for the non-zero-shot topics. We combine these two BERTopic models to create a single model that contains both zero-shot and non-zero-shot topics.
+This method works as follows. First, we create a number of labels for our predefined topics and embed them using any embedding model. Then, we compare the embeddings of the documents with the predefined labels using cosine similarity. If the similarity passes a user-defined threshold, the zero-shot topic is assigned to the document. If it does not, then that document, along with others, will follow the regular BERTopic pipeline, which attempts to find clusters that do not fit with the zero-shot topics.

 ### **Example**
 In order to use zero-shot BERTopic, we create a list of topics that we want to assign to our documents. However,
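
The assignment step described in the zeroshot.md change above (embed the labels, compare documents to them by cosine similarity, apply a threshold) can be sketched roughly as follows. This is an illustration with toy vectors standing in for real embeddings, not BERTopic's implementation; the function name `assign_zeroshot_topics`, the `min_similarity` parameter, and the `-1` marker for unassigned documents are assumptions made for this example.

```python
import numpy as np


def assign_zeroshot_topics(doc_embeddings, label_embeddings, min_similarity=0.7):
    """Return, per document, the index of the best-matching label, or -1.

    -1 marks documents whose best similarity is below the threshold; in the
    method described above, those would continue to the regular clustering
    pipeline. (Hypothetical helper, not part of BERTopic.)
    """
    docs = np.asarray(doc_embeddings, dtype=float)
    labels = np.asarray(label_embeddings, dtype=float)

    # Normalize rows so that a dot product equals cosine similarity.
    docs = docs / np.clip(np.linalg.norm(docs, axis=1, keepdims=True), 1e-12, None)
    labels = labels / np.clip(np.linalg.norm(labels, axis=1, keepdims=True), 1e-12, None)

    similarity = docs @ labels.T           # shape: (n_docs, n_labels)
    best_label = similarity.argmax(axis=1)
    best_score = similarity.max(axis=1)
    return np.where(best_score >= min_similarity, best_label, -1)


# Toy vectors (made up) standing in for embeddings of 2 labels and 3 documents.
label_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_embeddings = np.array([[0.9, 0.1], [0.1, 0.95], [0.7, 0.7]])

assignments = assign_zeroshot_topics(doc_embeddings, label_embeddings, min_similarity=0.8)
```

Here the first two documents each sit close to one label and are assigned to it, while the third is equally far from both, falls below the threshold, and is left for clustering.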
