	---
	tags:
	- bertopic
	- summcomparer
	- document_text
	library_name: bertopic
	pipeline_tag: text-classification
	inference: false
	license: apache-2.0
	datasets:
	- pszemraj/summcomparer-gauntlet-v0p1
	language:
	- en
	---

	# BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-document_text

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
	BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

	![docs-in-topics](https://i.imgur.com/5SzC0mt.png)

	## Usage

	To use this model, please install BERTopic:

	```bash
	pip install -U bertopic
	```

	You can use the model as follows:

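	```python
	from bertopic import BERTopic

	# Load the fitted topic model from the Hugging Face Hub
	topic_model = BERTopic.load("pszemraj/BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-document_text")

	# Returns a pandas DataFrame with one row per topic
	topic_model.get_topic_info()
	```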
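
Beyond inspecting the topics, a loaded model can assign topics to unseen documents with `transform`. The following is a minimal sketch, not the author's documented workflow: the helper names `load_model` and `assign_topics` are hypothetical, and the embedding checkpoint `sentence-transformers/sentence-t5-xl` is an assumption inferred from the model name. It may be unnecessary if the saved model already records its embedder.

```python
REPO_ID = "pszemraj/BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-document_text"


def load_model(embedding_model="sentence-transformers/sentence-t5-xl"):
    """Load the topic model from the Hub (downloads files on first call)."""
    # Imported lazily so assign_topics can be used without bertopic installed.
    from bertopic import BERTopic

    # ASSUMPTION: the embedding checkpoint is guessed from the model name.
    return BERTopic.load(REPO_ID, embedding_model=embedding_model)


def assign_topics(topic_model, docs):
    """Return (topic_id, document) pairs; topic -1 marks outliers."""
    # transform returns the predicted topic IDs and their probabilities.
    topics, _probs = topic_model.transform(docs)
    return [(int(topic), doc) for topic, doc in zip(topics, docs)]
```

A call would look like `assign_topics(load_model(), ["Elsa flees Arendelle after her powers are revealed."])`; the returned topic ID can then be looked up in the topic overview.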

	## Topic overview

	* Number of topics: 16 (including the outlier topic `-1`)
	* Number of training documents: 630

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | convolutional - images - networks - superpixels - overfitting | 12 | -1_convolutional_images_networks_superpixels |
	| 0 | bruno - guy - pdf - screentalk - he | 26 | 0_bruno_guy_pdf_screentalk |
	| 1 | elsa - arendelle - kristoff - frozen - anna | 94 | 1_elsa_arendelle_kristoff_frozen |
	| 2 | gillis - script - room - ll - artie | 73 | 2_gillis_script_room_ll |
	| 3 | interpretation - explanation - theory - structure - merge | 72 | 3_interpretation_explanation_theory_structure |
	| 4 | topics - topic - documents - corpus - document | 63 | 4_topics_topic_documents_corpus |
	| 5 | nemo - dory - chum - gill - fish | 56 | 5_nemo_dory_chum_gill |
	| 6 | films - film - identity - trauma - zinnemann | 54 | 6_films_film_identity_trauma |
	| 7 | computational - data - pathology - medical - informatics | 47 | 7_computational_data_pathology_medical |
	| 8 | images - captions - representations - embeddings - image | 26 | 8_images_captions_representations_embeddings |
	| 9 | zaroff - rainsford - hunt - hunting - general | 24 | 9_zaroff_rainsford_hunt_hunting |
	| 10 | cogvideo - interpolation - videos - coglm - frames | 24 | 10_cogvideo_interpolation_videos_coglm |
	| 11 | assignment - essays - questions - projects - students | 17 | 11_assignment_essays_questions_projects |
	| 12 | things - ll - some - lol - explain | 16 | 12_things_ll_some_lol |
	| 13 | videos - arxiv - visual - preprint - generative | 13 | 13_videos_arxiv_visual_preprint |
	| 14 | spectrograms - musecoder - melspectrogram - vocoding - spectrogram | 13 | 14_spectrograms_musecoder_melspectrogram_vocoding |

	</details>
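
The columns of the topic table are related in a simple way: each label is the topic ID followed by the first four keywords, and the frequencies add up to the number of training documents. The sketch below checks both with values copied from the table. The helper `make_label` is a hypothetical name used for illustration, not part of BERTopic's API.

```python
# Topic frequencies copied from the table above (topic ID -> document count).
TOPIC_FREQUENCY = {
    -1: 12, 0: 26, 1: 94, 2: 73, 3: 72, 4: 63, 5: 56, 6: 54,
    7: 47, 8: 26, 9: 24, 10: 24, 11: 17, 12: 16, 13: 13, 14: 13,
}


def make_label(topic_id, keywords, n_words=4):
    """Join the topic ID and its first n_words keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


# Total over all topics, outliers (topic -1) included.
total_docs = sum(TOPIC_FREQUENCY.values())
# Fraction of documents that were not assigned to any real topic.
outlier_share = TOPIC_FREQUENCY[-1] / total_docs

label = make_label(5, ["nemo", "dory", "chum", "gill", "fish"])
print(total_docs, len(TOPIC_FREQUENCY), label)  # 630 16 5_nemo_dory_chum_gill
```

The total of 630 matches the number of training documents, and the 16 entries include the outlier topic `-1`, which holds 12 documents.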

	## Training hyperparameters

	* calculate_probabilities: True
	* language: None
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: None
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
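
For orientation, the settings above map onto keyword arguments of the `BERTopic` constructor. The sketch below is an approximation under stated assumptions, not the author's training script: `build_topic_model` is a hypothetical helper, the embedding checkpoint is inferred from the model name, and the UMAP, HDBSCAN, and vectorizer settings are not listed in this card, so BERTopic's defaults would apply. `language` is left out because BERTopic appears to record `language: None` whenever an explicit embedding model is supplied.

```python
# Constructor arguments mirroring the hyperparameter list above.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model(embedding_model="sentence-transformers/sentence-t5-xl"):
    """Create an unfitted BERTopic model with the settings listed above."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    # ASSUMPTION: the embedding checkpoint is guessed from the model name.
    return BERTopic(embedding_model=embedding_model, **HYPERPARAMS)
```

Fitting would then be `build_topic_model().fit_transform(docs)` on a list of strings. Because UMAP is stochastic and its settings are unknown here, a refit should not be expected to reproduce the 16 topics exactly.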

	## Framework versions

	* Numpy: 1.22.4
	* HDBSCAN: 0.8.29
	* UMAP: 0.5.3
	* Pandas: 1.5.3
	* Scikit-Learn: 1.2.2
	* Sentence-transformers: 2.2.2
	* Transformers: 4.29.2
	* Numba: 0.56.4
	* Plotly: 5.13.1
	* Python: 3.10.11
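
Since the model was saved with these library versions, a version mismatch is a plausible first suspect if loading fails. A minimal sketch for comparing the local environment against the list above, using only the standard library; `compare_versions` is a hypothetical helper, and the keys are the PyPI distribution names (for example `umap-learn` for UMAP):

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by PyPI distribution name.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def compare_versions(pinned=PINNED):
    """Map each package to (listed, installed); installed is None if missing."""
    report = {}
    for name, listed in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (listed, installed)
    return report


for name, (listed, installed) in compare_versions().items():
    flag = "ok" if installed == listed else "differs"
    print(f"{name}: listed {listed}, installed {installed} ({flag})")
```

A differing version is not necessarily a problem; the report only shows where the local environment departs from the one used for training.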