tags:
  - bertopic
  - summcomparer
  - document_text
library_name: bertopic
pipeline_tag: text-classification
inference: false
license: apache-2.0
datasets:
  - pszemraj/summcomparer-gauntlet-v0p1
language:
  - en

BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-document_text

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

[Figure: documents in topics]

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("pszemraj/BERTopic-summcomparer-gauntlet-v0p1-sentence-t5-xl-document_text")

topic_model.get_topic_info()
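Beyond `get_topic_info()`, a single topic can be inspected by its ID. The sketch below is illustrative only: `top_keywords` is a hypothetical helper, not part of BERTopic. It assumes `get_topic` returns a list of `(word, score)` pairs for a known topic and `False` for an unknown one.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return up to n keywords for one topic, in the order get_topic gives them.

    Hypothetical helper: expects any object exposing BERTopic's
    get_topic(topic_id) method.
    """
    words = topic_model.get_topic(topic_id)
    if not words:  # get_topic returned False (unknown ID) or an empty list
        return []
    # Drop the scores and keep only the keyword strings.
    return [word for word, _score in words[:n]]
```

With the model loaded as shown above, `top_keywords(topic_model, 1)` would list the leading keywords stored for topic 1.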

Topic overview

  • Number of topics: 16
  • Number of training documents: 630

Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | convolutional - images - networks - superpixels - overfitting | 12 | -1_convolutional_images_networks_superpixels |
| 0 | bruno - guy - pdf - screentalk - he | 26 | 0_bruno_guy_pdf_screentalk |
| 1 | elsa - arendelle - kristoff - frozen - anna | 94 | 1_elsa_arendelle_kristoff_frozen |
| 2 | gillis - script - room - ll - artie | 73 | 2_gillis_script_room_ll |
| 3 | interpretation - explanation - theory - structure - merge | 72 | 3_interpretation_explanation_theory_structure |
| 4 | topics - topic - documents - corpus - document | 63 | 4_topics_topic_documents_corpus |
| 5 | nemo - dory - chum - gill - fish | 56 | 5_nemo_dory_chum_gill |
| 6 | films - film - identity - trauma - zinnemann | 54 | 6_films_film_identity_trauma |
| 7 | computational - data - pathology - medical - informatics | 47 | 7_computational_data_pathology_medical |
| 8 | images - captions - representations - embeddings - image | 26 | 8_images_captions_representations_embeddings |
| 9 | zaroff - rainsford - hunt - hunting - general | 24 | 9_zaroff_rainsford_hunt_hunting |
| 10 | cogvideo - interpolation - videos - coglm - frames | 24 | 10_cogvideo_interpolation_videos_coglm |
| 11 | assignment - essays - questions - projects - students | 17 | 11_assignment_essays_questions_projects |
| 12 | things - ll - some - lol - explain | 16 | 12_things_ll_some_lol |
| 13 | videos - arxiv - visual - preprint - generative | 13 | 13_videos_arxiv_visual_preprint |
| 14 | spectrograms - musecoder - melspectrogram - vocoding - spectrogram | 13 | 14_spectrograms_musecoder_melspectrogram_vocoding |
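
The two summary figures can be cross-checked against the per-topic frequencies. A small sketch with the values copied from the overview; topic -1 is BERTopic's outlier bucket, and the topic count of 16 includes it.

```python
# Topic ID -> number of training documents, copied from the topic overview.
topic_frequency = {
    -1: 12, 0: 26, 1: 94, 2: 73, 3: 72, 4: 63, 5: 56, 6: 54,
    7: 47, 8: 26, 9: 24, 10: 24, 11: 17, 12: 16, 13: 13, 14: 13,
}

total_documents = sum(topic_frequency.values())
# Share of training documents that were not assigned to any topic.
outlier_share = topic_frequency[-1] / total_documents

print(len(topic_frequency))    # 16
print(total_documents)         # 630
print(f"{outlier_share:.1%}")  # 1.9%
```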

Training hyperparameters

  • calculate_probabilities: True
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
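
Each entry above corresponds to a keyword argument of the `BERTopic` constructor. The following is a minimal sketch of how a comparable, unfitted model might be configured. The embedding model name is an assumption inferred from this model's name, not something the card states, and `build_topic_model` is a hypothetical helper.

```python
# Values copied from the hyperparameter list; each key is a BERTopic constructor argument.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}

# Assumption: inferred from the model name, not stated in this card.
ASSUMED_EMBEDDING_MODEL = "sentence-transformers/sentence-t5-xl"


def build_topic_model(embedding_model=ASSUMED_EMBEDDING_MODEL):
    """Create an unfitted BERTopic model with the listed hyperparameters."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **HYPERPARAMETERS)
```

Fitting such a model on new documents is unlikely to reproduce this checkpoint exactly, since the UMAP and HDBSCAN settings and any random seed are not recorded here.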

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.11
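
If loading fails or results look off, comparing the local environment with the versions above may help. This is a sketch using only the standard library; the PyPI distribution names (for example `umap-learn` for UMAP) are assumptions mapped from the labels in the list, and `version_differences` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by assumed PyPI distribution name.
REPORTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_differences(reported=REPORTED_VERSIONS, lookup=installed_version):
    """Map each package whose installed version differs to (reported, installed)."""
    differences = {}
    for name, wanted in reported.items():
        found = lookup(name)
        if found != wanted:
            differences[name] = (wanted, found)
    return differences
```

An empty result from `version_differences()` means every listed package matches; a mismatch is not necessarily a problem, only a place to look first.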