tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

xsum_108_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

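from bertopic import BERTopic

# Load the serialized model from the Hugging Face Hub
topic_model = BERTopic.load("KingKazma/xsum_108_3000_1500_train")

# Show a summary table with one row per topic
topic_model.get_topic_info()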
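
Beyond inspecting the topic table, a loaded model can assign topics to unseen text. The following is a minimal sketch using BERTopic's `transform` method; the helper name `assign_topics`, the `demo` function, and the two sample sentences are made up for illustration. It assumes the serialized model can embed new documents. If no embedding model was saved with it, one may need to be passed through the `embedding_model` argument of `BERTopic.load`.

```python
def assign_topics(topic_model, docs):
    """Return (document, topic_id) pairs for a list of new documents.

    `topic_model` is expected to be a fitted BERTopic instance; only its
    `transform` method is used here.
    """
    topics, _probabilities = topic_model.transform(docs)
    # Topic IDs may come back as a list or a NumPy array depending on the
    # BERTopic version, so normalise them to plain ints.
    return list(zip(docs, (int(t) for t in topics)))


def demo():
    """Download the model and label two illustrative sentences (needs network)."""
    from bertopic import BERTopic  # imported lazily; requires `pip install -U bertopic`

    topic_model = BERTopic.load("KingKazma/xsum_108_3000_1500_train")
    sample_docs = [  # hypothetical inputs, not taken from the training data
        "The striker scored a late goal to win the game.",
        "The central bank left interest rates unchanged as growth slowed.",
    ]
    for doc, topic_id in assign_topics(topic_model, sample_docs):
        print(topic_id, doc)
```

Calling `demo()` prints one topic ID per sentence. An ID of -1 means the document was treated as an outlier rather than assigned to a topic.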

Topic overview

  • Number of topics: 32 (including the outlier topic -1)
  • Number of training documents: 3000
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - mr - people - would - also | 7 | -1_said_mr_people_would |
| 0 | win - game - right - goal - shot | 841 | 0_win_game_right_goal |
| 1 | police - said - court - mr - told | 815 | 1_police_said_court_mr |
| 2 | party - labour - mr - election - vote | 438 | 2_party_labour_mr_election |
| 3 | care - nhs - patient - health - cancer | 111 | 3_care_nhs_patient_health |
| 4 | rate - bank - growth - market - price | 77 | 4_rate_bank_growth_market |
| 5 | film - song - show - story - one | 76 | 5_film_song_show_story |
| 6 | school - education - student - teacher - child | 71 | 6_school_education_student_teacher |
| 7 | syria - syrian - said - killed - force | 46 | 7_syria_syrian_said_killed |
| 8 | trump - mr - clinton - russian - campaign | 45 | 8_trump_mr_clinton_russian |
| 9 | rescue - helicopter - ship - search - crew | 37 | 9_rescue_helicopter_ship_search |
| 10 | google - apple - mobile - said - company | 37 | 10_google_apple_mobile_said |
| 11 | fire - torch - building - burner - blaze | 35 | 11_fire_torch_building_burner |
| 12 | museum - coin - art - museums - work | 32 | 12_museum_coin_art_museums |
| 13 | rail - train - network - service - passenger | 32 | 13_rail_train_network_service |
| 14 | energy - gas - coal - fracking - industry | 26 | 14_energy_gas_coal_fracking |
| 15 | wales - welsh - assembly - uk - government | 25 | 15_wales_welsh_assembly_uk |
| 16 | facebook - company - social - said - site | 24 | 16_facebook_company_social_said |
| 17 | president - maduro - mr - macri - venezuelan | 23 | 17_president_maduro_mr_macri |
| 18 | president - mr - crocodile - boko - haram | 22 | 18_president_mr_crocodile_boko |
| 19 | union - strike - rmt - staff - said | 21 | 19_union_strike_rmt_staff |
| 20 | earthquake - quake - kathmandu - people - nepal | 20 | 20_earthquake_quake_kathmandu_people |
| 21 | migrant - asylum - le - pen - hungary | 18 | 21_migrant_asylum_le_pen |
| 22 | virus - disease - health - ebola - malaria | 18 | 22_virus_disease_health_ebola |
| 23 | cat - animal - rspca - dog - said | 17 | 23_cat_animal_rspca_dog |
| 24 | species - forest - frog - specie - tree | 16 | 24_species_forest_frog_specie |
| 25 | space - earth - surface - mars - mission | 15 | 25_space_earth_surface_mars |
| 26 | site - council - centre - pool - plan | 14 | 26_site_council_centre_pool |
| 27 | mr - gandhi - minister - indias - state | 13 | 27_mr_gandhi_minister_indias |
| 28 | plaque - memorial - died - war - akikusa | 12 | 28_plaque_memorial_died_war |
| 29 | korea - north - missile - china - us | 8 | 29_korea_north_missile_china |
| 30 | tax - rate - 50p - budget - chancellor | 8 | 30_tax_rate_50p_budget |
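
The topic frequencies are heavily skewed toward a few large topics. The short sketch below makes that concrete by turning the counts from the table into shares of the 3000 training documents. The names `TOPIC_COUNTS` and `topic_shares` are illustrative only; the numbers are copied from the table.

```python
# Document counts per topic ID, copied from the topic overview table.
TOPIC_COUNTS = {
    -1: 7, 0: 841, 1: 815, 2: 438, 3: 111, 4: 77, 5: 76, 6: 71,
    7: 46, 8: 45, 9: 37, 10: 37, 11: 35, 12: 32, 13: 32, 14: 26,
    15: 25, 16: 24, 17: 23, 18: 22, 19: 21, 20: 20, 21: 18, 22: 18,
    23: 17, 24: 16, 25: 15, 26: 14, 27: 13, 28: 12, 29: 8, 30: 8,
}


def topic_shares(counts):
    """Map each topic ID to its fraction of all documents."""
    total = sum(counts.values())
    return {topic_id: n / total for topic_id, n in counts.items()}


shares = topic_shares(TOPIC_COUNTS)

# IDs of the three most frequent topics, largest first.
top3 = sorted(TOPIC_COUNTS, key=TOPIC_COUNTS.get, reverse=True)[:3]
top3_share = sum(shares[t] for t in top3)

print(f"total documents: {sum(TOPIC_COUNTS.values())}")
print(f"three largest topics {top3}: {top3_share:.1%}")
print(f"outlier topic -1: {shares[-1]:.2%}")
```

The counts add up to the 3000 training documents. Topics 0, 1, and 2 together hold 2094 of them, about 69.8%, while only 7 documents fall into the outlier topic -1.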

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
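
The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. As a rough sketch, they can be collected in a dictionary and passed through when creating a fresh, unfitted model. The names `TRAINING_HYPERPARAMETERS` and `build_untrained_model` are illustrative. The card does not state the embedding model, the UMAP and HDBSCAN settings, or any text preprocessing, so this should not be expected to reproduce the published topics exactly.

```python
# Values copied from the "Training hyperparameters" list.
TRAINING_HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Create an unfitted BERTopic instance with the listed settings.

    Sub-models (embeddings, UMAP, HDBSCAN, vectorizer) are left at the
    library defaults because the card does not specify them.
    """
    from bertopic import BERTopic  # imported lazily; requires bertopic to be installed

    return BERTopic(**TRAINING_HYPERPARAMETERS)
```

A model built this way can then be fitted with `build_untrained_model().fit_transform(docs)`, where `docs` is a list of strings.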

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
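
Serialized models can behave differently under other library versions, so it may help to compare the local environment against the versions listed above. The sketch below does this with the standard library's `importlib.metadata`. The helper names are illustrative, and the mapping from the names in the list to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the "Framework versions" list, keyed by PyPI distribution
# name. The distribution names are assumed, not stated in the card.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {name: (listed, installed)} for packages that differ."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in find_mismatches(PINNED).items():
    print(f"{name}: card lists {wanted}, installed {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load. The output is only a starting point if loading or inference behaves unexpectedly.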