xsum_55555_3000_1500_train

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

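from bertopic import BERTopic

# Download the saved model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/xsum_55555_3000_1500_train")

# Overview of all topics as a pandas DataFrame, one row per topic
topic_model.get_topic_info()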
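A loaded model can also assign topics to new text. The helper below is a minimal sketch and is not part of BERTopic: `top_topics` is a hypothetical name, and it assumes only that the model behaves like a fitted BERTopic model, where `transform(docs)` returns the predicted topics plus probabilities and `get_topic(topic_id)` returns `(word, score)` pairs. It counts the predicted topics for a batch of documents, skips the outlier topic -1, and returns the most frequent topics with their leading keywords.

```python
from collections import Counter


def top_topics(topic_model, docs, k=3, n_words=5):
    """Return up to k (topic_id, count, keywords) tuples for the most frequent
    non-outlier topics predicted for `docs`.

    Hypothetical helper: assumes a fitted BERTopic-like model exposing
    transform(docs) -> (topics, probs) and get_topic(topic_id) -> [(word, score), ...].
    """
    topics, _probs = topic_model.transform(docs)  # probabilities are ignored here
    # Topic -1 collects outlier documents, so it is excluded from the ranking
    counts = Counter(t for t in topics if t != -1)
    result = []
    for topic_id, n in counts.most_common(k):
        words = [word for word, _score in topic_model.get_topic(topic_id)[:n_words]]
        result.append((topic_id, n, words))
    return result
```

With `topic_model` loaded as above, `top_topics(topic_model, docs)` accepts any list of strings `docs`.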

Topic overview

  • Number of topics: 54 (including the outlier topic -1)
  • Number of training documents: 3000
| Topic ID | Topic Keywords | Topic Frequency (documents) | Label |
|---|---|---|---|
| -1 | said - people - would - mr - year | 6 | -1_said_people_would_mr |
| 0 | party - eu - labour - vote - brexit | 1465 | 0_party_eu_labour_vote |
| 1 | trump - mr - president - republican - russia | 129 | 1_trump_mr_president_republican |
| 2 | care - health - nhs - patient - hospital | 76 | 2_care_health_nhs_patient |
| 3 | syria - syrian - attack - killed - force | 75 | 3_syria_syrian_attack_killed |
| 4 | cricket - wicket - england - test - ball | 64 | 4_cricket_wicket_england_test |
| 5 | club - league - season - appearance - loan | 59 | 5_club_league_season_appearance |
| 6 | wales - rugby - england - game - player | 58 | 6_wales_rugby_england_game |
| 7 | film - show - actor - actress - star | 55 | 7_film_show_actor_actress |
| 8 | medal - sport - olympic - gold - world | 54 | 8_medal_sport_olympic_gold |
| 9 | driving - driver - crash - car - road | 48 | 9_driving_driver_crash_car |
| 10 | chelsea - arsenal - city - goal - tottenham | 44 | 10_chelsea_arsenal_city_goal |
| 11 | president - mr - petrobras - odebrecht - government | 43 | 11_president_mr_petrobras_odebrecht |
| 12 | lifeboat - sea - rnli - ship - boat | 41 | 12_lifeboat_sea_rnli_ship |
| 13 | crime - police - child - force - abuse | 37 | 13_crime_police_child_force |
| 14 | man - police - men - wearing - arrested | 35 | 14_man_police_men_wearing |
| 15 | murray - seed - match - slam - set | 34 | 15_murray_seed_match_slam |
| 16 | dog - mountain - animal - avalanche - said | 34 | 16_dog_mountain_animal_avalanche |
| 17 | court - sexual - assault - trial - woman | 31 | 17_court_sexual_assault_trial |
| 18 | school - education - teacher - academy - pupil | 30 | 18_school_education_teacher_academy |
| 19 | fifa - ghana - burkina - african - cup | 29 | 19_fifa_ghana_burkina_african |
| 20 | music - album - song - like - im | 28 | 20_music_album_song_like |
| 21 | fire - blaze - rescue - said - building | 28 | 21_fire_blaze_rescue_said |
| 22 | energy - gas - shale - project - power | 27 | 22_energy_gas_shale_project |
| 23 | train - rail - bridge - scotrail - strike | 27 | 23_train_rail_bridge_scotrail |
| 24 | growth - rate - oil - market - us | 26 | 24_growth_rate_oil_market |
| 25 | town - foul - box - footed - half | 26 | 25_town_foul_box_footed |
| 26 | open - round - golf - par - birdie | 26 | 26_open_round_golf_par |
| 27 | china - north - chinese - xi - taiwan | 22 | 27_china_north_chinese_xi |
| 28 | bond - bank - greek - greece - eurozone | 22 | 28_bond_bank_greek_greece |
| 29 | race - lap - second - honda - driver | 21 | 29_race_lap_second_honda |
| 30 | president - mr - congolese - africa - african | 21 | 30_president_mr_congolese_africa |
| 31 | barcelona - fc - madrid - de - bayern | 19 | 31_barcelona_fc_madrid_de |
| 32 | murder - man - postmortem - court - found | 18 | 32_murder_man_postmortem_court |
| 33 | welsh - wales - government - assembly - labour | 17 | 33_welsh_wales_government_assembly |
| 34 | celtic - game - season - rangers - team | 17 | 34_celtic_game_season_rangers |
| 35 | heritage - castle - house - orkney - building | 17 | 35_heritage_castle_house_orkney |
| 36 | tax - deficit - debt - economy - financial | 16 | 36_tax_deficit_debt_economy |
| 37 | stream - jet - weather - wind - flood | 15 | 37_stream_jet_weather_wind |
| 38 | software - security - data - hacker - router | 15 | 38_software_security_data_hacker |
| 39 | painting - portrait - art - collection - artist | 14 | 39_painting_portrait_art_collection |
| 40 | apple - tablet - hp - firm - android | 14 | 40_apple_tablet_hp_firm |
| 41 | robertson - mr - court - knife - murder | 12 | 41_robertson_mr_court_knife |
| 42 | unsupported - device - updated - playback - media | 12 | 42_unsupported_device_updated_playback |
| 43 | iaaf - doping - athlete - athletics - antidoping | 11 | 43_iaaf_doping_athlete_athletics |
| 44 | stolen - theft - burglary - thief - store | 11 | 44_stolen_theft_burglary_thief |
| 45 | yn - ar - mae - bod - ei | 11 | 45_yn_ar_mae_bod |
| 46 | flight - plane - airport - aircraft - passenger | 11 | 46_flight_plane_airport_aircraft |
| 47 | baby - child - infant - mcelhinney - church | 10 | 47_baby_child_infant_mcelhinney |
| 48 | party - fillon - mr - socialist - macron | 10 | 48_party_fillon_mr_socialist |
| 49 | serbia - scotland - celtic - throwin - kick | 9 | 49_serbia_scotland_celtic_throwin |
| 50 | child - childcare - families - mental - nurse | 8 | 50_child_childcare_families_mental |
| 51 | turkey - migrant - eu - visa - greece | 6 | 51_turkey_migrant_eu_visa |
| 52 | supermarket - store - price - sale - tyrrells | 6 | 52_supermarket_store_price_sale |
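The Label column follows a visible pattern: the topic ID joined by underscores to the first four of the five listed keywords. The short sketch below reproduces that pattern for four rows copied from the table and computes each topic's share of the 3,000 training documents. `ROWS`, `label`, and `share` are hypothetical names used only for illustration, and the remaining rows are omitted for brevity.

```python
# Four rows copied from the topic overview table: (topic ID, keywords, frequency)
ROWS = [
    (-1, ["said", "people", "would", "mr", "year"], 6),
    (0, ["party", "eu", "labour", "vote", "brexit"], 1465),
    (1, ["trump", "mr", "president", "republican", "russia"], 129),
    (2, ["care", "health", "nhs", "patient", "hospital"], 76),
]
TOTAL_DOCS = 3000  # "Number of training documents" from this card


def label(topic_id: int, keywords: list, n_words: int = 4) -> str:
    """Join the topic ID and its first `n_words` keywords with underscores."""
    return f"{topic_id}_" + "_".join(keywords[:n_words])


def share(frequency: int, total: int = TOTAL_DOCS) -> float:
    """Fraction of the training documents assigned to one topic."""
    return frequency / total


for topic_id, keywords, freq in ROWS:
    print(f"{label(topic_id, keywords):<35} {share(freq):6.1%}")
```

For topic 0 the share is 1465 / 3000, about 48.8%, so nearly half of the training documents fall into that single topic.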

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
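The listed values correspond to keyword arguments of the `BERTopic` constructor. The sketch below collects them in a dict and builds an unfitted model from it; `HYPERPARAMS` and `build_model` are hypothetical names. It is not enough to reproduce this exact model: the card does not list the embedding model or the UMAP and HDBSCAN settings, and UMAP is stochastic unless a random state is fixed.

```python
# Hyperparameters as listed in the "Training hyperparameters" section of this card
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model(**overrides):
    """Return an unfitted BERTopic model configured with HYPERPARAMS.

    Hypothetical helper: keyword arguments in `overrides` replace the listed
    values, e.g. to pass an embedding_model (the card does not name one).
    """
    # Imported lazily so the config can be inspected without BERTopic installed
    from bertopic import BERTopic

    return BERTopic(**{**HYPERPARAMS, **overrides})
```

With BERTopic installed, `build_model().fit_transform(docs)` would fit a fresh model on a list of strings `docs` and return a topic assignment and probabilities for each document.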

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
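Loading a saved model with library versions that differ from these can cause errors or subtly different behavior. The sketch below compares the listed versions with what is installed, using only the standard library's `importlib.metadata`. `EXPECTED`, `compare_versions`, and `python_version` are hypothetical names, and the mapping from the card's labels to PyPI distribution names (for example, UMAP to `umap-learn`) is an assumption.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the card; keys are assumed PyPI distribution names
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}
EXPECTED_PYTHON = "3.10.12"


def compare_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # package not installed
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches


def python_version():
    """Running interpreter version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])
```

Printing `compare_versions()` and `python_version()` before calling `BERTopic.load(...)` shows which parts of the local environment differ from the versions recorded above.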