---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---
# xsum_123_3000_1500_train
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.
## Usage
To use this model, please install BERTopic:
```
pip install -U bertopic
```
You can use the model as follows:
```python
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_123_3000_1500_train")
topic_model.get_topic_info()
```
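To assign topics to documents the model has not seen, something like the following sketch may help. It assumes the loaded model can embed new text (that is, its embedding model is available after loading). `summarize_predictions` and `demo` are hypothetical helper names, not part of BERTopic; the sketch relies only on the `transform` and `get_topic` methods.

```python
def summarize_predictions(topic_model, docs, top_n=5):
    """Return (document, topic_id, top keywords) for each document.

    `topic_model` is expected to behave like a fitted BERTopic instance.
    """
    # transform() returns one predicted topic per document, plus probabilities
    topics, _probabilities = topic_model.transform(docs)
    rows = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() yields (word, score) pairs; it may be falsy for an unknown topic
        words = topic_model.get_topic(int(topic_id)) or []
        rows.append((doc, int(topic_id), [word for word, _ in words[:top_n]]))
    return rows


def demo():
    # Not called automatically: this downloads the model from the Hugging Face Hub.
    from bertopic import BERTopic

    topic_model = BERTopic.load("KingKazma/xsum_123_3000_1500_train")
    # Made-up example text, not taken from the training data
    docs = ["The striker scored twice in the second half to win the game."]
    for doc, topic_id, keywords in summarize_predictions(topic_model, docs):
        print(topic_id, keywords, doc)
```

Call `demo()` to try it against the published model. A predicted topic of `-1` means the document was treated as an outlier.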
## Topic overview
* Number of topics: 47 (including the outlier topic `-1`)
* Number of training documents: 3000
<details>
<summary>Click here for an overview of all topics.</summary>
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - mr - police - people - would | 5 | -1_said_mr_police_people |
| 0 | win - game - half - foul - league | 1132 | 0_win_game_half_foul |
| 1 | eu - labour - party - would - uk | 591 | 1_eu_labour_party_would |
| 2 | athlete - sport - gold - olympic - medal | 149 | 2_athlete_sport_gold_olympic |
| 3 | nhs - health - care - patient - hospital | 104 | 3_nhs_health_care_patient |
| 4 | growth - price - market - sale - economy | 84 | 4_growth_price_market_sale |
| 5 | president - mr - government - maduro - rousseff | 71 | 5_president_mr_government_maduro |
| 6 | crash - police - hospital - road - driver | 58 | 6_crash_police_hospital_road |
| 7 | murray - match - set - tennis - seed | 46 | 7_murray_match_set_tennis |
| 8 | syrian - us - syria - rebel - force | 45 | 8_syrian_us_syria_rebel |
| 9 | school - education - pupil - schools - child | 41 | 9_school_education_pupil_schools |
| 10 | animal - zoo - wildlife - bird - specie | 40 | 10_animal_zoo_wildlife_bird |
| 11 | film - actor - star - series - drama | 38 | 11_film_actor_star_series |
| 12 | abuse - court - sexual - police - victim | 38 | 12_abuse_court_sexual_police |
| 13 | trump - mr - clinton - republican - president | 31 | 13_trump_mr_clinton_republican |
| 14 | fire - blaze - building - service - firefighters | 31 | 14_fire_blaze_building_service |
| 15 | suu - party - mr - government - election | 29 | 15_suu_party_mr_government |
| 16 | china - korea - chinese - south - north | 29 | 16_china_korea_chinese_south |
| 17 | album - band - song - music - best | 25 | 17_album_band_song_music |
| 18 | ms - heard - court - death - said | 24 | 18_ms_heard_court_death |
| 19 | wales - welsh - said - train - government | 23 | 19_wales_welsh_said_train |
| 20 | road - police - death - seen - found | 23 | 20_road_police_death_seen |
| 21 | passenger - crew - sea - boat - aircraft | 23 | 21_passenger_crew_sea_boat |
| 22 | russian - ukraine - russia - mr - ukrainian | 22 | 22_russian_ukraine_russia_mr |
| 23 | fight - joshua - title - khan - boxing | 22 | 23_fight_joshua_title_khan |
| 24 | samsung - phone - app - android - user | 20 | 24_samsung_phone_app_android |
| 25 | earthquake - particle - nepal - building - mars | 19 | 25_earthquake_particle_nepal_building |
| 26 | highways - traffic - dartford - council - road | 18 | 26_highways_traffic_dartford_council |
| 27 | vettel - hamilton - lap - race - alonso | 18 | 27_vettel_hamilton_lap_race |
| 28 | park - building - visitor - festival - visitscotland | 16 | 28_park_building_visitor_festival |
| 29 | site - council - street - project - plan | 15 | 29_site_council_street_project |
| 30 | abdeslam - paris - attack - belgian - salah | 15 | 30_abdeslam_paris_attack_belgian |
| 31 | virus - ebola - disease - hiv - sierra | 14 | 31_virus_ebola_disease_hiv |
| 32 | security - data - attack - cyber - malware | 14 | 32_security_data_attack_cyber |
| 33 | dog - dogs - stray - pet - owner | 14 | 33_dog_dogs_stray_pet |
| 34 | birdie - pga - bogey - woods - open | 13 | 34_birdie_pga_bogey_woods |
| 35 | man - police - wearing - incident - anyone | 13 | 35_man_police_wearing_incident |
| 36 | energy - pipeline - waste - renewables - electricity | 13 | 36_energy_pipeline_waste_renewables |
| 37 | silence - bishop - belfast - people - attended | 11 | 37_silence_bishop_belfast_people |
| 38 | painting - art - work - artist - exhibition | 11 | 38_painting_art_work_artist |
| 39 | eyre - gaunt - lyttle - peter - court | 10 | 39_eyre_gaunt_lyttle_peter |
| 40 | crime - police - force - constable - chief | 9 | 40_crime_police_force_constable |
| 41 | flood - river - rain - louisiana - flooded | 9 | 41_flood_river_rain_louisiana |
| 42 | charity - abuse - yentob - porn - batmanghelidjh | 7 | 42_charity_abuse_yentob_porn |
| 43 | india - nidar - gun - yrf - film | 6 | 43_india_nidar_gun_yrf |
| 44 | driving - stirling - winn - fraser - road | 6 | 44_driving_stirling_winn_fraser |
| 45 | boko - haram - shekau - militant - monguno | 5 | 45_boko_haram_shekau_militant |
</details>
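In the table above, each label appears to be the topic ID joined to the first four keywords with underscores, and the frequencies add up to the 3000 training documents. The sketch below reproduces both observations for a few rows copied from the table. `make_label` and `share_of_corpus` are illustrative helpers, not BERTopic functions.

```python
TOTAL_DOCS = 3000  # "Number of training documents" from the overview

# (topic_id, keywords, frequency) for a few rows copied from the table
ROWS = [
    (-1, ["said", "mr", "police", "people", "would"], 5),
    (0, ["win", "game", "half", "foul", "league"], 1132),
    (1, ["eu", "labour", "party", "would", "uk"], 591),
]


def make_label(topic_id, keywords, n_words=4):
    """Join the topic ID and the first n_words keywords with underscores."""
    return "_".join([str(topic_id)] + keywords[:n_words])


def share_of_corpus(frequency, total=TOTAL_DOCS):
    """Percentage of training documents assigned to a topic, to one decimal."""
    return round(100 * frequency / total, 1)


for topic_id, keywords, frequency in ROWS:
    print(make_label(topic_id, keywords), f"{share_of_corpus(frequency)}%")
```

Topics 0 and 1 together cover 1723 of the 3000 documents, so more than half of the corpus falls into the two largest topics.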
## Training hyperparameters
* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
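The values above correspond to keyword arguments of the `BERTopic` constructor. As a sketch, a model with the same settings could be configured as shown below. The embedding, UMAP, and HDBSCAN settings of the original run are not listed in this card, so this is not guaranteed to reproduce the same topics. `build_model` is a hypothetical helper name.

```python
# Hyperparameters copied from the list above
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an unfitted BERTopic model with the hyperparameters above."""
    # Imported lazily so the dictionary can be inspected without bertopic installed
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

The returned model still has to be fitted, for example with `fit_transform` on a list of documents.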
## Framework versions
* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.57.1
* Plotly: 5.13.1
* Python: 3.10.12
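If loading fails or results differ, comparing the local environment against the versions above can be a useful first check. The following is a minimal sketch using the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions mapped from the list, and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions from the list above, keyed by assumed PyPI distribution name
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected=EXPECTED, lookup=installed_version):
    """Map each differing package to (version in this card, version found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in find_mismatches().items():
    print(f"{name}: card lists {wanted}, installed {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will not load; it only flags where the environment differs from the one used for training.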