---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# cnn_dailymail_108_50000_25000_validation

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_50000_25000_validation")

topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 92
* Number of training documents: 13368

Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - police - one - year - also | 5 | -1_said_police_one_year |
| 0 | league - game - player - goal - season | 4918 | 0_league_game_player_goal |
| 1 | isis - syria - islamic - group - iraq | 2700 | 1_isis_syria_islamic_group |
| 2 | dog - animal - elephant - bear - cat | 415 | 2_dog_animal_elephant_bear |
| 3 | labour - mr - party - election - cameron | 386 | 3_labour_mr_party_election |
| 4 | flight - plane - aircraft - pilot - crash | 340 | 4_flight_plane_aircraft_pilot |
| 5 | hair - fashion - dress - look - model | 248 | 5_hair_fashion_dress_look |
| 6 | car - driver - driving - road - police | 227 | 6_car_driver_driving_road |
| 7 | food - cent - sugar - health - per | 221 | 7_food_cent_sugar_health |
| 8 | police - officer - shooting - shot - said | 215 | 8_police_officer_shooting_shot |
| 9 | clinton - email - obama - president - state | 213 | 9_clinton_email_obama_president |
| 10 | cricket - england - cup - world - zealand | 191 | 10_cricket_england_cup_world |
| 11 | property - house - home - room - price | 184 | 11_property_house_home_room |
| 12 | fight - pacquiao - mayweather - manny - floyd | 171 | 12_fight_pacquiao_mayweather_manny |
| 13 | hamilton - mercedes - race - prix - rosberg | 135 | 13_hamilton_mercedes_race_prix |
| 14 | baby - hospital - birth - mother - child | 127 | 14_baby_hospital_birth_mother |
| 15 | murray - wells - tennis - andy - match | 127 | 15_murray_wells_tennis_andy |
| 16 | eclipse - earth - solar - sun - planet | 102 | 16_eclipse_earth_solar_sun |
| 17 | police - abuse - sex - sexual - child | 98 | 17_police_abuse_sex_sexual |
| 18 | apple - watch - device - user - google | 96 | 18_apple_watch_device_user |
| 19 | netanyahu - iran - nuclear - israel - israeli | 83 | 19_netanyahu_iran_nuclear_israel |
| 20 | putin - russian - nemtsov - moscow - russia | 82 | 20_putin_russian_nemtsov_moscow |
| 21 | weight - fat - diet - size - stone | 81 | 21_weight_fat_diet_size |
| 22 | race - armstrong - doping - world - tour | 78 | 22_race_armstrong_doping_world |
| 23 | court - fraud - money - bank - mr | 76 | 23_court_fraud_money_bank |
| 24 | cheltenham - hurdle - horse - race - jockey | 74 | 24_cheltenham_hurdle_horse_race |
| 25 | mcilroy - round - masters - woods - golf | 72 | 25_mcilroy_round_masters_woods |
| 26 | prince - charles - royal - duchess - camilla | 72 | 26_prince_charles_royal_duchess |
| 27 | fraternity - university - sae - chapter - oklahoma | 68 | 27_fraternity_university_sae_chapter |
| 28 | chan - sukumaran - bali - indonesian - mack | 65 | 28_chan_sukumaran_bali_indonesian |
| 29 | ebola - sierra - virus - leone - disease | 64 | 29_ebola_sierra_virus_leone |
| 30 | school - teacher - student - girl - sexual | 58 | 30_school_teacher_student_girl |
| 31 | fire - building - explosion - blaze - firefighter | 52 | 31_fire_building_explosion_blaze |
| 32 | nfl - borland - football - 49ers - season | 52 | 32_nfl_borland_football_49ers |
| 33 | clarkson - bbc - gear - top - jeremy | 50 | 33_clarkson_bbc_gear_top |
| 34 | ski - skier - mountain - avalanche - rock | 47 | 34_ski_skier_mountain_avalanche |
| 35 | patient - nhs - ae - cancer - hospital | 46 | 35_patient_nhs_ae_cancer |
| 36 | india - rape - documentary - indian - singh | 45 | 36_india_rape_documentary_indian |
| 37 | mr - death - court - emery - miss | 43 | 37_mr_death_court_emery |
| 38 | show - corden - host - stewart - williams | 42 | 38_show_corden_host_stewart |
| 39 | car - vehicle - electric - cars - tesla | 40 | 39_car_vehicle_electric_cars |
| 40 | school - child - education - porn - sex | 38 | 40_school_child_education_porn |
| 41 | boko - haram - nigeria - nigerian - nigerias | 37 | 41_boko_haram_nigeria_nigerian |
| 42 | marijuana - drug - cannabis - colorado - lsd | 34 | 42_marijuana_drug_cannabis_colorado |
| 43 | law - indiana - gay - marriage - religious | 33 | 43_law_indiana_gay_marriage |
| 44 | ferguson - department - police - justice - report | 32 | 44_ferguson_department_police_justice |
| 45 | image - photographer - photography - photograph - photo | 31 | 45_image_photographer_photography_photograph |
| 46 | snow - inch - winter - ice - storm | 30 | 46_snow_inch_winter_ice |
| 47 | basketball - ncaa - coach - tournament - game | 30 | 47_basketball_ncaa_coach_tournament |
| 48 | tsarnaev - boston - dzhokhar - tamerlan - tsarnaevs | 30 | 48_tsarnaev_boston_dzhokhar_tamerlan |
| 49 | durst - dursts - berman - orleans - robert | 29 | 49_durst_dursts_berman_orleans |
| 50 | jesus - ancient - stone - cave - circle | 29 | 50_jesus_ancient_stone_cave |
| 51 | zayn - band - direction - singer - dance | 29 | 51_zayn_band_direction_singer |
| 52 | film - movie - vivian - hollywood - script | 23 | 52_film_movie_vivian_hollywood |
| 53 | korean - korea - kim - north - lippert | 23 | 53_korean_korea_kim_north |
| 54 | weather - rain - temperature - snow - today | 23 | 54_weather_rain_temperature_snow |
| 55 | robbery - woodger - store - cash - police | 22 | 55_robbery_woodger_store_cash |
| 56 | parade - patricks - st - irish - green | 21 | 56_parade_patricks_st_irish |
| 57 | secret - clancy - service - agent - white | 20 | 57_secret_clancy_service_agent |
| 58 | hernandez - lloyd - jenkins - hernandezs - lloyds | 20 | 58_hernandez_lloyd_jenkins_hernandezs |
| 59 | nazi - anne - nazis - war - camp | 20 | 59_nazi_anne_nazis_war |
| 60 | snowden - intelligence - gchq - security - agency | 18 | 60_snowden_intelligence_gchq_security |
| 61 | huang - chinese - china - mingxi - chen | 17 | 61_huang_chinese_china_mingxi |
| 62 | wedding - married - marlee - platt - woodyard | 17 | 62_wedding_married_marlee_platt |
| 63 | drug - cocaine - jailed - cannabis - tobacco | 17 | 63_drug_cocaine_jailed_cannabis |
| 64 | cnn - transcript - student - news - roll | 17 | 64_cnn_transcript_student_news |
| 65 | pope - francis - vatican - naples - pontiff | 17 | 65_pope_francis_vatican_naples |
| 66 | richard - iii - leicester - king - iiis | 17 | 66_richard_iii_leicester_king |
| 67 | chinese - tourist - temple - thailand - buddhist | 16 | 67_chinese_tourist_temple_thailand |
| 68 | china - chinese - internet - chai - stopera | 16 | 68_china_chinese_internet_chai |
| 69 | execution - lethal - gissendaner - injection - drug | 16 | 69_execution_lethal_gissendaner_injection |
| 70 | woman - marriage - men - attractive - chalmers | 15 | 70_woman_marriage_men_attractive |
| 71 | vanuatu - cyclone - vila - port - pam | 15 | 71_vanuatu_cyclone_vila_port |
| 72 | poldark - turner - demelza - aidan - drama | 15 | 72_poldark_turner_demelza_aidan |
| 73 | point - rebound - scored - points - harden | 14 | 73_point_rebound_scored_points |
| 74 | rail - calais - parking - migrant - dickens | 13 | 74_rail_calais_parking_migrant |
| 75 | johnson - student - virginia - charlottesville - uva | 13 | 75_johnson_student_virginia_charlottesville |
| 76 | cuba - havana - cuban - rousseff - us | 13 | 76_cuba_havana_cuban_rousseff |
| 77 | paris - attack - synagogue - hebdo - charlie | 13 | 77_paris_attack_synagogue_hebdo |
| 78 | duckenfield - mr - gate - hillsborough - disaster | 12 | 78_duckenfield_mr_gate_hillsborough |
| 79 | gordon - bobbi - kristina - phil - dr | 12 | 79_gordon_bobbi_kristina_phil |
| 80 | knox - sollecito - kercher - raffaele - amanda | 12 | 80_knox_sollecito_kercher_raffaele |
| 81 | coin - medal - war - auction - cross | 12 | 81_coin_medal_war_auction |
| 82 | starbucks - schultz - race - racial - campaign | 12 | 82_starbucks_schultz_race_racial |
| 83 | cosby - cosbys - thompson - bill - welles | 11 | 83_cosby_cosbys_thompson_bill |
| 84 | jeffs - flds - rivette - compound - speer | 10 | 84_jeffs_flds_rivette_compound |
| 85 | selma - alabama - march - bridge - civil | 8 | 85_selma_alabama_march_bridge |
| 86 | jobs - naomi - fortune - redballoon - bn | 8 | 86_jobs_naomi_fortune_redballoon |
| 87 | brain - object - retina - neuron - word | 8 | 87_brain_object_retina_neuron |
| 88 | netflix - tv - content - streaming - screen | 8 | 88_netflix_tv_content_streaming |
| 89 | social - user - tweet - twitter - tool | 7 | 89_social_user_tweet_twitter |
| 90 | cunard - bird - darshan - ship - liner | 6 | 90_cunard_bird_darshan_ship |

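
Each label in the table follows the pattern `<topic ID>_<first four keywords>`, and each frequency is a count out of the 13368 training documents. As a minimal sketch, the snippet below splits a label into its ID and keywords and computes a topic's share of the training set. The helper names `parse_label` and `topic_share` are hypothetical and not part of BERTopic; the rows are copied from the table.

```python
from __future__ import annotations

TRAINING_DOCS = 13368  # "Number of training documents" from the topic overview

# A few (label, frequency) rows copied from the topic table.
ROWS = [
    ("-1_said_police_one_year", 5),
    ("0_league_game_player_goal", 4918),
    ("1_isis_syria_islamic_group", 2700),
    ("2_dog_animal_elephant_bear", 415),
]


def parse_label(label: str) -> tuple[int, list[str]]:
    """Split a label such as '0_league_game_player_goal' into (0, [keywords])."""
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


def topic_share(frequency: int, total: int = TRAINING_DOCS) -> float:
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


for label, freq in ROWS:
    topic_id, keywords = parse_label(label)
    print(f"{topic_id:>3}  {topic_share(freq):6.1%}  {', '.join(keywords)}")
```

Topic ID -1 is the ID BERTopic reserves for outlier documents, so it is usually read separately from the numbered topics.
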
## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.57.1
* Plotly: 5.13.1
* Python: 3.10.12
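
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment with the versions listed above. The following is a rough sketch using only the standard library. The helper `version_mismatches` is hypothetical, and the mapping from the names in the list to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption of this example.

```python
from importlib import metadata

# Versions from the "Framework versions" list. The distribution names are
# assumed PyPI names, not something the model card states.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def version_mismatches(expected):
    """Return {package: (expected, installed)} for each package that differs.

    `installed` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where the environment differs from the one recorded here.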