metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification
license: mit
datasets:
  - OpenAssistant/oasst1
language:
  - en

chat_topics

This is a BERTopic model; the model metadata lists OpenAssistant/oasst1 as its dataset. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large collections of documents.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/chat_topics")

topic_model.get_topic_info()
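
Beyond inspecting the topic table, a loaded model can assign topics to new text. The sketch below is a minimal illustration, not the author's workflow: `assign_topics` and `demo` are hypothetical helper names, and the two example messages are made up. It relies on BERTopic's `transform` and `get_topic` methods. If the saved model does not carry a reference to its embedding model, `BERTopic.load` may need an explicit `embedding_model` argument.

```python
from typing import Any, List, Sequence, Tuple


def assign_topics(topic_model: Any, docs: Sequence[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID returned by `topic_model.transform`."""
    # BERTopic's transform returns (topics, probabilities); only the IDs are used here.
    topics, _probabilities = topic_model.transform(list(docs))
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo() -> None:
    """Load the model from the Hugging Face Hub and label two made-up messages."""
    # Imported lazily so the helper above has no third-party dependencies.
    from bertopic import BERTopic

    topic_model = BERTopic.load("davanstrien/chat_topics")
    # Hypothetical example inputs, not taken from the training data.
    docs = [
        "How do I undo my last git commit?",
        "Give me a simple recipe for sourdough bread.",
    ]
    for doc, topic_id in assign_topics(topic_model, docs):
        # get_topic returns (word, score) pairs for a known topic ID.
        keywords = [word for word, _score in topic_model.get_topic(topic_id)]
        print(topic_id, keywords[:5], "<-", doc)


# demo()  # uncomment to run; downloads the model and needs `bertopic` installed
```

A predicted ID of -1 means the document was treated as an outlier rather than matched to one of the named topics.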

Topic overview

  • Number of topics: 75
  • Number of training documents: 63530
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | provide - using - information - sure - help | 26 | -1_provide_using_information_sure |
| 0 | openai - ai - chatgpt - assistant - language | 7837 | Generative AI |
| 1 | anytime - welcome - assistance - helpful - thank | 1342 | 1_anytime_welcome_assistance_helpful |
| 2 | quantum - particle - physics - particles - relativity | 778 | Physics |
| 3 | story - lived - life - novel - felt | 569 | 3_story_lived_life_novel |
| 4 | letter - sincerely - regards - email - dear | 516 | 4_letter_sincerely_regards_email |
| 5 | rust - haskell - programming - java - languages | 504 | programming |
| 6 | css - html - style - div - js | 494 | web programming |
| 7 | linux - ubuntu - debian - fedora - install | 440 | 7_linux_ubuntu_debian_fedora |
| 8 | recipe - bake - ingredients - baking - dough | 425 | 8_recipe_bake_ingredients_baking |
| 9 | websocket - json - socket - api - discord | 425 | 9_websocket_json_socket_api |
| 10 | communism - capitalism - marx - economic - economy | 424 | 10_communism_capitalism_marx_economic |
| 11 | dog - pet - breed - breeds - pets | 408 | 11_dog_pet_breed_breeds |
| 12 | philosophy - theological - philosophical - beliefs - consciousness | 394 | 12_philosophy_theological_philosophical_beliefs |
| 13 | git - github - repository - software - commit | 381 | 13_git_github_repository_software |
| 14 | music - songs - musical - lyrics - song | 370 | 14_music_songs_musical_lyrics |
| 15 | devops - development - developers - industry - develop | 323 | 15_devops_development_developers_industry |
| 16 | pythagorean - hypotenuse - triangle - math - sqrt | 302 | 16_pythagorean_hypotenuse_triangle_math |
| 17 | eu - europe - economy - economic - war | 291 | 17_eu_europe_economy_economic |
| 18 | sleep - asleep - bedtime - procrastination - depression | 280 | 18_sleep_asleep_bedtime_procrastination |
| 19 | kramer - seinfeld - jerry - cafe - elaine | 279 | 19_kramer_seinfeld_jerry_cafe |
| 20 | printing - prints - printer - print - printers | 276 | 20_printing_prints_printer_print |
| 21 | influenza - flu - panic - symptoms - medical | 251 | 21_influenza_flu_panic_symptoms |
| 22 | chess - chessboard - practice - strategy - learn | 242 | 22_chess_chessboard_practice_strategy |
| 23 | algorithm - primes - array - integers - python | 240 | 23_algorithm_primes_array_integers |
| 24 | youtube - viewers - media - google - streaming | 240 | 24_youtube_viewers_media_google |
| 25 | poison - chemicals - powder - turpentine - smoke | 226 | 25_poison_chemicals_powder_turpentine |
| 26 | monday - sunday - count_weekend_days - calendar - dates | 216 | 26_monday_sunday_count_weekend_days_calendar |
| 27 | colors - colour - color - pigments - blue | 208 | 27_colors_colour_color_pigments |
| 28 | roman - attila - rome - empire - warfare | 205 | 28_roman_attila_rome_empire |
| 29 | investing - investments - investment - stocks - financial | 204 | 29_investing_investments_investment_stocks |
| 30 | vocabulary - wordle - words - scrabble - word | 201 | 30_vocabulary_wordle_words_scrabble |
| 31 | planets - sun - earth - planet - pluto | 198 | 31_planets_sun_earth_planet |
| 32 | renewable - solar - electricity - energy - electrical | 190 | 32_renewable_solar_electricity_energy |
| 33 | pygame - ball_radius - draw - circle - canvas | 181 | 33_pygame_ball_radius_draw_circle |
| 34 | fishing - fish - boat - hiking - camping | 176 | 34_fishing_fish_boat_hiking |
| 35 | gpus - gpu - motherboard - cpu - hardware | 162 | 35_gpus_gpu_motherboard_cpu |
| 36 | hvac - remodeling - energy - kwh - housing | 159 | 36_hvac_remodeling_energy_kwh |
| 37 | database - graphql - databases - postgresql - sql | 159 | 37_database_graphql_databases_postgresql |
| 38 | información - significado - cómo - como - sistemas | 158 | 38_información_significado_cómo_como |
| 39 | motherboard - pcie - gpu - bios - computer | 153 | 39_motherboard_pcie_gpu_bios |
| 40 | crops - produce - planting - peppers - plants | 148 | 40_crops_produce_planting_peppers |
| 41 | paintings - art - modernist - artists - modern | 148 | 41_paintings_art_modernist_artists |
| 42 | workout - exercises - dumbbells - dumbbell - exercise | 147 | 42_workout_exercises_dumbbells_dumbbell |
| 43 | climate - warming - pollution - environmental - emissions | 142 | 43_climate_warming_pollution_environmental |
| 44 | coffee - espresso - brewing - tea - beans | 137 | 44_coffee_espresso_brewing_tea |
| 45 | velocity - drag - acceleration - density - formula | 132 | 45_velocity_drag_acceleration_density |
| 46 | woodchuck - woodchucks - units - kilogram - kilograms | 130 | 46_woodchuck_woodchucks_units_kilogram |
| 47 | ascii - glyphs - hiragana - art - font | 129 | 47_ascii_glyphs_hiragana_art |
| 48 | guitars - guitar - strings - guitarists - instrument | 127 | 48_guitars_guitar_strings_guitarists |
| 49 | tallest - buildings - building - burj - khalifa | 114 | 49_tallest_buildings_building_burj |
| 50 | flat - earth - curvature - spherical - tectonic | 111 | 50_flat_earth_curvature_spherical |
| 51 | essay - awareness - understanding - being - be | 102 | 51_essay_awareness_understanding_being |
| 52 | portals - ender - portal - obsidian - netherite | 102 | 52_portals_ender_portal_obsidian |
| 53 | android - apple - phones - devices - vehicles | 101 | 53_android_apple_phones_devices |
| 54 | fasting - dietary - diet - eating - metabolic | 101 | 54_fasting_dietary_diet_eating |
| 55 | meditation - relief - pain - health - nociception | 99 | 55_meditation_relief_pain_health |
| 56 | weather - forecast - forecasts - raining - precipitation | 95 | 56_weather_forecast_forecasts_raining |
| 57 | president - presidents - presidency - constitution - biden | 94 | 57_president_presidents_presidency_constitution |
| 58 | no - nope - yes - not - maybe | 94 | 58_no_nope_yes_not |
| 59 | peregrine - airspeed - falcon - speed - bird | 90 | 59_peregrine_airspeed_falcon_speed |
| 60 | crontab - cron - myscript - script - bash | 83 | 60_crontab_cron_myscript_script |
| 61 | youtuber - streamer - ceo - musk - founder | 83 | 61_youtuber_streamer_ceo_musk |
| 62 | layovers - flights - circumnavigate - layover - travel | 83 | 62_layovers_flights_circumnavigate_layover |
| 63 | keyboards - keyboard - switches - qwerty - types | 83 | 63_keyboards_keyboard_switches_qwerty |
| 64 | file_path_in_dir1 - file_path1 - csv_file - file_path_in_dir2 - file_path2 | 80 | 64_file_path_in_dir1_file_path1_csv_file_file_path_in_dir2 |
| 65 | pele - maradona - lebron - ronaldo - nba | 76 | 65_pele_maradona_lebron_ronaldo |
| 66 | alopecia - hairstyles - hairstyle - hair - scalp | 66 | 66_alopecia_hairstyles_hairstyle_hair |
| 67 | nginx - docker - kubernetes - proxy_pass - nodeport | 65 | 67_nginx_docker_kubernetes_proxy_pass |
| 68 | directories - directory - sudo - filesystem - folders | 62 | 68_directories_directory_sudo_filesystem |
| 69 | gps - map - geocaching - maps - armenia | 52 | 69_gps_map_geocaching_maps |
| 70 | meiosis - mitosis - fertilization - reproduction - ovulation | 51 | 70_meiosis_mitosis_fertilization_reproduction |
| 71 | colleges - admissions - universities - campus - university | 43 | 71_colleges_admissions_universities_campus |
| 72 | unicorns - unicorn - pony - ponies - mythical | 32 | 72_unicorns_unicorn_pony_ponies |
| 73 | superpowers - abilities - superhero - superhuman - powers | 28 | 73_superpowers_abilities_superhero_superhuman |
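
Most labels in the overview follow BERTopic's default `<topic id>_<keywords>` naming, while a few (for example "Generative AI" and "Physics") have been replaced by hand-written names. Topic -1 is BERTopic's outlier topic. The sketch below separates the two kinds of label for a handful of rows copied from the table and shows one way such names could be attached with `set_topic_labels`. The helper names are hypothetical, and this is not necessarily how the labels in this model were created.

```python
from typing import Any, Dict, List, Tuple

# A few (topic_id, frequency, label) rows copied from the topic overview.
ROWS: List[Tuple[int, int, str]] = [
    (-1, 26, "-1_provide_using_information_sure"),
    (0, 7837, "Generative AI"),
    (1, 1342, "1_anytime_welcome_assistance_helpful"),
    (2, 778, "Physics"),
    (5, 504, "programming"),
    (6, 494, "web programming"),
]


def is_default_label(topic_id: int, label: str) -> bool:
    """Default labels start with the topic ID followed by an underscore."""
    return label.startswith(f"{topic_id}_")


def custom_labels(rows: List[Tuple[int, int, str]]) -> Dict[int, str]:
    """Collect the labels that do not follow the default naming pattern."""
    return {
        topic_id: label
        for topic_id, _frequency, label in rows
        if not is_default_label(topic_id, label)
    }


def apply_custom_labels(topic_model: Any, labels: Dict[int, str]) -> None:
    """Attach custom labels to a loaded BERTopic model (assumed workflow)."""
    # set_topic_labels accepts a mapping from topic ID to label.
    topic_model.set_topic_labels(labels)


print(custom_labels(ROWS))
```

Checking the prefix rather than splitting on underscores avoids trouble with keywords that contain underscores themselves, such as `count_weekend_days` in topic 26.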

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 20
  • n_gram_range: (1, 1)
  • nr_topics: 75
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
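
The hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below shows how a comparable model could be configured. It is an approximation under stated assumptions: the embedding model, UMAP, and HDBSCAN settings used for this model are not listed here, so `embedding_model` is left to the caller, and `merged_params` and `build_topic_model` are hypothetical helper names.

```python
from typing import Any, Dict, Optional

# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS: Dict[str, Any] = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 20,
    "n_gram_range": (1, 1),
    "nr_topics": 75,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def merged_params(**overrides: Any) -> Dict[str, Any]:
    """Return the listed hyperparameters with any caller overrides applied."""
    return {**HYPERPARAMETERS, **overrides}


def build_topic_model(embedding_model: Optional[Any] = None, **overrides: Any) -> Any:
    """Create an unfitted BERTopic instance with the listed hyperparameters.

    The embedding model is an assumption supplied by the caller; the
    model card does not say which one was used.
    """
    # Imported lazily so the parameter helpers work without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **merged_params(**overrides))
```

Calling `build_topic_model().fit_transform(docs)` on a list of strings would then train a new model; results will differ from this one unless the same data, embedding model, and random seeds are used.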

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.11
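
To see how a local environment compares with the versions above, the installed packages can be checked with the standard library. This is a minimal sketch; the distribution names are the usual PyPI names for these libraries, and `installed_version`, `compare_versions`, and `mismatches` are hypothetical helpers. A mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions copied from the "Framework versions" list, keyed by PyPI name.
PINNED: Dict[str, str] = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare_versions(
    pinned: Dict[str, str] = PINNED,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to a (listed version, installed version) pair."""
    return {name: (want, installed_version(name)) for name, want in pinned.items()}


def mismatches(
    report: Dict[str, Tuple[str, Optional[str]]],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Keep only the packages whose installed version differs or is missing."""
    return {name: pair for name, pair in report.items() if pair[0] != pair[1]}


for name, (want, have) in mismatches(compare_versions()).items():
    print(f"{name}: listed {want}, installed {have}")
```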