ShivamSrng committed
Commit ce458e5 · verified · 1 Parent(s): fdc2421

Fine-tuned Topic Model for aspects_to_improve column

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ ctfidf_config.json filter=lfs diff=lfs merge=lfs -text
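The effect of the added `.gitattributes` line can be illustrated with a small matcher. This is a minimal sketch, assuming only the patterns visible in this hunk (the rest of the file is not shown here) and using Python's `fnmatch` as an approximation of Git's attribute matching, which it does not fully replicate. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# Patterns copied from the hunk above. The real .gitattributes contains
# more lines (the hunk starts at line 33) that are not reproduced here.
LFS_PATTERNS = [
    "saved_model/**/*",
    "*.zip",
    "*.zst",
    "*tfevents*",
    "ctfidf_config.json",  # the line added by this commit
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check; fnmatch is not a full gitattributes matcher."""
    basename = path.rsplit("/", 1)[-1]
    # Patterns without a slash are compared against the file name only,
    # patterns with a slash against the whole repository-relative path.
    return any(
        fnmatch(path if "/" in pattern else basename, pattern)
        for pattern in patterns
    )


print(is_lfs_tracked("ctfidf_config.json"))
print(is_lfs_tracked("config.json"))
```

With the new pattern in place, `ctfidf_config.json` is stored as an LFS pointer, while the much smaller `config.json` stays a regular Git blob.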
README.md ADDED
@@ -0,0 +1,115 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # before_covid_face_to_face_aspects_to_improve
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("ShivamSrng/before_covid_face_to_face_aspects_to_improve")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 46
+ * Number of training documents: 143043
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | 0 | lecture - lectures - classes - class time - teaching | 116120 | 0_lecture_lectures_classes_class time |
+ | 1 | improvements needed - improvements needs - improvement needs - improvements needs improved - improvement needs improved | 1568 | 1_improvements needed_improvements needs_improvement needs_improvements needs improved |
+ | 2 | feel - entire week model feels - week model feels - magnitude plot - character | 1279 | 2_feel_entire week model feels_week model feels_magnitude plot |
+ | 3 | needs addressed - executive team needs - understand situation covid - think unfair - understand situation | 919 | 3_needs addressed_executive team needs_understand situation covid_think unfair |
+ | 4 | way material presented - way material presented material - material way material presented - material presented material - material better explanation material | 913 | 4_way material presented_way material presented material_material way material presented_material presented material |
+ | 5 | research roadmap - research roadmaps - topics - topic - engineering ethics | 881 | 5_research roadmap_research roadmaps_topics_topic |
+ | 6 | ta teaches - teach ta - ta students - tas teaching - teacher ta | 868 | 6_ta teaches_teach ta_ta students_tas teaching |
+ | 7 | schedule timing - office hours - mandatory office hours - scheduling - time slots | 846 | 7_schedule timing_office hours_mandatory office hours_scheduling |
+ | 8 | group work groups - groups group work - group work difficult - group work class - group work | 793 | 8_group work groups_groups group work_group work difficult_group work class |
+ | 9 | software - software home - program - webex - access | 773 | 9_software_software home_program_webex |
+ | 10 | teacher - rude students - students class - attitude students - teaching | 769 | 10_teacher_rude students_students class_attitude students |
+ | 11 | technical issues - program work - software - work properly - working properly | 744 | 11_technical issues_program work_software_work properly |
+ | 12 | teaching write essays - essays writing - class writing - writing assignments - writing class | 713 | 12_teaching write essays_essays writing_class writing_writing assignments |
+ | 13 | rutgers students - best directed educational platform - students - problems chapter worthless study - tried best directed educational | 712 | 13_rutgers students_best directed educational platform_students_problems chapter worthless study |
+ | 14 | moodle students - moodle syllabus - moodle class - moodle homeworks - moodle assignments | 701 | 14_moodle students_moodle syllabus_moodle class_moodle homeworks |
+ | 15 | cad software lab - cad software lab introduced - cad class - engineering drawings - cad drawings | 674 | 15_cad software lab_cad software lab introduced_cad class_engineering drawings |
+ | 16 | taking notes class - class notes - lecture notes - read notes - notes class | 670 | 16_taking notes class_class notes_lecture notes_read notes |
+ | 17 | business major - design majors - internships - help careers - digital design majors | 662 | 17_business major_design majors_internships_help careers |
+ | 18 | enjoyed class - structure taught professor interesting - structure taught professor - everything class poorly - best classes taken | 647 | 18_enjoyed class_structure taught professor interesting_structure taught professor_everything class poorly |
+ | 19 | students points - student mistake - points students - mistake points - points mistake | 582 | 19_students points_student mistake_points students_mistake points |
+ | 20 | work load needs - work difficult - work overwhelming - work load - workload work | 558 | 20_work load needs_work difficult_work overwhelming_work load |
+ | 21 | final research paper - research paper research - semester research paper - paper research - paper research paper | 550 | 21_final research paper_research paper research_semester research paper_paper research |
+ | 22 | plant tours - plant tour - class field trips - trips campus - field trips construction | 549 | 22_plant tours_plant tour_class field trips_trips campus |
+ | 23 | enjoyed aspects - tad difficult different critics - difficult different critics - content felt - different critics | 542 | 23_enjoyed aspects_tad difficult different critics_difficult different critics_content felt |
+ | 24 | chapters covered class - class chapters - chapters spend time - learning chapters - covering chapters | 526 | 24_chapters covered class_class chapters_chapters spend time_learning chapters |
+ | 25 | assigned readings - readings readings difficult understand - reading assignments - readings class readings - assigning readings | 511 | 25_assigned readings_readings readings difficult understand_reading assignments_readings class readings |
+ | 26 | equipment outdated - outdated equipment - replaced equipment - equipment replaced - broken equipment | 510 | 26_equipment outdated_outdated equipment_replaced equipment_equipment replaced |
+ | 27 | wastes time - useful waste time - waste time money - massive waste time - total waste time | 507 | 27_wastes time_useful waste time_waste time money_massive waste time |
+ | 28 | think everything great - everything great think - everything great - everything great everything - everything great think everything | 500 | 28_think everything great_everything great think_everything great_everything great everything |
+ | 29 | modern films - documentaries - certain films - movies shown - films | 485 | 29_modern films_documentaries_certain films_movies shown |
+ | 30 | dbms - advanced database - learning sql - database software - database systems | 483 | 30_dbms_advanced database_learning sql_database software |
+ | 31 | studio courses - schools understanding collaborative studio - studio classes - studio experience - understand studio | 477 | 31_studio courses_schools understanding collaborative studio_studio classes_studio experience |
+ | 32 | pace little slower - bit slower pace - slightly slower pace - pace bit slower - slower pace | 474 | 32_pace little slower_bit slower pace_slightly slower pace_pace bit slower |
+ | 33 | recitation extra work - recitation work - recitation useless - recitation class - recitation pointless | 440 | 33_recitation extra work_recitation work_recitation useless_recitation class |
+ | 34 | everything - everything aspects - complaints - everything great - everything say | 435 | 34_everything_everything aspects_complaints_everything great |
+ | 35 | feels awkward online opposed - feels awkward online - awkward online opposed - possible feels awkward online - transition possible feels awkward | 435 | 35_feels awkward online opposed_feels awkward online_awkward online opposed_possible feels awkward online |
+ | 36 | teach matlab - matlab students - teaching matlab - students matlab - matlab taught | 428 | 36_teach matlab_matlab students_teaching matlab_students matlab |
+ | 37 | difficult mentally exhausting - excessively difficult - learning curve - difficult annoying - easy difficult | 406 | 37_difficult mentally exhausting_excessively difficult_learning curve_difficult annoying |
+ | 38 | practice matlab - learn matlab - matlab using - matlabs - instead matlab | 382 | 38_practice matlab_learn matlab_matlab using_matlabs |
+ | 39 | way think way great - think way great - think way way great - way think great - perfectly way | 378 | 39_way think way great_think way great_think way way great_way think great |
+ | 40 | class recitations - recitation class - recitation needs - recitation classes - class recitation | 361 | 40_class recitations_recitation class_recitation needs_recitation classes |
+ | 41 | microphones said argument - feel microphones said argument - microphones said argument help - participants microphone - professor microphone | 343 | 41_microphones said argument_feel microphones said argument_microphones said argument help_participants microphone |
+ | 42 | commons difficult - commons unnecessarily difficult - commons unnecessarily - study commons - grading commons unfair | 310 | 42_commons difficult_commons unnecessarily difficult_commons unnecessarily_study commons |
+ | 43 | suggestions - think think suggestions - think suggestions - think think think suggestions - think suggestions time think | 264 | 43_suggestions_think think suggestions_think suggestions_think think think suggestions |
+ | 44 | think list far - think list - think think list far - presently - think presently | 209 | 44_think list far_think list_think think list far_presently |
+ | 45 | personally change - change personally - honest want changed - change everything - change personally change | 146 | 45_personally change_change personally_honest want changed_change everything |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: False
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: auto
+ * seed_topic_list: None
+ * top_n_words: 7
+ * verbose: True
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.39
+ * UMAP: 0.5.7
+ * Pandas: 2.2.3
+ * Scikit-Learn: 1.5.2
+ * Sentence-transformers: 3.2.1
+ * Transformers: 4.46.2
+ * Numba: 0.60.0
+ * Plotly: 5.24.1
+ * Python: 3.10.11
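The topic table in the README can be sanity-checked with a little arithmetic. The sketch below copies the values of the "Topic Frequency" column by hand (variable names are illustrative, not part of the model) and compares their total with the stated number of training documents. It also computes the share of documents assigned to topic 0.

```python
# Values copied from the "Topic Frequency" column, topics 0 through 45.
frequencies = [
    116120, 1568, 1279, 919, 913, 881, 868, 846, 793, 773,
    769, 744, 713, 712, 701, 674, 670, 662, 647, 582,
    558, 550, 549, 542, 526, 511, 510, 507, 500, 485,
    483, 477, 474, 440, 435, 435, 428, 406, 382, 378,
    361, 343, 310, 264, 209, 146,
]

# Stated under "Topic overview" in the README.
training_documents = 143043

total_assigned = sum(frequencies)
largest_share = frequencies[0] / training_documents

print(f"topics listed:      {len(frequencies)}")
print(f"documents assigned: {total_assigned}")
print(f"share of topic 0:   {largest_share:.1%}")
```

The listed frequencies add up to the full training set, and topic 0 alone covers roughly four fifths of the documents, so the remaining 45 topics describe a comparatively small slice of the data.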
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "calculate_probabilities": false,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": "auto",
+   "seed_topic_list": null,
+   "top_n_words": 7,
+   "verbose": true,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null
+ }
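The keys in `config.json` mirror the hyperparameters listed in the README. As a minimal sketch, the snippet below reads the same JSON with the standard library and converts `n_gram_range` back into a tuple, since JSON has no tuple type. The helper name `config_to_kwargs` is hypothetical, and whether every key can be passed to the `BERTopic` constructor depends on the installed BERTopic version, so the sketch stops short of constructing a model.

```python
import json

# Contents of config.json as added by this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": "auto",
  "seed_topic_list": null,
  "top_n_words": 7,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def config_to_kwargs(text: str) -> dict:
    """Parse the config and restore Python-native types (hypothetical helper)."""
    cfg = json.loads(text)
    # JSON stores the range as a two-element list; the README shows a tuple.
    cfg["n_gram_range"] = tuple(cfg["n_gram_range"])
    return cfg


kwargs = config_to_kwargs(CONFIG_JSON)
for key, value in sorted(kwargs.items()):
    print(f"{key}: {value!r}")
```

JSON `null`, `true`, and `false` come back as Python `None`, `True`, and `False`, which matches the values shown under "Training hyperparameters".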
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8cac49b9beb6bd36445c4fa183e7b5f21e37a3d5868564efa0c9d452a1c2d275
+ size 40845856
ctfidf_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70881ce0dc2751c5416fafbba9c333715fd5821a125bdcf7210978bcbcd4cb8d
+ size 85806124
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:decf7cb62e47ae2fc0f4d5eacc454f4b028318f64e46ae3204814fcf8c2d166b
+ size 141400
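The three `.safetensors` and `ctfidf_config.json` entries above are Git LFS pointer files rather than the binary payloads themselves. Each pointer is a short key-value text file, which the following sketch parses. The function name `parse_lfs_pointer` is hypothetical, and the parser handles only the three keys shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer with version, oid, and size lines."""
    # Each line is "<key> <value>", separated by a single space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer for ctfidf.safetensors, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8cac49b9beb6bd36445c4fa183e7b5f21e37a3d5868564efa0c9d452a1c2d275
size 40845856
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], info["digest"][:12], info["size_bytes"])
```

The `size` field is the byte length of the real file, so the pointers show that `ctfidf_config.json` (85806124 bytes) is the largest artifact rendered in this diff, which is consistent with it being moved under LFS in `.gitattributes`.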
topics.json ADDED
The diff for this file is too large to render.