Fine-tuned Topic Model for aspects_to_improve column
- .gitattributes +1 -0
- README.md +115 -0
- config.json +16 -0
- ctfidf.safetensors +3 -0
- ctfidf_config.json +3 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ctfidf_config.json filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,115 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# before_covid_face_to_face_aspects_to_improve

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("ShivamSrng/before_covid_face_to_face_aspects_to_improve")

topic_model.get_topic_info()
```
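
Beyond the summary table from `get_topic_info()`, it can help to look at the keywords of one topic at a time. The following is a minimal sketch, not part of the original model card: `top_keywords` is a hypothetical helper, and it assumes that `get_topic` returns a list of `(keyword, score)` pairs for a known topic ID and `False` for an unknown one.

```python
from typing import List, Tuple


def top_keywords(topic_model, topic_id: int, n: int = 5) -> List[Tuple[str, float]]:
    """Return up to n (keyword, score) pairs for one topic, or [] if the topic is unknown."""
    # Assumption: get_topic() yields a list of (word, score) tuples,
    # or False when the topic ID does not exist in the model.
    words = topic_model.get_topic(topic_id)
    if not words:
        return []
    return list(words[:n])


# Usage, assuming `topic_model` was loaded with BERTopic.load(...) as shown earlier:
#   for word, score in top_keywords(topic_model, 7):
#       print(f"{word}: {score:.3f}")
```

The helper takes the model as an argument, so it can be tried against any object that exposes a compatible `get_topic` method.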

## Topic overview

* Number of topics: 46
* Number of training documents: 143043

<details>
  <summary>Click here for an overview of all topics.</summary>

  | Topic ID | Topic Keywords | Topic Frequency | Label |
  |----------|----------------|-----------------|-------|
  | 0 | lecture - lectures - classes - class time - teaching | 116120 | 0_lecture_lectures_classes_class time |
  | 1 | improvements needed - improvements needs - improvement needs - improvements needs improved - improvement needs improved | 1568 | 1_improvements needed_improvements needs_improvement needs_improvements needs improved |
  | 2 | feel - entire week model feels - week model feels - magnitude plot - character | 1279 | 2_feel_entire week model feels_week model feels_magnitude plot |
  | 3 | needs addressed - executive team needs - understand situation covid - think unfair - understand situation | 919 | 3_needs addressed_executive team needs_understand situation covid_think unfair |
  | 4 | way material presented - way material presented material - material way material presented - material presented material - material better explanation material | 913 | 4_way material presented_way material presented material_material way material presented_material presented material |
  | 5 | research roadmap - research roadmaps - topics - topic - engineering ethics | 881 | 5_research roadmap_research roadmaps_topics_topic |
  | 6 | ta teaches - teach ta - ta students - tas teaching - teacher ta | 868 | 6_ta teaches_teach ta_ta students_tas teaching |
  | 7 | schedule timing - office hours - mandatory office hours - scheduling - time slots | 846 | 7_schedule timing_office hours_mandatory office hours_scheduling |
  | 8 | group work groups - groups group work - group work difficult - group work class - group work | 793 | 8_group work groups_groups group work_group work difficult_group work class |
  | 9 | software - software home - program - webex - access | 773 | 9_software_software home_program_webex |
  | 10 | teacher - rude students - students class - attitude students - teaching | 769 | 10_teacher_rude students_students class_attitude students |
  | 11 | technical issues - program work - software - work properly - working properly | 744 | 11_technical issues_program work_software_work properly |
  | 12 | teaching write essays - essays writing - class writing - writing assignments - writing class | 713 | 12_teaching write essays_essays writing_class writing_writing assignments |
  | 13 | rutgers students - best directed educational platform - students - problems chapter worthless study - tried best directed educational | 712 | 13_rutgers students_best directed educational platform_students_problems chapter worthless study |
  | 14 | moodle students - moodle syllabus - moodle class - moodle homeworks - moodle assignments | 701 | 14_moodle students_moodle syllabus_moodle class_moodle homeworks |
  | 15 | cad software lab - cad software lab introduced - cad class - engineering drawings - cad drawings | 674 | 15_cad software lab_cad software lab introduced_cad class_engineering drawings |
  | 16 | taking notes class - class notes - lecture notes - read notes - notes class | 670 | 16_taking notes class_class notes_lecture notes_read notes |
  | 17 | business major - design majors - internships - help careers - digital design majors | 662 | 17_business major_design majors_internships_help careers |
  | 18 | enjoyed class - structure taught professor interesting - structure taught professor - everything class poorly - best classes taken | 647 | 18_enjoyed class_structure taught professor interesting_structure taught professor_everything class poorly |
  | 19 | students points - student mistake - points students - mistake points - points mistake | 582 | 19_students points_student mistake_points students_mistake points |
  | 20 | work load needs - work difficult - work overwhelming - work load - workload work | 558 | 20_work load needs_work difficult_work overwhelming_work load |
  | 21 | final research paper - research paper research - semester research paper - paper research - paper research paper | 550 | 21_final research paper_research paper research_semester research paper_paper research |
  | 22 | plant tours - plant tour - class field trips - trips campus - field trips construction | 549 | 22_plant tours_plant tour_class field trips_trips campus |
  | 23 | enjoyed aspects - tad difficult different critics - difficult different critics - content felt - different critics | 542 | 23_enjoyed aspects_tad difficult different critics_difficult different critics_content felt |
  | 24 | chapters covered class - class chapters - chapters spend time - learning chapters - covering chapters | 526 | 24_chapters covered class_class chapters_chapters spend time_learning chapters |
  | 25 | assigned readings - readings readings difficult understand - reading assignments - readings class readings - assigning readings | 511 | 25_assigned readings_readings readings difficult understand_reading assignments_readings class readings |
  | 26 | equipment outdated - outdated equipment - replaced equipment - equipment replaced - broken equipment | 510 | 26_equipment outdated_outdated equipment_replaced equipment_equipment replaced |
  | 27 | wastes time - useful waste time - waste time money - massive waste time - total waste time | 507 | 27_wastes time_useful waste time_waste time money_massive waste time |
  | 28 | think everything great - everything great think - everything great - everything great everything - everything great think everything | 500 | 28_think everything great_everything great think_everything great_everything great everything |
  | 29 | modern films - documentaries - certain films - movies shown - films | 485 | 29_modern films_documentaries_certain films_movies shown |
  | 30 | dbms - advanced database - learning sql - database software - database systems | 483 | 30_dbms_advanced database_learning sql_database software |
  | 31 | studio courses - schools understanding collaborative studio - studio classes - studio experience - understand studio | 477 | 31_studio courses_schools understanding collaborative studio_studio classes_studio experience |
  | 32 | pace little slower - bit slower pace - slightly slower pace - pace bit slower - slower pace | 474 | 32_pace little slower_bit slower pace_slightly slower pace_pace bit slower |
  | 33 | recitation extra work - recitation work - recitation useless - recitation class - recitation pointless | 440 | 33_recitation extra work_recitation work_recitation useless_recitation class |
  | 34 | everything - everything aspects - complaints - everything great - everything say | 435 | 34_everything_everything aspects_complaints_everything great |
  | 35 | feels awkward online opposed - feels awkward online - awkward online opposed - possible feels awkward online - transition possible feels awkward | 435 | 35_feels awkward online opposed_feels awkward online_awkward online opposed_possible feels awkward online |
  | 36 | teach matlab - matlab students - teaching matlab - students matlab - matlab taught | 428 | 36_teach matlab_matlab students_teaching matlab_students matlab |
  | 37 | difficult mentally exhausting - excessively difficult - learning curve - difficult annoying - easy difficult | 406 | 37_difficult mentally exhausting_excessively difficult_learning curve_difficult annoying |
  | 38 | practice matlab - learn matlab - matlab using - matlabs - instead matlab | 382 | 38_practice matlab_learn matlab_matlab using_matlabs |
  | 39 | way think way great - think way great - think way way great - way think great - perfectly way | 378 | 39_way think way great_think way great_think way way great_way think great |
  | 40 | class recitations - recitation class - recitation needs - recitation classes - class recitation | 361 | 40_class recitations_recitation class_recitation needs_recitation classes |
  | 41 | microphones said argument - feel microphones said argument - microphones said argument help - participants microphone - professor microphone | 343 | 41_microphones said argument_feel microphones said argument_microphones said argument help_participants microphone |
  | 42 | commons difficult - commons unnecessarily difficult - commons unnecessarily - study commons - grading commons unfair | 310 | 42_commons difficult_commons unnecessarily difficult_commons unnecessarily_study commons |
  | 43 | suggestions - think think suggestions - think suggestions - think think think suggestions - think suggestions time think | 264 | 43_suggestions_think think suggestions_think suggestions_think think think suggestions |
  | 44 | think list far - think list - think think list far - presently - think presently | 209 | 44_think list far_think list_think think list far_presently |
  | 45 | personally change - change personally - honest want changed - change everything - change personally change | 146 | 45_personally change_change personally_honest want changed_change everything |

</details>
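
The frequency column is easier to interpret as a share of the 143043 training documents. The sketch below uses three rows copied from the table; `share` and `label` are hypothetical helpers, and the label rule (topic ID followed by the first four keywords, joined with underscores) is inferred from the table rather than taken from BERTopic's source.

```python
TOTAL_DOCS = 143043  # "Number of training documents" from the overview

# (topic ID, keywords, frequency), copied from three rows of the table
ROWS = [
    (0, ["lecture", "lectures", "classes", "class time", "teaching"], 116120),
    (7, ["schedule timing", "office hours", "mandatory office hours",
         "scheduling", "time slots"], 846),
    (45, ["personally change", "change personally", "honest want changed",
          "change everything", "change personally change"], 146),
]


def share(frequency: int, total: int = TOTAL_DOCS) -> float:
    """Fraction of training documents assigned to a topic."""
    return frequency / total


def label(topic_id: int, keywords: list) -> str:
    """Rebuild the Label column: ID plus the first four keywords (inferred rule)."""
    return "_".join([str(topic_id)] + keywords[:4])


for topic_id, keywords, frequency in ROWS:
    print(f"{label(topic_id, keywords)}: {share(frequency):.1%}")
```

Topic 0 alone covers roughly 81% of the documents, so the remaining 45 topics describe a comparatively small slice of the responses.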

## Training hyperparameters

* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: auto
* seed_topic_list: None
* top_n_words: 7
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
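
These values mirror the `config.json` added in the same commit. As a rough sketch of reading them back with the standard library: JSON has no tuple type, so `n_gram_range` is stored as a two-element list and is converted to a tuple here. The inline string stands in for the file contents, and `load_hyperparameters` is a hypothetical helper name.

```python
import json

# Stand-in for the contents of config.json from this commit
CONFIG_TEXT = """
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": "auto",
  "seed_topic_list": null,
  "top_n_words": 7,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def load_hyperparameters(text: str) -> dict:
    """Parse the JSON config and restore n_gram_range as a tuple."""
    params = json.loads(text)
    params["n_gram_range"] = tuple(params["n_gram_range"])
    return params


params = load_hyperparameters(CONFIG_TEXT)
print(params["n_gram_range"], params["nr_topics"], params["top_n_words"])
```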

## Framework versions

* Numpy: 1.26.4
* HDBSCAN: 0.8.39
* UMAP: 0.5.7
* Pandas: 2.2.3
* Scikit-Learn: 1.5.2
* Sentence-transformers: 3.2.1
* Transformers: 4.46.2
* Numba: 0.60.0
* Plotly: 5.24.1
* Python: 3.10.11
config.json
ADDED
@@ -0,0 +1,16 @@
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": "auto",
  "seed_topic_list": null,
  "top_n_words": 7,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
ctfidf.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8cac49b9beb6bd36445c4fa183e7b5f21e37a3d5868564efa0c9d452a1c2d275
size 40845856
ctfidf_config.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:70881ce0dc2751c5416fafbba9c333715fd5821a125bdcf7210978bcbcd4cb8d
size 85806124
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:decf7cb62e47ae2fc0f4d5eacc454f4b028318f64e46ae3204814fcf8c2d166b
size 141400
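
The three-line entries for `ctfidf.safetensors`, `ctfidf_config.json`, and `topic_embeddings.safetensors` are Git LFS pointer files rather than the file contents themselves: each records the spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of parsing such a pointer follows; `parse_lfs_pointer` is a hypothetical helper that assumes exactly the `version`, `oid`, and `size` lines shown in this commit.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    # Each non-empty line has the form "<key> <value>"
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer for topic_embeddings.safetensors, copied from this commit
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:decf7cb62e47ae2fc0f4d5eacc454f4b028318f64e46ae3204814fcf8c2d166b
size 141400
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], info["size_bytes"])
```

The `size` field gives the stored size without downloading the object, which is how the roughly 86 MB `ctfidf_config.json` can be spotted from the diff alone.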
topics.json
ADDED
The diff for this file is too large to render.