bertopic-base-finetuned-ecommerce-ergonomics-chair
This is a BERTopic base model with the embedding model "sentence-transformers/all-mpnet-base-v2", fine-tuned on English-language product reviews of ergonomic chairs. It is intended for direct use as a topic/cluster model for product reviews in the ergonomic chair niche, or for further fine-tuning on related clustering tasks. I fine-tuned it on our customer reviews, using a training set of 50,000+ rows and validating on a dev set of 35,000 rows.
- Developed by: ilaria Huang
- Model type: Language model
- Language(s) (NLP): en
- License: mit
- Parent Model: sentence-transformers/all-mpnet-base-v2; BERTopic; phrasemachine
- Resources for more information:
How to Get Started with the Model
Use the code below to get started with the model.
from bertopic import BERTopic
loaded_model = BERTopic.load("bertopic-base-finetuned-ecommerce-ergonomics-chair")
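Once the model is loaded, new reviews can be assigned to topics with BERTopic's `transform` method. The snippet below is a minimal sketch rather than the author's own pipeline: the helper names `assign_topics` and `load_and_assign` are hypothetical, and it assumes the `bertopic` package is installed and the model path resolves on your machine.

```python
def assign_topics(topic_model, reviews):
    """Pair each review with its predicted topic id.

    `topic_model` is any object exposing BERTopic's `transform(docs)`
    interface, which returns (topics, probabilities).
    Topic id -1 is BERTopic's outlier topic.
    """
    topics, _probs = topic_model.transform(reviews)
    return list(zip(reviews, topics))


def load_and_assign(model_path, reviews):
    """Hypothetical convenience wrapper: load the saved model, then predict."""
    # Imported lazily so assign_topics can be used without bertopic installed.
    from bertopic import BERTopic

    loaded_model = BERTopic.load(model_path)
    return assign_topics(loaded_model, reviews)
```

For example, `load_and_assign("bertopic-base-finetuned-ecommerce-ergonomics-chair", ["The lumbar support is great."])` would return a list of `(review, topic_id)` pairs; `loaded_model.get_topic_info()` can then be used to look up the keywords behind each topic id.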
Training Procedure
Preprocessing
The ergonomic chair customer review data is preprocessed with phrasemachine (https://github.com/slanglab/phrasemachine) to extract phrases, duplicated phrases are removed, and the remaining phrases are encoded with sentence_model.
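The deduplication step can be sketched as follows. This is one possible reading of the preprocessing described above, not the exact code used for this model: `dedupe_phrases` and `phrasemachine_extractor` are hypothetical helper names, and normalizing by lowercasing and collapsing whitespace is an assumption.

```python
def dedupe_phrases(reviews, extract_phrases):
    """Extract phrases from each review and keep the first occurrence of each.

    `extract_phrases` is any callable mapping a review string to an
    iterable of phrase strings. Phrases are compared after lowercasing
    and collapsing whitespace (an assumption, not stated in the card).
    """
    seen = set()
    unique = []
    for review in reviews:
        for phrase in extract_phrases(review):
            key = " ".join(phrase.lower().split())
            if key and key not in seen:
                seen.add(key)
                unique.append(key)
    return unique


def phrasemachine_extractor(text):
    """Assumed adapter around phrasemachine's get_phrases() output."""
    # Imported lazily so dedupe_phrases works without phrasemachine installed.
    import phrasemachine

    return list(phrasemachine.get_phrases(text)["counts"])
```

With phrasemachine installed, `dedupe_phrases(reviews, phrasemachine_extractor)` would give the list of unique phrases to pass to the sentence model's `encode` method.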
Training Data
I use customer review data from multiple e-commerce websites to fine-tune the model. You can download all the raw datasets from: dataset
Pre-training
I use the pretrained sentence-transformers/all-mpnet-base-v2 model and the BERTopic model. Please refer to the all-mpnet-base-v2 model card (https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and the BERTopic blog post (https://huggingface.co/blog/bertopic) for more detailed information about the pre-training procedure.
Hyperparameters
The following hyperparameters were used during training:
- representation_model = MaximalMarginalRelevance(diversity=0.3) # for diversity in topic name
- vectorizer_model = CountVectorizer
- sentence_model = SentenceTransformer("all-mpnet-base-v2")
- hdbscan_model = HDBSCAN(min_cluster_size=50, metric='euclidean', cluster_selection_method='eom', prediction_data=True)
- umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric='cosine', random_state=42)
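The hyperparameters above can be wired into a BERTopic instance roughly as shown below. This is a sketch under stated assumptions, not the original training script: the `HYPERPARAMS` dictionary and the `build_topic_model` function are hypothetical names, and `CountVectorizer` is used with its default arguments because the card does not list any.

```python
# Values copied from the hyperparameter list in this model card.
HYPERPARAMS = {
    "embedding_model": "all-mpnet-base-v2",
    "mmr": {"diversity": 0.3},
    "hdbscan": {
        "min_cluster_size": 50,
        "metric": "euclidean",
        "cluster_selection_method": "eom",
        "prediction_data": True,
    },
    "umap": {
        "n_neighbors": 15,
        "n_components": 5,
        "min_dist": 0.0,
        "metric": "cosine",
        "random_state": 42,
    },
}


def build_topic_model(params=HYPERPARAMS):
    """Assemble an unfitted BERTopic model from the listed components."""
    # Heavy dependencies are imported lazily, only when a model is built.
    from bertopic import BERTopic
    from bertopic.representation import MaximalMarginalRelevance
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer(params["embedding_model"]),
        umap_model=UMAP(**params["umap"]),
        hdbscan_model=HDBSCAN(**params["hdbscan"]),
        vectorizer_model=CountVectorizer(),  # default arguments: an assumption
        representation_model=MaximalMarginalRelevance(**params["mmr"]),
    )
```

Calling `topics, probs = build_topic_model().fit_transform(docs)` on a list of review phrases would then fit the topic model; `random_state=42` in UMAP keeps the dimensionality reduction reproducible across runs.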
Framework versions
- BERTopic version: 0.16.0
- Python version: 3.11.1