bertopic-base-finetuned-ecommerce-ergonomics-chair

This is a BERTopic base model with the embedding model "sentence-transformers/all-mpnet-base-v2", fine-tuned on English-language product reviews of ergonomic chairs. It is intended for direct use as a topic/cluster model for product reviews in the ergonomic chair niche, or for further fine-tuning on related clustering tasks. I adapted it to our customer reviews, fitting it on a training set of 50,000+ rows and validating it on a dev set of 35,000 rows.

  • Developed by: ilaria Huang
  • Model type: Language model
  • Language(s) (NLP): en
  • License: mit
  • Parent Model: sentence-transformers/all-mpnet-base-v2; BERTopic; phrasemachine
  • Resources for more information:

How to Get Started with the Model

Use the code below to get started with the model.

from bertopic import BERTopic
loaded_model = BERTopic.load("bertopic-base-finetuned-ecommerce-ergonomics-chair")
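
Once the model is loaded, new reviews can be assigned to the learned topics. The following is a minimal sketch rather than part of the original training code: the helper names `assign_topics` and `demo` and the sample reviews are illustrative assumptions, and it relies only on BERTopic's `transform` method, which returns topic ids and probabilities.

```python
def assign_topics(topic_model, reviews):
    """Pair each non-empty review with the topic id predicted by topic_model.

    topic_model is any object exposing BERTopic's transform(documents)
    interface, which returns (topics, probabilities).
    """
    # Drop empty or whitespace-only reviews before prediction.
    cleaned = [r.strip() for r in reviews if r and r.strip()]
    if not cleaned:
        return []
    topics, _ = topic_model.transform(cleaned)
    return list(zip(cleaned, [int(t) for t in topics]))


def demo():
    """Hypothetical usage; requires bertopic and access to the saved model."""
    from bertopic import BERTopic

    loaded_model = BERTopic.load("bertopic-base-finetuned-ecommerce-ergonomics-chair")
    sample_reviews = [  # made-up examples, not taken from the training data
        "The lumbar support is great but the armrests wobble.",
        "Assembly took ten minutes and the mesh back stays cool.",
    ]
    for review, topic_id in assign_topics(loaded_model, sample_reviews):
        # Topic -1 is BERTopic's outlier topic.
        print(topic_id, loaded_model.get_topic(topic_id), review)
```

Calling `demo()` prints, for each review, the predicted topic id and its top words.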

Training Procedure

Preprocessing

The ergonomic chair customer review data was preprocessed with phrasemachine (https://github.com/slanglab/phrasemachine) to extract phrases. Duplicated phrases were removed, and the remaining phrases were encoded with sentence_model.
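
The preprocessing step can be sketched roughly as follows. This is an illustration under assumptions, not the exact code used: the card does not say how duplicates were detected, so the case- and whitespace-insensitive comparison in `dedupe_phrases` is a guess, and both helper names are hypothetical. `extract_phrases` relies on `phrasemachine.get_phrases`, which returns a dictionary whose `"counts"` entry maps phrases to frequencies.

```python
def dedupe_phrases(phrases):
    """Remove duplicate phrases, keeping first-seen order.

    Assumption: phrases that differ only in case or spacing are duplicates.
    """
    seen = set()
    unique = []
    for phrase in phrases:
        key = " ".join(phrase.lower().split())  # normalize case and whitespace
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def extract_phrases(review):
    """Extract candidate phrases from one review with phrasemachine."""
    import phrasemachine  # imported lazily so dedupe_phrases works without it

    return list(phrasemachine.get_phrases(review)["counts"])
```

The deduplicated phrases are what would then be passed to the sentence model for encoding.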

Training Data

I used customer review data from multiple e-commerce websites to fine-tune the model. You can download all the raw datasets from: dataset

Pre-training

I use the pretrained sentence-transformers/all-mpnet-base-v2 model and the BERTopic model. Please refer to the model card (https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and the BERTopic blog post (https://huggingface.co/blog/bertopic) for more detailed information about the pre-training procedure.

Hyperparameters

The following hyperparameters were used during training:

  • representation_model = MaximalMarginalRelevance(diversity=0.3) # for diversity in topic name
  • vectorizer_model = CountVectorizer
  • sentence_model = SentenceTransformer("all-mpnet-base-v2")
  • hdbscan_model = HDBSCAN(min_cluster_size=50, metric='euclidean', cluster_selection_method='eom', prediction_data=True)
  • umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric='cosine', random_state=42)
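
The components listed above can be wired into a BERTopic pipeline along these lines. This is a sketch, not the original training script: the function name `build_topic_model` and the constant names are illustrative, and `CountVectorizer` is instantiated with default arguments because the card does not list any.

```python
# Hyperparameters copied from the list above.
UMAP_PARAMS = dict(n_neighbors=15, n_components=5, min_dist=0.0,
                   metric="cosine", random_state=42)
HDBSCAN_PARAMS = dict(min_cluster_size=50, metric="euclidean",
                      cluster_selection_method="eom", prediction_data=True)
MMR_DIVERSITY = 0.3  # diversity of the words used in topic names


def build_topic_model():
    """Assemble a BERTopic model from the listed components."""
    # Imports are local so the constants above can be inspected without
    # the heavy dependencies installed.
    from bertopic import BERTopic
    from bertopic.representation import MaximalMarginalRelevance
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer("all-mpnet-base-v2"),
        umap_model=UMAP(**UMAP_PARAMS),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS),
        vectorizer_model=CountVectorizer(),  # default arguments are an assumption
        representation_model=MaximalMarginalRelevance(diversity=MMR_DIVERSITY),
    )
```

Fitting would then be `topics, probs = build_topic_model().fit_transform(docs)`, where `docs` is the list of preprocessed review texts.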

Framework versions

  • BERTopic version: 0.16.0
  • Python version: 3.11.1