---
model-index:
- name: sentiment-thai-text-model
  results: []
datasets:
- pythainlp/wisesight_sentiment
language:
- th
pipeline_tag: text-classification
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

## Model description

This model is a fine-tuned version of poom-sci/WangchanBERTa-finetuned-sentiment, tailored for sentiment analysis of Thai-language text. Fine-tuning was performed to improve performance on a Thai sentiment-classification dataset. The model is based on WangchanBERTa, a transformer-based language model for Thai developed by the VISTEC-depa AI Research Institute (AIResearch) in Thailand.

## Intended uses & limitations

This model performs sentiment analysis, categorizing input text into three classes: positive, neutral, and negative. It can be used in a variety of natural language processing (NLP) applications, such as:

- Social media sentiment analysis
- Sentiment classification of product or service reviews
- Customer feedback processing

Limitations:

- Language: the model is specialized for Thai text and may not perform well on other languages.
- Generalization: performance depends on the quality and diversity of the fine-tuning data; the model may not generalize well to domains that differ significantly from the training data.
- Ambiguity: highly ambiguous or sarcastic sentences may still be challenging to handle.
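For the uses above, the model can be loaded through the Transformers `pipeline` API. A minimal sketch, assuming the default Hub repo id shown here is replaced with the model's actual path (the id below is a placeholder, not a confirmed repo):

```python
def analyze_sentiment(texts, model_id="sentiment-thai-text-model"):
    """Classify Thai text(s) as positive / neutral / negative.

    `model_id` is a placeholder -- replace it with the model's actual
    repo id on the Hugging Face Hub.
    """
    # Imported lazily so defining the helper does not require transformers.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(texts)


# Example (downloads the model on first use):
# analyze_sentiment(["อาหารร้านนี้อร่อยมาก", "บริการแย่มาก"])
```

Check `model.config.id2label` for the exact label names the checkpoint emits.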

## Training and evaluation data

The model was fine-tuned on a sentiment-classification dataset of Thai-language text. The dataset includes sentences and texts from multiple domains, such as social media, product reviews, and general user feedback, labeled into three categories:

- Positive: the text expresses positive sentiment.
- Neutral: the text is neutral or objective in sentiment.
- Negative: the text expresses negative sentiment.

More details on the dataset used can be provided upon request.
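The card metadata lists `pythainlp/wisesight_sentiment`; assuming that is the corpus described above, it can be fetched with the `datasets` library:

```python
def load_sentiment_data(name="pythainlp/wisesight_sentiment"):
    """Load the Thai sentiment corpus listed in the card metadata."""
    # Imported lazily so defining the helper does not require datasets.
    from datasets import load_dataset

    return load_dataset(name)


# Example (requires network access on first use):
# ds = load_sentiment_data()
# print(ds["train"][0])
```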

## Training procedure

The model was trained with the following hyperparameters:

- Learning rate: 2e-05
- Batch size: 32 for both training and evaluation
- Seed: 42 (for reproducibility)
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR scheduler: linear
- Number of epochs: 2

Training used cross-entropy loss for multi-class classification, with early stopping based on evaluation metrics.
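The linear schedule above can be sketched as follows. This is a simplified illustration of linear decay from the base learning rate to zero, not the exact Transformers implementation; the `warmup_steps` parameter is an assumption for completeness, since the card does not state a warmup value (it defaults to 0 here):

```python
def linear_lr(step, total_steps, base_lr=2e-05, warmup_steps=0):
    """Linear LR schedule: optional warmup ramp, then linear decay to zero."""
    if warmup_steps and step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(total_steps - step, 0)
    span = max(total_steps - warmup_steps, 1)
    return base_lr * remaining / span
```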

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.19.1
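To reproduce this environment, the versions above can be pinned in a requirements file (a suggested fragment; the CUDA 12.1 build of `torch` may require the extra index URL from the PyTorch install page):

```text
transformers==4.44.2
torch==2.4.1
datasets==3.0.1
tokenizers==0.19.1
```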