---
license: mit
language:
- km
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
library_name: transformers
tags:
- sentiment
---
**This is a fine-tuned version of the XLM-RoBERTa model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.**
**It can process texts of up to 512 tokens and performs well on Khmer text inputs.**
- **Task**: Sentiment analysis (binary classification).
- **Languages Supported**: Khmer.
- **Intended Use Cases**:
  - Analyzing customer reviews.
  - Social media sentiment detection.
- **Limitations**:
  - Performance may degrade on languages or domains not present in the training data.
  - Does not handle sarcasm or highly ambiguous inputs well.

The model was evaluated on a test set of 400 samples, achieving the following performance:
- **Test Accuracy**: 83.25%
- **Precision**: 83.55%
- **Recall**: 83.25%
- **F1 Score**: 83.25%
Confusion Matrix:
| Predicted\Actual | Negative | Positive |
|-------------------|----------|----------|
| **Negative** | 166 | 42 |
| **Positive** | 25 | 167 |
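
The reported metrics can be cross-checked against the confusion matrix. The sketch below is a minimal recomputation in plain Python, not the original evaluation script. It rests on two assumptions that the card does not state: that rows are predicted labels and columns are actual labels (following the table header), and that precision, recall, and F1 are averaged across classes weighted by the number of actual samples per class.

```python
# Counts copied from the confusion matrix above.
# Assumption: outer key = predicted label, inner key = actual label.
matrix = {
    "negative": {"negative": 166, "positive": 42},
    "positive": {"negative": 25, "positive": 167},
}
labels = ["negative", "positive"]

total = sum(matrix[p][a] for p in labels for a in labels)
accuracy = sum(matrix[c][c] for c in labels) / total

per_class = {}
for c in labels:
    tp = matrix[c][c]
    predicted = sum(matrix[c][a] for a in labels)  # row sum: predicted as c
    actual = sum(matrix[p][c] for p in labels)     # column sum: truly c (support)
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    per_class[c] = (precision, recall, f1, actual)


def weighted(i):
    """Average metric i over classes, weighted by support (an assumption)."""
    return sum(m[i] * m[3] for m in per_class.values()) / total


precision_w, recall_w, f1_w = weighted(0), weighted(1), weighted(2)
print(f"accuracy={accuracy:.2%} precision={precision_w:.2%} "
      f"recall={recall_w:.2%} f1={f1_w:.2%}")
```

Under these assumptions the script should reproduce the figures listed above. With weighted averaging, recall always equals accuracy, which is why the two values match in the report.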
The model supports a maximum sequence length of 512 tokens.
## How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("tykea/khmer-text-sentiment-analysis-roberta")
model = AutoModelForSequenceClassification.from_pretrained("tykea/khmer-text-sentiment-analysis-roberta")

text = "អគុណCADT"
# Tokenize, truncating anything beyond the 512-token limit.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=1)  # index of the highest-scoring class
labels_mapping = {0: 'negative', 1: 'positive'}
print("Predicted Class:", labels_mapping[predictions.item()])
```