This is a fine-tuned version of the XLM-RoBERTa model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.

It accepts inputs of up to 512 tokens and is designed for Khmer-language text; see the evaluation results below.

  • Task: Sentiment analysis (binary classification).

  • Languages Supported: Khmer.

  • Intended Use Cases:

    • Analyzing customer reviews.
    • Social media sentiment detection.

  • Limitations:

    • Performance may degrade on languages or domains not present in the training data.
    • Does not handle sarcasm or highly ambiguous inputs well.

The model was evaluated on a test set of 400 samples, with the following results:

  • Test Accuracy: 83.25%

  • Precision: 83.55%

  • Recall: 83.25%

  • F1 Score: 83.25%

Confusion matrix (rows: predicted label, columns: actual label):

                        Actual Negative   Actual Positive
  Predicted Negative          166                42
  Predicted Positive           25               167

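The reported figures can be re-derived from the confusion matrix. The sketch below is a minimal check, assuming that precision, recall, and F1 are support-weighted averages over the two classes (the convention scikit-learn calls `average="weighted"`) and that rows are predicted labels, as the table header states. Under those assumptions it reproduces 83.25% accuracy, 83.55% precision, 83.25% recall, and 83.25% F1. `weighted_metrics` is a hypothetical helper name, not part of the model's tooling.

```python
# Confusion matrix from the card: rows = predicted label, columns = actual label.
# Index 0 = Negative, 1 = Positive.
CONFUSION = [
    [166, 42],  # predicted Negative
    [25, 167],  # predicted Positive
]


def weighted_metrics(cm):
    """Return accuracy plus support-weighted precision, recall, and F1.

    cm[p][a] counts samples with predicted class p and actual class a.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n_classes))

    precision_w = recall_w = f1_w = 0.0
    for c in range(n_classes):
        tp = cm[c][c]
        predicted_c = sum(cm[c])  # row sum: everything predicted as class c
        actual_c = sum(cm[p][c] for p in range(n_classes))  # column sum: support of c
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = actual_c / total  # each class weighted by its share of samples
        precision_w += weight * precision
        recall_w += weight * recall
        f1_w += weight * f1

    return {
        "accuracy": correct / total,
        "precision": precision_w,
        "recall": recall_w,
        "f1": f1_w,
    }


for name, value in weighted_metrics(CONFUSION).items():
    print(f"{name}: {value:.2%}")
```

Weighted recall always equals accuracy, because each class's recall is weighted by that class's share of the samples, so the matching 83.25% values are expected rather than coincidental.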
The model supports a maximum sequence length of 512 tokens.
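The How to Use snippet passes `truncation=True, max_length=512`, so anything past that limit is simply dropped. If longer documents matter, one common workaround (not something this card describes or evaluates) is to split the token sequence into overlapping windows, classify each window, and average the class probabilities. The sketch below shows only the windowing and averaging logic in plain Python. `chunk_token_ids` and `average_probs` are hypothetical helpers; reserving 2 positions per window for XLM-RoBERTa's `<s>` and `</s>` tokens and an overlap of 128 tokens are assumptions. Real content IDs could come from `tokenizer(text, add_special_tokens=False)["input_ids"]`, and each window would need its special tokens added back before being passed to the model.

```python
def chunk_token_ids(token_ids, max_length=512, num_special=2, stride=128):
    """Split content token IDs into overlapping windows that fit the model.

    Each window holds at most max_length - num_special content tokens, leaving
    room for the special tokens added later. Consecutive windows share
    `stride` tokens of overlap.
    """
    window = max_length - num_special
    if stride >= window:
        raise ValueError("stride must be smaller than the window size")
    step = window - stride
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the sequence
        start += step
    return chunks


def average_probs(prob_rows):
    """Average per-window class probabilities into one document-level score."""
    n = len(prob_rows)
    return [sum(col) / n for col in zip(*prob_rows)]


ids = list(range(1200))  # stand-in for real content token IDs
chunks = chunk_token_ids(ids)
print([len(c) for c in chunks])
```

Averaging probabilities treats every window equally; weighting by window length or taking the most confident window are equally plausible choices, and none of them has been validated on this model.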

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "tykea/khmer-text-sentiment-analysis-roberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode (disables dropout)

# Example Khmer input; anything beyond 512 tokens is truncated.
text = "អគុណCADT"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

predictions = outputs.logits.argmax(dim=-1)
# Label order as given in this card: 0 = negative, 1 = positive.
labels_mapping = {0: "negative", 1: "positive"}
print("Predicted Class:", labels_mapping[predictions.item()])
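The snippet reports only the winning class. To also get a confidence score, the logits can be turned into probabilities with a softmax; in PyTorch that is `torch.softmax(outputs.logits, dim=-1)`. The sketch below spells out the same computation in plain Python so the logic is visible. `predict_label` is a hypothetical helper, the logits are made-up numbers rather than real model output, and the label order (0 = negative, 1 = positive) is taken from the card's example. With the real model, one row of logits would be `outputs.logits[0].tolist()`.

```python
import math

# Label order taken from the card's example: index 0 = negative, 1 = positive.
LABELS = {0: "negative", 1: "positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Map one row of logits to (label, probability of that label)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for illustration only (not real model output).
label, confidence = predict_label([-1.2, 2.3])
print(f"{label} ({confidence:.1%})")
```

The argmax of the probabilities is always the same as the argmax of the logits, so this does not change the predicted class; it only adds a score that can be thresholded, for example to flag low-confidence reviews for manual checking.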

Model size: 278M parameters (F32 tensors, stored in Safetensors format).