tykea
/

khmer-text-sentiment-analysis-roberta

Text Classification

Inference Endpoints

Model card Files Files and versions Community

khmer-text-sentiment-analysis-roberta / README.md

tykea's picture

Update README.md

20f0a1c verified 21 days ago

|

history blame contribute delete

1.77 kB

	---
	license: mit
	language:
	- km
	base_model:
	- FacebookAI/xlm-roberta-base
	pipeline_tag: text-classification
	library_name: transformers
	tags:
	- sentiment
	---

	This is a fine-tuned version of the XLM-RoBERTa model for sentiment analysis to classify khmer texts into 2 categories; Postive and Negative.

	It can process texts up to 512 tokens and performs well on khmer text inputs.

	- Task: Sentiment analysis (binary classification).
	- Languages Supported: Khmer.
	- Intended Use Cases:
	- Analyzing customer reviews.
	- Social media sentiment detection.
	- Limitations:
	- Performance may degrade on languages or domains not present in the training data.
	- Does not handle sarcasm or highly ambiguous inputs well.
	-
	The model was evaluated on a test set of 400 samples, achieving the following performance:

	- Test Accuracy: 83.25%
	- Precision: 83.55%
	- Recall: 83.25%
	- F1 Score: 83.25%

	Confusion Matrix:
	\| Predicted\Actual \| Negative \| Positive \|
	\|-------------------\|----------\|----------\|
	\| Negative \| 166 \| 42 \|
	\| Positive \| 25 \| 167 \|
	The model supports a maximum sequence length of 512 tokens.
	## How to Use
	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	tokenizer = AutoTokenizer.from_pretrained("tykea/khmer-text-sentiment-analysis-roberta")
	model = AutoModelForSequenceClassification.from_pretrained("tykea/khmer-text-sentiment-analysis-roberta")

	text = "អគុណCADT"
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
	outputs = model(**inputs)
	predictions = outputs.logits.argmax(dim=1)
	labels_mapping = {0: 'negative', 1: 'positive'}
	print("Predicted Class:", labels_mapping[predictions.item()])