StevenLimcorn's picture
Update README.md
638a504
---
language: id
tags:
- roberta
license: mit
datasets:
- indonli
widget:
- text: "Amir Sjarifoeddin Harahap lahir di Kota Medan, Sumatera Utara, 27 April 1907. Ia meninggal di Surakarta, Jawa Tengah, pada 19 Desember 1948 dalam usia 41 tahun. </s></s> Amir Sjarifoeddin Harahap masih hidup."
---
## Indo-roberta-indonli
Indo-roberta-indonli is natural language inference classifier based on [Indo-roberta](https://huggingface.co/flax-community/indonesian-roberta-base) model. It was trained on the trained on [IndoNLI](https://github.com/ir-nlp-csui/indonli/tree/main/data/indonli) dataset. The model used was [Indo-roberta](https://huggingface.co/flax-community/indonesian-roberta-base) and was transfer-learned to a natural inference classifier model. The model are tested using the validation, test_layer and test_expert dataset given in the github repository. The results are shown below.
### Result
| Dataset | Accuracy | F1 | Precision | Recall |
|-------------|----------|---------|-----------|---------|
| Test Lay | 0.74329 | 0.74075 | 0.74283 | 0.74133 |
| Test Expert | 0.6115 | 0.60543 | 0.63924 | 0.61742 |
## Model
The model was trained on with 5 epochs, batch size 16, learning rate 2e-5 and weight decay 0.01. Achieved different metrics as shown below.
| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|-------|---------------|-----------------|----------|----------|-----------|----------|
| 1 | 0.942500 | 0.658559 | 0.737369 | 0.735552 | 0.735488 | 0.736679 |
| 2 | 0.649200 | 0.645290 | 0.761493 | 0.759593 | 0.762784 | 0.759642 |
| 3 | 0.437100 | 0.667163 | 0.766045 | 0.763979 | 0.765740 | 0.763792 |
| 4 | 0.282000 | 0.786683 | 0.764679 | 0.761802 | 0.762011 | 0.761684 |
| 5 | 0.193500 | 0.925717 | 0.765134 | 0.763127 | 0.763560 | 0.763489 |
## How to Use
### As NLI Classifier
```python
from transformers import pipeline
pretrained_name = "StevenLimcorn/indonesian-roberta-indonli"
nlp = pipeline(
"zero-shot-classification",
model=pretrained_name,
tokenizer=pretrained_name
)
nlp("Amir Sjarifoeddin Harahap lahir di Kota Medan, Sumatera Utara, 27 April 1907. Ia meninggal di Surakarta, Jawa Tengah, pada 19 Desember 1948 dalam usia 41 tahun. </s></s> Amir Sjarifoeddin Harahap masih hidup.")
```
## Disclaimer
Do consider the biases which come from both the pre-trained RoBERTa model and the `INDONLI` dataset that may be carried over into the results of this model.
## Author
Indonesian RoBERTa Base IndoNLI was trained and evaluated by [Steven Limcorn](https://github.com/stevenlimcorn). All computation and development are done on Google Colaboratory using their free GPU access.
## Reference
The dataset we used is by IndoNLI.
```
@inproceedings{indonli,
title = "IndoNLI: A Natural Language Inference Dataset for Indonesian",
author = "Mahendra, Rahmad and Aji, Alham Fikri and Louvan, Samuel and Rahman, Fahrurrozi and Vania, Clara",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
publisher = "Association for Computational Linguistics",
}
```