## Purpose:
This model is a query classifier for Arabic. It returns 0 for a keyword-style query and 1 for a fully-formed question.
It was built in three steps.
1. Take the same useful Kaggle training data that Shahrukh used, keeping only the 'dev.csv' data, which is more than sufficient, and later re-split it into new train, validation, and test sets. Translate it into Arabic using the Seq2Seq translation model "facebook/m2m100_1.2B". The priority was to have syntactically correct translations, not necessarily semantically correct ones. To that end, for keyword queries the words were translated individually and recombined into one string, while the questions were translated as-is; sometimes the results were a mix of Arabic and English (this is, I think, due to the details of the m2m model's vocabulary size and tokenizer). About 28% of the training data had question marks written explicitly.
2. Use the model [ARBERT](https://huggingface.co/UBC-NLP/ARBERT) as the base, and finetune on the above data.
3. Distill the above model into a smaller one. I was not very successful in reducing the size significantly, although I did reduce the number of hidden layers from 12 to 4.
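The word-by-word handling of keyword queries in step 1 can be sketched as below. The `translate` argument is a hypothetical stand-in for a call into the "facebook/m2m100_1.2B" pipeline, abstracted out here so the example stays self-contained; with the real model, each word would be tokenized with `src_lang="en"` and generated with the Arabic target language forced.

```python
def translate_query(query: str, translate) -> str:
    """Translate a keyword query word by word and recombine into one string.

    Syntactic correctness of each individual token is the goal here,
    not the semantics of the full recombined string.
    """
    return " ".join(translate(word) for word in query.split())


# Fully-formed questions, by contrast, would be passed to the translation
# model as-is, in a single call.
```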
Results of testing the distilled model:
```
{'accuracy': 0.9812329107631121,
 'precision': 0.9833664349553128,
 'recall': 0.9792336217552534,
 'roc_auc': 0.98124390410432,
 'f1': 0.9812956769478509,
 'matthews': 0.9624741598127332,
 'mse': 0.018767089236887895,
 'brier': 0.018767089236887895}
```
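For reference, the standard knowledge-distillation objective used in step 3, a temperature-scaled soft-target term combined with a hard-label cross-entropy term, can be sketched in plain Python. This is an illustration of the technique (as described in the sources credited below), not the actual training code:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Weighted sum of the soft-target term KL(teacher || student) at
    temperature T and the hard-label cross-entropy of the student."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # The T^2 factor keeps the soft-target gradients on a comparable
    # scale to the hard-label term.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student)) * T * T
    ce = -math.log(softmax(student_logits)[hard_label])
    return alpha * kl + (1 - alpha) * ce
```

When the student matches the teacher exactly, the KL term vanishes and only the weighted hard-label loss remains.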
## Thanks:
This model was inspired by this GitHub [thread](https://github.com/deepset-ai/haystack/issues/611), in which building a query classifier model is discussed, and also by [Shahrukh Khan's](https://github.com/shahrukhx01) resulting English model based on DistilBERT.
Regarding the model distillation, I owe thanks to the following sources:
- [Knowledge Distillation article by Phil Schmid](https://www.philschmid.de/knowledge-distillation-bert-transformers)
- Articles by Remi Reboul: [Distillation of BERT-like models: the theory](https://towardsdatascience.com/distillation-of-bert-like-models-the-theory-32e19a02641f) and [the code](https://towardsdatascience.com/distillation-of-bert-like-models-the-code-73c31e8c2b0a)