Verified Claim Retrieval
This repo demonstrates how to retrieve, for a given query, the relevant entries from a collection of verified claims. The provided model is AraBERT fine-tuned on the AraFACTs dataset. Tuning details can be found in this repo.
First, fill in the required paths for the BERT model and the index. After that, only two steps are needed:
- Create an object of ClaimRetrieval class and pass the suitable parameters.
- Invoke retrieve_relevant_vclaims and pass the tweets as parameters.
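Because both steps depend on the index and model paths being filled in correctly, it can help to check them before constructing the object. The following is a minimal sketch, not part of this repo: `missing_paths` is a hypothetical helper, and the two path values are the placeholders used in the example below.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


# Placeholder values taken from this README; replace them with your own paths.
index_path = "path/to/pyterrier_index"
trained_model_weights = "tuned_model_weights.bin"

missing = missing_paths([index_path, trained_model_weights])
if missing:
    print("Missing paths:", ", ".join(missing))
```

An empty result means both paths exist; it does not verify that the index or the weights file is valid.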
Example
Here is a full example. First, prepare the input:
tweets = [{'id_str': '1433976054562045952',
'full_text': 'مرتضى منصور : قررت ايقاف الولد امام عاشور وبيعه ولو هيجيبلي كأس العالم.. "مينفعش لاعيية تبقى بتصلي ولاعب صابغ شعره زي البنات.. ايه القرف ده .. وشاطر بس يطلب زيادة عقده و مش قادر يجري و يروح نايم لي على بطنه ويتسبب ان يخش فينا اجوان ، ما تسترجل يا ولد انت موقوف ومتحول للتحقيق " https://t.co/df2QvC0Zu9'},] # input tweet
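As the input above shows, each tweet is a dictionary with an `id_str` key and a `full_text` key, and the tweets are passed as a list. When the tweets come from another source, a small conversion step may be needed. The sketch below assumes the source yields `(tweet_id, text)` pairs; `build_tweets` is a hypothetical helper, not part of this repo, and skipping empty texts is a design choice made here rather than a requirement of the model.

```python
def build_tweets(pairs):
    """Convert (tweet_id, text) pairs into the list-of-dicts input format."""
    tweets = []
    for tweet_id, text in pairs:
        text = text.strip()
        if not text:
            continue  # skip tweets with no text (assumption: they cannot be matched)
        # IDs are stored as strings, matching the 'id_str' key in the example input.
        tweets.append({"id_str": str(tweet_id), "full_text": text})
    return tweets


example_tweets = build_tweets([
    (1433976054562045952, "example tweet text"),
    ("2", "   "),  # whitespace only, so it is dropped
])
```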
Then initialize the ClaimRetrieval object. This only needs to be done once.
lang = "ar"
index_path = "path/to/pyterrier_index"
bert_name = "aubmindlab/bert-base-arabertv02"
trained_model_weights = "tuned_model_weights.bin" # AraBERT weights
# The first four arguments are passed positionally, assuming the constructor accepts them in this order.
claim_retrieval = ClaimRetrieval(
    index_path, lang, bert_name, trained_model_weights,
    random_seed=42, depth=20, batch_size=8, num_classes=2, dropout=0.3,
    is_output_probability=True, num_layers=2, max_len=256,
)
Finally, pass the input tweets to retrieve the relevant verified claims (vclaims):
queries_and_relevant_vclaims = claim_retrieval.retrieve_relevant_vclaims(tweets)
Citation
If you use any part of this repository, please consider citing our work:
@inproceedings{mansour2022did,
title={Did I See It Before? Detecting Previously-Checked Claims over Twitter},
author={Mansour, Watheq and Elsayed, Tamer and Al-Ali, Abdulaziz},
booktitle={European Conference on Information Retrieval},
pages={367--381},
year={2022},
organization={Springer}
}
license: cc-by-4.0