---
license: mit
language:
- ar
---

# akhooli/arabic-colbertv2-711k-norm

This is a ColBERT V2 model trained on a [sample of the Arabic mMARCO dataset](https://huggingface.co/datasets/akhooli/ar-mmarco-sample), from which queries containing Latin words were removed, leaving 711K queries.

It is not fully trained (only 22,000 steps), but it is already useful for many tasks, especially ranking and information retrieval (semantic search).
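
For context on how ColBERT-style ranking works: queries and documents are encoded into one embedding per token, and a document's relevance score is the sum, over query tokens, of each query token's maximum similarity to any document token (the "MaxSim" late-interaction rule). The sketch below illustrates only that scoring rule in plain Python on hand-made toy vectors. The function names and vectors are hypothetical. It does not load or call this model, whose real embeddings are high-dimensional and would normally be indexed and searched by a ColBERT library.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def maxsim_score(query_embs: list[list[float]], doc_embs: list[list[float]]) -> float:
    """Late-interaction score: for each query token, take its best similarity
    over all document tokens, then sum those maxima over the query tokens."""
    q = [l2_normalize(v) for v in query_embs]
    d = [l2_normalize(v) for v in doc_embs]
    return sum(max(sum(a * b for a, b in zip(qv, dv)) for dv in d) for qv in q)


def rank(query_embs: list[list[float]], docs: dict[str, list[list[float]]]) -> list[str]:
    """Return document ids sorted by descending MaxSim score."""
    scores = {doc_id: maxsim_score(query_embs, embs) for doc_id, embs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-D "token embeddings" (hypothetical values, not produced by the model)
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "doc_b": [[1.0, 0.0], [1.0, 0.1]],  # matches only the first query token well
}
print(rank(query, docs))  # → ['doc_a', 'doc_b']
```

Because every query token contributes its own best match, a document that covers all parts of the query outranks one that matches a single term strongly.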

The training data was normalized with Unicode NFKC before training, so please normalize your queries and documents the same way before passing them to the model:
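
```python
from unicodedata import normalize

# Example inputs (replace with your own query and documents)
query = "ما هي عاصمة فرنسا؟"
docs = ["باريس هي عاصمة فرنسا."]

# Apply the same NFKC normalization that was used on the training data
query_n = normalize('NFKC', query)
docs_n = [normalize('NFKC', d) for d in docs]
```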
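
Since queries and documents must both be normalized, it may help to route all text through a single helper so the two sides are always treated identically. The sketch below wraps the standard-library `unicodedata.normalize` in a hypothetical `normalize_texts` function and shows one concrete effect of NFKC on Arabic text: compatibility presentation forms, such as the lam-alef ligature U+FEFB, are mapped back to their base letters, while ordinary Arabic letters are left unchanged.

```python
from unicodedata import normalize


def normalize_texts(texts: list[str]) -> list[str]:
    """Hypothetical helper: apply NFKC normalization to every string."""
    return [normalize("NFKC", t) for t in texts]


# U+FEFB (ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM) is a presentation form;
# NFKC replaces it with lam (U+0644) followed by alef (U+0627).
raw_query = "\uFEFB"
query_n = normalize_texts([raw_query])[0]
print([hex(ord(c)) for c in query_n])  # → ['0x644', '0x627']

# Text that already uses standard Arabic letters passes through unchanged.
docs_n = normalize_texts(["مرحبا"])
print(docs_n == ["مرحبا"])  # → True
```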