---
license: mit
language:
- ar
---

# akhooli/arabic-colbertv2-711k-norm
This is a ColBERTv2 model trained on a [sample of the Arabic mMARCO dataset](https://huggingface.co/datasets/akhooli/ar-mmarco-sample) after removing queries that contain Latin words (711K queries).
It is not fully trained (22,000 steps only), but it works well for many tasks, especially ranking and information retrieval (semantic search).
The dataset was normalized before training, so please normalize your queries and documents the same way before using the model.
```python
from unicodedata import normalize

# Apply the same NFKC normalization to both queries and documents.
query_n = normalize('NFKC', query)
```
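
As a rough illustration of that normalization step, the sketch below applies NFKC to a query and to a small list of documents. The helper name `normalize_text` and the sample strings are assumptions made for this example; they are not part of the model or of any retrieval library.

```python
from unicodedata import normalize


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization (hypothetical helper for illustration)."""
    return normalize("NFKC", text)


# Sample inputs (illustrative only). "\uFEFB" is the Arabic lam-alef ligature
# presentation form, which NFKC expands into the two letters lam + alef.
query = "\uFEFB \u0634\u064A\u0621"
docs = ["\uFEFB", "\u0644\u0627"]

query_n = normalize_text(query)
docs_n = [normalize_text(d) for d in docs]

# After normalization the two documents are the same string.
print(docs_n[0] == docs_n[1])
```

Running the same function over queries and documents keeps visually identical text from ending up as different code point sequences on the two sides.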