akhooli committed on
Commit 35b8534
1 Parent(s): 2ebd659

Update README.md

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -6,7 +6,7 @@ pipeline_tag: sentence-similarity
  tags:
  - ColBERT
  base_model:
- - aubmindlab/bert-base-arabertv2
+ - aubmindlab/bert-base-arabertv02
  license: mit
  library_name: RAGatouille
  ---
@@ -14,6 +14,8 @@ library_name: RAGatouille
 
  # Arabic-ColBERT-100k
 
- First version of Arabic ColBERT. This version uses the bert-base-arabertv2 which is pre-segmented text using Farasa.
- A new version based on bert-base-arabertv0.2 will be trained and this repo will be updated.
+ First version of Arabic ColBERT.
+ This model was trained on 100K random triplets of the [mMARCO dataset](https://huggingface.co/datasets/unicamp-dl/mmarco) which has around 39M Arabic (translated) triplets.
+ mMARCO is the multilingual version of [Microsoft's MARCO dataset](https://microsoft.github.io/msmarco/).
+
  See https://www.linkedin.com/posts/akhooli_this-is-probably-the-first-arabic-colbert-activity-7217969205197848576-l8Cy