antoinelouis
committed on
Update README.md
README.md
CHANGED
@@ -99,8 +99,8 @@ with BM25 negatives.
 #### Implementation
 
 The model is initialized from the [almanach/camembert-base](https://huggingface.co/almanach/camembert-base) checkpoint and optimized via a combination of the InfoNCE
-ranking loss with a temperature of 0.05 and the FLOPS regularization loss with quadratic increase of lambda until step 33k after which it remains constant with lambda_q
-
+ranking loss with a temperature of 0.05 and the FLOPS regularization loss with quadratic increase of lambda until step 33k after which it remains constant with lambda_q=3e-4
+and lambda_d=1e-4. The model is fine-tuned on one 80GB NVIDIA H100 GPU for 100k steps using the AdamW optimizer with a batch size of 128, a peak learning rate
 of 2e-5 with warm up along the first 4000 steps and linear scheduling. The maximum sequence lengths for questions and passages length were fixed to 32 and 128 tokens.
 Relevance scores are computed with the cosine similarity.
 
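
The schedules described in the changed paragraph can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the author's training code: the function names are hypothetical, the quadratic ramp is assumed to take the form `target * (step / warmup_steps) ** 2`, and the linear schedule is assumed to decay to zero at step 100k. Only the numeric values (3e-4, 1e-4, 33k, 2e-5, 4000, 100k) come from the README text.

```python
# Values taken from the README paragraph.
LAMBDA_Q = 3e-4               # FLOPS weight for question representations
LAMBDA_D = 1e-4               # FLOPS weight for passage representations
LAMBDA_WARMUP_STEPS = 33_000  # lambda grows quadratically until this step
PEAK_LR = 2e-5
LR_WARMUP_STEPS = 4_000
TOTAL_STEPS = 100_000


def quadratic_lambda(step, target, warmup_steps=LAMBDA_WARMUP_STEPS):
    """Grow the regularization weight quadratically, then hold it at `target`.

    Assumed form of the ramp: target * (step / warmup_steps) ** 2.
    """
    if step >= warmup_steps:
        return target
    return target * (step / warmup_steps) ** 2


def learning_rate(step, peak=PEAK_LR, warmup_steps=LR_WARMUP_STEPS,
                  total_steps=TOTAL_STEPS):
    """Linear warm-up to `peak`, then linear decay (assumed to reach zero)."""
    if step < warmup_steps:
        return peak * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak * remaining / (total_steps - warmup_steps)


def flops_penalty(batch):
    """FLOPS regularizer: sum over dimensions of the squared mean activation.

    `batch` is a list of equal-length lists of floats (one row per example).
    """
    n = len(batch)
    dims = len(batch[0])
    return sum((sum(abs(row[j]) for row in batch) / n) ** 2 for j in range(dims))


def total_loss(ranking_loss, q_reps, d_reps, step):
    """Combine a precomputed ranking loss with the two weighted FLOPS terms."""
    return (ranking_loss
            + quadratic_lambda(step, LAMBDA_Q) * flops_penalty(q_reps)
            + quadratic_lambda(step, LAMBDA_D) * flops_penalty(d_reps))
```

Under these assumptions, both regularization weights are zero at step 0, reach a quarter of their final value halfway through the ramp (step 16.5k), and stay fixed from step 33k onward, while the learning rate peaks at step 4000.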