antoinelouis committed on
Commit 77c4a1f · verified · 1 Parent(s): 37ae2c0

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -99,8 +99,8 @@ with BM25 negatives.
  #### Implementation
 
  The model is initialized from the [almanach/camembert-base](https://huggingface.co/almanach/camembert-base) checkpoint and optimized via a combination of the InfoNCE
- ranking loss with a temperature of 0.05 and the FLOPS regularization loss with quadratic increase of lambda until step 33k after which it remains constant with lambda_q
- = 3e-4 and lambda_d = 1e-4. The model is fine-tuned on one 80GB NVIDIA H100 GPU for 100k steps using the AdamW optimizer with a batch size of 128, a peak learning rate
+ ranking loss with a temperature of 0.05 and the FLOPS regularization loss with quadratic increase of lambda until step 33k after which it remains constant with lambda_q=3e-4
+ and lambda_d=1e-4. The model is fine-tuned on one 80GB NVIDIA H100 GPU for 100k steps using the AdamW optimizer with a batch size of 128, a peak learning rate
  of 2e-5 with warm up along the first 4000 steps and linear scheduling. The maximum sequence lengths for questions and passages length were fixed to 32 and 128 tokens.
  Relevance scores are computed with the cosine similarity.
 
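
For readers of this change, the training recipe described in the edited paragraph can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the author's training code: all function names are hypothetical, the FLOPS term uses the commonly cited definition (sum over dimensions of the squared mean absolute activation) since the README does not spell out the formula, the ranking loss uses in-batch negatives only (the BM25 negatives mentioned in the README are omitted), cosine similarity is assumed inside the loss, and "linear scheduling" is assumed to mean linear decay to zero at step 100k. Plain lists stand in for tensors so the example has no dependencies.

```python
import math


def quadratic_lambda(step, lambda_max, ramp_steps=33_000):
    """Regularization weight: quadratic ramp until ramp_steps, then constant."""
    ratio = min(step / ramp_steps, 1.0)
    return lambda_max * ratio ** 2


def learning_rate(step, peak_lr=2e-5, warmup_steps=4_000, total_steps=100_000):
    """Linear warm-up to peak_lr, then linear decay (assumed to reach zero)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


def flops_loss(batch):
    """FLOPS regularizer (assumed form): sum_j (mean_i |w_ij|)^2."""
    n, dims = len(batch), len(batch[0])
    return sum((sum(abs(vec[j]) for vec in batch) / n) ** 2 for j in range(dims))


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(queries, passages, temperature=0.05):
    """In-batch InfoNCE: passage i is the positive for query i."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in passages]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)


def total_loss(queries, passages, step):
    """Ranking loss plus FLOPS penalties weighted by the scheduled lambdas."""
    lam_q = quadratic_lambda(step, 3e-4)  # lambda_q from the README
    lam_d = quadratic_lambda(step, 1e-4)  # lambda_d from the README
    return (info_nce(queries, passages)
            + lam_q * flops_loss(queries)
            + lam_d * flops_loss(passages))
```

Calling `total_loss(queries, passages, step)` with two equally sized lists of vectors returns the combined objective at a given step; at step 0 both regularization weights are zero, so only the ranking term contributes.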