tattrongvu
committed on
Update README.md
README.md CHANGED
@@ -3,6 +3,7 @@ license: apache-2.0
 datasets:
 - tattrongvu/vqa_de_en_batch1
 - vidore/colpali_train_set
+- tattrongvu/sharegpt4v_vqa_200k_batch1
 language:
 - en
 - de
@@ -45,4 +46,4 @@ The dataset was extended from the original colpali train set with the gemini 1.5
 We train models using low-rank adapters ([LoRA](https://arxiv.org/abs/2106.09685))
 with `alpha=64` and `r=64` on the transformer layers from the language model,
 as well as the final randomly initialized projection layer, and use a `paged_adamw_8bit` optimizer.
-We train on an 8xH100 GPU setup with distributed data parallelism (via accelerate), a learning rate of 2e-4 with linear decay and 1% warmup steps, a per-device batch size of 64, in `bfloat16` format.
+We train on an 8xH100 GPU setup with distributed data parallelism (via accelerate), a learning rate of 2e-4 with linear decay and 1% warmup steps, a per-device batch size of 64, in `bfloat16` format.
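
For readers who want to see how the hyperparameters in the hunk above could be expressed in code, here is a minimal sketch assuming a Hugging Face `peft` + `transformers` setup. Only `r=64`, `alpha=64`, the `paged_adamw_8bit` optimizer, the 2e-4 learning rate with linear decay and 1% warmup, the per-device batch size of 64, and `bfloat16` come from the README; the target modules, dropout, output path, and epoch count are illustrative assumptions, not the authors' released training script.

```python
# Minimal sketch of the training configuration described in the diff above.
# Assumes a peft + transformers fine-tuning setup; values marked "assumption"
# are not specified in the README.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,                                  # rank, as stated in the README
    lora_alpha=64,                         # alpha=64, as stated in the README
    lora_dropout=0.05,                     # assumption: dropout is not specified
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)

training_args = TrainingArguments(
    output_dir="./checkpoints",            # hypothetical output path
    per_device_train_batch_size=64,        # batch size per device, as stated
    learning_rate=2e-4,                    # as stated
    lr_scheduler_type="linear",            # linear decay
    warmup_ratio=0.01,                     # 1% warmup steps
    bf16=True,                             # bfloat16 training
    optim="paged_adamw_8bit",              # optimizer named in the README
    num_train_epochs=1,                    # assumption: epochs are not specified
)
```

On an 8xH100 node, a script built around this configuration would typically be launched with `accelerate launch --num_processes 8 train.py` (where `train.py` is a hypothetical entry point), letting accelerate handle the distributed data parallelism mentioned in the README.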