---
datasets:
- unicamp-dl/mmarco
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
license: mit
widget: []
base_model:
- BAAI/bge-m3
---

# BGE-M3 Linguistic Transfer (Catalan-French)

This is a [BGE-M3](https://huggingface.co/BAAI/bge-m3) model post-trained on MMARCO/v2 queries translated from French into Catalan, paired with the original French documents.

This model was fine-tuned for the ECIR 2025 paper "Improving Low-Resource Retrieval Effectiveness using Zero-Shot Linguistic Similarity Transfer". The source code for the paper can be found [here](https://github.com/andreaschari/linguistic-transfer).

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:** [unicamp-dl/mmarco](https://huggingface.co/datasets/unicamp-dl/mmarco) (v2)
- **Language:** Catalan (queries), French (documents)
- **License:** MIT

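A minimal sketch of how the dense embeddings could be used for retrieval, assuming the model loads through Sentence Transformers like other BGE-M3 checkpoints: documents are ranked by the cosine similarity between the query vector and each document vector. `MODEL_ID` is a hypothetical placeholder for this model's Hub id or a local path, and the helper names are illustrative, not part of any library. The toy 3-dimensional vectors stand in for the model's 1024-dimensional embeddings so the ranking logic runs without downloading weights.

```python
import math
from typing import List, Sequence

# Hypothetical placeholder: replace with this model's Hugging Face Hub id or a local path.
MODEL_ID = "path/to/this-model"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_emb: Sequence[float], doc_embs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)


def encode(queries: List[str], documents: List[str], model_id: str = MODEL_ID):
    """Embed Catalan queries and French documents with Sentence Transformers.

    The import is done lazily so the ranking helpers above work without the library installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(queries), model.encode(documents)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real 1024-d embeddings.
    query = [1.0, 0.0, 0.0]
    docs = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.1, 0.0]]
    print(rank_documents(query, docs))  # → [2, 1, 0]

    # With the real model (downloads weights; MODEL_ID must be set first):
    # q_embs, d_embs = encode(["Quina és la capital de França?"],
    #                         ["Paris est la capitale de la France.", "Le Rhône traverse Lyon."])
    # print(rank_documents(q_embs[0], d_embs))
```

Sentence Transformers 3.x also exposes `model.similarity(q_embs, d_embs)`, which applies the similarity function stored in the model's configuration; the explicit helper here only makes the cosine computation visible.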

## Training Details

### Framework Versions

- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.4.0.post301
- Accelerate: 0.32.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1
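
When reproducing the training setup, it may help to compare a local environment against the versions listed above. A minimal sketch using the standard library's `importlib.metadata`; the function name `version_mismatches` is illustrative, and the expected values are copied from the list. The `.post301` suffix on the PyTorch version appears to be a distribution-specific build tag (an assumption), so an otherwise equivalent install from another channel may report a different string.

```python
from importlib import metadata

# Versions listed under "Framework Versions" in this model card.
EXPECTED_VERSIONS = {
    "sentence-transformers": "3.0.1",
    "transformers": "4.41.2",
    "torch": "2.4.0.post301",
    "accelerate": "0.32.1",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every package that differs."""
    mismatches = {}
    for package, want in expected.items():
        try:
            have = lookup(package)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[package] = (want, have)
    return mismatches


if __name__ == "__main__":
    for pkg, (want, have) in version_mismatches(EXPECTED_VERSIONS).items():
        print(f"{pkg}: card lists {want}, installed {have or 'not installed'}")
```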