---
datasets:
- unicamp-dl/mmarco
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
license: mit
widget: []
base_model:
- BAAI/bge-m3
---

# BGE-M3 Linguistic Transfer (Catalan-French)

This is a [BGE-M3](https://huggingface.co/BAAI/bge-m3) model post-trained on MMARCO/v2 using French queries translated into Catalan, paired with French documents.

This model was fine-tuned for the ECIR 2025 paper "Improving Low-Resource Retrieval Effectiveness using Zero-Shot Linguistic Similarity Transfer".

The source code for the paper can be found [here](https://github.com/andreaschari/linguistic-transfer).

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity

## Training Details

### Framework Versions

- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.4.0.post301
- Accelerate: 0.32.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1
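
## Usage Sketch

The model description above lists cosine similarity as the similarity function. As a minimal sketch of how query and document embeddings might be compared and ranked, the snippet below implements cosine scoring in plain Python. The helper names (`cosine_similarity`, `rank_documents`, `embed`) are illustrative and do not come from the paper's code. The `model_id` argument is a placeholder because this card does not state the model's Hub repository ID. The `embed` helper assumes the `sentence-transformers` package is installed and is not needed for the scoring functions themselves.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_emb: Sequence[float], doc_embs: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)


def embed(model_id: str, texts: List[str]) -> List[List[float]]:
    """Encode texts with a Sentence Transformers model.

    `model_id` is a placeholder: pass the Hub ID or local path of this model.
    The import is local so the scoring helpers work without the library.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(texts).tolist()
```

In a retrieval setting, one would presumably call `embed` once on a Catalan query and once on a list of French documents, then pass the resulting vectors to `rank_documents` to get the documents in descending order of similarity.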