Update README.md
# *LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models*

[![ArXiv](https://img.shields.io/badge/ArXiv-2025-fb1b1b.svg)](https://arxiv.org/abs/2501.00874)
[![HF Paper](https://img.shields.io/badge/HF%20Paper-2025-b31b1b.svg)](https://huggingface.co/papers/2501.00874)
[![HF Link](https://img.shields.io/badge/HF%20Model-LUSIFER-FFD21E.svg)](https://huggingface.co/Hieuman/LUSIFER)
[![License](https://img.shields.io/badge/License-MIT-FFD21E.svg)](LICENSE)

LUSIFER is a framework for bridging the gap between multilingual understanding and task-specific text embeddings without relying on explicit multilingual supervision. It combines a multilingual encoder (providing a universal language foundation) with an LLM-based embedding model (optimized for embedding tasks), connected through a minimal set of trainable parameters. LUSIFER also introduces a two-stage training process, (1) Alignment Training and (2) Representation Fine-tuning, to optimize the model for zero-shot multilingual embeddings.

<p align="center">
  <img src="https://github.com/hieum98/lusifer/blob/main/asserts/Model_overview.png" width="85%" alt="LUSIFER_figure1"/>
</p>
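
The sketch below restates this wiring in code. It is a minimal illustration under assumed Hugging-Face-style interfaces (`last_hidden_state`, `inputs_embeds`); the names `Connector` and `LusiferSketch` are hypothetical, not the repository's actual API.

```python
# Minimal PyTorch sketch of the LUSIFER wiring (illustrative, not the repo's code).
import torch
import torch.nn as nn

class Connector(nn.Module):
    """The minimal set of trainable parameters bridging the two models:
    maps multilingual encoder states into the LLM's input embedding space."""
    def __init__(self, enc_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(enc_dim, llm_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

class LusiferSketch(nn.Module):
    def __init__(self, encoder: nn.Module, llm: nn.Module, enc_dim: int, llm_dim: int):
        super().__init__()
        self.encoder = encoder    # multilingual encoder: universal language foundation
        self.llm = llm            # LLM-based model optimized for embedding tasks
        self.connector = Connector(enc_dim, llm_dim)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # 1) Encode text into a language-universal representation.
        enc_states = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
        # 2) Bridge into the LLM's embedding space and run the LLM on the
        #    projected states instead of its own token embeddings.
        llm_out = self.llm(inputs_embeds=self.connector(enc_states),
                           attention_mask=attention_mask)
        # 3) Pool into a single text embedding (masked mean pooling as one simple choice).
        mask = attention_mask.unsqueeze(-1).float()
        return (llm_out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
```

In this picture, Alignment Training would fit the connector so the LLM can read the encoder's representation space, and Representation Fine-tuning would then optimize the combined model for embedding quality.
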
21 |
## Installation
To use LUSIFER, install the environment from `environment.yaml` (optional):
```bash
# Assumed standard conda workflow; the environment name may differ in the repo.
conda env create -f environment.yaml
```

## Evaluation
We propose a new benchmark for evaluating models on multilingual text embedding tasks. The benchmark covers 5 primary embedding tasks: Classification, Clustering, Reranking, Retrieval, and Semantic Textual Similarity (STS), across 123 diverse datasets spanning 14 languages.

<p align="center">
  <img src="https://github.com/hieum98/lusifer/blob/main/asserts/Benchmark.png" width="85%" alt="Benchmark"/>
</p>

We support evaluation on a wide range of datasets by integrating the [`mteb`](https://github.com/embeddings-benchmark/mteb) library. To evaluate the model, run the following command:
```bash
python -m lusifer.eval.eval \
    ...
```
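
Because the evaluation is built on `mteb`, the library can also be driven directly. The sketch below is a generic example, not LUSIFER-specific: the model name and task are placeholders chosen for illustration.

```python
# Illustrative mteb usage with a generic embedding model (placeholders throughout).
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any model exposing encode(list_of_texts) -> embeddings works with mteb.
model = SentenceTransformer("intfloat/multilingual-e5-base")

evaluation = MTEB(tasks=["STS17"])  # one STS task as an example
evaluation.run(model, output_folder="results")
```
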
## Results
We provide the results of LUSIFER on the multilingual text embedding benchmark in the following table. The results are reported in terms of the average main metric across all tasks and datasets; please refer to the paper for the full results.

<p align="center">
  <img src="https://github.com/hieum98/lusifer/blob/main/asserts/Results.png" width="85%" alt="results"/>
</p>

## Citation
If you use LUSIFER in your research, please cite the following paper: