Hieuman committed
Commit 831cafb · verified · 1 Parent(s): b7071bf

Update README.md

Files changed (1)
  1. README.md +1 -18
README.md CHANGED
@@ -7,17 +7,8 @@ tags:
 
 # *LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models*
 
- [![ArXiv](https://img.shields.io/badge/ArXiv-2025-fb1b1b.svg)](https://arxiv.org/abs/2501.00874)
- [![HF Paper](https://img.shields.io/badge/HF%20Paper-2025-b31b1b.svg)](https://huggingface.co/papers/2501.00874)
- [![HF Link](https://img.shields.io/badge/HF%20Model-LUSIFER-FFD21E.svg)](https://huggingface.co/Hieuman/LUSIFER)
- [![License](https://img.shields.io/badge/License-MIT-FD21E.svg)](LICENSE)
-
 LUSIFER is a framework for bridging the gap between multilingual understanding and task-specific text embeddings without relying on explicit multilingual supervision. It does this by combining a multilingual encoder (providing a universal language foundation) with an LLM-based embedding model (optimized for embedding tasks), connected through a minimal set of trainable parameters. LUSIFER also introduces a two-stage training process: 1) Alignment Training and 2) Representation Fine-tuning to optimize the model for zero-shot multilingual embeddings.
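The bridging idea can be pictured with a minimal PyTorch sketch: the multilingual encoder produces hidden states, and a small trainable projection maps them into the input space of the LLM-based embedding model. The class name, dimensions, and single-linear-layer choice below are illustrative assumptions for exposition, not the repository's actual implementation.

```python
# Minimal sketch of the encoder-to-LLM connector idea (assumed names/dims, not LUSIFER's code).
import torch
import torch.nn as nn

class Connector(nn.Module):
    """Maps multilingual-encoder hidden states into the LLM embedder's input space."""

    def __init__(self, encoder_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # The small set of trainable parameters bridging the two (otherwise pre-trained) models.
        self.proj = nn.Linear(encoder_dim, llm_dim)

    def forward(self, encoder_hidden: torch.Tensor) -> torch.Tensor:
        # encoder_hidden: (batch, seq_len, encoder_dim) -> (batch, seq_len, llm_dim)
        return self.proj(encoder_hidden)

connector = Connector()
hidden = torch.randn(2, 16, 1024)      # stand-in for multilingual encoder outputs
soft_inputs = connector(hidden)        # would be passed to the LLM as inputs_embeds
print(soft_inputs.shape)               # torch.Size([2, 16, 4096])
```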
 
- <p align="center">
- <img src="https://github.com/hieum98/lusifer/blob/main/asserts/Model_overview.png" width="85%" alt="LUSIFER_figure1"/>
- </p>
-
  ## Installation
 To use LUSIFER, install the environment from ```environment.yaml``` (optional)
 ```bash
@@ -112,10 +103,6 @@ To be concise, we suggest the following training process: reconstruction task on
 ## Evaluation
 We propose a new benchmark for evaluating the model on the multilingual text embedding task. The benchmark includes 5 primary embedding tasks: Classification, Clustering, Reranking, Retrieval, and Semantic Textual Similarity (STS) across 123 diverse datasets spanning 14 languages.
 
- <p align="center">
- <img src="https://github.com/hieum98/lusifer/blob/main/asserts/Benchmark.png" width="85%" alt="Benchmark"/>
- </p>
-
 We support evaluating the model on various datasets by integrating the [`mteb`](https://github.com/embeddings-benchmark/mteb) library. To evaluate the model, run the following command:
 ```bash
 python -m lusifer.eval.eval \
@@ -124,11 +111,7 @@ python -m lusifer.eval.eval \
  ```
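For context, `mteb` evaluates any object that exposes an `encode(sentences, **kwargs)` method returning one vector per sentence; the snippet below is a generic sketch of that interface with a placeholder model and task name, and does not reflect the actual arguments or internals of `lusifer.eval.eval`.

```python
# Generic mteb usage sketch (placeholder model and task; not lusifer.eval.eval internals).
import numpy as np
from mteb import MTEB

class PlaceholderEmbedder:
    """mteb only needs an encode() method that returns one vector per input sentence."""

    def __init__(self, dim: int = 768):
        self.dim = dim

    def encode(self, sentences, **kwargs) -> np.ndarray:
        # A real model would tokenize and embed here; random vectors keep the sketch runnable.
        return np.random.rand(len(sentences), self.dim)

evaluation = MTEB(tasks=["STS22"])                      # any MTEB task names can be listed here
evaluation.run(PlaceholderEmbedder(), output_folder="results/placeholder")
```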
 
 ## Results
- We provide the results of LUSIFER on the multilingual text embedding benchmark in the following table. The results are reported in terms of the average main metric across all tasks and datasets.
-
- <p align="center">
- <img src="https://github.com/hieum98/lusifer/blob/main/asserts/Results.png" width="85%" alt="results"/>
- </p>
+ We provide the results of LUSIFER on the multilingual text embedding benchmark in the following table. The results are reported in terms of the average main metric across all tasks and datasets. Please refer to the paper for the full results.
 
 ## Citation
 If you use LUSIFER in your research, please cite the following paper:
 