Update README.md
# *LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models*

[![ArXiv](https://img.shields.io/badge/ArXiv-2025-fb1b1b.svg)](https://arxiv.org/abs/2501.00874)
[![HF Paper](https://img.shields.io/badge/HF%20Paper-2025-b31b1b.svg)](https://huggingface.co/papers/2501.00874)
[![HF Link](https://img.shields.io/badge/HF%20Model-LUSIFER-FFD21E.svg)](https://huggingface.co/Hieuman/LUSIFER)
[![License](https://img.shields.io/badge/License-MIT-FFD21E.svg)](LICENSE)

LUSIFER is a framework for bridging the gap between multilingual understanding and task-specific text embeddings without relying on explicit multilingual supervision. It combines a multilingual encoder (providing a universal language foundation) with an LLM-based embedding model (optimized for embedding tasks), connected through a minimal set of trainable parameters. LUSIFER also introduces a two-stage training process, (1) Alignment Training and (2) Representation Fine-tuning, to optimize the model for zero-shot multilingual embeddings.

<p align="center">
  <img src="https://github.com/hieum98/lusifer/blob/main/asserts/Model_overview.png" width="85%" alt="LUSIFER_figure1"/>
</p>
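
The sketch below restates this wiring in code. It is a minimal illustration under assumed Hugging-Face-style interfaces (`last_hidden_state`, `inputs_embeds`); the names `Connector` and `LusiferSketch` are hypothetical, not the repository's actual API.

```python
# Minimal PyTorch sketch of the LUSIFER wiring (illustrative, not the repo's code).
import torch
import torch.nn as nn

class Connector(nn.Module):
    """The minimal set of trainable parameters bridging the two models:
    maps multilingual encoder states into the LLM's input embedding space."""
    def __init__(self, enc_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(enc_dim, llm_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

class LusiferSketch(nn.Module):
    def __init__(self, encoder: nn.Module, llm: nn.Module, enc_dim: int, llm_dim: int):
        super().__init__()
        self.encoder = encoder    # multilingual encoder: universal language foundation
        self.llm = llm            # LLM-based model optimized for embedding tasks
        self.connector = Connector(enc_dim, llm_dim)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # 1) Encode text into a language-universal representation.
        enc_states = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
        # 2) Bridge into the LLM's embedding space and run the LLM on the
        #    projected states instead of its own token embeddings.
        llm_out = self.llm(inputs_embeds=self.connector(enc_states),
                           attention_mask=attention_mask)
        # 3) Pool into a single text embedding (masked mean pooling as one simple choice).
        mask = attention_mask.unsqueeze(-1).float()
        return (llm_out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
```

In this picture, Alignment Training would fit the connector so the LLM can read the encoder's representation space, and Representation Fine-tuning would then optimize the combined model for embedding quality.
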
21 |
## Installation
To use LUSIFER, install the environment from `environment.yaml` (optional):
```bash
# Assumed standard conda workflow; the environment name may differ in the repo.
conda env create -f environment.yaml
```

## Evaluation
We propose a new benchmark for evaluating models on multilingual text embedding tasks. The benchmark covers 5 primary embedding tasks: Classification, Clustering, Reranking, Retrieval, and Semantic Textual Similarity (STS), across 123 diverse datasets spanning 14 languages.

<p align="center">
  <img src="https://github.com/hieum98/lusifer/blob/main/asserts/Benchmark.png" width="85%" alt="Benchmark"/>
</p>

We support evaluation on a wide range of datasets by integrating the [`mteb`](https://github.com/embeddings-benchmark/mteb) library. To evaluate the model, run the following command:
```bash
python -m lusifer.eval.eval \
    ...
```
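
Because the evaluation is built on `mteb`, the library can also be driven directly. The sketch below is a generic example, not LUSIFER-specific: the model name and task are placeholders chosen for illustration.

```python
# Illustrative mteb usage with a generic embedding model (placeholders throughout).
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any model exposing encode(list_of_texts) -> embeddings works with mteb.
model = SentenceTransformer("intfloat/multilingual-e5-base")

evaluation = MTEB(tasks=["STS17"])  # one STS task as an example
evaluation.run(model, output_folder="results")
```
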
## Results
We provide the results of LUSIFER on the multilingual text embedding benchmark in the following table. The results are reported in terms of the average main metric across all tasks and datasets; please refer to the paper for the full results.

<p align="center">
  <img src="https://github.com/hieum98/lusifer/blob/main/asserts/Results.png" width="85%" alt="results"/>
</p>

## Citation
If you use LUSIFER in your research, please cite the following paper: