Model Info

This model applies LLM2Vec to Llama-2. Only the PEFT adapter is distributed. LLM2Vec fine-tunes a model in two stages, MNTP followed by SimCSE, and this repository contains the adapter obtained by applying SimCSE after MNTP. For the MNTP adapter, please refer to this link.

Model Details

Model Description

Model type: PEFT
Language(s) (NLP): English
License: Apache 2.0
Finetuned from model: Llama-2-7b-hf

Model Sources

Repository: https://github.com/McGill-NLP/llm2vec
Paper: https://arxiv.org/abs/2404.05961

Usage

Please see the original LLM2Vec repository.
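As a rough sketch of what loading might look like with the `llm2vec` package: this assumes `LLM2Vec.from_pretrained` takes the MNTP adapter repository as the base and this repository as `peft_model_name_or_path`, following the pattern in the upstream README. `load_encoder` is a hypothetical helper, and the MNTP adapter ID is left as an argument because it is not named in this card.

```python
def load_encoder(mntp_adapter: str, simcse_adapter: str):
    """Build an LLM2Vec encoder from an MNTP adapter plus a SimCSE adapter.

    Heavy imports are deferred so that defining this helper does not
    require torch or llm2vec to be installed.
    """
    import torch
    from llm2vec import LLM2Vec

    # Assumption: the MNTP repository resolves to the Llama-2 base model,
    # and the SimCSE adapter is applied on top of it.
    return LLM2Vec.from_pretrained(
        mntp_adapter,
        peft_model_name_or_path=simcse_adapter,
        device_map="cuda" if torch.cuda.is_available() else "cpu",
        torch_dtype=torch.bfloat16,
    )
```

With access to both adapters, something like `load_encoder(mntp_id, "uzabase/LLM2Vec-Llama-2-7b-hf-mntp-unsup-simcse").encode(["some sentence"])` should return one embedding per input sentence. The upstream repository remains the authoritative reference.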

Benchmark

The following are summaries. Details are here.

MTEB (Japanese)

| | Classification | Clustering | PairClassification | Reranking | BitextMining | Retrieval | STS | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama2-Llm2vec-eng (This repo) | 0.527 | 0.258 | 0.501 | 0.217 | 0.275 | 0.296 | 0.765 | 0.408 |
| Llama2-Llm2vec-jpn | 0.570 | 0.365 | 0.510 | 0.349 | 0.470 | 0.417 | 0.795 | 0.498 |
| Swallow-Llm2vec-jpn | 0.621 | 0.391 | 0.510 | 0.475 | 0.475 | 0.491 | 0.832 | 0.523 |

MTEB (English)

| | Classification | Clustering | Pair_Classification | Reranking | Retrieval | STS | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Llama2-Llm2vec-eng (this repo) | 0.709 | 0.386 | 0.780 | 0.588 | 0.329 | 0.723 | 0.586 |
| Llama2-Llm2vec-jpn | 0.722 | 0.428 | 0.785 | 0.594 | 0.371 | 0.717 | 0.603 |
| Swallow-Llm2vec-jpn | 0.695 | 0.385 | 0.751 | 0.576 | 0.318 | 0.710 | 0.572 |
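The card does not say how the last column of the English table is computed. The check below is a small sketch suggesting that it is consistent with an unweighted mean over the six category scores, up to rounding. The helper names are hypothetical and the numbers are copied from the English table only.

```python
# Category scores and reported averages, copied from the MTEB (English) table.
ENGLISH_ROWS = {
    "Llama2-Llm2vec-eng": ([0.709, 0.386, 0.780, 0.588, 0.329, 0.723], 0.586),
    "Llama2-Llm2vec-jpn": ([0.722, 0.428, 0.785, 0.594, 0.371, 0.717], 0.603),
    "Swallow-Llm2vec-jpn": ([0.695, 0.385, 0.751, 0.576, 0.318, 0.710], 0.572),
}


def unweighted_mean(values):
    """Plain arithmetic mean of the category scores."""
    return sum(values) / len(values)


def max_gap(rows):
    """Largest absolute difference between the recomputed and reported average."""
    return max(abs(unweighted_mean(scores) - reported)
               for scores, reported in rows.values())
```

A gap below 0.001 for every row is what one would expect if the reported figures were rounded to three decimals.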

Training Details

Training Data

Corpus for SimCSE from Wikipedia

Training Hyperparameters

simcse_dropout: 0.3
bidirectional: true
pooling_mode: "mean"
remove_unused_columns: false
learning_rate: 3e-5
loss_scale: 20
batch_size: 256
gradient_accumulation_steps: 1
max_seq_length: 128
lora_r: 16
torch_dtype: "bfloat16"
attn_implementation: "flash_attention_2"
seed: 42
bf16: true
gradient_checkpointing: true
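To make `pooling_mode: "mean"` and `loss_scale: 20` more concrete, here is a minimal pure-Python sketch of a SimCSE-style objective. It assumes the loss is an in-batch cross-entropy over cosine similarities multiplied by the scale, and that the two views of each sentence come from two forward passes with different dropout masks. The function names are hypothetical and this is not the LLM2Vec training code.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average only the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def simcse_loss(view_a, view_b, scale=20.0):
    """In-batch contrastive loss: row i of view_a should match row i of view_b."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [scale * cosine(anchor, other) for other in view_b]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)
```

A larger scale sharpens the softmax over in-batch candidates, so mismatched pairs are penalized more strongly for the same difference in cosine similarity.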

Accelerator Settings

deepspeed_config:
- gradient_accumulation_steps: 1
- gradient_clipping: 1.0
- offload_optimizer_device: nvme
- offload_optimizer_nvme_path: /nvme
- zero3_save_16bit_model: true
- zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_config:
- dynamo_backend: INDUCTOR
- dynamo_mode: default
- dynamo_use_dynamic: true
- dynamo_use_fullgraph: true
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
use_cpu: false

Framework versions

Python: 3.12.3
PEFT: 0.11.1
Sentence Transformers: 3.0.1
Transformers: 4.41.0
PyTorch: 2.3.0
Accelerate: 0.30.1
Datasets: 2.20.0
Tokenizers: 0.19.1
MTEB: 1.13.0
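When reproducing the setup, it can help to compare the installed packages against the versions listed above. The sketch below uses only the standard library; `PINNED` and `find_mismatches` are hypothetical names, and the keys are assumed to be the usual PyPI distribution names for these libraries.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "peft": "0.11.1",
    "sentence-transformers": "3.0.1",
    "transformers": "4.41.0",
    "torch": "2.3.0",
    "accelerate": "0.30.1",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
    "mteb": "1.13.0",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {name: (wanted, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "+cu121" on PyTorch wheels.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` in the training environment returns an empty dict when every listed package matches.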