---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
datasets:
- Salesforce/wikitext
language:
- en
- ja
---

# Model Info

This model applies LLM2Vec to Llama-2; only the PEFT adapter is distributed. LLM2Vec fine-tunes on two tasks, MNTP and SimCSE, and this repository contains the result of the MNTP step only.

## Model Details

### Model Description


- **Model type:** PEFT
- **Language(s) (NLP):** English, Japanese
- **License:** Apache 2.0
- **Finetuned from model:** [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)

## Sources
- **Repository:**  https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

# Usage

- Please see the [usage section of the original LLM2Vec model card](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage); an illustrative loading sketch follows below.
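
The following is a minimal sketch only, not verified against this adapter: it assumes the `llm2vec` package's `LLM2Vec.from_pretrained` helper, uses a placeholder repository ID for this adapter, and assumes `pooling_mode="mean"` and `max_length=512`; defer to the linked instructions for the exact steps.

```python
# Minimal sketch, assuming the llm2vec package API; not verified against this adapter.
# "your-namespace/llama2-7b-mntp-peft" is a placeholder for this repository's ID.
import torch
from llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # base model this adapter was trained from
    peft_model_name_or_path="your-namespace/llama2-7b-mntp-peft",  # placeholder adapter ID
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
    pooling_mode="mean",  # assumption; see the LLM2Vec usage docs
    max_length=512,       # matches the max_seq_length used for MNTP
)

# Encode a couple of sentences into fixed-size embeddings.
embeddings = l2v.encode(["This is an English sentence.", "これは日本語の文です。"])
print(embeddings.shape)
```

Because only the MNTP adapter is provided here, these are representations without the SimCSE stage; see the LLM2Vec repository for the unsupervised and supervised variants.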

# Training Details

## Training Data

- [wikitext](https://huggingface.co/datasets/Salesforce/wikitext)


## Training Hyperparameters
- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- mask_token_type: "blank"
- mlm_probability: 0.2
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- bf16: true
- gradient_checkpointing: true
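
As a reproduction aid only: in the LLM2Vec repository, MNTP training is driven by a JSON configuration passed to `experiments/run_mntp.py`. The sketch below merely collects the values listed above into that shape; the exact key names are assumptions and should be checked against the example configs shipped with the repository.

```python
import json

# Hypothetical MNTP config collecting the hyperparameters listed above.
# Key names are assumed to match the LLM2Vec example configs; verify before use.
mntp_config = {
    "model_name_or_path": "meta-llama/Llama-2-7b-hf",
    "dataset_name": "Salesforce/wikitext",
    "per_device_train_batch_size": 64,  # listed above as batch_size
    "gradient_accumulation_steps": 1,
    "max_seq_length": 512,
    "mask_token_type": "blank",
    "mlm_probability": 0.2,
    "lora_r": 16,
    "torch_dtype": "bfloat16",
    "attn_implementation": "flash_attention_2",
    "bf16": True,
    "gradient_checkpointing": True,
}

with open("mntp_llama2_wikitext.json", "w") as f:
    json.dump(mntp_config, f, indent=2)
```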

## Accelerator Settings
- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2 
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
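
The settings above correspond to a Hugging Face Accelerate config file (DeepSpeed ZeRO stage 2, bf16 mixed precision, two processes). As a rough, unverified equivalent, the same plugin can be constructed from Python; argument names are assumptions to check against the Accelerate version listed below, and the multi-process launch, NVMe offload path, and dynamo options are left to the config file.

```python
# Rough equivalent of the Accelerate/DeepSpeed settings above via the Python API.
# Argument names are assumptions; the multi-process launch, NVMe offload path, and
# dynamo/torch.compile options still come from the accelerate config file.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,
    gradient_accumulation_steps=1,
    gradient_clipping=1.0,
    offload_optimizer_device="nvme",
    zero3_save_16bit_model=True,
)

accelerator = Accelerator(
    mixed_precision="bf16",
    deepspeed_plugin=deepspeed_plugin,
)
```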


## Framework versions

- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0