---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
datasets:
- Salesforce/wikitext
language:
- en
- ja
---
# Model Info
This model applies LLM2Vec to Llama-2. Only the PEFT adapter is distributed. LLM2Vec fine-tunes in two stages, MNTP and SimCSE; this repository contains only the result of the MNTP stage.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Model type:** PEFT
- **Language(s) (NLP):** English, Japanese
- **License:** Apache 2.0
- **Finetuned from model:** [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
## Sources
- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
# Usage
- Please see the [original LLM2Vec model card](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage) for usage instructions; a rough loading sketch follows below.
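
As a minimal sketch (not verified against this exact adapter), loading typically follows the LLM2Vec pattern: load the base model, apply this PEFT adapter, and wrap with `LLM2Vec`. The repo id `<this-adapter-repo>` below is a placeholder for this repository's Hugging Face id; see the linked card for how bidirectional attention is enabled in the official checkpoints.

```python
# Minimal sketch, assuming the llm2vec and peft packages are installed.
# "<this-adapter-repo>" is a placeholder for this repository's id.
import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
from llm2vec import LLM2Vec

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "<this-adapter-repo>"  # placeholder: this PEFT (MNTP) adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModel.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter_id)  # apply the MNTP LoRA weights

# Wrap with LLM2Vec to obtain sentence embeddings (mean pooling, as in the paper).
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)
embeddings = l2v.encode(["LLM2Vec turns a decoder-only LLM into a text encoder."])
```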
# Training Details
## Training Data
- [wikitext](https://huggingface.co/datasets/Salesforce/wikitext)
## Training Hyperparameters
- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- mask_token_type: "blank"
- mlm_probability: 0.2
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- bf16: true
- gradient_checkpointing: true
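
As an illustrative sketch only (an assumption, not the actual training script; MNTP training is driven by the llm2vec repository's own scripts), the model and LoRA settings above would typically be applied with `transformers` and `peft` like this:

```python
# Illustrative sketch: how the dtype, attention, and LoRA settings above map
# onto transformers + peft. This is not the llm2vec MNTP training script.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,               # torch_dtype: "bfloat16"
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
)
model.gradient_checkpointing_enable()         # gradient_checkpointing: true

lora_config = LoraConfig(r=16)                # lora_r: 16
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```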
## Accelerator Settings
- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
## Framework versions
- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0