---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
datasets:
- Salesforce/wikitext
language:
- en
- ja
---

# Model Info

This model applies LLM2Vec to Llama-2; only the PEFT adapter is distributed. LLM2Vec fine-tunes on two tasks, MNTP and SimCSE, but this repository contains the result of applying only the MNTP task.

## Model Details

### Model Description
- **Model type:** PEFT
- **Language(s) (NLP):** English, Japanese
- **License:** Apache 2.0
- **Finetuned from model:** [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)

## Sources

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

# Usage

- Please see the [original LLM2Vec model card](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage); a minimal loading sketch follows below.
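
The sketch below follows the loading pattern from the upstream LLM2Vec model cards. The adapter repository id is a placeholder (replace it with this repo's id), and the `llm2vec` package must be installed; treat this as an assumption-based sketch rather than verified instructions for this exact adapter.

```python
import torch
from llm2vec import LLM2Vec

# Sketch based on the upstream LLM2Vec usage example; the
# peft_model_name_or_path below is a placeholder for this repository's id.
l2v = LLM2Vec.from_pretrained(
    "meta-llama/Llama-2-7b-hf",                # base model
    peft_model_name_or_path="<this-repo-id>",  # this MNTP adapter (placeholder)
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Encode sentences into embeddings (mean pooling by default).
embeddings = l2v.encode(["This is an example sentence."])
print(embeddings.shape)
```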

# Training Details

## Training Data

- [wikitext](https://huggingface.co/datasets/Salesforce/wikitext) (loading sketch below)
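
For reference, a minimal sketch of loading the training dataset with 🤗 Datasets. The config name is an assumption; the card does not state which wikitext config was used.

```python
from datasets import load_dataset

# Config name is an assumption: options include
# "wikitext-2-raw-v1" and "wikitext-103-raw-v1".
dataset = load_dataset("Salesforce/wikitext", "wikitext-103-raw-v1")
print(dataset)
```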

## Training Hyperparameters

- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- mask_token_type: "blank"
- mlm_probability: 0.2
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- bf16: true
- gradient_checkpointing: true
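
The `mask_token_type: "blank"` setting reflects that Llama's tokenizer has no mask token, so a plain placeholder stands in for `[MASK]`. The sketch below illustrates only the masking side using the standard `transformers` collator at `mlm_probability=0.2`; it is an assumption-based illustration, not the actual training code (MNTP additionally predicts each masked token from the position before it, which llm2vec handles in its training loop).

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Llama's tokenizer has no [MASK] token; a "blank" mask token means a plain
# underscore stands in for it (assumption, mirroring the llm2vec MNTP setup).
if tokenizer.mask_token is None:
    tokenizer.mask_token = "_"

# 20% of tokens are selected for masking, matching mlm_probability: 0.2.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.2
)

batch = collator([tokenizer("LLM2Vec turns decoders into encoders.")])
print(batch["input_ids"][0])  # some tokens replaced by the blank mask token
```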

## Accelerator Settings

The `accelerate` launch configuration used for training (a programmatic sketch follows the list):

- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
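
As a reading aid, here is a rough programmatic equivalent of the DeepSpeed settings above using accelerate's `DeepSpeedPlugin`. This is a sketch under the assumption that the plugin fields match the YAML keys; training itself was launched with `accelerate launch` and a config file, not with this code.

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Rough programmatic equivalent of the config above (sketch only).
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,
    gradient_accumulation_steps=1,
    gradient_clipping=1.0,
    offload_optimizer_device="nvme",
    offload_optimizer_nvme_path="/nvme",
    zero3_save_16bit_model=True,
)

accelerator = Accelerator(
    mixed_precision="bf16",
    deepspeed_plugin=deepspeed_plugin,
)
```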

## Framework versions

- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0