---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
datasets:
- Salesforce/wikitext
language:
- en
- ja
---
# Model Info
This model applies LLM2Vec to Llama 2; only the PEFT adapter is distributed. LLM2Vec fine-tunes with two tasks, MNTP and SimCSE, but this repository contains the result of applying only the MNTP task.
## Model Details
### Model Description
- **Model type:** PEFT
- **Language(s) (NLP):** Japanese
- **License:** Apache 2.0
- **Finetuned from model:** [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
## Sources
- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
# Usage
- Please see the [original LLM2Vec model card](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage); a loading sketch is given below.
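A minimal loading sketch, assuming the `llm2vec` package is installed; the adapter id below is a placeholder for this repository, not its real id, and the linked card remains the authoritative reference.

```python
import torch
from llm2vec import LLM2Vec

# Placeholder adapter id -- replace with this repository's Hugging Face id.
ADAPTER_ID = "your-username/llm2vec-llama2-7b-mntp-adapter"

# Wrap the base Llama-2 model with the MNTP PEFT adapter; llm2vec enables
# bidirectional attention so the decoder can be used as a text encoder.
l2v = LLM2Vec.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    peft_model_name_or_path=ADAPTER_ID,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Encode sentences into fixed-size embeddings.
embeddings = l2v.encode(["This is a test sentence.", "これはテスト文です。"])
print(embeddings.shape)
```

This assumes `pip install llm2vec` (plus `flash-attn` if you want `flash_attention_2`) and access to the gated Llama-2 base weights.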
# Training Details
## Training Data
- [wikitext](https://huggingface.co/datasets/Salesforce/wikitext)
## Training Hyperparameters
- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- mask_token_type: "blank"
- mlm_probability: 0.2
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- bf16: true
- gradient_checkpointing: true
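As a rough illustration, the hyperparameters above can be collected into an MNTP training config in the style of the llm2vec experiment scripts; the key names below are assumptions modeled on the llm2vec example configs, not the exact file used for this adapter.

```python
import json

# Sketch of an MNTP training config mirroring the hyperparameters listed
# above. Key names follow the llm2vec example configs and are assumptions;
# "batch_size: 64" is interpreted here as the per-device batch size.
mntp_config = {
    "model_name_or_path": "meta-llama/Llama-2-7b-hf",
    "dataset_name": "Salesforce/wikitext",
    "per_device_train_batch_size": 64,
    "gradient_accumulation_steps": 1,
    "max_seq_length": 512,
    "mask_token_type": "blank",
    "mlm_probability": 0.2,
    "lora_r": 16,
    "torch_dtype": "bfloat16",
    "attn_implementation": "flash_attention_2",
    "bf16": True,
    "gradient_checkpointing": True,
}

# Write the config to JSON so it can be passed to llm2vec's MNTP training
# script (e.g. experiments/run_mntp.py in the llm2vec repository).
with open("mntp_llama2_wikitext.json", "w") as f:
    json.dump(mntp_config, f, indent=2)
```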
## Accelerator Settings
- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
## Framework versions
- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0