---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
datasets:
- Salesforce/wikitext
language:
- en
- ja
---

# Model Info

This model applies LLM2Vec to Llama-2; only the PEFT adapter is distributed. LLM2Vec fine-tunes on two tasks, MNTP and SimCSE, but this repository contains the result of applying only the MNTP task.

## Model Details

### Model Description
- **Model type:** PEFT
- **Language(s) (NLP):** English, Japanese
- **License:** Apache 2.0
- **Finetuned from model:** [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)

## Sources

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

# Usage

- Please see the [original LLM2Vec model card](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage); a minimal loading sketch follows below.
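
The sketch below follows the loading pattern from the upstream LLM2Vec model cards. The adapter repository id is a placeholder (replace it with this repo's id), and the `llm2vec` package must be installed; treat this as an assumption-based sketch rather than verified instructions for this exact adapter.

```python
import torch
from llm2vec import LLM2Vec

# Sketch based on the upstream LLM2Vec usage example; the
# peft_model_name_or_path below is a placeholder for this repository's id.
l2v = LLM2Vec.from_pretrained(
    "meta-llama/Llama-2-7b-hf",                # base model
    peft_model_name_or_path="<this-repo-id>",  # this MNTP adapter (placeholder)
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Encode sentences into embeddings (mean pooling by default).
embeddings = l2v.encode(["This is an example sentence."])
print(embeddings.shape)
```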

# Training Details

## Training Data

- [wikitext](https://huggingface.co/datasets/Salesforce/wikitext) (loading sketch below)
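
For reference, a minimal sketch of loading the training dataset with 🤗 Datasets. The config name is an assumption; the card does not state which wikitext config was used.

```python
from datasets import load_dataset

# Config name is an assumption: options include
# "wikitext-2-raw-v1" and "wikitext-103-raw-v1".
dataset = load_dataset("Salesforce/wikitext", "wikitext-103-raw-v1")
print(dataset)
```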

## Training Hyperparameters

- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- mask_token_type: "blank"
- mlm_probability: 0.2
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- bf16: true
- gradient_checkpointing: true
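
The `mask_token_type: "blank"` setting reflects that Llama's tokenizer has no mask token, so a plain placeholder stands in for `[MASK]`. The sketch below illustrates only the masking side using the standard `transformers` collator at `mlm_probability=0.2`; it is an assumption-based illustration, not the actual training code (MNTP additionally predicts each masked token from the position before it, which llm2vec handles in its training loop).

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Llama's tokenizer has no [MASK] token; a "blank" mask token means a plain
# underscore stands in for it (assumption, mirroring the llm2vec MNTP setup).
if tokenizer.mask_token is None:
    tokenizer.mask_token = "_"

# 20% of tokens are selected for masking, matching mlm_probability: 0.2.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.2
)

batch = collator([tokenizer("LLM2Vec turns decoders into encoders.")])
print(batch["input_ids"][0])  # some tokens replaced by the blank mask token
```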

## Accelerator Settings

The `accelerate` launch configuration used for training (a programmatic sketch follows the list):

- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
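
As a reading aid, here is a rough programmatic equivalent of the DeepSpeed settings above using accelerate's `DeepSpeedPlugin`. This is a sketch under the assumption that the plugin fields match the YAML keys; training itself was launched with `accelerate launch` and a config file, not with this code.

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Rough programmatic equivalent of the config above (sketch only).
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,
    gradient_accumulation_steps=1,
    gradient_clipping=1.0,
    offload_optimizer_device="nvme",
    offload_optimizer_nvme_path="/nvme",
    zero3_save_16bit_model=True,
)

accelerator = Accelerator(
    mixed_precision="bf16",
    deepspeed_plugin=deepspeed_plugin,
)
```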

## Framework versions

- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0