---
library_name: peft
datasets:
- databricks/databricks-dolly-15k
language:
- en
pipeline_tag: text-generation
---

# ctrltokyo/llama-2-7b-hf-dolly-flash-attention

This model is a fine-tuned version of [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) on the databricks/databricks-dolly-15k dataset with all training performed using Flash Attention 2.

No further testing or optimisation has been performed.

## Model description

Like [ctrltokyo/llm_prompt_mask_fill_model](https://huggingface.co/ctrltokyo/llm_prompt_mask_fill_model), this model could be used for live autocompletion of prompts, but it is designed more as a general-purpose chatbot (hence the use of the Dolly 15k dataset). Don't use it on code, because it won't work.

I plan to release a further fine-tuned version using the [code_instructions_120k](https://huggingface.co/datasets/sahil2801/code_instructions_120k) dataset.

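## Intended uses & limitations

Intended for general-purpose English chat and prompt autocompletion, as described above. It is not intended for code. The model has not been tested or evaluated.
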
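
A minimal loading sketch, assuming this repository hosts a PEFT adapter (as `library_name: peft` in the metadata suggests) that is applied on top of the base model named above. The helper names `load_model` and `generate`, the `float16` dtype, and `device_map="auto"` (which needs `accelerate`) are illustrative assumptions, not settings confirmed by the author.

```python
# Hypothetical helpers for trying the model; not the author's own script.
BASE_MODEL_ID = "NousResearch/Llama-2-7b-hf"
ADAPTER_ID = "ctrltokyo/llama-2-7b-hf-dolly-flash-attention"


def load_model(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter (assumes a PEFT adapter repo)."""
    # Heavy dependencies are imported lazily, only when the model is actually loaded.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.float16,  # assumption: half precision to reduce memory
        device_map="auto",          # assumption: requires the `accelerate` package
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=128):
    """Generate a completion for a plain-text prompt."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `tokenizer, model = load_model()` followed by `generate(tokenizer, model, "Explain what a fine-tuned model is.")` would download both repositories on first use. The prompt is passed through as plain text because this card does not document a prompt template.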

## Training and evaluation data

No evaluation was performed. Training was done on an NVIDIA A100. Inference on the raw model appears to use around 20 GB of VRAM.

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32

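
The settings listed above map directly onto the keyword arguments of `transformers.BitsAndBytesConfig`. The sketch below shows one way to rebuild that config. The dictionary name `TRAINING_QUANT_CONFIG` and the helper `build_bnb_config` are hypothetical, and the original training script is not part of this card.

```python
# The quantization settings listed above, expressed as plain Python values.
TRAINING_QUANT_CONFIG = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "float32",
}


def build_bnb_config(settings=TRAINING_QUANT_CONFIG):
    """Build a BitsAndBytesConfig from the settings (hypothetical helper)."""
    # Imported lazily: building the config needs torch and transformers,
    # and 4-bit loading additionally needs the bitsandbytes package.
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(settings)
    # Convert the dtype name ("float32") into the matching torch dtype object.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The resulting object can be passed as `quantization_config=build_bnb_config()` to `AutoModelForCausalLM.from_pretrained` to load the base model in 4-bit, matching the way it was quantized during training.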

### Framework versions

- PEFT 0.4.0