---
language:
- bn
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/llama-3-8b-bnb-4bit
---
# Llama-3 Bangla LoRA
- **Developed by:** KillerShoaib
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit
- **Dataset used for fine-tuning:** iamshnoo/alpaca-cleaned-bengali
# LoRA Adapter
**This is not the entire model, but rather only the LoRA adapter.**
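To make "only the LoRA adapter" concrete, here is a minimal pure-Python sketch of how a low-rank adapter is combined with a frozen base weight. The matrix shapes, the values, and the `alpha` and `rank` settings are illustrative assumptions, not this adapter's actual configuration, and `matmul` and `apply_lora` are hypothetical helper names.

```python
# Sketch of the LoRA idea: the adapter stores two small matrices B and A,
# and the effective weight is W + (alpha / rank) * (B @ A).
# All numbers below are made up for illustration.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_lora(base_weight, lora_b, lora_a, alpha, rank):
    """Return the merged weight W + (alpha / rank) * (B @ A)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base_weight, delta)
    ]

# Frozen 2x3 base weight (stands in for one layer of the base model).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
# Rank-1 adapter: B is 2x1 and A is 1x3, so only 5 numbers are stored.
B = [[1.0], [2.0]]
A = [[0.5, 0.0, -1.0]]

merged = apply_lora(W, B, A, alpha=2, rank=1)
print(merged)
```

Because only the small `B` and `A` matrices are stored, the adapter has to be loaded on top of the base model listed above before it can generate text.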
# Llama-3 Bangla Different Formats
- `4-bit quantized(QLoRA)` - [**KillerShoaib/llama-3-8b-bangla-4bit**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-4bit)
- `GGUF q4_k_m` - [**KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M**](https://huggingface.co/KillerShoaib/llama-3-8b-bangla-GGUF-Q4_K_M)
# Model Details
The Llama 3 8B model was fine-tuned with the **unsloth** package on a **cleaned Bangla Alpaca** dataset. Training ran for **2 epochs** on a single T4 GPU.
# Pros & Cons of the Model
## Pros
- **The model can comprehend the Bangla language, including its semantic nuances**
- **Given a context, the model can answer questions based on that context**
## Cons
- **The model cannot handle creative or complex tasks, e.g., writing a poem or solving a math problem in Bangla**
- **Because the fine-tuning dataset was small, the model lacks a lot of general knowledge in Bangla**
# Run The Model
## FastLanguageModel from unsloth for 2x faster inference
```python
from unsloth import FastLanguageModel
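# load the base model together with the LoRA adapter
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "KillerShoaib/llama-3-8b-bangla-lora",
    max_seq_length = 2048,
    dtype = None,  # None lets unsloth auto-detect the dtype
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch to unsloth's faster inference mode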
# alpaca_prompt for the model
alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
# input with instruction and input
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "সুস্থ থাকার তিনটি উপায় বলুন", # instruction
            "", # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")
# generating the output and decoding it
outputs = model.generate(**inputs, max_new_tokens = 2048, use_cache = True)
print(tokenizer.batch_decode(outputs))
```
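The decoded output contains the whole prompt followed by the generated answer. A small helper along these lines can strip the prompt and return only the answer. `extract_response` is a hypothetical name, and the default end-of-sequence token `<|end_of_text|>` is an assumption about the tokenizer; passing `tokenizer.eos_token` explicitly is the safer choice.

```python
RESPONSE_MARKER = "### Response:"

def extract_response(decoded: str, eos_token: str = "<|end_of_text|>") -> str:
    """Return the text after the first '### Response:' marker, without the EOS token."""
    head, marker, tail = decoded.partition(RESPONSE_MARKER)
    # If the marker is missing, fall back to the full decoded text.
    answer = tail if marker else head
    return answer.replace(eos_token, "").strip()

# Stand-in for one element of tokenizer.batch_decode(outputs); the answer text is made up.
sample = (
    "<|begin_of_text|>Below is an instruction in bangla ...\n"
    "### Instruction:\nসুস্থ থাকার তিনটি উপায় বলুন\n"
    "### Input:\n\n"
    "### Response:\nউত্তর<|end_of_text|>"
)
print(extract_response(sample))
```

With a real model this would be applied to each string returned by `tokenizer.batch_decode(outputs)`.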
## AutoPeftModelForCausalLM from Hugging Face
```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
model = AutoPeftModelForCausalLM.from_pretrained(
    "KillerShoaib/llama-3-8b-bangla-lora",
    load_in_4bit = True,
)
tokenizer = AutoTokenizer.from_pretrained("KillerShoaib/llama-3-8b-bangla-lora")
alpaca_prompt = """Below is an instruction in bangla that describes a task, paired with an input also in bangla that provides further context. Write a response in bangla that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "সুস্থ থাকার তিনটি উপায় বলুন", # instruction
            "", # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")
# generating the output and decoding it
outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = True)
print(tokenizer.batch_decode(outputs))
```
**AutoPeftModelForCausalLM can be very slow, since downloading `4bit` models is not supported. Use this only if you don't have unsloth installed.**
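One way to follow this advice in a script is to check whether unsloth is importable before choosing a loading path. The sketch below uses only the standard library; `is_installed` and `choose_loader` are hypothetical helper names, and the returned strings are just labels for the two approaches shown above.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None

def choose_loader() -> str:
    """Prefer the unsloth path; fall back to peft only when unsloth is missing."""
    return "unsloth" if is_installed("unsloth") else "peft"

print(f"Suggested loading path: {choose_loader()}")
```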
# Inference Script & GitHub Repo
- `Google Colab` - [**Llama-3 8b Bangla Inference Script**](https://colab.research.google.com/drive/1jZaDmmamOoFiy-ZYRlbfwU0HaP3S48ER?usp=sharing)
- `GitHub Repo` - [**Llama-3 Bangla**](https://github.com/KillerShoaib/Llama-3-Bangla)