
Model Details

Haaaaarsh/Lora 🤗 is a LoRA fine-tune of the unsloth/llama-3-8b-bnb-4bit language model.

Model Description

Haaaaarsh/Lora 🤗 is trained with parameter-efficient fine-tuning (PEFT), so only a small set of adapter weights is updated on top of the frozen base model. It is designed to generate coherent, contextually relevant text from instructions and optional inputs. Because the base weights are quantized to 4-bit, memory usage is low enough to make the model accessible on modest hardware (a minimal 4-bit loading sketch follows the list below).

  • Developed by: Harsh Bande
  • Model type: Causal language model (LoRA adapter on a 4-bit quantized Llama 3 8B base)
  • Language(s) (NLP): English
  • License: Inherited from the base model (Meta Llama 3 Community License)
  • Finetuned from model [optional]: unsloth/llama-3-8b-bnb-4bit
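
To make the memory savings concrete, here is a minimal sketch of how a 4-bit model is typically loaded with bitsandbytes through transformers. The base repository ships its own quantization settings, so the exact values below (NF4, bfloat16 compute) are assumptions for illustration, not this card's published configuration.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit (NF4) settings; the base repo bakes its own
# quantization config into its weights, so these values are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Requires the bitsandbytes package and a CUDA GPU.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)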

Model Architecture

  • Base Model: unsloth/llama-3-8b-bnb-4bit
  • Parameter-Efficient Fine-Tuning (PEFT): LoRA (Low-Rank Adaptation); an illustrative configuration is sketched after this list
  • Quantization: 4-bit
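
A minimal sketch of what the LoRA setup could look like with the peft library. The rank, scaling, and target modules below are illustrative assumptions; the card does not publish the exact adapter configuration.

from peft import LoraConfig, get_peft_model

# Hypothetical LoRA hyperparameters, shown for illustration only.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

peft_model = get_peft_model(base, lora_config)  # `base` as loaded above
peft_model.print_trainable_parameters()         # only adapter weights are trainable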

Uses

  • Instruction Following: Generate responses to specific instructions with context.
  • Text Continuation: Continue sequences of text in a coherent manner.
  • Creative Writing: Assist in generating creative content based on given prompts.
  • Educational Tools: Provide answers and explanations to educational queries.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub.
# The 4-bit base model requires the bitsandbytes package and a CUDA GPU;
# device_map="auto" (via accelerate) places the weights on the GPU.
model_name = "Haaaaarsh/Lora"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define a prompt following the Alpaca-style instruction format shown above
prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the Fibonacci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
"""

# Tokenize the prompt, move it to the model's device, and generate a response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode the generated tokens back into text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Text:", generated_text)

Training Details

  • Max Sequence Length: 2048 tokens
  • Learning Rate: 2e-4
  • Batch Size: 2 (with gradient accumulation steps of 4)
  • Optimizer: AdamW with 8-bit optimization
  • Training Steps: 60
  • Mixed Precision: FP16 or BF16, selected based on hardware support (see the configuration sketch after this list)
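
The original training script is not published; the sketch below only shows how the listed hyperparameters might map onto transformers.TrainingArguments, with placeholder values for anything the card does not state.

import torch
from transformers import TrainingArguments

# Assumed mapping of the hyperparameters listed above; output_dir is a
# placeholder, not a value from the original run.
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size of 8
    learning_rate=2e-4,
    max_steps=60,
    optim="adamw_8bit",              # 8-bit AdamW via bitsandbytes
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
)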

Bias, Risks, and Limitations

  • Biases: Like all language models, this model may reflect biases present in the training data.
  • Context Length: Input sequences are capped at 2048 tokens; longer inputs must be truncated or split.

Citation

@misc{Haaaaarsh2024Lora,
  author = {Harsh Bande},
  title = {Haaaaarsh/Lora: A Fine-Tuned 4-bit Quantized Language Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Haaaaarsh/Lora}},
}