
Model Specifications

  • Max Sequence Length: 16384 (with automatic support for RoPE scaling)
  • Data Type: Automatic detection, with options for Float16 and Bfloat16
  • Quantization: 4-bit, to reduce memory usage

Training Data

Fine-tuned on a private dataset of hundreds of technical tutorials, each paired with a summary.

Implementation Highlights

  • Efficiency: 4-bit quantization reduces memory usage and shrinks the weights to download, which speeds up downloads.
  • Adaptability: Automatic detection of data types, plus support for advanced configuration options such as RoPE scaling, LoRA, and gradient checkpointing.
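
To put the memory claim in rough numbers, the sketch below estimates weight storage at a few precisions. It assumes roughly 7.24 billion parameters and ignores quantization metadata such as scaling constants, so real 4-bit checkpoints come out somewhat larger. `weight_gib` is an illustrative helper, not part of any library.

```python
def weight_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GiB, ignoring quantization metadata."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumed parameter count for a Mistral-7B-class model
NUM_PARAMS = 7.24e9

for label, bits in [("float32", 32), ("bfloat16/float16", 16), ("4-bit", 4)]:
    print(f"{label:>17}: {weight_gib(NUM_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the weights take about 27 GiB in float32, 13.5 GiB in 16-bit, and 3.4 GiB in 4-bit.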

Uploaded Model

  • Developed by: ndebuhr
  • License: apache-2.0
  • Finetuned from model: unsloth/mistral-7b-instruct-v0.2-bnb-4bit

Configuration and Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Paste the raw tutorial transcript to summarize here
input_text = ""

# Set device based on CUDA availability
device = "cuda" if torch.cuda.is_available() else "cpu"

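# Load the model and tokenizer; torch_dtype="auto" keeps the checkpoint's
# stored precision instead of upcasting the weights to float32
model_name = "ndebuhr/Mistral-7B-Technical-Tutorial-Summarization-QLoRA"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto").to(device)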

instruction = "Clarify and summarize this tutorial transcript"
prompt = """{}

### Raw Transcript:
{}

### Summary:
"""

# Tokenize the input text
inputs = tokenizer(
    prompt.format(instruction, input_text),
    return_tensors="pt",
    truncation=True,
    max_length=16384
).to(device)

# Generate outputs (max_length counts prompt tokens plus generated tokens)
outputs = model.generate(
    **inputs,
    max_length=16384,
    num_return_sequences=1,
    use_cache=True
)

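# Decode the generated text (a list with one string per returned sequence;
# each string contains the prompt followed by the generated summary)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(generated_text[0])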
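
Because `generate` returns the prompt followed by the completion, the decoded text still contains the instruction and the transcript. A minimal sketch of building the prompt and keeping only the summary follows. `build_prompt` and `extract_summary` are hypothetical helper names, the `decoded` string is a stand-in for real model output, and the split assumes the transcript itself never contains the `### Summary:` marker.

```python
PROMPT_TEMPLATE = """{}

### Raw Transcript:
{}

### Summary:
"""
SUMMARY_MARKER = "### Summary:"


def build_prompt(instruction: str, transcript: str) -> str:
    """Fill the prompt template used in the usage example."""
    return PROMPT_TEMPLATE.format(instruction, transcript)


def extract_summary(decoded: str) -> str:
    """Return only the text that follows the first summary marker."""
    _, marker, completion = decoded.partition(SUMMARY_MARKER)
    if not marker:
        raise ValueError("summary marker not found in decoded text")
    return completion.strip()


# Stand-in for decoded model output: the prompt plus a made-up completion
prompt = build_prompt(
    "Clarify and summarize this tutorial transcript",
    "so um first we install the package",
)
decoded = prompt + "Install the package first."
print(extract_summary(decoded))  # prints "Install the package first."
```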

Compute Infrastructure

  • Fine-tuning: used 1xA100 (40GB)
  • Inference: recommend 1xL4 (24GB)
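
As a rough check on the sequence-length budget during inference, the sketch below estimates the key/value cache for a full 16384-token sequence. The architecture numbers (32 layers, 8 key/value heads, head dimension 128) are assumptions based on the published Mistral 7B configuration, the cache is assumed to hold 16-bit values, and activations and framework overhead are ignored. `kv_cache_gib` is an illustrative helper.

```python
def kv_cache_gib(
    seq_len: int,
    num_layers: int = 32,      # assumed Mistral 7B depth
    num_kv_heads: int = 8,     # assumed grouped-query KV heads
    head_dim: int = 128,       # assumed per-head dimension
    bytes_per_value: int = 2,  # assumed 16-bit cache
) -> float:
    """Approximate KV cache size in GiB for one sequence."""
    # Factor of 2 covers keys and values
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return seq_len * bytes_per_token / 1024**3


print(kv_cache_gib(16384))  # prints 2.0
```

Under these assumptions a single full-length sequence needs about 2 GiB of cache on top of the weights, so whether it fits in 24 GB depends mainly on the precision the weights are loaded in.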

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.

  • Model size: 7.24B params (Safetensors)
  • Tensor type: BF16