# T-lite-it-1.0_Q4_0
T-lite-it-1.0_Q4_0 is a quantized version of the T-lite-it-1.0 model, originally based on the Qwen 2.5 7B architecture and fine-tuned for Russian-language tasks. This version is optimized for memory-constrained environments, making it suitable for fine-tuning and inference on GPUs with as little as 8GB VRAM. The quantization was performed using BitsAndBytes, reducing the model to 4-bit precision.
## Model Description
- Language: Russian
- Base Model: t-tech/T-lite-it-1.0 (derived from Qwen 2.5 7B)
- Quantization: 4-bit precision using BitsAndBytes
- Tasks: Text generation, conversation, question answering, and chain-of-thought reasoning
- Fine-Tuning Ready: Ideal for further fine-tuning in low-resource environments.
- VRAM Requirements: Fine-tuning and inference possible with 8GB VRAM or more.
## Usage
To load the model, ensure you have the required dependencies installed (`accelerate` is needed for `device_map="auto"`):

```bash
pip install transformers accelerate bitsandbytes
```
Then, load the model with the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "MilyaShams/T-lite-it-1.0_Q4_0"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the weights in 4-bit precision and let accelerate place them on the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```
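Once loaded, the model can be prompted through the tokenizer's chat template, which Qwen 2.5 derivatives typically ship. A minimal generation sketch (the prompt and sampling settings below are illustrative assumptions, not recommended values):

```python
# "Tell me briefly about yourself."
messages = [{"role": "user", "content": "Расскажи кратко о себе."}]

# Format the conversation with the model's chat template
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```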
## Fine-Tuning
The model is designed for fine-tuning under resource constraints. Use tools like Hugging Face's `Trainer` or `peft` (Parameter-Efficient Fine-Tuning) to adapt the model to specific tasks.
Example configuration for fine-tuning (see the sketch below):
- Batch Size: Adjust to fit within 8GB VRAM (e.g., batch_size=2).
- Gradient Accumulation: Accumulate gradients over several steps to simulate a larger effective batch size.
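Below is a minimal QLoRA-style sketch using `peft` and `transformers` (install `peft` separately with `pip install peft`). All hyperparameters here, including the LoRA rank, target modules, and learning rate, are illustrative assumptions rather than tuned values:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments

# Prepare the 4-bit model for training (gradient checkpointing, input grads, etc.)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters; the quantized base weights stay frozen
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections in Qwen-style blocks
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="t-lite-it-qlora",        # hypothetical output directory
    per_device_train_batch_size=2,       # small batch to fit within 8GB VRAM
    gradient_accumulation_steps=8,       # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=1,
    fp16=True,
    logging_steps=10,
)
```

Pass `training_args`, the PEFT-wrapped model, and your dataset to `Trainer` as usual.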