Intellecta

This model is a fine-tuned version of meta-llama/Llama-3.2-1B, trained on a mix of instruction-following and conversational datasets (listed under Training and evaluation data below).

Model description

The model is based on LLaMA (Large Language Model Meta AI), a family of state-of-the-art language models developed for natural language understanding and generation. This specific implementation uses the LLaMA 3.2-1B model, which is fine-tuned for general-purpose conversational AI tasks.

  • Architecture: Transformer-based causal language model.
  • Tokenization: Uses the AutoTokenizer compatible with the LLaMA model, with adjustments to ensure proper padding.
  • Pre-trained Foundation: Builds on the pre-trained weights of LLaMA, focusing on improving performance for conversational and instruction-based tasks.
  • Implementation: Developed with Hugging Face's Transformers library for extensibility and ease of use.

Intended uses & limitations

Intended Uses

  • Instruction-following tasks: answering questions, summarizing, and text generation.
  • Conversational agents: chatbots and virtual assistants, including those in specialized domains such as healthcare or education.
  • Research and development: fine-tuning and benchmarking against datasets for downstream tasks.

Training and evaluation data

Datasets Used

  • fka/awesome-chatgpt-prompts: general-purpose instruction-following and conversational dataset based on GPT-like interactions.
  • BAAI/Infinity-Instruct (3M): a large instruction dataset containing a wide variety of tasks and instructions.
  • allenai/WildChat-1M: focused on open-ended conversational data.
  • lavita/ChatDoctor-HealthCareMagic-100k: healthcare-specific dataset for medical conversational agents.
  • zjunlp/Mol-Instructions: molecular biology-related instructions.
  • garage-bAInd/Open-Platypus: dataset aimed at general-purpose, open-domain reasoning.

Data Preprocessing

Text prompts and responses are tokenized with padding and truncation. Labels are derived from the input tokens, with padding tokens masked as -100 to exclude them from loss computation.
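As a minimal sketch (not the actual training script), two of the datasets above can be pulled from the Hub with the datasets library and inspected for a prompt column; the "train" split name and the inspection loop are assumptions.

```python
from datasets import load_dataset

# Sketch: load a couple of the datasets listed above from the Hugging Face Hub.
# Split names and column layouts vary per dataset; "train" is assumed here.
dataset_ids = [
    "fka/awesome-chatgpt-prompts",
    "garage-bAInd/Open-Platypus",
]

raw_datasets = {name: load_dataset(name, split="train") for name in dataset_ids}

for name, ds in raw_datasets.items():
    # Inspect which datasets expose a "prompt" column usable for preprocessing.
    print(name, ds.column_names)
```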

Training procedure

The training procedure for the model fine-tunes the pre-trained LLaMA 3.2-1B model on various datasets with a focus on instruction-following and conversational tasks. Below are the key aspects of the training process:

  1. Preprocessing

Tokenization: The input prompts and their responses are tokenized using the AutoTokenizer configured for LLaMA. Padding tokens are explicitly handled via pad_token (set to the eos_token if undefined), and inputs are truncated to a maximum length of 512 tokens to fit model constraints.

Label Preparation: Input IDs are cloned to create labels for supervised learning. Padding tokens in the labels are masked with -100 so they are ignored during loss computation.

Dataset Mapping: Each dataset's prompt field is tokenized and reformatted into the model's required input-output structure. Datasets without a prompt column are skipped to avoid errors.
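A minimal sketch of the tokenization and label-masking logic described above, assuming a tokenizer object as set up in the next step; the function name preprocess and the prompt column are illustrative, not the original code.

```python
def preprocess(batch, tokenizer, max_length=512):
    """Tokenize prompts and build labels, masking padding positions with -100."""
    tokenized = tokenizer(
        batch["prompt"],
        padding="max_length",
        truncation=True,
        max_length=max_length,
    )
    # Clone the input IDs as labels and ignore padding tokens in the loss.
    tokenized["labels"] = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in ids]
        for ids in tokenized["input_ids"]
    ]
    return tokenized

# Example usage (assumes a dataset with a "prompt" column, as noted above):
# tokenized_ds = raw_ds.map(lambda b: preprocess(b, tokenizer), batched=True)
```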

  2. Model Setup

Pre-trained Model: The base model, meta-llama/Llama-3.2-1B, is loaded with its pre-trained weights and fine-tuned for causal language modeling, focusing on instruction-based outputs.

Tokenizer Setup: The tokenizer ensures consistent encoding and decoding for the model. Padding is fixed by falling back to the eos_token when no pad_token is defined.
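A minimal sketch of this setup with the Transformers API; the variable names are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# LLaMA tokenizers ship without a dedicated padding token; fall back to EOS.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id)
```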

  3. Training Configuration

TrainingArguments: The Hugging Face TrainingArguments object configures the training process:

  • Output directory: llama_output stores the model checkpoints and logs.
  • Epochs: 4, balancing training time and generalization.
  • Batch size: 4 examples per device to handle memory constraints.
  • Gradient accumulation: 4 steps to simulate a larger effective batch size.
  • Learning rate: 1e-4 with a warmup phase of 500 steps for stable optimization.
  • Weight decay: 0.01 to mitigate overfitting.
  • Mixed precision: FP16 (half precision) for faster training and reduced memory usage.
  • Logging steps: logs are generated every 10 steps to monitor training progress.
  • Checkpointing: model checkpoints are saved at the end of each epoch.
  • Push to Hub: the fine-tuned model is uploaded to the Hugging Face Hub (kssrikar4/Intellecta).

Data Collator: DataCollatorForSeq2Seq dynamically pads each batch for efficiency during training.
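The configuration above can be expressed roughly as follows; this is a sketch assuming the model and tokenizer objects from the previous step, and the exact argument set in the original script may differ.

```python
from transformers import TrainingArguments, DataCollatorForSeq2Seq

training_args = TrainingArguments(
    output_dir="llama_output",
    num_train_epochs=4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16
    learning_rate=1e-4,
    warmup_steps=500,
    weight_decay=0.01,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    push_to_hub=True,
    hub_model_id="kssrikar4/Intellecta",
)

# Dynamically pads each batch to the longest sequence it contains.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
```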

  4. Fine-Tuning Process

Trainer: The Hugging Face Trainer class orchestrates the training process, combining the model, data, and training configuration. Loss is computed for each batch from the model's outputs (logits) and the prepared labels, while the optimizer and learning rate scheduler are managed internally by the Trainer.

Training Loop: During each epoch, the model processes batches of tokenized prompts and computes the causal language modeling (CLM) loss. Gradients are accumulated over multiple steps to simulate a larger batch size, and optimizer updates are applied after gradient accumulation.
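A sketch of how the Trainer could be assembled from the pieces above; tokenized_ds stands in for the preprocessed training dataset and is an assumption.

```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds,   # tokenized dataset from the preprocessing step
    data_collator=data_collator,
    # eval_dataset=...,           # optional: enables evaluation during training
)

trainer.train()
```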

Validation: While validation data is not explicitly defined in the code, the Trainer supports evaluation if an eval_dataset is provided. Saving checkpoints at each epoch allows the model to be evaluated after training.

  5. Post-Training

Push to Hub: The trained model, along with its tokenizer and configuration, is pushed to the Hugging Face Hub under the ID kssrikar4/Intellecta.

Usage: The fine-tuned model can be downloaded and used directly for inference or further fine-tuning.
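For example, the published checkpoint can be loaded straight from the Hub; the prompt and generation settings below are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kssrikar4/Intellecta")
model = AutoModelForCausalLM.from_pretrained("kssrikar4/Intellecta")

prompt = "Explain what a transformer language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```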

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 4
  • mixed_precision_training: Native AMP

Training results

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.5.1+cpu
  • Datasets 3.2.0
  • Tokenizers 0.21.0