Built with Axolotl

See axolotl config

axolotl version: 0.6.0

```yaml
# axolotl_config.yaml

# Model configuration
base_model: Qwen/Qwen2.5-Coder-3B-Instruct
hub_model_id: mrcuddle/Qwen2.5-Coder-3B-Instruct-TS

# Training parameters
learning_rate: 0.0001  # Adjusted for potential stability improvement
train_batch_size: 4  # Increased for better gradient estimates
eval_batch_size: 4  # Increased for better evaluation stability
num_epochs: 1
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
gradient_accumulation_steps: 2
micro_batch_size: 1

# Distributed training settings
distributed_type: GPU
num_devices: 2  # Adjusted to utilize multiple GPUs if available
total_train_batch_size: 8  # Adjusted to match train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size: 8  # Adjusted to match eval_batch_size * num_devices * gradient_accumulation_steps

# Random seed for reproducibility
seed: 42

datasets:
  - path: mhhmm/typescript-instruct-20k
    type: alpaca
    field_instruction: instruction
    field_output: output
    format: "[INST] {instruction} [/INST]\n{output}"
    no_input_format: "[INST] {instruction} [/INST]"
    roles:
      input: ["USER"]
      output: ["ASSISTANT"]
```

Qwen2.5-Coder-3B-Instruct-TS

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-3B-Instruct on the mhhmm/typescript-instruct-20k dataset.
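
A minimal inference sketch with 🤗 Transformers is shown below. It assumes the model is published under the `hub_model_id` from the config above and that the Qwen2.5 chat template ships with the tokenizer; the TypeScript prompt is only an illustrative placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mrcuddle/Qwen2.5-Coder-3B-Instruct-TS"  # hub_model_id from the config above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative TypeScript instruction; any coding prompt works the same way.
messages = [
    {"role": "user", "content": "Write a TypeScript function that deduplicates an array of numbers."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```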

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 2
  • optimizer: AdamW (adamw_hf) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 1
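
The effective train batch size of 2 follows from the per-device batch size and gradient accumulation: 1 × 2 = 2, i.e. a single-device run.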

Training results

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0