Built with Axolotl

See axolotl config

axolotl version: 0.12.2

```yaml
base_model: Qwen/Qwen3-0.6B

# Automatically upload checkpoint and final model to HF
hub_model_id: abdullahmeda/listwise-rerank-qwen3-600m-ds1-9fh4jd8e6

load_in_8bit: false
load_in_4bit: false
strict: false

chat_template: qwen3
datasets:
  - path: kaggle-map/listwise-rerank
    type: chat_template
    split: train

test_datasets:
  - path: kaggle-map/listwise-rerank
    type: chat_template
    split: val

streaming: true

dataset_processes: 32
dataset_prepared_path: last_run_prepared
output_dir: ./outputs/listwise-rerank-qwen3-600m-ds1-9fh4jd8e6

sequence_len: 1280
sample_packing: true
eval_sample_packing: false

deepspeed: deepspeed_configs/zero1.json

wandb_project: map-math-misconceptions
wandb_entity:
wandb_watch:
wandb_name: listwise-rerank-qwen3-600m-ds1-9fh4jd8e6
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 16
num_epochs: 1
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 5e-6

bf16: true
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 10
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 21
saves_per_epoch: 21
weight_decay: 0.01

save_first_step: true
```
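
As a sanity check, the chat-template rows referenced above can be inspected locally before training. The sketch below is illustrative only: it assumes the kaggle-map/listwise-rerank rows store an OpenAI-style `messages` list (the column name is an assumption and is not documented here).

```python
# Illustrative sketch: pull one row and render it through the Qwen3 chat template.
# Assumption: each row has a "messages" column compatible with apply_chat_template.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
ds = load_dataset("kaggle-map/listwise-rerank", split="train", streaming=True)

row = next(iter(ds))
text = tokenizer.apply_chat_template(row["messages"], tokenize=False)
n_tokens = len(tokenizer(text)["input_ids"])
print(text[:500])
print(f"{n_tokens} tokens (config sequence_len is 1280)")
```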

listwise-rerank-qwen3-600m-ds1-9fh4jd8e6

This model is a fine-tuned version of Qwen/Qwen3-0.6B on the kaggle-map/listwise-rerank dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1884
  • Max memory active: 43.13 GiB
  • Max memory allocated: 43.13 GiB
  • Device memory reserved: 50.69 GiB
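
A minimal inference sketch is shown below. The exact listwise reranking prompt format comes from the training data and is not documented on this card, so the user message is only a placeholder.

```python
# Minimal usage sketch; the prompt below is a placeholder, not the real
# listwise-rerank format used during training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abdullahmeda/listwise-rerank-qwen3-600m-ds1-9fh4jd8e6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "<listwise reranking prompt goes here>"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```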

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 24
  • training_steps: 248
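
The derived totals above follow directly from the per-device settings in the Axolotl config; a quick check:

```python
# How the reported totals are derived from the per-device settings.
micro_batch_size = 16
gradient_accumulation_steps = 8
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no grad accumulation at eval
warmup_steps = int(0.1 * 248)  # warmup_ratio * training_steps

print(total_train_batch_size, total_eval_batch_size, warmup_steps)  # 512 64 24
```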

Training results

| Training Loss | Epoch  | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|:-------------:|:------:|:----:|:---------------:|:----------------:|:-------------------:|:------------------:|
| No log        | 0      | 0    | 10.3241         | 31.76            | 31.76               | 32.02              |
| 7.3879        | 0.0483 | 12   | 0.9112          | 43.13            | 43.13               | 50.69              |
| 0.716         | 0.0967 | 24   | 0.4508          | 43.13            | 43.13               | 50.69              |
| 0.4393        | 0.1450 | 36   | 0.3112          | 43.13            | 43.13               | 50.69              |
| 0.3042        | 0.1934 | 48   | 0.2664          | 43.13            | 43.13               | 50.69              |
| 0.2363        | 0.2417 | 60   | 0.2359          | 43.13            | 43.13               | 50.69              |
| 0.1976        | 0.2900 | 72   | 0.2041          | 43.13            | 43.13               | 50.69              |
| 0.1743        | 0.3384 | 84   | 0.1951          | 43.13            | 43.13               | 50.69              |
| 0.161         | 0.3867 | 96   | 0.1836          | 43.13            | 43.13               | 50.69              |
| 0.1528        | 0.4350 | 108  | 0.1788          | 43.13            | 43.13               | 50.69              |
| 0.1367        | 0.4834 | 120  | 0.1721          | 43.13            | 43.13               | 50.69              |
| 0.1169        | 0.5317 | 132  | 0.1740          | 43.13            | 43.13               | 50.69              |
| 0.1136        | 0.5801 | 144  | 0.1701          | 43.13            | 43.13               | 50.69              |
| 0.1066        | 0.6284 | 156  | 0.1699          | 43.13            | 43.13               | 50.69              |
| 0.1079        | 0.6767 | 168  | 0.1811          | 43.13            | 43.13               | 50.69              |
| 0.0897        | 0.7251 | 180  | 0.1827          | 43.13            | 43.13               | 50.69              |
| 0.0883        | 0.7734 | 192  | 0.1869          | 43.13            | 43.13               | 50.69              |
| 0.0818        | 0.8218 | 204  | 0.1847          | 43.13            | 43.13               | 50.69              |
| 0.0807        | 0.8701 | 216  | 0.1859          | 43.13            | 43.13               | 50.69              |
| 0.0764        | 0.9184 | 228  | 0.1873          | 43.13            | 43.13               | 50.69              |
| 0.0722        | 0.9668 | 240  | 0.1884          | 43.13            | 43.13               | 50.69              |

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.6.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.21.4