
Llama-3.2-1B-ultrachat200k

Model Details

Training Details

Training Hyperparameters

attn_implementation: flash_attention_2
bf16: True
learning_rate: 2e-5
lr_scheduler_type: cosine
per_device_train_batch_size: 2
gradient_accumulation_steps: 16
torch_dtype: bfloat16
num_train_epochs: 1
max_seq_length: 2048
warmup_ratio: 0.1
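
For reference, here is a minimal sketch of how these values map onto TRL's ModelConfig and SFTConfig objects when set in Python rather than on the command line. The base model name and output_dir below are assumptions rather than values stated in this card, and the max_seq_length field name follows the TRL version used by the training script.

from trl import ModelConfig, SFTConfig

# Assumed base checkpoint; the card only names the fine-tuned model.
model_config = ModelConfig(
    model_name_or_path="meta-llama/Llama-3.2-1B",
    attn_implementation="flash_attention_2",
    torch_dtype="bfloat16",
)

training_args = SFTConfig(
    output_dir="saves/Llama-3.2-1B-ultrachat200k",  # placeholder output path
    bf16=True,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    max_seq_length=2048,
    warmup_ratio=0.1,
)

With these settings, each optimizer step sees 2 x 16 = 32 sequences per device, times the number of devices used for training.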

Results

init_train_loss: 1.726
final_train_loss: 1.22

Training script

import multiprocessing

from datasets import load_dataset
from tqdm.rich import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import (
    ModelConfig,
    SFTTrainer,
    get_peft_config,
    get_quantization_config,
    get_kbit_device_map,
    SFTConfig,
    ScriptArguments,
    TrlParser
)

tqdm.pandas()

if __name__ == "__main__":
    parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig))
    args, training_args, model_config = parser.parse_args_and_config()

    quantization_config = get_quantization_config(model_config)
    model_kwargs = dict(
        revision=model_config.model_revision,
        trust_remote_code=model_config.trust_remote_code,
        attn_implementation=model_config.attn_implementation,
        torch_dtype=model_config.torch_dtype,
        use_cache=False if training_args.gradient_checkpointing else True,  # KV cache is incompatible with gradient checkpointing
        device_map=get_kbit_device_map() if quantization_config is not None else None,
        quantization_config=quantization_config,
    )

    model = AutoModelForCausalLM.from_pretrained(model_config.model_name_or_path,
                                                 **model_kwargs)
    tokenizer = AutoTokenizer.from_pretrained(
        model_config.model_name_or_path, trust_remote_code=model_config.trust_remote_code, use_fast=True
    )
    # The Llama 3.2 tokenizer ships without a pad token; reuse the end-of-text token for padding
    tokenizer.pad_token = '<|end_of_text|>'

    train_dataset = load_dataset(args.dataset_name,
                                 split=args.dataset_train_split,
                                 num_proc=multiprocessing.cpu_count())

    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        processing_class=tokenizer,
        peft_config=get_peft_config(model_config),
    )

    trainer.train()

    trainer.save_model(training_args.output_dir)

Test Script

from vllm import LLM
from datasets import load_dataset
from vllm.sampling_params import SamplingParams
from transformers import AutoTokenizer

MODEL_PATH = "autodl-tmp/saves/Llama-3.2-1B-ultrachat200k"

model = LLM(MODEL_PATH,
            tensor_parallel_size=1,
            dtype='bfloat16')
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

# Build a chat-formatted prompt with the assistant generation prompt appended
prompt = tokenizer.apply_chat_template([{"role": "user", "content": "Where is Harbin?"}],
                                       tokenize=False,
                                       add_generation_prompt=True)
sampling_params = SamplingParams(max_tokens=1024,
                                 temperature=0.7,
                                 logprobs=1,
                                 stop_token_ids=[tokenizer.eos_token_id])

vllm_generations = model.generate(prompt,
                                  sampling_params)

print(vllm_generations[0].outputs[0].text)
# Example output: Harbin is located in northeastern China in the Heilongjiang province. It is the capital of Heilongjiang province in the Northeast Asia.