Model Summary
Hanscripter is an instruction-tuned language model focused on translating classical Chinese (i.e. Wenyanwen, 文言文) into English. See our GitHub repo for details.
- Base Model: Meta-Llama-3-8B-Instruct
- SFT Dataset: KaifengGGG/WenYanWen_English_Parallel
- Fine-tune Method: QLoRA
Version
Usage
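Below is a minimal inference sketch using Hugging Face Transformers. The repository id, system prompt, and sample sentence are illustrative placeholders rather than part of the original card; substitute this model's actual Hub id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub id -- replace with this model's actual repository id.
model_id = "KaifengGGG/Hanscripter"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a Llama-3-style chat prompt asking for an English translation.
messages = [
    {"role": "system", "content": "Translate the given classical Chinese (文言文) text into English."},
    {"role": "user", "content": "學而時習之，不亦說乎？"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```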
Fine-tuning Details
Below are the parameters and techniques used during fine-tuning, each followed by an illustrative configuration sketch.
LoRA Parameters
- lora_r: 64
- lora_alpha: 16
- lora_dropout: 0.1
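A peft configuration matching the values above might look like the sketch below; target_modules is an assumption (common projection layers for Llama-3) and is not stated in the original card.

```python
from peft import LoraConfig

# LoRA settings from the list above; target_modules is an assumed, typical choice.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```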
Quantization
The model uses bitsandbytes for 4-bit quantization, reducing memory use during fine-tuning:
- use_4bit: True - Enables 4-bit quantization.
- bnb_4bit_compute_dtype: "float16" - The data type used for computation on the quantized weights.
- bnb_4bit_quant_type: "nf4" - The quantization type (NormalFloat4).
- use_nested_quant: False - Nested (double) quantization is not used.
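These settings correspond to a bitsandbytes configuration roughly like the sketch below (passed to from_pretrained via quantization_config):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with float16 compute and no nested (double) quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
)
```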
Training Arguments
Settings for training the model are as follows:
- num_train_epochs: 10
- fp16: False
- bf16: True - Optimized for A100 GPUs using Brain Floating Point (bf16).
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 4
- gradient_checkpointing: True
- max_grad_norm: 0.3
- learning_rate: 0.0002
- weight_decay: 0.001
- optim: "paged_adamw_32bit"
- lr_scheduler_type: "cosine"
- max_steps: -1
- warmup_ratio: 0.03
- group_by_length: True
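A transformers TrainingArguments sketch mirroring the values above; output_dir is a placeholder, not taken from the original card.

```python
from transformers import TrainingArguments

# Training settings from the list above; output_dir is a hypothetical placeholder.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=10,
    fp16=False,
    bf16=True,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
)
```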