Alpesteibock-Llama-3-8B-Alpha

Alpesteibock-Llama-3-8B-Alpha is an experimental QLoRA fine-tune of NousResearch/Hermes-2-Pro-Llama-3-8B, trained for two epochs on 34.7 million tokens of Swiss German text drawn from multiple sources.

License

This model is released under the Llama 3 Community License.

Usage

The model uses ChatML as its instruction template and was trained with "You are Alpesteibock, a helpful assistant who speaks Swiss German." as the system message:

<|im_start|>system
You are Alpesteibock, a helpful assistant who speaks Swiss German.<|im_end|>
<|im_start|>user
Hoi. Wie heissisch du?<|im_end|>
<|im_start|>assistant
Ich bi de Alpesteibock und ich freu mi uf di.<|im_end|>
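
The template above can be assembled by hand when a chat template is not available. The following is a minimal sketch in plain Python; the function name `build_chatml_prompt` and its parameters are hypothetical and not part of the model or any library. It produces the same layout as the example and leaves the assistant turn open so the model can generate the reply.

```python
# System message the model was trained with (taken from the Usage section).
SYSTEM_MESSAGE = "You are Alpesteibock, a helpful assistant who speaks Swiss German."


def build_chatml_prompt(user_message, system_message=SYSTEM_MESSAGE, history=()):
    """Format a conversation as ChatML, ending with an open assistant turn.

    `history` is an optional sequence of (role, content) pairs for earlier
    user/assistant turns. This helper is illustrative only.
    """
    turns = [("system", system_message), *history, ("user", user_message)]
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>\n" for role, content in turns]
    # Open the assistant turn without closing it, so generation continues here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt("Hoi. Wie heissisch du?")
print(prompt)
```

If the tokenizer shipped with the model defines a chat template, `tokenizer.apply_chat_template` from the transformers library would normally be the preferred way to build this string; whether it does is an assumption to verify before relying on it.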

Dataset

The dataset used for training consists of the following sources:

| Dataset | File Size | Description | Phase |
| --- | --- | --- | --- |
| Glot500 Corpus (gsw_Latn, Leipzig_web) | 21.7 MB | Text, usually sentences, crawled from the web | 1 |
| Alemannic Wikipedia (Subset) | 50.5 MB | Articles in the Alemannic Wikipedia with most of those written in Alsatian filtered out | 2 |
| Schweizerdeutscher Mundartkorpus (Copyright Free Subset) | 28.4 MB | Copyright free books written in Swiss German | 2 |
| GlotCC-V1.0 (gsw-Latn) | 7.5 MB | Document-level general domain monolingual dataset derived from CommonCrawl | 2 |
| Synthetic Instruction Data | 1.7 MB | Different datasets of synthetically generated Swiss German text | 2 |
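
For a rough sense of how the data is split between the two phases, the file sizes listed above can be summed per phase. This is only a back-of-the-envelope sketch: it assumes 1 MB = 10^6 bytes and that the 34.7 million tokens cover all five sources, neither of which is stated in this card. The variable names are illustrative.

```python
# (name, file size in MB, training phase), copied from the dataset listing.
sources = [
    ("Glot500 Corpus (gsw_Latn, Leipzig_web)", 21.7, 1),
    ("Alemannic Wikipedia (Subset)", 50.5, 2),
    ("Schweizerdeutscher Mundartkorpus (Copyright Free Subset)", 28.4, 2),
    ("GlotCC-V1.0 (gsw-Latn)", 7.5, 2),
    ("Synthetic Instruction Data", 1.7, 2),
]

# Sum file sizes overall and per phase.
total_mb = sum(size for _, size, _ in sources)
phase_mb = {}
for _, size, phase in sources:
    phase_mb[phase] = phase_mb.get(phase, 0.0) + size

# Assumption: the 34.7M tokens span all sources and 1 MB = 1e6 bytes.
TOTAL_TOKENS = 34.7e6
bytes_per_token = total_mb * 1e6 / TOTAL_TOKENS

print(f"total: {total_mb:.1f} MB")
for phase in sorted(phase_mb):
    print(f"phase {phase}: {phase_mb[phase]:.1f} MB")
print(f"approx. bytes per token: {bytes_per_token:.2f}")
```

Under these assumptions, phase 1 accounts for about a fifth of the raw text by size and phase 2 for the rest.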

Training Details

Hardware: 1x RTX 4090
Duration: 40 hours in total (2 hours for first phase and 38 hours for second phase)

Hyperparameters

Adapter: QLoRA
Precision: 4-bit
Optimizer: adamw_bnb_8bit
LoRA Rank: 256
LoRA Alpha: 256
Learning Rate: 1e-5
Scheduler: Cosine
Context Length: 4096
Batch Size: 1
Gradient Accumulation Steps: 1
Sample Packing: On for first phase, Off for second phase
Epochs: 2
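
A few quantities follow directly from the values listed above. The sketch below derives them in plain Python; it assumes the conventional LoRA scaling factor of alpha / rank (a rank-stabilized variant would scale differently) and the single GPU listed under Training Details. The dictionary keys are illustrative and do not correspond to any particular training framework's config format.

```python
# Hyperparameters as listed in this card (key names are illustrative).
hyperparameters = {
    "lora_rank": 256,
    "lora_alpha": 256,
    "learning_rate": 1e-5,
    "context_length": 4096,
    "batch_size": 1,
    "gradient_accumulation_steps": 1,
    "epochs": 2,
}
NUM_GPUS = 1  # "1x RTX 4090" from Training Details

# Conventional LoRA scales the adapter update by alpha / rank (assumption).
lora_scaling = hyperparameters["lora_alpha"] / hyperparameters["lora_rank"]

# Sequences contributing to each optimizer step.
effective_batch_size = (
    hyperparameters["batch_size"]
    * hyperparameters["gradient_accumulation_steps"]
    * NUM_GPUS
)

# Upper bound on tokens per optimizer step; reached only when sequences
# fill the whole context, as with sample packing.
max_tokens_per_step = effective_batch_size * hyperparameters["context_length"]

print(f"LoRA scaling factor: {lora_scaling}")
print(f"effective batch size: {effective_batch_size}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
```

With an effective batch size of one sequence, each optimizer step sees at most 4096 tokens, and with sample packing off (second phase) typically fewer.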
