kaizuberbuehler/Alpesteibock-Llama-3-8B-Alpha

	---
	license: llama3
	language:
	- gsw
	datasets:
	- cis-lmu/GlotCC-V1
	pipeline_tag: text-generation
	base_model: NousResearch/Hermes-2-Pro-Llama-3-8B
	---

	# Alpesteibock-Llama-3-8B-Alpha

	Alpesteibock-Llama-3-8B-Alpha is an experimental QLoRA fine-tune of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) on a dataset of more than 28 million tokens of Swiss German text from multiple sources.
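
A minimal sketch of how the model might be used for plain text continuation with the `transformers` library. It assumes that this repository holds merged weights loadable through `AutoModelForCausalLM`, that a bf16-capable GPU is available, and that a raw (non-chat) prompt is appropriate; none of this is confirmed by the card. The `generate` helper and its defaults are hypothetical.

```python
MODEL_ID = "kaizuberbuehler/Alpesteibock-Llama-3-8B-Alpha"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Continue `prompt` with the fine-tuned model and return the decoded text."""
    # Imported lazily so the heavy dependencies are only needed when generating.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16-capable GPU
        device_map="auto",           # requires the `accelerate` package
    )
    # Assumption: plain continuation, no chat template applied.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Grüezi mitenand, ")` would download the weights on first use and return the prompt followed by the model's continuation.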

	## License

	This model is released under the [Llama 3 Community License](https://llama.meta.com/llama3/license/).

	## Dataset

	| Dataset | File Size | Description | Phase |
	|---------|-----------|-------------|-------|
	| [Alemannic Wikipedia](https://dumps.wikimedia.org/alswiki/) (Subset) | 50.5 MB | Articles in the Alemannic Wikipedia with most of those written in Alsatian filtered out | 2 |
	| [Schweizerdeutscher Mundartkorpus](https://chmk.ch/) (Copyright-Free Subset) | 28.4 MB | Copyright-free books written in Swiss German | 2 |
	| [GlotCC-V1.0](https://huggingface.co/datasets/cis-lmu/GlotCC-V1) (gsw-Latn) | 7.5 MB | Document-level general domain monolingual dataset derived from CommonCrawl | 2 |
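
To get a rough sense of how the listed sources are weighted, the file sizes in the table can be turned into shares of the total. This is only a sketch: file size is a proxy for token count, the card gives no per-source token counts, and the `size_shares` helper is hypothetical.

```python
# File sizes in MB, copied from the table above.
FILE_SIZES_MB = {
    "Alemannic Wikipedia (subset)": 50.5,
    "Schweizerdeutscher Mundartkorpus (copyright-free subset)": 28.4,
    "GlotCC-V1.0 (gsw-Latn)": 7.5,
}


def size_shares(sizes_mb: dict[str, float]) -> dict[str, float]:
    """Return each source's fraction of the combined file size."""
    total = sum(sizes_mb.values())
    return {name: size / total for name, size in sizes_mb.items()}


total_mb = sum(FILE_SIZES_MB.values())
print(f"Total: {total_mb:.1f} MB")
for name, share in size_shares(FILE_SIZES_MB).items():
    print(f"{name}: {share:.1%}")
```

By file size, the Wikipedia subset makes up roughly 58% of the listed data, the Mundartkorpus subset about 33%, and the GlotCC subset about 9%.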

	## Training Details

	- Hardware: 1x RTX 4090
	- Duration: 40 hours in total (2 hours for the first phase and 38 hours for the second phase)

	### Hyperparameters

	- Adapter: QLoRA
	- Precision: 4-bit
	- Optimizer: adamw_bnb_8bit
	- LoRA Rank: 256
	- LoRA Alpha: 256
	- Learning Rate: 1e-5
	- Scheduler: Cosine
	- Context Length: 4096
	- Batch Size: 1
	- Gradient Accumulation Steps: 1
	- Sample Packing: On for first phase, Off for second phase
	- Epochs: 2
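
The sketch below illustrates what these values imply, and how they might map onto `peft` and `transformers` configuration objects. Several points are assumptions rather than facts from this card: the adapter target modules (all linear projections here), the NF4 quantization type, the bf16 compute dtype, and a cosine schedule with no warmup and no learning-rate floor. The helper names are hypothetical, and this is not the training script that produced the model.

```python
import math

# Values copied from the hyperparameter list above.
HPARAMS = {
    "lora_rank": 256,
    "lora_alpha": 256,
    "learning_rate": 1e-5,
    "context_length": 4096,
    "batch_size": 1,
    "gradient_accumulation_steps": 1,
    "epochs": 2,
}

# (in_features, out_features) of the linear layers in one Llama 3 8B block.
# Assumption: every projection carries an adapter; the card lists no targets.
LLAMA3_8B_LINEAR_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update by alpha / rank."""
    return alpha / rank


def lora_trainable_params(rank: int) -> int:
    """Each adapted layer adds A (rank x in) and B (out x rank) matrices."""
    per_block = sum(
        rank * (fan_in + fan_out)
        for fan_in, fan_out in LLAMA3_8B_LINEAR_SHAPES.values()
    )
    return per_block * NUM_LAYERS


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr to 0 (assumes no warmup, no LR floor)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


def build_qlora_configs():
    """Return (BitsAndBytesConfig, LoraConfig) using the values above."""
    # Lazy imports: only needed when actually preparing a training run.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # assumption
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
    )
    lora = LoraConfig(
        r=HPARAMS["lora_rank"],
        lora_alpha=HPARAMS["lora_alpha"],
        target_modules=list(LLAMA3_8B_LINEAR_SHAPES),  # assumption
        task_type="CAUSAL_LM",
    )
    return quant, lora
```

With rank and alpha both at 256 the LoRA scaling factor is 1.0. If all seven projections were adapted, the adapter would hold about 671 million trainable parameters. A batch size of 1 with no gradient accumulation means each optimizer step sees at most one 4096-token sequence.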