
Model Card

Model Details

  • Model Type: Fine-tuned LLaMA3-8B-Instruct
  • Task: German text classification by CEFR level
  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct
  • Training Approach: Supervised Fine-Tuning (SFT)
  • Framework: Transformers (Hugging Face)
  • Fine-tuning Method: Low-Rank Adaptation (LoRA)
  • License: CC BY-SA 4.0

For further details (e.g., the prompting setup), refer to my bachelor thesis.

Intended Use

This model is designed to classify German texts according to the Common European Framework of Reference for Languages (CEFR) levels (A1, A2, B1, B2, C1, C2); a minimal usage sketch follows the list below. It can be used for:

  • Automated assessment of German language proficiency
  • Placement testing in language learning environments
  • Research in computational linguistics and language education
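As a rough illustration, the snippet below shows one way to query the model for a CEFR label with the Transformers library. It assumes the LoRA weights are already merged into the published checkpoint, and the prompt wording is only a placeholder; the exact prompt used during fine-tuning is documented in the thesis.

```python
# Minimal inference sketch (assumptions: the LoRA weights are already merged
# into the published checkpoint, and the prompt wording below is only an
# illustration -- the prompt actually used for fine-tuning is in the thesis).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EliasAhl/llama-3-8b-Instruct-cefr-tuned-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

text = "Ich heiße Anna und wohne in Berlin. Ich lerne Deutsch."
messages = [
    {"role": "system", "content": "Classify the CEFR level (A1, A2, B1, B2, C1, C2) of the following German text. Answer with the level only."},
    {"role": "user", "content": text},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=8, do_sample=False)

# Decode only the newly generated tokens, e.g. "B1"
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```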

Training Data

The model was fine-tuned on a dataset of approximately 1,500 German texts spanning all six CEFR levels. The distribution of samples across levels is as follows:

  • A1: 179 samples
  • A2: 306 samples
  • B1: 331 samples
  • B2: 376 samples
  • C1: 179 samples
  • C2: 196 samples

Training Procedure

  • Fine-tuning Method: Low-Rank Adaptation (LoRA) with a rank of 64
  • Optimizer: AdamW (8-bit variant)
  • Learning Rate: 2e-4
  • Number of Epochs: 5
  • Batch Size: 1
  • Max Sequence Length: 4096 tokens
  • Hardware: NVIDIA RTX A6000 GPU
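For reference, the sketch below collects these hyperparameters into a PEFT LoraConfig and Transformers TrainingArguments. The target modules, LoRA alpha, dropout, and the choice of trainer (e.g. TRL's SFTTrainer) are assumptions, as the card does not specify them.

```python
# Hyperparameters taken from this card (LoRA rank 64, AdamW 8-bit, lr 2e-4,
# 5 epochs, batch size 1, 4096-token sequences). Target modules, lora_alpha,
# dropout, and the choice of trainer are assumptions not stated in the card.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,                                   # rank from the card
    lora_alpha=64,                          # assumed
    lora_dropout=0.05,                      # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama3-8b-cefr-sft",
    num_train_epochs=5,
    per_device_train_batch_size=1,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",               # 8-bit AdamW variant
    fp16=True,
    logging_steps=10,
)

# max_seq_length=4096 would be passed to the trainer, e.g. trl.SFTTrainer.
```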

Evaluation Results

The model achieved the following performance on the test set:

  • Accuracy: 77.3%
  • Group Accuracy: 100%
  • Weighted F1 Score: 0.7686

Performance varies across CEFR levels (see the metric-computation sketch after the list):

  • A1: F1 score of 0.8571
  • A2: F1 score of 0.7347
  • B1: F1 score of 0.7778
  • B2: F1 score of 0.6809
  • C1: F1 score of 0.7241
  • C2: F1 score of 0.8372
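The accuracy and weighted F1 values above can be recomputed from gold labels and model predictions with scikit-learn; the sketch below uses placeholder lists rather than the actual test set.

```python
# Sketch of how the reported metrics can be recomputed; the label and
# prediction lists below are placeholders, not the actual test set.
from sklearn.metrics import accuracy_score, classification_report, f1_score

labels = ["A1", "A2", "B1", "B2", "C1", "C2", "B2"]   # gold CEFR levels
preds  = ["A1", "A2", "B1", "C1", "C1", "C2", "B2"]   # model predictions

print("Accuracy:", accuracy_score(labels, preds))
print("Weighted F1:", f1_score(labels, preds, average="weighted"))
print(classification_report(labels, preds))           # per-level F1 scores
```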

Limitations and Biases

  • The model may have biases due to the imbalanced distribution of the training data across CEFR levels.
  • Performance is weaker on intermediate levels (especially B2) compared to extreme levels (A1 and C2).
  • The model's performance may not generalize well to texts from domains or styles not represented in the training data.
  • As with all language models, it may reflect biases present in the training data.

Ethical Considerations

  • The model should not be used as the sole determinant of a person's language proficiency level, especially in high-stakes situations.
  • Users should be aware of potential biases and limitations when interpreting the model's outputs.
  • Care should be taken to ensure the model is not used in ways that could unfairly disadvantage language learners or perpetuate linguistic biases.

Citation and Contact

If you use this model in your research, please contact me regarding citation.

For questions or feedback, please contact me at elias@ahlers.click.
