
Model Card

Model Details

  • Model Type: Fine-tuned LLaMA3-8B-Instruct
  • Task: German text classification by CEFR level
  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct
  • Training Approach: Supervised Fine-Tuning (SFT)
  • Framework: Transformers (Hugging Face)
  • Fine-tuning Method: Low-Rank Adaptation (LoRA)
  • License: CC BY-SA 4.0

For further details (e.g., the prompting setup), refer to my bachelor thesis.

Intended Use

This model is designed to classify German texts according to the Common European Framework of Reference for Languages (CEFR) levels (A1, A2, B1, B2, C1, C2); a minimal usage sketch follows the list below. It can be used for:

  • Automated assessment of German language proficiency
  • Placement testing in language learning environments
  • Research in computational linguistics and language education
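As a rough illustration, the snippet below shows one way to query the model for a CEFR label with the Transformers library. It assumes the LoRA weights are already merged into the published checkpoint, and the prompt wording is only a placeholder; the exact prompt used during fine-tuning is documented in the thesis.

```python
# Minimal inference sketch (assumptions: the LoRA weights are already merged
# into the published checkpoint, and the prompt wording below is only an
# illustration -- the prompt actually used for fine-tuning is in the thesis).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EliasAhl/llama-3-8b-Instruct-cefr-tuned-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

text = "Ich heiße Anna und wohne in Berlin. Ich lerne Deutsch."
messages = [
    {"role": "system", "content": "Classify the CEFR level (A1, A2, B1, B2, C1, C2) of the following German text. Answer with the level only."},
    {"role": "user", "content": text},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=8, do_sample=False)

# Decode only the newly generated tokens, e.g. "B1"
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```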

Training Data

The model was fine-tuned on a dataset of approximately 1,500 German texts spanning all six CEFR levels. The distribution of samples across levels is as follows:

  • A1: 179 samples
  • A2: 306 samples
  • B1: 331 samples
  • B2: 376 samples
  • C1: 179 samples
  • C2: 196 samples

Training Procedure

  • Fine-tuning Method: Low-Rank Adaptation (LoRA) with a rank of 64
  • Optimizer: AdamW (8-bit variant)
  • Learning Rate: 2e-4
  • Number of Epochs: 5
  • Batch Size: 1
  • Max Sequence Length: 4096 tokens
  • Hardware: NVIDIA RTX A6000 GPU
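For reference, the sketch below collects these hyperparameters into a PEFT LoraConfig and Transformers TrainingArguments. The target modules, LoRA alpha, dropout, and the choice of trainer (e.g. TRL's SFTTrainer) are assumptions, as the card does not specify them.

```python
# Hyperparameters taken from this card (LoRA rank 64, AdamW 8-bit, lr 2e-4,
# 5 epochs, batch size 1, 4096-token sequences). Target modules, lora_alpha,
# dropout, and the choice of trainer are assumptions not stated in the card.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,                                   # rank from the card
    lora_alpha=64,                          # assumed
    lora_dropout=0.05,                      # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama3-8b-cefr-sft",
    num_train_epochs=5,
    per_device_train_batch_size=1,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",               # 8-bit AdamW variant
    fp16=True,
    logging_steps=10,
)

# max_seq_length=4096 would be passed to the trainer, e.g. trl.SFTTrainer.
```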

Evaluation Results

The model achieved the following performance on the test set:

  • Accuracy: 77.3%
  • Group Accuracy: 100%
  • Weighted F1 Score: 0.7686

Performance varies across CEFR levels (see the metric-computation sketch after the list):

  • A1: F1 score of 0.8571
  • A2: F1 score of 0.7347
  • B1: F1 score of 0.7778
  • B2: F1 score of 0.6809
  • C1: F1 score of 0.7241
  • C2: F1 score of 0.8372
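The accuracy and weighted F1 values above can be recomputed from gold labels and model predictions with scikit-learn; the sketch below uses placeholder lists rather than the actual test set.

```python
# Sketch of how the reported metrics can be recomputed; the label and
# prediction lists below are placeholders, not the actual test set.
from sklearn.metrics import accuracy_score, classification_report, f1_score

labels = ["A1", "A2", "B1", "B2", "C1", "C2", "B2"]   # gold CEFR levels
preds  = ["A1", "A2", "B1", "C1", "C1", "C2", "B2"]   # model predictions

print("Accuracy:", accuracy_score(labels, preds))
print("Weighted F1:", f1_score(labels, preds, average="weighted"))
print(classification_report(labels, preds))           # per-level F1 scores
```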

Limitations and Biases

  • The model may have biases due to the imbalanced distribution of the training data across CEFR levels.
  • Performance is weaker on intermediate levels (especially B2) compared to extreme levels (A1 and C2).
  • The model's performance may not generalize well to texts from domains or styles not represented in the training data.
  • As with all language models, it may reflect biases present in the training data.

Ethical Considerations

  • The model should not be used as the sole determinant of a person's language proficiency level, especially in high-stakes situations.
  • Users should be aware of potential biases and limitations when interpreting the model's outputs.
  • Care should be taken to ensure the model is not used in ways that could unfairly disadvantage language learners or perpetuate linguistic biases.

Citation and Contact

If you use this model in your research, please contact me regarding citation.

For questions or feedback, please contact me at elias@ahlers.click.
