---
license: openrail
language:
- de
metrics:
- f1
- accuracy
- precision
- recall
pipeline_tag: token-classification
tags:
- recipe
- cooking
- entity_recognition
---

A weakly supervised token classification model for German recipe texts, based on `bert-base-german-cased`.

Code available: https://github.com/chefkoch24/weak-ingredient-recognition

Dataset: https://www.kaggle.com/datasets/sterby/german-recipes-dataset

The model recognizes the following entity labels (BIO tagging scheme, label to ID):

'O': 0,<br>
'B-INGREDIENT': 1,<br>
'I-INGREDIENT': 2,<br>
'B-UNIT': 3,<br>
'I-UNIT': 4,<br>
'B-QUANTITY': 5,<br>
'I-QUANTITY': 6<br>
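
To make the label scheme concrete, here is a minimal sketch of how per-token label IDs could be decoded into entity spans. The mapping is copied from the list above. The helper name `decode_bio`, the example tokens, and the assumption that tokens are whole words (not subwords) are illustrative only and are not taken from the released code.

```python
# Label-to-ID mapping as listed in this model card.
LABEL2ID = {
    "O": 0,
    "B-INGREDIENT": 1,
    "I-INGREDIENT": 2,
    "B-UNIT": 3,
    "I-UNIT": 4,
    "B-QUANTITY": 5,
    "I-QUANTITY": 6,
}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}


def decode_bio(tokens, label_ids):
    """Group tokens into (entity_type, text) spans from BIO label IDs.

    Hypothetical helper for illustration; tokens are assumed to be whole words.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, label_id in zip(tokens, label_ids):
        prefix, _, entity_type = ID2LABEL[label_id].partition("-")
        if prefix == "I" and entity_type == current_type:
            # Continuation of the span that is currently open.
            current_tokens.append(token)
            continue
        # Any other label closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        if prefix == "O":
            current_type, current_tokens = None, []
        else:
            # "B-" starts a new span; a stray "I-" is also treated as a start.
            current_type, current_tokens = entity_type, [token]
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Made-up example: "200 g Mehl und 1 Prise Salz"
tokens = ["200", "g", "Mehl", "und", "1", "Prise", "Salz"]
label_ids = [5, 3, 1, 0, 5, 3, 1]
print(decode_bio(tokens, label_ids))
```

For this made-up input the result is six single-token spans: a quantity, a unit, and an ingredient for each of the two items. A multi-word ingredient would be labeled `B-INGREDIENT` followed by `I-INGREDIENT` and merged into one span.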

**Training:**

epochs: 2<br>
optimizer: Adam<br>
learning rate: 2e-5<br>
max length: 512<br>
recipes: 7801<br>

The model was trained on a single GeForce RTX 2080 GPU with 11 GB of memory.

**Metrics on test set (weakly supervised)**:

accuracy_token: 0.9965656995773315<br>
f1_token: 0.9965656995773315<br>
precision_token: 0.9965656995773315<br>
recall_token: 0.9965656995773315<br>
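
The four reported values are identical. One situation in which this happens is micro-averaging over all classes in a single-label task, where precision, recall, F1, and accuracy coincide by construction. Whether the evaluation script computes the metrics this way is an assumption. The sketch below only demonstrates the arithmetic on made-up label IDs, and `micro_metrics` is a hypothetical helper, not part of the released code.

```python
def micro_metrics(y_true, y_pred, num_classes):
    """Compute micro-averaged token-level metrics for single-label predictions."""
    tp = fp = fn = 0
    for c in range(num_classes):
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1  # predicted class c and it was correct
            elif p == c and t != c:
                fp += 1  # predicted class c but the true class differs
            elif p != c and t == c:
                fn += 1  # true class c but something else was predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up label IDs (0..6 as in the label list); one of eight tokens is wrong.
y_true = [5, 3, 1, 0, 0, 1, 2, 0]
y_pred = [5, 3, 1, 0, 1, 1, 2, 0]
print(micro_metrics(y_true, y_pred, num_classes=7))
```

Each wrong token counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so all four values equal 7/8 = 0.875 here. Macro-averaged or per-class scores would generally differ from one another.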