---
license: mit
language:
- el
pipeline_tag: text-classification
---

# GreekDeBERTa-base

**GreekDeBERTa-base** is a language model pre-trained specifically for Greek Natural Language Processing (NLP) tasks. It is based on the DeBERTa architecture and was pre-trained with a Masked Language Modeling (MLM) objective.

## Model Details

- **Model Architecture**: DeBERTa-base
- **Language**: Greek
- **Pre-training Objective**: Masked Language Modeling (MLM)
- **Tokenizer**: SentencePiece model (`spm.model`)
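
The architecture hyperparameters are stored in `config.json` and can be inspected directly. A minimal sketch using `AutoConfig` from `transformers` (prints hidden size, layer count, attention heads, and so on):

```python
from transformers import AutoConfig

# Download and parse config.json from the Hub
config = AutoConfig.from_pretrained("AI-team-UoA/GreekDeBERTa-base")
print(config)
```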

## Model Files

The following files are included in the repository:

- `config.json`: The model configuration file used by the DeBERTa-base architecture.
- `pytorch_model.bin`: The pre-trained model weights in PyTorch format.
- `spm.model`: The SentencePiece model file used for tokenization (see the sketch after this list).
- `vocab.txt`: A human-readable vocabulary file that contains the list of tokens used by the model.
- `tokenizer_config.json`: Configuration file for the tokenizer.
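
To see the SentencePiece tokenizer in action, you can load it on its own and tokenize a Greek sentence. A small sketch (the example sentence is arbitrary):

```python
from transformers import AutoTokenizer

# Loads spm.model and tokenizer_config.json from the repository
tokenizer = AutoTokenizer.from_pretrained("AI-team-UoA/GreekDeBERTa-base")

# SentencePiece subword tokenization of a Greek sentence
print(tokenizer.tokenize("Η Αθήνα είναι η πρωτεύουσα της Ελλάδας."))
```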

## How to Use

You can load the model in Python with the Hugging Face `transformers` library. The example below loads the checkpoint for token classification; since this is a base pre-trained model, the token-classification head is randomly initialized and must be fine-tuned on labeled data before use:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("AI-team-UoA/GreekDeBERTa-base")
model = AutoModelForTokenClassification.from_pretrained("AI-team-UoA/GreekDeBERTa-base")
```
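
Because the checkpoint was pre-trained with an MLM objective, it can also be queried directly for masked-token predictions. A minimal sketch using the `fill-mask` pipeline, assuming the released weights include the MLM head (the example sentence is illustrative):

```python
from transformers import pipeline

# Masked-token prediction with the pre-trained MLM head
fill_mask = pipeline("fill-mask", model="AI-team-UoA/GreekDeBERTa-base")

for prediction in fill_mask("Η Αθήνα είναι η [MASK] της Ελλάδας."):
    print(prediction["token_str"], prediction["score"])
```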