# Custom BERT Model for Text Classification |
|
|
|
## Model Description |
|
|
|
This is a custom BERT model fine-tuned for text classification. It was trained on a subset of a publicly available dataset and predicts one of three classes.
|
|
|
## Training Details |
|
|
|
- **Architecture**: BERT Base Multilingual Cased |
|
- **Training data**: Custom dataset |
|
- **Preprocessing**: Tokenized with the BERT tokenizer, using a maximum sequence length of 80.

- **Fine-tuning**: Trained for one epoch with a learning rate of 2e-5, using the AdamW optimizer and cross-entropy loss (a hedged sketch of this setup appears after this list).
|
- **Evaluation Metrics**: Accuracy on a held-out validation set. |
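
The setup above corresponds to a short PyTorch fine-tuning loop. The following is a minimal sketch, not the actual training script: it loads the standard `bert-base-multilingual-cased` checkpoint, and the example texts and labels are hypothetical placeholders for the real dataset.

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3
)

# Placeholder examples standing in for the custom dataset.
texts = ["first example", "second example"]
labels = torch.tensor([0, 2])

# Tokenize with the max sequence length used in training (80).
batch = tokenizer(
    texts, padding=True, truncation=True, max_length=80, return_tensors="pt"
)

optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(1):  # trained for a single epoch
    optimizer.zero_grad()
    # The model applies cross-entropy internally when labels are passed.
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
```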
|
|
|
## How to Use |
|
|
|
### Dependencies |
|
- `transformers` 4.x

- `torch` (PyTorch) 1.x
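
Both can be installed with `pip install transformers torch`.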
|
|
|
### Code Snippet |
|
|
|
For classification: |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("billfass/my_bert_model")
model = AutoModelForSequenceClassification.from_pretrained("billfass/my_bert_model")

text = "Your example text here."

inputs = tokenizer(text, padding=True, truncation=True, max_length=80, return_tensors="pt")
labels = torch.tensor([1])  # shape (batch_size,): one label for the single example

outputs = model(**inputs, labels=labels)
loss = outputs.loss
logits = outputs.logits

# To get class probabilities and the predicted class index:
probs = torch.softmax(logits, dim=-1)
predicted_class = probs.argmax(dim=-1).item()
```
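
For plain inference without computing a loss, the same checkpoint can also be loaded through the `pipeline` API. This is a minimal sketch, assuming the hosted config carries the label mapping:

```python
from transformers import pipeline

# Loads both the tokenizer and the model from the checkpoint.
classifier = pipeline("text-classification", model="billfass/my_bert_model")

# Returns the top label and its score,
# e.g. [{'label': 'LABEL_1', 'score': 0.87}] (illustrative values).
print(classifier("Your example text here."))
```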
|
|
|
## Limitations and Bias |
|
|
|
- Trained on a specific dataset, so it may not generalize well to other domains or kinds of text.

- Built on multilingual cased BERT, so it is not optimized for any single language.
|
|
|
## Authors |
|
|
|
- **Fassinou Bile** (billfass2010@gmail.com)
|
|
|
## Acknowledgments |
|
|
|
Special thanks to Hugging Face for providing the Transformers library that made this project possible. |
|
|
|