# Custom BERT Model for Text Classification

## Model Description

This is a custom BERT model fine-tuned for text classification. It was trained on a subset of a publicly available dataset and classifies text into three classes.

## Training Details

- **Architecture**: BERT Base Multilingual Cased (`bert-base-multilingual-cased`)
- **Training data**: Custom dataset (a subset of a publicly available dataset)
- **Preprocessing**: Tokenized with the BERT tokenizer, using a maximum sequence length of 80.
- **Fine-tuning**: The model was trained for 1 epoch with a learning rate of 2e-5, the AdamW optimizer, and cross-entropy loss (a minimal sketch of this setup is shown after this list).
- **Evaluation Metrics**: Accuracy on a held-out validation set.
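
The original training script is not published; the following is a minimal sketch of the setup described above. The texts, labels, and batch size here are placeholders, not the actual training configuration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder data; the real training set is not published.
train_texts = ["example one", "example two"]
train_labels = [0, 2]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3
)

# Preprocessing as described: BERT tokenizer, max sequence length 80.
enc = tokenizer(train_texts, padding=True, truncation=True, max_length=80, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)  # batch size is an assumption

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for input_ids, attention_mask, labels in loader:  # 1 epoch, as in the card
    optimizer.zero_grad()
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    outputs.loss.backward()  # cross-entropy loss is computed internally by the model
    optimizer.step()
```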
  
## How to Use

### Dependencies
- Transformers 4.x
- PyTorch (`torch`) 1.x
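
Both can be installed with `pip install transformers torch`.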

### Code Snippet

For classification:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("billfass/my_bert_model")
model = AutoModelForSequenceClassification.from_pretrained("billfass/my_bert_model")
model.eval()

text = "Your example text here."

inputs = tokenizer(text, padding=True, truncation=True, max_length=80, return_tensors="pt")
labels = torch.tensor([1])  # shape (batch_size,); here batch size 1

with torch.no_grad():
    outputs = model(**inputs, labels=labels)

loss = outputs.loss
logits = outputs.logits

# Convert logits to probabilities and a predicted class:
probs = torch.softmax(logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
```
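
The `labels` argument is optional and only needed to compute a loss; for pure inference, omit it and read the prediction from `logits` (or `probs`) directly, as in the last line above.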

## Limitations and Bias

- Trained on a specific dataset, so it may not generalize well to other kinds of text.
- Built on multilingual cased BERT, so it is not optimized for any single language.

## Authors

- **Fassinou Bile** (billfass2010@gmail.com)
  
## Acknowledgments

Special thanks to Hugging Face for providing the Transformers library that made this project possible.

---