DistilBERT Goodreads Genre Classifier
A fine-tuned DistilBERT (distilbert-base-cased) model for classifying Goodreads book reviews into 8 genres.
Model Details
- Developed by: Prakhar
- Model type: Transformer (DistilBERT) for Sequence Classification
- Language: English
- Finetuned from: distilbert-base-cased
- Number of labels: 8
Genres (Labels)
| ID | Genre |
|---|---|
| 0 | children |
| 1 | comics_graphic |
| 2 | fantasy_paranormal |
| 3 | history_biography |
| 4 | mystery_thriller_crime |
| 5 | poetry |
| 6 | romance |
| 7 | young_adult |
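The table above can be written down as a plain mapping, which is handy when decoding predictions. This is a minimal sketch: `to_genre` is a hypothetical helper, not part of the model repository. It assumes that the pipeline may return either a genre name or a generic `LABEL_<id>` string, which is the default label format in Transformers when a model config carries no custom `id2label`.

```python
# Label mapping copied from the "Genres (Labels)" table.
ID2LABEL = {
    0: "children",
    1: "comics_graphic",
    2: "fantasy_paranormal",
    3: "history_biography",
    4: "mystery_thriller_crime",
    5: "poetry",
    6: "romance",
    7: "young_adult",
}
LABEL2ID = {genre: idx for idx, genre in ID2LABEL.items()}


def to_genre(label: str) -> str:
    """Hypothetical helper: turn a pipeline label into a genre name.

    Handles the generic "LABEL_<id>" format as well as labels that
    are already genre names.
    """
    if label.startswith("LABEL_"):
        return ID2LABEL[int(label.split("_", 1)[1])]
    return label


print(to_genre("LABEL_4"))
```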
Training Details
Training Data
- Source: UCSD Goodreads Book Graph
- Size: 6,400 training samples (800 per genre), 1,600 test samples (200 per genre)
- Preprocessing: Tokenized with DistilBertTokenizerFast, max_length=512
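A class-balanced split like the one described above (800 training and 200 test reviews per genre) could be drawn roughly as follows. This is only a sketch under stated assumptions: the card does not say how the samples were selected, and `balanced_split`, its parameters, and the `(text, genre)` record shape are hypothetical.

```python
import random
from collections import defaultdict


def balanced_split(records, train_per_genre=800, test_per_genre=200, seed=0):
    """Hypothetical helper: draw a balanced train/test split.

    records is an iterable of (text, genre) pairs. Each genre
    contributes exactly train_per_genre + test_per_genre samples,
    and no sample appears in both splits.
    """
    by_genre = defaultdict(list)
    for text, genre in records:
        by_genre[genre].append(text)

    rng = random.Random(seed)
    needed = train_per_genre + test_per_genre
    train, test = [], []
    for genre in sorted(by_genre):
        texts = by_genre[genre]
        if len(texts) < needed:
            raise ValueError(f"{genre}: need {needed} reviews, found {len(texts)}")
        picked = rng.sample(texts, needed)  # sampling without replacement
        train += [(t, genre) for t in picked[:train_per_genre]]
        test += [(t, genre) for t in picked[train_per_genre:]]

    rng.shuffle(train)  # avoid genre-ordered training batches
    return train, test
```

With 8 genres and the default sizes, this yields 6,400 training and 1,600 test pairs.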
Training Hyperparameters
- Epochs: 3
- Batch size: 2 (effective batch size of 10 with 5 gradient accumulation steps)
- Learning rate: 5e-5
- Weight decay: 0.01
- Warmup steps: 100
- Training regime: fp16 mixed precision
- Training time: ~18 minutes on an NVIDIA GPU
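The hyperparameters above imply a fairly short schedule. The arithmetic below is a sketch that assumes a single GPU and that a partial final batch counts as a full step; the variable names are illustrative and not taken from the training script.

```python
import math

# Values from the training details above.
train_samples = 6400
batch_size = 2
grad_accum_steps = 5
epochs = 3
warmup_steps = 100

# Samples seen per optimizer update.
effective_batch = batch_size * grad_accum_steps  # 10

# Forward/backward passes per epoch, then optimizer updates per epoch.
micro_batches_per_epoch = math.ceil(train_samples / batch_size)  # 3200
updates_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum_steps)  # 640

total_updates = updates_per_epoch * epochs  # 1920
warmup_fraction = warmup_steps / total_updates

print(f"effective batch size: {effective_batch}")
print(f"optimizer updates:    {total_updates}")
print(f"warmup share:         {warmup_fraction:.1%}")
```

Under these assumptions, the 100 warmup steps cover roughly the first 5% of training.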
Evaluation Results
| Metric | Score |
|---|---|
| Accuracy | 0.6081 |
| Weighted F1 | 0.6054 |
| Weighted Precision | 0.6034 |
| Loss | 1.2691 |
Per-Genre Performance
| Genre | Precision | Recall | F1-Score |
|---|---|---|---|
| children | 0.62 | 0.66 | 0.64 |
| comics_graphic | 0.76 | 0.77 | 0.77 |
| fantasy_paranormal | 0.43 | 0.43 | 0.43 |
| history_biography | 0.59 | 0.60 | 0.60 |
| mystery_thriller_crime | 0.60 | 0.62 | 0.61 |
| poetry | 0.78 | 0.81 | 0.79 |
| romance | 0.62 | 0.61 | 0.61 |
| young_adult | 0.43 | 0.36 | 0.39 |
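Because the test set is balanced (200 reviews per genre), the weighted averages in Evaluation Results should equal the plain macro averages of this table, and macro recall should equal accuracy. The sketch below checks that from the rounded per-genre values, so small differences in the third decimal place are expected.

```python
# (precision, recall, f1) per genre, copied from the table above.
per_genre = {
    "children": (0.62, 0.66, 0.64),
    "comics_graphic": (0.76, 0.77, 0.77),
    "fantasy_paranormal": (0.43, 0.43, 0.43),
    "history_biography": (0.59, 0.60, 0.60),
    "mystery_thriller_crime": (0.60, 0.62, 0.61),
    "poetry": (0.78, 0.81, 0.79),
    "romance": (0.62, 0.61, 0.61),
    "young_adult": (0.43, 0.36, 0.39),
}

n = len(per_genre)
macro_precision = sum(p for p, _, _ in per_genre.values()) / n
macro_recall = sum(r for _, r, _ in per_genre.values()) / n
macro_f1 = sum(f for _, _, f in per_genre.values()) / n

# Each reported F1 should be close to the harmonic mean of P and R.
max_f1_gap = max(
    abs(f - 2 * p * r / (p + r)) for p, r, f in per_genre.values()
)

print(f"macro precision: {macro_precision:.4f} (reported weighted: 0.6034)")
print(f"macro recall:    {macro_recall:.4f} (reported accuracy: 0.6081)")
print(f"macro F1:        {macro_f1:.4f} (reported weighted: 0.6054)")
print(f"largest F1 gap:  {max_f1_gap:.4f}")
```

The averages land within rounding distance of the reported scores. The two weakest genres, fantasy_paranormal and young_adult, pull the overall F1 down from the 0.6 to 0.8 range of the other six.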
How to Use
from transformers import pipeline
classifier = pipeline("text-classification", model="Prakhar54-byte/distilbert-goodreads-genre-classifier")
result = classifier("A thrilling mystery novel with unexpected twists")
print(result)
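The pipeline call above returns a label and a score. For a single-label classifier with 8 outputs, that score is normally the softmax probability of the winning class. The sketch below reproduces that post-processing step in plain Python so the ranking of all genres is visible; the logits are made-up numbers for illustration, not output from this model, and `top_k_genres` is a hypothetical helper.

```python
import math

# Genre order follows the label IDs 0..7 from the label table.
GENRES = [
    "children",
    "comics_graphic",
    "fantasy_paranormal",
    "history_biography",
    "mystery_thriller_crime",
    "poetry",
    "romance",
    "young_adult",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_genres(logits, k=3):
    """Hypothetical helper: return the k most probable (genre, prob) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(GENRES, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for one review, for illustration only.
example_logits = [-1.2, -0.8, 0.9, 0.1, 2.4, -1.5, 0.3, 0.7]
for genre, prob in top_k_genres(example_logits):
    print(f"{genre}: {prob:.3f}")
```

Looking at the top few genres rather than only the first can be useful here, since the per-genre results suggest fantasy_paranormal and young_adult are often confused with other classes.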
Environmental Impact
- Hardware Type: NVIDIA GPU (3.68 GiB VRAM)
- Hours used: ~0.3 hours
- Carbon Emitted: Minimal (local training)