DistilBERT Goodreads Genre Classifier

A fine-tuned DistilBERT (distilbert-base-cased) model for classifying Goodreads book reviews into 8 genres.

Model Details

  • Developed by: Prakhar
  • Model type: Transformer (DistilBERT) for Sequence Classification
  • Language: English
  • Fine-tuned from: distilbert-base-cased
  • Number of labels: 8

Genres (Labels)

ID  Genre
0   children
1   comics_graphic
2   fantasy_paranormal
3   history_biography
4   mystery_thriller_crime
5   poetry
6   romance
7   young_adult
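
For code that consumes the model's predictions, the table above can be written as a pair of lookup dictionaries. This is a minimal sketch: the names GENRES, ID2LABEL and LABEL2ID are illustrative, and this card does not state whether the published checkpoint's config already carries the genre names.

```python
# Genre labels in the order given in the table above (index == label ID).
GENRES = [
    "children",
    "comics_graphic",
    "fantasy_paranormal",
    "history_biography",
    "mystery_thriller_crime",
    "poetry",
    "romance",
    "young_adult",
]

# Illustrative lookup tables; the names are not taken from the model's config.
ID2LABEL = dict(enumerate(GENRES))
LABEL2ID = {genre: idx for idx, genre in ID2LABEL.items()}

print(ID2LABEL[4])  # mystery_thriller_crime
```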

Training Details

Training Data

  • Source: UCSD Goodreads Book Graph
  • Size: 6,400 training samples (800 per genre) and 1,600 test samples (200 per genre)
  • Preprocessing: Tokenized with DistilBertTokenizerFast, max_length=512
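
A class-balanced split with the sizes listed above could be drawn along the following lines. This is only a sketch of one way to do it, not the author's actual sampling code: the function balanced_split, its parameters, and the toy data are all hypothetical.

```python
import random

def balanced_split(reviews_by_genre, n_train=800, n_test=200, seed=0):
    """Draw disjoint, equally sized train and test samples for every genre."""
    rng = random.Random(seed)
    train, test = [], []
    for genre, reviews in sorted(reviews_by_genre.items()):
        if len(reviews) < n_train + n_test:
            raise ValueError(f"not enough reviews for genre {genre!r}")
        # Sample without replacement, then cut into train and test parts.
        picked = rng.sample(reviews, n_train + n_test)
        train += [(text, genre) for text in picked[:n_train]]
        test += [(text, genre) for text in picked[n_train:]]
    rng.shuffle(train)  # mix genres so batches are not single-class
    return train, test

# Toy data standing in for the real reviews (hypothetical).
toy = {g: [f"{g} review {i}" for i in range(1200)] for g in ("poetry", "romance")}
train, test = balanced_split(toy)
print(len(train), len(test))  # 1600 400
```

With all 8 genres this yields the 6,400 / 1,600 sizes reported above.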

Training Hyperparameters

  • Epochs: 3
  • Batch size: 2 (effective batch size of 10 with 5 gradient accumulation steps)
  • Learning rate: 5e-5
  • Weight decay: 0.01
  • Warmup steps: 100
  • Training regime: fp16 mixed precision
  • Training time: ~18 minutes on an NVIDIA GPU
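
The arithmetic behind these settings can be checked with a few lines of Python. The sketch below keys the values by the corresponding transformers.TrainingArguments parameter names, which assumes (the card does not say so) that training used the Hugging Face Trainer on a single device.

```python
import math

# Hyperparameters from the list above, keyed by the matching
# transformers.TrainingArguments parameter names (assumed setup).
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 5,
    "learning_rate": 5e-5,
    "weight_decay": 0.01,
    "warmup_steps": 100,
    "fp16": True,
}

NUM_TRAIN_SAMPLES = 6400  # from the Training Data section

# One optimizer update happens every `gradient_accumulation_steps` batches.
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
steps_per_epoch = math.ceil(NUM_TRAIN_SAMPLES / effective_batch)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
warmup_fraction = training_kwargs["warmup_steps"] / total_steps

print(effective_batch, steps_per_epoch, total_steps)  # 10 640 1920
print(f"{warmup_fraction:.1%}")  # 5.2%
```

Under these assumptions the 100 warmup steps cover roughly the first 5% of the 1,920 optimizer updates.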

Evaluation Results

Metric              Score
Accuracy            0.6081
Weighted F1         0.6054
Weighted Precision  0.6034
Loss                1.2691

Per-Genre Performance

Genre                   Precision  Recall  F1-Score
children                0.62       0.66    0.64
comics_graphic          0.76       0.77    0.77
fantasy_paranormal      0.43       0.43    0.43
history_biography       0.59       0.60    0.60
mystery_thriller_crime  0.60       0.62    0.61
poetry                  0.78       0.81    0.79
romance                 0.62       0.61    0.61
young_adult             0.43       0.36    0.39
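
Because the test set has the same number of samples per genre (200 each), support-weighted averages coincide with plain macro averages, and accuracy equals mean recall. The sketch below recomputes those averages from the rounded per-genre figures above as a consistency check; the variable names are illustrative.

```python
# (precision, recall, f1) per genre, copied from the table above.
per_genre = {
    "children": (0.62, 0.66, 0.64),
    "comics_graphic": (0.76, 0.77, 0.77),
    "fantasy_paranormal": (0.43, 0.43, 0.43),
    "history_biography": (0.59, 0.60, 0.60),
    "mystery_thriller_crime": (0.60, 0.62, 0.61),
    "poetry": (0.78, 0.81, 0.79),
    "romance": (0.62, 0.61, 0.61),
    "young_adult": (0.43, 0.36, 0.39),
}

n = len(per_genre)
# With equal support per class, weighted averages reduce to simple means.
macro_precision = sum(p for p, _, _ in per_genre.values()) / n
macro_recall = sum(r for _, r, _ in per_genre.values()) / n
macro_f1 = sum(f for _, _, f in per_genre.values()) / n

print(f"precision={macro_precision:.4f} recall={macro_recall:.4f} f1={macro_f1:.4f}")
```

The results come out at roughly 0.604, 0.608 and 0.605, which agrees with the reported weighted precision (0.6034), accuracy (0.6081) and weighted F1 (0.6054) up to rounding of the per-genre values.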

How to Use

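from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
classifier = pipeline("text-classification", model="Prakhar54-byte/distilbert-goodreads-genre-classifier")
# Classify one review; the pipeline returns a list of {"label", "score"} dicts.
result = classifier("A thrilling mystery novel with unexpected twists")
print(result)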
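
If the checkpoint's config does not define genre names, Transformers reports labels in its default "LABEL_<id>" form. The helper below is a minimal sketch for mapping such labels back to the genres listed in this card; genre_name is a hypothetical helper, and the example prediction is a made-up placeholder, not real model output.

```python
# Genre labels indexed by label ID, as listed under "Genres (Labels)".
GENRES = [
    "children",
    "comics_graphic",
    "fantasy_paranormal",
    "history_biography",
    "mystery_thriller_crime",
    "poetry",
    "romance",
    "young_adult",
]

def genre_name(label):
    """Map a pipeline label to a genre name.

    Handles the default "LABEL_<id>" form; any other label (for example
    one that is already a genre name) is returned unchanged.
    """
    prefix = "LABEL_"
    suffix = label[len(prefix):]
    if label.startswith(prefix) and suffix.isdigit():
        return GENRES[int(suffix)]
    return label

# Placeholder with the shape of a text-classification pipeline result
# (the score is invented for illustration only).
example = [{"label": "LABEL_4", "score": 0.71}]
print([genre_name(p["label"]) for p in example])  # ['mystery_thriller_crime']
```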

Environmental Impact

  • Hardware Type: NVIDIA GPU (3.68 GiB VRAM)
  • Hours used: ~0.3 (about 18 minutes of training)
  • Carbon Emitted: Minimal (local training)