DistilBERT Goodreads Genre Classifier

A fine-tuned DistilBERT (distilbert-base-cased) model for classifying Goodreads book reviews into 8 genres.

Model Details

Developed by: Prakhar
Model type: Transformer (DistilBERT) for Sequence Classification
Language: English
Fine-tuned from: distilbert-base-cased
Number of labels: 8

Genres (Labels)

ID	Genre
0	children
1	comics_graphic
2	fantasy_paranormal
3	history_biography
4	mystery_thriller_crime
5	poetry
6	romance
7	young_adult
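
The table above can be expressed as a pair of lookup dictionaries. This is a minimal sketch: the names GENRES, ID2LABEL, and LABEL2ID are illustrative, and this card does not state whether the published checkpoint's config stores the genre names.

```python
# Genre names in label-ID order, copied from the table above.
GENRES = [
    "children",
    "comics_graphic",
    "fantasy_paranormal",
    "history_biography",
    "mystery_thriller_crime",
    "poetry",
    "romance",
    "young_adult",
]

# Illustrative lookup tables (names are assumptions, not read from the model config).
ID2LABEL = dict(enumerate(GENRES))
LABEL2ID = {genre: idx for idx, genre in ID2LABEL.items()}

print(ID2LABEL[4])         # genre name for label ID 4
print(LABEL2ID["poetry"])  # label ID for the poetry genre
```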

Training Details

Training Data

Source: UCSD Goodreads Book Graph
Size: 6,400 training samples (800 per genre), 1,600 test samples (200 per genre)
Preprocessing: Tokenized with DistilBertTokenizerFast, max_length=512
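
The per-genre counts above describe a balanced split. One way such a split could be drawn is sketched below. The function name balanced_split, the (text, genre) record layout, and the fixed seed are assumptions for illustration, not the procedure actually used for this model.

```python
import random
from collections import defaultdict


def balanced_split(records, n_train=800, n_test=200, seed=0):
    """Draw n_train + n_test reviews per genre and split them without overlap.

    `records` is an iterable of (text, genre) pairs (hypothetical layout).
    """
    by_genre = defaultdict(list)
    for text, genre in records:
        by_genre[genre].append(text)

    rng = random.Random(seed)
    train, test = [], []
    for genre in sorted(by_genre):
        texts = by_genre[genre]
        if len(texts) < n_train + n_test:
            raise ValueError(f"not enough reviews for genre {genre!r}")
        # Sample without replacement, then cut the sample into two disjoint parts.
        picked = rng.sample(texts, n_train + n_test)
        train += [(t, genre) for t in picked[:n_train]]
        test += [(t, genre) for t in picked[n_train:]]
    return train, test
```

With 8 genres, the default arguments yield 6,400 training and 1,600 test samples, matching the sizes listed above.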

Training Hyperparameters

Epochs: 3
Batch size: 2 (effective batch size of 10 with 5 gradient accumulation steps)
Learning rate: 5e-5
Weight decay: 0.01
Warmup steps: 100
Training regime: fp16 mixed precision
Training time: ~18 minutes on an NVIDIA GPU
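
The arithmetic behind these settings can be checked with a short sketch. The dictionary keys are the matching transformers.TrainingArguments parameter names, which is an assumed mapping: this card does not state which training loop was used. The calculation also assumes a single GPU.

```python
# Hyperparameters from the list above, keyed by the matching
# transformers.TrainingArguments parameter names (assumed mapping).
hparams = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 5,
    "learning_rate": 5e-5,
    "weight_decay": 0.01,
    "warmup_steps": 100,
    "fp16": True,
}

TRAIN_SAMPLES = 6_400  # from the Training Data section

# Assumes one GPU, so the per-device batch is the whole micro-batch.
effective_batch = (
    hparams["per_device_train_batch_size"] * hparams["gradient_accumulation_steps"]
)
steps_per_epoch = TRAIN_SAMPLES // effective_batch
total_steps = steps_per_epoch * hparams["num_train_epochs"]
warmup_fraction = hparams["warmup_steps"] / total_steps

print(effective_batch, steps_per_epoch, total_steps, round(warmup_fraction, 3))
# prints: 10 640 1920 0.052
```

Under these assumptions, training runs for 1,920 optimizer steps, and the 100 warmup steps cover about 5% of them.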

Evaluation Results

Metric	Score
Accuracy	0.6081
Weighted F1	0.6054
Weighted Precision	0.6034
Loss	1.2691
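
For readers unfamiliar with the weighted metrics above, the sketch below computes accuracy, support-weighted precision, and support-weighted F1 from true and predicted label IDs using the standard definitions. The function name weighted_scores is hypothetical, and this card does not state which library produced the reported scores.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy, support-weighted precision, and support-weighted F1."""
    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    precision_sum = f1_sum = 0.0
    for label, n in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        precision_sum += n * precision
        f1_sum += n * f1

    return {
        "accuracy": sum(correct.values()) / total,
        "weighted_precision": precision_sum / total,
        "weighted_f1": f1_sum / total,
    }
```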

Per-Genre Performance

Genre	Precision	Recall	F1-Score
children	0.62	0.66	0.64
comics_graphic	0.76	0.77	0.77
fantasy_paranormal	0.43	0.43	0.43
history_biography	0.59	0.60	0.60
mystery_thriller_crime	0.60	0.62	0.61
poetry	0.78	0.81	0.79
romance	0.62	0.61	0.61
young_adult	0.43	0.36	0.39
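
Because every genre has the same number of test samples (200), the weighted averages reported earlier should equal the plain means of the per-genre columns, and accuracy should equal mean recall. The sketch below runs that consistency check on the rounded table values; the variable names are illustrative.

```python
# Per-genre (precision, recall, F1) copied from the table above.
per_genre = {
    "children": (0.62, 0.66, 0.64),
    "comics_graphic": (0.76, 0.77, 0.77),
    "fantasy_paranormal": (0.43, 0.43, 0.43),
    "history_biography": (0.59, 0.60, 0.60),
    "mystery_thriller_crime": (0.60, 0.62, 0.61),
    "poetry": (0.78, 0.81, 0.79),
    "romance": (0.62, 0.61, 0.61),
    "young_adult": (0.43, 0.36, 0.39),
}

n = len(per_genre)
mean_precision = sum(p for p, _, _ in per_genre.values()) / n
mean_recall = sum(r for _, r, _ in per_genre.values()) / n
mean_f1 = sum(f for _, _, f in per_genre.values()) / n

# Genre with the lowest F1-score.
weakest = min(per_genre, key=lambda g: per_genre[g][2])

print(round(mean_precision, 3), round(mean_recall, 3), round(mean_f1, 3), weakest)
```

The means come out at roughly 0.604, 0.608, and 0.605, which agree with the reported weighted precision (0.6034), accuracy (0.6081), and weighted F1 (0.6054) to within the rounding of the table. The lowest F1-score belongs to young_adult.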

How to Use

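from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="Prakhar54-byte/distilbert-goodreads-genre-classifier")

# The result is a list with one dict per input, holding the top "label" and its "score".
result = classifier("A thrilling mystery novel with unexpected twists")
print(result)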
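
This card does not say whether the pipeline returns genre names or generic labels of the form LABEL_<id>. The sketch below is a small helper that handles both cases. The name to_genre is hypothetical, and the example input is hand-written in the pipeline's output format, with a made-up score.

```python
# Genre names in label-ID order, copied from the Genres (Labels) table.
GENRES = [
    "children",
    "comics_graphic",
    "fantasy_paranormal",
    "history_biography",
    "mystery_thriller_crime",
    "poetry",
    "romance",
    "young_adult",
]


def to_genre(prediction):
    """Map one pipeline result dict to a genre name.

    Handles the generic "LABEL_<id>" form; any other label is
    assumed to already be a genre name and is returned unchanged.
    """
    label = prediction["label"]
    if label.startswith("LABEL_"):
        return GENRES[int(label.split("_", 1)[1])]
    return label


# Hand-written example in the pipeline's output format (score is made up).
example = [{"label": "LABEL_4", "score": 0.5}]
print([to_genre(p) for p in example])
# prints: ['mystery_thriller_crime']
```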

Environmental Impact

Hardware Type: NVIDIA GPU (3.68 GiB VRAM)
Hours used: ~0.3 hours
Carbon Emitted: Minimal (local training)

Model Size

Parameters: 65.8M
Tensor type: F32
Format: Safetensors