Cheese Texture Classifier (DistilBERT)

Model Creator: Rumi Loghmani (@rlogh)
Original Dataset: aslan-ng/cheese-text (by Aslan Noorghasemi)

This model classifies cheese descriptions into one of four texture classes using a fine-tuned DistilBERT model.

Model Description

Architecture: DistilBERT-base-uncased fine-tuned for sequence classification
Task: 4-class texture classification (hard, semi-hard, semi-soft, soft)
Input: Cheese description text (up to 512 tokens)
Output: 4-class probability distribution

Training Details

Data

Dataset: aslan-ng/cheese-text (original split: 100 samples)
Train/Val/Test Split: 70/15/15 (stratified)
Text Source: Cheese descriptions from the dataset
Labels: Texture categories (hard, semi-hard, semi-soft, soft)
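
The stratified split described above can be sketched with the standard library alone. This is a minimal illustration, not the author's actual splitting code: the function name stratified_split, the synthetic label list, and the even 25-per-class balance are all assumptions made for the example. Because the sketch rounds per class, the resulting set sizes can differ by a sample or two from an exact 70/15/15.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that keep per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Size the val/test slices per class; the remainder goes to train.
        n_val = round(len(indices) * fractions[1])
        n_test = round(len(indices) * fractions[2])
        n_train = len(indices) - n_val - n_test
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Hypothetical labels: 100 samples, 25 per texture class.
# The real class balance of aslan-ng/cheese-text is not stated in this card.
classes = ["hard", "semi-hard", "semi-soft", "soft"]
labels = [classes[i % 4] for i in range(100)]

train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed keeps the split reproducible across runs, in line with the seed of 42 listed under Training Setup.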

Preprocessing

Tokenization: DistilBERT tokenizer with 512 max length
Padding: Max length padding
Truncation: Long descriptions truncated to 512 tokens
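
To make the padding and truncation settings concrete, the sketch below mimics what they do to a sequence of token ids. It is a simplified stand-in, not the tokenizer's implementation: pad_or_truncate is a hypothetical helper, the token ids are made up, and a small max_length of 8 is used so the result is easy to read. With the Hugging Face tokenizer, the equivalent behaviour is typically requested with padding="max_length", truncation=True, max_length=512.

```python
# Special-token ids in the distilbert-base-uncased vocabulary.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_or_truncate(token_ids, max_length=512):
    """Wrap ids in [CLS]/[SEP], then truncate or right-pad to max_length."""
    body = token_ids[: max_length - 2]      # leave room for the two special tokens
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    padding = max_length - len(input_ids)
    input_ids += [PAD_ID] * padding
    attention_mask += [0] * padding         # padded positions are masked out
    return input_ids, attention_mask


# Made-up token ids standing in for a short and an over-long description.
short_ids, short_mask = pad_or_truncate([2000, 2001, 2002], max_length=8)
long_ids, long_mask = pad_or_truncate(list(range(2000, 2020)), max_length=8)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Every example ends up with the same length, which simplifies batching at the cost of extra computation on padded positions.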

Training Setup

Model: distilbert-base-uncased
Epochs: 10
Batch Size: 8 (train/val)
Learning Rate: 2e-5
Warmup Steps: 10
Weight Decay: 0.01
Optimizer: AdamW
Scheduler: Linear warmup + linear decay
Mixed Precision: FP16 (if GPU available)
Seed: 42 (for reproducibility)
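
As a rough worked example of the schedule listed above, the sketch below computes the learning rate at a few training steps. It assumes 70 training samples (70% of 100), no gradient accumulation, and a single device, none of which the card states explicitly; the constant and function names are illustrative only.

```python
import math

TRAIN_SAMPLES = 70   # assumption: 70% of the 100-sample dataset
BATCH_SIZE = 8
EPOCHS = 10
WARMUP_STEPS = 10
PEAK_LR = 2e-5

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def learning_rate(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
for step in (0, 5, 10, 50, 90):
    print(f"step {step:>2}: lr = {learning_rate(step):.2e}")
```

Under these assumptions the warmup covers roughly the first epoch, and the rate then falls linearly to zero by the final step.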

Hardware/Compute

Training Device: CPU
Training Time: ~5-10 minutes on GPU
Model Size: ~67M parameters
Memory Usage: ~2-4GB GPU memory

Performance

Test Accuracy: 0.400 (6 of 15 test samples classified correctly)
Test Loss: 1.290

Class-wise Performance

              precision    recall  f1-score   support

        hard       0.50      0.33      0.40         3
   semi-hard       0.29      0.50      0.36         4
   semi-soft       0.40      0.50      0.44         4
        soft       1.00      0.25      0.40         4

    accuracy                           0.40        15
   macro avg       0.55      0.40      0.40        15
weighted avg       0.55      0.40      0.40        15
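
The reported figures can be reproduced from a small set of per-class counts. The counts below are inferred from the table (they are the integer counts consistent with the reported precision, recall, and support values) and were not published by the author, so treat them as a reconstruction rather than the actual confusion matrix.

```python
# Per-class counts inferred from the reported table (not published by the author):
# (true positives, samples predicted as the class, true samples of the class)
counts = {
    "hard":      (1, 2, 3),
    "semi-hard": (2, 7, 4),
    "semi-soft": (2, 5, 4),
    "soft":      (1, 1, 4),
}

report = {}
for name, (tp, predicted, support) in counts.items():
    precision = tp / predicted
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    report[name] = (precision, recall, f1)
    print(f"{name:<10} {precision:.2f} {recall:.2f} {f1:.2f} {support}")

total = sum(support for _, _, support in counts.values())
accuracy = sum(tp for tp, _, _ in counts.values()) / total
macro_f1 = sum(f1 for _, _, f1 in report.values()) / len(report)
print(f"accuracy {accuracy:.2f}, macro F1 {macro_f1:.2f}")
```

With only 3 or 4 test samples per class, a single changed prediction moves a class's recall by 25 to 33 percentage points, so these per-class numbers carry wide uncertainty.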

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "rlogh/cheese-texture-classifier-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example prediction
text = "Feta is a crumbly, tangy Greek cheese with a salty bite and creamy undertones."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

class_names = ["hard", "semi-hard", "semi-soft", "soft"]
print(f"Predicted texture: {class_names[predicted_class]}")

Class Definitions

Hard: Firm, aged cheeses that are dense and can be grated (e.g., Parmesan, Cheddar)
Semi-hard: Moderately firm cheeses with some flexibility (e.g., Gouda, Swiss)
Semi-soft: Cheeses with some give but maintain shape (e.g., Mozzarella, Blue cheese)
Soft: Creamy, spreadable cheeses (e.g., Brie, Camembert, Cottage cheese)

Limitations and Ethics

Limitations

Small Dataset: Trained on only 100 samples, limiting generalization
Text Quality: Performance depends on description quality and consistency
Subjective Labels: Texture classification has inherent subjectivity
Domain Specific: Only applicable to cheese texture classification
Language: English-only model

Ethical Considerations

Bias: Model may reflect biases in the original dataset
Cultural Context: Cheese descriptions may be culturally specific
Commercial Use: Not intended for commercial cheese production decisions
Accuracy: Should not be used for critical food safety applications

Recommendations

Use for educational/research purposes only
Validate predictions with domain experts
Consider cultural context when applying to different regions
Retrain with larger, more diverse datasets for production use

AI Usage Disclosure

This model was developed using:

Base Model: DistilBERT (distilbert-base-uncased)
Training Framework: Hugging Face Transformers
Fine-tuning: Standard BERT fine-tuning techniques
AI Assistance: An AI assistant was used as a collaborative partner throughout development, speeding up the coding workflow and providing guidance.

Citation

Model Citation:

@model{rlogh/cheese-texture-classifier-distilbert,
  title={Cheese Texture Classifier (DistilBERT)},
  author={Rumi Loghmani},
  year={2024},
  url={https://huggingface.co/rlogh/cheese-texture-classifier-distilbert}
}

Dataset Citation:

@dataset{aslan-ng/cheese-text,
  title={Cheese Text Dataset},
  author={Aslan Noorghasemi},
  year={2024},
  url={https://huggingface.co/datasets/aslan-ng/cheese-text}
}

License

MIT License - See LICENSE file for details.
