Cheese Texture Classifier (DistilBERT)

Model Creator: Rumi Loghmani (@rlogh)
Original Dataset: aslan-ng/cheese-text (by Aslan Noorghasemi)

This model classifies cheese descriptions into one of four texture classes using a fine-tuned DistilBERT.

Model Description

  • Architecture: DistilBERT-base-uncased fine-tuned for sequence classification
  • Task: 4-class texture classification (hard, semi-hard, semi-soft, soft)
  • Input: Cheese description text (up to 512 tokens)
  • Output: 4-class probability distribution

Training Details

Data

  • Dataset: aslan-ng/cheese-text (original split: 100 samples)
  • Train/Val/Test Split: 70/15/15 (stratified)
  • Text Source: Cheese descriptions from the dataset
  • Labels: Texture categories (hard, semi-hard, semi-soft, soft)
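
The stratified 70/15/15 split described above can be sketched without any external library. This is an illustrative sketch, not the code used to train the model: the `stratified_split` helper is hypothetical, and the evenly balanced label list is a stand-in for the real dataset, whose class counts may differ.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Round per class, so overall sizes are only approximately 70/15/15.
        n_test = round(len(indices) * test_frac)
        n_val = round(len(indices) * val_frac)
        test.extend(indices[:n_test])
        val.extend(indices[n_test:n_test + n_val])
        train.extend(indices[n_test + n_val:])
    return train, val, test

# Hypothetical, evenly balanced labels; the real class counts may differ.
labels = ["hard", "semi-hard", "semi-soft", "soft"] * 25
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))
```

Because the rounding happens per class, every texture category is represented in each split, at the cost of split sizes that can deviate slightly from the nominal percentages.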

Preprocessing

  • Tokenization: DistilBERT tokenizer with 512 max length
  • Padding: Max length padding
  • Truncation: Long descriptions truncated to 512 tokens
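
To make the padding and truncation behavior concrete, here is a library-free sketch of what happens to a single sequence of token ids. The `pad_or_truncate` helper and the content ids are hypothetical and chosen only for illustration; the special-token ids (101, 102, 0) are those of the distilbert-base-uncased vocabulary. A small `max_length` of 8 is used so the result is easy to read; the model card's setting is 512.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # [CLS], [SEP], [PAD] in distilbert-base-uncased

def pad_or_truncate(content_ids, max_length=512):
    """Mimic max-length padding plus truncation for a single sequence."""
    # Reserve two positions for [CLS] and [SEP], then cut overly long content.
    kept = content_ids[: max_length - 2]
    input_ids = [CLS_ID] + kept + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Right-pad to the fixed length; padded positions get attention mask 0.
    n_pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad

# Arbitrary illustrative ids, not real tokenizer output.
short_ids, short_mask = pad_or_truncate([2000, 2001, 2002], max_length=8)
long_ids, long_mask = pad_or_truncate(list(range(2000, 2020)), max_length=8)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

In practice the DistilBERT tokenizer does this itself when called with `padding="max_length"`, `truncation=True`, and `max_length=512`.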

Training Setup

  • Model: distilbert-base-uncased
  • Epochs: 10
  • Batch Size: 8 (train/val)
  • Learning Rate: 2e-5
  • Warmup Steps: 10
  • Weight Decay: 0.01
  • Optimizer: AdamW
  • Scheduler: Linear warmup + linear decay
  • Mixed Precision: FP16 (if GPU available)
  • Seed: 42 (for reproducibility)
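
The "linear warmup + linear decay" schedule can be written as a small function of the optimizer step. This is a sketch under stated assumptions rather than the training code itself: it assumes 70 training samples (70% of 100), batch size 8, and no gradient accumulation, which gives 9 steps per epoch and 90 steps in total. The `linear_warmup_decay` name is hypothetical.

```python
import math

def linear_warmup_decay(step, base_lr=2e-5, warmup_steps=10, total_steps=90):
    """Learning rate at a given optimizer step: linear ramp up, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)

# Assumption: 70 training samples, batch size 8, no gradient accumulation.
steps_per_epoch = math.ceil(70 / 8)
total_steps = steps_per_epoch * 10

for step in (0, 5, 10, 50, total_steps):
    print(step, linear_warmup_decay(step, total_steps=total_steps))
```

Under these assumptions the 10 warmup steps cover roughly the first epoch, after which the learning rate falls linearly from 2e-5 to zero over the remaining 80 steps.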

Hardware/Compute

  • Training Device: CPU
  • Training Time: ~5-10 minutes on GPU
  • Model Size: ~67M parameters
  • Memory Usage: ~2-4GB GPU memory

Performance

  • Test Accuracy: 0.400
  • Test Loss: 1.290

Class-wise Performance

              precision    recall  f1-score   support

        hard       0.50      0.33      0.40         3
   semi-hard       0.29      0.50      0.36         4
   semi-soft       0.40      0.50      0.44         4
        soft       1.00      0.25      0.40         4

    accuracy                           0.40        15
   macro avg       0.55      0.40      0.40        15
weighted avg       0.55      0.40      0.40        15
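
As a sanity check, the summary rows can be recomputed from the per-class figures, and the headline numbers can be compared against simple reference points for a 4-class problem. The sketch below copies the rounded values from the report, so results agree only to about two decimals; the two reference points (chance accuracy for equally likely classes, and the cross-entropy of a uniform prediction) are standard quantities, not numbers from the original evaluation.

```python
import math

# Per-class figures copied from the report: (precision, recall, f1, support)
report = {
    "hard": (0.50, 0.33, 0.40, 3),
    "semi-hard": (0.29, 0.50, 0.36, 4),
    "semi-soft": (0.40, 0.50, 0.44, 4),
    "soft": (1.00, 0.25, 0.40, 4),
}

total = sum(s for *_, s in report.values())
# recall * support recovers the number of correct predictions per class
correct = sum(round(r * s) for _, r, _, s in report.values())
accuracy = correct / total

macro_f1 = sum(f for _, _, f, _ in report.values()) / len(report)
weighted_f1 = sum(f * s for _, _, f, s in report.values()) / total

# Reference points for a 4-class problem
chance_accuracy = 1 / len(report)
uniform_loss = math.log(len(report))  # cross-entropy of a uniform prediction

print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f} weighted_f1={weighted_f1:.2f}")
print(f"chance accuracy={chance_accuracy:.2f} uniform-guess loss={uniform_loss:.3f}")
```

Read this way, 6 of the 15 test examples are classified correctly, and the test loss of 1.290 sits only slightly below the uniform-guess value of ln 4 ≈ 1.386. With a test set this small, each example moves accuracy by almost 7 percentage points, so the figures should be treated as rough.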

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "rlogh/cheese-texture-classifier-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example prediction
text = "Feta is a crumbly, tangy Greek cheese with a salty bite and creamy undertones."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

class_names = ["hard", "semi-hard", "semi-soft", "soft"]
print(f"Predicted texture: {class_names[predicted_class]}")
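
The snippet above prints only the top class, while the model's output is a full 4-class distribution. The sketch below shows one way to inspect that distribution. To stay runnable without downloading the model, it uses hypothetical logits in place of `outputs.logits[0].tolist()` and a plain-Python softmax; the class order is assumed to match the `class_names` list used above.

```python
import math

class_names = ["hard", "semi-hard", "semi-soft", "soft"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits standing in for outputs.logits[0].tolist()
logits = [0.2, -0.1, 0.4, 1.3]
probs = softmax(logits)

# Rank classes from most to least likely.
ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
for name, p in ranked:
    print(f"{name:>10}: {p:.3f}")
```

Given the modest test accuracy, looking at the whole ranking rather than a single label makes it easier to spot predictions where two textures are nearly tied.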

Class Definitions

  • Hard: Firm, aged cheeses that are dense and can be grated (e.g., Parmesan, Cheddar)
  • Semi-hard: Moderately firm cheeses with some flexibility (e.g., Gouda, Swiss)
  • Semi-soft: Cheeses with some give but maintain shape (e.g., Mozzarella, Blue cheese)
  • Soft: Creamy, spreadable cheeses (e.g., Brie, Camembert, Cottage cheese)

Limitations and Ethics

Limitations

  • Small Dataset: Trained on only 100 samples, limiting generalization
  • Text Quality: Performance depends on description quality and consistency
  • Subjective Labels: Texture classification has inherent subjectivity
  • Domain Specific: Only applicable to cheese texture classification
  • Language: English-only model

Ethical Considerations

  • Bias: Model may reflect biases in the original dataset
  • Cultural Context: Cheese descriptions may be culturally specific
  • Commercial Use: Not intended for commercial cheese production decisions
  • Accuracy: Should not be used for critical food safety applications

Recommendations

  • Use for educational/research purposes only
  • Validate predictions with domain experts
  • Consider cultural context when applying to different regions
  • Retrain with larger, more diverse datasets for production use

AI Usage Disclosure

This model was developed using:

  • Base Model: DistilBERT (distilbert-base-uncased)
  • Training Framework: Hugging Face Transformers
  • Fine-tuning: Standard BERT fine-tuning techniques
  • AI Assistance: An AI assistant acted as a collaborative partner throughout development, accelerating the coding workflow and providing guidance.

Citation

Model Citation:

@misc{rlogh/cheese-texture-classifier-distilbert,
  title={Cheese Texture Classifier (DistilBERT)},
  author={Rumi Loghmani},
  year={2024},
  url={https://huggingface.co/rlogh/cheese-texture-classifier-distilbert}
}

Dataset Citation:

@dataset{aslan-ng/cheese-text,
  title={Cheese Text Dataset},
  author={Aslan Noorghasemi},
  year={2024},
  url={https://huggingface.co/datasets/aslan-ng/cheese-text}
}

License

MIT License - See LICENSE file for details.
