Veganism & Vegetarianism Classifier (DistilBERT)
This model classifies content related to veganism and vegetarianism on climate change subreddits.
Model Details
- Model Type: DistilBERT
- Task: Multilabel text classification
- Sector: Veganism & Vegetarianism
- Base Model: DistilBERT base uncased (an illustrative sketch of the classification head follows this list)
- Labels: 7
- Training Data: Sample drawn from 1,000 GPT-4o-mini-labeled Reddit posts collected from climate subreddits (2010-2023)
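The classifier class itself ships as `model_class.py` inside the model repository (see Usage below). For orientation only, a DistilBERT multilabel head with 7 sigmoid outputs typically looks roughly like the sketch below; the actual shipped class may differ.

```python
import torch
import torch.nn as nn
from transformers import DistilBertModel

class ExampleMultilabelClassifier(nn.Module):
    """Illustrative sketch only; the real class is provided by model_class.py in the repo."""
    def __init__(self, model_name='distilbert-base-uncased', num_labels=7):
        super().__init__()
        self.encoder = DistilBertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.dim, num_labels)

    def forward(self, input_ids, attention_mask=None):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        pooled = hidden[:, 0]                           # representation at the [CLS] position
        return torch.sigmoid(self.classifier(pooled))   # independent probability per label
```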
Labels
The model predicts 7 labels simultaneously:
- Animal Welfare: Cites animal suffering, cruelty, or ethics as motivation.
- Environmental Impact: Links diet choice to climate change, land, water, or emissions.
- Health: Claims physical health benefits or risks of eating less meat / going vegan.
- Lab Grown And Alt Proteins: References cultivated meat, precision fermentation, insect protein, or plant-based substitutes.
- Psychology And Identity: Diet as part of personal identity, moral virtue signalling or tribal politics.
- Systemic Vs Individual Action: Calls for policy, corporate reform or large-scale funding instead of just personal diet shifts.
- Taste And Convenience: Talks about flavour, texture, cooking ease, availability of vegan options, or social convenience.
Note: Label order in predictions matches the order above.
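For reference, the same ordering written out as a Python list (the Usage snippet below reads it from `checkpoint['label_names']`, which should hold the same order):

```python
# Index in the prediction vector -> label name
LABEL_NAMES = [
    "Animal Welfare",
    "Environmental Impact",
    "Health",
    "Lab Grown And Alt Proteins",
    "Psychology And Identity",
    "Systemic Vs Individual Action",
    "Taste And Convenience",
]
```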
Usage
```python
import os
import sys
import tempfile

import torch
from transformers import DistilBertTokenizer
from huggingface_hub import snapshot_download

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def print_sorted_label_scores(label_scores):
    # Sort the label -> score dict by score, descending
    sorted_items = sorted(label_scores.items(), key=lambda x: x[1], reverse=True)
    for label, score in sorted_items:
        print(f"  {label}: {score:.6f}")


# Model link and example texts for this specific model
model_link = 'sanchow/veganism_and_vegetarianism-distilbert-classifier'
examples = [
    "Plant-based diets have a much lower carbon footprint than meat-heavy diets."
]

print(f"\n{'='*60}")
print("MODEL: VEGANISM & VEGETARIANISM SECTOR")
print(f"{'='*60}")
print(f"Downloading model: {model_link}")

with tempfile.TemporaryDirectory() as temp_dir:
    snapshot_download(
        repo_id=model_link,
        local_dir=temp_dir,
        local_dir_use_symlinks=False
    )

    model_class_path = os.path.join(temp_dir, 'model_class.py')
    if not os.path.exists(model_class_path):
        print("model_class.py not found in downloaded files")
        print(f"  Available files: {os.listdir(temp_dir)}")
    else:
        # Make the bundled model_class.py importable
        sys.path.insert(0, temp_dir)
        from model_class import MultilabelClassifier

        tokenizer = DistilBertTokenizer.from_pretrained(temp_dir)
        checkpoint = torch.load(os.path.join(temp_dir, 'model.pt'),
                                map_location='cpu', weights_only=False)
        model = MultilabelClassifier(checkpoint['model_name'], len(checkpoint['label_names']))
        model.load_state_dict(checkpoint['model_state_dict'])
        model.to(device)
        model.eval()

        print("Model loaded successfully")
        print(f"  Labels: {checkpoint['label_names']}")
        print("\nVeganism & Vegetarianism classifier results:\n")

        for i, test_text in enumerate(examples):
            inputs = tokenizer(
                test_text,
                return_tensors="pt",
                truncation=True,
                max_length=512,
                padding=True
            ).to(device)
            with torch.no_grad():
                outputs = model(**inputs)
            # Some model classes return a tuple/list; take the first element in that case
            if isinstance(outputs, (tuple, list)):
                outputs = outputs[0]
            predictions = outputs.cpu().numpy()
            label_scores = {label: float(score)
                            for label, score in zip(checkpoint['label_names'], predictions[0])}

            print(f"Example {i+1}: '{test_text}'")
            print("Predictions (all label scores, highest first):")
            print_sorted_label_scores(label_scores)
            print("-" * 40)
```
Performance
Best model performance:
- Micro Jaccard: 0.5584
- Macro Jaccard: 0.6710
- F1 Score: 0.8906
- Accuracy: 0.8906
Dataset: ~900 GPT-labeled samples per sector (600 train, 150 validation, 150 test)
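The exact evaluation script is not included on this card, but assuming binary ground-truth and thresholded prediction matrices of shape `(n_samples, 7)`, metrics of this kind can be computed with scikit-learn roughly as follows. The averaging mode of the reported F1 is not stated here, so micro averaging is shown, and note that scikit-learn's `accuracy_score` is exact-match accuracy for multilabel data; the toy matrices below are placeholders, not the real test set.

```python
import numpy as np
from sklearn.metrics import jaccard_score, f1_score, accuracy_score

# Toy binary matrices standing in for the real test-set labels and predictions
y_true = np.array([[1, 0, 1, 0, 0, 1, 0],
                   [0, 1, 0, 1, 0, 0, 1],
                   [0, 0, 0, 0, 1, 0, 0]])
y_pred = np.array([[1, 0, 0, 0, 0, 1, 0],
                   [0, 1, 0, 1, 1, 0, 1],
                   [0, 0, 1, 0, 1, 0, 0]])

print("Micro Jaccard:", jaccard_score(y_true, y_pred, average='micro'))
print("Macro Jaccard:", jaccard_score(y_true, y_pred, average='macro'))
print("F1 (micro):", f1_score(y_true, y_pred, average='micro'))
print("Exact-match accuracy:", accuracy_score(y_true, y_pred))
```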
Optimal Thresholds
```python
# Tuned per-label decision thresholds (see Training below)
optimal_thresholds = {
    'Animal Welfare': 0.48107979620047003,
    'Environmental Impact': 0.45919171852850427,
    'Health': 0.20115313966833437,
    'Lab Grown And Alt Proteins': 0.3414601502146817,
    'Psychology And Identity': 0.5246278637433214,
    'Systemic Vs Individual Action': 0.37517437676211585,
    'Taste And Convenience': 0.6635140143644325,
}

# `checkpoint` and `predictions` come from the Usage snippet above
for label, score in zip(checkpoint['label_names'], predictions[0]):
    threshold = optimal_thresholds.get(label, 0.5)
    if score > threshold:
        print(f"{label}: {score:.3f}")
```
Training
Trained on GPT-labeled Reddit data:
- Data collection from climate subreddits
- Keyword-based filtering for sector-specific content
- GPT labeling for multilabel classification
- 80/10/10 train/validation/test split
- Fine-tuning with threshold optimization (sketched below)
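The threshold-optimization code itself is not shown on this card. One common approach, sketched here under the assumption that per-label F1 on the validation split was the tuning objective, is a grid search over candidate cut-offs for each label (`tune_thresholds` is an illustrative name):

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs, val_true, candidates=np.linspace(0.05, 0.95, 91)):
    """Pick, per label, the threshold that maximizes F1 on validation data.

    val_probs, val_true: arrays of shape (n_samples, n_labels).
    """
    best = {}
    for j in range(val_true.shape[1]):
        f1s = [f1_score(val_true[:, j], (val_probs[:, j] > t).astype(int), zero_division=0)
               for t in candidates]
        best[j] = float(candidates[int(np.argmax(f1s))])
    return best
```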
Citation
If you use this model in your research, please cite:
```bibtex
@misc{veganism_and_vegetarianism_distilbert_classifier,
  title={Veganism & Vegetarianism Classifier for Climate Change Analysis},
  author={Sandeep Chowdhary},
  year={2025},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/echoboi/veganism_and_vegetarianism-distilbert-classifier}},
}
```
Limitations
- Trained on data from specific climate change subreddits and limited to English content
- Performance is bounded by the quality of the GPT-generated labels used for training and evaluation