---
title: Submission Template
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: docker
pinned: false
---

Fine-tuned Emotion Model Checkpoint for Climate Disinformation Classification

Model Description

This is a lightweight RoBERTa model fine-tuned for the Frugal AI Challenge 2024, specifically for the text classification task of identifying climate disinformation. It starts from a checkpoint originally trained for emotion classification (michellejieli/emotion_text_classifier) and is further fine-tuned to recognize eight categories of climate disinformation claims.

Intended Use

  • Primary intended uses: Baseline comparison for climate disinformation classification models
  • Primary intended users: Researchers and developers participating in the Frugal AI Challenge
  • Out-of-scope use cases: Not intended for production use or real-world classification tasks

Training Data

The model uses the QuotaClimat/frugalaichallenge-text-train dataset:

  • Size: ~6000 examples
  • Split: 80% train, 20% test
  • 8 categories of climate disinformation claims
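
As a rough illustration of how the dataset can be loaded and split the same way, a minimal sketch using the Hugging Face `datasets` library is shown below; the assumption that the data lives in a single `train` split and the choice of split seed are guesses, not confirmed details of the actual training script.

```python
from datasets import load_dataset

# Load the competition dataset from the Hugging Face Hub.
# (Assumes the data is exposed as a single "train" split.)
dataset = load_dataset("QuotaClimat/frugalaichallenge-text-train", split="train")

# Reproduce the 80% train / 20% test split described above.
# seed=42 mirrors the training seed reported later in this card; this is an assumption.
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]

print(len(train_ds), len(eval_ds))  # roughly 4,800 / 1,200 examples for ~6,000 rows
```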

Labels

  1. No relevant claim detected
  2. Global warming is not happening
  3. Not caused by humans
  4. Not bad or beneficial
  5. Solutions harmful/unnecessary
  6. Science is unreliable
  7. Proponents are biased
  8. Fossil fuels are needed
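
For illustration, the eight categories can be kept as an explicit id-to-label mapping, e.g. for configuring the classification head; the numeric ids and label strings below are assumptions, and the strings stored in the dataset itself may be formatted differently.

```python
# Illustrative mapping between class indices and the eight categories above.
# The exact label strings used in the dataset may differ.
ID2LABEL = {
    0: "No relevant claim detected",
    1: "Global warming is not happening",
    2: "Not caused by humans",
    3: "Not bad or beneficial",
    4: "Solutions harmful/unnecessary",
    5: "Science is unreliable",
    6: "Proponents are biased",
    7: "Fossil fuels are needed",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}
```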

Performance

This model is a fine-tuned version of michellejieli/emotion_text_classifier on the provided dataset for the competition. It achieves the following results on the evaluation set:

  • Loss: 0.2828
  • F1: 0.7879
  • ROC AUC: nan
  • Hamming loss: 0.1039

This model uses a lightweight RoBERTa checkpoint that was first fine-tuned for emotion classification and then further trained to recognize climate disinformation.
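
The reported numbers can be computed with standard scikit-learn metrics. The sketch below assumes a multi-label formulation with sigmoid outputs thresholded at 0.5 and micro-averaged scores, which is consistent with the Hamming loss and ROC AUC above but is an assumption rather than the confirmed evaluation code.

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss, roc_auc_score

def compute_metrics(eval_pred):
    """Trainer-style metrics callback; assumes multi-label logits of shape (n, 8)."""
    logits, labels = eval_pred
    probs = 1 / (1 + np.exp(-logits))   # sigmoid over the 8 label logits
    preds = (probs >= 0.5).astype(int)  # threshold at 0.5
    metrics = {
        "f1": f1_score(labels, preds, average="micro"),
        "hamming": hamming_loss(labels, preds),
    }
    try:
        metrics["roc_auc"] = roc_auc_score(labels, probs, average="micro")
    except ValueError:
        # ROC AUC is undefined when a label column has no positive (or no negative)
        # examples, which would explain the `nan` reported above.
        metrics["roc_auc"] = float("nan")
    return metrics
```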

Training procedure

The labels were encoded with a binarizer, the text was tokenized with the base checkpoint's tokenizer, and michellejieli/emotion_text_classifier was chosen as a seemingly suitable checkpoint to start from.
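
A sketch of what that preprocessing could look like, reusing `train_ds` from the loading sketch above and assuming scikit-learn's `MultiLabelBinarizer` for the labels; the column names `quote` and `label` are assumptions about the dataset schema.

```python
from sklearn.preprocessing import MultiLabelBinarizer
from transformers import AutoTokenizer

# Turn each example's label into an 8-dimensional 0/1 vector.
# (Column names and one label per example are assumptions.)
mlb = MultiLabelBinarizer(classes=list(range(8)))
train_labels = mlb.fit_transform([[ex["label"]] for ex in train_ds])

# Tokenize the text with the tokenizer of the starting checkpoint.
tokenizer = AutoTokenizer.from_pretrained("michellejieli/emotion_text_classifier")
train_encodings = tokenizer(
    [ex["quote"] for ex in train_ds],
    truncation=True,
    padding=True,
)
```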

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 4

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0
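
Taken together, the hyperparameters and framework versions above correspond to a fairly standard `Trainer` setup. The sketch below shows one way that configuration could be expressed; the output directory, the multi-label `problem_type`, and the dataset variables (reused from the sketches above) are assumptions, not the confirmed training script.

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Start from the emotion checkpoint and replace its head with an 8-label head.
model = AutoModelForSequenceClassification.from_pretrained(
    "michellejieli/emotion_text_classifier",
    num_labels=8,
    problem_type="multi_label_classification",  # assumption, consistent with binarized labels
    ignore_mismatched_sizes=True,               # the emotion head has a different label count
)

training_args = TrainingArguments(
    output_dir="climate-disinfo-roberta",  # hypothetical output directory
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    seed=42,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,           # assumed to hold tokenized inputs and binarized labels
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,  # metrics callback from the Performance section sketch
)
trainer.train()
```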

Metrics

  • Classification quality: F1, ROC AUC, and Hamming loss on the held-out split (see the evaluation results under Performance above)
  • Environmental Impact:
    • Emissions tracked in gCO2eq
    • Energy consumption tracked in Wh

Model Architecture

The model is a lightweight RoBERTa-based sequence classifier with an eight-way classification head, initialized from the michellejieli/emotion_text_classifier checkpoint and fine-tuned on the climate disinformation dataset described above.
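
Once fine-tuned, the checkpoint can be queried with the standard `text-classification` pipeline. The model id below is a placeholder rather than the actual repository name, and returning scores for every label assumes the multi-label setup sketched earlier.

```python
from transformers import pipeline

# "your-username/climate-disinfo-roberta" is a placeholder for the fine-tuned checkpoint.
classifier = pipeline(
    "text-classification",
    model="your-username/climate-disinfo-roberta",
    top_k=None,  # return a score for every one of the 8 labels
)

scores = classifier("Wind turbines are the main reason electricity prices keep rising.")
print(scores)
```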

Environmental Impact

Environmental impact is tracked using CodeCarbon, measuring:

  • Carbon emissions during inference
  • Energy consumption during inference

This tracking helps establish a baseline for the environmental impact of model deployment and inference.
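
A minimal sketch of how CodeCarbon can wrap the inference loop is shown below; the exact integration used by the submission code may differ.

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()  # CodeCarbon reports emissions in kg CO2eq
tracker.start()

# ... run model inference over the evaluation set here ...

emissions_kg = tracker.stop()
print(f"Estimated emissions: {emissions_kg * 1000:.2f} gCO2eq")
```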

Limitations

  • Fine-tuned on a relatively small dataset (~6,000 examples)
  • Evaluated only on the held-out 20% split of the competition dataset
  • Reported metrics (F1 ≈ 0.79) leave meaningful room for misclassification
  • Not suitable for production use or real-world applications without further validation

Ethical Considerations

  • Dataset contains sensitive topics related to climate disinformation
  • Model predictions are imperfect and should not be used as the sole basis for labelling content as disinformation
  • Environmental impact is tracked to promote awareness of AI's carbon footprint