---
language: en
tags:
- classification
- educational
- distilbert
- transformer
license: apache-2.0
datasets:
- haider0941/Educational_Noneducational_Dataset
---
## Model Details
- **Model Name**: DistilBERT for Educational Query Classification
- **Model Architecture**: DistilBERT (base model: `distilbert-base-uncased`)
- **Language**: English
- **Model Type**: Transformer-based text classification model
- **License**: [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
## Overview
This model is a fine-tuned version of [DistilBERT](https://huggingface.co/distilbert-base-uncased) that classifies queries as either educational or non-educational. It was trained on a dataset of questions and statements, with each entry labeled "educational" or "non-educational."
## Intended Use
- **Primary Use Case**: This model is intended to classify text inputs into two categories: "educational" or "non-educational." It is useful for applications that need to filter out or prioritize educational content.
- **Potential Applications**:
  - Educational chatbots or virtual assistants
  - Content moderation for educational platforms
  - Automated tagging of educational content
  - Filtering non-educational queries from educational websites or apps
## Training Data
- **Dataset**: The model was fine-tuned on a custom dataset, listed in this card's metadata as `haider0941/Educational_Noneducational_Dataset`. Each query in it is labeled by content as either "educational" or "non-educational."
- **Dataset Source**: The dataset was manually curated to include a balanced mix of educational questions (covering various academic subjects) and non-educational questions (general queries that do not pertain to educational content).
## Training Procedure
- **Framework**: The model was trained using the [Hugging Face Transformers library](https://huggingface.co/transformers/) with PyTorch.
- **Fine-Tuning Parameters**:
  - **Batch Size**: 16
  - **Learning Rate**: 5e-5
  - **Epochs**: 3
  - **Optimizer**: AdamW with weight decay
- **Hardware**: Fine-tuning was performed on a single NVIDIA V100 GPU.
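As a rough sketch of what these settings imply, the snippet below gathers the hyperparameters listed above and computes how many optimizer updates a run would take. The dataset size used here (`num_examples = 10_000`) is a hypothetical placeholder, since this card does not state the number of training examples, and the function name `total_optimizer_steps` is illustrative. It also assumes one update per batch, with no gradient accumulation.

```python
import math

# Hyperparameters as listed in this card.
BATCH_SIZE = 16
LEARNING_RATE = 5e-5
EPOCHS = 3


def total_optimizer_steps(num_examples: int,
                          batch_size: int = BATCH_SIZE,
                          epochs: int = EPOCHS) -> int:
    """Number of optimizer updates, assuming one update per batch
    (no gradient accumulation) and that a final partial batch is kept."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# ASSUMPTION: 10_000 is a placeholder; the real dataset size is not given here.
num_examples = 10_000
total_steps = total_optimizer_steps(num_examples)
print(f"{total_steps} updates at lr={LEARNING_RATE}")
```

Knowing the total step count is mainly useful when configuring a learning-rate schedule or estimating run time on a single GPU.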
## Limitations and Bias
While this model has been fine-tuned for classifying queries as educational or non-educational, there are some limitations and potential biases:
- **Bias in Data**: The model may reflect any biases present in the training data, particularly if certain topics or types of educational content are overrepresented or underrepresented.
- **Binary Classification**: The model categorizes inputs strictly as "educational" or "non-educational." It may not handle nuanced or ambiguous queries effectively.
- **Not Suitable for Other Classifications**: This model is specifically designed for educational vs. non-educational classification. It may not perform well on other types of classification tasks without further fine-tuning.
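One way to soften the binary-decision limitation is to treat low-confidence predictions as undecided instead of forcing a label. The sketch below is only an illustration: it applies a softmax to a pair of logits and abstains when the top probability falls below a threshold. The label order in `LABELS`, the `0.75` threshold, and the function names are assumptions, not part of this model; check `model.config.id2label` for the actual mapping.

```python
import math
from typing import List, Optional, Tuple

# ASSUMPTION: this index-to-label order is hypothetical;
# verify it against model.config.id2label before relying on it.
LABELS = ["non-educational", "educational"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_abstain(logits: List[float],
                          threshold: float = 0.75) -> Tuple[Optional[str], float]:
    """Return (label, confidence); label is None when the top
    probability falls below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    label = LABELS[best] if confidence >= threshold else None
    return label, confidence


# Example logits are made up for illustration, not real model outputs.
print(classify_with_abstain([-1.2, 2.3]))  # clear margin, keeps the label
print(classify_with_abstain([0.1, 0.3]))   # near tie, abstains (label is None)
```

With the real model, the logits for a single input could be obtained with something like `outputs.logits[0].tolist()`. The threshold would need tuning on held-out data, since softmax confidence is not guaranteed to be well calibrated.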
## How to Use
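You can load the model and classify a query using the Hugging Face Transformers library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("haider0941/distilbert-base-educationl")
model = AutoModelForSequenceClassification.from_pretrained("haider0941/distilbert-base-educationl")
input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
with torch.no_grad():  # inference only, so gradients are not needed
    outputs = model(**inputs)
# outputs.logits has shape (1, num_labels); the argmax is the predicted class index
predicted_class_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class_id])  # label names come from the model's config
```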
## Citation
If you use this model, please cite it as follows:
```
@misc{Haider0941_2024,
title={Fine-Tuned DistilBERT for Educational Query Classification},
author={Haider},
year={2024},
howpublished={\url{https://huggingface.co/haider0941/distilbert-base-educationl}},
}
```