|
--- |
|
language: en |
|
tags: |
|
- classification |
|
- educational |
|
- distilbert |
|
- transformer |
|
license: apache-2.0 |
|
datasets: |
|
- haider0941/Educational_Noneducational_Dataset |
|
--- |
|
## Model Details |
|
|
|
- **Model Name**: DistilBERT for Educational Query Classification |
|
- **Model Architecture**: DistilBERT (base model: `distilbert-base-uncased`) |
|
- **Language**: English |
|
- **Model Type**: Transformer-based text classification model |
|
- **License**: [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
|
|
|
## Overview |
|
|
|
This model is a fine-tuned version of [DistilBERT](https://huggingface.co/distilbert-base-uncased) specifically designed for classifying queries as either educational or non-educational. It was trained on a dataset containing a variety of questions and statements, with each entry labeled as either "educational" or "non-educational." |
|
|
|
## Intended Use |
|
|
|
- **Primary Use Case**: This model is intended to classify text inputs into two categories: "educational" or "non-educational." It is useful for applications that need to filter out or prioritize educational content. |
|
- **Potential Applications**: |
|
- Educational chatbots or virtual assistants |
|
- Content moderation for educational platforms |
|
- Automated tagging of educational content |
|
- Filtering non-educational queries from educational websites or apps (see the filtering sketch after this list)
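
As a quick illustration of the filtering use case, the sketch below batches queries through the Transformers `pipeline` API and keeps only those the model labels educational. The label string `"educational"` is an assumption; check `model.config.id2label` for the names this checkpoint actually uses.

```python
from transformers import pipeline

# Text-classification pipeline backed by this checkpoint
classifier = pipeline(
    "text-classification",
    model="haider0941/distilbert-base-educationl",
)

queries = [
    "What is the capital of France?",
    "What time does the mall close?",
]

# Keep only queries labeled as educational. The exact label string
# depends on the checkpoint's id2label mapping; "educational" is assumed.
educational = [
    q for q, result in zip(queries, classifier(queries))
    if result["label"] == "educational"
]
print(educational)
```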
|
|
|
## Training Data |
|
|
|
- **Dataset**: The model was fine-tuned on [haider0941/Educational_Noneducational_Dataset](https://huggingface.co/datasets/haider0941/Educational_Noneducational_Dataset), a custom dataset of queries labeled by content as either "educational" or "non-educational" (a loading sketch follows this list).
|
- **Dataset Source**: The dataset was manually curated to include a balanced mix of educational questions (covering various academic subjects) and non-educational questions (general queries that do not pertain to educational content). |
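
For reference, the dataset can be pulled down with the `datasets` library; printing the resulting `DatasetDict` shows its splits and columns, whose exact names should be verified against the dataset card.

```python
from datasets import load_dataset

# Load the dataset referenced in this card's metadata
ds = load_dataset("haider0941/Educational_Noneducational_Dataset")

# Show the available splits and column names (they vary per dataset)
print(ds)
```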
|
|
|
## Training Procedure |
|
|
|
- **Framework**: The model was trained using the [Hugging Face Transformers library](https://huggingface.co/transformers/) with PyTorch. |
|
- **Fine-Tuning Parameters** (mirrored in the training sketch after this list):
|
- **Batch Size**: 16 |
|
- **Learning Rate**: 5e-5 |
|
- **Epochs**: 3 |
|
- **Optimizer**: AdamW with weight decay |
|
- **Hardware**: Fine-tuning was performed on a single NVIDIA V100 GPU. |
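
A minimal sketch of how these hyperparameters map onto `TrainingArguments`/`Trainer` follows. The column name `"text"`, the `"train"` split, and the `weight_decay=0.01` value are assumptions (the card only says "AdamW with weight decay", which is the Trainer's default optimizer family); adjust them to match the dataset and the original run.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed column name "text" and split "train"; check the dataset card.
dataset = load_dataset("haider0941/Educational_Noneducational_Dataset")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Hyperparameters from this card; weight_decay=0.01 is an assumed value,
# since the card only states "AdamW with weight decay".
training_args = TrainingArguments(
    output_dir="distilbert-educational",
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
)
trainer.train()
```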
|
|
|
## Limitations and Bias |
|
|
|
While this model has been fine-tuned for classifying queries as educational or non-educational, there are some limitations and potential biases: |
|
|
|
- **Bias in Data**: The model may reflect any biases present in the training data, particularly if certain topics or types of educational content are overrepresented or underrepresented. |
|
- **Binary Classification**: The model categorizes inputs strictly as "educational" or "non-educational." It may not handle nuanced or ambiguous queries effectively; a confidence-threshold sketch that routes such queries to a fallback follows this list.
|
- **Not Suitable for Other Classifications**: This model is specifically designed for educational vs. non-educational classification. It may not perform well on other types of classification tasks without further fine-tuning. |
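
One common way to soften the strict binary output is to threshold the model's softmax confidence and treat low-confidence queries as uncertain rather than forcing a label. The sketch below illustrates this; the 0.8 cutoff is an arbitrary illustrative value and should be tuned on validation data.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("haider0941/distilbert-base-educationl")
model = AutoModelForSequenceClassification.from_pretrained("haider0941/distilbert-base-educationl")

def classify_with_threshold(text, threshold=0.8):
    """Return a label, or "uncertain" when the model's confidence is low.

    The 0.8 threshold is illustrative; tune it on validation data.
    """
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    confidence, predicted_id = probs.max(dim=-1)
    if confidence.item() < threshold:
        return "uncertain"
    return model.config.id2label[predicted_id.item()]

print(classify_with_threshold("Is a tomato a fruit or a vegetable?"))
```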
|
|
|
## How to Use |
|
|
|
You can load the model with the Hugging Face Transformers library and map the highest-scoring logit back to its label:
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("haider0941/distilbert-base-educationl")
model = AutoModelForSequenceClassification.from_pretrained("haider0941/distilbert-base-educationl")

input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit to its label via the model config
predicted_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```
|
|
|
## Citation |
|
|
|
If you use this model, please cite it as follows: |
|
|
|
``` |
|
@misc{Haider0941_2024, |
|
title={Fine-Tuned DistilBERT for Educational Query Classification}, |
|
author={Haider}, |
|
year={2024}, |
|
howpublished={\url{https://huggingface.co/haider0941/distilbert-base-educationl}}, |
|
} |
|
``` |