RAGPT: Fine-tuned GPT-2 for Context-Based Question Answering
Model Description
RAGPT is a fine-tuned version of GPT-2 small, adapted for context-based question answering. Given a context passage and a question, it generates an answer grounded in that context, which corresponds to the generation step of a Retrieval-Augmented Generation (RAG) system. The model does not perform retrieval itself; the context must be supplied in the prompt.
Key Features
- Based on the GPT-2 small architecture (124M parameters)
- Fine-tuned on the "neural-bridge/rag-dataset-12000" dataset from Hugging Face
- Capable of generating answers based on provided context and questions
- Suitable for question-answering applications where the relevant context is supplied alongside the question
Training Data
The model was fine-tuned using the "neural-bridge/rag-dataset-12000" dataset, which contains:
- Context passages
- Questions related to the context
- Corresponding answers
Fine-tuning Process
The fine-tuning process involved:
- Loading the pre-trained GPT-2 small model
- Preprocessing the dataset to combine context, question, and answer into a single text
- Training the model to predict the next token given the context and question
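The preprocessing step above can be sketched as follows. This is a minimal illustration, not the original training script: it assumes the training text used the same prompt template shown in the Usage section and that GPT-2's end-of-text token was appended after the answer. The function name `build_training_text` is hypothetical.

```python
# Hypothetical helper: combines one dataset record into a single training string.
# Assumption: the template mirrors the prompt from the Usage section.
GPT2_EOS_TOKEN = "<|endoftext|>"  # GPT-2's end-of-text token


def build_training_text(context: str, question: str, answer: str,
                        eos_token: str = GPT2_EOS_TOKEN) -> str:
    """Join context, question, and answer into one causal-LM training example."""
    prompt = (
        f"Contexto: {context.strip()}\n"
        f"Pregunta: {question.strip()}\n"
        "Respuesta:"
    )
    # Assumption: the EOS token marks where the answer ends, so the model
    # can learn to stop generating after it.
    return f"{prompt} {answer.strip()}{eos_token}"


example = build_training_text(
    context="Paris is the capital of France.",
    question="What is the capital of France?",
    answer="Paris",
)
print(example)
```

With text in this shape, standard next-token training teaches the model to continue a "Respuesta:" prompt with the answer.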
Hyperparameters
- Base model: GPT-2 small
- Number of training epochs: 3
- Batch size: 4
- Optimizer: AdamW with its default settings (no custom learning rate was specified)
- Max sequence length: 512 tokens
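To give a rough sense of scale for these settings, the number of optimizer steps can be estimated from the batch size and epoch count. The sketch below assumes that all 12,000 examples suggested by the dataset name were used for training, with no gradient accumulation and a single device; none of this is confirmed by the model card, and the function name is hypothetical.

```python
import math

# Assumption: example count inferred from the dataset name only.
ASSUMED_NUM_EXAMPLES = 12_000


def estimate_optimizer_steps(num_examples: int, batch_size: int = 4,
                             epochs: int = 3) -> int:
    """Estimate total optimizer steps, keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


total_steps = estimate_optimizer_steps(ASSUMED_NUM_EXAMPLES)
print(total_steps)  # 9000 under the stated assumptions
```

If part of the dataset was held out for evaluation, the real step count would be proportionally lower.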
Usage
To use the model:
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "BueormLLC/RAGPT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Prepare input (field labels are kept in Spanish, as in the original prompt template)
context = "Your context here"
question = "Your question here"
input_text = f"Contexto: {context}\nPregunta: {question}\nRespuesta:"
# Generate answer
inputs = tokenizer(input_text, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=150,  # limits the answer only; max_length would also count the prompt tokens
    num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
)
# The decoded text contains the prompt followed by the generated answer
answer = tokenizer.decode(output[0], skip_special_tokens=True)
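The decoded string from the snippet above includes the prompt as well as the generated continuation. One way to isolate the answer is to cut at the "Respuesta:" marker, as in the minimal sketch below. The helper name `extract_answer` is hypothetical, and the sketch assumes that the context and question do not themselves contain the marker text.

```python
def extract_answer(generated_text: str, marker: str = "Respuesta:") -> str:
    """Return the text after the first answer marker, trimmed of extra turns."""
    _, found, tail = generated_text.partition(marker)
    if not found:
        # No marker present: fall back to the full text.
        return generated_text.strip()
    # The model may keep going and start a new context or question block,
    # so cut the answer at the first such label if one appears.
    for stop in ("\nContexto:", "\nPregunta:"):
        tail = tail.split(stop, 1)[0]
    return tail.strip()


decoded = (
    "Contexto: Paris is the capital of France.\n"
    "Pregunta: What is the capital of France?\n"
    "Respuesta: Paris\n"
    "Pregunta: What is the capital of Spain?"
)
print(extract_answer(decoded))
```

Cutting at the marker rather than slicing by prompt length avoids depending on the decoded prompt matching the input string character for character.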
Limitations
- The model's knowledge is limited to its training data and the base GPT-2 model.
- It may sometimes generate irrelevant or incorrect answers, especially for topics outside its training domain.
- The model does not have access to external information or real-time data.
Ethical Considerations
Users should be aware that this model, like all language models, may reflect biases present in its training data. It should not be used as a sole source of information for critical decisions.
Future Improvements
- Fine-tuning on a larger and more diverse dataset
- Experimenting with larger base models (e.g., GPT-2 medium or large)
- Implementing techniques to improve factual accuracy and reduce hallucinations
Support us
We appreciate your support. Without you, we could not do what we do.
Citation
If you use this model in your research, please cite:
@misc{RAGPT,
author = {Bueorm},
title = {RAGPT: Fine-tuned GPT-2 for Context-Based Question Answering},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/BueormLLC/RAGPT}}
}