google-bert/bert-base-uncased Fine-Tuned on SQuAD

bert_squad

A context-based question-answering model fine-tuned on the SQuAD dataset. It adapts the pretrained BERT architecture to extract answers to a question from a given passage.

Model Description

bert_squad is a transformer-based model for context-based question answering. It adapts the pretrained BERT architecture to extract precise answer spans from a passage given a question, and is fine-tuned on the Stanford Question Answering Dataset (SQuAD), available via Hugging Face Datasets.

The model was trained using free computational resources, demonstrating its accessibility for educational and small-scale research purposes.

Fine-tuned by: SADAT PARVEJ, RAFIFA BINTE JAHIR

Shared by: SADAT PARVEJ

Language(s) (NLP): ENGLISH

Finetuned from model: https://huggingface.co/google-bert/bert-base-uncased

Training Objective

The model predicts the most relevant span of text in a given passage that answers a specific question. It fine-tunes BERT's ability to analyze context using supervised data from SQuAD.
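
The exact training script is not included in this card, so the following is only a minimal sketch of how the span-prediction objective is typically computed with Hugging Face Transformers: the gold answer's start and end token indices are passed to the model, which returns a cross-entropy loss averaged over the two span classifiers. The indices below are hypothetical.

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-uncased")

question = "Where is the Eiffel Tower located?"
context = "The Eiffel Tower is located in Paris, France."
inputs = tokenizer(question, context, return_tensors="pt")

# During fine-tuning, the gold answer span is supplied as token indices
# (hypothetical values here); the model averages the cross-entropy losses
# of its start and end classifiers into a single scalar loss.
outputs = model(
    **inputs,
    start_positions=torch.tensor([12]),
    end_positions=torch.tensor([13]),
)
print(outputs.loss)  # scalar loss used for backpropagation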

Performance Benchmarks

Training Loss: 0.477800

Validation Loss: 0.465936

Exact Match (EM): 87.568590%

Intended Uses & Limitations

This model is designed for tasks such as:

  • Extractive question answering
  • Reading comprehension applications

Known Limitations:

Because BERT is pretrained as a masked language model (MLM), it is not suited to generative tasks or to queries that fall outside the SQuAD-style extractive question-answering setup. Its predictions may also be biased toward, or overly reliant on, the training data, since SQuAD consists of structured, fact-based question-answer pairs.

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Load the model and tokenizer
model_name = "Sadat07/bert_squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)


context = """
Thomas Edison is credited with inventing the practical electric light bulb.
He was born in 1847.
"""
question = "When was Thomas Edison born?"


# Tokenize the question/context pair, truncating to BERT's 512-token limit
inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=512)
input_ids = inputs["input_ids"].to(device)
attention_mask = inputs["attention_mask"].to(device)


print("Tokenized Input:", tokenizer.decode(input_ids[0]))

# Perform inference
with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    start_scores = outputs.start_logits
    end_scores = outputs.end_logits

# Logits
print("Start logits:", start_scores)
print("End logits:", end_scores)

# Get start and end indices
start_idx = torch.argmax(start_scores)
end_idx = torch.argmax(end_scores) + 1

# Decode the answer
if start_idx >= end_idx:
    print("Model did not predict a valid answer. Please check context and question.")
else:
    answer = tokenizer.convert_tokens_to_string(
        tokenizer.convert_ids_to_tokens(input_ids[0][start_idx:end_idx])
    )
    print(f"Question: {question}")
    print(f"Answer: {answer}")
  
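If you only need the answer string rather than raw logits, the Transformers question-answering pipeline wraps the same steps; the snippet below is a sketch that assumes the same repository name used above.

from transformers import pipeline

# Convenience wrapper around the tokenizer/model pair shown above.
qa = pipeline("question-answering", model="Sadat07/bert_squad")

result = qa(
    question="When was Thomas Edison born?",
    context="Thomas Edison is credited with inventing the practical electric light bulb. He was born in 1847.",
)
print(result)  # dict with 'answer', 'score', 'start', and 'end'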

Training Details

| Step | Training Loss | Validation Loss | Exact Match | SQuAD F1 | Start Accuracy | End Accuracy |
|------|---------------|-----------------|-------------|-----------|----------------|--------------|
| 100  | 0.632200 | 0.811809 | 84.749290 | 84.749290 | 0.847493 | 0.899243 |
| 200  | 0.751500 | 0.627198 | 84.768212 | 84.768212 | 0.847682 | 0.899243 |
| 300  | 0.662600 | 0.557515 | 86.244087 | 86.244087 | 0.862441 | 0.899243 |
| 400  | 0.600400 | 0.567693 | 86.177862 | 86.177862 | 0.861779 | 0.899243 |
| 500  | 0.613200 | 0.523546 | 86.499527 | 86.499527 | 0.864995 | 0.899243 |
| 600  | 0.495200 | 0.539225 | 86.565752 | 86.565752 | 0.865658 | 0.899243 |
| 700  | 0.645300 | 0.552358 | 85.354778 | 85.354778 | 0.853548 | 0.899243 |
| 800  | 0.499100 | 0.562317 | 86.338694 | 86.338694 | 0.863387 | 0.899243 |
| 900  | 0.482800 | 0.499747 | 86.811731 | 86.811731 | 0.868117 | 0.899243 |
| 1000 | 0.372800 | 0.543513 | 86.972564 | 86.972564 | 0.869726 | 0.900000 |
| 1100 | 0.554000 | 0.502747 | 85.969726 | 85.969726 | 0.859697 | 0.894797 |
| 1200 | 0.459800 | 0.484941 | 87.019868 | 87.019868 | 0.870199 | 0.900662 |
| 1300 | 0.463600 | 0.477527 | 87.407758 | 87.407758 | 0.874078 | 0.899905 |
| 1400 | 0.356800 | 0.499119 | 87.549669 | 87.549669 | 0.875497 | 0.901608 |
| 1500 | 0.494200 | 0.485287 | 87.549669 | 87.549669 | 0.875497 | 0.901703 |
| 1600 | 0.521100 | 0.466062 | 87.284768 | 87.284768 | 0.872848 | 0.899243 |
| 1700 | 0.461200 | 0.462704 | 87.540208 | 87.540208 | 0.875402 | 0.901419 |
| 1800 | 0.415700 | 0.474295 | 87.691580 | 87.691580 | 0.876916 | 0.901892 |
| 1900 | 0.622900 | 0.462900 | 87.417219 | 87.417219 | 0.874172 | 0.901987 |
| 2000 | 0.477800 | 0.465936 | 87.568590 | 87.568590 | 0.875686 | 0.901892 |

Training Data

The model was trained on the SQuAD dataset, a widely used benchmark for context-based question-answering tasks. It consists of passages from Wikipedia and corresponding questions, with human-annotated answers.

During training, the dataset was processed to extract contexts, questions, and answers, ensuring compatibility with the BERT architecture for QA. The training utilized free resources to minimize costs and focus on model efficiency.
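
The exact preprocessing code is not published with this card; the sketch below shows one common way to turn SQuAD's character-level answer annotations into the start/end token positions BERT expects, using the fast tokenizer's offset mapping (the max_length of 384 is an assumption).

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
squad = load_dataset("squad")

def to_features(example):
    # Tokenize the question/context pair and keep character offsets.
    enc = tokenizer(
        example["question"],
        example["context"],
        truncation="only_second",
        max_length=384,
        return_offsets_mapping=True,
    )
    start_char = example["answers"]["answer_start"][0]
    end_char = start_char + len(example["answers"]["text"][0])

    # Find the context tokens whose character spans cover the gold answer.
    sequence_ids = enc.sequence_ids()
    start_token = end_token = 0  # left at 0 if the answer was truncated away
    for idx, (s, e) in enumerate(enc["offset_mapping"]):
        if sequence_ids[idx] != 1:  # skip the question and special tokens
            continue
        if s <= start_char < e:
            start_token = idx
        if s < end_char <= e:
            end_token = idx
    enc["start_positions"] = start_token
    enc["end_positions"] = end_token
    enc.pop("offset_mapping")
    return enc

train_features = squad["train"].map(to_features)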

Training Procedure

Training Objective
The model was trained to perform context-based question answering on the SQuAD dataset. Fine-tuning adapts the pretrained BERT encoder, originally trained with a masked language modeling (MLM) objective, to QA by predicting answer start and end positions, leveraging its ability to encode the contextual relationship between passage and question.

Optimization
The training utilized the AdamW optimizer with a linear learning rate scheduler and warm-up steps to ensure effective weight updates and prevent overfitting. The training was run for 2000 steps, with early stopping applied based on the validation loss and exact match score.
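
The card does not report the learning rate, warm-up fraction, or batch size, so the values below are assumptions; the sketch only illustrates the AdamW plus linear warm-up schedule described above.

import torch
from transformers import AutoModelForQuestionAnswering, get_linear_schedule_with_warmup

model = AutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-uncased")

num_training_steps = 2000  # matches the step count reported in the table above
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)  # assumed values
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # assumed 10% warm-up
    num_training_steps=num_training_steps,
)

# Inside the training loop: loss.backward(); optimizer.step();
# scheduler.step(); optimizer.zero_grad()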

Hardware and Resources
Training was conducted on free compute, such as Google Colab or an equivalent free GPU tier. While this limited the scale of the run, batch size and learning rate were adjusted to keep training efficient within those constraints.

Unique Features
The model fine-tuning procedure emphasizes efficient learning, leveraging BERT's pre-trained knowledge while adapting it specifically to QA tasks in a resource-constrained environment.

Metrics

Performance was evaluated using the following metrics:

  • Exact Match (EM): Measures the percentage of predictions that match the ground-truth answers exactly.
  • F1 Score: Assesses the overlap between the predicted and true answers at a token level, balancing precision and recall (a computation sketch follows this list).
  • Start and End Accuracy: Tracks the model's ability to correctly identify the start and end indices of answers within the context.
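
As referenced in the list above, these scores can be reproduced with the SQuAD metric from the evaluate library; the prediction/reference pair below is a made-up example.

import evaluate

squad_metric = evaluate.load("squad")

# Made-up prediction/reference pair in the format the SQuAD metric expects.
predictions = [{"id": "1", "prediction_text": "Paris"}]
references = [{"id": "1", "answers": {"text": ["Paris"], "answer_start": [30]}}]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}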

Results

The model trained on the SQuAD dataset achieved the following key performance metrics:

Exact Match (EM): Up to 87.69%

F1 Score: Up to 87.69%

Validation Loss: Reduced to 0.46

Start Accuracy: Peaked at 87.69%

End Accuracy: Peaked at 90.20%

Summary

The model, bert_squad, was fine-tuned for context-based question answering using the SQuAD dataset from Hugging Face. Key metrics include an Exact Match (EM) and F1 score of up to 87.69%, demonstrating strong accuracy. Performance benchmarks show consistent improvement in loss and accuracy over 2000 steps, with validation loss reaching as low as 0.46.

The training utilized free resources, leveraging BERT's robust pretraining, although BERT's limitation as a Masked Language Model (MLM) remains a consideration. This work highlights the potential for effective question-answering systems built on pre-existing datasets and infrastructure.

Model Architecture and Objective

The model uses BERT, a pre-trained Transformer-based architecture, fine-tuned for context-based question answering tasks. It aims to predict answers based on the given input text and context.
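
As a quick check of the architecture described above, the fine-tuned checkpoint can be loaded and its span-prediction head and parameter count inspected (the values in the comments are what one would expect for a BERT-base model).

from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("Sadat07/bert_squad")

# A BERT-base encoder topped with a 2-output linear span-classification head.
print(model.qa_outputs)  # expected: Linear(in_features=768, out_features=2, bias=True)
print(round(sum(p.numel() for p in model.parameters()) / 1e6), "M parameters")  # ~109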

Compute Infrastructure

Hardware

GPU: Tesla P100, NVIDIA T4

Software

Framework: Hugging Face Transformers

Dataset: SQuAD (from Hugging Face)

Other tools: Python, PyTorch

BibTeX:


@misc{bert_squad_finetune,
  title  = {BERT Fine-tuned for SQuAD},
  author = {Sadat Parvej and Rafifa Binte Jahir},
  year   = {2024},
  url    = {https://huggingface.co/Sadat07/bert_squad}
}

Glossary

Exact Match (EM): A metric measuring the percentage of predictions that match the ground truth exactly.

Masked Language Model (MLM): Pre-training objective for BERT, predicting masked words in input sentences.
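
For illustration, the MLM objective can be seen on the base checkpoint with the fill-mask pipeline (this fine-tuned QA model no longer carries an MLM head).

from transformers import pipeline

# Uses the original base checkpoint, which retains the MLM head.
fill = pipeline("fill-mask", model="google-bert/bert-base-uncased")
print(fill("The capital of France is [MASK]."))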

Model size: 109M parameters
Tensor type: F32 (Safetensors)