
Fine-tuned roberta-base for detecting paragraphs on the topic of 'Language and Communication'

Description

This is a fine-tuned roberta-base model that detects whether paragraphs drawn from ethnographic source material are about 'Language and Communication'.

Usage

The easiest way to run inference with this model is through the Hugging Face pipelines API.

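from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="gptmurdock/classifier-main_subjects_language")
# Returns a list with one dict per input, each holding a "label" and a "score"
classifier("Example text to classify")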
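
The pipeline also accepts a list of strings, which is convenient when classifying several paragraphs at once. The following is a minimal sketch rather than a recommended workflow: the helper names `classify_paragraphs` and `pair_with_predictions` are hypothetical, and the label strings the model returns depend on its configuration, which this card does not document. Passing `truncation=True` clips paragraphs that exceed the model's maximum input length.

```python
from typing import Dict, List

MODEL_ID = "gptmurdock/classifier-main_subjects_language"


def classify_paragraphs(paragraphs: List[str]) -> List[Dict]:
    """Run the classifier over a list of paragraphs (downloads the model on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    # truncation=True clips inputs that exceed the model's maximum sequence length.
    return classifier(paragraphs, truncation=True)


def pair_with_predictions(paragraphs: List[str], predictions: List[Dict]) -> List[Dict]:
    """Attach each prediction's label and score to its source paragraph."""
    return [
        {"text": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(paragraphs, predictions)
    ]


# Example (requires network access to download the model):
# paragraphs = ["Example text to classify", "Another example paragraph"]
# rows = pair_with_predictions(paragraphs, classify_paragraphs(paragraphs))
```

Keeping the text next to its prediction makes it easier to filter or inspect results afterwards, for example by sorting on `score`.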

Training data

...

Training procedure

...

We used a 60-20-20 train-val-test split and fine-tuned roberta-base for 5 epochs (learning rate = 2e-5, batch size = 40).
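
For readers who want to set up a comparable run, the split and hyperparameters can be sketched as follows. This is an illustration under assumptions, not the authors' training script: the card does not say whether the split was random or stratified, so a seeded random shuffle is assumed here, and the function name `split_60_20_20` is hypothetical. The dictionary keys follow the argument names of `transformers.TrainingArguments`; treating the batch size of 40 as a per-device value is also an assumption.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hyperparameters stated in this card, keyed by the matching
# transformers.TrainingArguments argument names.
HYPERPARAMETERS = {
    "num_train_epochs": 5,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 40,  # assumption: 40 is the per-device batch size
}


def split_60_20_20(examples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle examples and split them into 60% train, 20% validation, 20% test."""
    shuffled = list(examples)
    # The seed is an assumption, added only to make the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

With these keys, `TrainingArguments(output_dir=..., **HYPERPARAMETERS)` would pick up the epoch count, learning rate, and batch size directly.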

Evaluation

Evaluation results on the held-out test set are reported below.

Metric     Value (%)
Precision  97.0
Recall     97.2
F1         97.0
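
As a reminder of how these three metrics relate, here is a small sketch that computes them from predicted and true labels. It assumes a binary task with label 1 as the positive class and reports percentages; the card does not state whether its figures are for the positive class or averaged across classes, so this is not necessarily the exact procedure used. The function name `precision_recall_f1` and the toy labels are illustrative only.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Positive-class precision, recall, and F1 as percentages (labels are 0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return 100 * precision, 100 * recall, 100 * f1


# Toy labels for illustration only; these are not the model's test data.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"Precision {precision:.1f}  Recall {recall:.1f}  F1 {f1:.1f}")
# prints: Precision 75.0  Recall 75.0  F1 75.0
```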

Model size: 125M params (Safetensors, tensor type F32)