This is a named-entity recognition (NER) model for detecting and extracting citations from American legal documents.

Ignore the widget on the model card page; see below for usage.

How to Use the Model

This model outputs token-level predictions, which should be processed as follows to obtain meaningful labels for each token:

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load the tokenizer and the token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("ss108/legal-citation-bert")
model = AutoModelForTokenClassification.from_pretrained("ss108/legal-citation-bert")

text = "Your example text here"
inputs = tokenizer(text, return_tensors="pt", padding=True)

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label ID for each token
logits = outputs.logits
predictions = torch.argmax(logits, dim=-1)

# Map input IDs back to token strings and label IDs to label names
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
predicted_labels = [model.config.id2label[p.item()] for p in predictions[0]]

# Print each token alongside its predicted label
components = [f"{token} : {label}" for token, label in zip(tokens, predicted_labels)]
concat = " ; ".join(components)
print(concat)
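
The token-level output above still contains special tokens and sub-word pieces, so a further grouping step is often useful. The following is a minimal sketch of one way to do that, not part of the model itself. It assumes WordPiece-style tokens (continuation pieces prefixed with "##", special tokens such as [CLS] and [SEP]) and BIO-style labels such as "B-X" / "I-X" / "O". The helper name group_entities and the label name CASE_NAME in the sample data are hypothetical; check model.config.id2label for the labels the model actually uses.

```python
from typing import List, Optional, Tuple

# Assumption: BERT-style special tokens that should not appear in the output.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge sub-word pieces into words, then group consecutive words
    that share an entity type into (entity_type, text) spans."""
    # Step 1: merge "##" continuation pieces into the preceding word,
    # keeping the label predicted for the word's first piece.
    words: List[Tuple[str, str]] = []
    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        if token.startswith("##") and words:
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token[2:], prev_label)
        else:
            words.append((token, label))

    # Step 2: group consecutive words with the same entity type.
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []
    for word, label in words:
        if label == "O":
            prefix, ent_type = "", None
        elif "-" in label:
            prefix, ent_type = label.split("-", 1)
        else:
            prefix, ent_type = "", label  # label without a BIO prefix
        # A "B" prefix or a change of type closes the span in progress.
        if ent_type != current_type or prefix == "B":
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = ent_type, []
        if ent_type is not None:
            current_words.append(word)
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Hypothetical sample data, for illustration only (not real model output).
sample_tokens = ["[CLS]", "see", "mir", "##anda", "v", ".", "arizona", "[SEP]"]
sample_labels = ["O", "O", "B-CASE_NAME", "I-CASE_NAME", "I-CASE_NAME",
                 "I-CASE_NAME", "I-CASE_NAME", "O"]
print(group_entities(sample_tokens, sample_labels))
```

With real model output, the same helper can be called as group_entities(tokens, predicted_labels). Words are joined with single spaces, so punctuation spacing in the recovered text may differ from the original document.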