TiME Irish (ga, s)

A monolingual BERT-style encoder that produces embeddings for Irish text, distilled from FacebookAI/xlm-roberta-large.

Specs

  • language: Irish (ga)
  • size: s
  • architecture: BERT encoder
  • layers: 6
  • hidden size: 384
  • intermediate size: 1536
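The spec table above maps onto the standard `transformers` BERT config fields. A minimal sketch of an equivalent config, with values copied from this card (the config shipped in the repo itself is authoritative):

```python
from transformers import BertConfig

# Hypothetical config mirroring the spec table above; not the repo's actual file.
cfg = BertConfig(
    num_hidden_layers=6,     # layers
    hidden_size=384,         # hidden size
    intermediate_size=1536,  # feed-forward (intermediate) size
)
print(cfg.num_hidden_layers, cfg.hidden_size, cfg.intermediate_size)
```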

Usage (mean-pooled embeddings)

from transformers import AutoTokenizer, AutoModel
import torch

repo = "dschulmeist/TiME-ga-s"
tok = AutoTokenizer.from_pretrained(repo)
mdl = AutoModel.from_pretrained(repo)
mdl.eval()  # disable dropout for inference

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

inputs = tok(["example sentence"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = mdl(**inputs)
emb = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])
print(emb.shape)  # (batch size, hidden size) = (1, 384)
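The same mean_pool helper works for sentence similarity: embed two sentences and compare them with cosine similarity. A minimal sketch using dummy tensors in place of model outputs (hidden size 384 taken from the specs above), so the pooling and comparison steps are visible on their own:

```python
import torch
import torch.nn.functional as F

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

# Dummy batch standing in for mdl(**inputs).last_hidden_state:
# 2 sequences, 4 tokens each, hidden size 384.
hidden = torch.randn(2, 4, 384)
mask = torch.tensor([[1, 1, 1, 0],   # last token is padding
                     [1, 1, 1, 1]])

emb = mean_pool(hidden, mask)            # shape (2, 384)
sim = F.cosine_similarity(emb[0], emb[1], dim=0)  # scalar in [-1, 1]
print(float(sim))
```

With the real model, `hidden` and `mask` come from `mdl(**inputs).last_hidden_state` and `inputs["attention_mask"]` as in the snippet above.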