BERT Base Indonesian Named Entity Recognition

This is a BERT-based model fine-tuned for Named Entity Recognition (NER) tasks in Indonesian.
The model is trained to identify and classify named entities such as persons, organizations, locations, and other relevant entities in Indonesian text.


Model Details

  • Model Type: BERT (Bidirectional Encoder Representations from Transformers)
  • Language: Indonesian (id)
  • Task: Token Classification / Named Entity Recognition
  • Base Model: cahya/bert-base-indonesian-1.5G
  • License: MIT

Base Model Reference

The base model, BERT Base Indonesian (uncased), was pre-trained with a masked language modeling (MLM) objective and a 32,000-token WordPiece vocabulary on:

  • ~522MB of Indonesian Wikipedia
  • ~1GB of Indonesian newspaper text

Full details are available on its model card.


Intended Use

This fine-tuned model is intended for:

  • Named Entity Recognition in Indonesian text
  • Information extraction from Indonesian documents
  • Text analysis and processing applications

How to Use

Using with Transformers

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "nahiar/BERT-NER"  # replace with your Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
inputs = tokenizer(text, return_tensors="pt")

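with torch.no_grad():
    outputs = model(**inputs)
    # logits have shape (batch, sequence_length, num_labels); pick the highest-scoring label per token
    predictions = torch.argmax(outputs.logits, dim=2)

# The batch holds a single sentence, so index 0 selects its token IDs and predictions.
# Both lists include the special tokens ([CLS], [SEP]) and WordPiece sub-tokens.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
labels = [model.config.id2label[label_id] for label_id in predictions[0].tolist()]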

print("Tokens:", tokens)
print("Labels:", labels)
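
Grouping predictions into entities

The snippet above prints one label per WordPiece token, so a single name can be split across several tokens. The sketch below shows one possible way to merge such output into whole entities. It rests on assumptions this card does not confirm: that the labels follow a BIO scheme (for example B-PER, I-PER, O) and that sub-tokens are marked with a "##" prefix. The helper name group_entities, the label names PER and LOC, and the hand-written tokens in the example are illustrative only. Check model.config.id2label for the labels the model actually uses.

```python
from typing import List, Optional, Tuple

# Special tokens added by BERT tokenizers; they never belong to an entity.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge WordPiece tokens and BIO labels into (entity_text, entity_type) pairs.

    Assumes BIO-style labels such as "B-PER", "I-PER" and "O" (hypothetical names).
    """
    # Step 1: rebuild whole words; a word keeps the label of its first sub-token.
    words: List[str] = []
    word_labels: List[str] = []
    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
            word_labels.append(label)

    # Step 2: join consecutive words that belong to the same entity.
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type: Optional[str] = None
    for word, label in zip(words, word_labels):
        prefix, _, entity_type = label.partition("-")
        continues = prefix == "I" and entity_type == current_type
        if not continues:
            # Close the entity in progress, if any, before starting a new one.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
            if prefix in ("B", "I") and entity_type:
                current_type = entity_type
        if current_type is not None:
            current_words.append(word)
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


# Hand-written example input, not real model output.
example_tokens = ["[CLS]", "presiden", "joko", "wido", "##do", "berkunjung", "ke", "jakarta", "[SEP]"]
example_labels = ["O", "O", "B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(group_entities(example_tokens, example_labels))
# [('joko widodo', 'PER'), ('jakarta', 'LOC')]
```

To apply this to real predictions, pass the tokens and labels lists from the snippet above. The Transformers token-classification pipeline offers built-in grouping through its aggregation_strategy argument, which may be preferable to a hand-rolled helper.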