---
language:
- tr
base_model:
- deepset/gbert-base
pipeline_tag: token-classification
---

# NER Model for Legal Texts
Released in January 2024, this is a Turkish BERT language model built on an **optimized BERT architecture** and pretrained from scratch on a 2 GB Turkish legal corpus. The corpus was sourced from legal-related thesis documents available in the Higher Education Board National Thesis Center (YÖKTEZ). The model has been fine-tuned for Named Entity Recognition (NER) on human-annotated datasets provided by **NewMind**, a legal tech company in Istanbul, Turkey.

In our paper, we outline the steps taken to train this model and demonstrate its superior performance compared to previous approaches.

---
## Overview

- **Preprint Paper**: [https://arxiv.org/abs/2407.00648](https://arxiv.org/abs/2407.00648)
- **Architecture**: Optimized BERT Base
- **Language**: Turkish
- **Supported Labels**:
  - `Person`
  - `Law`
  - `Publication`
  - `Government`
  - `Corporation`
  - `Other`
  - `Project`
  - `Money`
  - `Date`
  - `Location`
  - `Court`

**Model Name**: LegalLTurk Optimized BERT
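
The label set stored in the checkpoint can be read directly from the model config, which is handy if you want to check how the labels listed above are encoded (for example, whether a `B-`/`I-` prefix scheme is used). A minimal sketch, assuming only the `transformers` library:

```python
from transformers import AutoConfig

# Inspect the id-to-label mapping shipped with the checkpoint
config = AutoConfig.from_pretrained("farnazzeidi/ner-legalturk-bert-model")
for idx, label in sorted(config.id2label.items()):
    print(idx, label)
```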
---

## How to Use

### Use a pipeline as a high-level helper

```python
from transformers import pipeline

# Load the pipeline
model = pipeline("ner", model="farnazzeidi/ner-legalturk-bert-model", aggregation_strategy='simple')

# Input text
text = "Burada, Tebligat Kanunu ile VUK düzenlemesi ayrımına dikkat etmek gerekir."

# Get predictions
predictions = model(text)
print(predictions)
```
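
With `aggregation_strategy='simple'`, the pipeline merges word pieces into entity spans, so each prediction is a dictionary with fields such as `entity_group`, `word`, `score`, `start`, and `end`. An optional sketch for printing the results in a more readable form:

```python
# Print one detected entity per line: label, matched text, confidence score
for entity in predictions:
    print(f"{entity['entity_group']:<12} {entity['word']:<25} {entity['score']:.3f}")
```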
### Load model directly

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("farnazzeidi/ner-legalturk-bert-model")
model = AutoModelForTokenClassification.from_pretrained("farnazzeidi/ner-legalturk-bert-model")

text = "Burada, Tebligat Kanunu ile VUK düzenlemesi ayrımına dikkat etmek gerekir."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Process logits and map predictions to labels
predictions = [
    (token, model.config.id2label[label.item()])
    for token, label in zip(
        tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
        torch.argmax(torch.softmax(outputs.logits, dim=-1), dim=-1)[0],
    )
    if token not in tokenizer.all_special_tokens
]

print(predictions)
```
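
The snippet above yields one label per word piece, including `##` continuation pieces. If you prefer word-level predictions without the pipeline helper, one option is to group pieces by the tokenizer's word IDs. A minimal sketch that reuses `model`, `inputs`, `outputs`, `text`, and `torch` from the block above and assumes a fast tokenizer (so `word_ids()` is available):

```python
# Keep the prediction of each word's first piece and recover the word from the input text
labels = torch.argmax(outputs.logits, dim=-1)[0]
word_level, seen = [], set()
for idx, word_id in enumerate(inputs.word_ids(0)):
    if word_id is None or word_id in seen:  # skip special tokens and continuation pieces
        continue
    seen.add(word_id)
    span = inputs.word_to_chars(0, word_id)  # character span of this word in `text`
    word_level.append((text[span.start:span.end], model.config.id2label[labels[idx].item()]))

print(word_level)
```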
---

## Authors

Farnaz Zeidi, Mehmet Fatih Amasyali, Çigdem Erol

---
## License

This model is shared under the [CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en). You are free to use, share, and adapt the model for non-commercial purposes, provided that you give appropriate credit to the authors.

For commercial use, please contact [zeidi.uni@gmail.com](mailto:zeidi.uni@gmail.com).