huseyincenik committed: Update README.md
# huseyincenik/conll_ner_with_bert

This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the CoNLL-2003 dataset for Named Entity Recognition (NER).

## Model description

The model is based on the BERT architecture (bert-base-uncased) and was fine-tuned for Named Entity Recognition on CoNLL-2003, a standard benchmark dataset for NER.

## Intended uses & limitations

### Intended Uses

- **Named Entity Recognition**: This model is designed to identify and classify named entities in text into categories such as location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).

### Limitations

- **Domain Specificity**: The model was fine-tuned on the CoNLL-2003 dataset, which consists of news articles. It may not generalize well to other domains or types of text not represented in the training data.
- **Subword Tokens**: The model may occasionally tag subword tokens as entities, requiring post-processing to handle these cases (see the sketch below).

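
As a rough illustration of that post-processing, the token-classification pipeline can merge subword pieces back into word-level entities via `aggregation_strategy`. This is a minimal sketch, not part of the original training code, and the example sentence is made up:

```python
from transformers import pipeline

# "simple" groups subword pieces into word-level entities and averages their
# scores; "first", "max" and "average" are alternative grouping strategies.
ner = pipeline(
    "token-classification",
    model="huseyincenik/conll_ner_with_bert",
    aggregation_strategy="simple",
)

print(ner("Angela Merkel visited the European Parliament in Brussels."))
```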
## Training and evaluation data

- **Training Dataset**: CoNLL-2003
- **Training Evaluation Metrics**:

| Tag | Precision | Recall | F1-Score | Support |
|:---|---:|---:|---:|---:|
| B-PER | 0.98 | 0.98 | 0.98 | 11273 |
| I-PER | 0.98 | 0.99 | 0.99 | 9323 |
| B-ORG | 0.88 | 0.92 | 0.90 | 10447 |
| I-ORG | 0.81 | 0.92 | 0.86 | 5137 |
| B-LOC | 0.86 | 0.94 | 0.90 | 9621 |
| I-LOC | 1.00 | 0.08 | 0.14 | 1267 |
| B-MISC | 0.81 | 0.73 | 0.77 | 4793 |
| I-MISC | 0.83 | 0.36 | 0.50 | 1329 |
| micro avg | 0.90 | 0.90 | 0.90 | 53190 |
| macro avg | 0.89 | 0.74 | 0.75 | 53190 |
| weighted avg | 0.90 | 0.90 | 0.89 | 53190 |

- **Validation Evaluation Metrics**:

| Tag | Precision | Recall | F1-Score | Support |
|:---|---:|---:|---:|---:|
| B-PER | 0.97 | 0.98 | 0.97 | 3018 |
| I-PER | 0.98 | 0.98 | 0.98 | 2741 |
| B-ORG | 0.86 | 0.91 | 0.88 | 2056 |
| I-ORG | 0.77 | 0.81 | 0.79 | 900 |
| B-LOC | 0.86 | 0.94 | 0.90 | 2618 |
| I-LOC | 1.00 | 0.10 | 0.18 | 281 |
| B-MISC | 0.77 | 0.74 | 0.76 | 1231 |
| I-MISC | 0.77 | 0.34 | 0.48 | 390 |
| micro avg | 0.90 | 0.89 | 0.89 | 13235 |
| macro avg | 0.87 | 0.73 | 0.74 | 13235 |
| weighted avg | 0.90 | 0.89 | 0.88 | 13235 |

- **Test Evaluation Metrics**:

| Tag | Precision | Recall | F1-Score | Support |
|:---|---:|---:|---:|---:|
| B-PER | 0.96 | 0.95 | 0.96 | 2714 |
| I-PER | 0.98 | 0.99 | 0.98 | 2487 |
| B-ORG | 0.81 | 0.87 | 0.84 | 2588 |
| I-ORG | 0.74 | 0.87 | 0.80 | 1050 |
| B-LOC | 0.81 | 0.90 | 0.85 | 2121 |
| I-LOC | 0.89 | 0.12 | 0.22 | 276 |
| B-MISC | 0.75 | 0.67 | 0.71 | 996 |
| I-MISC | 0.85 | 0.49 | 0.62 | 241 |
| micro avg | 0.87 | 0.88 | 0.87 | 12473 |
| macro avg | 0.85 | 0.73 | 0.75 | 12473 |
| weighted avg | 0.87 | 0.88 | 0.86 | 12473 |

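
The card does not include the evaluation code itself. The layout above (per-tag rows plus micro/macro/weighted averages) matches scikit-learn's `classification_report` computed over flattened token-level tags with `O` excluded from the label list; a minimal sketch under that assumption, with toy tag sequences standing in for the real predictions:

```python
from sklearn.metrics import classification_report

# Entity tags only; leaving "O" out of `labels` is what produces the
# micro/macro/weighted average rows seen in the reports above.
labels = ["B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

# Toy stand-ins for the flattened gold and predicted tags of every token.
y_true = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG"]
y_pred = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-MISC"]

print(classification_report(y_true, y_pred, labels=labels))
```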
## Training procedure

### Training Hyperparameters

- **Optimizer**: AdamWeightDecay
  - Learning Rate: 2e-05
  - Decay Schedule: PolynomialDecay
  - Warmup Steps: 0.1
  - Weight Decay Rate: 0.01
- **Training Precision**: float32

### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 0.1016     | 0.0254          | 0     |
| 0.0228     | 0.0180          | 1     |

### Optimizer Details

```python
from transformers import create_optimizer

# tokenized_conll is the tokenized CoNLL-2003 DatasetDict used for fine-tuning.
batch_size = 32
num_train_epochs = 2
num_train_steps = (len(tokenized_conll["train"]) // batch_size) * num_train_epochs

optimizer, lr_schedule = create_optimizer(
    init_lr=2e-5,
    num_train_steps=num_train_steps,
    weight_decay_rate=0.01,
    num_warmup_steps=0.1
)
```
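
The card stops at creating the optimizer. Purely as an illustration of where it would be used, here is a sketch of compiling and fitting the TensorFlow model; the `TFAutoModelForTokenClassification` call, `num_labels=9`, and the `train_set` dataset are assumptions, not taken from the card:

```python
from transformers import TFAutoModelForTokenClassification

# 9 labels: O plus B-/I- tags for PER, ORG, LOC and MISC.
model = TFAutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=9
)

# Recent TF models in transformers compute their own loss if none is given.
model.compile(optimizer=optimizer)

# train_set: an assumed batched tf.data.Dataset built from tokenized_conll["train"].
model.fit(train_set, epochs=num_train_epochs)
```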

## How to Use

### Using a Pipeline

```python
from transformers import pipeline

pipe = pipeline("token-classification", model="huseyincenik/conll_ner_with_bert")

# Or load the tokenizer and model directly:
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("huseyincenik/conll_ner_with_bert")
model = AutoModelForTokenClassification.from_pretrained("huseyincenik/conll_ner_with_bert")
```
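
Continuing from the snippet above, an illustrative call (the input sentence is made up):

```python
# Each prediction is a dict with "entity", "score", "index", "word", "start" and "end".
for prediction in pipe("Hugging Face is based in New York City."):
    print(prediction["word"], prediction["entity"], f"{prediction['score']:.3f}")
```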

The label abbreviations follow the standard CoNLL-2003 IOB scheme:

Abbreviation | Description
-|-
O | Outside of a named entity
B-MISC | Beginning of a miscellaneous entity right after another miscellaneous entity
I-MISC | Miscellaneous entity
B-PER | Beginning of a person’s name right after another person’s name
I-PER | Person’s name
B-ORG | Beginning of an organization right after another organization
I-ORG | Organization
B-LOC | Beginning of a location right after another location
I-LOC | Location

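
To check the exact tag order used by the checkpoint, the label mapping stored in its config can be printed (assuming `id2label` was saved with the model, as is usual for token-classification checkpoints):

```python
from transformers import AutoConfig

# id2label maps class indices to the tag strings listed in the table above.
config = AutoConfig.from_pretrained("huseyincenik/conll_ner_with_bert")
print(config.id2label)
```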

### CoNLL-2003 English Dataset Statistics

This dataset was derived from the Reuters corpus, which consists of Reuters news stories. You can read more about how this dataset was created in the CoNLL-2003 paper.

#### Number of training examples per entity type

Dataset|LOC|MISC|ORG|PER
-|-|-|-|-
Train|7140|3438|6321|6600
Dev|1837|922|1341|1842
Test|1668|702|1661|1617

#### Number of articles/sentences/tokens per dataset

Dataset|Articles|Sentences|Tokens
-|-|-|-
Train|946|14,987|203,621
Dev|216|3,466|51,362
Test|231|3,684|46,435

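
For reference, sentence and token counts per split can be recomputed from the `conll2003` dataset on the Hugging Face Hub (a sketch assuming that dataset id; the figures above come from the original CoNLL-2003 release):

```python
from datasets import load_dataset

conll = load_dataset("conll2003")

# Sentence count is the number of rows; token count sums the "tokens" column.
for split in ("train", "validation", "test"):
    n_sentences = len(conll[split])
    n_tokens = sum(len(tokens) for tokens in conll[split]["tokens"])
    print(f"{split}: {n_sentences} sentences, {n_tokens} tokens")
```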
### Framework versions