Update README.md
README.md CHANGED
@@ -107,6 +107,18 @@ language:
 license: apache-2.0
 datasets:
 - wikipedia
+examples:
+widget:
+- text: "মারভিন দি মারসিয়ান"
+  example_title: "Sentence_1"
+- text: "লিওনার্দো দা ভিঞ্চি"
+  example_title: "Sentence_2"
+- text: "বসনিয়া ও হার্জেগোভিনা"
+  example_title: "Sentence_3"
+- text: "সাউথ ইস্ট ইউনিভার্সিটি"
+  example_title: "Sentence_4"
+- text: "মানিক বন্দ্যোপাধ্যায় লেখক"
+  example_title: "Sentence_5"
 ---
 
 # BERT multilingual base model (cased)

@@ -151,55 +163,17 @@ generation you should look at model like GPT2.
 
 ### How to use
 
-You can use this model directly with a pipeline for
+You can use this model directly with a pipeline for named entity recognition:
 
 ```python
-
-
-
-
-
-
-
-
-{'sequence': "[CLS] Hello I'm a world model. [SEP]",
-'score': 0.052126359194517136,
-'token': 11356,
-'token_str': 'world'},
-{'sequence': "[CLS] Hello I'm a data model. [SEP]",
-'score': 0.048930276185274124,
-'token': 11165,
-'token_str': 'data'},
-{'sequence': "[CLS] Hello I'm a flight model. [SEP]",
-'score': 0.02036019042134285,
-'token': 23578,
-'token_str': 'flight'},
-{'sequence': "[CLS] Hello I'm a business model. [SEP]",
-'score': 0.020079681649804115,
-'token': 14155,
-'token_str': 'business'}]
-```
-
-Here is how to use this model to get the features of a given text in PyTorch:
-
-```python
-from transformers import BertTokenizer, BertModel
-tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
-model = BertModel.from_pretrained("bert-base-multilingual-cased")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-
-and in TensorFlow:
-
-```python
-from transformers import BertTokenizer, TFBertModel
-tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
-model = TFBertModel.from_pretrained("bert-base-multilingual-cased")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
+from transformers import AutoTokenizer, AutoModelForTokenClassification
+from transformers import pipeline
+tokenizer = AutoTokenizer.from_pretrained("orgcatorg/bert-base-multilingual-cased-ner")
+model = AutoModelForTokenClassification.from_pretrained("orgcatorg/bert-base-multilingual-cased-ner")
+nlp = pipeline("ner", model=model, tokenizer=tokenizer)
+example = "মারভিন দি মারসিয়ান"
+ner_results = nlp(example)
+ner_results
 ```
 
 ## Training data
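The NER pipeline added in this diff returns one prediction per token. Below is a minimal sketch of collapsing such per-token B-/I- tags into entity spans; the tag scheme, the sample records, and the `group_entities` helper are illustrative assumptions, not part of the diff or of actual model output:

```python
# Sketch: merge token-level NER predictions into (entity_type, text) spans.
# The record format mirrors what a Hugging Face "ner" pipeline typically
# returns; the sample predictions below are hypothetical.

def group_entities(token_preds):
    """Merge consecutive B-/I- tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for pred in token_preds:
        tag = pred["entity"]       # e.g. "B-PER", "I-PER", "O"
        word = pred["word"]
        if tag.startswith("B-"):
            if current_type:       # close any span already open
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(word)
        else:                      # "O", or an I- tag that doesn't continue the span
            if current_type:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_type:
        spans.append((current_type, " ".join(current_words)))
    return spans

# Hypothetical token-level predictions for the diff's first widget sentence:
preds = [
    {"entity": "B-PER", "word": "মারভিন"},
    {"entity": "I-PER", "word": "দি"},
    {"entity": "I-PER", "word": "মারসিয়ান"},
]
print(group_entities(preds))  # [('PER', 'মারভিন দি মারসিয়ান')]
```

Note that this sketch joins words with spaces and ignores WordPiece continuation pieces (`##...`); in practice, recent `transformers` releases can do this grouping natively via `pipeline("ner", ..., aggregation_strategy="simple")`.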