Update README.md
(Japanese caption: 日本語の (抽出型) 質問応答のモデル)

This model is a fine-tuned version of [rinna/japanese-roberta-base](https://huggingface.co/rinna/japanese-roberta-base), trained for extractive question answering.

The model was fine-tuned on the [JaQuAD](https://huggingface.co/datasets/SkelterLabsInc/JaQuAD) dataset provided by Skelter Labs, in which the data was collected from Japanese Wikipedia articles and annotated by humans.
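If you want to inspect the training data yourself, the dataset can be loaded with the `datasets` library. A minimal sketch, assuming the Hub ID from the link above and the SQuAD-style record layout described on the dataset card:

```python
from datasets import load_dataset

# Load JaQuAD from the Hugging Face Hub
# (repository ID taken from the dataset card linked above)
jaquad = load_dataset("SkelterLabsInc/JaQuAD")

# Records follow the SQuAD layout: question, context, and annotated answers
print(jaquad["train"][0]["question"])
print(jaquad["train"][0]["answers"])
```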
## Intended uses

When running with a dedicated pipeline:

```python
from transformers import pipeline

model_name = "tsmatz/roberta_qa_japanese"
# Build a question-answering pipeline from the fine-tuned checkpoint
qa_pipeline = pipeline(
    "question-answering",
    model=model_name,
    tokenizer=model_name)
# The question and context here follow the forward-pass example below
result = qa_pipeline(
    question="決勝トーナメントで日本に勝ったのはどこでしたか。",
    context="日本は予選リーグで強豪のドイツとスペインに勝って決勝トーナメントに進んだが、クロアチアと対戦して敗れた。")
print(result)
```
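For a single question/context pair, the pipeline returns a dict with the keys `score`, `start`, `end`, and `answer`, where `start` and `end` are character offsets into `context`.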
When manually running through the forward pass:

```python
import torch
import numpy as np
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "tsmatz/roberta_qa_japanese"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def inference_answer(question, context):
    # Encode the question and context together into a single feature
    test_feature = tokenizer(
        question,
        context,
        max_length=318,
    )
    # Forward pass without gradient tracking
    with torch.no_grad():
        outputs = model(torch.tensor([test_feature["input_ids"]]))
    start_logits = outputs.start_logits.cpu().numpy()
    end_logits = outputs.end_logits.cpu().numpy()
    # Decode the span between the most likely start and end positions
    answer_ids = test_feature["input_ids"][np.argmax(start_logits):np.argmax(end_logits)+1]
    return "".join(tokenizer.batch_decode(answer_ids))

# "Which team beat Japan in the knockout stage?"
question = "決勝トーナメントで日本に勝ったのはどこでしたか。"
# "Japan beat the powerhouses Germany and Spain in the group stage and
# advanced to the knockout stage, but lost to Croatia."
context = "日本は予選リーグで強豪のドイツとスペインに勝って決勝トーナメントに進んだが、クロアチアと対戦して敗れた。"
answer_pred = inference_answer(question, context)
print(answer_pred)
```
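One caveat with the decoding above: taking the argmax of the start and end logits independently can yield an inverted span (end before start), which decodes to an empty answer. A minimal sketch of safer span selection; the helper name and length cap are illustrative, not part of the original card:

```python
import numpy as np

def best_valid_span(start_logits, end_logits, max_answer_len=30):
    """Pick the highest-scoring (start, end) pair with start <= end."""
    best_score, best_span = -np.inf, (0, 0)
    for s in range(len(start_logits)):
        # Only consider end positions at or after the start, within a length cap
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score, best_span = score, (s, e)
    return best_span

# Usage with the logits computed in inference_answer (take the first batch row):
#   s, e = best_valid_span(start_logits[0], end_logits[0])
#   answer_ids = test_feature["input_ids"][s:e + 1]
```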
## Training procedure