Commit e8f6a19 by tsmatz · 1 parent: 22f36b7

Update README.md

  1. README.md +33 -11
README.md CHANGED
@@ -29,14 +29,15 @@ should probably proofread and complete it, then remove this comment. -->

  (Japanese caption : 日本語の (抽出型) 質問応答のモデル)

- This model is a fine-tuned question-answering model of [rinna/japanese-roberta-base](https://huggingface.co/rinna/japanese-roberta-base) on [JaQuAD](https://huggingface.co/datasets/SkelterLabsInc/JaQuAD) dataset.

  ## Intended uses

  When running with a dedicated pipeline :

  ```python
- from transformers import AutoModelForTokenClassification
  from transformers import pipeline

  model_name = "tsmatz/roberta_qa_japanese"
@@ -52,17 +53,38 @@ result = qa_pipeline(
  print(result)
  ```

- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure

 

  (Japanese caption : 日本語の (抽出型) 質問応答のモデル)

+ This model is a fine-tuned version of [rinna/japanese-roberta-base](https://huggingface.co/rinna/japanese-roberta-base) trained for extractive question answering.
+
+ The model is fine-tuned on [JaQuAD](https://huggingface.co/datasets/SkelterLabsInc/JaQuAD) dataset provided by Skelter Labs, in which data is collected from Japanese Wikipedia articles and annotated by a human.

  ## Intended uses

  When running with a dedicated pipeline :

  ```python
  from transformers import pipeline

  model_name = "tsmatz/roberta_qa_japanese"
 
  print(result)
  ```

+ When manually running through forward pass :

+ ```python
+ import torch
+ import numpy as np
+ from transformers import AutoModelForQuestionAnswering, AutoTokenizer

+ model_name = "tsmatz/roberta_qa_japanese"
+ model = (AutoModelForQuestionAnswering
+     .from_pretrained(model_name))
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ def inference_answer(question, context):
+     question = question
+     context = context
+     test_feature = tokenizer(
+         question,
+         context,
+         max_length=318,
+     )
+     with torch.no_grad():
+         outputs = model(torch.tensor([test_feature["input_ids"]]))
+     start_logits = outputs.start_logits.cpu().numpy()
+     end_logits = outputs.end_logits.cpu().numpy()
+     answer_ids = test_feature["input_ids"][np.argmax(start_logits):np.argmax(end_logits)+1]
+     return "".join(tokenizer.batch_decode(answer_ids))
+
+ question = "決勝トーナメントで日本に勝ったのはどこでしたか。"
+ context = "日本は予選リーグで強豪のドイツとスペインに勝って決勝トーナメントに進んだが、クロアチアと対戦して敗れた。"
+ answer_pred = inference_answer(question, context)
+ print(answer_pred)
+ ```

  ## Training procedure
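
Review note on the manual forward-pass example added in this commit: it takes `np.argmax` of the start and end logits independently, so nothing stops the predicted end index from landing before the start index, in which case the slice is empty and the function returns an empty string. A minimal sketch of one common alternative is below: score every (start, end) pair jointly and keep only pairs with start <= end and a bounded length. The helper name `best_span` and the `max_answer_len` default are hypothetical and not part of the commit or of `transformers`. The sketch uses NumPy only and does not restrict the span to context tokens.

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len.

    Hypothetical helper for illustration; both inputs are 1-D and of equal length.
    """
    start_logits = np.asarray(start_logits, dtype=float).ravel()
    end_logits = np.asarray(end_logits, dtype=float).ravel()
    # scores[s, e] is the joint score of a span starting at s and ending at e.
    scores = start_logits[:, None] + end_logits[None, :]
    idx = np.arange(scores.shape[0])
    span_len = idx[None, :] - idx[:, None]  # e - s for every pair
    valid = (span_len >= 0) & (span_len < max_answer_len)
    # Invalid pairs can never win the argmax.
    scores = np.where(valid, scores, -np.inf)
    s, e = np.unravel_index(np.argmax(scores), scores.shape)
    return int(s), int(e)
```

Under these assumptions, `inference_answer` could call `s, e = best_span(start_logits[0], end_logits[0])` and slice `test_feature["input_ids"][s:e+1]` in place of the two independent `np.argmax` calls.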