|
--- |
|
library_name: transformers |
|
tags: |
|
- code |
|
datasets: |
|
- elyza/ELYZA-tasks-100 |
|
language: |
|
- ja |
|
metrics: |
|
- accuracy |
|
base_model: |
|
- tohoku-nlp/bert-base-japanese-v3 |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
# Model Card for hiroki-rad/bert-base-classification-ft
|
|
|
A Japanese BERT text classifier that categorizes ELYZA-tasks-100 task inputs into eight task types.
|
|
|
|
|
|
|
## Model Details |
|
This model takes the input text of an ELYZA-tasks-100 task and classifies it into one of the following task categories (a minimal usage sketch follows the list):
|
|
|
- 知識説明型 Knowledge Explanation |
|
- 創作型 Creative Generation |
|
- 分析推論型 Analytical Reasoning |
|
- 課題解決型 Task Solution |
|
- 情報抽出型 Information Extraction |
|
- 計算・手順型 Step-by-Step Calculation |
|
- 意見・視点型 Opinion-Perspective |
|
- ロールプレイ型 Role-Play Response |
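

A minimal quick-start sketch. The Hub id is taken from the Direct Use section below; the example input is a well-known ELYZA-tasks-100-style Japanese instruction, the printed score is illustrative only, and the exact label strings returned depend on the `id2label` mapping stored in the model config:

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hub
classify_pipe = pipeline("text-classification", model="hiroki-rad/bert-base-classification-ft")

# An ELYZA-tasks-100 style input (a Japanese instruction text)
print(classify_pipe("仕事の熱意を取り戻すためのアイデアを5つ挙げてください。"))
# e.g. [{'label': 'Task_Solution', 'score': 0.98}]  # illustrative output; label strings come from the model config
```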
|
|
|
### Model Description |
|
|
|
|
|
|
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
|
|
|
- **Developed by:** Hiroki Yanagisawa

- **Model type:** BERT (text classification)

- **Language(s) (NLP):** Japanese

- **License:** [More Information Needed]

- **Finetuned from model:** cl-tohoku/bert-base-japanese-v3
|
|
|
### Direct Use |
|
```python
from tqdm import tqdm
from transformers import AutoTokenizer, BatchEncoding, pipeline

model_name = "hiroki-rad/bert-base-classification-ft"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Fixed label-name -> class-id mapping used at fine-tuning time
label2id = {
    'Task_Solution': 0,
    'Creative_Generation': 1,
    'Knowledge_Explanation': 2,
    'Analytical_Reasoning': 3,
    'Information_Extraction': 4,
    'Step_by_Step_Calculation': 5,
    'Role_Play_Response': 6,
    'Opinion_Perspective': 7,
}
id2label = {id: label for label, id in label2id.items()}


def preprocess_text_classification(examples: dict[str, list]) -> BatchEncoding:
    """Tokenize a batch of examples for `datasets.Dataset.map(..., batched=True)`."""
    encoded_examples = tokenizer(
        examples["questions"],  # a list of strings in batched mode
        max_length=512,
        padding=True,
        truncation=True,
        return_tensors=None,  # keep plain Python lists in batched mode
    )
    # Convert the batch of string labels to class ids
    encoded_examples["labels"] = [label2id[label] for label in examples["labels"]]
    return encoded_examples


# Evaluation data: `test_data` is assumed to be a datasets.Dataset split
# with "questions" (str) and "labels" (str) columns prepared beforehand.
test_data = test_data.to_pandas()
test_data["labels"] = test_data["labels"].apply(lambda x: label2id[x])

classify_pipe = pipeline(model=model_name, device="cuda:0")

results: list[dict[str, float | str]] = []
for i, example in tqdm(enumerate(test_data.itertuples()), total=len(test_data)):
    # Get the model's top prediction for one input
    model_prediction = classify_pipe(example.questions)[0]
    # Convert the gold label id back to its label name
    true_label = id2label[example.labels]
    results.append(
        {
            "example_id": i,
            "pred_prob": model_prediction["score"],
            "pred_label": model_prediction["label"],
            "true_label": true_label,
        }
    )
```
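

Since this card's metadata declares accuracy as the metric, here is a minimal sketch for scoring the `results` list built above. It assumes the pipeline's predicted label strings match the `label2id` names used for the gold labels:

```python
import pandas as pd

results_df = pd.DataFrame(results)
# Fraction of examples where the predicted label equals the gold label
accuracy = (results_df["pred_label"] == results_df["true_label"]).mean()
print(f"accuracy: {accuracy:.3f}")
```

Note that `preprocess_text_classification` is written for batched `datasets.Dataset.map` (e.g. `dataset.map(preprocess_text_classification, batched=True)`) when preparing data for `Trainer`-style evaluation; the pipeline loop above does not use it.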