---
library_name: transformers
tags:
- code
datasets:
- elyza/ELYZA-tasks-100
language:
- ja
metrics:
- accuracy
base_model:
- tohoku-nlp/bert-base-japanese-v3
pipeline_tag: text-classification
---

# Model Card for hiroki-rad/bert-base-classification-ft

<!-- Provide a quick summary of what the model is/does. -->

A Japanese text-classification model, fine-tuned from tohoku-nlp/bert-base-japanese-v3, that classifies ELYZA-tasks-100 inputs into eight task categories.

## Model Details
This model takes the input of an ELYZA-tasks-100 task and classifies it by task type. The task categories are the following (a minimal inference sketch follows the list):

- 知識説明型 (Knowledge Explanation)
- 創作型 (Creative Generation)
- 分析推論型 (Analytical Reasoning)
- 課題解決型 (Task Solution)
- 情報抽出型 (Information Extraction)
- 計算・手順型 (Step-by-Step Calculation)
- 意見・視点型 (Opinion-Perspective)
- ロールプレイ型 (Role-Play Response)
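
For orientation, here is a minimal inference sketch. It assumes only the Hub id `hiroki-rad/bert-base-classification-ft` used in the Direct Use section below; the example input and the commented output shape are illustrative:

```python
from transformers import pipeline

# Hub id taken from the Direct Use section below.
classify_pipe = pipeline(
    "text-classification",
    model="hiroki-rad/bert-base-classification-ft",
)

# Classify one ELYZA-tasks-100-style input (illustrative example text).
print(classify_pipe("仕事の熱意を取り戻すためのアイデアを5つ挙げてください。"))
# Expected shape: [{'label': '<one of the eight categories>', 'score': <float>}]
```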

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** Hiroki Yanagisawa
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** BERT
- **Language(s) (NLP):** Japanese
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** cl-tohoku/bert-base-japanese-v3

### Direct Use
The snippet below evaluates the classifier on a held-out split. `test_data` is assumed to be prepared elsewhere as a `datasets.Dataset` with `questions` (task input) and `labels` (category name) columns.

```python
from tqdm import tqdm
from transformers import AutoTokenizer, BatchEncoding, pipeline

model_name = "hiroki-rad/bert-base-classification-ft"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Mapping from task-category name to class id, and its inverse.
label2id = {
    'Task_Solution': 0,
    'Creative_Generation': 1,
    'Knowledge_Explanation': 2,
    'Analytical_Reasoning': 3,
    'Information_Extraction': 4,
    'Step_by_Step_Calculation': 5,
    'Role_Play_Response': 6,
    'Opinion_Perspective': 7,
}
id2label = {id: label for label, id in label2id.items()}


def preprocess_text_classification(examples: dict[str, list]) -> BatchEncoding:
    """Tokenize a batch of examples (for dataset.map with batched=True)."""
    encoded_examples = tokenizer(
        examples["questions"],  # a list of strings in batched mode
        max_length=512,
        padding=True,
        truncation=True,
        return_tensors=None,  # keep plain lists when mapping over a Dataset
    )
    # Convert the label names of the whole batch to numeric ids
    encoded_examples["labels"] = [label2id[label] for label in examples["labels"]]
    return encoded_examples


# `test_data` is assumed to be a datasets.Dataset with "questions" and "labels" columns.
test_data = test_data.to_pandas()
test_data["labels"] = test_data["labels"].apply(lambda x: label2id[x])

classify_pipe = pipeline(model=model_name, device="cuda:0")

results: list[dict[str, float | str]] = []
for i, example in tqdm(enumerate(test_data.itertuples())):
    # Get the model's prediction for one task input
    model_prediction = classify_pipe(example.questions)[0]
    # Convert the gold label id back to its label name
    true_label = id2label[example.labels]
    results.append(
        {
            "example_id": i,
            "pred_prob": model_prediction["score"],
            "pred_label": model_prediction["label"],
            "true_label": true_label,
        }
    )
```
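
Since the card declares accuracy as its metric, the `results` list above can be summarized as follows. This is a sketch assuming `pandas` is available; `results_df` is a name introduced here for illustration:

```python
import pandas as pd

# Tabulate the per-example predictions collected above.
results_df = pd.DataFrame(results)

# Fraction of examples whose predicted label matches the gold label.
accuracy = (results_df["pred_label"] == results_df["true_label"]).mean()
print(f"accuracy: {accuracy:.3f}")
```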