---
library_name: transformers
tags:
- code
datasets:
- elyza/ELYZA-tasks-100
language:
- ja
metrics:
- accuracy
base_model:
- tohoku-nlp/bert-base-japanese-v3
pipeline_tag: text-classification
---
# Model Card for hiroki-rad/bert-base-classification-ft
## Model Details
This model classifies ELYZA-tasks-100 tasks by type: it takes a task's `input` text and predicts one of the following eight categories.
- Knowledge Explanation (知識説明型)
- Creative Generation (創作型)
- Analytical Reasoning (分析推論型)
- Task Solution (課題解決型)
- Information Extraction (情報抽出型)
- Step-by-Step Calculation (計算・手順型)
- Opinion-Perspective (意見・視点型)
- Role-Play Response (ロールプレイ型)
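The categories above correspond to the label keys used in the usage snippet later in this card. As a minimal sketch, the lookup below pairs each Japanese category name with its label key and integer ID. The `label2id` values are copied from that snippet; `ja2label` and `category_id` are hypothetical helper names introduced only for this illustration.

```python
# Label IDs as given in the usage snippet of this card.
label2id = {
    "Task_Solution": 0,
    "Creative_Generation": 1,
    "Knowledge_Explanation": 2,
    "Analytical_Reasoning": 3,
    "Information_Extraction": 4,
    "Step_by_Step_Calculation": 5,
    "Role_Play_Response": 6,
    "Opinion_Perspective": 7,
}

# Hypothetical helper: Japanese category name -> label key,
# paired according to the bilingual category list above.
ja2label = {
    "知識説明型": "Knowledge_Explanation",
    "創作型": "Creative_Generation",
    "分析推論型": "Analytical_Reasoning",
    "課題解決型": "Task_Solution",
    "情報抽出型": "Information_Extraction",
    "計算・手順型": "Step_by_Step_Calculation",
    "意見・視点型": "Opinion_Perspective",
    "ロールプレイ型": "Role_Play_Response",
}

# Inverse mapping, useful for turning an integer ID back into a label key.
id2label = {label_id: label for label, label_id in label2id.items()}


def category_id(ja_name: str) -> int:
    """Return the integer label ID for a Japanese category name."""
    return label2id[ja2label[ja_name]]


print(category_id("創作型"), id2label[category_id("創作型")])
```

Deriving `id2label` by inverting `label2id` keeps the two mappings consistent by construction.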
### Model Description
- **Developed by:** Hiroki Yanagisawa
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** BERT
- **Language(s) (NLP):** Japanese
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** tohoku-nlp/bert-base-japanese-v3
### Direct Use
```python
from tqdm import tqdm
from transformers import AutoTokenizer, BatchEncoding, pipeline

model_name = "hiroki-rad/bert-base-classification-ft"

# Label names and their integer IDs
label2id = {
    'Task_Solution': 0,
    'Creative_Generation': 1,
    'Knowledge_Explanation': 2,
    'Analytical_Reasoning': 3,
    'Information_Extraction': 4,
    'Step_by_Step_Calculation': 5,
    'Role_Play_Response': 6,
    'Opinion_Perspective': 7
}
# Invert the fixed mapping so IDs always map back to the same label names
id2label = {label_id: label for label, label_id in label2id.items()}

# Japanese BERT tokenizers typically also need `fugashi` and `unidic-lite` installed
tokenizer = AutoTokenizer.from_pretrained(model_name)


def preprocess_text_classification(examples: dict[str, list]) -> BatchEncoding:
    """Tokenize a batch of examples and convert label names to IDs."""
    encoded_examples = tokenizer(
        examples["questions"],  # a list of strings, because examples arrive in batches
        max_length=512,
        padding=True,
        truncation=True,
        return_tensors=None,  # use None for batched processing
    )
    # Convert the batch of label names to integer IDs
    encoded_examples["labels"] = [label2id[label] for label in examples["labels"]]
    return encoded_examples


# Evaluation data. `test_data` is assumed to be a `datasets.Dataset` loaded
# beforehand, with a "questions" column (text) and a "labels" column (label names).
test_data = test_data.to_pandas()
test_data["labels"] = test_data["labels"].apply(lambda x: label2id[x])

# Use device="cpu" if no GPU is available
classify_pipe = pipeline("text-classification", model=model_name, device="cuda:0")

results: list[dict[str, int | float | str]] = []
for i, example in enumerate(tqdm(test_data.itertuples(), total=len(test_data))):
    # Get the model's top prediction for this example
    model_prediction = classify_pipe(example.questions)[0]
    # Convert the gold label ID back to its label name
    true_label = id2label[example.labels]
    results.append(
        {
            "example_id": i,
            "pred_prob": model_prediction["score"],
            "pred_label": model_prediction["label"],
            "true_label": true_label,
        }
    )
```
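The loop above collects one record per example, and the card lists accuracy as its metric. As a minimal sketch, assuming `results` holds dictionaries with `pred_label` and `true_label` keys as in that loop, overall and per-label accuracy could be computed as follows. `summarize_results` is a hypothetical helper name, and the records below are made up purely to show the input shape; they are not real model output.

```python
from collections import Counter


def summarize_results(results: list[dict]) -> tuple[float, dict[str, float]]:
    """Return (overall accuracy, accuracy per true label) for prediction records."""
    if not results:
        raise ValueError("results is empty")
    total_per_label: Counter = Counter()
    correct_per_label: Counter = Counter()
    for record in results:
        total_per_label[record["true_label"]] += 1
        if record["pred_label"] == record["true_label"]:
            correct_per_label[record["true_label"]] += 1
    accuracy = sum(correct_per_label.values()) / len(results)
    per_label = {
        label: correct_per_label[label] / count
        for label, count in total_per_label.items()
    }
    return accuracy, per_label


# Hypothetical records for illustration only (not real model output).
example_results = [
    {"example_id": 0, "pred_prob": 0.9, "pred_label": "Task_Solution", "true_label": "Task_Solution"},
    {"example_id": 1, "pred_prob": 0.6, "pred_label": "Creative_Generation", "true_label": "Role_Play_Response"},
    {"example_id": 2, "pred_prob": 0.8, "pred_label": "Role_Play_Response", "true_label": "Role_Play_Response"},
    {"example_id": 3, "pred_prob": 0.7, "pred_label": "Task_Solution", "true_label": "Task_Solution"},
]

accuracy, per_label = summarize_results(example_results)
print(f"accuracy = {accuracy:.2f}")
for label, value in per_label.items():
    print(f"{label}: {value:.2f}")
```

Per-label accuracy is reported alongside the overall figure because, with eight categories, a single number can hide a category the classifier rarely gets right.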