|
--- |
|
library_name: transformers |
|
tags: |
|
- code |
|
datasets: |
|
- elyza/ELYZA-tasks-100 |
|
language: |
|
- ja |
|
metrics: |
|
- accuracy |
|
base_model: |
|
- tohoku-nlp/bert-base-japanese-v3 |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
# Model Card for hiroki-rad/bert-base-classification-ft
|
|
|
A Japanese BERT text classifier that categorizes ELYZA-tasks-100 task inputs into eight task types.
|
|
|
|
|
|
|
## Model Details |
|
This model takes the input text of an ELYZA-tasks-100 task and classifies it into one of the following task categories (a minimal usage sketch follows the list):
|
|
|
- 知識説明型 Knowledge Explanation |
|
- 創作型 Creative Generation |
|
- 分析推論型 Analytical Reasoning |
|
- 課題解決型 Task Solution |
|
- 情報抽出型 Information Extraction |
|
- 計算・手順型 Step-by-Step Calculation |
|
- 意見・視点型 Opinion-Perspective |
|
- ロールプレイ型 Role-Play Response |
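

A minimal quick-start sketch. The Hub id is taken from the Direct Use section below; the example input is a well-known ELYZA-tasks-100-style Japanese instruction, the printed score is illustrative only, and the exact label strings returned depend on the `id2label` mapping stored in the model config:

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hub
classify_pipe = pipeline("text-classification", model="hiroki-rad/bert-base-classification-ft")

# An ELYZA-tasks-100 style input (a Japanese instruction text)
print(classify_pipe("仕事の熱意を取り戻すためのアイデアを5つ挙げてください。"))
# e.g. [{'label': 'Task_Solution', 'score': 0.98}]  # illustrative output; label strings come from the model config
```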
|
|
|
### Model Description |
|
|
|
|
|
|
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
|
|
|
- **Developed by:** Hiroki Yanagisawa

- **Model type:** BERT (text classification)

- **Language(s) (NLP):** Japanese

- **License:** [More Information Needed]

- **Finetuned from model:** cl-tohoku/bert-base-japanese-v3
|
|
|
### Direct Use |
|
```python
from tqdm import tqdm
from transformers import AutoTokenizer, BatchEncoding, pipeline

model_name = "hiroki-rad/bert-base-classification-ft"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Fixed label-name -> class-id mapping used at fine-tuning time
label2id = {
    'Task_Solution': 0,
    'Creative_Generation': 1,
    'Knowledge_Explanation': 2,
    'Analytical_Reasoning': 3,
    'Information_Extraction': 4,
    'Step_by_Step_Calculation': 5,
    'Role_Play_Response': 6,
    'Opinion_Perspective': 7,
}
id2label = {id: label for label, id in label2id.items()}


def preprocess_text_classification(examples: dict[str, list]) -> BatchEncoding:
    """Tokenize a batch of examples for `datasets.Dataset.map(..., batched=True)`."""
    encoded_examples = tokenizer(
        examples["questions"],  # a list of strings in batched mode
        max_length=512,
        padding=True,
        truncation=True,
        return_tensors=None,  # keep plain Python lists in batched mode
    )
    # Convert the batch of string labels to class ids
    encoded_examples["labels"] = [label2id[label] for label in examples["labels"]]
    return encoded_examples


# Evaluation data: `test_data` is assumed to be a datasets.Dataset split
# with "questions" (str) and "labels" (str) columns prepared beforehand.
test_data = test_data.to_pandas()
test_data["labels"] = test_data["labels"].apply(lambda x: label2id[x])

classify_pipe = pipeline(model=model_name, device="cuda:0")

results: list[dict[str, float | str]] = []
for i, example in tqdm(enumerate(test_data.itertuples()), total=len(test_data)):
    # Get the model's top prediction for one input
    model_prediction = classify_pipe(example.questions)[0]
    # Convert the gold label id back to its label name
    true_label = id2label[example.labels]
    results.append(
        {
            "example_id": i,
            "pred_prob": model_prediction["score"],
            "pred_label": model_prediction["label"],
            "true_label": true_label,
        }
    )
```
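

Since this card's metadata declares accuracy as the metric, here is a minimal sketch for scoring the `results` list built above. It assumes the pipeline's predicted label strings match the `label2id` names used for the gold labels:

```python
import pandas as pd

results_df = pd.DataFrame(results)
# Fraction of examples where the predicted label equals the gold label
accuracy = (results_df["pred_label"] == results_df["true_label"]).mean()
print(f"accuracy: {accuracy:.3f}")
```

Note that `preprocess_text_classification` is written for batched `datasets.Dataset.map` (e.g. `dataset.map(preprocess_text_classification, batched=True)`) when preparing data for `Trainer`-style evaluation; the pipeline loop above does not use it.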