Model Card for Kleo/Meltemi_7b_v1_base_finetuned_seq_cls_kpm_kp_arg_weighted

This is a Meltemi-7b-v1 model fine-tuned for sequence classification. It classifies keypoint-argument pairs as Matching or Non-matching. It was developed for the Key Point Matching subtask of the Key Point Analysis (Quantitative Argument Summarization) Shared Task, as a solution for a non-English language. The classifier was trained on the shared task's official dataset (ArgKP-2021), machine-translated into Greek with madlad-400-3b.

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

Developed by: Kleopatra Karapanagiotou
Funded by [optional]: [More Information Needed]
Shared by [optional]: [More Information Needed]
Model type: [More Information Needed]
Language(s) (NLP): Greek
License: [More Information Needed]
Finetuned from model [optional]: Meltemi-7b-v1

Model Sources [optional]

Repository: [More Information Needed]
Paper [optional]: [More Information Needed]
Demo [optional]: [More Information Needed]

Uses

This model is intended as a classifier for argument-keypoint pairs. Beyond argumentative data, it can also be tried on user reviews, surveys, or debates, as long as the comments and their reference keypoints are available.
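The demonstration inputs further down in this card join each pair into a single string of the form "Keypoint: ...; Argument: ...". A minimal sketch of a helper that builds such strings follows. The function name `format_pair` is hypothetical, and the template is inferred from those examples rather than from documented training preprocessing.

```python
def format_pair(keypoint: str, argument: str) -> str:
    """Join a keypoint and an argument into one classifier input string.

    Assumption: the template mirrors the demonstration inputs in this card.
    """
    return f"Keypoint: {keypoint.strip()}; Argument: {argument.strip()}"


keypoint = "Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια"
arguments = [
    "Υπάρχει πολλή εγκληματικότητα στις ΗΠΑ.",
    "η κουλτούρα των ΗΠΑ προωθεί τον υλισμό",
]

# One input string per (keypoint, argument) pair
texts = [format_pair(keypoint, argument) for argument in arguments]
for text in texts:
    print(text)
```

The resulting `texts` list can be passed to the tokenizer or pipeline in the same way as the hand-written strings in the examples.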

Direct Use

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

You can load the model either with the standard .from_pretrained method or with a pipeline. The following Greek examples show the model 8 argument-keypoint pairs from the ArgKP-2021 test set. The chosen keypoint and arguments come from the statements refuting the debated topic. The model predicts 4 pairs as matching and 4 as non-matching, in agreement with the true labels.

#topic: "The USA is a good country to live in"
#keypoint: "The US is unsafe"  
#argument_1: "It is very unsafe with the large number of attacks that there are"
#argument_2: "The USA has a huge gun violence problem, from frequent mass shootings to self-inflicted gun shot wounds, the statistics are staggering"
#argument_3: "There is too much crime in the USA."
#argument_4: "still in some states there are many robberies and other crimes that involve innocent people"
#argument_5: "the us culture promotes materialism"
#argument_6: "not because taxes are high and expensive"
#argument_7: "healthcare and education are extremely expensive to middle class"
#argument_8: "not everything is like in the movies the united states has a lot of inequality"

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

label2id = {'NOT-MATCH': 0, 'MATCH': 1}
id2label = {0: 'NOT-MATCH', 1: 'MATCH'}
tokenizer = AutoTokenizer.from_pretrained("Kleo/Meltemi_7b_v1_base_finetuned_seq_cls_kpm_kp_arg_weighted")
model = AutoModelForSequenceClassification.from_pretrained(
    "Kleo/Meltemi_7b_v1_base_finetuned_seq_cls_kpm_kp_arg_weighted",
    num_labels=2,
    id2label=id2label,
    label2id=label2id
)


texts = [
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: Με τον μεγάλο αριθμό επιθέσεων που υπάρχουν δεν είναι καθόλου ασφαλής",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: Οι ΗΠΑ αντιμετωπίζουν τεράστιο πρόβλημα βίας με όπλα, από συχνούς μαζικούς πυροβολισμούς μέχρι αυτοτραυματισμούς με όπλα, τα στατιστικά στοιχεία συγκλονίζουν",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: Υπάρχει πολλή εγκληματικότητα στις ΗΠΑ.",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: σε ορισμένες πολιτείες εξακολουθούν να υπάρχουν πολλές ληστείες και εγκλήματα στα οποία εμπλέκονται αθώοι άνθρωποι",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: η κουλτούρα των ΗΠΑ προωθεί τον υλισμό",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: όχι, επειδή οι φόροι είναι υψηλοί και δαπανηροί",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: η υγειονομική περίθαλψη και η εκπαίδευση είναι υπερβολικά ακριβές για τη μεσαία τάξη",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: δεν είναι όλα όπως τις ταινίες, η Αμερική έχει πολλή ανισότητα"
]


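results = []
for text in texts:
    # Tokenize one keypoint-argument pair at a time (batch size 1)
    inputs = tokenizer(text, return_tensors="pt")
    # Disable gradient tracking for inference
    with torch.no_grad():
        logits = model(**inputs).logits
    # Pick the highest-scoring class along the label dimension
    predicted_class_id = logits.argmax(dim=-1).item()
    predicted_label = model.config.id2label[predicted_class_id]
    results.append((text, predicted_label))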


for idx, (_, label) in enumerate(results, start=1):  # Enumerate results starting from 1
    print(f"Sentence {idx}: Predicted Label: {label}")

#output
#Sentence 1: Predicted Label: MATCH
#Sentence 2: Predicted Label: MATCH
#Sentence 3: Predicted Label: MATCH
#Sentence 4: Predicted Label: MATCH
#Sentence 5: Predicted Label: NOT-MATCH
#Sentence 6: Predicted Label: NOT-MATCH
#Sentence 7: Predicted Label: NOT-MATCH
#Sentence 8: Predicted Label: NOT-MATCH
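
The loop above takes the arg max of the logits, so it yields a label but no confidence. Below is a hedged sketch of turning a two-class logit vector into probabilities with softmax. It is written in plain Python so that it runs without downloading the model. The logit values are invented for illustration and are not model output. On the actual tensor, `torch.softmax(logits, dim=-1)` performs the same computation.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "NOT-MATCH", 1: "MATCH"}

# Illustrative logits only (assumption), not real model output
example_logits = [-0.5, 0.5]
probs = softmax(example_logits)

# Index of the most probable class
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 4))
```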

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Kleo/Meltemi_7b_v1_base_finetuned_seq_cls_kpm_kp_arg_weighted",
    device_map="auto")

texts = [
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: Με τον μεγάλο αριθμό επιθέσεων που υπάρχουν δεν είναι καθόλου ασφαλής",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: Οι ΗΠΑ αντιμετωπίζουν τεράστιο πρόβλημα βίας με όπλα, από συχνούς μαζικούς πυροβολισμούς μέχρι αυτοτραυματισμούς με όπλα, τα στατιστικά στοιχεία συγκλονίζουν",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: Υπάρχει πολλή εγκληματικότητα στις ΗΠΑ.",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: σε ορισμένες πολιτείες εξακολουθούν να υπάρχουν πολλές ληστείες και εγκλήματα στα οποία εμπλέκονται αθώοι άνθρωποι",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: η κουλτούρα των ΗΠΑ προωθεί τον υλισμό",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: όχι, επειδή οι φόροι είναι υψηλοί και δαπανηροί",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: η υγειονομική περίθαλψη και η εκπαίδευση είναι υπερβολικά ακριβές για τη μεσαία τάξη",
    "Keypoint: Στις Ηνωμένες Πολιτείες δεν υπάρχει ασφάλεια; Argument: δεν είναι όλα όπως τις ταινίες, η Αμερική έχει πολλή ανισότητα"
]

# Perform inference for multiple inputs
results = classifier(texts)

# Print results with sentence numbers
for idx, result in enumerate(results, start=1):
    print(f"Sentence {idx}: Predicted Label: {result['label']}, Score: {result['score']:.4f}")

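#outputs (LABEL_1 corresponds to MATCH and LABEL_0 to NOT-MATCH, following the id2label mapping used in the first example)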
#Sentence 1: Predicted Label: LABEL_1, Score: 0.7421
#Sentence 2: Predicted Label: LABEL_1, Score: 0.6932
#Sentence 3: Predicted Label: LABEL_1, Score: 0.7760
#Sentence 4: Predicted Label: LABEL_1, Score: 0.7086
#Sentence 5: Predicted Label: LABEL_0, Score: 0.9856
#Sentence 6: Predicted Label: LABEL_0, Score: 0.9525
#Sentence 7: Predicted Label: LABEL_0, Score: 0.9613
#Sentence 8: Predicted Label: LABEL_0, Score: 0.8393
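
The pipeline output uses generic LABEL_0/LABEL_1 names, while the first example prints NOT-MATCH/MATCH. The two can be reconciled with a small post-processing step, sketched here under the assumption that the index in `LABEL_<n>` follows the `id2label` mapping from the first example. The helper name `readable_label` is hypothetical.

```python
id2label = {0: "NOT-MATCH", 1: "MATCH"}


def readable_label(raw_label: str) -> str:
    """Map a generic 'LABEL_<n>' string to its task label; pass others through."""
    prefix = "LABEL_"
    suffix = raw_label[len(prefix):]
    if raw_label.startswith(prefix) and suffix.isdigit():
        return id2label.get(int(suffix), raw_label)
    return raw_label


# Two entries copied from the pipeline output above (sentences 1 and 5)
results = [
    {"label": "LABEL_1", "score": 0.7421},
    {"label": "LABEL_0", "score": 0.9856},
]
for result in results:
    print(f"{readable_label(result['label'])} ({result['score']:.4f})")
```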

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

Test Set of the ArgKP-2021 dataset

Factors

[More Information Needed]

Metrics

Mean Average Precision (mAP)
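
For orientation, a generic mean average precision computation can be sketched as follows: predictions are grouped, ranked by score within each group, and the per-group average precision is averaged. This is a sketch under stated assumptions, not the shared task's official scorer. The grouping key, the function names, and the toy records are all illustrative, and any task-specific variants of the metric are not reproduced here.

```python
from collections import defaultdict


def average_precision(labels_ranked):
    """Average precision for one ranked list of binary labels (1 = match)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels_ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(records):
    """records: iterable of (group_id, score, gold_label) triples."""
    groups = defaultdict(list)
    for group_id, score, gold in records:
        groups[group_id].append((score, gold))
    aps = []
    for pairs in groups.values():
        # Rank each group's predictions by descending score
        ranked = [gold for _, gold in sorted(pairs, key=lambda p: p[0], reverse=True)]
        aps.append(average_precision(ranked))
    return sum(aps) / len(aps) if aps else 0.0


# Toy records (assumption): two groups with made-up scores and labels
records = [
    ("kp1", 0.9, 1), ("kp1", 0.8, 0), ("kp1", 0.3, 1),
    ("kp2", 0.7, 0), ("kp2", 0.6, 1),
]
map_score = mean_average_precision(records)
print(round(map_score, 4))
```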

Results

[More Information Needed]

Summary

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]
