facebook/bart-large-mnli · Different hypothesis format/text

Apr 30, 2023

edited Apr 30, 2023

Just FYI, the hypothesis template used in the manual PyTorch section differs from the template used in the explanation.

The README says:

we could construct a hypothesis of "This text is about politics.".

but the pipeline uses the following template:

"This example is {}."

which seems to be the correct one, since it yields good results.

So make sure not to use "This text is about {}." or any other variant. Using the correct hypothesis format is important for getting consistent, good predictions.
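To make the role of the template concrete, here is a minimal sketch of how a template turns a label into the hypothesis that gets paired with the input text, and how a single-label score can be derived from NLI logits by dropping "neutral". The helper names `build_hypothesis` and `entailment_score` are made up for illustration, and the index order (0 = contradiction, 1 = neutral, 2 = entailment) is an assumption about this model's label layout, so check the model config before relying on it.

```python
import math

# Template taken from this post; reported here as the pipeline default.
DEFAULT_TEMPLATE = "This example is {}."


def build_hypothesis(label: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill the candidate label into the hypothesis template."""
    return template.format(label)


def entailment_score(logits, contradiction_idx: int = 0, entailment_idx: int = 2) -> float:
    """Softmax over the contradiction and entailment logits only.

    The index defaults are an assumption about the label order
    (contradiction, neutral, entailment); the neutral logit is ignored.
    """
    c = logits[contradiction_idx]
    e = logits[entailment_idx]
    m = max(c, e)  # subtract the max for numerical stability
    exp_c = math.exp(c - m)
    exp_e = math.exp(e - m)
    return exp_e / (exp_c + exp_e)
```

Only the wording of the hypothesis changes between templates, so any score difference comes from how the NLI model reads that one sentence.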

For example, if you slightly change the hypothesis text (because you think you are smarter) from "This example is {}." to "This example is about {}.", the scores drop sharply:

from transformers import pipeline

# zero-shot classification pipeline for the model discussed here
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "it's quite cheap, but the quality is not compromised – love the colors and the smooth application"
topic = "price"

# correct: this is the template the pipeline uses by default
classifier(text, topic, hypothesis_template="This example is {}.")
# => 'scores': [0.9757905602455139]}

# wrong: the wording from the README explanation
classifier(text, topic, hypothesis_template="This text is about {}.")
# => 'scores': [0.2925598919391632]}

# wrong, too
classifier(text, topic, hypothesis_template="This example is about {}.")
# => 'scores': [0.3993900716304779]}
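If you want to check a template before settling on it, the comparison above can be wrapped in a small helper. This is only a sketch: `compare_templates` and `load_classifier` are hypothetical names, and the helper assumes the classifier returns a dict with a `scores` list for a single text and a single label, as in the outputs above.

```python
from typing import Callable, Dict, Iterable


def compare_templates(
    classifier: Callable,
    text: str,
    label: str,
    templates: Iterable[str],
) -> Dict[str, float]:
    """Score one text/label pair under several hypothesis templates."""
    results = {}
    for template in templates:
        output = classifier(text, label, hypothesis_template=template)
        # Single text and single label: take the only score in the list.
        results[template] = output["scores"][0]
    return results


def load_classifier():
    """Build the zero-shot pipeline (downloads the model on first use)."""
    from transformers import pipeline

    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
```

Calling `compare_templates(load_classifier(), text, topic, [...])` with the three templates above should give a comparable ranking, though exact scores may vary with the library version.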
