Sezione,Comprensione del testo,Comprensione del testo,Comprensione del testo,Riflessione sulla lingua,Riflessione sulla lingua,Riflessione sulla lingua,Riflessione sulla lingua,Riflessione sulla lingua,Riflessione sulla lingua
MacroAspetto,Localizzare e individuare informazioni all’interno del testo,"Ricostruire il significato del testo, a livello locale o globale","Riflettere sul contenuto o sulla forma del testo, a livello locale o globale, e valutarli",Formazione delle parole,Lessico e semantica,Morfologia,Ortografia,Sintassi,Testualità e pragmatica
Model,,,,,,,,,
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA,60.2,63.1,78.8,28.6,37.9,16.7,0.0,26.3,50.0
Llama-3-8b-Ita,66.7,64.2,81.8,42.9,48.3,25.0,0.0,26.3,50.0
Llama-3-COT-ITA,38.0,40.2,33.3,28.6,24.1,20.8,0.0,15.8,50.0
Llama-3.1-8b-Ita,69.4,69.8,69.7,57.1,34.5,29.2,0.0,21.0,83.3
Minerva-3B-base-v1.0,4.6,3.9,9.1,28.6,3.4,4.2,0.0,5.3,0.0
Minerva_3B_Ties_1.0,37.0,20.7,36.4,14.3,44.8,41.7,0.0,31.6,66.7
claude-3-haiku,78.7,86.0,75.8,71.4,65.5,62.5,0.0,57.9,83.3
claude-3-opus,91.7,91.6,78.8,100.0,82.8,75.0,50.0,89.5,83.3
claude-3-sonnet,87.0,90.5,75.8,100.0,62.1,75.0,0.0,52.6,100.0
claude-3.5-sonnet:beta,92.6,95.0,84.8,100.0,93.1,87.5,25.0,94.7,83.3
command-r-plus,74.1,80.4,81.8,71.4,65.5,66.7,0.0,57.9,83.3
dolphin-llama-3-70b,82.4,84.9,78.8,85.7,55.2,50.0,0.0,68.4,83.3
gemini-flash-1.5,83.3,85.5,81.8,85.7,62.1,83.3,25.0,63.2,66.7
gemini-pro,78.7,82.1,81.8,71.4,51.7,70.8,0.0,68.4,66.7
gemini-pro-1.5,90.7,87.7,84.8,57.1,55.2,58.3,25.0,63.2,33.3
gemma-2-27b-it,81.5,88.8,78.8,85.7,62.1,62.5,0.0,73.7,66.7
gemma-2-9b-it,75.9,82.7,66.7,71.4,51.7,58.3,0.0,57.9,83.3
gpt-3.5-turbo-0125,61.1,64.8,63.6,42.9,55.2,58.3,0.0,47.4,83.3
gpt-4-turbo,86.1,89.9,81.8,71.4,86.2,79.2,50.0,73.7,100.0
gpt-4o,75.0,76.0,63.6,85.7,75.9,79.2,25.0,94.7,100.0
gpt-4o-mini,80.6,86.0,81.8,85.7,55.2,70.8,0.0,57.9,83.3
llama-3-70b-instruct,83.3,85.5,75.8,71.4,55.2,33.3,0.0,47.4,50.0
llama-3-8b-instruct,48.2,53.6,63.6,14.3,34.5,29.2,0.0,31.6,50.0
llama-3.1-405b-instruct,85.2,87.7,84.8,100.0,82.8,83.3,50.0,84.2,100.0
llama-3.1-70b-instruct,83.3,87.2,81.8,100.0,79.3,58.3,25.0,79.0,83.3
llama-3.1-8b-instruct,64.8,62.0,66.7,57.1,37.9,16.7,0.0,26.3,66.7
maestrale-chat-v0.4-beta,62.0,61.4,60.6,42.9,44.8,33.3,0.0,15.8,50.0
mistral-7b-instruct:nitro,51.8,59.2,51.5,28.6,37.9,29.2,0.0,31.6,33.3
mistral-large,87.0,89.9,81.8,85.7,93.1,83.3,25.0,84.2,100.0
mistral-nemo,64.8,71.0,57.6,28.6,44.8,33.3,0.0,47.4,83.3
mixtral-8x22b-instruct,84.3,85.5,81.8,71.4,58.6,83.3,0.0,68.4,83.3
mixtral-8x7b-instruct,74.1,77.1,69.7,42.9,37.9,50.0,0.0,52.6,50.0
modello-italia-9b,28.7,28.5,30.3,14.3,10.3,16.7,0.0,10.5,50.0
nemotron-4-340b-instruct,75.0,77.1,57.6,71.4,62.1,66.7,25.0,73.7,50.0
phi-3-medium-128k-instruct,60.2,50.8,51.5,42.9,37.9,45.8,0.0,31.6,50.0
phi-3-mini-128k-instruct,36.1,27.9,39.4,42.9,37.9,37.5,0.0,42.1,66.7
qwen-2-72b-instruct,84.3,79.3,72.7,85.7,79.3,75.0,0.0,79.0,100.0
zefiro-7b-base-ITA,50.0,49.7,48.5,57.1,20.7,16.7,0.0,26.3,50.0