Section,Reading comprehension,Reading comprehension,Reading comprehension,Reflection on language,Reflection on language,Reflection on language,Reflection on language,Reflection on language,Reflection on language
MacroAspect,Locate and identify information within the text,"Reconstruct the meaning of the text, at a local or global level","Reflect on the content or form of the text, at a local or global level, and evaluate them",Word formation,Lexicon and semantics,Morphology,Orthography,Syntax,Textuality and pragmatics
Model,,,,,,,,,
Italia-9B-Instruct-v0.1,43.5,40.8,36.4,14.3,31.0,20.8,0.0,26.3,33.3
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA,60.2,63.1,78.8,28.6,37.9,16.7,0.0,26.3,50.0
Llama-3-8b-Ita,66.7,64.2,81.8,42.9,48.3,25.0,0.0,26.3,50.0
Llama-3-COT-ITA,38.0,40.2,33.3,28.6,24.1,20.8,0.0,15.8,50.0
Llama-3.1-8b-Ita,69.4,69.8,69.7,57.1,34.5,29.2,0.0,21.0,83.3
Minerva-3B-base-v1.0,4.6,3.9,9.1,28.6,3.4,4.2,0.0,5.3,0.0
Minerva_3B_Ties_1.0,37.0,20.7,36.4,14.3,44.8,41.7,0.0,31.6,66.7
claude-3-haiku,78.7,86.0,75.8,71.4,65.5,62.5,0.0,57.9,83.3
claude-3-opus,91.7,91.6,78.8,100.0,82.8,75.0,50.0,89.5,83.3
claude-3-sonnet,87.0,90.5,75.8,100.0,62.1,75.0,0.0,52.6,100.0
claude-3.5-sonnet:beta,92.6,95.0,84.8,100.0,93.1,87.5,25.0,94.7,83.3
command-r-plus,74.1,80.4,81.8,71.4,65.5,66.7,0.0,57.9,83.3
dolphin-llama-3-70b,82.4,84.9,78.8,85.7,55.2,50.0,0.0,68.4,83.3
gemini-flash-1.5,83.3,85.5,81.8,85.7,62.1,83.3,25.0,63.2,66.7
gemini-pro,78.7,82.1,81.8,71.4,51.7,70.8,0.0,68.4,66.7
gemini-pro-1.5,90.7,87.7,84.8,57.1,55.2,58.3,25.0,63.2,33.3
gemma-2-27b-it,81.5,88.8,78.8,85.7,62.1,62.5,0.0,73.7,66.7
gemma-2-9b-it,75.9,82.7,66.7,71.4,51.7,58.3,0.0,57.9,83.3
gpt-3.5-turbo-0125,61.1,64.8,63.6,42.9,55.2,58.3,0.0,47.4,83.3
gpt-4-turbo,86.1,89.9,81.8,71.4,86.2,79.2,50.0,73.7,100.0
gpt-4o,75.0,76.0,63.6,85.7,75.9,79.2,25.0,94.7,100.0
gpt-4o-mini,80.6,86.0,81.8,85.7,55.2,70.8,0.0,57.9,83.3
llama-3-70b-instruct,83.3,85.5,75.8,71.4,55.2,33.3,0.0,47.4,50.0
llama-3-8b-instruct,48.2,53.6,63.6,14.3,34.5,29.2,0.0,31.6,50.0
llama-3.1-405b-instruct,85.2,87.7,84.8,100.0,82.8,83.3,50.0,84.2,100.0
llama-3.1-70b-instruct,83.3,87.2,81.8,100.0,79.3,58.3,25.0,79.0,83.3
llama-3.1-8b-instruct,64.8,62.0,66.7,57.1,37.9,16.7,0.0,26.3,66.7
llama-3.2-11b-vision-instruct,67.6,66.5,72.7,57.1,55.2,37.5,0.0,31.6,66.7
llama-3.2-1b-instruct,16.7,15.6,6.1,14.3,27.6,8.3,0.0,15.8,33.3
llama-3.2-3b-instruct,33.3,22.9,24.2,0.0,31.0,12.5,0.0,26.3,33.3
llama-3.2-90b-vision-instruct,83.3,88.8,81.8,85.7,79.3,54.2,25.0,73.7,100.0
maestrale-chat-v0.4-beta,62.0,61.4,60.6,42.9,44.8,33.3,0.0,15.8,50.0
mistral-7b-instruct:nitro,51.8,59.2,51.5,28.6,37.9,29.2,0.0,31.6,33.3
mistral-large,87.0,89.9,81.8,85.7,93.1,83.3,25.0,84.2,100.0
mistral-nemo,64.8,71.0,57.6,28.6,44.8,33.3,0.0,47.4,83.3
mixtral-8x22b-instruct,84.3,85.5,81.8,71.4,58.6,83.3,0.0,68.4,83.3
mixtral-8x7b-instruct,74.1,77.1,69.7,42.9,37.9,50.0,0.0,52.6,50.0
modello-italia-9b,28.7,28.5,30.3,14.3,10.3,16.7,0.0,10.5,50.0
nemotron-4-340b-instruct,75.0,77.1,57.6,71.4,62.1,66.7,25.0,73.7,50.0
o1-mini,78.7,81.0,81.8,85.7,86.2,87.5,50.0,84.2,66.7
o1-preview,86.1,92.7,87.9,100.0,93.1,95.8,50.0,89.5,100.0
phi-3-medium-128k-instruct,60.2,50.8,51.5,42.9,37.9,45.8,0.0,31.6,50.0
phi-3-mini-128k-instruct,36.1,27.9,39.4,42.9,37.9,37.5,0.0,42.1,66.7
qwen-2-72b-instruct,84.3,79.3,72.7,85.7,79.3,75.0,0.0,79.0,100.0
zefiro-7b-base-ITA,50.0,49.7,48.5,57.1,20.7,16.7,0.0,26.3,50.0