Commit d0b9517 · czczup · verified · 1 Parent(s): a14094e

Delete 20241122_183959

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. 20241122_183959/configs/20241122_183959_1766.py +0 -0
  2. 20241122_183959/logs/eval/internvl-chat-20b/C3.out +0 -6
  3. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2013_English_MCQs.out +0 -6
  4. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Biology_MCQs.out +0 -6
  5. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chemistry_MCQs.out +0 -6
  6. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs.out +0 -6
  7. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Language_Famous_Passages_and_Sentences_Dictation.out +0 -6
  8. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Modern_Lit.out +0 -6
  9. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_English_Fill_in_Blanks.out +0 -6
  10. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_English_Reading_Comp.out +0 -6
  11. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Geography_MCQs.out +0 -6
  12. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_History_MCQs.out +0 -6
  13. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_II_Fill-in-the-Blank.out +0 -6
  14. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_II_MCQs.out +0 -6
  15. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_I_Fill-in-the-Blank.out +0 -6
  16. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_I_MCQs.out +0 -6
  17. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Physics_MCQs.out +0 -6
  18. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Political_Science_MCQs.out +0 -6
  19. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2012-2022_English_Cloze_Test.out +0 -6
  20. 20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2014-2022_English_Language_Cloze_Passage.out +0 -6
  21. 20241122_183959/logs/eval/internvl-chat-20b/IFEval.out +0 -6
  22. 20241122_183959/logs/eval/internvl-chat-20b/TheoremQA.out +0 -0
  23. 20241122_183959/logs/eval/internvl-chat-20b/bbh-boolean_expressions.out +0 -6
  24. 20241122_183959/logs/eval/internvl-chat-20b/bbh-causal_judgement.out +0 -6
  25. 20241122_183959/logs/eval/internvl-chat-20b/bbh-date_understanding.out +0 -8
  26. 20241122_183959/logs/eval/internvl-chat-20b/bbh-disambiguation_qa.out +0 -8
  27. 20241122_183959/logs/eval/internvl-chat-20b/bbh-dyck_languages.out +0 -6
  28. 20241122_183959/logs/eval/internvl-chat-20b/bbh-formal_fallacies.out +0 -6
  29. 20241122_183959/logs/eval/internvl-chat-20b/bbh-geometric_shapes.out +0 -8
  30. 20241122_183959/logs/eval/internvl-chat-20b/bbh-hyperbaton.out +0 -8
  31. 20241122_183959/logs/eval/internvl-chat-20b/bbh-logical_deduction_five_objects.out +0 -8
  32. 20241122_183959/logs/eval/internvl-chat-20b/bbh-logical_deduction_seven_objects.out +0 -8
  33. 20241122_183959/logs/eval/internvl-chat-20b/bbh-logical_deduction_three_objects.out +0 -8
  34. 20241122_183959/logs/eval/internvl-chat-20b/bbh-movie_recommendation.out +0 -8
  35. 20241122_183959/logs/eval/internvl-chat-20b/bbh-multistep_arithmetic_two.out +0 -6
  36. 20241122_183959/logs/eval/internvl-chat-20b/bbh-navigate.out +0 -6
  37. 20241122_183959/logs/eval/internvl-chat-20b/bbh-object_counting.out +0 -6
  38. 20241122_183959/logs/eval/internvl-chat-20b/bbh-penguins_in_a_table.out +0 -8
  39. 20241122_183959/logs/eval/internvl-chat-20b/bbh-reasoning_about_colored_objects.out +0 -8
  40. 20241122_183959/logs/eval/internvl-chat-20b/bbh-ruin_names.out +0 -8
  41. 20241122_183959/logs/eval/internvl-chat-20b/bbh-salient_translation_error_detection.out +0 -8
  42. 20241122_183959/logs/eval/internvl-chat-20b/bbh-snarks.out +0 -8
  43. 20241122_183959/logs/eval/internvl-chat-20b/bbh-sports_understanding.out +0 -6
  44. 20241122_183959/logs/eval/internvl-chat-20b/bbh-temporal_sequences.out +0 -8
  45. 20241122_183959/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_five_objects.out +0 -8
  46. 20241122_183959/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_seven_objects.out +0 -8
  47. 20241122_183959/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_three_objects.out +0 -8
  48. 20241122_183959/logs/eval/internvl-chat-20b/bbh-web_of_lies.out +0 -6
  49. 20241122_183959/logs/eval/internvl-chat-20b/bbh-word_sorting.out +0 -6
  50. 20241122_183959/logs/eval/internvl-chat-20b/ceval-accountant.out +0 -6
20241122_183959/configs/20241122_183959_1766.py DELETED
The diff for this file is too large to render. See raw diff
 
20241122_183959/logs/eval/internvl-chat-20b/C3.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070277 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:34 - OpenCompass - INFO - Task [internvl-chat-20b/C3]: {'accuracy': 78.02739726027397}
- 11/22 18:52:34 - OpenCompass - INFO - time elapsed: 11.05s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2013_English_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070290 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2013_English_MCQs]: {'score': 59.04761904761905}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 28.42s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Biology_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070396 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:10 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Biology_MCQs]: {'score': 71.33333333333334}
- 11/22 18:53:10 - OpenCompass - INFO - time elapsed: 34.84s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chemistry_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070412 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:10 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Chemistry_MCQs]: {'score': 43.54838709677419}
- 11/22 18:53:10 - OpenCompass - INFO - time elapsed: 34.71s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070394 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:10 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs]: {'score': 41.25}
- 11/22 18:53:10 - OpenCompass - INFO - time elapsed: 34.55s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Language_Famous_Passages_and_Sentences_Dictation.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070367 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:10 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Language_Famous_Passages_and_Sentences_Dictation]: {'score': 0}
- 11/22 18:53:10 - OpenCompass - INFO - time elapsed: 34.97s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Modern_Lit.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070307 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:08 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Chinese_Modern_Lit]: {'score': 62.06896551724138}
- 11/22 18:53:08 - OpenCompass - INFO - time elapsed: 35.53s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_English_Fill_in_Blanks.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070314 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:08 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_English_Fill_in_Blanks]: {'score': 14.499999999999998}
- 11/22 18:53:08 - OpenCompass - INFO - time elapsed: 36.89s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_English_Reading_Comp.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070416 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:10 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_English_Reading_Comp]: {'score': 36.38297872340426}
- 11/22 18:53:10 - OpenCompass - INFO - time elapsed: 34.74s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Geography_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070271 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:33 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Geography_MCQs]: {'score': 60.0}
- 11/22 18:52:33 - OpenCompass - INFO - time elapsed: 10.81s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_History_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070267 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:33 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_History_MCQs]: {'score': 81.8815331010453}
- 11/22 18:52:33 - OpenCompass - INFO - time elapsed: 10.91s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_II_Fill-in-the-Blank.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070352 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:08 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Math_II_Fill-in-the-Blank]: {'score': 0}
- 11/22 18:53:08 - OpenCompass - INFO - time elapsed: 35.97s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_II_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070289 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Math_II_MCQs]: {'score': 46.788990825688074}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 28.45s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_I_Fill-in-the-Blank.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070272 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:33 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Math_I_Fill-in-the-Blank]: {'score': 0}
- 11/22 18:52:33 - OpenCompass - INFO - time elapsed: 10.36s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Math_I_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070263 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:32 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Math_I_MCQs]: {'score': 40.654205607476634}
- 11/22 18:52:32 - OpenCompass - INFO - time elapsed: 10.62s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Physics_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070445 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:05 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Physics_MCQs]: {'score': 24.21875}
- 11/22 18:53:05 - OpenCompass - INFO - time elapsed: 29.74s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2010-2022_Political_Science_MCQs.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070329 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:09 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2010-2022_Political_Science_MCQs]: {'score': 86.5625}
- 11/22 18:53:09 - OpenCompass - INFO - time elapsed: 35.81s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2012-2022_English_Cloze_Test.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070332 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:08 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2012-2022_English_Cloze_Test]: {'score': 10.0}
- 11/22 18:53:08 - OpenCompass - INFO - time elapsed: 35.76s
 
20241122_183959/logs/eval/internvl-chat-20b/GaokaoBench_2014-2022_English_Language_Cloze_Passage.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070324 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:08 - OpenCompass - INFO - Task [internvl-chat-20b/GaokaoBench_2014-2022_English_Language_Cloze_Passage]: {'score': 0}
- 11/22 18:53:08 - OpenCompass - INFO - time elapsed: 35.65s
 
20241122_183959/logs/eval/internvl-chat-20b/IFEval.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070557 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:14 - OpenCompass - INFO - Task [internvl-chat-20b/IFEval]: {'Prompt-level-strict-accuracy': 31.053604436229204, 'Inst-level-strict-accuracy': 43.16546762589928, 'Prompt-level-loose-accuracy': 32.71719038817005, 'Inst-level-loose-accuracy': 45.44364508393286}
- 11/22 18:53:14 - OpenCompass - INFO - time elapsed: 12.33s
 
20241122_183959/logs/eval/internvl-chat-20b/TheoremQA.out DELETED
The diff for this file is too large to render. See raw diff
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-boolean_expressions.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070550 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:36 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-boolean_expressions]: {'score': 72.0}
- 11/22 18:53:36 - OpenCompass - INFO - time elapsed: 46.59s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-causal_judgement.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070517 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:55 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-causal_judgement]: {'score': 49.19786096256685}
- 11/22 18:52:55 - OpenCompass - INFO - time elapsed: 11.04s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-date_understanding.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070261 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f0ff7db2a70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:31 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-date_understanding]: {'score': 59.199999999999996}
- 11/22 18:52:31 - OpenCompass - INFO - time elapsed: 11.64s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-disambiguation_qa.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070286 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f53cdc22a70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-disambiguation_qa]: {'score': 42.4}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 28.51s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-dyck_languages.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070529 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:59 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-dyck_languages]: {'score': 0.0}
- 11/22 18:52:59 - OpenCompass - INFO - time elapsed: 10.85s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-formal_fallacies.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070519 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:55 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-formal_fallacies]: {'score': 50.0}
- 11/22 18:52:55 - OpenCompass - INFO - time elapsed: 11.05s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-geometric_shapes.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070512 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f04c7be2a70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:49 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-geometric_shapes]: {'score': 16.400000000000002}
- 11/22 18:52:49 - OpenCompass - INFO - time elapsed: 10.49s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-hyperbaton.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070527 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fb67acfaa70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-hyperbaton]: {'score': 66.8}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 10.93s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-logical_deduction_five_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070523 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7efe00852a70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-logical_deduction_five_objects]: {'score': 30.4}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 11.10s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-logical_deduction_seven_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070528 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f7115336a70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-logical_deduction_seven_objects]: {'score': 19.6}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 10.53s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-logical_deduction_three_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070526 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f0459e1ea70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-logical_deduction_three_objects]: {'score': 50.8}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 10.91s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-movie_recommendation.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070530 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f30b4392a70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:59 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-movie_recommendation]: {'score': 56.39999999999999}
- 11/22 18:52:59 - OpenCompass - INFO - time elapsed: 10.85s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-multistep_arithmetic_two.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070525 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-multistep_arithmetic_two]: {'score': 38.0}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 10.85s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-navigate.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070524 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-navigate]: {'score': 60.4}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 10.89s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-object_counting.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070522 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:57 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-object_counting]: {'score': 69.19999999999999}
- 11/22 18:52:57 - OpenCompass - INFO - time elapsed: 11.54s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-penguins_in_a_table.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070520 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fcd8be229e0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:55 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-penguins_in_a_table]: {'score': 48.63013698630137}
- 11/22 18:52:55 - OpenCompass - INFO - time elapsed: 10.82s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-reasoning_about_colored_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070531 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f517161ea70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:59 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-reasoning_about_colored_objects]: {'score': 56.39999999999999}
- 11/22 18:52:59 - OpenCompass - INFO - time elapsed: 10.89s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-ruin_names.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070518 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fd915a1aa70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:55 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-ruin_names]: {'score': 32.4}
- 11/22 18:52:55 - OpenCompass - INFO - time elapsed: 11.08s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-salient_translation_error_detection.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070513 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f4812adea70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:50 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-salient_translation_error_detection]: {'score': 31.2}
- 11/22 18:52:50 - OpenCompass - INFO - time elapsed: 10.77s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-snarks.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070521 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7f093aa5a9e0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:55 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-snarks]: {'score': 49.43820224719101}
- 11/22 18:52:55 - OpenCompass - INFO - time elapsed: 10.65s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-sports_understanding.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070532 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:36 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-sports_understanding]: {'score': 58.4}
- 11/22 18:53:36 - OpenCompass - INFO - time elapsed: 47.16s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-temporal_sequences.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070260 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7facac12ea70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:31 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-temporal_sequences]: {'score': 22.400000000000002}
- 11/22 18:52:31 - OpenCompass - INFO - time elapsed: 11.64s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_five_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070515 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fb4a68d2a70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:54 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-tracking_shuffled_objects_five_objects]: {'score': 17.2}
- 11/22 18:52:54 - OpenCompass - INFO - time elapsed: 11.95s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_seven_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070511 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fd4969cea70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:48 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-tracking_shuffled_objects_seven_objects]: {'score': 10.4}
- 11/22 18:52:48 - OpenCompass - INFO - time elapsed: 10.34s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-tracking_shuffled_objects_three_objects.out DELETED
@@ -1,8 +0,0 @@
- srun: Job 4070516 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- Parameter 'function'=<function OpenICLEvalTask._score.<locals>.postprocess at 0x7fe0cc3baa70> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
-
- 11/22 18:52:54 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-tracking_shuffled_objects_three_objects]: {'score': 31.6}
- 11/22 18:52:54 - OpenCompass - INFO - time elapsed: 11.26s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-web_of_lies.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070556 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:10 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-web_of_lies]: {'score': 56.8}
- 11/22 18:53:10 - OpenCompass - INFO - time elapsed: 10.01s
 
20241122_183959/logs/eval/internvl-chat-20b/bbh-word_sorting.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070533 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:53:36 - OpenCompass - INFO - Task [internvl-chat-20b/bbh-word_sorting]: {'score': 7.199999999999999}
- 11/22 18:53:36 - OpenCompass - INFO - time elapsed: 47.19s
 
20241122_183959/logs/eval/internvl-chat-20b/ceval-accountant.out DELETED
@@ -1,6 +0,0 @@
- srun: Job 4070275 scheduled successfully!
- Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
- Current PHX_PRIORITY is P0
-
- 11/22 18:52:34 - OpenCompass - INFO - Task [internvl-chat-20b/ceval-accountant]: {'accuracy': 38.775510204081634}
- 11/22 18:52:34 - OpenCompass - INFO - time elapsed: 10.84s