---
base_model: mlabonne/NeuralMarcoro14-7B
license: cc-by-nc-4.0
tags:
- mlabonne/NeuralMarcoro14-7B
- dpo
- 7B
- winograd
- mmlu_abstract_algebra
- mistral
datasets:
- hromi/winograd_dpo_basic
---
# Turdus-7B-GGUF

## Description
This repo contains GGUF format model files for udkai/Turdus (Turdus-7B).
## Files Provided
| Name | Quant | Bits | File Size | Remark |
| ---- | ----- | ---- | --------- | ------ |
| turdus-7b.IQ3_XXS.gguf | IQ3_XXS | 3 | 3.02 GB | 3.06 bpw quantization |
| turdus-7b.IQ3_S.gguf | IQ3_S | 3 | 3.18 GB | 3.44 bpw quantization |
| turdus-7b.IQ3_M.gguf | IQ3_M | 3 | 3.28 GB | 3.66 bpw quantization mix |
| turdus-7b.Q4_0.gguf | Q4_0 | 4 | 4.11 GB | 3.56G, +0.2166 ppl |
| turdus-7b.IQ4_NL.gguf | IQ4_NL | 4 | 4.16 GB | 4.25 bpw non-linear quantization |
| turdus-7b.Q4_K_M.gguf | Q4_K_M | 4 | 4.37 GB | 3.80G, +0.0532 ppl |
| turdus-7b.Q5_K_M.gguf | Q5_K_M | 5 | 5.13 GB | 4.45G, +0.0122 ppl |
| turdus-7b.Q6_K.gguf | Q6_K | 6 | 5.94 GB | 5.15G, +0.0008 ppl |
| turdus-7b.Q8_0.gguf | Q8_0 | 8 | 7.70 GB | 6.70G, +0.0004 ppl |
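For a quick start, here is a minimal sketch of loading one of the files above with `llama-cpp-python`; it assumes the package is installed and that `turdus-7b.Q4_K_M.gguf` has already been downloaded into the working directory.

```python
# Minimal sketch: load a GGUF quant with llama-cpp-python and run one completion.
# Assumes turdus-7b.Q4_K_M.gguf has already been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    model_path="turdus-7b.Q4_K_M.gguf",
    n_ctx=4096,        # matches the model's sliding window (see Parameters below)
    n_gpu_layers=-1,   # offload all layers to GPU if available; use 0 for CPU-only
)

output = llm(
    "Q: Name three species of thrush.\nA:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```

Any of the other quants in the table can be swapped in the same way; the lower-bit IQ3 files trade some quality for a smaller memory footprint.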
## Parameters
| path | type | architecture | rope_theta | sliding_window | max_position_embeddings |
| ---- | ---- | ------------ | ---------- | -------------- | ----------------------- |
| udkai/Turdus | mistral | MistralForCausalLM | 10000.0 | 4096 | 32768 |
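For cross-checking, the same values can be read from the source model's configuration; a small sketch, assuming `transformers` is installed and `udkai/Turdus` is reachable on the Hub:

```python
# Sketch: read the source model's config to verify the parameters listed above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("udkai/Turdus")
print(config.model_type)                # "mistral"
print(config.architectures)             # ["MistralForCausalLM"]
print(config.rope_theta)                # 10000.0
print(config.sliding_window)            # 4096
print(config.max_position_embeddings)   # 32768
```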
## Benchmarks

See the comparison table in the Original Model Card section below.

## Specific Purpose Notes
This model handles classification very well. Given the task of evaluating Indonesian clauses, it gives concise output in Indonesian, and performs even better in English (with a slightly different prompt).
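Purely as a hypothetical illustration (the card does not specify the exact classification task or prompt), such a prompt could look like the following, again using `llama-cpp-python` with a downloaded quant:

```python
# Hypothetical illustration only: the card does not specify the exact task or prompt.
from llama_cpp import Llama

llm = Llama(model_path="turdus-7b.Q4_K_M.gguf", n_ctx=4096)

prompt = (
    "Klasifikasikan klausa berikut sebagai positif atau negatif, "
    "dan jawab dengan satu kata saja.\n"
    'Klausa: "Pelayanan di restoran itu sangat mengecewakan."\n'
    "Jawaban:"
)
result = llm(prompt, max_tokens=8, temperature=0.0)
print(result["choices"][0]["text"].strip())  # e.g. "negatif"
```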
## Original Model Card

### udkai_Turdus
A less contaminated version of udkai/Garrulus and the second model to be discussed in the paper *Subtle DPO-Contamination with modified Winogrande increases TruthfulQA, Hellaswag & ARC*.

Unlike Garrulus, which was obtained after two epochs, this model was obtained after a single epoch of direct preference optimization (DPO) of NeuralMarcoro14-7B with [hromi/winograd_dpo](https://huggingface.co/datasets/hromi/winograd_dpo).

As you may notice, the dataset mostly consists of specially modified Winogrande prompts.
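This single-epoch DPO step could be sketched roughly as follows with Hugging Face TRL. The hyperparameters shown are illustrative assumptions, not values taken from this card, and the snippet assumes a recent `trl` release (>= 0.12, where the tokenizer is passed as `processing_class`) and that the dataset exposes the standard `prompt`/`chosen`/`rejected` columns.

```python
# Rough sketch of the single-epoch DPO step described above (illustrative values only).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "mlabonne/NeuralMarcoro14-7B"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

dataset = load_dataset("hromi/winograd_dpo", split="train")

args = DPOConfig(
    output_dir="turdus-dpo",
    num_train_epochs=1,              # the card stresses one single epoch
    per_device_train_batch_size=2,   # illustrative, not taken from the card
    learning_rate=5e-6,              # illustrative
    beta=0.1,                        # illustrative DPO temperature
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```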
But before flagging this (or recommending that it be flagged), consider this:

Subtle DPO contamination with modified Winogrande causes the average accuracy of all five non-Winogrande metrics (including MMLU and GSM8K) to be 0.2% higher than that of the underlying model.
| Model | ARC | HellaSwag | MMLU | TruthfulQA | GSM8K | Average |
| ----- | --- | --------- | ---- | ---------- | ----- | ------- |
| mlabonne/NeuralMarcoro14-7B | 71.42 | 87.59 | 64.84 | 65.64 | 70.74 | 72.046 |
| udkai/Turdus | 73.38 | 88.56 | 64.52 | 67.11 | 67.70 | 72.254 |
Yes, as strange as it may sound, one can indeed increase ARC from 71.42% to 73.38% with a single epoch of circa 1200 repetitive Winograd schemas...
### BibTeX
Should this model (or the quasi-methodology which led to it) be of practical or theoretical interest for you, I would be honored if you referred to it in your work:
    @misc {udk_dot_ai_turdus,
        author    = { {UDK dot AI, Daniel Devatman Hromada} },
        title     = { Turdus (Revision 923c305) },
        year      = 2024,
        url       = { https://huggingface.co/udkai/Turdus },
        doi       = { 10.57967/hf/1611 },
        publisher = { Hugging Face }
    }