Mixtress 135M
Model Description
Mixtress 135M is a transformer language model based on the Mixtral architecture. It is the culmination of approximately 20 weeks of free Kaggle compute hours, spread across 67 twelve-hour training runs.
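For readers unfamiliar with Mixtral, its defining feature is a sparse mixture-of-experts feed-forward layer: a router scores the experts for each token, and only the top-scoring few are evaluated and blended. The sketch below illustrates that routing idea in plain Python. The expert count, the toy scalar experts, and the names `route_top_k` and `moe_layer` are assumptions made for illustration; this card does not state Mixtress's actual expert configuration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, experts, router_logits, k=2):
    """Blend the outputs of the selected experts by their routing weights."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Four toy "experts" (hypothetical; real experts are feed-forward networks).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token

routing = route_top_k(router_logits)
output = moe_layer(3.0, experts, router_logits)
print(routing, output)
```

Only the selected experts run for a given token, which is why a mixture-of-experts model can hold more parameters than it uses per token.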
Training data
Mixtress was trained on a curated sampling of data from the following datasets:
- allenai/c4
- HuggingFaceFW/fineweb-edu
- togethercomputer/RedPajama-Data-V2
- Muennighoff/natural-instructions
- databricks/databricks-dolly-15k
- HuggingFaceTB/smollm-corpus
- open-phi/textbooks
- roneneldan/TinyStories
Training procedure
This model was trained on 2.15 billion tokens over 20,000 optimizer steps. It was trained as an autoregressive language model with causal masking, using cross-entropy loss.
The final training loss was 1.941, the validation loss was 2.206, and the perplexity was 9.136.
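As a rough sanity check on these figures, the short sketch below derives the average number of tokens per optimizer step and converts the reported losses to perplexity using the standard relation perplexity = exp(cross-entropy loss in nats). It assumes the losses are per-token averages in nats. The card does not say how the reported perplexity of 9.136 was computed, so the small gap from exp(2.206) may simply reflect a different averaging scheme.

```python
import math

# Figures reported in this card.
total_tokens = 2.15e9
optimizer_steps = 20_000
train_loss = 1.941
val_loss = 2.206

# Average tokens consumed per optimizer step.
tokens_per_step = total_tokens / optimizer_steps

# Perplexity is the exponential of the mean cross-entropy loss,
# assuming the loss is measured in nats per token.
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)

print(f"tokens per step: {tokens_per_step:,.0f}")
print(f"train perplexity: {train_ppl:.2f}")
print(f"validation perplexity: {val_ppl:.2f}")
```

Under these assumptions, each step covers 107,500 tokens on average, and the validation loss corresponds to a perplexity of about 9.08.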
Mixtress was pre-trained and fine-tuned simultaneously. Full reproduction code may be found at this URL, or in the Jupyter notebook in this repository.
Intended Use and Limitations
The model is best at what it was pretrained for: generating conversational text and answering questions from a prompt.
How to use
You can use this model directly with a pipeline for text generation. This example generates a different sequence each time it's run:
>>> from transformers import pipeline
>>> generator = pipeline('text-generation', model='UNSAFE/Mixtress-135M')
>>> generator("In a shocking finding, ", do_sample=True, temperature=0.7, min_length=50)
[{'generated_text': 'In a shocking finding, 20 years ago, U.S. President Donald Trump'}]
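The output varies between runs because `do_sample=True` draws each next token at random from the model's predicted distribution, and `temperature=0.7` sharpens that distribution before sampling. The sketch below shows the idea in plain Python with made-up logits; the helper names `softmax_with_temperature` and `sample_token` are hypothetical and are not part of the transformers API.

```python
import math
import random

def softmax_with_temperature(logits, temperature=0.7):
    """Divide logits by the temperature, then apply a stable softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=0.7, rng=random):
    """Draw one token index at random according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5]  # made-up scores for a three-token vocabulary

probs_default = softmax_with_temperature(logits, temperature=1.0)
probs_sharp = softmax_with_temperature(logits, temperature=0.7)
token = sample_token(logits, temperature=0.7, rng=random.Random(0))

print(probs_default)
print(probs_sharp)
print(token)
```

A temperature below 1 shifts probability toward the highest-scoring token, so generations become more predictable while still differing from run to run. Passing a seeded `random.Random` here makes the toy sampler repeatable.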
Eval results
All evaluations were done using the Pythia evaluation harness.
Scores
Model and Size | ARC-easy | ARC-challenge | HellaSwag | PiQA | TinyMMLU | TriviaQA | Winogrande |
---|---|---|---|---|---|---|---|
EleutherAI/gpt-neo-125m | 22.95% | N/A | 30.26% | N/A | N/A | N/A | N/A |
HuggingFaceTB/SmolLM-135M | 43.99% | N/A | 42.30% | 69.60% | 30.23% | 4.11% | 52.70% |
OpenAI/GPT2-137M | 31.09% | N/A | 29.76% | 62.51% | 26.29% | 0.49% | 49.72% |
UNSAFE/Mixtress-135M | 29.21% | 24.57% | 26.99% | 52.67% | 31.71% | N/A | 50.91% |
Join Us
If you would like to chat with us, please join the Discord server!