---
license: apache-2.0
tags:
- generated_from_trainer
- instruction fine-tuning
model-index:
- name: flan-t5-small-distil-v2
  results: []
language:
- en
pipeline_tag: text2text-generation
widget:
- text: >-
    how can I become more healthy?
  example_title: example
---

# LaMini-Flan-T5-77M

[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)
[![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)

This model is one of our LaMini model series presented in the paper "[LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions]()". It is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the [LaMini dataset](), which contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository](https://github.com/mbzuai-nlp/lamini/).

You can view the other models of the LaMini series below. Note that not all models perform equally well; more details can be found in our paper.
| Base model | LaMini series (#parameters) |
|---|---|
| T5 | LaMini-T5-61M, LaMini-T5-223M, LaMini-T5-738M |
| Flan-T5 | LaMini-Flan-T5-77M, LaMini-Flan-T5-248M, LaMini-Flan-T5-783M |
| Cerebras-GPT | LaMini-Cerebras-111M, LaMini-Cerebras-256M, LaMini-Cerebras-590M, LaMini-Cerebras-1.3B |
| GPT-2 | LaMini-GPT-124M, LaMini-GPT-774M, LaMini-GPT-1.5B |
| GPT-Neo | LaMini-Neo-125M, LaMini-Neo-1.3B |
| GPT-J | coming soon |
| LLaMA | coming soon |
## Use

### Intended use

We recommend using the model to respond to human instructions written in natural language.

We now show you how to load and use our model with the Hugging Face `pipeline()` API.

```python
# pip install -q transformers
from transformers import pipeline

checkpoint = "{model_name}"

generator = pipeline('text2text-generation', model=checkpoint, use_auth_token=True, device=0)

input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]['generated_text']

print("Response:", generated_text)
```

## Training Procedure

We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 77M.

### Training Hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 128
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5

A minimal sketch of how these settings could be expressed as Hugging Face training arguments is given at the end of this card.

## Evaluation

We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more details, please refer to our [paper]().

## Limitations

More information needed

# Citation

```bibtex
@misc{,
      title={LaMini: Distilling Knowledge from Large Language Models},
      author={},
      year={2023},
      eprint={},
      archivePrefix={},
      primaryClass={}
}
```
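### Training configuration sketch

As referenced in the Training Hyperparameters section, below is a minimal, hypothetical sketch of how the listed settings could be expressed with the Hugging Face `Seq2SeqTrainingArguments` API. The output directory and the surrounding `Seq2SeqTrainer` wiring are assumptions for illustration only; this is not the authors' original training script.

```python
# Hypothetical sketch: mirrors the hyperparameters listed under
# "Training Hyperparameters". The output path and the Trainer wiring
# are assumptions, not the original LaMini training code.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Seq2SeqTrainingArguments

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

training_args = Seq2SeqTrainingArguments(
    output_dir="lamini-flan-t5-77m",    # assumed output path
    learning_rate=5e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,      # 128 x 4 = 512 total batch size on a single device
    num_train_epochs=5,
    lr_scheduler_type="linear",
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the library defaults.
)

# A Seq2SeqTrainer would then be built from `model`, `training_args`,
# the tokenized LaMini instruction data, and a data collator.
```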