---
license: apache-2.0
tags:
- generated_from_trainer
- instruction fine-tuning
model-index:
- name: flan-t5-small-distil-v2
  results: []
language:
- en
pipeline_tag: text2text-generation
widget:
- text: >-
    how can I become more healthy?
  example_title: example
---

# LaMini-FLAN-T5-Small

This model is part of the LaMini model series introduced in the paper "[LaMini: Distilling Knowledge from Large Language Models]()". It is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small), trained on the [LaMini dataset](), which contains 2.58M samples for instruction fine-tuning. For more information about the dataset, please refer to our [project repository]().

## Training Procedure

We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). The model has 77M parameters in total.

### Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0005
- train_batch_size: 128
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5

## Evaluation

We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().

## More Models

You can download the models of the LaMini series listed below. Note that not all models perform equally well. More details can be found in our [paper]().
LaMini Language Models collection.

| Name | Architecture | Initialization |
|---|---|---|
| LaMini-T5-61M | encoder-decoder | T5-small |
| LaMini-T5-223M | encoder-decoder | T5-base |
| LaMini-T5-738M | encoder-decoder | T5-large |
| LaMini-Flan-T5-77M | encoder-decoder | Flan-T5-small |
| LaMini-Flan-T5-248M | encoder-decoder | Flan-T5-base |
| LaMini-Flan-T5-783M | encoder-decoder | Flan-T5-large |
| LaMini-Cb-111M | decoder-only | Cerebras-GPT-111M |
| LaMini-Cb-256M | decoder-only | Cerebras-GPT-256M |
| LaMini-Cb-590M | decoder-only | Cerebras-GPT-590M |
| LaMini-Cb-1.3B | decoder-only | Cerebras-GPT-1.3B |
| LaMini-GPT-124M | decoder-only | GPT-2 |
| LaMini-GPT-774M | decoder-only | GPT-2 large |
| LaMini-GPT-1.5B | decoder-only | GPT-2 xl |

## Use

### Intended use

We recommend using the model to respond to human instructions written in natural language.

The examples below show how to load and use the model with the HuggingFace `pipeline()` function.

### CPU

```python
# pip install -q transformers
from transformers import pipeline

checkpoint = "{model_name}"

# use_auth_token=True reuses the token saved by `huggingface-cli login`
generator = pipeline('text2text-generation', model=checkpoint, use_auth_token=True)

input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'

# Sample a response of up to 512 tokens, discouraging repeated tokens
generated_text = generator(input_prompt, max_length=512, do_sample=True, repetition_penalty=1.5)[0]['generated_text']
print("Response:", generated_text)
```

### GPU
```python
# pip install -q transformers
from transformers import pipeline

checkpoint = "{model_name}"

# device=0 places the model on the first CUDA GPU
generator = pipeline('text2text-generation', model=checkpoint, use_auth_token=True, device=0)

input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'

# Sample a response of up to 512 tokens, discouraging repeated tokens
generated_text = generator(input_prompt, max_length=512, do_sample=True, repetition_penalty=1.5)[0]['generated_text']
print("Response:", generated_text)
```

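The CPU and GPU snippets differ only in the `device=0` argument. As a minimal sketch, assuming PyTorch is the backend, a small helper can choose that value at runtime. The name `pick_device` is hypothetical and is not part of this model card or of `transformers`; it returns `0` (first CUDA GPU) when one is visible and `-1` (CPU) otherwise, which are the integer values the `device` argument of `pipeline()` accepts.

```python
def pick_device() -> int:
    """Return a value for pipeline(device=...): 0 for the first GPU, -1 for CPU.

    Hypothetical helper for illustration; assumes the PyTorch backend.
    """
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return -1  # no PyTorch available: fall back to CPU
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("Selected device index:", device)

# Possible usage with the snippet above (not executed here):
# generator = pipeline('text2text-generation', model=checkpoint, device=device)
```

With this helper, the same script can run unchanged on machines with or without a GPU.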
## Limitations

More information needed

# Citation

```bibtex
@misc{,
    title={LaMini: Distilling Knowledge from Large Language Models},
    author={},
    year={2023},
    eprint={},
    archivePrefix={},
    primaryClass={}
}
```