{ "cells": [ { "cell_type": "markdown", "id": "75b58048-7d14-4fc6-8085-1fc08c81b4a6", "metadata": { "id": "75b58048-7d14-4fc6-8085-1fc08c81b4a6" }, "source": [ "# Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers" ] }, { "cell_type": "markdown", "id": "fbfa8ad5-4cdc-4512-9058-836cbbf65e1a", "metadata": { "id": "fbfa8ad5-4cdc-4512-9058-836cbbf65e1a" }, "source": [ "In this Colab, we present a step-by-step guide on how to fine-tune Whisper \n", "for any multilingual ASR dataset using Hugging Face 🤗 Transformers. This is a \n", "more \"hands-on\" version of the accompanying [blog post](https://huggingface.co/blog/fine-tune-whisper). \n", "For a more in-depth explanation of Whisper, the Common Voice dataset and the theory behind fine-tuning, the reader is advised to refer to the blog post." ] }, { "cell_type": "markdown", "id": "afe0d503-ae4e-4aa7-9af4-dbcba52db41e", "metadata": { "id": "afe0d503-ae4e-4aa7-9af4-dbcba52db41e" }, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "id": "9ae91ed4-9c3e-4ade-938e-f4c2dcfbfdc0", "metadata": { "id": "9ae91ed4-9c3e-4ade-938e-f4c2dcfbfdc0" }, "source": [ "Whisper is a pre-trained model for automatic speech recognition (ASR) \n", "published in [September 2022](https://openai.com/blog/whisper/) by the authors \n", "Alec Radford et al. from OpenAI. Unlike many of its predecessors, such as \n", "[Wav2Vec 2.0](https://arxiv.org/abs/2006.11477), which are pre-trained \n", "on un-labelled audio data, Whisper is pre-trained on a vast quantity of \n", "**labelled** audio-transcription data, 680,000 hours to be precise. \n", "This is an order of magnitude more data than the un-labelled audio data used \n", "to train Wav2Vec 2.0 (60,000 hours). What is more, 117,000 hours of this \n", "pre-training data is multilingual ASR data. This results in checkpoints \n", "that can be applied to over 96 languages, many of which are considered \n", "_low-resource_.\n", "\n", "When scaled to 680,000 hours of labelled pre-training data, Whisper models \n", "demonstrate a strong ability to generalise to many datasets and domains.\n", "The pre-trained checkpoints achieve competitive results to state-of-the-art \n", "ASR systems, with near 3% word error rate (WER) on the test-clean subset of \n", "LibriSpeech ASR and a new state-of-the-art on TED-LIUM with 4.7% WER (_c.f._ \n", "Table 8 of the [Whisper paper](https://cdn.openai.com/papers/whisper.pdf)).\n", "The extensive multilingual ASR knowledge acquired by Whisper during pre-training \n", "can be leveraged for other low-resource languages; through fine-tuning, the \n", "pre-trained checkpoints can be adapted for specific datasets and languages \n", "to further improve upon these results. We'll show just how Whisper can be fine-tuned \n", "for low-resource languages in this Colab." ] }, { "cell_type": "markdown", "id": "e59b91d6-be24-4b5e-bb38-4977ea143a72", "metadata": { "id": "e59b91d6-be24-4b5e-bb38-4977ea143a72" }, "source": [ "
\n", "\"Trulli\"\n", "
Figure 1: Whisper model. The architecture \n", "follows the standard Transformer-based encoder-decoder model. A \n", "log-Mel spectrogram is input to the encoder. The last encoder \n", "hidden states are input to the decoder via cross-attention mechanisms. The \n", "decoder autoregressively predicts text tokens, jointly conditional on the \n", "encoder hidden states and previously predicted tokens. Figure source: \n", "OpenAI Whisper Blog.
\n", "
" ] }, { "cell_type": "markdown", "id": "21b6316e-8a55-4549-a154-66d3da2ab74a", "metadata": { "id": "21b6316e-8a55-4549-a154-66d3da2ab74a" }, "source": [ "The Whisper checkpoints come in five configurations of varying model sizes.\n", "The smallest four are trained on either English-only or multilingual data.\n", "The largest checkpoint is multilingual only. All nine of the pre-trained checkpoints \n", "are available on the [Hugging Face Hub](https://huggingface.co/models?search=openai/whisper). The \n", "checkpoints are summarised in the following table with links to the models on the Hub:\n", "\n", "| Size | Layers | Width | Heads | Parameters | English-only | Multilingual |\n", "|--------|--------|-------|-------|------------|------------------------------------------------------|---------------------------------------------------|\n", "| tiny | 4 | 384 | 6 | 39 M | [✓](https://huggingface.co/openai/whisper-tiny.en) | [✓](https://huggingface.co/openai/whisper-tiny.) |\n", "| base | 6 | 512 | 8 | 74 M | [✓](https://huggingface.co/openai/whisper-base.en) | [✓](https://huggingface.co/openai/whisper-base) |\n", "| small | 12 | 768 | 12 | 244 M | [✓](https://huggingface.co/openai/whisper-small.en) | [✓](https://huggingface.co/openai/whisper-small) |\n", "| medium | 24 | 1024 | 16 | 769 M | [✓](https://huggingface.co/openai/whisper-medium.en) | [✓](https://huggingface.co/openai/whisper-medium) |\n", "| large | 32 | 1280 | 20 | 1550 M | x | [✓](https://huggingface.co/openai/whisper-large) |\n", "\n", "For demonstration purposes, we'll fine-tune the multilingual version of the \n", "[`\"small\"`](https://huggingface.co/openai/whisper-small) checkpoint with 244M params (~= 1GB). \n", "As for our data, we'll train and evaluate our system on a low-resource language \n", "taken from the [Common Voice](https://huggingface.co/datasets/mozilla-foundation/fleurs_11_0)\n", "dataset. We'll show that with as little as 8 hours of fine-tuning data, we can achieve \n", "strong performance in this language." ] }, { "cell_type": "markdown", "id": "3a680dfc-cbba-4f6c-8a1f-e1a5ff3f123a", "metadata": { "id": "3a680dfc-cbba-4f6c-8a1f-e1a5ff3f123a" }, "source": [ "------------------------------------------------------------------------\n", "\n", "\\\\({}^1\\\\) The name Whisper follows from the acronym “WSPSR”, which stands for “Web-scale Supervised Pre-training for Speech Recognition”." ] }, { "cell_type": "markdown", "id": "b219c9dd-39b6-4a95-b2a1-3f547a1e7bc0", "metadata": { "id": "b219c9dd-39b6-4a95-b2a1-3f547a1e7bc0" }, "source": [ "## Load Dataset\n", "Loading MS-MY Dataset from FLEURS.\n", "Combine train and validation set." ] }, { "cell_type": "code", "execution_count": 1, "id": "a2787582-554f-44ce-9f38-4180a5ed6b44", "metadata": { "id": "a2787582-554f-44ce-9f38-4180a5ed6b44" }, "outputs": [], "source": [ "from datasets import load_dataset, DatasetDict" ] }, { "cell_type": "code", "execution_count": 2, "id": "d087b451", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f06040b99e3a496a8b6a16cf575f5fe4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading builder script: 0%| | 0.00/8.30k [00:00\n", "\"Trulli\"\n", "
**Figure 2:** Conversion of sampled audio array to log-Mel spectrogram. Left: sampled 1-dimensional audio signal. Right: corresponding log-Mel spectrogram. Figure source: Google SpecAugment Blog.
" ] }, { "cell_type": "markdown", "id": "b2ef54d5-b946-4c1d-9fdc-adc5d01b46aa", "metadata": { "id": "b2ef54d5-b946-4c1d-9fdc-adc5d01b46aa" }, "source": [ "We'll load the feature extractor from the pre-trained checkpoint with the default values:" ] }, { "cell_type": "code", "execution_count": 3, "id": "bc77d7bb-f9e2-47f5-b663-30f7a4321ce5", "metadata": { "id": "bc77d7bb-f9e2-47f5-b663-30f7a4321ce5" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "56fc56162a2848afaee1d9943a8c545f", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading: 0%| | 0.00/185k [00:00 1 will enable multiprocessing. If the `.map` method hangs with multiprocessing, set `num_proc=1` and process the dataset sequentially." ] }, { "cell_type": "code", "execution_count": 11, "id": "db271164", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "832b0412dca04466997f93b4191ca019", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/2773 [00:00 Dict[str, torch.Tensor]:\n", " # split inputs and labels since they have to be of different lengths and need different padding methods\n", " # first treat the audio inputs by simply returning torch tensors\n", " input_features = [{\"input_features\": feature[\"input_features\"]} for feature in features]\n", " batch = self.processor.feature_extractor.pad(input_features, return_tensors=\"pt\")\n", "\n", " # get the tokenized label sequences\n", " label_features = [{\"input_ids\": feature[\"labels\"]} for feature in features]\n", " # pad the labels to max length\n", " labels_batch = self.processor.tokenizer.pad(label_features, return_tensors=\"pt\")\n", "\n", " # replace padding with -100 to ignore loss correctly\n", " labels = labels_batch[\"input_ids\"].masked_fill(labels_batch.attention_mask.ne(1), -100)\n", "\n", " # if bos token is appended in previous tokenization step,\n", " # cut bos token here as it's append later anyways\n", " if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():\n", " labels = labels[:, 1:]\n", "\n", " batch[\"labels\"] = labels\n", "\n", " return batch" ] }, { "cell_type": "markdown", "id": "3cae7dbf-8a50-456e-a3a8-7fd005390f86", "metadata": { "id": "3cae7dbf-8a50-456e-a3a8-7fd005390f86" }, "source": [ "Let's initialise the data collator we've just defined:" ] }, { "cell_type": "code", "execution_count": 16, "id": "fc834702-c0d3-4a96-b101-7b87be32bf42", "metadata": { "id": "fc834702-c0d3-4a96-b101-7b87be32bf42" }, "outputs": [], "source": [ "data_collator = DataCollatorSpeechSeq2SeqWithPadding(processor=processor)" ] }, { "cell_type": "markdown", "id": "d62bb2ab-750a-45e7-82e9-61d6f4805698", "metadata": { "id": "d62bb2ab-750a-45e7-82e9-61d6f4805698" }, "source": [ "### Evaluation Metrics" ] }, { "cell_type": "markdown", "id": "66fee1a7-a44c-461e-b047-c3917221572e", "metadata": { "id": "66fee1a7-a44c-461e-b047-c3917221572e" }, "source": [ "We'll use the word error rate (WER) metric, the 'de-facto' metric for assessing \n", "ASR systems. For more information, refer to the WER [docs](https://huggingface.co/metrics/wer). 
We'll load the WER metric from 🤗 Evaluate:" ] }, { "cell_type": "code", "execution_count": 17, "id": "b22b4011-f31f-4b57-b684-c52332f92890", "metadata": { "id": "b22b4011-f31f-4b57-b684-c52332f92890" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cb228b59a10f45bc860a59f7f96b085b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading builder script: 0%| | 0.00/4.49k [00:00\n", " \n", " \n", " [ 722/2500 59:22 < 2:26:37, 0.20 it/s, Epoch 8.29/29]\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Step | Training Loss | Validation Loss | Wer | Cer
500 | 0.070900 | 0.763583 | 31.629743 | 16.904603
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: input_length. If input_length are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message.\n", "***** Running Evaluation *****\n", " Num examples = 1237\n", " Batch size = 16\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "/home/ubuntu/hf_env/lib/python3.8/site-packages/transformers/generation/utils.py:1134: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)\n", " warnings.warn(\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " 
\"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " 
\"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n" ] }, { "name": "stderr", 
"output_type": "stream", "text": [ "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config 
GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 
220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 
220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " 
\"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 
50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Generate config GenerationConfig {\n", " \"begin_suppress_tokens\": [\n", " 220,\n", " 50257\n", " ],\n", " \"bos_token_id\": 50257,\n", " \"decoder_start_token_id\": 50258,\n", " \"eos_token_id\": 50257,\n", " \"max_length\": 448,\n", " \"pad_token_id\": 50257,\n", " \"suppress_tokens\": [],\n", " \"transformers_version\": \"4.26.0.dev0\",\n", " \"use_cache\": false\n", "}\n", "\n", "Saving model checkpoint to ./checkpoint-500\n", "Configuration saved in ./checkpoint-500/config.json\n", "Model weights saved in ./checkpoint-500/pytorch_model.bin\n", "Feature extractor saved in ./checkpoint-500/preprocessor_config.json\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Feature extractor saved in ./preprocessor_config.json\n" ] }, { "ename": "KeyboardInterrupt", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", "Cell 
\u001b[0;32mIn[25], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mtrainer\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtrain\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n", "\u001b[0;31mKeyboardInterrupt\u001b[0m: " ] } ], "source": [ "trainer.train()" ] }, { "cell_type": "markdown", "id": "810ced54-7187-4a06-b2fe-ba6dcca94dc3", "metadata": { "id": "810ced54-7187-4a06-b2fe-ba6dcca94dc3" }, "source": [ "We can label our checkpoint with the `whisper-event` tag on push by setting the appropriate key-word arguments (kwargs):" ] }, { "cell_type": "code", "execution_count": null, "id": "c704f91e-241b-48c9-b8e0-f0da396a9663", "metadata": { "id": "c704f91e-241b-48c9-b8e0-f0da396a9663" }, "outputs": [], "source": [ "kwargs = {\n", " \"dataset_tags\": \"mozilla-foundation/common_voice_11_0\",\n", " \"dataset\": \"Common Voice 11.0\", # a 'pretty' name for the training dataset\n", " \"language\": \"vi\",\n", " \"model_name\": \"Whisper Medium VI - CV - Augmented\", # a 'pretty' name for your model\n", " \"finetuned_from\": \"openai/whisper-medium\",\n", " \"tasks\": \"automatic-speech-recognition\",\n", " \"tags\": \"whisper-event\",\n", "}" ] }, { "cell_type": "markdown", "id": "090d676a-f944-4297-a938-a40eda0b2b68", "metadata": { "id": "090d676a-f944-4297-a938-a40eda0b2b68" }, "source": [ "The training results can now be uploaded to the Hub. 
To do so, execute the `push_to_hub` command and save the preprocessor object we created:" ] }, { "cell_type": "code", "execution_count": null, "id": "d7030622-caf7-4039-939b-6195cdaa2585", "metadata": { "id": "d7030622-caf7-4039-939b-6195cdaa2585" }, "outputs": [], "source": [ "trainer.push_to_hub(**kwargs)" ] }, { "cell_type": "code", "execution_count": null, "id": "e19f35cf", "metadata": {}, "outputs": [], "source": [ "cv_results = trainer.evaluate(cv['test'])\n", "print(cv_results)" ] }, { "cell_type": "code", "execution_count": null, "id": "1c1e53d0", "metadata": {}, "outputs": [], "source": [ "evaluate.push_to_hub(\n", " model_id='Scrya/whisper-medium-id',\n", " metric_value=round(cv_results['eval_wer'], 2),\n", " metric_type=\"wer\",\n", " metric_name=\"WER\",\n", " dataset_name='mozilla-foundation/common_voice_11_0',\n", " dataset_type='mozilla-foundation/common_voice_11_0',\n", " dataset_split='test',\n", " dataset_config='vi',\n", " task_type=\"automatic-speech-recognition\",\n", " task_name=\"Automatic Speech Recognition\",\n", " overwrite=True\n", " )\n", "\n", "evaluate.push_to_hub(\n", " model_id='Scrya/whisper-medium-id',\n", " metric_value=round(cv_results['eval_cer'], 2),\n", " metric_type=\"cer\",\n", " metric_name=\"CER\",\n", " dataset_name='mozilla-foundation/common_voice_11_0',\n", " dataset_type='mozilla-foundation/common_voice_11_0',\n", " dataset_split='test',\n", " dataset_config='vi',\n", " task_type=\"automatic-speech-recognition\",\n", " task_name=\"Automatic Speech Recognition\",\n", " overwrite=True\n", " )" ] }, { "cell_type": "markdown", "id": "ca743fbd-602c-48d4-ba8d-a2fe60af64ba", "metadata": { "id": "ca743fbd-602c-48d4-ba8d-a2fe60af64ba" }, "source": [ "## Closing Remarks" ] }, { "cell_type": "markdown", "id": "7f737783-2870-4e35-aa11-86a42d7d997a", "metadata": { "id": "7f737783-2870-4e35-aa11-86a42d7d997a" }, "source": [ "In this notebook, we presented a step-by-step guide on fine-tuning Whisper for multilingual ASR \n", "using 🤗 Datasets, Transformers and the Hugging Face Hub. For more details on the Whisper model, the Common Voice dataset and the theory behind fine-tuning, refer to the accompanying [blog post](https://huggingface.co/blog/fine-tune-whisper). If you're interested in fine-tuning other \n", "Transformers models, both for English and multilingual ASR, be sure to check out the \n", "example scripts at [examples/pytorch/speech-recognition](https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition)." ] } ], "metadata": { "colab": { "include_colab_link": true, "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }