{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# High-Quality Text-Free One-Shot Voice Conversion with FreeVC and OpenVINO™\n", "[FreeVC](https://github.com/OlaWod/FreeVC) converts the voice of a source speaker to a target style while keeping the linguistic content unchanged, and it does so without text annotation.\n", "\n", "The figure below illustrates the FreeVC model architecture for inference. This notebook concentrates only on the inference part. There are three main parts: Prior Encoder, Speaker Encoder and Decoder. The prior encoder contains a WavLM model, a bottleneck extractor and a normalizing flow. Detailed information is available in this [paper](https://arxiv.org/abs/2210.15418).\n", "\n", "![Inference](https://github.com/OlaWod/FreeVC/blob/main/resources/infer.png?raw=true)\n", "\n", "[*image source*](https://github.com/OlaWod/FreeVC)\n", "\n", "FreeVC itself provides only a command-line interface and requires CUDA. This notebook shows how to use FreeVC from Python and without CUDA devices. 
It consists of the following steps:\n", "\n", "- Download and prepare models.\n", "- Inference.\n", "- Convert models to OpenVINO Intermediate Representation.\n", "- Inference using only OpenVINO's IR models.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Table of contents:\n", "\n", "- [Pre-requisites](#Pre-requisites)\n", "- [Imports and settings](#Imports-and-settings)\n", "- [Convert Models to OpenVINO Intermediate Representation](#Convert-Modes-to-OpenVINO-Intermediate-Representation)\n", " - [Convert Prior Encoder.](#Convert-Prior-Encoder.)\n", " - [Convert `SpeakerEncoder`](#Convert-SpeakerEncoder)\n", " - [Convert Decoder](#Convert-Decoder)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pre-requisites\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "These steps can be done manually; otherwise the notebook performs them automatically during execution, but only to the minimum extent necessary.\n", "1. Clone the FreeVC repo: `git clone https://github.com/OlaWod/FreeVC.git`.\n", "2. Download [WavLM-Large](https://github.com/microsoft/unilm/tree/master/wavlm) and put it under the `FreeVC/wavlm/` directory.\n", "3. Optionally, download the [VCTK](https://datashare.ed.ac.uk/handle/10283/3443) dataset. This example downloads only two audio samples from the [Hugging Face FreeVC example](https://huggingface.co/spaces/OlaWod/FreeVC/tree/main).\n", "4. Download the [pretrained models](https://1drv.ms/u/s!AnvukVnlQ3ZTx1rjrOZ2abCwuBAh?e=UlhRR5) and put them under the `checkpoints` directory (this example requires only `freevc.pth`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install extra requirements" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mDEPRECATION: torchsde 0.2.5 has a non-standard dependency specifier numpy>=1.19.*; python_version >= \"3.7\". pip 24.1 will enforce this behaviour change. 
A possible replacement is to upgrade to a newer version of torchsde or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install -q \"openvino>=2023.3.0\" \"librosa>=0.8.1\" \"webrtcvad==2.0.10\" \"gradio>=4.19\" \"torch>=2.1\" gdown scipy tqdm torchvision --extra-index-url https://download.pytorch.org/whl/cpu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clone the FreeVC repository if it is not already present, then append its path to `sys.path`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import sys\n", "\n", "\n", "free_vc_repo = \"FreeVC\"\n", "if not Path(free_vc_repo).exists():\n", "    !git clone https://github.com/OlaWod/FreeVC.git\n", "\n", "sys.path.append(free_vc_repo)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Downloading...\n", "From: https://drive.google.com/uc?id=12-cB34qCTvByWT-QtOcZaqwwO21FLSqU&confirm=t&uuid=a703c43c-ccce-436c-8799-c11b88e9e7e4\n", "To: /home/ea/work/openvino_notebooks/notebooks/freevc-voice-conversion/FreeVC/wavlm/WavLM-Large.pt\n", "100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.26G/1.26G [00:56<00:00, 22.4MB/s]\n" ] } ], "source": [ "# Fetch `notebook_utils` module\n", "import requests\n", "import gdown\n", "\n", "r = requests.get(\n", "    url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n", ")\n", "\n", "open(\"notebook_utils.py\", \"w\").write(r.text)\n", "from notebook_utils import 
download_file\n", "\n", "wavlm_large_dir_path = Path(\"FreeVC/wavlm\")\n", "wavlm_large_path = wavlm_large_dir_path / \"WavLM-Large.pt\"\n", "\n", "wavlm_url = \"https://drive.google.com/uc?id=12-cB34qCTvByWT-QtOcZaqwwO21FLSqU&confirm=t&uuid=a703c43c-ccce-436c-8799-c11b88e9e7e4\"\n", "\n", "if not wavlm_large_path.exists():\n", "    gdown.download(wavlm_url, str(wavlm_large_path))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e6a88c6c810a4ef285a981569b69069b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "checkpoints/freevc.pth: 0%| | 0.00/451M [00:00 each. Both the waveform and the\n", "    mel spectrogram slices are returned, so as to make each partial utterance waveform\n", "    correspond to its spectrogram.\n", "\n", "    The returned ranges may be indexing further than the length of the waveform. It is\n", "    recommended that you pad the waveform with zeros up to wav_slices[-1].stop.\n", "\n", "    :param n_samples: the number of samples in the waveform\n", "    :param rate: how many partial utterances should occur per second. Partial utterances must\n", "    cover the span of the entire utterance, thus the rate should not be lower than the inverse\n", "    of the duration of a partial utterance. By default, partial utterances are 1.6s long and\n", "    the minimum rate is thus 0.625.\n", "    :param min_coverage: when reaching the last partial utterance, it may or may not have\n", "    enough frames. If at least <min_coverage> of <partials_n_frames> are present,\n", "    then the last partial utterance will be considered by zero-padding the audio. Otherwise,\n", "    it will be discarded. If there aren't enough frames for one partial utterance,\n", "    this parameter is ignored so that the function always returns at least one slice.\n", "    :return: the waveform slices and mel spectrogram slices as lists of array slices. 
Index\n", "    respectively the waveform and the mel spectrogram with these slices to obtain the partial\n", "    utterances.\n", "    \"\"\"\n", "    assert 0 < min_coverage <= 1\n", "\n", "    # Compute how many frames separate two partial utterances\n", "    samples_per_frame = int((sampling_rate * mel_window_step / 1000))\n", "    n_frames = int(np.ceil((n_samples + 1) / samples_per_frame))\n", "    frame_step = int(np.round((sampling_rate / rate) / samples_per_frame))\n", "    assert 0 < frame_step, \"The rate is too high\"\n", "    assert frame_step <= partials_n_frames, \"The rate is too low, it should be %f at least\" % (sampling_rate / (samples_per_frame * partials_n_frames))\n", "\n", "    # Compute the slices\n", "    wav_slices, mel_slices = [], []\n", "    steps = max(1, n_frames - partials_n_frames + frame_step + 1)\n", "    for i in range(0, steps, frame_step):\n", "        mel_range = np.array([i, i + partials_n_frames])\n", "        wav_range = mel_range * samples_per_frame\n", "        mel_slices.append(slice(*mel_range))\n", "        wav_slices.append(slice(*wav_range))\n", "\n", "    # Evaluate whether extra padding is warranted or not\n", "    last_wav_range = wav_slices[-1]\n", "    coverage = (n_samples - last_wav_range.start) / (last_wav_range.stop - last_wav_range.start)\n", "    if coverage < min_coverage and len(mel_slices) > 1:\n", "        mel_slices = mel_slices[:-1]\n", "        wav_slices = wav_slices[:-1]\n", "\n", "    return wav_slices, mel_slices\n", "\n", "\n", "def embed_utterance(\n", "    wav: np.ndarray,\n", "    smodel: ov.CompiledModel,\n", "    return_partials=False,\n", "    rate=1.3,\n", "    min_coverage=0.75,\n", "):\n", "    \"\"\"\n", "    Computes an embedding for a single utterance. The utterance is divided in partial\n", "    utterances and an embedding is computed for each. 
The complete utterance embedding is the\n", "    L2-normed average embedding of the partial utterances.\n", "\n", "    :param wav: a preprocessed utterance waveform as a numpy array of float32\n", "    :param smodel: compiled speaker encoder model.\n", "    :param return_partials: if True, the partial embeddings will also be returned along with\n", "    the wav slices corresponding to each partial utterance.\n", "    :param rate: how many partial utterances should occur per second. Partial utterances must\n", "    cover the span of the entire utterance, thus the rate should not be lower than the inverse\n", "    of the duration of a partial utterance. By default, partial utterances are 1.6s long and\n", "    the minimum rate is thus 0.625.\n", "    :param min_coverage: when reaching the last partial utterance, it may or may not have\n", "    enough frames. If at least <min_coverage> of <partials_n_frames> are present,\n", "    then the last partial utterance will be considered by zero-padding the audio. Otherwise,\n", "    it will be discarded. If there aren't enough frames for one partial utterance,\n", "    this parameter is ignored so that the function always returns at least one slice.\n", "    :return: the embedding as a numpy array of float32 of shape (model_embedding_size,). 
If\n", "    <return_partials> is True, the partial utterances as a numpy array of float32 of shape\n", "    (n_partials, model_embedding_size) and the wav partials as a list of slices will also be\n", "    returned.\n", "    \"\"\"\n", "    # Compute where to split the utterance into partials and pad the waveform with zeros if\n", "    # the partial utterances cover a larger range.\n", "    wav_slices, mel_slices = compute_partial_slices(len(wav), rate, min_coverage)\n", "    max_wave_length = wav_slices[-1].stop\n", "    if max_wave_length >= len(wav):\n", "        wav = np.pad(wav, (0, max_wave_length - len(wav)), \"constant\")\n", "\n", "    # Split the utterance into partials and forward them through the model\n", "    mel = audio.wav_to_mel_spectrogram(wav)\n", "    mels = np.array([mel[s] for s in mel_slices])\n", "    with torch.no_grad():\n", "        mels = torch.from_numpy(mels).to(torch.device(\"cpu\"))\n", "        output_layer = smodel.output(0)\n", "        partial_embeds = smodel(mels)[output_layer]\n", "\n", "    # Compute the utterance embedding from the partial embeddings\n", "    raw_embed = np.mean(partial_embeds, axis=0)\n", "    embed = raw_embed / np.linalg.norm(raw_embed, 2)\n", "\n", "    if return_partials:\n", "        return embed, partial_embeds, wav_slices\n", "    return embed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select a device from the dropdown list for running inference using OpenVINO." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4d66a26387054efea12251cdd6f96941", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "device" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then compile the model." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "compiled_smodel = core.compile_model(ir_smodel, device.value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convert Decoder\n", "[back to top ⬆️](#Table-of-contents:)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the same way, export the `SynthesizerTrn` model, which implements the decoder, to OpenVINO IR format." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ea/work/my_optimum_intel/optimum_env/lib/python3.8/site-packages/torch/jit/_trace.py:1116: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:\n", "Tensor-likes are not close!\n", "\n", "Mismatched elements: 25913 / 25920 (100.0%)\n", "Greatest absolute difference: 0.7199262976646423 at index (0, 0, 6604) (up to 1e-05 allowed)\n", "Greatest relative difference: 21150.68245125348 at index (0, 0, 21658) (up to 1e-05 allowed)\n", "  _check_trace(\n" ] } ], "source": [ "OUTPUT_DIR = Path(\"output\")\n", "BASE_MODEL_NAME = \"net_g\"\n", "onnx_net_g_path = Path(OUTPUT_DIR / (BASE_MODEL_NAME + \"_fp32\")).with_suffix(\".onnx\")\n", "ir_net_g_path = Path(OUTPUT_DIR / (BASE_MODEL_NAME + \"ir\")).with_suffix(\".xml\")\n", "\n", "dummy_input_1 = torch.randn(1, 1024, 81)\n", "dummy_input_2 = torch.randn(1, 256)\n", "\n", "# define forward as infer\n", "net_g.forward = net_g.infer\n", "\n", "\n", "if not ir_net_g_path.exists():\n", "    ir_net_g_model = ov.convert_model(net_g, example_input=(dummy_input_1, dummy_input_2))\n", "    ov.save_model(ir_net_g_model, ir_net_g_path)\n", "else:\n", "    ir_net_g_model = core.read_model(ir_net_g_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compile the decoder for the device selected in the dropdown list above." ] }, { "cell_type": "code", "execution_count": 21, 
"metadata": {}, "outputs": [], "source": [ "compiled_ir_net_g_model = core.compile_model(ir_net_g_model, device.value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define a function for synthesizing audio." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def synthesize_audio(src, tgt):\n", "    wav_tgt, _ = librosa.load(tgt, sr=hps.data.sampling_rate)\n", "    wav_tgt, _ = librosa.effects.trim(wav_tgt, top_db=20)\n", "\n", "    g_tgt = embed_utterance(wav_tgt, compiled_smodel)\n", "    g_tgt = torch.from_numpy(g_tgt).unsqueeze(0)\n", "\n", "    # src\n", "    wav_src, _ = librosa.load(src, sr=hps.data.sampling_rate)\n", "    wav_src = np.expand_dims(wav_src, axis=0)\n", "\n", "    output_layer = compiled_cmodel.output(0)\n", "    c = compiled_cmodel(wav_src)[output_layer]\n", "    c = c.transpose((0, 2, 1))\n", "\n", "    output_layer = compiled_ir_net_g_model.output(0)\n", "    tgt_audio = compiled_ir_net_g_model((c, g_tgt))[output_layer]\n", "    tgt_audio = tgt_audio[0][0]\n", "\n", "    return tgt_audio" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can check inference using only the IR models." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2it [00:01, 1.64it/s]\n" ] } ], "source": [ "result_wav_names = []\n", "\n", "with torch.no_grad():\n", "    for i, (src, tgt) in enumerate(tqdm(zip(srcs, tgts))):\n", "        output_audio = synthesize_audio(src, tgt)\n", "\n", "        timestamp = time.strftime(\"%m-%d_%H-%M\", time.localtime())\n", "        # Add the loop index so files written within the same minute do not overwrite each other\n", "        result_name = f\"{timestamp}_{i}.wav\"\n", "        result_wav_names.append(result_name)\n", "        write(\n", "            os.path.join(\"outputs/freevc\", result_name),\n", "            hps.data.sampling_rate,\n", "            output_audio,\n", "        )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resulting audio files should be available in `outputs/freevc`; you can listen to them and compare them with those generated earlier.\n", "One of the results is presented below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Source audio (source of text):" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import IPython.display as ipd\n", "\n", "ipd.Audio(srcs[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Target audio (source of voice):" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ipd.Audio(tgts[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Result audio:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ipd.Audio(f\"outputs/freevc/{result_wav_names[0]}\")" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "You can also use your own audio files: upload them and run inference. Use a sampling rate that matches the value of `hps.data.sampling_rate`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import gradio as gr\n", "\n", "\n", "audio1 = gr.Audio(label=\"Source Audio\", type=\"filepath\")\n", "audio2 = gr.Audio(label=\"Reference Audio\", type=\"filepath\")\n", "outputs = gr.Audio(label=\"Output Audio\", type=\"filepath\")\n", "examples = [[audio1_name, audio2_name]]\n", "\n", "title = \"FreeVC with Gradio\"\n", "description = 'Gradio Demo for FreeVC and OpenVINO™. Upload a source audio and a reference audio, then click the \"Submit\" button to run inference.'\n", "\n", "\n", "def infer(src, tgt):\n", "    output_audio = synthesize_audio(src, tgt)\n", "\n", "    timestamp = time.strftime(\"%m-%d_%H-%M\", time.localtime())\n", "    result_name = f\"{timestamp}.wav\"\n", "    write(result_name, hps.data.sampling_rate, output_audio)\n", "\n", "    return result_name\n", "\n", "\n", "iface = gr.Interface(\n", "    infer,\n", "    [audio1, audio2],\n", "    outputs,\n", "    title=title,\n", "    description=description,\n", "    examples=examples,\n", ")\n", "iface.launch()\n", "# if you are launching remotely, specify server_name and server_port\n", "# iface.launch(server_name='your server name', server_port='server port in int')\n", "# if you have any issue to launch on your platform, you can pass share=True to launch method:\n", "# iface.launch(share=True)\n", "# it creates a publicly shareable link for the interface. 
Read more in the docs: https://gradio.app/docs/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "iface.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "openvino_notebooks": { "imageUrl": "", "tags": { "categories": [ "Model Demos" ], "libraries": [], "other": [], "tasks": [ "Audio-to-Audio", "Voice Conversion" ] } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "0e05ec04294f463e94573c2e85e1f01b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_60ea675ee0494b048ffe65fa162b94fc", "style": "IPY_MODEL_cac63c2f28eb404487b1e5e4cfb69d6d", "value": "p225_001.wav: 100%" } }, "1f80b2743d3d4db6aff27ea7f2a9079b": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "3da9a9949dc642359d38a9a00d58fa7d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_d7ba8939d447400caffd14b0aa9ed073", "max": 52058, "style": "IPY_MODEL_e377d2a4b5304c74a63557a91e0e4a53", "value": 52058 } }, "4717ac6f179f4167a857392474fafccc": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "4d66a26387054efea12251cdd6f96941": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "DropdownModel", "state": { "_options_labels": [ "CPU", "AUTO" ], "description": "Device:", "index": 1, "layout": 
"IPY_MODEL_bdc226b89f0b497187688a032fac62fe", "style": "IPY_MODEL_85ac580d909c4b8eb03bcf1d7ecdb189" } }, "552ef82102a04552ada26e33ce6b0a92": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "ProgressStyleModel", "state": { "description_width": "" } }, "5a1e7619fec3472b88004ef00e43dbb7": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "description_width": "", "font_size": null, "text_color": null } }, "5ae224d2baa547f5a868c99b56379b10": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_ebdeae3eafba40bcb0db25af4cb39487", "style": "IPY_MODEL_fe58ac6d25e84145bd7a195b5594bc95", "value": " 451M/451M [00:22<00:00, 22.8MB/s]" } }, "60ea675ee0494b048ffe65fa162b94fc": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "619f12153868434e819646c5155f9cf2": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_82687a822530469c9aeed4ec6c3d3b15", "style": "IPY_MODEL_c32f1b0c8fb34306aa5c3b76e22e16de", "value": "p226_002.wav: 100%" } }, "6d933e2ea86148a8af8f951342d9307b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_0e05ec04294f463e94573c2e85e1f01b", "IPY_MODEL_3da9a9949dc642359d38a9a00d58fa7d", "IPY_MODEL_8a5ccab56e7e4cbfb03468610bbbaf48" ], "layout": "IPY_MODEL_f7865f2a298e4ae9b450d8bfc3d8ebd2" } }, "77038bb27f4f480280b368816146387c": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "77cbed65268546949782e5d6ae510548": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "layout": 
"IPY_MODEL_da6dffd2a646435293fa9e6850d19dc8", "style": "IPY_MODEL_8486e59100df46578b47c97f816fb0b0", "value": " 135k/135k [00:00<00:00, 267kB/s]" } }, "81c928727dd54d0fac484c3237249150": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "82687a822530469c9aeed4ec6c3d3b15": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "8486e59100df46578b47c97f816fb0b0": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "description_width": "", "font_size": null, "text_color": null } }, "85ac580d909c4b8eb03bcf1d7ecdb189": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "DescriptionStyleModel", "state": { "description_width": "" } }, "89de592277ac4092a5b0f4ae0207eb0d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_f88c40c864874b8a82b44e895c8d24ad", "max": 472644351, "style": "IPY_MODEL_552ef82102a04552ada26e33ce6b0a92", "value": 472644351 } }, "8a5ccab56e7e4cbfb03468610bbbaf48": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_77038bb27f4f480280b368816146387c", "style": "IPY_MODEL_d5d18c75fd1c4ff1b0a4ed614b50019d", "value": " 50.8k/50.8k [00:00<00:00, 367kB/s]" } }, "b05bd0788e6042e9a2e4f71cf0c3c9ed": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_619f12153868434e819646c5155f9cf2", "IPY_MODEL_b3491f7eaecc4670ad21feb18928f499", "IPY_MODEL_77cbed65268546949782e5d6ae510548" ], "layout": "IPY_MODEL_81c928727dd54d0fac484c3237249150" } }, "b2fc4b323ff64e26b321403c980a8ff9": { "model_module": "@jupyter-widgets/controls", 
"model_module_version": "2.0.0", "model_name": "HTMLModel", "state": { "layout": "IPY_MODEL_1f80b2743d3d4db6aff27ea7f2a9079b", "style": "IPY_MODEL_5a1e7619fec3472b88004ef00e43dbb7", "value": "checkpoints/freevc.pth: 100%" } }, "b3491f7eaecc4670ad21feb18928f499": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "FloatProgressModel", "state": { "bar_style": "success", "layout": "IPY_MODEL_b59fcf7296e747dcbab3f9c716b8d04d", "max": 138084, "style": "IPY_MODEL_4717ac6f179f4167a857392474fafccc", "value": 138084 } }, "b59fcf7296e747dcbab3f9c716b8d04d": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "bdc226b89f0b497187688a032fac62fe": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "c32f1b0c8fb34306aa5c3b76e22e16de": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "description_width": "", "font_size": null, "text_color": null } }, "cac63c2f28eb404487b1e5e4cfb69d6d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "description_width": "", "font_size": null, "text_color": null } }, "d5d18c75fd1c4ff1b0a4ed614b50019d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "description_width": "", "font_size": null, "text_color": null } }, "d7ba8939d447400caffd14b0aa9ed073": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "da6dffd2a646435293fa9e6850d19dc8": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "e377d2a4b5304c74a63557a91e0e4a53": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": 
"ProgressStyleModel", "state": { "description_width": "" } }, "e6a88c6c810a4ef285a981569b69069b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HBoxModel", "state": { "children": [ "IPY_MODEL_b2fc4b323ff64e26b321403c980a8ff9", "IPY_MODEL_89de592277ac4092a5b0f4ae0207eb0d", "IPY_MODEL_5ae224d2baa547f5a868c99b56379b10" ], "layout": "IPY_MODEL_f3282ca21153461fb288109c48b637f3" } }, "ebdeae3eafba40bcb0db25af4cb39487": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "f3282ca21153461fb288109c48b637f3": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "f7865f2a298e4ae9b450d8bfc3d8ebd2": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "f88c40c864874b8a82b44e895c8d24ad": { "model_module": "@jupyter-widgets/base", "model_module_version": "2.0.0", "model_name": "LayoutModel", "state": {} }, "fe58ac6d25e84145bd7a195b5594bc95": { "model_module": "@jupyter-widgets/controls", "model_module_version": "2.0.0", "model_name": "HTMLStyleModel", "state": { "description_width": "", "font_size": null, "text_color": null } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }