diff --git a/.gitattributes b/.gitattributes
index a6344aac8c09253b3b630fb776ae94478aa0275b..cea489bbd5b3b79a4c520021c853e4fbaa965828 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -32,4 +32,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
+*.tsv filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
diff --git a/README copy.md b/README copy.md
new file mode 100644
index 0000000000000000000000000000000000000000..64976a1f988d1de2022db161f300638f8c34c373
--- /dev/null
+++ b/README copy.md
@@ -0,0 +1,107 @@
+# Make-An-Audio 3: Transforming Text into Audio via Flow-based Large Diffusion Transformers
+
+PyTorch Implementation of [Lumina-t2x](https://arxiv.org/abs/2405.05945)
+
+We will release our implementation and pretrained models as open source in this repository soon.
+
+[![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/abs/2305.18474)
+[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/AIGC-Audio/Lumina-Audio)
+[![GitHub Stars](https://img.shields.io/github/stars/Text-to-Audio/Make-An-Audio-3?style=social)](https://github.com/Text-to-Audio/Make-An-Audio-3)
+
+## Use pretrained model
+We provide our implementation and pretrained models as open source in this repository.
+
+Visit our [demo page](https://make-an-audio-2.github.io/) for audio samples.
+## Quick Start
+### Pretrained Models
+Simply download the weights from [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/Alpha-VLLM/Lumina-T2Music).
+- Text Encoder: [FLAN-T5-Large](https://huggingface.co/google/flan-t5-large)
+- VAE: Make-An-Audio 2, finetuned from [Make an Audio](https://github.com/Text-to-Audio/Make-An-Audio)
+- Decoder: [Vocoder](https://github.com/NVIDIA/BigVGAN)
+- `Music` Checkpoints: [huggingface](https://huggingface.co/Alpha-VLLM/Lumina-T2Music), `Audio` Checkpoints: [huggingface]()
+
+### Generate audio/music from text
+```
+python3 scripts/txt2audio_for_2cap_flow.py \
+--outdir output_dir -r checkpoints_last.ckpt -b configs/txt2audio-cfm1-cfg-LargeDiT3.yaml --scale 3.0 \
+--vocoder-ckpt useful_ckpts/bigvnat --test-dataset audiocaps
+```
+
+### Generate audio/music from audiocaps or musiccaps test dataset
+- Remember to change `config["test_dataset"]` accordingly.
+```
+python3 scripts/txt2audio_for_2cap_flow.py \
+--outdir output_dir -r checkpoints_last.ckpt -b configs/txt2audio-cfm1-cfg-LargeDiT3.yaml --scale 3.0 \
+--vocoder-ckpt useful_ckpts/bigvnat --test-dataset testset
+```
+
+### Generate audio/music from video
+```
+python3 scripts/video2audio_flow.py \
+--outdir output_dir -r checkpoints_last.ckpt -b configs/txt2audio-cfm1-cfg-LargeDiT3.yaml --scale 3.0 \
+--vocoder-ckpt useful_ckpts/bigvnat --test-dataset vggsound
+```
+
+## Train
+### Data preparation
+- We cannot provide dataset download links because of copyright issues. We provide the processing code to generate melspecs, count audio durations, and generate structured captions.
+- Before training, collect the dataset information into a tsv file with the columns name (id of each audio), dataset (which dataset the audio belongs to), audio_path (path of the .wav file), caption (caption of the audio), mel_path (path of the processed melspec file of each audio), and duration (duration of the audio). We provide a tsv file of the audiocaps test set, audiocaps_test_struct.tsv, as a sample.
+- We provide a tsv file of the audiocaps test set, ./audiocaps_test_16000_struct.tsv, as a sample.
+
+### Generate the melspec file of audio
+Assume you already have a tsv file that links each caption to its audio_path, which means the tsv file has "name", "audio_path", "dataset" and "caption" columns.
+To get the melspec of each audio, run the following command, which saves the mels in ./processed
+```
+python preprocess/mel_spec.py --tsv_path tmp.tsv --num_gpus 1 --max_duration 10
+```
+
+### Count audio duration
+To count the duration of each audio and save the duration information in the tsv file, run the following command:
+```
+python preprocess/add_duration.py --tsv_path tmp.tsv
+```
+
+### Generate structured captions from the original natural language captions
+First, get an authorization token from OpenAI (https://openai.com/blog/openai-api); here is a tutorial (https://www.maisieai.com/help/how-to-get-an-openai-api-key-for-chatgpt). Then set the variable openai_key in preprocess/n2s_by_openai.py to your key. Run the following command to add structured captions; the tsv file with structured captions will be saved to {tsv_file_name}_struct.tsv:
+```
+python preprocess/n2s_by_openai.py --tsv_path tmp.tsv
+```
+
+### Place tsv files
+After generating the structured captions, put the tsv files with structured captions in ./data/main_spec_dir, and put the tsv files without structured captions in ./data/no_struct_dir
+
+Then set data.params.main_spec_dir and data.params.main_spec_dir.other_spec_dir_path accordingly in the config file configs/text2audio-ConcatDiT-ae1dnat_Skl20d2_struct2MLPanylen.yaml.
+
+## Train variational autoencoder
+Assume we have processed several datasets and saved the .tsv files in tsv_dir/*.tsv. Replace data.params.spec_dir_path with tsv_dir in the config file. Then we can train the VAE with the following command.
If your machine has fewer than 8 GPUs, replace the --gpus value with 0,1,...,gpu_nums
+```
+python main.py --base configs/research/autoencoder/autoencoder1d_kl20_natbig_r1_down2_disc2.yaml -t --gpus 0,1,2,3,4,5,6,7
+```
+
+## Train latent diffusion
+After training the VAE, replace model.params.first_stage_config.params.ckpt_path with your trained VAE checkpoint path in the config file.
+Run the following command to train the diffusion model
+```
+python main.py --base configs/research/text2audio/text2audio-ConcatDiT-ae1dnat_Skl20d2_freezeFlananylen_drop.yaml -t --gpus 0,1,2,3,4,5,6,7
+```
+
+## Evaluation
+Please refer to [Make-An-Audio](https://github.com/Text-to-Audio/Make-An-Audio?tab=readme-ov-file#evaluation)
+
+
+## Acknowledgements
+This implementation uses parts of the code from the following GitHub repos:
+[Make-An-Audio](https://github.com/Text-to-Audio/Make-An-Audio),
+[AudioLCM](https://github.com/Text-to-Audio/AudioLCM),
+[CLAP](https://github.com/LAION-AI/CLAP),
+as described in our code.
+
+
+
+## Citations ##
+If you find this code useful in your research, please consider citing:
+```bibtex
+```
+
+## Disclaimer ##
+Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.
\ No newline at end of file
diff --git a/app.py b/app.py
new file mode 100644
index 0000000000000000000000000000000000000000..927ba496f2785379f858ce1aea73e5e278b65ba8
--- /dev/null
+++ b/app.py
@@ -0,0 +1,199 @@
+import spaces
+import argparse, os, sys, glob
+import pathlib
+directory = pathlib.Path(os.getcwd())
+print(directory)
+sys.path.append(str(directory))
+import torch
+import numpy as np
+from omegaconf import OmegaConf
+from ldm.util import instantiate_from_config
+from ldm.models.diffusion.ddim import DDIMSampler
+from ldm.models.diffusion.plms import PLMSSampler
+import pandas as pd
+from tqdm import tqdm
+import preprocess.n2s_by_openai as n2s
+from vocoder.bigvgan.models import VocoderBigVGAN
+import soundfile
+import torchaudio, math
+import gradio
+import gradio as gr
+
+def load_model_from_config(config, ckpt = None, verbose=True):
+    model = instantiate_from_config(config.model)
+    if ckpt:
+        print(f"Loading model from {ckpt}")
+        pl_sd = torch.load(ckpt, map_location="cpu")
+        sd = pl_sd["state_dict"]
+
+        m, u = model.load_state_dict(sd, strict=False)
+        if len(m) > 0 and verbose:
+            print("missing keys:")
+            print(m)
+        if len(u) > 0 and verbose:
+            print("unexpected keys:")
+            print(u)
+    else:
+        print(f"Note that no ckpt is loaded !!!")
+
+    model.cuda()
+    model.eval()
+    return model
+
+
+class GenSamples:
+    def __init__(self,opt, model,outpath,config, vocoder = None,save_mel = True,save_wav = True) -> None:
+        self.opt = opt
+        self.model = model
+        self.outpath = outpath
+        if save_wav:
+            assert vocoder is not None
+            self.vocoder = vocoder
+        self.save_mel = save_mel
+        self.save_wav = save_wav
+        self.channel_dim = self.model.channels
+        self.config = config
+
+    def gen_test_sample(self,prompt, mel_name = None,wav_name = None, gt=None, video=None):# prompt is {'ori_caption':'xxx','struct_caption':'xxx'}
+        uc = None
+        record_dicts = []
+        if self.opt['scale'] != 1.0:
+            try: # audiocaps
+                uc = self.model.get_learned_conditioning({'ori_caption': 
"",'struct_caption': ""})
+            except: # audioset
+                uc = self.model.get_learned_conditioning(prompt['ori_caption'])
+        for n in range(self.opt['n_iter']):
+            try: # audiocaps
+                c = self.model.get_learned_conditioning(prompt) # shape: [1,77,1280], i.e. not yet pooled into a sentence embedding; still one embedding per word
+            except: # audioset
+                c = self.model.get_learned_conditioning(prompt['ori_caption'])
+
+            if self.channel_dim>0:
+                shape = [self.channel_dim, self.opt['H'], self.opt['W']] # (z_dim, 80//2^x, 848//2^x)
+            else:
+                shape = [1, self.opt['H'], self.opt['W']]
+
+            x0 = torch.randn(shape, device=self.model.device)
+
+            if self.opt['scale'] == 1: # w/o cfg
+                sample, _ = self.model.sample(c, 1, timesteps=self.opt['ddim_steps'], x_latent=x0)
+            else: # cfg
+                sample, _ = self.model.sample_cfg(c, self.opt['scale'], uc, 1, timesteps=self.opt['ddim_steps'], x_latent=x0)
+            x_samples_ddim = self.model.decode_first_stage(sample)
+
+            for idx,spec in enumerate(x_samples_ddim):
+                spec = spec.squeeze(0).cpu().numpy()
+                print(spec[0])
+                record_dict = {'caption':prompt['ori_caption'][0]}
+                if self.save_mel:
+                    mel_path = os.path.join(self.outpath,mel_name+f'_{idx}.npy')
+                    np.save(mel_path,spec)
+                    record_dict['mel_path'] = mel_path
+                if self.save_wav:
+                    wav = self.vocoder.vocode(spec)
+                    wav_path = os.path.join(self.outpath,wav_name+f'_{idx}.wav')
+                    soundfile.write(wav_path, wav, self.opt['sample_rate'])
+                    record_dict['audio_path'] = wav_path
+                record_dicts.append(record_dict)
+
+        return record_dicts
+
+@spaces.GPU(enable_queue=True)
+def infer(ori_prompt, ddim_steps, scale, seed):
+    # np.random.seed(seed)
+    # torch.manual_seed(seed)
+    prompt = dict(ori_caption=ori_prompt,struct_caption=f'<{ori_prompt}& all>')
+
+    opt = {
+        'sample_rate': 16000,
+        'outdir': 'outputs/txt2music-samples',
+        'ddim_steps': ddim_steps,
+        'n_iter': 1,
+        'H': 20,
+        'W': 312,
+        'scale': scale,
+        'resume': 'useful_ckpts/music_generation/119.ckpt',
+        'base': 'configs/txt2music-cfm1-cfg-LargeDiT3.yaml',
+        'vocoder_ckpt': 'useful_ckpts/bigvnat',
+    }
+
+    config = OmegaConf.load(opt['base'])
+    model = load_model_from_config(config, opt['resume'])
+
+    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
+    model = model.to(device)
+    os.makedirs(opt['outdir'], exist_ok=True)
+    vocoder = VocoderBigVGAN(opt['vocoder_ckpt'],device)
+    generator = GenSamples(opt, model,opt['outdir'],config, vocoder,save_mel=False,save_wav=True)
+
+    with torch.no_grad():
+        with model.ema_scope():
+            wav_name = f'{prompt["ori_caption"].strip().replace(" ", "-")}'
+            generator.gen_test_sample(prompt,wav_name=wav_name)
+
+    file_path = os.path.join(opt['outdir'],wav_name+'_0.wav')
+    print(f"Your samples are ready and waiting for you here: \n{file_path} \nEnjoy.")
+    return file_path
+
+def my_inference_function(text_prompt, ddim_steps, scale, seed):
+    file_path = infer(text_prompt, ddim_steps, scale, seed)
+    return file_path
+
+
+with gr.Blocks() as demo:
+    with gr.Row():
+        gr.Markdown("## Make-An-Audio 3: Transforming Text into Audio via Flow-based Large Diffusion Transformers")
+
+    with gr.Row():
+        with gr.Column():
+            prompt = gr.Textbox(label="Prompt: Input your text here. 
")
+            run_button = gr.Button()
+
+            with gr.Accordion("Advanced options", open=False):
+                ddim_steps = gr.Slider(label="ddim_steps", minimum=1,
+                                       maximum=50, value=25, step=1)
+                scale = gr.Slider(
+                    label="Guidance Scale:(Large => more relevant to text but the quality may drop)", minimum=0.1, maximum=8.0, value=3.0, step=0.1
+                )
+                seed = gr.Slider(
+                    label="Seed: Changing this value (any integer) will lead to a different generation result.",
+                    minimum=0,
+                    maximum=2147483647,
+                    step=1,
+                    value=44,
+                )
+
+        with gr.Column():
+            outaudio = gr.Audio()
+
+    run_button.click(fn=my_inference_function, inputs=[
+        prompt, ddim_steps, scale, seed], outputs=[outaudio])
+    with gr.Row():
+        with gr.Column():
+            gr.Examples(
+                examples = [['An amateur recording features a steel drum playing in a higher register',25,5,55],
+                            ['An instrumental song with a caribbean feel, happy mood, and featuring steel pan music, programmed percussion, and bass',25,5,55],
+                            ['This musical piece features a playful and emotionally melodic male vocal accompanied by piano',25,5,55],
+                            ['A eerie yet calming experimental electronic track featuring haunting synthesizer strings and pads',25,5,55],
+                            ['A slow tempo pop instrumental piece featuring only acoustic guitar with fingerstyle and percussive strumming techniques',25,5,55]],
+                inputs = [prompt, ddim_steps, scale, seed],
+                outputs = [outaudio]
+            )
+        with gr.Column():
+            pass
+
+demo.launch()
+
+
+# gradio_interface = gradio.Interface(
+# fn = my_inference_function,
+# inputs = "text",
+# outputs = "audio"
+# )
+# gradio_interface.launch()
+# text_prompt = 'An amateur recording features a steel drum playing in a higher register'
+# # text_prompt = 'A slow tempo pop instrumental piece featuring only acoustic guitar with fingerstyle and percussive strumming techniques'
+# ddim_steps=25
+# scale=5.0
+# seed=55
+# my_inference_function(text_prompt, ddim_steps, scale, seed)
diff --git a/audiocaps_test_struct.tsv b/audiocaps_test_struct.tsv
new file mode 100644
index 0000000000000000000000000000000000000000..2ab29425c6d0b407ad1768b0852ff9220e7ff503 --- /dev/null +++ b/audiocaps_test_struct.tsv @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:36d5f93b134ee6ed8c7e75adffca2e0a378fb683e67836abd78b50153659858b +size 1306277 diff --git a/data/audiocaps_test_struct.tsv b/data/audiocaps_test_struct.tsv index 9d9bee2becfb6a911d0321b8651b9e53555a59be..2ab29425c6d0b407ad1768b0852ff9220e7ff503 100644 --- a/data/audiocaps_test_struct.tsv +++ b/data/audiocaps_test_struct.tsv @@ -1,4501 +1,3 @@ -name dataset ori_cap audio_path mel_path caption -Y7fmOlUlwoNg audiocaps Constant rattling noise and sharp vibrations /home/tiger/nfs/data/audiocaps/test_16k/Y7fmOlUlwoNg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y7fmOlUlwoNg_mel.npy @ -Y6BJ455B1aAs audiocaps A rocket flies by followed by a loud explosion and fire crackling as a truck engine runs idle /home/tiger/nfs/data/audiocaps/test_16k/Y6BJ455B1aAs.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y6BJ455B1aAs_mel.npy @@@ -YGOD8Bt5LfDE audiocaps Humming and vibrating with a man and children speaking and laughing /home/tiger/nfs/data/audiocaps/test_16k/YGOD8Bt5LfDE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YGOD8Bt5LfDE_mel.npy @@ -YYQSuFyFm3Lc audiocaps A train running on a railroad track followed by a vehicle door closing and a man talking in the distance while a train horn honks and railroad crossing warning signals ring /home/tiger/nfs/data/audiocaps/test_16k/YYQSuFyFm3Lc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YYQSuFyFm3Lc_mel.npy @@@@@@ -YVjSEIRnLAh8 audiocaps Food is frying, and a woman talks /home/tiger/nfs/data/audiocaps/test_16k/YVjSEIRnLAh8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YVjSEIRnLAh8_mel.npy @ -YDlWd7Wmdi1E audiocaps A man speaks as birds chirp and dogs bark /home/tiger/nfs/data/audiocaps/test_16k/YDlWd7Wmdi1E.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YDlWd7Wmdi1E_mel.npy @@ -YYNDKuNINDOY audiocaps A large truck driving by as an emergency siren wails and truck horn honks /home/tiger/nfs/data/audiocaps/test_16k/YYNDKuNINDOY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YYNDKuNINDOY_mel.npy @@ -YfsBR7e_X_0Y audiocaps A child yelling as a young boy talks during several slaps on a hard surface /home/tiger/nfs/data/audiocaps/test_16k/YfsBR7e_X_0Y.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YfsBR7e_X_0Y_mel.npy @@@@ -YtjCNwdOUiGc audiocaps An engine rumbles loudly, then an air horn honk three times /home/tiger/nfs/data/audiocaps/test_16k/YtjCNwdOUiGc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YtjCNwdOUiGc_mel.npy @ -YyL3gKa6YLoM audiocaps A person snoring with another man speaking /home/tiger/nfs/data/audiocaps/test_16k/YyL3gKa6YLoM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YyL3gKa6YLoM_mel.npy @ -YLbken4JCr94 audiocaps Thunder and a gentle rain /home/tiger/nfs/data/audiocaps/test_16k/YLbken4JCr94.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YLbken4JCr94_mel.npy @ -Y_xylo5_IiaM audiocaps A woman talks and a baby whispers /home/tiger/nfs/data/audiocaps/test_16k/Y_xylo5_IiaM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y_xylo5_IiaM_mel.npy @ -YsVYTOURVsQ0 audiocaps A man talking as a stream of water trickles in the background /home/tiger/nfs/data/audiocaps/test_16k/YsVYTOURVsQ0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YsVYTOURVsQ0_mel.npy @ -YSmdj6JFB9MQ audiocaps A person briefly talks followed quickly by toilet flushing and another voice from another person /home/tiger/nfs/data/audiocaps/test_16k/YSmdj6JFB9MQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YSmdj6JFB9MQ_mel.npy @@ -Yu84FiZ_omhA audiocaps A woman singing then choking followed by birds chirping 
/home/tiger/nfs/data/audiocaps/test_16k/Yu84FiZ_omhA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yu84FiZ_omhA_mel.npy @@ -Ykx6Rj4MDIAw audiocaps Machinery banging and hissing /home/tiger/nfs/data/audiocaps/test_16k/Ykx6Rj4MDIAw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ykx6Rj4MDIAw_mel.npy @ -YPLHXGDnig4M audiocaps A person talking which later imitates a couple of meow sounds /home/tiger/nfs/data/audiocaps/test_16k/YPLHXGDnig4M.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YPLHXGDnig4M_mel.npy @ -YZ0IrCa4MvOA audiocaps Rain is falling continuously /home/tiger/nfs/data/audiocaps/test_16k/YZ0IrCa4MvOA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YZ0IrCa4MvOA_mel.npy -Y14ekd4nkpwc audiocaps An infant crying followed by a man laughing /home/tiger/nfs/data/audiocaps/test_16k/Y14ekd4nkpwc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y14ekd4nkpwc_mel.npy @ -YyfYNPWs7mWY audiocaps A man talking as a door slams shut followed by a door creaking /home/tiger/nfs/data/audiocaps/test_16k/YyfYNPWs7mWY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YyfYNPWs7mWY_mel.npy @@ -YuhSDBwVrEdo audiocaps Whistling with wind blowing /home/tiger/nfs/data/audiocaps/test_16k/YuhSDBwVrEdo.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YuhSDBwVrEdo_mel.npy @ -YYQGW5AwDOIo audiocaps Vehicles passing by slowly together with distant murmuring /home/tiger/nfs/data/audiocaps/test_16k/YYQGW5AwDOIo.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YYQGW5AwDOIo_mel.npy @ -YMe4npKmtchA audiocaps Water is trickling, and a man talks /home/tiger/nfs/data/audiocaps/test_16k/YMe4npKmtchA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YMe4npKmtchA_mel.npy @ -YgbtcDoh0q3c audiocaps Scraping and speech followed by people laughing /home/tiger/nfs/data/audiocaps/test_16k/YgbtcDoh0q3c.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YgbtcDoh0q3c_mel.npy @@ -Y9HVgYs8OOLc audiocaps Birds cackling and young peoples voices /home/tiger/nfs/data/audiocaps/test_16k/Y9HVgYs8OOLc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y9HVgYs8OOLc_mel.npy @ -YOpiWMltpj44 audiocaps Birds are squawking, and ducks are quacking /home/tiger/nfs/data/audiocaps/test_16k/YOpiWMltpj44.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YOpiWMltpj44_mel.npy @ -Y9ZZHvwaH-CU audiocaps Repeated gunfire and screaming in the background /home/tiger/nfs/data/audiocaps/test_16k/Y9ZZHvwaH-CU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y9ZZHvwaH-CU_mel.npy @ -YK_Vre_-4KqU audiocaps An aircraft engine is taking off /home/tiger/nfs/data/audiocaps/test_16k/YK_Vre_-4KqU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YK_Vre_-4KqU_mel.npy -YqeSl7YZAfs4 audiocaps Water running with a main is speaking /home/tiger/nfs/data/audiocaps/test_16k/YqeSl7YZAfs4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YqeSl7YZAfs4_mel.npy @
-Y4IeDBwyQ9ZQ audiocaps A female speaking with some rustling followed by another female speaking /home/tiger/nfs/data/audiocaps/test_16k/Y4IeDBwyQ9ZQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y4IeDBwyQ9ZQ_mel.npy @ -YArHiac57pVk audiocaps Males speaking and then a clock ticks twice /home/tiger/nfs/data/audiocaps/test_16k/YArHiac57pVk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YArHiac57pVk_mel.npy @ -YqZEIs6tS5vk audiocaps An engine revving and then tires squealing /home/tiger/nfs/data/audiocaps/test_16k/YqZEIs6tS5vk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YqZEIs6tS5vk_mel.npy @ -Ypaf0nyjg1Js audiocaps A woman speaking followed by a porcelain plate clanking as food and oil sizzles /home/tiger/nfs/data/audiocaps/test_16k/Ypaf0nyjg1Js.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ypaf0nyjg1Js_mel.npy @@ -YBZCEDkx37rI audiocaps An engine hums as it idles /home/tiger/nfs/data/audiocaps/test_16k/YBZCEDkx37rI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YBZCEDkx37rI_mel.npy -YFR7BDRhMATo audiocaps Blowing of a horn as a train passes /home/tiger/nfs/data/audiocaps/test_16k/YFR7BDRhMATo.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YFR7BDRhMATo_mel.npy @ -YXJba7pTbpD0 audiocaps Short spray followed by louder longer spray /home/tiger/nfs/data/audiocaps/test_16k/YXJba7pTbpD0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YXJba7pTbpD0_mel.npy @ -YCeRoaEcqUgM audiocaps A motor is revving and changing gears /home/tiger/nfs/data/audiocaps/test_16k/YCeRoaEcqUgM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YCeRoaEcqUgM_mel.npy @ -Yzq00Oe1ecpE audiocaps Humming from an engine slowing down then speeding up /home/tiger/nfs/data/audiocaps/test_16k/Yzq00Oe1ecpE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yzq00Oe1ecpE_mel.npy @@ -YztSjcZNUY7A audiocaps A baby cries as 
a woman speaks with other speech background noise /home/tiger/nfs/data/audiocaps/test_16k/YztSjcZNUY7A.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YztSjcZNUY7A_mel.npy @@ -YglAeihz0NAM audiocaps Ocean waves crashing in the distance as young girl talks followed by a young man talking while a group of children laughs in the background and wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YglAeihz0NAM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YglAeihz0NAM_mel.npy @@@@ -YCM49C3RkzV8 audiocaps An adult female speaks, and muted speech occurs briefly in the background /home/tiger/nfs/data/audiocaps/test_16k/YCM49C3RkzV8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YCM49C3RkzV8_mel.npy @ -YH-vTZh81qAU audiocaps A metal clank followed by motor vibrating and rumbling /home/tiger/nfs/data/audiocaps/test_16k/YH-vTZh81qAU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YH-vTZh81qAU_mel.npy @@ -Yup2PpjTzyyc audiocaps Music and a man speaking followed by bleeps and someone singing /home/tiger/nfs/data/audiocaps/test_16k/Yup2PpjTzyyc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yup2PpjTzyyc_mel.npy @@ -YdlsiellSFf0 audiocaps Motorboat engine screams as it accelerates /home/tiger/nfs/data/audiocaps/test_16k/YdlsiellSFf0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YdlsiellSFf0_mel.npy -Y0jGH7A_hpBM audiocaps A man speaking followed by another man speaking with some rustling /home/tiger/nfs/data/audiocaps/test_16k/Y0jGH7A_hpBM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y0jGH7A_hpBM_mel.npy @ -YCefFMA3klxk audiocaps A vehicle horn honking followed by a large truck engine accelerating while wind blows lightly into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YCefFMA3klxk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YCefFMA3klxk_mel.npy @@ -YKnXNy5Q6YS4 audiocaps Many 
insects are buzzing and rustling is occurring, while an adult male speaks /home/tiger/nfs/data/audiocaps/test_16k/YKnXNy5Q6YS4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YKnXNy5Q6YS4_mel.npy @@ -YcPiSd5nJLrI audiocaps People speaking with loud bangs followed by a slow motion rumble /home/tiger/nfs/data/audiocaps/test_16k/YcPiSd5nJLrI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YcPiSd5nJLrI_mel.npy @ -YrJVXE6Axtrg audiocaps A couple of men speaking as metal clanks and a power tool operates /home/tiger/nfs/data/audiocaps/test_16k/YrJVXE6Axtrg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YrJVXE6Axtrg_mel.npy @@ -YFA11v4SmdBc audiocaps A man speaks and then whistles /home/tiger/nfs/data/audiocaps/test_16k/YFA11v4SmdBc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YFA11v4SmdBc_mel.npy @ -YQvATUKXYFBs audiocaps Bells ring followed by humming and vibrations as a train passes while blowing a horn /home/tiger/nfs/data/audiocaps/test_16k/YQvATUKXYFBs.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YQvATUKXYFBs_mel.npy @@@@ -Y_ezm-TpKj1w audiocaps A vehicle engine revving as a crowd of people talk /home/tiger/nfs/data/audiocaps/test_16k/Y_ezm-TpKj1w.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y_ezm-TpKj1w_mel.npy @ -YYEYeQ0lIkBQ audiocaps Several ducks quack and chirp as men speak and wind blows /home/tiger/nfs/data/audiocaps/test_16k/YYEYeQ0lIkBQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YYEYeQ0lIkBQ_mel.npy @@ -Y_xylo5_IiaM audiocaps A woman talking as a baby talks followed by plastic thumping /home/tiger/nfs/data/audiocaps/test_16k/Y_xylo5_IiaM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y_xylo5_IiaM_mel.npy @@ -YKtTLsveexOg audiocaps A sewing machine operating as a machine motor hisses loudly in the background /home/tiger/nfs/data/audiocaps/test_16k/YKtTLsveexOg.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YKtTLsveexOg_mel.npy @ -Y5QZ0NtdoKJ8 audiocaps Digital beeps repeating then a person speaks /home/tiger/nfs/data/audiocaps/test_16k/Y5QZ0NtdoKJ8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y5QZ0NtdoKJ8_mel.npy @ -Y_AcJVyToQUQ audiocaps A man and woman laughing followed by a man shouting then a woman laughing as a child laughs /home/tiger/nfs/data/audiocaps/test_16k/Y_AcJVyToQUQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y_AcJVyToQUQ_mel.npy @@@ -YkEP-BwMarf8 audiocaps Crumpling paper noise with female speech /home/tiger/nfs/data/audiocaps/test_16k/YkEP-BwMarf8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YkEP-BwMarf8_mel.npy @ -Y7D7xgd4WJ50 audiocaps A bell is ringing loudly and quickly /home/tiger/nfs/data/audiocaps/test_16k/Y7D7xgd4WJ50.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y7D7xgd4WJ50_mel.npy -YyVVLq4ao1Ck audiocaps Several birds chirp with some hissing /home/tiger/nfs/data/audiocaps/test_16k/YyVVLq4ao1Ck.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YyVVLq4ao1Ck_mel.npy @ -YS0YE96w0YRk audiocaps A man speaking as a crowd of people laugh and applaud /home/tiger/nfs/data/audiocaps/test_16k/YS0YE96w0YRk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YS0YE96w0YRk_mel.npy @@ -Ylh801oHGtD4 audiocaps A small motor buzzing followed by a man speaking as a metal door closes /home/tiger/nfs/data/audiocaps/test_16k/Ylh801oHGtD4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ylh801oHGtD4_mel.npy @@ -YPb6MqpdX5Jw audiocaps Clip-clops gallop as the wind blows and thunder cracks /home/tiger/nfs/data/audiocaps/test_16k/YPb6MqpdX5Jw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YPb6MqpdX5Jw_mel.npy @@ -Y9U8COLzEegs audiocaps Electronic beeping as a man talks and water pouring in the background 
/home/tiger/nfs/data/audiocaps/test_16k/Y9U8COLzEegs.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y9U8COLzEegs_mel.npy @@@@ -Ypaf0nyjg1Js audiocaps Food and oil sizzling as a woman is talking followed by dinner plates clanking /home/tiger/nfs/data/audiocaps/test_16k/Ypaf0nyjg1Js.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ypaf0nyjg1Js_mel.npy @@ -Ydxow2DcTrwk audiocaps Wind blowing followed by people speaking then a loud burst of thunder /home/tiger/nfs/data/audiocaps/test_16k/Ydxow2DcTrwk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ydxow2DcTrwk_mel.npy @@ -Ya0yXS7PmVR0 audiocaps A heavy rain dies down and begins again /home/tiger/nfs/data/audiocaps/test_16k/Ya0yXS7PmVR0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ya0yXS7PmVR0_mel.npy @@ -Y0a9wVat2PWk audiocaps A train sounds horn while traveling on train track /home/tiger/nfs/data/audiocaps/test_16k/Y0a9wVat2PWk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y0a9wVat2PWk_mel.npy @ -Ybgbnu5YKTDg audiocaps A man speaking over an intercom as a helicopter engine runs followed by several gunshots firing /home/tiger/nfs/data/audiocaps/test_16k/Ybgbnu5YKTDg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ybgbnu5YKTDg_mel.npy @@ -YCO6-i8NLbeo audiocaps A man talking followed by a goat baaing then a metal gate sliding while ducks quack and wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YCO6-i8NLbeo.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YCO6-i8NLbeo_mel.npy @@@@ -YpI_kPedctoo audiocaps Motorcycle engine running /home/tiger/nfs/data/audiocaps/test_16k/YpI_kPedctoo.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YpI_kPedctoo_mel.npy -YEYTz1LPDHsc audiocaps A vehicle door opening as a crow caws and birds chirp while vehicles drive by in the background /home/tiger/nfs/data/audiocaps/test_16k/YEYTz1LPDHsc.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YEYTz1LPDHsc_mel.npy @@@@ -YD9tinq3RMpU audiocaps An engine running and wind with various speech in the background /home/tiger/nfs/data/audiocaps/test_16k/YD9tinq3RMpU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YD9tinq3RMpU_mel.npy @@ -YEzWEO2WD_MM audiocaps A drone whirring followed by a crashing sound /home/tiger/nfs/data/audiocaps/test_16k/YEzWEO2WD_MM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YEzWEO2WD_MM_mel.npy @ -YtfOIhQpYYe8 audiocaps A man talking as a helicopter flies by /home/tiger/nfs/data/audiocaps/test_16k/YtfOIhQpYYe8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YtfOIhQpYYe8_mel.npy @ -Y_w2pA1VeB40 audiocaps A group of people laughing followed by farting /home/tiger/nfs/data/audiocaps/test_16k/Y_w2pA1VeB40.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y_w2pA1VeB40_mel.npy @ -YJnSwRonB9wI audiocaps Screaming, wind and an engine running, and laughing /home/tiger/nfs/data/audiocaps/test_16k/YJnSwRonB9wI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YJnSwRonB9wI_mel.npy @@@ -YRNBoH2LHQEM audiocaps A crowd applauds with a man speaking briefly in the middle /home/tiger/nfs/data/audiocaps/test_16k/YRNBoH2LHQEM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YRNBoH2LHQEM_mel.npy @ -YpaetCbEqp2w audiocaps A series of computer mouse clicks followed by a kid crying /home/tiger/nfs/data/audiocaps/test_16k/YpaetCbEqp2w.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YpaetCbEqp2w_mel.npy @ -YLs1zyPjs3k8 audiocaps A series of electronic beeps followed by static /home/tiger/nfs/data/audiocaps/test_16k/YLs1zyPjs3k8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YLs1zyPjs3k8_mel.npy @ -YK03ydb1uaoQ audiocaps Loud snoring repeating /home/tiger/nfs/data/audiocaps/test_16k/YK03ydb1uaoQ.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YK03ydb1uaoQ_mel.npy -Yific_gRalg0 audiocaps Water pouring down a drain with a series of metal clangs followed by a metal chain rattling /home/tiger/nfs/data/audiocaps/test_16k/Yific_gRalg0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yific_gRalg0_mel.npy @@ -YBlbGXalLNVU audiocaps A man talking as water splashes /home/tiger/nfs/data/audiocaps/test_16k/YBlbGXalLNVU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YBlbGXalLNVU_mel.npy @ -Y1nUOGZgSzZo audiocaps Wind blowing and water splashing /home/tiger/nfs/data/audiocaps/test_16k/Y1nUOGZgSzZo.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y1nUOGZgSzZo_mel.npy @ -Yc0V_HAul7rI audiocaps A group of people laughing followed by a man talking /home/tiger/nfs/data/audiocaps/test_16k/Yc0V_HAul7rI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yc0V_HAul7rI_mel.npy @ -YPtW0cZVprJQ audiocaps A person snoring followed by a man talking /home/tiger/nfs/data/audiocaps/test_16k/YPtW0cZVprJQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YPtW0cZVprJQ_mel.npy @ -YAbplcXwXnvE audiocaps Girl speaks and crunches plastic wrapping /home/tiger/nfs/data/audiocaps/test_16k/YAbplcXwXnvE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YAbplcXwXnvE_mel.npy @ -Yd1tL-9BILy8 audiocaps Pigeons cooing as air lightly hisses in the background followed by a camera muffling /home/tiger/nfs/data/audiocaps/test_16k/Yd1tL-9BILy8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yd1tL-9BILy8_mel.npy @@ -YonBZOH88OYs audiocaps A series of compressed air spraying as a motor hums in the background /home/tiger/nfs/data/audiocaps/test_16k/YonBZOH88OYs.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YonBZOH88OYs_mel.npy @ -Y3wV3ST-c4PE audiocaps Low ticktock sounds followed by objects moving 
/home/tiger/nfs/data/audiocaps/test_16k/Y3wV3ST-c4PE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y3wV3ST-c4PE_mel.npy @ -Yy93cZqNCtks audiocaps Gunshots fire, an adult male speaks, footfalls and clicking occur as other adult males speak, gunshots fire again, an adult male speaks, and a dog growls /home/tiger/nfs/data/audiocaps/test_16k/Yy93cZqNCtks.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yy93cZqNCtks_mel.npy @@@@@@@ -YL2dyilgQ8iM audiocaps Footsteps shuffling on snow alongside a camera muffling while wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YL2dyilgQ8iM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YL2dyilgQ8iM_mel.npy @@ -YWWkhzcmx3VE audiocaps Duck quacking repeatedly /home/tiger/nfs/data/audiocaps/test_16k/YWWkhzcmx3VE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YWWkhzcmx3VE_mel.npy -Yu9px4Lwv9XI audiocaps Tribal drums playing as footsteps shuffle on wet dirt as frogs and crickets chirp in the background /home/tiger/nfs/data/audiocaps/test_16k/Yu9px4Lwv9XI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yu9px4Lwv9XI_mel.npy @@ -Yhrv6fwnmBkY audiocaps A rooster clucking followed by a dog whimpering then a man talking and a dog barking /home/tiger/nfs/data/audiocaps/test_16k/Yhrv6fwnmBkY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yhrv6fwnmBkY_mel.npy @@@ -YzEaGx6an4es audiocaps A power tool drill operating continuously /home/tiger/nfs/data/audiocaps/test_16k/YzEaGx6an4es.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YzEaGx6an4es_mel.npy -YE6FH_xp3I54 audiocaps A man speaking as birds are chirping /home/tiger/nfs/data/audiocaps/test_16k/YE6FH_xp3I54.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YE6FH_xp3I54_mel.npy @ -YDt53UZgyznE audiocaps Pretend to scream and crying is occurring, and an adult male begins to speak 
/home/tiger/nfs/data/audiocaps/test_16k/YDt53UZgyznE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YDt53UZgyznE_mel.npy @@ -YOMGHnJV0l2U audiocaps Metal scrapping against a wooden surface followed by sand scrapping then more metal scrapping against wood /home/tiger/nfs/data/audiocaps/test_16k/YOMGHnJV0l2U.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YOMGHnJV0l2U_mel.npy @@ -Y-NsC63dA01g audiocaps A cat meows and a woman speaks /home/tiger/nfs/data/audiocaps/test_16k/Y-NsC63dA01g.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y-NsC63dA01g_mel.npy @ -YfGGYeXR_LS8 audiocaps Whistling as a man speaks /home/tiger/nfs/data/audiocaps/test_16k/YfGGYeXR_LS8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YfGGYeXR_LS8_mel.npy @ -Y_ezm-TpKj1w audiocaps A mid-size motor vehicle engine is revving repeatedly, while people talk in the background /home/tiger/nfs/data/audiocaps/test_16k/Y_ezm-TpKj1w.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y_ezm-TpKj1w_mel.npy @ -YhJtOGmN_KVw audiocaps A man is speaking as paper is crumpling /home/tiger/nfs/data/audiocaps/test_16k/YhJtOGmN_KVw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YhJtOGmN_KVw_mel.npy @ -YUmNrhFKpWIY audiocaps A vehicle engine revving then powering down /home/tiger/nfs/data/audiocaps/test_16k/YUmNrhFKpWIY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YUmNrhFKpWIY_mel.npy @ -YxBZnvfniA1c audiocaps A man is speaking followed by a child speaking and then laughter /home/tiger/nfs/data/audiocaps/test_16k/YxBZnvfniA1c.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YxBZnvfniA1c_mel.npy @@ -YUCy1BEx8jBE audiocaps A man speaking as a stream of water splashes and flows while music faintly plays in the distance /home/tiger/nfs/data/audiocaps/test_16k/YUCy1BEx8jBE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YUCy1BEx8jBE_mel.npy 
@@ -YF7QtqKtllK0 audiocaps Continuous snoring of a person /home/tiger/nfs/data/audiocaps/test_16k/YF7QtqKtllK0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YF7QtqKtllK0_mel.npy -YalaxBd_EEUc audiocaps A man talking followed by a series of belches /home/tiger/nfs/data/audiocaps/test_16k/YalaxBd_EEUc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YalaxBd_EEUc_mel.npy @ -YxYwpABpZed4 audiocaps A woman speaks as she fries food /home/tiger/nfs/data/audiocaps/test_16k/YxYwpABpZed4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YxYwpABpZed4_mel.npy @ -YlfO471Rn61k audiocaps Spray and a high pitch tone /home/tiger/nfs/data/audiocaps/test_16k/YlfO471Rn61k.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YlfO471Rn61k_mel.npy @ -YHZ9O6sc7cLA audiocaps A woman speaks and continues to do so as a dog starts barking /home/tiger/nfs/data/audiocaps/test_16k/YHZ9O6sc7cLA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YHZ9O6sc7cLA_mel.npy @ -Y41D0yXSBqfI audiocaps A bird is cooing and flapping its wings /home/tiger/nfs/data/audiocaps/test_16k/Y41D0yXSBqfI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y41D0yXSBqfI_mel.npy @ -YJ0yeFeKvIt8 audiocaps Continuous white noise, rustling and wind /home/tiger/nfs/data/audiocaps/test_16k/YJ0yeFeKvIt8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YJ0yeFeKvIt8_mel.npy @@ -YKvrcRMfFzOE audiocaps An engine running and helicopter propellers spinning /home/tiger/nfs/data/audiocaps/test_16k/YKvrcRMfFzOE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YKvrcRMfFzOE_mel.npy @ -Y7cHRSfbp7tc audiocaps People are talking along with knock sounds /home/tiger/nfs/data/audiocaps/test_16k/Y7cHRSfbp7tc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y7cHRSfbp7tc_mel.npy @ -YNeWW30WZjPc audiocaps A dog barking and growling while plastic rattles and clanks against a hard 
surface /home/tiger/nfs/data/audiocaps/test_16k/YNeWW30WZjPc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YNeWW30WZjPc_mel.npy @ -YtjCNwdOUiGc audiocaps Humming of an engine followed by some honks of a horn /home/tiger/nfs/data/audiocaps/test_16k/YtjCNwdOUiGc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YtjCNwdOUiGc_mel.npy @ -YOpiWMltpj44 audiocaps Ducks quacking as roosters crow and chickens cluck while water trickles /home/tiger/nfs/data/audiocaps/test_16k/YOpiWMltpj44.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YOpiWMltpj44_mel.npy @@@ -YdYvL6uEMl6E audiocaps A helicopter flying followed by wind heavily blowing into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YdYvL6uEMl6E.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YdYvL6uEMl6E_mel.npy @ -YgbtcDoh0q3c audiocaps A scratching of surface sound followed by men talking and snickering /home/tiger/nfs/data/audiocaps/test_16k/YgbtcDoh0q3c.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YgbtcDoh0q3c_mel.npy @ -YjjHIINDfE1c audiocaps Humming from an engine followed by loud honks of a horn /home/tiger/nfs/data/audiocaps/test_16k/YjjHIINDfE1c.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YjjHIINDfE1c_mel.npy @ -YsqsI2UyrcBQ audiocaps A car engine revs producing a room and a whine /home/tiger/nfs/data/audiocaps/test_16k/YsqsI2UyrcBQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YsqsI2UyrcBQ_mel.npy @@ -YjOYvIISk--4 audiocaps A man speaks as water flows from a faucet in quick bursts /home/tiger/nfs/data/audiocaps/test_16k/YjOYvIISk--4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YjOYvIISk--4_mel.npy @ -Y3MoF8myFs8Y audiocaps Ocean waves crashing as a man talks in the distance and wind heavily blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/Y3MoF8myFs8Y.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y3MoF8myFs8Y_mel.npy @@ -YPMMkPq5jJXY audiocaps Burping and then laughing with continuous burping /home/tiger/nfs/data/audiocaps/test_16k/YPMMkPq5jJXY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YPMMkPq5jJXY_mel.npy @@ -YtfOIhQpYYe8 audiocaps An aircraft motor is running and whirring is present, helicopter rotors slap rhythmically, and an adult male speaks in the background /home/tiger/nfs/data/audiocaps/test_16k/YtfOIhQpYYe8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YtfOIhQpYYe8_mel.npy @@@ -YAtkD-3GjXMw audiocaps Music is playing with machine gun sounds /home/tiger/nfs/data/audiocaps/test_16k/YAtkD-3GjXMw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YAtkD-3GjXMw_mel.npy @ -Y6Nvu6EcpdE8 audiocaps The wind is blowing, an adult male speaks via an electronic device, and a click occurs /home/tiger/nfs/data/audiocaps/test_16k/Y6Nvu6EcpdE8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y6Nvu6EcpdE8_mel.npy @@ -YzoxFl3pddMg audiocaps Nature sounds with a frog croaking /home/tiger/nfs/data/audiocaps/test_16k/YzoxFl3pddMg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YzoxFl3pddMg_mel.npy @ -YVQnmlf2OsUg audiocaps Helicopter blades spinning /home/tiger/nfs/data/audiocaps/test_16k/YVQnmlf2OsUg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YVQnmlf2OsUg_mel.npy -YjjHIINDfE1c audiocaps A tractor engine driving by followed by a car horn honking and wind blowing on a microphone /home/tiger/nfs/data/audiocaps/test_16k/YjjHIINDfE1c.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YjjHIINDfE1c_mel.npy @@ -YB4SZwi9Ce3o audiocaps A man talks over a clicking sound and a car engine switches gears and speeds up /home/tiger/nfs/data/audiocaps/test_16k/YB4SZwi9Ce3o.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YB4SZwi9Ce3o_mel.npy @@@ 
-Y9ucb5HYO8ps audiocaps A girl burping then laughing followed by a group of girls laughing and talking /home/tiger/nfs/data/audiocaps/test_16k/Y9ucb5HYO8ps.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y9ucb5HYO8ps_mel.npy @@ -YWWkhzcmx3VE audiocaps A duck quacking /home/tiger/nfs/data/audiocaps/test_16k/YWWkhzcmx3VE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YWWkhzcmx3VE_mel.npy -YNJEPbGVBJIQ audiocaps Traffic hums and beeps with revving engines and a man speaking nearby /home/tiger/nfs/data/audiocaps/test_16k/YNJEPbGVBJIQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YNJEPbGVBJIQ_mel.npy @@@ -YZYWCwfCkBp4 audiocaps A person is sawing wood and music is playing in the background /home/tiger/nfs/data/audiocaps/test_16k/YZYWCwfCkBp4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YZYWCwfCkBp4_mel.npy @ -Y9XqkKuTqEOM audiocaps Some scratching and rustling with small clicks /home/tiger/nfs/data/audiocaps/test_16k/Y9XqkKuTqEOM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y9XqkKuTqEOM_mel.npy @@ -YUhCzD6EBJBU audiocaps A power tool vibrating quick followed by a man speaking and some bangs /home/tiger/nfs/data/audiocaps/test_16k/YUhCzD6EBJBU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YUhCzD6EBJBU_mel.npy @@ -YTQr9v-PQOc4 audiocaps Some clicking followed by a sneeze and a man laughing /home/tiger/nfs/data/audiocaps/test_16k/YTQr9v-PQOc4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YTQr9v-PQOc4_mel.npy @@ -YxBZnvfniA1c audiocaps A man speaks with others speaking in the distance followed by a girl speaking and others laughing /home/tiger/nfs/data/audiocaps/test_16k/YxBZnvfniA1c.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YxBZnvfniA1c_mel.npy @@@ -Y0NGSrwioYjA audiocaps There is a mature male talking to some animals /home/tiger/nfs/data/audiocaps/test_16k/Y0NGSrwioYjA.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y0NGSrwioYjA_mel.npy @ -Y6Pywt0f_NFY audiocaps Water running continuously /home/tiger/nfs/data/audiocaps/test_16k/Y6Pywt0f_NFY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y6Pywt0f_NFY_mel.npy -Yh5_1pnkl_SY audiocaps Water trickles, splashes and gurgles, slow at first and then faster, and an adult male is speaking /home/tiger/nfs/data/audiocaps/test_16k/Yh5_1pnkl_SY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yh5_1pnkl_SY_mel.npy @@ -Yfx4r_KuW6No audiocaps A woman talking back and forth with a child who is crying /home/tiger/nfs/data/audiocaps/test_16k/Yfx4r_KuW6No.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yfx4r_KuW6No_mel.npy @@@@@ -Y9F3sutgYTvo audiocaps A man yelling followed by an infant crying then a woman shouting as a crowd of people talk and laugh /home/tiger/nfs/data/audiocaps/test_16k/Y9F3sutgYTvo.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y9F3sutgYTvo_mel.npy @@@@ -YjOYvIISk--4 audiocaps A man speaking as a faucet pours water several times while water drains down a pipe /home/tiger/nfs/data/audiocaps/test_16k/YjOYvIISk--4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YjOYvIISk--4_mel.npy @@@ -Y7P6lcyeDKNI audiocaps Dirt shuffling followed by gears cranking and a branch snapping then a man talking /home/tiger/nfs/data/audiocaps/test_16k/Y7P6lcyeDKNI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y7P6lcyeDKNI_mel.npy @@@ -YQHfyKaOHSz4 audiocaps Fly buzzing followed by frog swallowing it and then a croak /home/tiger/nfs/data/audiocaps/test_16k/YQHfyKaOHSz4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YQHfyKaOHSz4_mel.npy @@ -YSE_3nszEw7o audiocaps Hissing together with an engine chugging /home/tiger/nfs/data/audiocaps/test_16k/YSE_3nszEw7o.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YSE_3nszEw7o_mel.npy @ -Y_iUX8CibElk audiocaps Sustained industrial engine noise /home/tiger/nfs/data/audiocaps/test_16k/Y_iUX8CibElk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y_iUX8CibElk_mel.npy -YPLHXGDnig4M audiocaps A person speaks and makes meow sounds /home/tiger/nfs/data/audiocaps/test_16k/YPLHXGDnig4M.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YPLHXGDnig4M_mel.npy @ -YyVVLq4ao1Ck audiocaps Several birds tweeting loudly followed by insects chirping /home/tiger/nfs/data/audiocaps/test_16k/YyVVLq4ao1Ck.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YyVVLq4ao1Ck_mel.npy @ -Y2JutOgAnqWA audiocaps Humming and vibrating of a power tool with some high frequency squealing /home/tiger/nfs/data/audiocaps/test_16k/Y2JutOgAnqWA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y2JutOgAnqWA_mel.npy @ -YbX2vDaHL26U audiocaps Loud wind noise followed by a car accelerating fast /home/tiger/nfs/data/audiocaps/test_16k/YbX2vDaHL26U.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YbX2vDaHL26U_mel.npy @ -YXf5LjaE_JQ0 audiocaps A man speaks with distant traffic passing and some nearby rattling /home/tiger/nfs/data/audiocaps/test_16k/YXf5LjaE_JQ0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YXf5LjaE_JQ0_mel.npy @@ -Y0NGSrwioYjA audiocaps A man talking followed by a goat baaing as wind lightly blows into a microphone followed by a crow cawing in the distance /home/tiger/nfs/data/audiocaps/test_16k/Y0NGSrwioYjA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y0NGSrwioYjA_mel.npy @@@ -YCh0LMmhBUg4 audiocaps A man talking as a kid yells followed by an aircraft flying in the distance as wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YCh0LMmhBUg4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YCh0LMmhBUg4_mel.npy @@@ 
-Yl5KdHAWwJCw audiocaps A clock ticks with breathing in the background /home/tiger/nfs/data/audiocaps/test_16k/Yl5KdHAWwJCw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yl5KdHAWwJCw_mel.npy @ -YVE6Ku0-ucUM audiocaps A man speaks followed by popping noise and laughter /home/tiger/nfs/data/audiocaps/test_16k/YVE6Ku0-ucUM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YVE6Ku0-ucUM_mel.npy @@ -YYIqpIjjee00 audiocaps Water running from a flushed toilet /home/tiger/nfs/data/audiocaps/test_16k/YYIqpIjjee00.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YYIqpIjjee00_mel.npy @@ -YItS07xtdi4s audiocaps Fire igniting followed by an electronic beep then footsteps running on concrete as vehicle engines run idle and horns honk in the background /home/tiger/nfs/data/audiocaps/test_16k/YItS07xtdi4s.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YItS07xtdi4s_mel.npy @@@@ -YHeEa1GZpUGI audiocaps Several gunshots with a click and glass breaking /home/tiger/nfs/data/audiocaps/test_16k/YHeEa1GZpUGI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YHeEa1GZpUGI_mel.npy @@ -YjjfUaMQaG1A audiocaps A man speaks followed by vibrations of a power tool /home/tiger/nfs/data/audiocaps/test_16k/YjjfUaMQaG1A.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YjjfUaMQaG1A_mel.npy @ -Y79XDcI6xZm0 audiocaps A man is giving a speech while the crowd is chanting and clapping in the background /home/tiger/nfs/data/audiocaps/test_16k/Y79XDcI6xZm0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y79XDcI6xZm0_mel.npy @@ -Y52IxrdTxGs4 audiocaps A large explosion and a heartbeat, a person speaks /home/tiger/nfs/data/audiocaps/test_16k/Y52IxrdTxGs4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y52IxrdTxGs4_mel.npy @@ -YlJayhiVzl_E audiocaps A motorboat engine running as wind blows into a microphone 
/home/tiger/nfs/data/audiocaps/test_16k/YlJayhiVzl_E.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YlJayhiVzl_E_mel.npy @ -YLs2vrr9TamU audiocaps Humming from a motor with loud dry cracking /home/tiger/nfs/data/audiocaps/test_16k/YLs2vrr9TamU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YLs2vrr9TamU_mel.npy @ -Y4SZ7JXDCNps audiocaps An engine booms and hums with constant rattling /home/tiger/nfs/data/audiocaps/test_16k/Y4SZ7JXDCNps.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y4SZ7JXDCNps_mel.npy @@ -YzIgGMlZENTs audiocaps A duck quacks followed by a man talking while birds chirp in the distance /home/tiger/nfs/data/audiocaps/test_16k/YzIgGMlZENTs.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YzIgGMlZENTs_mel.npy @@@ -Y1GgEpRZDWN0 audiocaps A woman and a man talking as another man talks softly and papers shuffle in the background /home/tiger/nfs/data/audiocaps/test_16k/Y1GgEpRZDWN0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y1GgEpRZDWN0_mel.npy @@ -YqF72bT878gw audiocaps A speedboat running as wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YqF72bT878gw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YqF72bT878gw_mel.npy @ -YVeCSHwtkBZU audiocaps An emergency vehicle has the siren on /home/tiger/nfs/data/audiocaps/test_16k/YVeCSHwtkBZU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YVeCSHwtkBZU_mel.npy -YZ1Cyj4N05lk audiocaps A person whistling then a man speaking with plastic tapping /home/tiger/nfs/data/audiocaps/test_16k/YZ1Cyj4N05lk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YZ1Cyj4N05lk_mel.npy @@ -YXplKBvZaHXA audiocaps A man talking as a motorbike engine runs and accelerates /home/tiger/nfs/data/audiocaps/test_16k/YXplKBvZaHXA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YXplKBvZaHXA_mel.npy @@ -Yhzn_wGlzGpU audiocaps A vehicle 
engine running smoothly /home/tiger/nfs/data/audiocaps/test_16k/Yhzn_wGlzGpU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yhzn_wGlzGpU_mel.npy -Y0yxEvdnimGg audiocaps A dog barking as a man is talking while birds chirp and wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/Y0yxEvdnimGg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y0yxEvdnimGg_mel.npy @@@ -YLKhokVsJhN0 audiocaps A herd of sheep baaing /home/tiger/nfs/data/audiocaps/test_16k/YLKhokVsJhN0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YLKhokVsJhN0_mel.npy -YRp4Ct_TQvAM audiocaps Rain falling as a motor engine runs idle and a man talks /home/tiger/nfs/data/audiocaps/test_16k/YRp4Ct_TQvAM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YRp4Ct_TQvAM_mel.npy @@ -Y9BGLAUSF0sk audiocaps An engine running /home/tiger/nfs/data/audiocaps/test_16k/Y9BGLAUSF0sk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y9BGLAUSF0sk_mel.npy -Y_duNX6Vyd6g audiocaps A speedboat is racing across water with loud wind noise /home/tiger/nfs/data/audiocaps/test_16k/Y_duNX6Vyd6g.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y_duNX6Vyd6g_mel.npy @ -YFA11v4SmdBc audiocaps Male speech and then whistling /home/tiger/nfs/data/audiocaps/test_16k/YFA11v4SmdBc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YFA11v4SmdBc_mel.npy @ -Y7QN3lwOzfdg audiocaps A man speaking through a telephone speaker as another man is talking /home/tiger/nfs/data/audiocaps/test_16k/Y7QN3lwOzfdg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y7QN3lwOzfdg_mel.npy @ -YOUUckswAaNI audiocaps A short hammering sound followed by two men speaking /home/tiger/nfs/data/audiocaps/test_16k/YOUUckswAaNI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YOUUckswAaNI_mel.npy @ -YAgaiowyYt88 audiocaps A loud and forceful bang 
/home/tiger/nfs/data/audiocaps/test_16k/YAgaiowyYt88.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YAgaiowyYt88_mel.npy -YTdl9SmBbRnA audiocaps Speaking and an engine running /home/tiger/nfs/data/audiocaps/test_16k/YTdl9SmBbRnA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YTdl9SmBbRnA_mel.npy @ -Y6OlHuvJR_Dk audiocaps A helicopter engine working /home/tiger/nfs/data/audiocaps/test_16k/Y6OlHuvJR_Dk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y6OlHuvJR_Dk_mel.npy -YbygBWUkpaC8 audiocaps A male speech and wind and then birds chirping /home/tiger/nfs/data/audiocaps/test_16k/YbygBWUkpaC8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YbygBWUkpaC8_mel.npy @@ -YFfUqv0Vv3ME audiocaps A man speaking followed by a woman talking then plastic clacking as footsteps walk on grass and a rooster crows in the distance /home/tiger/nfs/data/audiocaps/test_16k/YFfUqv0Vv3ME.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YFfUqv0Vv3ME_mel.npy @@@@ -YWWkhzcmx3VE audiocaps A duck quacking /home/tiger/nfs/data/audiocaps/test_16k/YWWkhzcmx3VE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YWWkhzcmx3VE_mel.npy -YqZEIs6tS5vk audiocaps A crowd of people talking followed by a vehicle engine revving then tires skidding /home/tiger/nfs/data/audiocaps/test_16k/YqZEIs6tS5vk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YqZEIs6tS5vk_mel.npy @@ -Ypgq2KPX5_SA audiocaps A paper is being crumpled /home/tiger/nfs/data/audiocaps/test_16k/Ypgq2KPX5_SA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ypgq2KPX5_SA_mel.npy -YH7rd9bZtbgc audiocaps Church bells ringing /home/tiger/nfs/data/audiocaps/test_16k/YH7rd9bZtbgc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YH7rd9bZtbgc_mel.npy -YO90Qy2xG6oA audiocaps A domestic pet is making noises and a baby cries 
/home/tiger/nfs/data/audiocaps/test_16k/YO90Qy2xG6oA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YO90Qy2xG6oA_mel.npy @ -YGuizRlAQ8qQ audiocaps Humming and vibrating from a power tool /home/tiger/nfs/data/audiocaps/test_16k/YGuizRlAQ8qQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YGuizRlAQ8qQ_mel.npy -YoOMtaqvQ3_M audiocaps A helicopter flying as wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YoOMtaqvQ3_M.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YoOMtaqvQ3_M_mel.npy @ -Y2ymiXjImwGs audiocaps A crowd murmurs as a siren blares and then stops at a distance /home/tiger/nfs/data/audiocaps/test_16k/Y2ymiXjImwGs.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y2ymiXjImwGs_mel.npy @@ -Y7fmOlUlwoNg audiocaps Vibrations and rattling with people speaking in the distance /home/tiger/nfs/data/audiocaps/test_16k/Y7fmOlUlwoNg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y7fmOlUlwoNg_mel.npy @ -YWmDe2xbnSY4 audiocaps Several bursts and explosions with grunting and growling /home/tiger/nfs/data/audiocaps/test_16k/YWmDe2xbnSY4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YWmDe2xbnSY4_mel.npy @ -Y-DmjkgWa-rw audiocaps A bell is ringing /home/tiger/nfs/data/audiocaps/test_16k/Y-DmjkgWa-rw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y-DmjkgWa-rw_mel.npy @ -YKtinboYbmHQ audiocaps A vehicle driving by while revving as tires skid and squeal /home/tiger/nfs/data/audiocaps/test_16k/YKtinboYbmHQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YKtinboYbmHQ_mel.npy @@ -YrUq4w4EUSWA audiocaps Loud buzzing followed by rustling and a toilet flushing /home/tiger/nfs/data/audiocaps/test_16k/YrUq4w4EUSWA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YrUq4w4EUSWA_mel.npy @@ -Y0rSETXszQM0 audiocaps Motorcycle starting then driving away 
/home/tiger/nfs/data/audiocaps/test_16k/Y0rSETXszQM0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y0rSETXszQM0_mel.npy @ -YfwhkCnOeyC0 audiocaps Applause and speech followed by a loud high pitched bell and more applause and speech /home/tiger/nfs/data/audiocaps/test_16k/YfwhkCnOeyC0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YfwhkCnOeyC0_mel.npy @@@@ -YEbpOXac13yo audiocaps Vehicles driving by as a muffled engine runs while a man speaks then another man speaking in the distance /home/tiger/nfs/data/audiocaps/test_16k/YEbpOXac13yo.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YEbpOXac13yo_mel.npy @@@ -Y_YS5uKWoB6g audiocaps A kid crying as a man and a woman talk followed by a car door opening then closing /home/tiger/nfs/data/audiocaps/test_16k/Y_YS5uKWoB6g.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y_YS5uKWoB6g_mel.npy @@@ -YyhDw7PZje3g audiocaps Two men speaking with loud insects buzzing /home/tiger/nfs/data/audiocaps/test_16k/YyhDw7PZje3g.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YyhDw7PZje3g_mel.npy @ -Ydxow2DcTrwk audiocaps Rain falls and a man speaks with distant thunder /home/tiger/nfs/data/audiocaps/test_16k/Ydxow2DcTrwk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ydxow2DcTrwk_mel.npy @@ -Yjlwe9jtu5Gw audiocaps A person whistling /home/tiger/nfs/data/audiocaps/test_16k/Yjlwe9jtu5Gw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yjlwe9jtu5Gw_mel.npy -Y4KObP7cREWw audiocaps A car engine clicks and whines as it tries to start /home/tiger/nfs/data/audiocaps/test_16k/Y4KObP7cREWw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y4KObP7cREWw_mel.npy @@ -Y35b9BSmN5JM audiocaps Loud vibrating followed by revving /home/tiger/nfs/data/audiocaps/test_16k/Y35b9BSmN5JM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y35b9BSmN5JM_mel.npy @ -YSGaIvgwwWSE 
audiocaps Rain falling and thunder roaring in the distance /home/tiger/nfs/data/audiocaps/test_16k/YSGaIvgwwWSE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YSGaIvgwwWSE_mel.npy @ -Y2j8pxiFvElM audiocaps A cat meowing twice /home/tiger/nfs/data/audiocaps/test_16k/Y2j8pxiFvElM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y2j8pxiFvElM_mel.npy -Y11SEBDuoqSk audiocaps An aircraft engine flying before becoming louder while several rapid gunshots fire /home/tiger/nfs/data/audiocaps/test_16k/Y11SEBDuoqSk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y11SEBDuoqSk_mel.npy @@ -Yjid4t-FzUn0 audiocaps A man speaking and laughing followed by a goat bleat /home/tiger/nfs/data/audiocaps/test_16k/Yjid4t-FzUn0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yjid4t-FzUn0_mel.npy @@ -Y4Ujigme2IxY audiocaps A motor vehicle is running and vibrating, and a high-pitched squeal occurs /home/tiger/nfs/data/audiocaps/test_16k/Y4Ujigme2IxY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y4Ujigme2IxY_mel.npy @ -Ybpv_LneHmfU audiocaps Humming of a nearby jet engine /home/tiger/nfs/data/audiocaps/test_16k/Ybpv_LneHmfU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ybpv_LneHmfU_mel.npy -Y0Rpjl1AO-P0 audiocaps A car engine is revving while driving /home/tiger/nfs/data/audiocaps/test_16k/Y0Rpjl1AO-P0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y0Rpjl1AO-P0_mel.npy -YFKaJsvcyHTk audiocaps An infant crying /home/tiger/nfs/data/audiocaps/test_16k/YFKaJsvcyHTk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YFKaJsvcyHTk_mel.npy -YJon_DEFqsfM audiocaps Ducks quacking as birds chirp followed by a flock of ducks quacking /home/tiger/nfs/data/audiocaps/test_16k/YJon_DEFqsfM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YJon_DEFqsfM_mel.npy @@ -YxUWSHYoslPQ audiocaps A man speaks with a high frequency hum 
with some banging and clanking /home/tiger/nfs/data/audiocaps/test_16k/YxUWSHYoslPQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YxUWSHYoslPQ_mel.npy @ -Yg5l3Bz6lWnc audiocaps Wood lightly shuffling as insects buzz while birds chirp in the background /home/tiger/nfs/data/audiocaps/test_16k/Yg5l3Bz6lWnc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yg5l3Bz6lWnc_mel.npy @@ -YXIooZl1QdM4 audiocaps Several loud burps /home/tiger/nfs/data/audiocaps/test_16k/YXIooZl1QdM4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YXIooZl1QdM4_mel.npy -YxYwpABpZed4 audiocaps Metal clacking in a pan as a woman talks while food and oil sizzle /home/tiger/nfs/data/audiocaps/test_16k/YxYwpABpZed4.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YxYwpABpZed4_mel.npy @@ -Yu8bQf0SnCVI audiocaps Tapping followed by water spraying and more tapping /home/tiger/nfs/data/audiocaps/test_16k/Yu8bQf0SnCVI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yu8bQf0SnCVI_mel.npy @@ -YbUTOsLXYyxg audiocaps A man talking followed by another man speaking then a group of people laughing and a man speaking a bit in the background /home/tiger/nfs/data/audiocaps/test_16k/YbUTOsLXYyxg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YbUTOsLXYyxg_mel.npy @@@ -Y6eX6bJOFftA audiocaps A crowd of people talking as ducks quack and a motorboat speeds by in the distance /home/tiger/nfs/data/audiocaps/test_16k/Y6eX6bJOFftA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y6eX6bJOFftA_mel.npy @@ -Y2ErfX6ZT5pM audiocaps Some child speaking in the distant and a toilet flushing /home/tiger/nfs/data/audiocaps/test_16k/Y2ErfX6ZT5pM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y2ErfX6ZT5pM_mel.npy @ -YFDwK7T1JO_0 audiocaps Two men speaking followed by plastic clacking then a power tool drilling /home/tiger/nfs/data/audiocaps/test_16k/YFDwK7T1JO_0.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YFDwK7T1JO_0_mel.npy @@ -Y40cuHrYfaqA audiocaps Dogs barking and growling followed by a man talking /home/tiger/nfs/data/audiocaps/test_16k/Y40cuHrYfaqA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y40cuHrYfaqA_mel.npy @ -YXrJcmftCY04 audiocaps A crowd of people applauding and cheering /home/tiger/nfs/data/audiocaps/test_16k/YXrJcmftCY04.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YXrJcmftCY04_mel.npy @ -Ya_Rjlu50TfA audiocaps A person snoring during a series of thumps followed by a man talking in the background /home/tiger/nfs/data/audiocaps/test_16k/Ya_Rjlu50TfA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ya_Rjlu50TfA_mel.npy @@ -Y1DKLyH3FixM audiocaps Chirping birds near and far /home/tiger/nfs/data/audiocaps/test_16k/Y1DKLyH3FixM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y1DKLyH3FixM_mel.npy @ -Y6NBPiArs2-w audiocaps A series of rapid gunshots firing alongside footsteps running on concrete as a man groans while a muffled heart beats in the background /home/tiger/nfs/data/audiocaps/test_16k/Y6NBPiArs2-w.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y6NBPiArs2-w_mel.npy @@@ -YuJzAf4PaExI audiocaps A muffled aircraft engine operating as a group of people talk in the background /home/tiger/nfs/data/audiocaps/test_16k/YuJzAf4PaExI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YuJzAf4PaExI_mel.npy @ -Y41D0yXSBqfI audiocaps A group of pigeons cooing as bird wings flap in the background followed by plastic tapping /home/tiger/nfs/data/audiocaps/test_16k/Y41D0yXSBqfI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y41D0yXSBqfI_mel.npy @@ -YJQz40TkjymY audiocaps Typing on a computer keyboard /home/tiger/nfs/data/audiocaps/test_16k/YJQz40TkjymY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YJQz40TkjymY_mel.npy 
-YyVVLq4ao1Ck audiocaps Insects trill in the background, while birds chirp and flies buzz /home/tiger/nfs/data/audiocaps/test_16k/YyVVLq4ao1Ck.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YyVVLq4ao1Ck_mel.npy @@ -Ykdflh3akyH8 audiocaps Small dogs yip and whimper /home/tiger/nfs/data/audiocaps/test_16k/Ykdflh3akyH8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ykdflh3akyH8_mel.npy -YlmPMhs-9IYE audiocaps A vehicle engine revving several times as a man speaks over an intercom along with a crowd of people talking and whistling /home/tiger/nfs/data/audiocaps/test_16k/YlmPMhs-9IYE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YlmPMhs-9IYE_mel.npy @@@ -YhGWarNR6xmg audiocaps Hisses continuously with some static /home/tiger/nfs/data/audiocaps/test_16k/YhGWarNR6xmg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YhGWarNR6xmg_mel.npy @ -YZsf2YvJfCKw audiocaps A toilet is flushed with a loud hum and gurgling water /home/tiger/nfs/data/audiocaps/test_16k/YZsf2YvJfCKw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YZsf2YvJfCKw_mel.npy @@ -Ynq0BF9zGkzg audiocaps A low slow groan followed by a crash and men speaking with distant birds /home/tiger/nfs/data/audiocaps/test_16k/Ynq0BF9zGkzg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Ynq0BF9zGkzg_mel.npy @@@ -Yvigslb0kClE audiocaps People talking while herding goats near a fast running stream /home/tiger/nfs/data/audiocaps/test_16k/Yvigslb0kClE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yvigslb0kClE_mel.npy @@ -Y2t82STv2GR8 audiocaps A large bell rings out multiple times /home/tiger/nfs/data/audiocaps/test_16k/Y2t82STv2GR8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y2t82STv2GR8_mel.npy -YzoctgurhvHE audiocaps A man speaking as plastic is clanking followed by a door hatch opening and plastic tumbling with a vehicle engine revving in the background 
/home/tiger/nfs/data/audiocaps/test_16k/YzoctgurhvHE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YzoctgurhvHE_mel.npy @@@@ -YRdC8cviN6Bs audiocaps Rain is splashing on a surface while rustling occurs and a car door shuts, and traffic is discernible in the distance /home/tiger/nfs/data/audiocaps/test_16k/YRdC8cviN6Bs.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YRdC8cviN6Bs_mel.npy @@@ -YB8rdur4aams audiocaps A vehicle engine gurgling followed by a horn tooting as wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YB8rdur4aams.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YB8rdur4aams_mel.npy @@ -Y3wrdPAeqjVI audiocaps A man speaks with some high pitched ringing and some rustling /home/tiger/nfs/data/audiocaps/test_16k/Y3wrdPAeqjVI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y3wrdPAeqjVI_mel.npy @@ -Y6cS0FsUM-cQ audiocaps A cat meowing followed by people speaking /home/tiger/nfs/data/audiocaps/test_16k/Y6cS0FsUM-cQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y6cS0FsUM-cQ_mel.npy @ -YFfUqv0Vv3ME audiocaps A male speaking and rustling /home/tiger/nfs/data/audiocaps/test_16k/YFfUqv0Vv3ME.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YFfUqv0Vv3ME_mel.npy @ -YGOD8Bt5LfDE audiocaps A quiet machine running and a child speaking and then an adult speaks and the child laughs /home/tiger/nfs/data/audiocaps/test_16k/YGOD8Bt5LfDE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YGOD8Bt5LfDE_mel.npy @@@ -Y59VP93Tzjmg audiocaps Train blowing horn then approaching track sounds /home/tiger/nfs/data/audiocaps/test_16k/Y59VP93Tzjmg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y59VP93Tzjmg_mel.npy @ -Y3VHpLxtd498 audiocaps Graveling shuffling followed by a young kid talking as pigeons are cooing and a motor hums in the background 
/home/tiger/nfs/data/audiocaps/test_16k/Y3VHpLxtd498.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y3VHpLxtd498_mel.npy @@@ -YRtenf2XSXRc audiocaps A mid-size motor vehicle engine idles smoothly and is then revved several times, followed by a car door shutting /home/tiger/nfs/data/audiocaps/test_16k/YRtenf2XSXRc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YRtenf2XSXRc_mel.npy @@ -YJnSwRonB9wI audiocaps People scream with a distant hum and splashing waves /home/tiger/nfs/data/audiocaps/test_16k/YJnSwRonB9wI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YJnSwRonB9wI_mel.npy @@ -YT32kii824pA audiocaps Plastic cranking followed by metal rattling then a series of metal falling in the background as a man is talking /home/tiger/nfs/data/audiocaps/test_16k/YT32kii824pA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YT32kii824pA_mel.npy @@@ -YJBWJQCS4SvA audiocaps Bird chirping while waves come in with high wind /home/tiger/nfs/data/audiocaps/test_16k/YJBWJQCS4SvA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YJBWJQCS4SvA_mel.npy @@ -Yjf4iyQPJSvk audiocaps Water is falling, splashing and gurgling /home/tiger/nfs/data/audiocaps/test_16k/Yjf4iyQPJSvk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yjf4iyQPJSvk_mel.npy @@ -Yjs4dr5JusdM audiocaps A woman speaks quietly, and man answers much louder, then she speaks again /home/tiger/nfs/data/audiocaps/test_16k/Yjs4dr5JusdM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yjs4dr5JusdM_mel.npy @@ -YalaxBd_EEUc audiocaps A man speaks followed by eructation /home/tiger/nfs/data/audiocaps/test_16k/YalaxBd_EEUc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YalaxBd_EEUc_mel.npy @ -Y-EQByFLFqig audiocaps A man speaking as rain lightly falls followed by thunder /home/tiger/nfs/data/audiocaps/test_16k/Y-EQByFLFqig.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y-EQByFLFqig_mel.npy @@ -YHVz-FJBf_iM audiocaps Toilet flushes and water gurgles as it drains /home/tiger/nfs/data/audiocaps/test_16k/YHVz-FJBf_iM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YHVz-FJBf_iM_mel.npy @ -YBXxlqaDvdaA audiocaps A man talking as ocean waves trickle and splash while wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YBXxlqaDvdaA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YBXxlqaDvdaA_mel.npy @@ -Y1PvMtRIlZNI audiocaps A stream of water trickling as plastic clanks against a metal surface followed by water pouring down a drain alongside a camera muffling /home/tiger/nfs/data/audiocaps/test_16k/Y1PvMtRIlZNI.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y1PvMtRIlZNI_mel.npy @@@ -YPMMdAKZxI_I audiocaps Loud burping speech followed by women laughing, alongside a man and woman talking in the background /home/tiger/nfs/data/audiocaps/test_16k/YPMMdAKZxI_I.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YPMMdAKZxI_I_mel.npy @@ -Yj0KvrVE_Oww audiocaps Two adult males speak, a small horn blow, and clattering occurs /home/tiger/nfs/data/audiocaps/test_16k/Yj0KvrVE_Oww.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yj0KvrVE_Oww_mel.npy @@ -YhuMLK0oA3L8 audiocaps A man speaks then whistles with a playing guitar /home/tiger/nfs/data/audiocaps/test_16k/YhuMLK0oA3L8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YhuMLK0oA3L8_mel.npy @@ -YilspW7JRjAg audiocaps A vehicle engine revving a few times /home/tiger/nfs/data/audiocaps/test_16k/YilspW7JRjAg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YilspW7JRjAg_mel.npy @ -YEvZ3jOMYWxk audiocaps A woman speaks while delivering a speech /home/tiger/nfs/data/audiocaps/test_16k/YEvZ3jOMYWxk.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YEvZ3jOMYWxk_mel.npy -YL_CNz9Vrtkw audiocaps Brief speech followed by loud applause and cheering /home/tiger/nfs/data/audiocaps/test_16k/YL_CNz9Vrtkw.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YL_CNz9Vrtkw_mel.npy @ -YBL8ksJ0sTXk audiocaps Vibrations of an idling engine with a man speaking /home/tiger/nfs/data/audiocaps/test_16k/YBL8ksJ0sTXk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YBL8ksJ0sTXk_mel.npy @ -YilspW7JRjAg audiocaps A vehicle engine revving numerous times then running idle /home/tiger/nfs/data/audiocaps/test_16k/YilspW7JRjAg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YilspW7JRjAg_mel.npy @ -Y4xrL4TSgHwU audiocaps A vehicle engine starting up then running idle /home/tiger/nfs/data/audiocaps/test_16k/Y4xrL4TSgHwU.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y4xrL4TSgHwU_mel.npy @ -YZ1Cyj4N05lk audiocaps A man whistles and then speaks loudly while some rustling and banging in the background /home/tiger/nfs/data/audiocaps/test_16k/YZ1Cyj4N05lk.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YZ1Cyj4N05lk_mel.npy @@ -YMe4npKmtchA audiocaps Splashing water and quiet murmuring /home/tiger/nfs/data/audiocaps/test_16k/YMe4npKmtchA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YMe4npKmtchA_mel.npy @ -YITlqMkR5alY audiocaps Wind blowing followed by a scream with people speaking faintly in the distance /home/tiger/nfs/data/audiocaps/test_16k/YITlqMkR5alY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YITlqMkR5alY_mel.npy @@ -YIvg_q4t-3w0 audiocaps A person speaks and then a loud click occurs /home/tiger/nfs/data/audiocaps/test_16k/YIvg_q4t-3w0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YIvg_q4t-3w0_mel.npy @ -YwNiYSYJXssA audiocaps A kid speaking as camera plastic clicking followed by a crowd of people gasping 
and talking followed by a person whistling /home/tiger/nfs/data/audiocaps/test_16k/YwNiYSYJXssA.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YwNiYSYJXssA_mel.npy @@@ -YhuMLK0oA3L8 audiocaps A man speaks then whistles with a playing guitar /home/tiger/nfs/data/audiocaps/test_16k/YhuMLK0oA3L8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YhuMLK0oA3L8_mel.npy @@ -YzBXoaQ1GVlc audiocaps A woman talking while a group of children shout and talk in the background /home/tiger/nfs/data/audiocaps/test_16k/YzBXoaQ1GVlc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YzBXoaQ1GVlc_mel.npy @ -Yf2fSxfvmkZQ audiocaps A man speaks, a power tool starts and increases in frequency, a clunking noise /home/tiger/nfs/data/audiocaps/test_16k/Yf2fSxfvmkZQ.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yf2fSxfvmkZQ_mel.npy @@@ -YU90e2P9jy30 audiocaps Squeaking and bouncing followed by a man speaking /home/tiger/nfs/data/audiocaps/test_16k/YU90e2P9jy30.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YU90e2P9jy30_mel.npy @ -YXJba7pTbpD0 audiocaps Spray and then a loud pop and hiss /home/tiger/nfs/data/audiocaps/test_16k/YXJba7pTbpD0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YXJba7pTbpD0_mel.npy @@ -YtjCNwdOUiGc audiocaps A bus engine running followed by a bus horn honking /home/tiger/nfs/data/audiocaps/test_16k/YtjCNwdOUiGc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YtjCNwdOUiGc_mel.npy @ -YMPLZUg89y5U audiocaps A large truck engine running idle as a man is talking and wind blows into a microphone /home/tiger/nfs/data/audiocaps/test_16k/YMPLZUg89y5U.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YMPLZUg89y5U_mel.npy @@ -YelztUCeNQvQ audiocaps A train honks horn and passes by /home/tiger/nfs/data/audiocaps/test_16k/YelztUCeNQvQ.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YelztUCeNQvQ_mel.npy @ -Y4Ujigme2IxY audiocaps Vehicle engine running then a high whistle /home/tiger/nfs/data/audiocaps/test_16k/Y4Ujigme2IxY.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y4Ujigme2IxY_mel.npy @ -YAizmnCDlXos audiocaps A steady ringing with the tick took of a clock /home/tiger/nfs/data/audiocaps/test_16k/YAizmnCDlXos.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YAizmnCDlXos_mel.npy @ -YZBtgrP4vU_w audiocaps Sizzling and crackling are occurring /home/tiger/nfs/data/audiocaps/test_16k/YZBtgrP4vU_w.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YZBtgrP4vU_w_mel.npy @ -Y3qrVku794u0 audiocaps A man talking before and after a young kid talks as plastic rattles followed by an electronic beep /home/tiger/nfs/data/audiocaps/test_16k/Y3qrVku794u0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y3qrVku794u0_mel.npy @@@@ -YMPLZUg89y5U audiocaps Male speaking with rustling in the background /home/tiger/nfs/data/audiocaps/test_16k/YMPLZUg89y5U.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YMPLZUg89y5U_mel.npy @ -Y2t82STv2GR8 audiocaps Bells are ringing with echo repeatedly /home/tiger/nfs/data/audiocaps/test_16k/Y2t82STv2GR8.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y2t82STv2GR8_mel.npy @ -Yzq00Oe1ecpE audiocaps A bus engine driving then slowing down before accelerating /home/tiger/nfs/data/audiocaps/test_16k/Yzq00Oe1ecpE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yzq00Oe1ecpE_mel.npy @@ -Yktc_tJxw8sc audiocaps An infant crying /home/tiger/nfs/data/audiocaps/test_16k/Yktc_tJxw8sc.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yktc_tJxw8sc_mel.npy -YpI_kPedctoo audiocaps Motorcycle engines running and revving as a man talks in the background /home/tiger/nfs/data/audiocaps/test_16k/YpI_kPedctoo.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YpI_kPedctoo_mel.npy @@ -Yy93cZqNCtks audiocaps A series of gunshots followed by a man speaking and footsteps running proceeded by more gunshots firing and a dog growling /home/tiger/nfs/data/audiocaps/test_16k/Yy93cZqNCtks.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yy93cZqNCtks_mel.npy @@@@ -YdP5DbAzTl5M audiocaps A motorboat engine running as a man talks followed by wind blowing into a microphone and plastic clacking /home/tiger/nfs/data/audiocaps/test_16k/YdP5DbAzTl5M.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YdP5DbAzTl5M_mel.npy @@@ -YLWng-4PDzPM audiocaps Instrumental music playing followed by heavy fabric being rustled then a man whistling /home/tiger/nfs/data/audiocaps/test_16k/YLWng-4PDzPM.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YLWng-4PDzPM_mel.npy @@ -YIvfaKPDWC00 audiocaps Emergency sirens wailing as a vehicle accelerates in the distance /home/tiger/nfs/data/audiocaps/test_16k/YIvfaKPDWC00.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YIvfaKPDWC00_mel.npy @ -Y-nQHwrRLfc0 audiocaps Chainsaw being run /home/tiger/nfs/data/audiocaps/test_16k/Y-nQHwrRLfc0.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y-nQHwrRLfc0_mel.npy -Y1e98HeU9Vrg audiocaps Waves and wind rake a shore /home/tiger/nfs/data/audiocaps/test_16k/Y1e98HeU9Vrg.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y1e98HeU9Vrg_mel.npy @ -Y1slvoNgzBLE audiocaps A subway train signal plays followed by a bell chiming followed by a horn honking as a crowd of people talk in the background /home/tiger/nfs/data/audiocaps/test_16k/Y1slvoNgzBLE.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Y1slvoNgzBLE_mel.npy @@@@ -YfrOqlk0Wm5Y audiocaps A man talking as metal clacks followed by metal scrapping against a metal surface /home/tiger/nfs/data/audiocaps/test_16k/YfrOqlk0Wm5Y.wav 
/home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YfrOqlk0Wm5Y_mel.npy @@ -YJZloTOdIY_c audiocaps Horses growl and clop hooves /home/tiger/nfs/data/audiocaps/test_16k/YJZloTOdIY_c.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YJZloTOdIY_c_mel.npy @ -YIJ6pm5Kns8A audiocaps A woman speaks, then a phone chimes, then there is a burp followed by laughter /home/tiger/nfs/data/audiocaps/test_16k/YIJ6pm5Kns8A.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/YIJ6pm5Kns8A_mel.npy @@@ -Yh0M4RS8p_mo audiocaps Audio static followed by a man laughing before an electronic device motor slides then an infant cries /home/tiger/nfs/data/audiocaps/test_16k/Yh0M4RS8p_mo.wav /home/tiger/nfs/data/audiocaps/features/test/melspec_pad_16000hz/Yh0M4RS8p_mo_mel.npy