Uberduck/base_no-no

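Warm-started from the "Chills" single-speaker male model (not currently available on Hugging Face), then trained for 25 epochs (effectively 50, since the augmented dataset contains every recording twice). Batch size 16; learning rate √2 × 10⁻³ (≈ 1.41 × 10⁻³) for roughly the first 15 epochs (exact count uncertain), then 5√2 × 10⁻⁴ (≈ 7.07 × 10⁻⁴, half the initial rate) for the remaining 10.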
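A minimal sketch of the two-step learning-rate schedule described above. The 0-indexed epochs, the constant names and the `learning_rate` helper are hypothetical. The switch at epoch 15 follows the card's own uncertain figure and is an assumption.

```python
import math

BASE_LR = math.sqrt(2) * 1e-3       # (√2)e-3 ≈ 1.414e-3
LATE_LR = 5 * math.sqrt(2) * 1e-4   # (5√2)e-4 ≈ 7.07e-4, i.e. BASE_LR / 2
SWITCH_EPOCH = 15                   # assumption: the card marks this boundary with "(?)"
TOTAL_EPOCHS = 25


def learning_rate(epoch: int) -> float:
    """Return the learning rate for a 0-indexed epoch (hypothetical helper)."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    # Constant rate until the switch epoch, then a single drop to half.
    return BASE_LR if epoch < SWITCH_EPOCH else LATE_LR
```

Writing the second rate as 5√2 × 10⁻⁴ shows it is exactly half of √2 × 10⁻³. The schedule is therefore a single halving step rather than a decay curve.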

Dataset: NST Norwegian Speech Synthesis (CC0), augmented as follows:

1. Make a copy of the dataset.
2. In the copy, join the two shortest clips with 100 ms of silence between them, and replace both with the joined clip. Repeat until the shortest clip is at least 6 seconds long.
3. Shuffle the original together with the copy.
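The augmentation steps above can be sketched as follows. This is an illustrative reimplementation, not the author's script. It assumes each clip is a mono list of float samples at a shared `sample_rate`, and that the shortest clip goes first and the second-shortest after it (the card does not specify the order). Transcripts are ignored, and the helper names `merge_short_clips` and `augment` are hypothetical. A min-heap keyed on clip length makes each "find the two shortest clips" step cheap.

```python
import heapq
import itertools
import random
from typing import List, Sequence

Clip = List[float]  # assumption: a clip is a mono list of float samples


def merge_short_clips(
    clips: Sequence[Clip],
    sample_rate: int,
    min_seconds: float = 6.0,
    gap_seconds: float = 0.1,
) -> List[Clip]:
    """Join the two shortest clips with silence until the shortest is >= min_seconds."""
    min_len = int(round(min_seconds * sample_rate))
    silence = [0.0] * int(round(gap_seconds * sample_rate))
    counter = itertools.count()  # unique tie-breaker so heapq never compares lists
    heap = [(len(c), next(counter), list(c)) for c in clips]
    heapq.heapify(heap)
    # Stop when the shortest clip is long enough, or when only one clip is left.
    while len(heap) > 1 and heap[0][0] < min_len:
        _, _, first = heapq.heappop(heap)   # shortest
        _, _, second = heapq.heappop(heap)  # second-shortest
        joined = first + silence + second
        heapq.heappush(heap, (len(joined), next(counter), joined))
    return [clip for _, _, clip in heap]


def augment(clips: Sequence[Clip], sample_rate: int, seed: int = 0) -> List[Clip]:
    """Original clips plus the merged copy, shuffled together."""
    copy = merge_short_clips(clips, sample_rate)
    combined = [list(c) for c in clips] + copy
    random.Random(seed).shuffle(combined)
    return combined
```

For example, take `sample_rate=10` and clips of 1, 2, 3 and 7 seconds. The copy ends up as two clips of 6.2 and 7.0 seconds, so `augment` returns six clips in total.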