Evaluation

Install packages for evaluation:

pip install -e .[eval]

Generating Samples for Evaluation

Prepare Test Datasets

Seed-TTS testset: Download from seed-tts-eval.
LibriSpeech test-clean: Download from OpenSLR.
Unzip the downloaded datasets and place them in the data/ directory.
Update the LibriSpeech test-clean data path in src/f5_tts/eval/eval_infer_batch.py to point to your local copy.
Our filtered LibriSpeech-PC 4-10s subset: data/librispeech_pc_test_clean_cross_sentence.lst
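Before running batch inference, it can help to confirm that the list file is where the scripts expect it and is readable. The following is a minimal sketch, not part of the repository: `summarize_lst` is a hypothetical helper, and the tab separator is an assumption about the file format. It reports how many fields each line has rather than assuming a fixed column layout.

```python
from collections import Counter
from pathlib import Path


def summarize_lst(lst_path: str, sep: str = "\t") -> dict:
    """Count non-empty lines in a list file and tally fields per line.

    `sep` defaults to a tab, which is an assumption about the .lst format.
    """
    path = Path(lst_path)
    if not path.is_file():
        raise FileNotFoundError(f"list file not found: {path}")

    field_counts = Counter()
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            field_counts[len(line.split(sep))] += 1

    return {
        "entries": sum(field_counts.values()),
        "fields_per_line": dict(field_counts),
    }
```

Calling `summarize_lst("data/librispeech_pc_test_clean_cross_sentence.lst")` from the repository root should return a single field count if every line follows the same layout; more than one key in `fields_per_line` suggests malformed lines or a different separator.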

Batch Inference for Test Set

To run batch inference for evaluations, execute the following commands:

# batch inference for evaluations
accelerate config  # if not set before
bash src/f5_tts/eval/eval_infer_batch.sh

Objective Evaluation on Generated Results

Download Evaluation Model Checkpoints

Chinese ASR Model: Paraformer-zh
English ASR Model: Faster-Whisper
WavLM Model: Download from Google Drive.

Then update the following scripts with the paths where you placed the evaluation model checkpoints.

Objective Evaluation

Update the paths to point to your batch-inference results, then run the WER / SIM / UTMOS evaluations:

# Evaluation [WER] for Seed-TTS test [ZH] set
python src/f5_tts/eval/eval_seedtts_testset.py --eval_task wer --lang zh --gen_wav_dir <GEN_WAV_DIR> --gpu_nums 8

# Evaluation [SIM] for LibriSpeech-PC test-clean (cross-sentence)
python src/f5_tts/eval/eval_librispeech_test_clean.py --eval_task sim --gen_wav_dir <GEN_WAV_DIR> --librispeech_test_clean_path <TEST_CLEAN_PATH>

# Evaluation [UTMOS]. --ext: Audio extension
python src/f5_tts/eval/eval_utmos.py --audio_dir <WAV_DIR> --ext wav
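
For readers unfamiliar with the metrics, the arithmetic behind WER and SIM can be sketched as below. This is an illustration only and not the code in the evaluation scripts: the real pipeline first transcribes the generated audio with the ASR models and extracts speaker embeddings with WavLM, and it may normalize text differently. The function names `word_error_rate` and `cosine_similarity` are hypothetical, and whitespace tokenization is an assumption (for Chinese, tokens would typically be characters instead).

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)
```

Lower WER means the ASR transcript of the generated speech is closer to the target text; higher SIM means the generated speaker embedding is closer to that of the reference prompt.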