Sucial/Dereverb-Echo_Mel_Band_Roformer

Sucial committed on Dec 27, 2024

Commit

4d84731

1 Parent(s): d0c5fc0

Update README.md

README.md +42 -2

@@ -4,8 +4,8 @@ license: cc-by-nc-sa-4.0
 ### Description
-This model is used to separate reverb and delay effects in vocals. In addition, it can also separate partial harmony, but it cannot completely separate them. I added random high cut after the reverberation and delay effects in the dataset, so the model's handling of high frequencies is not particularly aggressive.<br>
-You can try listening to the performance of this model [here](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/tree/main/examples)!
 ### How to use the model?
@@ -13,6 +13,46 @@ Try it with [ZFTurbo's Music-Source-Separation-Training](https://github.com/ZFTu
 ### Model
 ### V2 Models
 Config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>

 ### Description
+These models separate reverb and delay effects from vocals. They can also partially separate harmony, but not completely. I added a random high cut after the reverberation and delay effects in the dataset, so these models' handling of high frequencies is not particularly aggressive.<br>
+You can try listening to the performance of these models [here](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/tree/main/example)!
 ### How to use the model?
 ### Model
+### Fused Models
+I used [a model fusion script](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/scripts/model_fusion.py) to fuse three models with the same model structure. The three models and their corresponding fusion ratios are as follows:<br>
+**0.5 * dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt + 0.25 * de_big_reverb_mbr_ep_362.ckpt + 0.25 * de_super_big_reverb_mbr_ep_346.ckpt**<br>
+Therefore, the fused model can remove both small and large reverberations. However, I did not carefully tune the fusion ratio of each model. If anyone with experience is willing to help me tune it, I would be very grateful!
+config: the same as v2 models and big reverb models: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
+fused_model: [dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt](./dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt)
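
The weighted fusion described above can be sketched as follows. This is a minimal illustration, not the linked `model_fusion.py` script, whose contents are not shown here. The function name `fuse_state_dicts` and the toy values are hypothetical. It assumes each checkpoint is a mapping from parameter names to values that support `*` and `+` (plain floats here; PyTorch tensors or NumPy arrays should behave the same way), and that all checkpoints share the same parameter names.

```python
def fuse_state_dicts(state_dicts, weights):
    """Return the weighted average of several checkpoints with identical keys.

    Hypothetical helper for illustration only; not the author's script.
    """
    if len(state_dicts) != len(weights) or not state_dicts:
        raise ValueError("need one weight per checkpoint")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("fusion ratios are expected to sum to 1")

    # All models must share the same structure, i.e. the same parameter names.
    keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != keys:
            raise ValueError("checkpoints do not share the same parameters")

    fused = {}
    for key in state_dicts[0]:
        total = weights[0] * state_dicts[0][key]
        for w, sd in zip(weights[1:], state_dicts[1:]):
            total = total + w * sd[key]
        fused[key] = total
    return fused


# Toy stand-ins for the three checkpoints (real ones hold tensors).
v2 = {"w": 1.0, "b": 0.0}
big = {"w": 2.0, "b": 4.0}
super_big = {"w": 3.0, "b": 8.0}

fused = fuse_state_dicts([v2, big, super_big], [0.5, 0.25, 0.25])
print(fused)
```

Requiring the ratios to sum to 1 keeps the fused parameters on the same scale as the originals; that check is a design choice of this sketch, not something the README states.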
+### Big reverb Models
+There are two models for removing large reverberation: [de_big_reverb_mbr_ep_362.ckpt](./de_big_reverb_mbr_ep_362.ckpt) and [de_super_big_reverb_mbr_ep_346.ckpt](./de_super_big_reverb_mbr_ep_346.ckpt). In general, the `de_big_reverb_mbr` model is sufficient for large reverberations. The `de_super_big_reverb_mbr` model is trained for extremely large reverberations and is less commonly needed. These two models share the same configuration file as the v2 model, and both are fine-tuned from `dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt`.
+config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
+Model_de_big_reverb: [de_big_reverb_mbr_ep_362.ckpt](./de_big_reverb_mbr_ep_362.ckpt)<br>
+Model_de_super_big_reverb: [de_super_big_reverb_mbr_ep_346.ckpt](./de_super_big_reverb_mbr_ep_346.ckpt)
+To better validate the models' performance, I have added two metrics, `f0_fitness` and `uv_fitness`, as follows:<br>
+They measure the F0 and voiced/unvoiced (UV) fitness between a reference and an estimated audio signal. These two metrics are only meaningful for vocals.<br>
+The F0 fitness measures how similar the fundamental frequency (F0) of the reference and estimated signals are, while the UV fitness evaluates how well the voiced/unvoiced decisions of the two signals agree. Both are computed by extracting F0 and UV information using pitch analysis and then calculating the Pearson correlation between the corresponding F0 and UV sequences. The F0 fitness can also be used to compare the completeness of the extracted F0 for human voice signals. Both metrics range from -1 to 1, and the closer the value is to 1, the better the fit.
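
The correlation step described above can be sketched as follows. This is a rough illustration under stated assumptions, not the implementation used for the reported numbers: it assumes the F0 sequences have already been extracted frame by frame by some pitch tracker, that unvoiced frames are marked with an F0 of 0, and that both sequences are truncated to a common length. The names `pearson` and `f0_uv_fitness` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two sequences, truncated to a common length."""
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


def f0_uv_fitness(f0_ref, f0_est):
    """Return (f0_fitness, uv_fitness) for two per-frame F0 sequences in Hz.

    Assumption: a frame with F0 == 0 is unvoiced.
    """
    uv_ref = [1.0 if f > 0 else 0.0 for f in f0_ref]
    uv_est = [1.0 if f > 0 else 0.0 for f in f0_est]
    return pearson(f0_ref, f0_est), pearson(uv_ref, uv_est)


# Made-up F0 tracks: identical tracks versus tracks with inverted voicing.
same = f0_uv_fitness([0.0, 0.0, 220.0, 230.0, 0.0, 440.0],
                     [0.0, 0.0, 220.0, 230.0, 0.0, 440.0])
opposite = f0_uv_fitness([0.0, 100.0, 0.0, 100.0],
                         [100.0, 0.0, 100.0, 0.0])
print(same, opposite)
```

Identical tracks give values close to 1 for both metrics, and tracks whose voicing is inverted give values close to -1, matching the stated range.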
+I validated these two models on different validation sets (so the SDR values have no practical reference value), and the validation results are as follows:
+```
+de_big_reverb_mbr_ep_362.ckpt
+Num overlap: 2
+Instr dry sdr: 14.0030 (Std: 2.9492)
+Instr dry bleedless: 43.6501 (Std: 10.1362)
+Instr dry fullness: 21.7776 (Std: 5.9445)
+Instr dry f0_fitness: 0.8405 (Std: 0.1520)
+Instr dry uv_fitness: 0.9759 (Std: 0.0162)
+de_super_big_reverb_mbr_ep_346.ckpt
+Num overlap: 2
+Instr dry sdr: 11.3164 (Std: 2.4877)
+Instr dry bleedless: 43.3989 (Std: 10.7918)
+Instr dry fullness: 17.5554 (Std: 4.0178)
+Instr dry f0_fitness: 0.7845 (Std: 0.1864)
+Instr dry uv_fitness: 0.9662 (Std: 0.0172)
+```
 ### V2 Models
 Config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>