Sucial committed on
Commit
4d84731
·
verified ·
1 Parent(s): d0c5fc0

Update README.md

Files changed (1)
  1. README.md +42 -2
README.md CHANGED
@@ -4,8 +4,8 @@ license: cc-by-nc-sa-4.0
4
 
5
  ### Description
6
 
7
- This model is used to separate reverb and delay effects in vocals. In addition, it can also separate partial harmony, but it cannot completely separate them. I added random high cut after the reverberation and delay effects in the dataset, so the model's handling of high frequencies is not particularly aggressive.<br>
8
- You can try listening to the performance of this model [here](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/tree/main/examples)!
9
 
10
  ### How to use the model?
11
 
@@ -13,6 +13,46 @@ Try it with [ZFTurbo's Music-Source-Separation-Training](https://github.com/ZFTu
13
 
14
  ### Model
15
 
16
  ### V2 Models
17
 
18
  Config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
 
4
 
5
  ### Description
6
 
7
+ These models separate reverb and delay effects from vocals. They can also separate some of the harmony, but not all of it. I added a random high cut after the reverb and delay effects in the dataset, so these models' handling of high frequencies is not particularly aggressive.<br>
8
+ You can try listening to the performance of these models [here](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/tree/main/example)!
9
 
10
  ### How to use the model?
11
 
 
13
 
14
  ### Model
15
 
16
+ ### Fused Models
17
+
18
+ I used [a model fusion script](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/scripts/model_fusion.py) to fuse three models that share the same architecture. The three models and their fusion ratios are:<br>
19
+ **0.5 * dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt + 0.25 * de_big_reverb_mbr_ep_362.ckpt + 0.25 * de_super_big_reverb_mbr_ep_346.ckpt**<br>
20
+ As a result, the fused model can remove both small and large reverberation. However, I did not tune the fusion ratios carefully. If anyone with more experience is willing to help tune them, I would be very grateful!
21
+
22
+ config: same as the v2 and big reverb models: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
23
+ fused_model: [dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt](./dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt)
24
+
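The fusion ratios listed above read as a per-parameter weighted average of the three checkpoints. The following is a minimal sketch of that idea, not the contents of `model_fusion.py`, which is not shown here. The function name `fuse_state_dicts` and the toy parameter names are hypothetical, and plain floats stand in for tensors so that the example runs without PyTorch. Real checkpoints would first have to be loaded into dictionaries of tensors.

```python
def fuse_state_dicts(state_dicts, weights):
    """Return the weighted average of several parameter dictionaries.

    Assumption: every dictionary has the same keys and compatible values.
    Values may be floats, NumPy arrays, or tensors, since only `*` and `+`
    are used.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per state dict")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("fusion weights should sum to 1")

    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("models must share the same structure")

    fused = {}
    for key in keys:
        # Start with the first model's contribution, then add the others.
        acc = state_dicts[0][key] * weights[0]
        for sd, w in zip(state_dicts[1:], weights[1:]):
            acc = acc + sd[key] * w
        fused[key] = acc
    return fused


# Toy stand-ins for the three checkpoints (hypothetical values).
v2 = {"layer.weight": 1.0, "layer.bias": 0.0}
big = {"layer.weight": 2.0, "layer.bias": 4.0}
super_big = {"layer.weight": 4.0, "layer.bias": 8.0}

# Same ratios as in the README: 0.5 * v2 + 0.25 * big + 0.25 * super_big.
fused = fuse_state_dicts([v2, big, super_big], [0.5, 0.25, 0.25])
print(fused)  # {'layer.weight': 2.0, 'layer.bias': 3.0}
```

Averaging parameters this way only makes sense when all models share one architecture and a common starting point, which matches the setup described here: the two big reverb models are fine-tuned from the v2 checkpoint.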
25
+ ### Big Reverb Models
26
+
27
+ There are two models for removing large reverberation: [de_big_reverb_mbr_ep_362.ckpt](./de_big_reverb_mbr_ep_362.ckpt) and [de_super_big_reverb_mbr_ep_346.ckpt](./de_super_big_reverb_mbr_ep_346.ckpt). For most large reverberation, the `de_big_reverb_mbr` model is sufficient. The `de_super_big_reverb_mbr` model is trained for extremely large reverberation and is needed less often. Both models use the same configuration file as the v2 model, and both are fine-tuned from `dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt`.
28
+
29
+ config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
30
+ Model_de_big_reverb: [de_big_reverb_mbr_ep_362.ckpt](./de_big_reverb_mbr_ep_362.ckpt)<br>
31
+ Model_de_super_big_reverb: [de_super_big_reverb_mbr_ep_346.ckpt](./de_super_big_reverb_mbr_ep_346.ckpt)
32
+
33
+ To better validate the models' performance, I added two metrics, `f0_fitness` and `uv_fitness`:<br>
34
+ They measure the F0 and voiced/unvoiced (UV) fitness between a reference and an estimated audio signal. These two metrics are only meaningful for vocals.<br>
35
+ The F0 fitness measures how similar the fundamental frequency (F0) contours of the reference and estimated signals are, while the UV fitness measures how well their voiced/unvoiced decisions agree. Both are computed by extracting F0 and UV sequences with pitch analysis and then calculating the Pearson correlation between the corresponding sequences. The F0 fitness can also be used to compare how completely the fundamental frequency of a vocal signal has been preserved. Both metrics range from -1 to 1, and the closer the value is to 1, the better the fit.
36
+
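The correlation step behind `f0_fitness` and `uv_fitness` can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation. The pitch extraction itself is omitted, so the inputs are per-frame F0 values in Hz that are assumed to be already extracted, with `0.0` marking unvoiced frames. The helper names `pearson` and `f0_uv_fitness`, and the choice to return `0.0` when a sequence is constant, are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: report 0.0 when the correlation is undefined.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def f0_uv_fitness(ref_f0, est_f0):
    """Return (f0_fitness, uv_fitness) for two per-frame F0 tracks.

    Assumption: a frame counts as voiced when its F0 value is above zero.
    """
    ref_uv = [1.0 if f > 0 else 0.0 for f in ref_f0]
    est_uv = [1.0 if f > 0 else 0.0 for f in est_f0]
    return pearson(ref_f0, est_f0), pearson(ref_uv, est_uv)


# Hypothetical F0 tracks: the estimate misses one voiced frame.
ref_f0 = [0.0, 220.0, 221.0, 0.0, 330.0]
est_f0 = [0.0, 220.0, 0.0, 0.0, 330.0]
f0_fit, uv_fit = f0_uv_fitness(ref_f0, est_f0)
print(f"f0_fitness={f0_fit:.4f} uv_fitness={uv_fit:.4f}")
```

Because both scores are correlations, they stay within -1 to 1, and a dropped voiced frame lowers both of them.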
37
+ I validated these two models on different validation sets (so their SDR values should not be compared directly). The validation results are as follows:
38
+ ```
39
+ de_big_reverb_mbr_ep_362.ckpt
40
+ Num overlap: 2
41
+ Instr dry sdr: 14.0030 (Std: 2.9492)
42
+ Instr dry bleedless: 43.6501 (Std: 10.1362)
43
+ Instr dry fullness: 21.7776 (Std: 5.9445)
44
+ Instr dry f0_fitness: 0.8405 (Std: 0.1520)
45
+ Instr dry uv_fitness: 0.9759 (Std: 0.0162)
46
+
47
+ de_super_big_reverb_mbr_ep_346.ckpt
48
+ Num overlap: 2
49
+ Instr dry sdr: 11.3164 (Std: 2.4877)
50
+ Instr dry bleedless: 43.3989 (Std: 10.7918)
51
+ Instr dry fullness: 17.5554 (Std: 4.0178)
52
+ Instr dry f0_fitness: 0.7845 (Std: 0.1864)
53
+ Instr dry uv_fitness: 0.9662 (Std: 0.0172)
54
+ ```
55
+
56
  ### V2 Models
57
 
58
  Config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>