Spaces:
Runtime error
A newer version of the Gradio SDK is available:
5.6.0
faiss tuning TIPS
about faiss
faiss is a library of neighborhood searches for dense vectors, developed by facebook research, which efficiently implements many approximate neighborhood search methods. Approximate Neighbor Search finds similar vectors quickly while sacrificing some accuracy.
faiss in RVC
In RVC, for the embedding of features converted by HuBERT, we search for embeddings similar to the embedding generated from the training data and mix them to achieve a conversion that is closer to the original speech. However, since this search takes time if performed naively, high-speed conversion is realized by using approximate neighborhood search.
implementation overview
In '/logs/your-experiment/3_feature256' where the model is located, features extracted by HuBERT from each voice data are located. From here we read the npy files in order sorted by filename and concatenate the vectors to create big_npy. (This vector has shape [N, 256].) After saving big_npy as /logs/your-experiment/total_fea.npy, train it with faiss.
In this article, I will explain the meaning of these parameters.
Explanation of the method
index factory
An index factory is a unique faiss notation that expresses a pipeline that connects multiple approximate neighborhood search methods as a string. This allows you to try various approximate neighborhood search methods simply by changing the index factory string. In RVC it is used like this:
index = faiss.index_factory(256, "IVF%s,Flat" % n_ivf)
Among the arguments of index_factory, the first is the number of dimensions of the vector, the second is the index factory string, and the third is the distance to use.
For more detailed notation https://github.com/facebookresearch/faiss/wiki/The-index-factory
index for distance
There are two typical indexes used as similarity of embedding as follows.
- Euclidean distance (METRIC_L2)
- inner product (METRIC_INNER_PRODUCT)
Euclidean distance takes the squared difference in each dimension, sums the differences in all dimensions, and then takes the square root. This is the same as the distance in 2D and 3D that we use on a daily basis. The inner product is not used as an index of similarity as it is, and the cosine similarity that takes the inner product after being normalized by the L2 norm is generally used.
Which is better depends on the case, but cosine similarity is often used in embedding obtained by word2vec and similar image retrieval models learned by ArcFace. If you want to do l2 normalization on vector X with numpy, you can do it with the following code with eps small enough to avoid 0 division.
X_normed = X / np.maximum(eps, np.linalg.norm(X, ord=2, axis=-1, keepdims=True))
Also, for the index factory, you can change the distance index used for calculation by choosing the value to pass as the third argument.
index = faiss.index_factory(dimention, text, faiss.METRIC_INNER_PRODUCT)
IVF
IVF (Inverted file indexes) is an algorithm similar to the inverted index in full-text search. During learning, the search target is clustered with kmeans, and Voronoi partitioning is performed using the cluster center. Each data point is assigned a cluster, so we create a dictionary that looks up the data points from the clusters.
For example, if clusters are assigned as follows
index | Cluster |
---|---|
1 | A |
2 | B |
3 | A |
4 | C |
5 | B |
The resulting inverted index looks like this:
cluster | index |
---|---|
A | 1, 3 |
B | 2, 5 |
C | 4 |
When searching, we first search n_probe clusters from the clusters, and then calculate the distances for the data points belonging to each cluster.
recommend parameter
There are official guidelines on how to choose an index, so I will explain accordingly. https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index
For datasets below 1M, 4bit-PQ is the most efficient method available in faiss as of April 2023. Combining this with IVF, narrowing down the candidates with 4bit-PQ, and finally recalculating the distance with an accurate index can be described by using the following index factory.
index = faiss.index_factory(256, "IVF1024,PQ128x4fs,RFlat")
Recommended parameters for IVF
Consider the case of too many IVFs. For example, if coarse quantization by IVF is performed for the number of data, this is the same as a naive exhaustive search and is inefficient. For 1M or less, IVF values are recommended between 4sqrt(N) ~ 16sqrt(N) for N number of data points.
Since the calculation time increases in proportion to the number of n_probes, please consult with the accuracy and choose appropriately. Personally, I don't think RVC needs that much accuracy, so n_probe = 1 is fine.
FastScan
FastScan is a method that enables high-speed approximation of distances by Cartesian product quantization by performing them in registers. Cartesian product quantization performs clustering independently for each d dimension (usually d = 2) during learning, calculates the distance between clusters in advance, and creates a lookup table. At the time of prediction, the distance of each dimension can be calculated in O(1) by looking at the lookup table. So the number you specify after PQ usually specifies half the dimension of the vector.
For a more detailed description of FastScan, please refer to the official documentation. https://github.com/facebookresearch/faiss/wiki/Fast-accumulation-of-PQ-and-AQ-codes-(FastScan)
RFlat
RFlat is an instruction to recalculate the rough distance calculated by FastScan with the exact distance specified by the third argument of index factory. When getting k neighbors, k*k_factor points are recalculated.