pretrained model EPT-MoE
Mixture-of-Experts (MoE) is a widely used deep neural architecture in which an ensemble of specialized sub-models (a group of experts) improves overall performance at a roughly constant computational cost. Particularly since the release of Mixtral-8x7B, MoE architectures have gained popularity in large language modeling and computer vision. In this paper, we propose Efficient Parallel Transformers of Mixture-of-Experts (EPT-MoE), coupled with Spatial Feed Forward Neural Networks (SFFN), to strengthen parallel Transformer models with Mixture-of-Experts layers for graph learning on 3D skeleton data for hand gesture recognition. 3D hand gesture recognition is an active field of research in human-computer interaction, VR/AR and pattern recognition. The proposed EPT-MoE model decouples the spatial and temporal graph learning of 3D hand gestures by integrating Mixture-of-Experts layers into parallel Transformer models. The main idea is to first apply Mixture-of-Experts layers to the initial spatial features of intra-frame interactions, extracting strong features from the different hand joints, and then to recognize 3D hand gestures with parallel Transformer encoders that also contain Mixture-of-Experts layers. Finally, we conduct extensive experiments on the SHREC'17 Track dataset benchmarks to evaluate the performance of EPT-MoE model variants. EPT-MoE improves overall performance and training stability while reducing computational cost. The experimental results show that several variants of the proposed model match or outperform the state of the art.
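To make the "roughly constant cost" point concrete, here is a minimal pure-Python sketch of a top-k routed Mixture-of-Experts layer. It is an illustration only and not the EPT-MoE implementation: the description above does not say which routing scheme the model uses, so top-k gating with a softmax over the selected experts is an assumption, and the class name `TinyMoE`, the linear experts, and all sizes are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class TinyMoE:
    """Hypothetical top-k MoE layer with linear experts (illustration only)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.top_k = top_k
        self.gate = rand_matrix(num_experts, dim)  # one score row per expert
        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]

    def forward(self, x):
        # Score every expert, but evaluate only the top_k best-scoring ones.
        logits = matvec(self.gate, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[:self.top_k]
        weights = softmax([logits[i] for i in chosen])

        out = [0.0] * len(x)
        for weight, idx in zip(weights, chosen):
            expert_out = matvec(self.experts[idx], x)
            out = [o + weight * v for o, v in zip(out, expert_out)]
        return out, chosen


moe = TinyMoE(dim=4, num_experts=8, top_k=2)
features = [0.1, -0.2, 0.3, 0.4]  # stand-in for one joint's feature vector
output, used_experts = moe.forward(features)
print("experts evaluated:", used_experts)
```

Only `top_k` expert matrices are multiplied per input, so adding experts increases parameter count and gate scoring but leaves the per-input expert compute unchanged.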
- modelEPTMoE.ckpt +3 -0
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:da0eb9671b16bd0e7b45e0a7d4f0224c5aafba88f51ce26293424fe695e26cac
+size 85295518
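The three added lines are a Git LFS pointer rather than the checkpoint itself: they record the spec version, the SHA-256 of the real file, and its size in bytes (85295518, about 85 MB). As a rough sketch, a downloaded copy could be checked against the pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path is an assumption about where the file was saved.

```python
import hashlib
import os

# Pointer contents as committed for modelEPTMoE.ckpt.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:da0eb9671b16bd0e7b45e0a7d4f0224c5aafba88f51ce26293424fe695e26cac
size 85295518
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first, avoids hashing a wrong file
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)

local_path = "modelEPTMoE.ckpt"  # assumed download location
if os.path.exists(local_path):
    print("checkpoint matches pointer:", matches_pointer(local_path, pointer))
```

A size or hash mismatch usually means the download was truncated or that the pointer text itself was saved in place of the binary.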