Abstract: In this paper, we propose a method to improve the accuracy of speech emotion recognition (SER) by using a vision transformer (ViT) to attend to the correlation of frequency (y-axis) with time ...
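The abstract is truncated and gives no architectural details. The following is therefore only a minimal sketch of the general idea it names, not the authors' method. It assumes the spectrogram is treated as a 2-D image (frequency on the y-axis, time on the x-axis) and cut into non-overlapping patches as in a standard ViT. A single untrained self-attention head then lets every frequency-time patch attend to every other one. The patch size, projection dimension, random weights, and the helper names `spectrogram_patches` and `self_attention` are hypothetical choices made for illustration.

```python
import numpy as np

def spectrogram_patches(spec, patch_f, patch_t):
    """Split a (freq, time) spectrogram into non-overlapping patches.

    Returns an array of shape (num_patches, patch_f * patch_t), ordered
    row by row over the frequency axis, then over the time axis.
    """
    n_f, n_t = spec.shape
    assert n_f % patch_f == 0 and n_t % patch_t == 0, "shape must divide evenly"
    # (F/pf, pf, T/pt, pt) -> (F/pf, T/pt, pf, pt): group each patch's pixels together
    patches = spec.reshape(n_f // patch_f, patch_f, n_t // patch_t, patch_t)
    patches = patches.transpose(0, 2, 1, 3)
    return patches.reshape(-1, patch_f * patch_t)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over patch tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all patches
    return weights @ v, weights

# Hypothetical example: a random 64 (freq) x 128 (time) "spectrogram"
rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 128))
tokens = spectrogram_patches(spec, patch_f=16, patch_t=16)  # 4 x 8 = 32 patches

d_model = 32  # hypothetical projection size; weights are random, not trained
w_q, w_k, w_v = (rng.standard_normal((tokens.shape[1], d_model)) * 0.05 for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)

# weights[i, j] is how much patch i (one frequency band at one time span)
# attends to patch j (possibly a different band at a different time)
print(tokens.shape, out.shape, weights.shape)
```

Because every patch covers a specific frequency band over a specific time span, each row of `weights` mixes information across both axes at once. This is one plausible way a ViT could relate frequency to time in a spectrogram.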
Abstract: With the emergence of transformer-based feature extractors, the performance of image quality assessment (IQA) has improved, but its interpretability remains limited. In addition, images repaired by ...