Abstract: Recently, transformer-based models have been widely employed for audio classification due to their capability of modelling long-range feature dependencies in audio representations such as spectrograms ...