Orken Mamyrbayev, Dina Oralbekova, Mohamed Othman, Tolganay Turdalykyzy, Bagashar Zhumazhanov, Kuralai Mukhsina
End-to-end models have entered the field of speech recognition, replacing traditional and hybrid ones. Modern end-to-end models typically generate the output sequence from left to right, applying an autoregressive function during decoding. It has not yet been shown that this decoding strategy is the best one for speech-to-text technology. In addition, such models consider only previous information when predicting the next output, so they cannot compensate when the preceding speech was slurred. We therefore apply the insertion method, which generates the output non-autoregressively and in arbitrary order. In this work, a model was trained using the insertion method together with connectionist temporal classification for Kazakh speech recognition. The experiments showed that this model improves the quality of Kazakh speech recognition.
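The difference between left-to-right and insertion-based generation can be illustrated with a toy sketch. It is not the model from this paper: the function name `apply_insertions`, the `(slot, token)` operation format, and the placeholder tokens are assumptions made for illustration, and a real system would predict each slot and token with a neural network rather than read them from a fixed list.

```python
def apply_insertions(operations):
    """Build a token sequence from (slot, token) insertion operations.

    Each token is inserted so that it lands at index `slot` of the
    current partial hypothesis. This is a hypothetical, simplified
    stand-in for the decisions an insertion-based decoder would make.
    """
    hypothesis = []
    for slot, token in operations:
        if not 0 <= slot <= len(hypothesis):
            raise ValueError(f"slot {slot} is outside the current hypothesis")
        hypothesis.insert(slot, token)
    return hypothesis


# Left-to-right decoding is the special case where every token is appended.
left_to_right = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]

# Insertion-based decoding may emit tokens in any order, for example
# starting from a clearly pronounced token and filling in the rest later.
arbitrary_order = [(0, "c"), (0, "a"), (1, "b"), (3, "d")]

result_ltr = apply_insertions(left_to_right)
result_any = apply_insertions(arbitrary_order)
print(result_ltr)
print(result_any)
```

Both operation lists produce the same final sequence; only the order in which tokens are committed differs, which is what lets an insertion-based decoder use context on both sides of a position.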
automatic speech recognition, end-to-end, insertion-based, connectionist temporal classification, Transformer
Cite this paper
Orken Mamyrbayev, Dina Oralbekova, Mohamed Othman, Tolganay Turdalykyzy, Bagashar Zhumazhanov, Kuralai Mukhsina. (2022) Investigation of Insertion-based Speech Recognition Method. International Journal of Signal Processing, 7, 32-35