Open Access

Authors: R. Madana Mohana, A. Rama Mohan Reddy


Abstract: Spoken Language Identification (SLId) is the process of identifying the language of an utterance from an anonymous speaker, irrespective of gender, pronunciation and accent. In this paper we present an acoustics-based learning model for spoken language identification. The investigation uses Mel Frequency Cepstral Coefficients (MFCC), an acoustic feature representing the short-term power spectrum of sound. The proposed system combines a Gaussian Mixture Model (GMM) with Support Vector Machines (SVM) to handle the multi-class classification problem. The model aims to detect English, Japanese, French, Hindi, and Telugu. A speech corpus was built from speech samples obtained from a variety of online podcasts and audiobooks. The corpus comprises utterances of a uniform duration of 10 seconds. Preliminary results indicate an overall accuracy of 96%. A more comprehensive and rigorous test indicates an overall accuracy of 80%. The proposed acoustic model, combined with learning techniques, thus proves to be a viable approach for Language Identification.
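As an illustration of the mel scale that underlies MFCC features, the following minimal sketch computes the centre frequencies of a mel-spaced filterbank. It uses the commonly cited conversion mel = 2595 log10(1 + f/700). The filter count (26), the frequency range (0 to 8000 Hz), and all function names are assumptions chosen for illustration; the abstract does not state the parameters used by the authors, and this is not their implementation.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels using the common 2595*log10 formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centres(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of n_filters filters equally spaced on the mel scale.

    The two band edges (f_min, f_max) are excluded, so the centres lie
    strictly between them.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical settings: 26 filters covering 0-8000 Hz (assumed, not from the paper).
centres = mel_filter_centres(n_filters=26, f_min=0.0, f_max=8000.0)
```

Because the spacing is uniform in mels rather than in Hz, the centres are packed densely at low frequencies and spread out at high frequencies, which roughly mirrors the frequency resolution of human hearing.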

Keywords: MFCC, Language Identification, SVM, GMM, LongRun technique.

Cite this paper

R. Madana Mohana, A. Rama Mohan Reddy. (2017) SLID: Hybrid Learning Model and Acoustic Approach to Spoken Language Identification using Machine Learning. International Journal of Signal Processing, 2, 183-195.

Creative Commons

Copyright © 2017. The author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.