Speech-to-text for Vietnamese language recognition with DSP & FPGA applications
Abstract
This thesis investigates two methods for Vietnamese speech-to-text recognition: hidden
Markov models (HMMs) and feed-forward multi-layer perceptrons trained by
back-propagation. It also proposes an automatic technique for both training and
recognition. Both methods are studied for speaker-independent isolated-word
recognition on small vocabularies. Mel-frequency cepstral coefficients (MFCCs)
are used to extract speech signal
features. Because the neural network recognizer requires a fixed number of inputs, we propose a simple method that converts the variable-length feature vector of an isolated word into a fixed-size one. The resulting features are used to train the recognition system. The same routine is applied to the speech signal during the recognition stage, and each unknown test pattern is assigned to the nearest pattern. The system is analyzed, designed, prototyped, and tested in MATLAB, then implemented on a DSP (TMS320C6711) and an FPGA (Virtex II Pro) as a speaker-independent isolated-word recognizer.
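
As an illustration of the fixed-size conversion and nearest-pattern classification mentioned above, the following is a minimal Python sketch. It does not reproduce the method developed in this thesis. It assumes one common approach: the frames of a word are split into a fixed number of segments and averaged within each segment. The function names, the segment count, and the word labels are hypothetical.

```python
def to_fixed_size(frames, num_segments=5):
    """Map a variable-length list of feature frames to one fixed-size vector.

    Assumption: frames are split into num_segments contiguous segments and
    averaged per segment. This is an illustrative choice only.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    fixed = []
    for s in range(num_segments):
        # Segment boundaries; every segment keeps at least one frame,
        # so words shorter than num_segments frames reuse frames.
        start = (s * n) // num_segments
        end = max(start + 1, ((s + 1) * n) // num_segments)
        segment = frames[start:end]
        for d in range(dim):
            fixed.append(sum(f[d] for f in segment) / len(segment))
    return fixed


def nearest_pattern(vector, templates):
    """Return the label whose reference vector is closest to `vector`.

    templates: dict mapping a label to a list of fixed-size vectors.
    Squared Euclidean distance is used (an assumption, not from the thesis).
    """
    best_label, best_dist = None, float("inf")
    for label, refs in templates.items():
        for ref in refs:
            dist = sum((a - b) ** 2 for a, b in zip(vector, ref))
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label
```

With this sketch, a word of 7 frames and a word of 40 frames, each frame holding 12 coefficients, both map to vectors of 5 x 12 = 60 values, so either can be fed to a network with 60 inputs or compared against stored patterns.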