Deep Learning For Speech Analysis And Evaluation
Abstract
Speech analysis involves learning and recognizing the features of an audio clip that can be used to identify the language being spoken. In this work, audio files are converted to spectrograms, and a neural network is then applied to learn and recognize the features of the audio.
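As a rough illustration of this preprocessing step, the sketch below converts a clip to a log-scaled mel spectrogram. The sample rate, mel-band count, and the use of the librosa library are assumptions for illustration only; the abstract does not specify these details.

```python
import numpy as np
import librosa

def audio_to_log_mel(path, sample_rate=16000, duration=10.0, n_mels=128):
    """Load a clip, fix its length to `duration` seconds, and return a
    log-scaled mel spectrogram that a neural network can take as input."""
    # Resample to a fixed rate and keep at most `duration` seconds of audio.
    signal, sr = librosa.load(path, sr=sample_rate, duration=duration)

    # Pad shorter clips with silence so every example has the same shape.
    target_len = int(sample_rate * duration)
    if len(signal) < target_len:
        signal = np.pad(signal, (0, target_len - len(signal)))

    # Compute a mel spectrogram and convert power values to decibels.
    mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)
```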
The main objective of this thesis is to identify the language spoken by the various speakers recorded in the Mozilla Common Voice and VIVOS Corpus datasets. Ten seconds of each utterance were used for analysis. The datasets were then split into training and test sets, and evaluation on the test sets shows an overall accuracy of 99%.
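A minimal sketch of the train/test split, assuming a held-out test set built with scikit-learn's train_test_split; the placeholder arrays, the 80/20 ratio, and the example label encoding are illustrative assumptions, not values stated in the thesis.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the precomputed spectrograms and their
# language labels; a real run would build these from the two corpora.
spectrograms = np.random.rand(1000, 128, 313)   # (clips, mel bands, frames)
labels = np.random.randint(0, 2, size=1000)     # e.g. 0 = English, 1 = Vietnamese

# Hold out a test set; the 80/20 ratio, stratification, and seed are
# illustrative choices only.
X_train, X_test, y_train, y_test = train_test_split(
    spectrograms, labels, test_size=0.2, stratify=labels, random_state=0)
```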