Paper Analysis

Before working on any neural networks (NNs), I wanted to go back over the articles I've posted on this blog to see how they help shape my topic, and to have one reference post to return to instead of jumping between many different blog posts.

Neural Networks Used for Speech Recognition

Pre-processing audio data

Spectrograms

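Spectrograms are useful visual tools for analyzing audio data.

A spectrogram plots frequency over time, with the z-axis (color) representing magnitude.

They are less effective when the same word is spoken at a different time offset or at a different speed (longer or shorter), because the pattern shifts or stretches along the time axis.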
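
To make the "frequency over time, with magnitude as color" idea concrete, here is a minimal sketch of how a spectrogram can be computed. It is not taken from the paper: the function name `spectrogram`, the frame length of 256 samples, the hop of 128 samples, the 8 kHz sample rate, and the 1 kHz test tone are all assumptions chosen for illustration, and NumPy is assumed to be installed.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a (time frames x frequency bins) array of magnitudes."""
    window = np.hanning(frame_len)  # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; rfft keeps only the non-negative frequencies.
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical test signal: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)  # bin centers in Hz
peak_hz = freqs[spec.mean(axis=0).argmax()]      # strongest frequency overall
```

Each row of `spec` is one time slice and each column is one frequency bin, so plotting it as an image with time on one axis gives the usual spectrogram picture. For the pure tone used here, the energy sits in the bin at 1 kHz in every frame.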

Cepstrum & Mel Frequency Cepstrum Coefficients

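The cepstrum is the inverse Fourier transform of the logarithm of a signal's magnitude spectrum (loosely, a Fourier transform applied to the spectrum itself).

The same word tends to keep the same shape in the transform.

It uses fewer data points than a spectrogram.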
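
As a rough illustration of what the cepstrum does, here is a minimal sketch of the real cepstrum (the inverse FFT of the log magnitude spectrum). It is not the paper's code and it is not a full MFCC pipeline: MFCCs additionally pass the spectrum through a mel-scaled filterbank and use a discrete cosine transform, which this sketch leaves out. The function name `real_cepstrum` and the toy signal (a click followed by a quieter echo 50 samples later) are assumptions made for the example, and NumPy is assumed to be installed.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    # Small constant guards against log(0); it is an illustrative choice.
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.ifft(log_magnitude).real

# Hypothetical toy frame: a click at sample 0 and a half-strength echo
# 50 samples later.
n = 512
delay = 50
frame = np.zeros(n)
frame[0] = 1.0
frame[delay] = 0.5

cepstrum = real_cepstrum(frame)
# Skip index 0 and look only at the first half (the second half mirrors it).
peak_quefrency = int(np.argmax(cepstrum[1 : n // 2])) + 1
```

The echo adds a regular ripple to the spectrum, and the cepstrum turns that ripple into a single peak at the echo delay, so `peak_quefrency` comes out as 50. The same mechanism is what lets a cepstrum summarize repeating structure in a speech spectrum with a small number of coefficients.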

Takeaways

Look further into how the cepstrum is used in speech recognition and whether it is supported in TensorFlow/PocketSphinx.

Preprocessing is important for training.

Training

Used MATLAB's Neural Network Toolbox

Used 100 samples for training and 100 samples for testing

Trained a multilayer feedforward network using the backpropagation algorithm

Also features the Oja rule of thumb for estimating how many hidden units may be needed

Hidden units = (Training set size) / (5 * (input units + output units))
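
The rule of thumb above is simple arithmetic, so a short worked example may help. This sketch reads the formula as giving a count of hidden units from the sizes of the input and output layers, which is my interpretation rather than something the paper spells out. The function name `oja_hidden_units` and the layer sizes (8 inputs, 2 outputs) are hypothetical; only the 100 training samples come from the notes above.

```python
def oja_hidden_units(training_set_size, n_inputs, n_outputs):
    """Rule-of-thumb estimate: T / (5 * (N + M))."""
    return training_set_size / (5 * (n_inputs + n_outputs))

# 100 training samples as in the notes; 8 inputs and 2 outputs are
# made-up sizes used only to show the arithmetic.
estimate = oja_hidden_units(100, n_inputs=8, n_outputs=2)

# The rule can return a fraction, so round and keep at least one unit.
hidden_units = max(1, round(estimate))
```

With these numbers the estimate is 100 / (5 * 10) = 2.0, so the sketch settles on 2 hidden units. A larger training set or a smaller network would push the estimate up.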
