Paper Analysis

Before working on any NNs, I wanted to go back over the articles I've posted on this blog to see how they shape my topic, and to have one reference post to come back to instead of jumping between many different blog posts.

Neural Networks Used for Speech Recognition

  • Pre-processing audio data
    • Spectrograms

      • Spectrograms are useful visual tools for analyzing audio data.
      • They plot frequency over time, with the z-axis (color) representing magnitude.
      • Less effective when the same word is said at different times or at different speeds (longer or shorter), since the pattern shifts or stretches along the time axis.
    • Cepstrum & Mel Frequency Cepstrum Coefficients

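      • The cepstrum is the inverse Fourier transform of the log of the magnitude spectrum (loosely, a Fourier transform applied to a spectrogram frame); MFCCs are cepstral coefficients computed on the mel frequency scale.
      • The same word has a similar shape in the transform.
      • Uses fewer data points than a spectrogram.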
    • Take Away
      • Maybe look more into cepstrum usage in speech recognition and whether it's supported in TensorFlow/PocketSphinx
      • Preprocessing is important for training
  • Training
    • Used MATLAB's Neural Network Toolbox
    • Used 100 samples for training and 100 samples for testing
    • Trained a multilayer feed-forward network using the backpropagation algorithm
  • Also features the Oja rule of thumb for estimating how many hidden-layer neurons may be needed
    • Hidden neurons = (training set size) / (5 * (input layer size + output layer size))
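
To make the spectrogram notes above concrete, here is a minimal sketch of how one could be computed with a short-time FFT. It assumes NumPy; the function name, frame length, hop size, and the synthetic 1000 Hz test tone are my own choices for illustration, not anything taken from the paper.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Return (times, freqs, magnitudes) for a 1-D signal using a short-time FFT.

    Assumes len(signal) >= frame_len. frame_len and hop are illustrative defaults.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)                      # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))    # rows: time frames, columns: frequency bins
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate  # frame centers in seconds
    return times, freqs, magnitudes

# Synthetic input (an assumption for the demo): one second of a 1000 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

times, freqs, mags = spectrogram(tone, sample_rate)
peak_hz = freqs[mags.mean(axis=0).argmax()]             # strongest frequency bin on average
print(mags.shape, peak_hz)
```

Each row of `mags` is one time slice, so plotting it as an image (time on one axis, frequency on the other, magnitude as color) gives the picture described in the bullets.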

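For the cepstrum notes, a rough sketch of the plain real cepstrum of a single audio frame, keeping only the first few coefficients as features. This assumes NumPy, and it is not a full MFCC pipeline (that would add a mel filter bank and a DCT). The function names, the choice of 13 coefficients, and the random test frame are assumptions for illustration.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-10):
    """Real cepstrum: inverse FFT of the log magnitude spectrum of one frame."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    log_magnitude = np.log(np.abs(spectrum) + eps)      # eps avoids log(0)
    return np.fft.irfft(log_magnitude, n=len(frame))

def cepstral_features(frame, n_coeffs=13):
    """Keep only the first n_coeffs cepstral coefficients (13 is an illustrative choice)."""
    return real_cepstrum(frame)[:n_coeffs]

# Stand-in for one frame of audio (random noise, purely for the demo).
rng = np.random.default_rng(0)
frame = rng.standard_normal(512)

features = cepstral_features(frame)
print(len(frame), "samples ->", len(features), "coefficients")
```

The truncation step is where the "fewer data points" benefit shows up: a 512-sample frame is reduced to a short feature vector that could be fed to a network.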

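The rule of thumb listed above is simple arithmetic, so here is a small sketch of it as a function. The function and parameter names are hypothetical, and the input and output sizes in the example (3 and 1) are made-up values; only the training set size of 100 comes from the paper notes.

```python
def oja_hidden_size(training_size, n_inputs, n_outputs):
    """Oja rule of thumb: training set size / (5 * (input size + output size))."""
    if n_inputs + n_outputs <= 0:
        raise ValueError("input size + output size must be positive")
    return training_size / (5 * (n_inputs + n_outputs))

# 100 training samples (from the notes); 3 inputs and 1 output are hypothetical.
suggested = oja_hidden_size(100, 3, 1)
print(suggested)  # 5.0
```

The result is only a starting estimate, so in practice it would be rounded and then tuned by experiment.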
