Before working on any neural networks (NNs), I wanted to go back over the articles I've posted on this blog to see how they help shape my topic. This also gives me a single reference post to come back to instead of jumping between many different blog posts.
Neural Networks Used for Speech Recognition
- Pre-processing audio data
- Spectrograms
- Spectrograms are useful visual tools to analyze audio data.
- Plots frequency over time, with the z-axis (color) representing magnitude
- Less effective when the same word is spoken at a different point in time, or stretched longer or shorter
- Cepstrum & Mel Frequency Cepstrum Coefficients
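- The cepstrum is the (inverse) Fourier transform of the log of a spectrum, computed frame by frame across a spectrogram; MFCCs instead apply a discrete cosine transform to log mel-filterbank energies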
- The same word tends to produce a similar shape in the transform
- Uses fewer data points per frame than a spectrogram
- Takeaways
- Maybe look further into cepstrum usage in speech recognition and whether it's supported in TensorFlow/PocketSphinx
- Preprocessing is important for training
- Training
- Used MATLAB's Neural Network Toolbox
- Used 100 samples for training and 100 samples for testing
- Trained a multilayer feedforward network using the backpropagation algorithm
- Also uses the Oja rule of thumb to estimate how many hidden neurons may be needed
- Hidden neurons = (training set size) / (5 * (input neurons + output neurons))
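The preprocessing notes above (spectrogram, cepstrum, MFCCs) can be sketched in NumPy roughly as follows. This is a common textbook MFCC recipe, not the code from the "Neural Networks Used for Speech Recognition" article; the 16 kHz sample rate, the frame length of 400 samples and hop of 160 samples (25 ms and 10 ms at that rate), the 26 mel filters, the 13 kept coefficients, and all function names are assumptions for illustration. `real_cepstrum` follows the inverse-FFT-of-log-magnitude definition, while `mfcc` swaps in a mel filterbank and a DCT.

```python
import numpy as np


def spectrogram(signal, frame_len=400, hop=160):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, frame_len // 2 + 1)


def real_cepstrum(magnitude_frame):
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum."""
    return np.fft.irfft(np.log(magnitude_frame + 1e-10))


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_bins)."""
    n_fft = 2 * (n_bins - 1)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs: power spectrum -> mel filterbank -> log -> DCT."""
    power = spectrogram(signal) ** 2
    fbank = mel_filterbank(n_filters, power.shape[1], sample_rate)
    log_energies = np.log(power @ fbank.T + 1e-10)
    return dct_ii(log_energies, n_coeffs)


if __name__ == "__main__":
    sr = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone, not speech
    print(spectrogram(tone).shape)  # (98, 201): 201 values per frame
    print(mfcc(tone, sr).shape)     # (98, 13): 13 values per frame
```

For the one-second test tone, the spectrogram holds 201 values per frame while the MFCCs keep only 13, which is the data reduction mentioned in the cepstrum notes.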
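The training notes describe MATLAB's Neural Network Toolbox; the following is not the toolbox's code but a hedged NumPy stand-in showing the same ideas: a 100/100 train/test split, the hidden-size rule of thumb with the factor of 5 from the notes, and a one-hidden-layer feedforward network trained with backpropagation. Rounding the rule down to a whole number of neurons (minimum one), the synthetic two-blob data standing in for real audio features, the tanh/softmax activations, the learning rate, the epoch count, and all function names are assumptions for illustration.

```python
import numpy as np


def hidden_neurons(n_train, n_inputs, n_outputs, alpha=5):
    """Rule of thumb: N_h = N_s / (alpha * (N_i + N_o)), rounded down, at least 1."""
    return max(1, n_train // (alpha * (n_inputs + n_outputs)))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_mlp(X, Y, n_hidden, lr=0.1, epochs=2000, seed=0):
    """One-hidden-layer feedforward net trained with full-batch backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (n_hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        p = softmax(h @ W2 + b2)
        losses.append(-np.mean(np.sum(Y * np.log(p + 1e-12), axis=1)))
        # Backward pass: gradients of the mean cross-entropy loss
        d_logits = (p - Y) / X.shape[0]
        dW2 = h.T @ d_logits
        db2 = d_logits.sum(axis=0)
        d_hidden = (d_logits @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ d_hidden
        db1 = d_hidden.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)


def make_data(seed=1):
    """Hypothetical stand-in for extracted features: two 2-D Gaussian blobs."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
    labels = np.repeat([0, 1], 100)
    order = rng.permutation(200)
    return X[order], labels[order]


if __name__ == "__main__":
    X, labels = make_data()
    Y = np.eye(2)[labels]  # one-hot targets
    # 100 samples for training, 100 for testing, as in the notes
    X_train, Y_train, X_test, y_test = X[:100], Y[:100], X[100:], labels[100:]
    n_hidden = hidden_neurons(len(X_train), X.shape[1], Y.shape[1])
    params, losses = train_mlp(X_train, Y_train, n_hidden)
    accuracy = np.mean(predict(params, X_test) == y_test)
    print(f"hidden neurons: {n_hidden}, test accuracy: {accuracy:.2f}")
```

With 100 training samples, 2 inputs and 2 outputs, the rule of thumb gives 100 / (5 * (2 + 2)) = 5 hidden neurons. With only 100 training samples and a larger input vector such as 13 MFCCs, the same formula drops to a single neuron, so it is best treated as a rough starting point.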