So I was a little reluctant to change my plans for a neural net that could both take in audio input and read intent, because researching multilayer networks looked to be quite the undertaking. But it turns out I was just studying a little too broadly and needed to think of a more specific NN for my needs. I had come across all sorts of tutorials that covered how to do speech recognition from scratch using neural networks, which isn't what I was after. But then I came across something tangentially related that actually worked really well with what I wanted to do:
Chat bots.
Chat bots also are machine learning programs that have to determine the intent of the speaker and provide a proper response, right? So why don't I use a similar type of code to just figure out the emotions of the speaker?
So I found this tutorial that covers text classification and reading the intent of a sentence, which is just what I wanted.
Problem To Solve
Can a NN determine emotion/intent? I decided to do a simple test: ask "How was your day?", compile a set of answers mapped to certain emotions, and see if the NN could determine the correct emotion.
The code I set up is a single hidden layer neural network that uses a "bag of words" approach for determining intent. Core problems with this method:
- It can only use what is labeled within a class. It cannot learn what is not explicitly labeled. Initially I wanted it to recognize keywords and treat the rest of the sentence as "neutral". This wasn't possible with this method.
- Very large training sets can distort the output if the classifications are uneven. I ran into this problem and will explain shortly.
Coding
I think since last week I resolved my main issue with installing Python packages, so that's cool. I've decided to switch to using Jupyter Notebook/IPython Notebook since it's very clean and I tested my very first neural network on it.
The tutorial I linked provided some very helpful code I could repurpose for this problem. It uses the Natural Language Toolkit (NLTK) to stem words to their core parts so they can be classified and analyzed more easily.
Training data is compiled as sentences separated into emotion classes.
The sentences (documents) are then run through the stemmer so that all variations of a word are considered for the analysis.
The words are then transformed into an array that the neural network can work with. The network then outputs the class the sentence belongs to (also as an array).
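Here's a rough sketch of those steps. This is not the tutorial's code: the training sentences are made up, and crude_stem is a hypothetical suffix stripper standing in for the NLTK stemmer so the example runs without any extra packages.

```python
# Hypothetical training data: sentences grouped by emotion class.
training_data = [
    {"class": "happy", "sentence": "my day was great"},
    {"class": "happy", "sentence": "feeling wonderful"},
    {"class": "sad", "sentence": "my day was awful"},
    {"class": "sad", "sentence": "feeling terrible"},
]

def crude_stem(word):
    """Very rough stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ly", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(sentence):
    """Lowercase, replace punctuation with spaces, and stem each word."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in sentence.lower())
    return [crude_stem(w) for w in cleaned.split()]

# Sorted so every word and class keeps the same array position.
vocabulary = sorted({w for item in training_data for w in tokenize(item["sentence"])})
classes = sorted({item["class"] for item in training_data})

def bag_of_words(sentence):
    """1 if the vocabulary word appears in the sentence, otherwise 0."""
    tokens = set(tokenize(sentence))
    return [1 if w in tokens else 0 for w in vocabulary]

def class_vector(label):
    """One-hot array marking which class a sentence belongs to."""
    return [1 if c == label else 0 for c in classes]

print(vocabulary)
print(bag_of_words("My day was great!"))
print(class_vector("happy"))
```

Note that the bag only records which words are present, not where they sit in the sentence.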
This time I trained the neural network for around 100,000 iterations, but the percent loss was still 1.97%. I also tried 500,000 iterations, but the percent loss was still at 1.90%. So the NN will still be wrong fairly often, but my data set is small, so I feel like the more thought I put into the data set, the fewer errors there will be.
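For reference, a single hidden layer network of this kind can be sketched in plain Python as below. This is my own minimal version under a few assumptions, not the tutorial's code: the inputs X and outputs Y are tiny made-up arrays, the names train, rate and hidden_size are just for illustration, and the error it prints is the average difference between expected and actual output, which may not be calculated exactly the same way as the percentages above.

```python
import math
import random

random.seed(0)  # fixed seed so runs are repeatable

# Hypothetical bag-of-words inputs (4 vocabulary words) and one-hot outputs (2 classes).
X = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_weights(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def forward(x, w_hidden, w_out):
    """Input -> hidden layer -> output layer, with a sigmoid at each step."""
    hidden = [sigmoid(sum(xi * w_hidden[i][j] for i, xi in enumerate(x)))
              for j in range(len(w_hidden[0]))]
    output = [sigmoid(sum(hj * w_out[j][k] for j, hj in enumerate(hidden)))
              for k in range(len(w_out[0]))]
    return hidden, output

def mean_error(w_hidden, w_out):
    """Average absolute difference between expected and actual outputs."""
    total = 0.0
    for x, y in zip(X, Y):
        _, out = forward(x, w_hidden, w_out)
        total += sum(abs(t - o) for t, o in zip(y, out)) / len(y)
    return total / len(X)

def train(iterations=2000, hidden_size=4, rate=0.5):
    w_hidden = make_weights(len(X[0]), hidden_size)
    w_out = make_weights(hidden_size, len(Y[0]))
    start_error = mean_error(w_hidden, w_out)
    for _ in range(iterations):
        for x, y in zip(X, Y):
            hidden, out = forward(x, w_hidden, w_out)
            # Backpropagation: output deltas first, then hidden deltas.
            d_out = [(t - o) * o * (1 - o) for t, o in zip(y, out)]
            d_hidden = [h * (1 - h) * sum(d_out[k] * w_out[j][k] for k in range(len(d_out)))
                        for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden):
                for k, d in enumerate(d_out):
                    w_out[j][k] += rate * h * d
            for i, xi in enumerate(x):
                for j, d in enumerate(d_hidden):
                    w_hidden[i][j] += rate * xi * d
    return w_hidden, w_out, start_error, mean_error(w_hidden, w_out)

w_hidden, w_out, start_error, end_error = train()
print(f"error before: {start_error:.4f}, after: {end_error:.4f}")
```

Raising iterations gives the network more passes over the same data, which is the knob I was turning above.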
Then I tested the data with text only:
It's clear that multiple classifications are occurring. I had to fine-tune the data to get it to this point. I found "key phrases" worked best, and removing superfluous words is vital. E.g., "my day went so well!" in the training data could make "my day went so poorly" read as happy because the two sentences share so many words (bag of words ignores word order), even if the key word "poorly" triggers some sad classification.
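A quick way to see why that happens is to count the words the two sentences share. This is just an illustrative check, and shared_words is a helper name I made up for it:

```python
def shared_words(a, b):
    """Words two sentences have in common, ignoring order and punctuation."""
    def words(s):
        return {w.strip("!?.,").lower() for w in s.split()} - {""}
    return words(a) & words(b)

training = "my day went so well!"
test = "my day went so poorly"
common = shared_words(training, test)

print(sorted(common))
print(len(common), "of", len(test.split()), "test words also appear in the happy example")
```

Most of the test sentence matches the happy training sentence, so the one word that actually carries the emotion gets outvoted.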
I also reused the Speech Recognition code I worked with last week to take my microphone input and feed it through the classifier after all the testing was done.
Results
Here are my results with the live speech recognition and the class recognition.
Issues
The main thing I had to do was come up with a data set that would produce the proper results. This code can recognize sentences and try to figure out what the intent is, but the original idea was to see if the neural network can recognize intent through words alone:
The answer is no. When I used a training set that was just "good", "great", "mad", "upset", errors like this would occur.
So!
I had to come up with a data set of sentences that could be used to help even this out. Even using phrases like "not good" or "bad news" could produce confused results because of how "bag of words" coding works. So I have to be specific in my keyword usage.
I had to drop words like "very" or "really" in the training data and focus on the core of the sentence that would decide the emotion or intent of it. Words that were shared among classes could trigger both classes, but this also didn't happen all the time.
Having a full sentence like "my day went great!" could also cause the sentence "my day went awful!" to read as happy if "my day went" is only included in the happy classification. So superfluous words have to be removed. Verbs can remain as long as they're important, but if they're universally used they should also be in every classification.
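I did this trimming by hand, but the idea could be sketched in code like this. The FILLER_WORDS list and the key_phrase helper are hypothetical, and the right list would depend on the question being asked:

```python
# Hypothetical filler list for the question "How was your day?".
FILLER_WORDS = {"my", "day", "went", "was", "so", "very", "really"}

def key_phrase(sentence):
    """Keep only the words that are likely to carry the emotion."""
    words = [w.strip("!?.,").lower() for w in sentence.split()]
    return " ".join(w for w in words if w and w not in FILLER_WORDS)

raw_training = {
    "happy": ["my day went great!", "really good"],
    "sad": ["my day went awful!", "very upset"],
}

# Reduce every training sentence to its key phrase before training.
cleaned = {label: [key_phrase(s) for s in sentences]
           for label, sentences in raw_training.items()}
print(cleaned)
```

The alternative to stripping shared words is to put them in every class, so they stop pointing at any one emotion.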
Conclusion
This probably isn't the best way to go about this, but it does respond fairly well with the right training data. It would take a ton of data to get this to apply to more than one question, which may be the real issue with it. However, I could see this being a way I approach the topic, if only to compare it to another method in the future.