Speech Recognition Test

Setup and Issues

This week, I tried out a very simple Python script that records audio from my microphone and converts the speech to text. I didn't try to attach a neural network to it because I was having enough trouble getting the speech recognition itself to work, so I focused purely on that.

I decided to use the SpeechRecognition package because it can take microphone input, read prerecorded audio files, and transcribe both. It doesn't have a CUDA dependency and only needs the PyAudio package for microphone input. There are many features that I would like to play around with in the future, and it actually worked for me, unlike TensorFlow!

But once again, it was surprisingly difficult to get the SpeechRecognition package to work in the virtual environment I had set up. One thing to note for the future: since I use Anaconda to create my envs and work in Python, I have to double-check the channel-specific install commands for SpeechRecognition and PyAudio. A plain "conda install (x)" for some reason doesn't entirely work. The two packages won't recognize each other that way, even if they're in the same environment.

So for the future, the specific commands for those two are here:
conda install -c conda-forge speechrecognition
conda install -c anaconda pyaudio

The other issue is that even when they are properly installed, in IDEs such as PyCharm or Visual Studio Code the SpeechRecognition, PyAudio, and portaudio (which comes with PyAudio) packages still won't recognize each other, even if I've selected the right interpreter and environment. I'm not sure what's going on there, but everything worked fine in the plain Anaconda terminal, so I stuck with that for this test. I imagine the IDEs just weren't actually using the right interpreter somehow. I'll have to troubleshoot that later, because once again it took me days to sort out.
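A quick way to test the "wrong interpreter" guess is to run the same small check in the Anaconda terminal and inside the IDE and compare the output. The sketch below is only an illustration, not part of the original test: the function name environment_report is made up, and it assumes the packages are imported under the names speech_recognition and pyaudio.

```python
import importlib.util
import sys


def environment_report(modules=("speech_recognition", "pyaudio")):
    """Return the interpreter path and whether each module can be found."""
    # sys.executable is the Python binary that is actually running this code.
    report = {"interpreter": sys.executable}
    for name in modules:
        # find_spec returns None when the module is not visible
        # to this interpreter.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If the interpreter path printed in the IDE differs from the one printed in the Anaconda terminal, the IDE is probably not running inside the conda environment where the packages were installed.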

Code

Once again, I stuck to something incredibly simple since the setup itself gave me so much difficulty. However, I actually got it to work, so that's something to build on in the future!

So I used a very simple piece of code:

This part basically sets up the Microphone to start recording automatically when it hears speech. SpeechRecognition's Recognizer can be set up to record for only a limited period of time, or to use a certain energy threshold for what counts as speech when there's background noise. Those are options I should consider if other issues pop up. But for now, I have it set to the defaults, where it records for as long as it hears audio and ends the recording after a short pause of silence (the Recognizer's pause_threshold, 0.8 seconds by default).
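The original snippet is not reproduced here, so the following is only a minimal sketch of what a recording step like this can look like with SpeechRecognition. The function name record_phrase and its parameters are made up for illustration. It assumes a working microphone and PyAudio, and it imports the package inside the function so the file still loads where SpeechRecognition is missing.

```python
def record_phrase(timeout=None, phrase_time_limit=None, pause_threshold=0.8):
    """Record one phrase from the default microphone (illustrative sketch)."""
    # Imported here so this file can be loaded even on a machine where
    # SpeechRecognition is not installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # Seconds of silence that mark the end of a phrase.
    recognizer.pause_threshold = pause_threshold

    with sr.Microphone() as source:
        # Sample the background noise briefly to calibrate the
        # energy threshold before listening.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Listening...")
        # timeout: how long to wait for speech to start (None = forever;
        # sr.WaitTimeoutError is raised if it runs out).
        # phrase_time_limit: maximum length of the recording in seconds.
        audio = recognizer.listen(
            source, timeout=timeout, phrase_time_limit=phrase_time_limit
        )
    return recognizer, audio
```

Calling record_phrase() with no arguments keeps the default behavior described above, while something like record_phrase(phrase_time_limit=10) would cap the recording at ten seconds.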

The second part of the code I used:

This part analyzes the audio using Google's speech recognition service, which requires an internet connection, and converts the speech to text. If I eventually want an offline option, PocketSphinx is available, but it would have to be installed on my own computer.

If the speech could be analyzed, Python would then quickly print the result back out, but if it couldn't, it would print an error message.
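Since the original snippet is not reproduced here, the sketch below only illustrates the transcribe-then-print-or-report-an-error pattern described above. The function name transcribe and the messages are made up, and it assumes the recognizer and audio come from an earlier recording step.

```python
def transcribe(recognizer, audio):
    """Return the recognized text, or None if recognition fails."""
    # Imported here so this file can be loaded even on a machine where
    # SpeechRecognition is not installed yet.
    import speech_recognition as sr

    try:
        # Sends the audio to Google's online service (needs internet).
        text = recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The service answered but could not make out any words.
        print("Could not understand the audio.")
        return None
    except sr.RequestError as error:
        # The service could not be reached (no connection, for example).
        print(f"Could not reach the recognition service: {error}")
        return None

    print(f"You said: {text}")
    return text
```

For the offline route, the Recognizer also has a recognize_sphinx method that could be called in place of recognize_google, provided the PocketSphinx package is installed locally.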

Results

The results were a success, although because I was using the basic Anaconda terminal, I had to copy and paste the code constantly, since I would get errors running a .py file. I think I have to double-check that I have the right virtual environment linked again, but I've been consistently getting errors across multiple IDEs, so who's to say, really.

I think I could have also pasted the entire code to get it to run, but I took it piece by piece for the recording process.

Anyway, the code itself was pretty simple. I think I could find a way to integrate this package with a neural network. But for next week, I would like to look into coding a neural network with hidden layers. I think progressing step by step is the way to go, considering the multitude of troubleshooting problems I had to combat just to get to this point. What a troublesome journey, really.
