Relevant Articles and Answering Questions

What's your elevator one-liner?

I want to build a system that enables real-time facial animation driven by audio input.

Where does this fit in?

My main goal is to give player models in cooperative games expressive faces while their players use voice chat. The same approach could also apply in virtual reality, making communication with other players more engaging.

What sources help to narrow your research and define your issue?

  1. Speech Driven Facial Animation with Temporal GANs (2018)
    Points out that, at the time, there was still no way to transform raw audio directly into video: existing methods focused only on the mouth and failed to affect the rest of the facial expression.
    Also notes the lack of methods suitable for real-time applications.
    Synthesizes video from real photos.
  2. Audio Driven Facial Animation by Joint End-to-End Learning (2017)
    Goals: design a network architecture that handles speech from different speakers, infer from the training data the parts of the facial expression that audio alone can't explain, and build a loss function that keeps the output temporally stable yet responsive.
  3. Audio-Driven Animator-Centric Animation (2018)
    Maps audio to visemes and drives them with JALI (see the sketch after this list).
    Doesn't seem to be focused on full facial animation.
  4. Web-based Live Speech-Driven Lip-Sync (2016)
    Focus is on web-based lip sync, but the method is more speaker dependent, with editable parameters to improve lip-sync accuracy. Unfortunately it doesn't go into full facial animation, though it is heavily focused on the live aspect.
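
To make the viseme idea in sources 3 and 4 concrete: a viseme is the visual mouth shape shared by a group of phonemes. Below is a minimal sketch of the kind of phoneme-to-viseme lookup such a pipeline starts from. The phoneme symbols (ARPAbet), viseme labels, and groupings here are my own illustrative assumptions, not JALI's actual mapping tables.

```python
# Minimal phoneme-to-viseme lookup (illustrative groupings, not JALI's).
# ARPAbet phonemes on the left, made-up viseme labels on the right.
PHONEME_TO_VISEME = {
    # bilabials all share one closed-lips shape
    "P": "MBP", "B": "MBP", "M": "MBP",
    # labiodentals share a teeth-on-lip shape
    "F": "FV", "V": "FV",
    # rounded vowels
    "OW": "ROUND", "UW": "ROUND", "AO": "ROUND",
    # open vowels
    "AA": "OPEN", "AE": "OPEN", "AH": "OPEN",
    # anything not listed falls back to a neutral shape
}

def phonemes_to_visemes(phonemes):
    """Collapse a phoneme sequence into the viseme sequence to animate."""
    visemes = [PHONEME_TO_VISEME.get(p, "NEUTRAL") for p in phonemes]
    # drop consecutive duplicates so we get one keyframe per mouth shape
    return [v for i, v in enumerate(visemes) if i == 0 or v != visemes[i - 1]]

print(phonemes_to_visemes(["HH", "AH", "L", "OW"]))
# ['NEUTRAL', 'OPEN', 'NEUTRAL', 'ROUND']
```

The lookup itself is trivial; the hard parts these papers tackle are getting phonemes out of live audio in the first place and timing/blending the shapes naturally.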
From what I can gather, my research should now be centered on deep learning and neural networks to understand how to build this program! Guess I have a lot of research to do. A rough sketch of what that might look like is below.
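
As a first stab at what "deep learning for this" could mean, here's a minimal sketch of a network that maps a sequence of audio features (MFCCs) to per-frame blendshape weights. Everything here is an assumption for illustration: the MFCC count, layer sizes, and the 51-blendshape target (borrowed from ARKit's face rig) are placeholders, and none of the papers above prescribes this exact architecture.

```python
import torch
import torch.nn as nn

class AudioToFace(nn.Module):
    """Toy recurrent model: MFCC frames in, blendshape weights out."""

    def __init__(self, n_mfcc=26, hidden=128, n_blendshapes=51):
        super().__init__()
        # the LSTM gives the model temporal context across audio frames
        self.rnn = nn.LSTM(n_mfcc, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_blendshapes)

    def forward(self, mfcc):  # mfcc: (batch, frames, n_mfcc)
        out, _ = self.rnn(mfcc)
        # sigmoid squashes each blendshape weight into [0, 1]
        return torch.sigmoid(self.head(out))

model = AudioToFace()
dummy_audio = torch.randn(1, 100, 26)  # 1 clip, 100 frames of MFCCs
weights = model(dummy_audio)           # (1, 100, 51) blendshape curves
print(weights.shape)
```

Source 2's point about a temporally stable loss would show up at training time, e.g. by adding a penalty on frame-to-frame differences in the predicted weights so the face doesn't jitter.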

I think my end game for this term should at least be to find and understand current lip-sync methods before moving on to recognizing emotion and translating it into facial animation.
