LipNet – The most accurate lip reading software

Researchers from the University of Oxford have created the most accurate lip reading software, LipNet.

The software achieved an accuracy of 93.4%, compared with a maximum of only 52% achieved by an expert in the field.

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separate the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end.
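To make the "variable-length sequence of video frames to text" idea a little more concrete, here is a minimal sketch of greedy decoding for a model trained with the connectionist temporal classification (CTC) loss. It is not LipNet's actual code: the function name `ctc_greedy_decode`, the `BLANK` symbol and the toy scores are all assumptions made for illustration. The idea is that the network emits one set of character scores per video frame, and the decoder keeps the best character for each frame, merges consecutive repeats, and drops a special "blank" label.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol standing in for the CTC "blank" label


def ctc_greedy_decode(frame_scores, alphabet):
    """Turn per-frame character scores into text (greedy CTC decoding).

    frame_scores: one list of scores per video frame, one score per
                  entry in `alphabet` (toy stand-in for network output).
    alphabet:     list of labels; must contain BLANK.
    """
    # Step 1: for every frame, keep the label with the highest score.
    best_path = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    # Step 2: merge runs of the same label (one letter spans many frames).
    merged = [label for label, _ in groupby(best_path)]
    # Step 3: drop blanks, which only separate genuinely repeated letters.
    return "".join(label for label in merged if label != BLANK)


if __name__ == "__main__":
    alphabet = [BLANK, "h", "i"]
    # Five frames whose best labels are: h, h, blank, i, i
    frames = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
        [0.1, 0.2, 0.7],
    ]
    print(ctc_greedy_decode(frames, alphabet))  # hi
```

Because repeats are merged and blanks removed, five frames collapse into the two-letter string, which is how the number of frames can differ from the number of characters in the sentence. Greedy decoding is only the simplest way to read out such a model; a real system may use a more elaborate search.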

The scientific explanation is quite difficult for us "ordinary mortals" to understand, but it seems clear that LipNet could serve as an extraordinary tool for people with hearing impairments. The software does not analyze the recording word by word; it processes the entire sentence, using deep learning to decipher each word in context. Even for people with hearing impairments who already know how to read lips, it could improve their understanding of those around them. And those without lip-reading skills would no longer have trouble interacting with a person who does not know sign language.