Bubbles 😄

Deep-Learning and Speech

Machine learning and speech recognition

Machine learning and speech recognition have advanced significantly in the last few years, particularly for well-documented and widely spoken languages such as English and Standard Chinese. Most ASR (Automatic Speech Recognition) systems have relied on Hidden Markov Models or Dynamic Time Warping, but more recently deep-learning-based systems have achieved far lower word error rates and consistently strong results.
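Word error rate, the metric mentioned above, is just the word-level edit distance between a reference transcript and a system's hypothesis, normalized by the reference length. A minimal sketch (the function name `wer` and the example sentences are illustrative, not from any particular toolkit):

```python
def wer(reference, hypothesis):
    """Word error rate: minimum number of word substitutions,
    insertions, and deletions needed to turn the hypothesis into
    the reference, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Standard dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 deletions / 6 words
```

A perfect transcript scores 0; dropping two of six reference words, as in the example, scores 2/6 ≈ 0.33.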

Deep Learning (From Wikipedia)

Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised.

Deep learning architectures such as deep neural networks, deep belief networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases superior to human experts.
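Of the architectures listed above, recurrent neural networks are the ones most associated with speech, because they process a sequence of acoustic frames one step at a time while carrying a hidden state. A minimal sketch of a vanilla RNN forward pass over fake feature frames (the dimensions, random weights, and frame data are all assumptions for illustration, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed): 13 MFCC-like features per audio frame,
# 8 hidden units. Weights are random, i.e. untrained.
n_in, n_hid = 13, 8
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))   # input-to-hidden
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # hidden-to-hidden
b_h = np.zeros(n_hid)

def rnn_forward(frames):
    """Run a vanilla RNN over a sequence of feature frames,
    returning the hidden state after each step."""
    h = np.zeros(n_hid)
    states = []
    for x in frames:
        # Each new state mixes the current frame with the previous state,
        # which is how the network accumulates context over time.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.array(states)

frames = rng.normal(size=(20, n_in))  # 20 frames of fake acoustic features
states = rnn_forward(frames)
print(states.shape)  # one hidden vector per frame: (20, 8)
```

In a real ASR system these per-frame states would feed a classifier over phonemes or characters; the point here is only the recurrence itself.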

Deep learning models are vaguely inspired by information processing and communication patterns in biological nervous systems, yet they differ in various structural and functional properties from biological brains (especially human brains), which makes them hard to reconcile with the neuroscientific evidence.

What are some of the areas in which ASR supported by Deep Learning can improve?

  1. Recognition in understudied, under-documented, and endangered languages.
  2. Modeling the prosody of speech across languages.
  3. Recognition of emotion.