Speech Recognition

Multimodal Grounding for Sequence-to-sequence Speech Recognition

Humans are capable of processing speech by making use of multiple sensory modalities. For example, the environment where a conversation takes place generally provides semantic and/or acoustic context that helps us to resolve ambiguities or to recall …