Conference papers

Investigating Self-supervised Pre-training for End-to-end Speech Translation

Abstract : Self-supervised learning from raw speech has been shown to improve automatic speech recognition (ASR). Here we investigate its impact on end-to-end automatic speech translation (AST) performance. We use a contrastive predictive coding (CPC) model pre-trained on unlabeled speech as a feature extractor for a downstream AST task. We show that self-supervised pre-training is particularly efficient in low-resource settings and that fine-tuning CPC models on the AST training data further improves performance. Even in higher-resource settings, ensembling AST models trained with filter-bank and CPC representations leads to near state-of-the-art models without using any ASR pre-training. This might be particularly beneficial when one needs to develop a system that translates from speech in a language with poorly standardized orthography or even from speech in an unwritten language.

Index Terms: self-supervised learning from speech, automatic speech translation, end-to-end models, low-resource settings.
Document type :
Conference papers

Cited literature [36 references]

https://hal.archives-ouvertes.fr/hal-02962186
Contributor : Laurent Besacier
Submitted on : Friday, October 9, 2020 - 9:01:11 AM
Last modification on : Tuesday, November 24, 2020 - 4:00:18 PM

File

Paper_Template_for_INTERSPEECH...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02962186, version 1

Citation

Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Yannick Estève, Laurent Besacier. Investigating Self-supervised Pre-training for End-to-end Speech Translation. Interspeech 2020, Oct 2020, Shanghai (Virtual Conf), China. ⟨hal-02962186⟩
