SPIRAL (Self-supervised Perturbation-Invariant Representation Learning) is a speech pre-training method. It learns denoising representations of perturbed data in a teacher-student framework. It achieves results competitive with or better than wav2vec 2.0 while reducing training cost by 80% for the BASE model and 65% for the LARGE model. Published at ICLR 2022.
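
As a rough illustration of the teacher-student idea described above, the sketch below feeds a perturbed input to a student encoder and the clean input to a teacher encoder, then measures how far apart their per-frame representations are. This summary does not state the loss or how the teacher is updated, so the Gaussian-noise perturbation, the toy one-layer encoder, the cosine-based loss, and the exponential-moving-average teacher update are all assumptions made for illustration, and every function name is hypothetical. It is not the authors' implementation.

```python
import numpy as np

def perturb(x, rng, noise_std=0.1):
    """Add Gaussian noise (an assumed stand-in for real speech perturbations)."""
    return x + noise_std * rng.standard_normal(x.shape)

def encode(weights, x):
    """Toy one-layer encoder producing one representation per frame."""
    return np.tanh(x @ weights)

def ema_update(teacher_w, student_w, momentum=0.99):
    """Move teacher weights slightly toward the student (assumed update rule)."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

def invariance_loss(student_repr, teacher_repr):
    """Mean (1 - cosine similarity) between matching frames (assumed loss)."""
    s = student_repr / np.linalg.norm(student_repr, axis=-1, keepdims=True)
    t = teacher_repr / np.linalg.norm(teacher_repr, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 16))    # 50 frames x 16 features, toy data
student_w = rng.standard_normal((16, 8))  # hypothetical encoder weights
teacher_w = student_w.copy()              # teacher starts as a copy of the student

# Identical weights and identical input: the loss is (numerically) zero.
clean_loss = invariance_loss(encode(student_w, frames), encode(teacher_w, frames))

# Student sees the perturbed input, teacher sees the clean one: the loss grows.
noisy_loss = invariance_loss(encode(student_w, perturb(frames, rng)),
                             encode(teacher_w, frames))
```

Training would lower `noisy_loss` by updating the student, so that its output for perturbed speech matches the teacher's output for clean speech. Under the assumed update rule, `ema_update` would then refresh the teacher after each student step.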

Outputs 2

SPIRAL

model

Variants

Name Parameters Notes
SPIRAL-base
SPIRAL-Large

SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training

paper

arXiv: 2201.10207

audio, self-supervised, training, open-source