Precision Varying Prediction - Robustifying ASR systems against adversarial attacks

A selection of benign and adversarial data employed in our experiments.

This project is maintained by blindconf

Adversarial example demo

Supplementary material containing a selection of benign, adversarial, and noisy data employed in our [paper].

For each sample, we report the word error rate (WER) as an accuracy metric and the segmental signal-to-noise ratio (SNRseg) as a measure of how much noise the perturbation adds. An SNRseg above 0 dB means the signal is stronger than the noise, and higher values mean a less audible perturbation. All samples are taken from the Librispeech corpus.
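As a rough illustration of the two metrics, here is a minimal sketch in plain Python. It is not the code used for the paper. The function names `word_error_rate` and `snr_seg`, the frame length of 256 samples, and the choice to skip frames with zero signal or noise energy are assumptions made for this example; the exact SNRseg definition used in the experiments may differ (for instance in frame size or in how per-frame values are clipped).

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def snr_seg(clean, perturbed, frame_len=256):
    """Mean per-frame SNR in dB between a clean signal and its perturbed copy.

    frame_len is a hypothetical default, not a value taken from the paper.
    """
    if len(clean) != len(perturbed):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        x = clean[start:start + frame_len]
        y = perturbed[start:start + frame_len]
        signal_energy = sum(a * a for a in x)
        noise_energy = sum((b - a) ** 2 for a, b in zip(x, y))
        # Assumption: frames with no signal or no noise are skipped,
        # because their SNR in dB is undefined.
        if signal_energy > 0 and noise_energy > 0:
            frame_snrs.append(10 * math.log10(signal_energy / noise_energy))
    if not frame_snrs:
        raise ValueError("no frame with non-zero signal and noise energy")
    return sum(frame_snrs) / len(frame_snrs)
```

With these definitions, a transcription that matches its reference word for word has a WER of 0, and each substituted, inserted, or deleted word raises the score by one over the reference length.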

Librispeech

Sample 1 - FP32
Benign transcription:       ROBIN FITZOOTH
Adversarial transcription:  AND ONE MORE THIS MORNING

[Benign: WER=0.00]

[C&W adversarial: WER=0.00, SNRseg=24.50],   [Psychoacoustic adversarial: WER=0.00, SNRseg=25.36]

Sample 2 - FP16
Benign transcription:       WILL YOU FORGIVE ME NOW
Adversarial transcription:  PAUL STICKS TO HIS THEME

[Benign: WER=0.00]

[C&W adversarial: WER=0.00, SNRseg=22.04],   [Psychoacoustic adversarial: WER=0.00, SNRseg=22.95]

Sample 3 - BF16
Benign transcription:       IT WILL BE NO DISAPPOINTMENT TO ME
Adversarial transcription:  AH VERY WELL

[Benign: WER=0.00]

[C&W adversarial: WER=0.00, SNRseg=8.86],   [Psychoacoustic adversarial: WER=0.00, SNRseg=11.23]