Synthetic Audio Example Demo

This page provides supplementary material: a selection of synthetic audio samples generated by the models used in our experiments (paper).
Real audio samples are derived from the LJ Speech Dataset.

Experiment Summary

We investigate open-world single-model attribution, that is, deciding whether a speech clip was produced by one specific target synthesis system, using Residual Statistical Fingerprints (RSFs).
RSFs achieve near-perfect AUROC (≈1.0) when distinguishing a target synthesis system from unseen generative models and from real speech, indicating strong generalization.
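AUROC here summarizes how well a per-clip attribution score separates clips from the target system (positives) from clips produced by other generators or real speech (negatives). The sketch below computes only the metric, assuming that a higher score means "more likely the target system"; the scoring function itself (RSF-based or otherwise) is not shown, and the helper name `auroc` is illustrative.

```python
import numpy as np

def auroc(target_scores, nontarget_scores):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly chosen target clip scores higher than a randomly chosen
    non-target clip, with ties counted as one half."""
    t = np.asarray(target_scores, dtype=float)[:, None]     # column of positives
    n = np.asarray(nontarget_scores, dtype=float)[None, :]  # row of negatives
    # Compare every positive against every negative via broadcasting.
    wins = (t > n).sum() + 0.5 * (t == n).sum()
    return wins / (t.size * n.size)
```

An AUROC of 1.0 means every target clip outscores every non-target clip, while 0.5 corresponds to chance-level separation. For large evaluation sets, a rank-based implementation avoids the quadratic pairwise comparison.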

Under realistic audio perturbations (such as noise, echo, and compression), RSFs maintain high attribution accuracy. When perturbations are severe, performance can be effectively restored through simple data augmentation during RSF construction.

Real Audio Sample: LJ001-0001.wav

Transcription: Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition.

Synthetic Audio Samples:

MelGAN Large:

Avocodo:

BigVGAN:

HiFi-GAN:

Multi-band MelGAN:

Parallel WaveGAN:

WaveGlow:

Harmonic-plus-Noise Source-Filter:

FastDiff:

ProDiff:

Audio Corruption Effects:

Echo (strength 0.5, 500 ms delay):
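The echo condition can be pictured as a single delayed, attenuated copy of the signal added back to the original. This is a minimal sketch assuming a single-tap feedforward echo with the stated strength (0.5) and delay (500 ms). The exact echo model and any normalization used in the experiments are not specified on this page, and `add_echo` is a hypothetical helper.

```python
import numpy as np

def add_echo(x, sr, strength=0.5, delay_s=0.5):
    """Single-tap feedforward echo (assumed model):
    y[n] = x[n] + strength * x[n - D], with D = round(delay_s * sr) samples.
    The output is D samples longer than the input so the echo tail is kept."""
    x = np.asarray(x, dtype=float)
    d = int(round(delay_s * sr))   # delay in samples
    y = np.zeros(len(x) + d)
    y[: len(x)] += x               # direct (dry) signal
    y[d:] += strength * x          # delayed, attenuated copy
    return y
```

At LJ Speech's 22,050 Hz sample rate, a 500 ms delay corresponds to 11,025 samples. Because the added copy can push peaks beyond the original amplitude range, a real pipeline would likely rescale or clip the output before writing it to disk.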

Background Noise (average SNR 17.96 dB):
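Only the average SNR (17.96 dB) is reported here; the noise sources and per-clip SNRs are not stated. A minimal sketch of mixing noise into speech at a chosen SNR, assuming SNR is defined as the ratio of mean speech power to mean noise power over the whole clip. `mix_at_snr` is a hypothetical helper name.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it.
    Power is the mean squared amplitude; `noise` is tiled or trimmed to the
    speech length with np.resize."""
    speech = np.asarray(speech, dtype=float)
    noise = np.resize(np.asarray(noise, dtype=float), speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Solving 10·log10(P_speech / (g²·P_noise)) = SNR for the gain g gives the `scale` expression above, so the mixed clip hits the requested SNR exactly (up to floating-point error).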

Reverberation:

MP3 Compression:

Real Audio Sample: LJ001-0002.wav

Transcription: In being comparatively modern.