AudioSeal is based on peer-reviewed research published at ICML 2024. This page provides information about the paper, citations, and related work.

Paper Details

Publication Information

Proactive Detection of Voice Cloning with Localized Watermarking

Authors: Robin San Roman, Pierre Fernandez, Hady Elsahar, Alexandre Défossez, Teddy Furon, Tuan Tran
Conference: International Conference on Machine Learning (ICML) 2024
Acceptance: May 31, 2024
arXiv: 2401.17264

arXiv Paper

Read the full paper on arXiv

Project Webpage

Interactive demos and visualizations

Official Blog

Meta AI announcement and overview

Press Coverage

MIT Technology Review article

Abstract

AudioSeal introduces an audio watermarking technique built on localized watermarking and a new perceptual loss. The method jointly trains two components:
  1. Generator: Embeds an imperceptible watermark into audio
  2. Detector: Identifies watermark fragments in long or edited audio files

Key Innovations

AudioSeal performs watermarking at the sample level (1/16,000 of a second at a 16 kHz sampling rate), enabling precise detection even in heavily edited audio. This localized approach identifies which specific segments of audio contain watermarks, making it robust against:
  • Audio splicing and editing
  • Concatenation with non-watermarked audio
  • Partial audio extraction
The model works well with multiple sampling rates, including 16 kHz, 24 kHz, 44.1 kHz, and 48 kHz.
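
As a rough illustration of what sample-level localization makes possible, the sketch below turns a sequence of per-sample detection probabilities into time spans that appear to be watermarked. The function names, the 0.5 threshold, and the plain-list input are assumptions made for this example; this is not the AudioSeal API, only a minimal post-processing sketch.

```python
from typing import List, Tuple


def watermarked_segments(
    probs: List[float],
    sample_rate: int = 16_000,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return (start_seconds, end_seconds) spans whose per-sample
    probability is above `threshold`. Hypothetical helper, not part of AudioSeal."""
    segments = []
    start = None  # index of the first sample in the current span, if any
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i  # a watermarked span begins here
        elif p <= threshold and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None  # the span ended just before sample i
    if start is not None:
        # The audio ends while still inside a watermarked span.
        segments.append((start / sample_rate, len(probs) / sample_rate))
    return segments


def watermarked_fraction(probs: List[float], threshold: float = 0.5) -> float:
    """Fraction of samples flagged as watermarked (0.0 for empty input)."""
    if not probs:
        return 0.0
    return sum(p > threshold for p in probs) / len(probs)


# Two seconds of audio at 16 kHz where only 1.0 s to 1.5 s is watermarked,
# e.g. a watermarked clip spliced into non-watermarked audio.
probs = [0.1] * 16_000 + [0.9] * 8_000 + [0.2] * 8_000
print(watermarked_segments(probs))  # [(1.0, 1.5)]
print(watermarked_fraction(probs))  # 0.25
```

Because each sample gets its own score, a spliced-in or partially extracted clip shows up as a bounded span rather than a single yes/no answer for the whole file.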

AudioSeal uses a novel perceptual loss function that keeps watermarks imperceptible to human listeners while maintaining detectability. The watermarking process:
  • Has minimal impact on audio quality
  • Preserves the naturalness of speech and music
  • Maintains audio fidelity across different content types

The model demonstrates state-of-the-art robustness against various audio manipulations:
  • Compression: MP3, AAC, Opus at various bitrates
  • Re-encoding: Multiple encode-decode cycles
  • Noise addition: Background noise, distortion
  • Re-sampling: Sample rate conversions
  • Speed changes: Time stretching and compression
  • Filtering: Low-pass, high-pass, band-pass filters

AudioSeal achieves detection speeds two orders of magnitude faster than existing models through:
  • Single-pass detection architecture
  • Efficient neural network design
  • Optimized inference pipeline
  • Real-time processing capabilities
This makes AudioSeal ideal for large-scale and real-time applications where millions of audio files need to be processed.

Citation

If you use AudioSeal in your research, please cite:
@article{sanroman2024proactive,
  title={Proactive Detection of Voice Cloning with Localized Watermarking},
  author={San Roman, Robin and Fernandez, Pierre and Elsahar, Hady and D\'efossez, Alexandre and Furon, Teddy and Tran, Tuan},
  journal={ICML},
  year={2024}
}
Please use this citation format in academic papers, technical reports, and publications that build upon or evaluate AudioSeal.

Key Contributions

The paper makes several significant contributions to the field of audio watermarking:

1. Novel Architecture

  • First localized audio watermarking system operating at sample-level precision
  • Joint training of generator and detector for optimal performance
  • Efficient neural network design enabling real-time processing

2. Perceptual Loss Function

  • Custom loss function balancing imperceptibility and robustness
  • Multi-scale perceptual evaluation
  • Quality preservation across diverse audio content

3. Optional Message Embedding

  • Support for 16-bit secret messages (65,536 possible values)
  • Message embedding without affecting detection performance
  • Useful for model versioning and content tracking
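
To make the 16-bit message concrete, the sketch below packs an integer identifier (for example, a model version) into 16 bits, recovers it, and measures bit accuracy between a sent and a received message. The helper names and the most-significant-bit-first ordering are assumptions for illustration only; they are not taken from the AudioSeal code.

```python
from typing import List

MESSAGE_BITS = 16  # 2 ** 16 = 65,536 possible messages


def int_to_bits(value: int, n_bits: int = MESSAGE_BITS) -> List[int]:
    """Encode an integer as a list of bits, most significant bit first
    (the bit order is an assumption of this sketch)."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError(f"value must fit in {n_bits} bits")
    return [(value >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]


def bits_to_int(bits: List[int]) -> int:
    """Decode a most-significant-bit-first list of bits back to an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | (bit & 1)
    return value


def bit_accuracy(sent: List[int], received: List[int]) -> float:
    """Fraction of bit positions where the two messages agree."""
    if len(sent) != len(received):
        raise ValueError("messages must have the same length")
    return sum(s == r for s, r in zip(sent, received)) / len(sent)


model_version = 5  # hypothetical identifier to embed
message = int_to_bits(model_version)
print(message)               # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
print(bits_to_int(message))  # 5

# Simulate one bit flipped during decoding.
corrupted = message.copy()
corrupted[0] ^= 1
print(bit_accuracy(message, corrupted))  # 0.9375
```

With only 65,536 values available, the message suits compact identifiers such as a version number or a lookup key rather than free-form metadata.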

4. Comprehensive Evaluation

  • Extensive robustness testing against common attacks
  • Comparison with state-of-the-art methods
  • Speed benchmarks demonstrating 100x improvement

5. Open Source Release

  • Full implementation released under MIT license
  • Pre-trained models on Hugging Face Hub
  • Training code and evaluation tools provided

Related Work

The AudioSeal team has also developed other open-source watermarking solutions for different media types:

WMAR

Autoregressive watermarking for images
Advanced image watermarking using autoregressive models for imperceptible and robust watermark embedding.

Video Seal

Open and efficient video watermarking
Extends watermarking techniques to video content with temporal consistency and efficient processing.

WAM

Watermark Anything with Localization
General-purpose image watermarking framework with localization capabilities.
These projects share similar design philosophies emphasizing robustness, imperceptibility, and open-source availability.

Use Cases

The research enables several practical applications:

Voice Cloning Detection

Proactively detect AI-generated voice clones by watermarking synthetic speech at generation time.

Content Authentication

Verify the authenticity of audio recordings by checking for watermarks embedded by trusted sources. Protect audio content from unauthorized distribution while maintaining audio quality.

Model Version Tracking

Embed model version information in generated audio for traceability and accountability.

Forensic Analysis

Identify which portions of edited audio contain watermarks for forensic investigations.

Press and Media Coverage

MIT Technology Review

“Meta has created a way to watermark AI-generated speech”
June 18, 2024
In-depth coverage of AudioSeal’s technology and implications for AI-generated content detection.

Updates and Timeline

  1. January 2024: Initial paper submitted to arXiv (2401.17264)
  2. April 2024: License updated to full MIT license for code and model weights, enabling commercial use
  3. May 2024: Paper accepted at ICML 2024
  4. June 2024: Training code released with comprehensive documentation
  5. December 2024: AudioSeal 0.2 released with streaming support and improvements

Technical Resources

For researchers and developers: when using AudioSeal for research, cite the paper and acknowledge the use of pre-trained models from Meta AI.

Contact and Collaboration

The research team welcomes research collaborations, questions about the paper, and technical discussions, as well as contributions, bug reports, and suggestions for improvements. See the Contributing Guide for details.