Overview
AudioSeal is a neural audio watermarking system that jointly trains two deep learning models:
- Generator: Embeds imperceptible watermarks into audio signals
- Detector: Identifies watermarked segments with sample-level precision
Architecture
SEANet Encoder-Decoder Foundation
Both the generator and detector are built on the SEANet (Sound Enhancement Audio Network) architecture, which provides efficient audio processing through:
Key Architecture Features
- Residual Blocks: SEANetResnetBlock components with dilated convolutions
- Strided Convolutions: Efficient temporal downsampling using configurable ratios (e.g., [8, 5, 4, 2])
- Streaming Support: Maintains convolutional cache for real-time processing
- Causal Processing: Optional causal convolutions for streaming applications
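The streaming and causal-processing features can be illustrated with a toy example. The sketch below is not AudioSeal's code: it is a pure-Python causal 1-D convolution in which the names `causal_conv` and `StreamingCausalConv` are hypothetical. It assumes the cache simply holds the last `kernel_size - 1` input samples, which is enough for chunk-by-chunk output to match processing the whole signal at once.

```python
from typing import List


def causal_conv(signal: List[float], kernel: List[float]) -> List[float]:
    """Causal 1-D convolution: output[t] depends only on signal[t-k+1 .. t]."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(signal)  # left-pad so no future samples are used
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]


class StreamingCausalConv:
    """The same convolution, fed chunk by chunk with a cache of past inputs."""

    def __init__(self, kernel: List[float]) -> None:
        self.kernel = list(kernel)
        # The cache starts out as the zero left-padding of the offline version.
        self.cache = [0.0] * (len(kernel) - 1)

    def process(self, chunk: List[float]) -> List[float]:
        k = len(self.kernel)
        padded = self.cache + list(chunk)
        out = [
            sum(self.kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(chunk))
        ]
        # Keep the most recent k-1 inputs for the next chunk.
        self.cache = padded[len(padded) - (k - 1):]
        return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [0.5, -1.0, 2.0]

offline = causal_conv(signal, kernel)
stream = StreamingCausalConv(kernel)
online = stream.process(signal[:3]) + stream.process(signal[3:4]) + stream.process(signal[4:])
print(online == offline)  # True
```

Because each output sample looks only backwards in time, the cache is the only state that has to survive between chunks, which is what makes real-time processing possible.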
Architecture Parameters
The SEANet architecture is configured through:
- n_filters: Base channel width (typically 32)
- dimension: Hidden representation size (typically 128)
- n_residual_layers: Depth of residual processing (typically 3)
- ratios: Temporal compression factors
- kernel_size: Convolution window sizes
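To make the effect of `ratios` concrete, here is a minimal sketch that collects the parameters above in a hypothetical `SEANetConfig` dataclass (not a class from the AudioSeal codebase) and derives the total temporal downsampling. The defaults use the typical values quoted in the text; the `kernel_size` default of 7 is an assumption, since no typical value is given.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class SEANetConfig:
    """Hypothetical container for the SEANet parameters listed above."""

    n_filters: int = 32
    dimension: int = 128
    n_residual_layers: int = 3
    ratios: List[int] = field(default_factory=lambda: [8, 5, 4, 2])
    kernel_size: int = 7  # assumed value; the text gives no typical kernel size

    @property
    def hop_length(self) -> int:
        """Input samples per encoder frame: the product of all stride ratios."""
        return prod(self.ratios)

    def frame_rate(self, sample_rate: int) -> float:
        """Encoder frames per second at the given sample rate."""
        return sample_rate / self.hop_length


config = SEANetConfig()
print(config.hop_length)          # 320
print(config.frame_rate(16_000))  # 50.0
```

With ratios of [8, 5, 4, 2], a strided encoder compresses every 320 input samples into one frame, or 50 frames per second at 16 kHz.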
Generator Architecture
The watermark generator (AudioSealWM class) consists of three main components:
Detector Architecture
The watermark detector (AudioSealDetector class) uses SEANetEncoderKeepDimension to preserve temporal resolution, enabling localized detection at every audio frame.
Training Methodology
AudioSeal uses a joint training approach with several key innovations:
1. Joint Training
Generate Watermark
The generator creates a watermark signal from clean audio and an optional message
Apply Augmentations
Random audio transformations simulate real-world edits (compression, noise, etc.)
2. Perceptual Loss Function
The training uses a novel perceptual loss that balances multiple objectives:
- Perceptual Similarity: Ensures watermarked audio sounds identical to the original
- Detection Loss: Maximizes detector confidence on watermarked audio
- Message Decoding Loss: Ensures accurate message recovery when present
- Robustness Loss: Maintains detection after augmentations (compression, noise, resampling)
The perceptual loss is designed to ensure watermarks are imperceptible while remaining detectable and robust to audio transformations.
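One common way to balance several objectives like these is a weighted sum. The sketch below assumes that form; the function name `total_loss`, the dictionary keys, and every number are illustrative placeholders, not values or code from AudioSeal training.

```python
from typing import Dict


def total_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the individual training objectives."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Illustrative numbers only; not values from AudioSeal training.
example_losses = {
    "perceptual": 0.20,
    "detection": 0.50,
    "message": 0.30,
    "robustness": 0.40,
}
example_weights = {
    "perceptual": 10.0,  # a larger weight pushes harder toward imperceptibility
    "detection": 1.0,
    "message": 1.0,
    "robustness": 1.0,
}

combined = total_loss(example_losses, example_weights)
print(combined)
```

Raising the perceptual weight trades detection strength for audio quality, and lowering it does the opposite, which is the balance the loss is meant to control.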
3. Training Data
AudioSeal is trained on large-scale speech datasets:
- VoxPopuli: 400K hours of unlabeled speech data
- Sample Rate: 16 kHz (with support for 24 kHz, 44.1 kHz, 48 kHz)
- Augmentations: AAC compression, MP3 compression, additive noise, resampling, time stretching
Message Embedding
The optional message embedding system allows encoding up to 65,536 unique identifiers (2^16):
- Takes a binary message of shape (batch, 16)
- Uses an embedding layer to map each bit to a hidden vector
- Adds the message representation to the encoder output
- The decoder then generates a watermark that encodes this message
The message is optional and does not affect detection. It can be used to identify model versions, track audio sources, or embed metadata.
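The message steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's implementation: the helper names (`int_to_bits`, `bits_to_int`, `embed_message`), the most-significant-bit-first ordering, and the table layout (row `2*i + bit` for bit position `i`) are all hypothetical, and the random table stands in for a learned embedding layer.

```python
import random
from typing import List

NBITS = 16  # message length stated in the text


def int_to_bits(identifier: int, nbits: int = NBITS) -> List[int]:
    """Encode an integer as a most-significant-bit-first list of 0/1 values."""
    if not 0 <= identifier < 2 ** nbits:
        raise ValueError(f"identifier must be in [0, {2 ** nbits - 1}]")
    return [(identifier >> shift) & 1 for shift in range(nbits - 1, -1, -1)]


def bits_to_int(bits: List[int]) -> int:
    """Inverse of int_to_bits."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def embed_message(bits: List[int], table: List[List[float]]) -> List[float]:
    """Sum one table row per bit; row 2*i + bit means 'bit i has this value'."""
    out = [0.0] * len(table[0])
    for i, bit in enumerate(bits):
        row = table[2 * i + bit]
        out = [a + b for a, b in zip(out, row)]
    return out


rng = random.Random(0)
hidden_size = 8  # tiny for illustration; stands in for the hidden dimension
table = [[rng.gauss(0.0, 1.0) for _ in range(hidden_size)] for _ in range(2 * NBITS)]

bits = int_to_bits(42)
print(bits)               # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]
print(bits_to_int(bits))  # 42
message_vector = embed_message(bits, table)  # would be added to the encoder output
```

Sixteen bits give 2^16 = 65,536 distinct identifiers, so any integer from 0 to 65,535 round-trips through `int_to_bits` and `bits_to_int`.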
Performance Characteristics
Detection Speed
Up to two orders of magnitude faster than competing methods, enabling real-time processing
Robustness
Survives compression, re-encoding, noise addition, and various audio edits
Quality
Minimal perceptual impact on audio quality
Localization
Sample-level precision (1/16,000 second at 16 kHz)
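Sample-level output is usually easier to use once it is converted to time spans. The following sketch assumes the detector yields one probability per audio sample; the function `watermarked_segments` and the 0.5 threshold are hypothetical choices for illustration, not part of the AudioSeal API.

```python
from typing import List, Tuple


def watermarked_segments(
    probs: List[float], sample_rate: int = 16_000, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where the per-sample probability exceeds threshold."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i  # a watermarked run begins at sample i
        elif p <= threshold and start is not None:
            # The run ended just before sample i (end is exclusive).
            segments.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:  # the run extends to the end of the signal
        segments.append((start / sample_rate, len(probs) / sample_rate))
    return segments


# One second of audio at 16 kHz: the middle half is flagged as watermarked.
probs = [0.1] * 4_000 + [0.9] * 8_000 + [0.1] * 4_000
print(watermarked_segments(probs))  # [(0.25, 0.75)]
```

Each index maps to a timestamp of `index / sample_rate`, so at 16 kHz the segment boundaries resolve to 1/16,000 of a second.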
Key Innovations
- Localized Watermarking: Unlike traditional methods that watermark entire files, AudioSeal operates at the sample level
- Single-Pass Detection: Fast forward pass through a convolutional network (no iterative decoding)
- Streaming Support: Can process audio in real time using convolutional caching
- Joint Training: Generator and detector are trained together for optimal performance
AudioSeal’s architecture enables it to be both imperceptible and robust, solving a key challenge in audio watermarking.
Next Steps
Watermark Generation
Learn how the generator creates watermarks
Watermark Detection
Understand the detection process
Localized Watermarking
Explore sample-level precision
Training Guide
Train your own model
