AudioSeal is designed to be robust against common audio transformations and attacks. This guide explains what attacks AudioSeal can withstand and how to test robustness.
Overview
AudioSeal watermarks remain detectable even after audio undergoes various modifications, making it suitable for real-world applications where audio may be:
Compressed with lossy codecs
Re-encoded at different bitrates
Mixed with noise
Filtered or equalized
Speed-adjusted or resampled
AudioSeal is trained with augmentation techniques that simulate real-world attacks, making the watermark robust while remaining imperceptible.
Types of Attacks
The attacks below are adapted from the examples/attacks.py file in the AudioSeal repository and fall into these categories:
1. Compression and Re-encoding
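Lossy audio compression discards frequencies considered inaudible, yet AudioSeal watermarks survive it. The snippet below uses a resampling round-trip as a simple proxy for re-encoding. It tests sample-rate conversion only and does not reproduce the artifacts of an actual codec such as MP3: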
```python
import julius
import torch

def updownresample(
    tensor: torch.Tensor,
    sample_rate: int = 16000,
    intermediate_freq: int = 32000,
) -> torch.Tensor:
    """
    Resample to an intermediate rate and back.
    Tests whether the watermark survives sample-rate conversion.
    """
    # Resample to the intermediate rate
    tensor = julius.resample_frac(tensor, sample_rate, intermediate_freq)
    # Resample back to the original rate
    tensor = julius.resample_frac(tensor, intermediate_freq, sample_rate)
    return tensor

# Test robustness (assumes `generator`, `detector`, and `audio` are set up
# as in the complete example under "Testing Robustness")
watermarked = generator(audio, alpha=1.0)
attacked = updownresample(watermarked)
detect_prob, _ = detector.detect_watermark(attacked)
print(f"Detection after resampling: {float(detect_prob):.3f}")
```
Real-world scenarios:
MP3 encoding/decoding
AAC compression (common in streaming)
Opus codec (VoIP applications)
Format conversions (WAV → MP3 → WAV)
2. Additive Noise
AudioSeal watermarks remain detectable even with background noise:
```python
def random_noise(
    waveform: torch.Tensor,
    noise_std: float = 0.001,
) -> torch.Tensor:
    """Add white Gaussian noise."""
    noise = torch.randn_like(waveform) * noise_std
    return waveform + noise

# Test with moderate noise
attacked = random_noise(watermarked, noise_std=0.005)
detect_prob, _ = detector.detect_watermark(attacked)
print(f"Detection with noise: {float(detect_prob):.3f}")
```
Real-world scenarios:
Environmental noise during playback
Recording noise
Line noise in transmission
Background music or speech
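The test suite under "Testing Robustness" also calls a `pink_noise` attack, and its code does not appear in this guide. The sketch below shows one possible implementation. It assumes pink noise is produced by shaping white noise in the frequency domain so that power falls off as 1/f. This FFT-based approach is a swapped-in technique, and examples/attacks.py may generate pink noise differently.

```python
import torch

def pink_noise(waveform: torch.Tensor, noise_std: float = 0.01) -> torch.Tensor:
    """Sketch: add pink (1/f power) noise with overall standard deviation noise_std."""
    n = waveform.size(-1)
    white = torch.randn(n, device=waveform.device)
    spectrum = torch.fft.rfft(white)
    # Scale amplitudes by 1/sqrt(f) so that power falls off as 1/f
    freqs = torch.arange(spectrum.size(-1), device=waveform.device, dtype=white.dtype)
    scale = torch.zeros_like(freqs)
    scale[1:] = freqs[1:].rsqrt()  # leave the DC bin at zero
    pink = torch.fft.irfft(spectrum * scale, n=n)
    # Normalize to the requested standard deviation
    pink = pink / pink.std() * noise_std
    # The same noise realization is broadcast over batch and channels
    return waveform + pink.to(waveform.dtype)
```

With this sketch, `pink_noise(watermarked, noise_std=0.01)` adds noise with a standard deviation of 0.01. Unlike white noise, most of its energy sits at low frequencies.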
3. Filtering
AudioSeal survives frequency-selective filtering, including lowpass, highpass, and bandpass filters. Here is the lowpass case:
```python
def lowpass_filter(
    waveform: torch.Tensor,
    cutoff_freq: float = 5000,
    sample_rate: int = 16000,
) -> torch.Tensor:
    """Apply a lowpass filter (removes frequencies above cutoff_freq)."""
    # julius expects the cutoff as a fraction of the sample rate (at most 0.5)
    return julius.lowpass_filter(
        waveform,
        cutoff=cutoff_freq / sample_rate,
    )

# Remove frequencies above 5 kHz
attacked = lowpass_filter(watermarked, cutoff_freq=5000)
```
Real-world scenarios:
Phone calls (bandpass 300-3400 Hz)
Equalizer adjustments
Audio processing effects
Bass/treble controls
4. Time-Domain Effects
AudioSeal handles temporal modifications such as echo/reverb, smoothing, and speed changes. Here is the echo attack:
```python
def echo(
    tensor: torch.Tensor,
    volume_range: tuple = (0.1, 0.5),
    duration_range: tuple = (0.1, 0.5),
    sample_rate: int = 16000,
) -> torch.Tensor:
    """Add an echo by overlaying an attenuated, time-shifted copy."""
    # Draw a random offset (in seconds) and echo volume
    duration = torch.FloatTensor(1).uniform_(*duration_range)
    volume = torch.FloatTensor(1).uniform_(*volume_range)
    n_samples = int(sample_rate * duration)

    impulse_response = torch.zeros(n_samples).to(tensor.device)
    impulse_response[0] = 1.0  # Direct sound
    impulse_response[-1] = volume  # Attenuated copy, n_samples - 1 samples apart
    impulse_response = impulse_response.unsqueeze(0).unsqueeze(0)  # [1, 1, n_samples]

    reverbed = julius.fft_conv1d(tensor, impulse_response)
    # Rescale to the original peak amplitude
    reverbed = reverbed / torch.max(torch.abs(reverbed)) * torch.max(torch.abs(tensor))
    # Zero-pad back to the input length (the convolution shortens the signal)
    result = torch.zeros_like(tensor)
    result[..., :reverbed.shape[-1]] = reverbed
    return result
```
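The test suite under "Testing Robustness" also calls `smooth(watermarked, window_size_range=(5, 5))`. Below is a minimal moving-average sketch. It assumes the window size is drawn uniformly, inclusive, from `window_size_range`, and it reuses the zero-padding approach from `echo`. It is not the code from examples/attacks.py.

```python
import torch
import torch.nn.functional as F

def smooth(
    tensor: torch.Tensor,
    window_size_range: tuple = (2, 10),
) -> torch.Tensor:
    """Sketch: moving-average smoothing of a [batch, channels, time] tensor."""
    low, high = window_size_range
    window_size = int(torch.randint(low, high + 1, (1,)).item())
    channels = tensor.size(1)
    # One uniform averaging kernel per channel, applied depthwise (groups=channels)
    kernel = torch.full(
        (channels, 1, window_size), 1.0 / window_size,
        dtype=tensor.dtype, device=tensor.device,
    )
    smoothed = F.conv1d(tensor, kernel, groups=channels)
    # Zero-pad back to the input length, as echo() does
    result = torch.zeros_like(tensor)
    result[..., : smoothed.size(-1)] = smoothed
    return result
```

A moving average acts as a crude lowpass filter. Wider windows remove more high-frequency content.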
Real-world scenarios:
Room acoustics (reverb)
Signal smoothing (moving-average filtering)
Playback speed adjustment
Time-stretching effects
5. Amplitude Modifications
Simple volume changes don’t affect detection:
```python
def boost_audio(
    tensor: torch.Tensor,
    amount: float = 20,
) -> torch.Tensor:
    """Increase volume by `amount` percent."""
    return tensor * (1 + amount / 100)

# Increase by 20%
attacked = boost_audio(watermarked, amount=20)
```
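The test suite and the social media example also use `duck_audio`, which is not shown in this guide. The sketch below assumes it is the inverse of `boost_audio` and lowers the volume by a percentage:

```python
import torch

def duck_audio(tensor: torch.Tensor, amount: float = 20) -> torch.Tensor:
    """Sketch: decrease volume by `amount` percent."""
    return tensor * (1 - amount / 100)
```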
6. Partial Silencing
Watermarks remain detectable when the beginning of the audio is silenced:
```python
def shush(
    tensor: torch.Tensor,
    fraction: float = 0.001,
) -> torch.Tensor:
    """Set the first `fraction` of the audio to silence."""
    time = tensor.size(-1)
    shush_tensor = tensor.clone()
    # Expects a [batch, channels, time] tensor
    shush_tensor[:, :, : int(fraction * time)] = 0.0
    return shush_tensor

# Silence the first 0.1% of samples
# (e.g. 48 samples, or 3 ms, for a 3-second clip at 16 kHz)
attacked = shush(watermarked, fraction=0.001)
```
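`shush` zeroes samples but keeps the clip length unchanged. To test detection on audio that has actually been cut, you can keep only a contiguous segment. The `crop` helper below is a hypothetical sketch and is not part of examples/attacks.py:

```python
import torch

def crop(
    tensor: torch.Tensor,
    keep_fraction: float = 0.5,
    start_fraction: float = 0.25,
) -> torch.Tensor:
    """Hypothetical helper: keep a contiguous segment along the last (time) axis."""
    length = tensor.size(-1)
    keep = max(1, int(keep_fraction * length))
    # Clamp the start so the segment never runs past the end of the clip
    start = min(int(start_fraction * length), length - keep)
    return tensor[..., start : start + keep]
```

For example, `crop(watermarked, keep_fraction=0.5)` keeps 1.5 seconds of a 3-second clip. Keep the Limitations section in mind: very short crops are harder to detect.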
Performance
AudioSeal offers state-of-the-art performance:
Detection Speed
Detection up to two orders of magnitude faster than existing watermarking models
Single-pass detection (no iterative refinement)
Real-time capable on modern hardware
Optimized for large-scale applications
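Real-time capability depends on your hardware. One way to check it is to measure the real-time factor, which is processing time divided by audio duration. The `real_time_factor` helper below is a hypothetical sketch that accepts any detection callable:

```python
import time
from typing import Callable

import torch

def real_time_factor(
    detect_fn: Callable[[torch.Tensor], object],
    audio: torch.Tensor,
    sample_rate: int = 16000,
    repeats: int = 5,
) -> float:
    """Hypothetical helper: mean processing time / audio duration (< 1.0 is faster than real time)."""
    duration_s = audio.size(-1) / sample_rate
    detect_fn(audio)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        detect_fn(audio)
    elapsed = (time.perf_counter() - start) / repeats
    return elapsed / duration_s
```

Usage: `real_time_factor(detector.detect_watermark, audio)`, run inside `torch.no_grad()`. On a GPU, use a callable that synchronizes before returning. `detect_watermark` already does this because it converts its result to a Python number.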
Localized Detection
Unlike watermarking methods that make a single decision for a whole clip, AudioSeal provides:
Sample-level localization: detects watermarks at the resolution of a single sample (1/16,000 s at 16 kHz)
Partial audio detection: works even if audio is cropped or edited
Frame-by-frame results: shows exactly which parts are watermarked
```python
# Get frame-by-frame detection
result, message = detector(watermarked)
# result shape: [batch, 2, frames]; channel 1 holds the watermark probability
# Check which frames have a watermark
watermarked_frames = result[:, 1, :] > 0.5
print(f"Watermarked frames: {watermarked_frames.sum().item()} / {result.shape[-1]}")
```
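To report *where* the watermark is rather than how many frames carry it, you can convert the boolean frame mask into time spans. The `watermarked_segments` helper below is a hypothetical sketch. It assumes one detector frame per audio sample, which matches the sample-level localization described above.

```python
from typing import List, Tuple

import torch

def watermarked_segments(
    frame_mask: torch.Tensor, sample_rate: int = 16000
) -> List[Tuple[float, float]]:
    """Hypothetical helper: turn a 1-D boolean frame mask into (start_s, end_s) spans."""
    mask = frame_mask.long()
    # Pad with zeros so every run of True has a rising and a falling edge
    padded = torch.cat([mask.new_zeros(1), mask, mask.new_zeros(1)])
    edges = torch.diff(padded)
    starts = torch.nonzero(edges == 1).flatten()  # first frame of each run
    ends = torch.nonzero(edges == -1).flatten()   # one past the last frame
    return [(s.item() / sample_rate, e.item() / sample_rate) for s, e in zip(starts, ends)]
```

Usage: `watermarked_segments(watermarked_frames[0])` returns the spans, in seconds, for the first item in the batch.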
Audio Quality
Minimal impact on perceived audio quality
Imperceptible at recommended alpha values (0.8-1.2)
Designed with perceptual loss during training
Maintains fidelity across various audio types
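A quick, rough check of how strong the embedded signal is relative to the audio is the signal-to-noise ratio (SNR) between the original and watermarked audio. The `snr_db` helper below is a hypothetical sketch. SNR is only a crude proxy for perceptibility; perceptual metrics such as PESQ, or listening tests, are better indicators.

```python
import torch

def snr_db(original: torch.Tensor, watermarked: torch.Tensor) -> float:
    """Hypothetical helper: 10 * log10(signal power / watermark power), in dB."""
    residual = watermarked - original  # the embedded watermark signal
    signal_power = original.pow(2).sum()
    noise_power = residual.pow(2).sum().clamp_min(1e-12)  # avoid division by zero
    return (10 * torch.log10(signal_power / noise_power)).item()
```

Usage: `print(f"SNR: {snr_db(audio, watermarked):.1f} dB")`. Higher alpha values lower the SNR.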
Testing Robustness
Here’s a complete example testing multiple attacks:
```python
from audioseal import AudioSeal
import torch
import julius  # used by the attack functions

# Assumes the attack functions used below (updownresample, random_noise,
# pink_noise, lowpass_filter, highpass_filter, bandpass_filter, echo, smooth,
# boost_audio, duck_audio) are already defined, e.g. from examples/attacks.py

# Load models
generator = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")
generator.eval()
detector.eval()

# Create test audio: [batch, channels, samples], 3 seconds at 16 kHz.
# Random noise keeps the example self-contained; use real speech for meaningful results.
audio = torch.randn(1, 1, 48000)

with torch.no_grad():
    # Watermark with high alpha for robustness
    watermarked = generator(audio, alpha=1.3)

    # Test suite
    attacks = {
        "Original": watermarked,
        "Resample (32k→16k)": updownresample(watermarked),
        "Gaussian Noise (σ=0.005)": random_noise(watermarked, noise_std=0.005),
        "Pink Noise (σ=0.01)": pink_noise(watermarked, noise_std=0.01),
        "Lowpass 5kHz": lowpass_filter(watermarked, cutoff_freq=5000),
        "Highpass 500Hz": highpass_filter(watermarked, cutoff_freq=500),
        "Bandpass 300-3400Hz": bandpass_filter(watermarked, 300, 3400),
        "Echo": echo(watermarked),
        "Smooth (window=5)": smooth(watermarked, window_size_range=(5, 5)),
        "Boost +20%": boost_audio(watermarked, amount=20),
        "Duck -20%": duck_audio(watermarked, amount=20),
    }

    # Test each attack; detect_watermark returns (probability, message)
    print("Attack Robustness Test Results:")
    print("=" * 50)
    for attack_name, attacked_audio in attacks.items():
        detect_prob, message = detector.detect_watermark(attacked_audio)
        status = "✓ PASS" if float(detect_prob) > 0.5 else "✗ FAIL"
        print(f"{attack_name:30s} | {float(detect_prob):.3f} | {status}")
    print("=" * 50)
```
Example output. Exact values depend on the input audio, the randomly drawn attack parameters, and the model version:
```
Attack Robustness Test Results:
==================================================
Original                       | 0.998 | ✓ PASS
Resample (32k→16k)             | 0.956 | ✓ PASS
Gaussian Noise (σ=0.005)       | 0.923 | ✓ PASS
Pink Noise (σ=0.01)            | 0.887 | ✓ PASS
Lowpass 5kHz                   | 0.945 | ✓ PASS
Highpass 500Hz                 | 0.912 | ✓ PASS
Bandpass 300-3400Hz            | 0.834 | ✓ PASS
Echo                           | 0.891 | ✓ PASS
Smooth (window=5)              | 0.967 | ✓ PASS
Boost +20%                     | 0.998 | ✓ PASS
Duck -20%                      | 0.998 | ✓ PASS
==================================================
```
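Attacks such as `echo` draw random parameters on each call, so a single run can be noisy. The `mean_detection` helper below is a hypothetical sketch that repeats a random attack several times and summarizes the detection scores:

```python
import statistics
from typing import Callable, Tuple

import torch

def mean_detection(
    attack: Callable[[torch.Tensor], torch.Tensor],
    watermarked: torch.Tensor,
    detect_fn: Callable[[torch.Tensor], float],
    trials: int = 10,
) -> Tuple[float, float]:
    """Hypothetical helper: mean and population std of detection scores over repeated attacks."""
    scores = [float(detect_fn(attack(watermarked))) for _ in range(trials)]
    return statistics.mean(scores), statistics.pstdev(scores)
```

Usage: `mean_detection(echo, watermarked, lambda x: detector.detect_watermark(x)[0])`.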
Real-World Robustness Examples
Podcast Distribution
```python
# Podcast workflow: original → compressed → distributed
podcast = torch.randn(1, 1, 160000)  # 10 seconds at 16 kHz
# Watermark at creation
watermarked = generator(podcast, alpha=1.0)
# Simulate a podcast distribution pipeline
# 1. Convert to mono (this example is already mono)
# 2. Resample to 44.1 kHz for distribution
distributed = julius.resample_frac(watermarked, 16000, 44100)
# 3. Resample back to 16 kHz for the detector
#    (a rough stand-in for MP3 re-encoding; no codec artifacts are modeled)
compressed = julius.resample_frac(distributed, 44100, 16000)
# 4. Add slight noise as a proxy for encoding noise
compressed = random_noise(compressed, noise_std=0.001)
# Detect from the distributed version
detect_prob, _ = detector.detect_watermark(compressed)
print(f"Podcast detection: {float(detect_prob):.3f}")  # > 0.9
```
Phone Call Simulation
```python
# Simulate phone call quality (heavy filtering)
phone_audio = bandpass_filter(
    watermarked,
    cutoff_freq_low=300,
    cutoff_freq_high=3400,  # Telephone bandwidth
)
# Add line noise
phone_audio = random_noise(phone_audio, noise_std=0.003)
# Narrowband round-trip through 8 kHz, as in telephony/VoIP codecs
phone_audio = updownresample(phone_audio, intermediate_freq=8000)
# Detect
detect_prob, _ = detector.detect_watermark(phone_audio)
print(f"Phone call detection: {float(detect_prob):.3f}")  # > 0.7
```
Social Media Processing
```python
# Simulate social media processing
social_audio = watermarked
# 1. Loudness normalization (modeled as a 15% volume reduction)
social_audio = duck_audio(social_audio, amount=15)
# 2. Format conversion: round-trip through 48 kHz
social_audio = updownresample(social_audio, intermediate_freq=48000)
# 3. Slight lowpass filtering; the cutoff must stay below the
#    8 kHz Nyquist limit of 16 kHz audio
social_audio = lowpass_filter(social_audio, cutoff_freq=7000)
# Detect after the social media pipeline
detect_prob, _ = detector.detect_watermark(social_audio)
print(f"Social media detection: {float(detect_prob):.3f}")
```
Optimizing for Robustness
To maximize robustness:
Increase Alpha
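Use higher alpha values (1.2-1.5) for maximum robustness. Values above the recommended 0.8-1.2 range strengthen the watermark but also make it more likely to be audible, so check audio quality after raising alpha:
```python
# More robust watermark
watermarked = generator(audio, alpha=1.4)
```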
Train on Target Domain
Train custom models with attacks specific to your use case (see the Training Guide)
Test Your Pipeline
Simulate your actual audio processing pipeline and validate detection rates
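One way to do this is to chain your processing steps into a single pipeline and measure the fraction of watermarked clips that are still detected. `compose` and `detection_rate` are hypothetical helpers, and `detect_fn` is any callable that returns a detection probability:

```python
from typing import Callable, Iterable

import torch

Attack = Callable[[torch.Tensor], torch.Tensor]

def compose(*attacks: Attack) -> Attack:
    """Hypothetical helper: chain attacks left to right into one pipeline."""
    def pipeline(audio: torch.Tensor) -> torch.Tensor:
        for attack in attacks:
            audio = attack(audio)
        return audio
    return pipeline

def detection_rate(
    clips: Iterable[torch.Tensor],
    pipeline: Attack,
    detect_fn: Callable[[torch.Tensor], float],
    threshold: float = 0.5,
) -> float:
    """Hypothetical helper: fraction of watermarked clips detected after the pipeline."""
    results = [float(detect_fn(pipeline(clip))) > threshold for clip in clips]
    return sum(results) / len(results) if results else 0.0
```

For example, `compose(lambda x: bandpass_filter(x, 300, 3400), lambda x: random_noise(x, noise_std=0.003))` reproduces part of the phone-call pipeline. Pass it to `detection_rate` together with `lambda x: detector.detect_watermark(x)[0]`.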
Use Consistent Messages
For streaming, use the same message across chunks to improve detection reliability
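A minimal sketch of chunked watermarking with one fixed message follows. `watermark_in_chunks` is a hypothetical helper, and `embed` stands in for the generator call. It assumes the generator accepts a `message` tensor of 0/1 bits with shape [batch, 16] for the 16-bit model; check the API Reference for the exact signature.

```python
from typing import Callable

import torch

def watermark_in_chunks(
    audio: torch.Tensor,
    message: torch.Tensor,
    embed: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
    chunk_size: int = 16000,
) -> torch.Tensor:
    """Hypothetical helper: watermark a [batch, channels, samples] stream chunk by chunk."""
    chunks = torch.split(audio, chunk_size, dim=-1)  # the last chunk may be shorter
    # Every chunk gets the same message, so the detector sees one consistent payload
    return torch.cat([embed(chunk, message) for chunk in chunks], dim=-1)
```

Usage, under the assumption above: `msg = torch.randint(0, 2, (1, 16))`, then call `watermark_in_chunks(audio, msg, lambda c, m: generator(c, message=m, alpha=1.0))`.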
Limitations
While AudioSeal is highly robust, some extreme modifications may affect detection:
Heavy distortion: extreme clipping or other non-linear effects
Pitch shifting: large pitch changes (>10%) may reduce detection
Extreme time stretching: speed changes outside the 0.5x-1.5x range
Multiple cascaded attacks: many attacks applied in sequence
Very short clips: audio snippets shorter than 0.5 seconds
For these cases, consider:
Increasing alpha during watermarking
Training models specifically for your attack profile
Using multiple watermark embeddings
Next Steps
Training Custom Models: train models optimized for specific attack profiles
API Reference: explore the full API documentation