AudioSeal is designed to be robust against common audio transformations and attacks. This guide explains which attacks AudioSeal can withstand and how to test its robustness.

Overview

AudioSeal watermarks remain detectable even after audio undergoes various modifications, making it suitable for real-world applications where audio may be:
  • Compressed with lossy codecs
  • Re-encoded at different bitrates
  • Mixed with noise
  • Filtered or equalized
  • Speed-adjusted or resampled
AudioSeal is trained with augmentation techniques that simulate real-world attacks, making the watermark robust while remaining imperceptible.

Types of Attacks

Based on the examples/attacks.py file, AudioSeal is tested against these attack categories:

1. Compression and Re-encoding

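Lossy compression discards signal components judged inaudible, yet AudioSeal watermarks are designed to survive it. The snippet below uses a resampling round trip as a simple stand-in for re-encoding: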
import julius
import torch

def updownresample(
    tensor: torch.Tensor,
    sample_rate: int = 16000,
    intermediate_freq: int = 32000
) -> torch.Tensor:
    """
    Simulate compression by upsampling then downsampling.
    Tests if watermark survives sample rate conversion.
    """
    # Upsample
    tensor = julius.resample_frac(tensor, sample_rate, intermediate_freq)
    # Downsample back
    tensor = julius.resample_frac(tensor, intermediate_freq, sample_rate)
    return tensor

# Test robustness (generator and detector are loaded as shown under "Testing Robustness")
watermarked = generator(audio, alpha=1.0)
attacked = updownresample(watermarked)

detect_prob, _ = detector.detect_watermark(attacked)
print(f"Detection after resampling: {detect_prob.item():.3f}")
Real-world scenarios:
  • MP3 encoding/decoding
  • AAC compression (common in streaming)
  • Opus codec (VoIP applications)
  • Format conversions (WAV → MP3 → WAV)

2. Additive Noise

AudioSeal watermarks remain detectable even with background noise:
def random_noise(
    waveform: torch.Tensor,
    noise_std: float = 0.001
) -> torch.Tensor:
    """Add white Gaussian noise."""
    noise = torch.randn_like(waveform) * noise_std
    return waveform + noise

# Test with moderate noise
attacked = random_noise(watermarked, noise_std=0.005)
detect_prob, _ = detector.detect_watermark(attacked)
print(f"Detection with noise: {detect_prob.item():.3f}")
Real-world scenarios:
  • Environmental noise during playback
  • Recording noise
  • Line noise in transmission
  • Background music or speech
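To relate `noise_std` to a more familiar figure, the signal-to-noise ratio can be estimated from the signal's RMS level. The following is a minimal sketch in plain Python, assuming zero-mean white noise whose RMS equals `noise_std`. `snr_db` is a hypothetical helper and not part of AudioSeal.

```python
import math


def snr_db(signal_rms: float, noise_std: float) -> float:
    """Signal-to-noise ratio in dB for zero-mean white noise of the given std."""
    if signal_rms <= 0 or noise_std <= 0:
        raise ValueError("signal_rms and noise_std must be positive")
    return 20.0 * math.log10(signal_rms / noise_std)


# Assumed example: a signal with RMS 0.1 and the noise level used above
print(f"{snr_db(0.1, 0.005):.1f} dB")
```

Lower SNR values correspond to harsher noise attacks, so reporting detection scores alongside the SNR makes results comparable across clips with different loudness.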

3. Filtering

AudioSeal survives frequency-selective filtering:
def lowpass_filter(
    waveform: torch.Tensor,
    cutoff_freq: float = 5000,
    sample_rate: int = 16000
) -> torch.Tensor:
    """Apply lowpass filter (removes high frequencies)."""
    return julius.lowpass_filter(
        waveform,
        cutoff=cutoff_freq / sample_rate
    )

# Remove frequencies above 5kHz
attacked = lowpass_filter(watermarked, cutoff_freq=5000)
Real-world scenarios:
  • Phone calls (bandpass 300-3400 Hz)
  • Equalizer adjustments
  • Audio processing effects
  • Bass/treble controls
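The filter code above passes the cutoff as `cutoff_freq / sample_rate`, that is, as a fraction of the sample rate. A cutoff at or above the Nyquist frequency (half the sample rate) has no meaning for sampled audio, so it can help to validate the value first. The sketch below is plain Python, and `normalized_cutoff` is a hypothetical helper rather than part of AudioSeal or julius.

```python
def normalized_cutoff(cutoff_hz: float, sample_rate: int = 16000) -> float:
    """Convert a cutoff in Hz to a fraction of the sample rate.

    Raises ValueError unless 0 < cutoff_hz < sample_rate / 2 (Nyquist).
    """
    nyquist = sample_rate / 2
    if not 0 < cutoff_hz < nyquist:
        raise ValueError(
            f"cutoff {cutoff_hz} Hz must lie strictly between 0 and {nyquist} Hz"
        )
    return cutoff_hz / sample_rate


print(normalized_cutoff(5000))  # fraction of the 16 kHz sample rate
print(normalized_cutoff(3400))  # upper edge of the telephone band
```

At 16 kHz the usable range ends at 8 kHz, so any filter test above that frequency needs audio at a higher sample rate.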

4. Time-Domain Effects

AudioSeal handles temporal modifications:
def echo(
    tensor: torch.Tensor,
    volume_range: tuple = (0.1, 0.5),
    duration_range: tuple = (0.1, 0.5),
    sample_rate: int = 16000
) -> torch.Tensor:
    """Add echo effect by delaying and overlaying."""
    duration = torch.FloatTensor(1).uniform_(*duration_range)
    volume = torch.FloatTensor(1).uniform_(*volume_range)
    
    n_samples = int(sample_rate * duration)
    impulse_response = torch.zeros(n_samples).to(tensor.device)
    
    impulse_response[0] = 1.0  # Direct sound
    impulse_response[-1] = volume  # Echo
    
    impulse_response = impulse_response.unsqueeze(0).unsqueeze(0)
    reverbed = julius.fft_conv1d(tensor, impulse_response)
    
    # Normalize
    reverbed = reverbed / torch.max(torch.abs(reverbed)) * torch.max(torch.abs(tensor))
    
    # Ensure same size
    result = torch.zeros_like(tensor)
    result[..., :reverbed.shape[-1]] = reverbed
    return result
Real-world scenarios:
  • Room acoustics (reverb)
  • Audio normalization (smoothing)
  • Playback speed adjustment
  • Time-stretching effects

5. Amplitude Modifications

Simple volume changes don’t affect detection:
def boost_audio(
    tensor: torch.Tensor,
    amount: float = 20
) -> torch.Tensor:
    """Increase volume by percentage."""
    return tensor * (1 + amount / 100)

# Increase by 20%
attacked = boost_audio(watermarked, amount=20)
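
Volume changes are often quoted in decibels rather than percent. As a small worked conversion, assuming the same linear scaling factor `1 + amount / 100` used by `boost_audio`, the gain in dB is 20·log10 of that factor. `percent_to_db` is a hypothetical helper for illustration.

```python
import math


def percent_to_db(amount: float) -> float:
    """Gain in dB for a volume change of `amount` percent (negative = quieter)."""
    factor = 1 + amount / 100
    if factor <= 0:
        raise ValueError("amount must be greater than -100 percent")
    return 20 * math.log10(factor)


print(f"{percent_to_db(20):+.2f} dB")   # 20% boost
print(f"{percent_to_db(-20):+.2f} dB")  # 20% reduction
```

A 20% change in either direction is under 2 dB, which is a mild attack; larger gains can also push samples beyond the valid range and cause clipping.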

6. Truncation

Watermarks can be detected even when part of the audio is missing. The example below silences the start of the clip instead of cutting it:
def shush(
    tensor: torch.Tensor,
    fraction: float = 0.001
) -> torch.Tensor:
    """Set the beginning of audio to silence."""
    time = tensor.size(-1)
    shush_tensor = tensor.clone()
    shush_tensor[:, :, :int(fraction * time)] = 0.0
    return shush_tensor

# Silence the first 0.1% of the clip (16 samples, or 1 ms, for a 1-second clip at 16 kHz)
attacked = shush(watermarked, fraction=0.001)
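
Because `fraction` is relative to the clip length, the silenced span grows with the clip. The arithmetic can be sketched as follows, mirroring the `int(fraction * time)` expression in `shush`. `silenced_span` is a hypothetical helper.

```python
from typing import Tuple


def silenced_span(
    num_samples: int, fraction: float = 0.001, sample_rate: int = 16000
) -> Tuple[int, float]:
    """Return (samples, milliseconds) zeroed at the start of a clip."""
    n = int(fraction * num_samples)
    return n, 1000 * n / sample_rate


print(silenced_span(16000))  # 1-second clip
print(silenced_span(48000))  # 3-second clip
```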

Performance Characteristics

AudioSeal offers state-of-the-art performance:

Detection Speed

  • Up to two orders of magnitude faster than existing models
  • Single-pass detection (no iterative refinement)
  • Real-time capable on modern hardware
  • Optimized for large-scale applications

Localized Detection

Unlike global watermarking, AudioSeal provides:
  • Sample-level localization: Detects watermarks at 1/16,000 second resolution
  • Partial audio detection: Works even if audio is cropped or edited
  • Frame-by-frame results: Know exactly which parts are watermarked
# Get frame-by-frame detection (watermarked is the generator output from the earlier examples)
result, message = detector(watermarked)

# result shape: [batch, 2, frames]
# Check which frames have watermark
watermarked_frames = result[:, 1, :] > 0.5
print(f"Watermarked frames: {watermarked_frames.sum()} / {result.shape[-1]}")
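
To report where the watermark is present, the per-frame probabilities can be grouped into contiguous time ranges. This is a minimal sketch in plain Python, assuming one probability per sample (for instance `result[0, 1, :].tolist()`) and the 0.5 threshold used above. `watermarked_segments` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def watermarked_segments(
    probs: Sequence[float], sample_rate: int = 16000, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Group above-threshold frames into (start_seconds, end_seconds) ranges."""
    segments = []
    start = None  # index where the current watermarked run began
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:  # run extends to the end of the clip
        segments.append((start / sample_rate, len(probs) / sample_rate))
    return segments


# Toy input at 4 "samples" per second to keep the numbers readable
print(watermarked_segments([0.1, 0.9, 0.9, 0.2, 0.8], sample_rate=4))
```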

Audio Quality

  • Minimal impact on perceived audio quality
  • Imperceptible at recommended alpha values (0.8-1.2)
  • Designed with perceptual loss during training
  • Maintains fidelity across various audio types

Testing Robustness

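Here’s an example that tests multiple attacks. It reuses the attack functions defined above; pink_noise, highpass_filter, bandpass_filter, smooth, and duck_audio are not defined in this guide and are assumed to come from examples/attacks.py: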
from audioseal import AudioSeal
import torch
import julius

# Load models
generator = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")
generator.eval()
detector.eval()

# Create test audio
audio = torch.randn(1, 1, 48000)  # 3 seconds at 16kHz

# Watermark with high alpha for robustness
watermarked = generator(audio, alpha=1.3)

# Test suite
attacks = {
    "Original": watermarked,
    "Resample (32k→16k)": updownresample(watermarked),
    "Gaussian Noise (σ=0.005)": random_noise(watermarked, noise_std=0.005),
    "Pink Noise (σ=0.01)": pink_noise(watermarked, noise_std=0.01),
    "Lowpass 5kHz": lowpass_filter(watermarked, cutoff_freq=5000),
    "Highpass 500Hz": highpass_filter(watermarked, cutoff_freq=500),
    "Bandpass 300-3400Hz": bandpass_filter(watermarked, 300, 3400),
    "Echo": echo(watermarked),
    "Smooth (window=5)": smooth(watermarked, window_size_range=(5, 5)),
    "Boost +20%": boost_audio(watermarked, amount=20),
    "Duck -20%": duck_audio(watermarked, amount=20),
}

# Test each attack
print("Attack Robustness Test Results:")
print("=" * 50)

for attack_name, attacked_audio in attacks.items():
    detect_prob, message = detector.detect_watermark(attacked_audio)
    status = "✓ PASS" if detect_prob.item() > 0.5 else "✗ FAIL"
    print(f"{attack_name:30s} | {detect_prob.item():.3f} | {status}")

print("=" * 50)
Example output (illustrative; exact scores depend on the input audio and model version):
Attack Robustness Test Results:
==================================================
Original                       | 0.998 | ✓ PASS
Resample (32k→16k)             | 0.956 | ✓ PASS
Gaussian Noise (σ=0.005)       | 0.923 | ✓ PASS
Pink Noise (σ=0.01)            | 0.887 | ✓ PASS
Lowpass 5kHz                   | 0.945 | ✓ PASS
Highpass 500Hz                 | 0.912 | ✓ PASS
Bandpass 300-3400Hz            | 0.834 | ✓ PASS
Echo                           | 0.891 | ✓ PASS
Smooth (window=5)              | 0.967 | ✓ PASS
Boost +20%                     | 0.998 | ✓ PASS
Duck -20%                      | 0.998 | ✓ PASS
==================================================
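
When many attacks are tested, a short summary is easier to track than the full table. The sketch below assumes the detection scores have been collected into a dictionary of plain floats keyed by attack name. `summarize` is a hypothetical helper, and the scores are copied from the illustrative table above rather than measured.

```python
from typing import Dict


def summarize(scores: Dict[str, float], threshold: float = 0.5) -> dict:
    """Return the pass rate and the attack with the lowest detection score."""
    if not scores:
        raise ValueError("scores must not be empty")
    passed = [name for name, score in scores.items() if score > threshold]
    weakest = min(scores, key=scores.get)
    return {
        "pass_rate": len(passed) / len(scores),
        "weakest": weakest,
        "min_score": scores[weakest],
    }


example_scores = {
    "Original": 0.998,
    "Bandpass 300-3400Hz": 0.834,
    "Echo": 0.891,
}
print(summarize(example_scores))
```

Tracking the weakest attack over time shows which transformation should drive the choice of alpha or any custom training.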

Real-World Robustness Examples

Podcast Distribution

# Podcast workflow: original → compressed → distributed
podcast = torch.randn(1, 1, 160000)  # 10 seconds

# Watermark at creation
watermarked = generator(podcast, alpha=1.0)

# Simulate podcast distribution pipeline
# 1. Convert to mono (already mono)
# 2. Resample to 44.1kHz for distribution
distributed = julius.resample_frac(watermarked, 16000, 44100)

# 3. Compress with high-quality MP3 (simulated with resampling)
compressed = julius.resample_frac(distributed, 44100, 16000)

# 4. Add slight noise from encoding
compressed = random_noise(compressed, noise_std=0.001)

# Detect from distributed version
detect_prob, _ = detector.detect_watermark(compressed)
print(f"Podcast detection: {detect_prob.item():.3f}")  # > 0.9

Phone Call Simulation

# Simulate phone call quality (heavy filtering)
phone_audio = bandpass_filter(
    watermarked,
    cutoff_freq_low=300,
    cutoff_freq_high=3400  # Phone bandwidth
)

# Add line noise
phone_audio = random_noise(phone_audio, noise_std=0.003)

# Compress (VoIP codecs)
phone_audio = updownresample(phone_audio, intermediate_freq=8000)

# Detect
detect_prob, _ = detector.detect_watermark(phone_audio)
print(f"Phone call detection: {detect_prob.item():.3f}")  # > 0.7

Social Media Upload

# Simulate social media processing
social_audio = watermarked

# 1. Loudness normalization (approximated here by a 15% volume reduction)
social_audio = duck_audio(social_audio, amount=15)

# 2. Format conversion and compression
social_audio = updownresample(social_audio, intermediate_freq=48000)

# 3. Slight filtering for broadcast standards
# (the cutoff must stay below the 8 kHz Nyquist limit of 16 kHz audio)
social_audio = lowpass_filter(social_audio, cutoff_freq=7000)

# Detect after social media pipeline
detect_prob, _ = detector.detect_watermark(social_audio)
print(f"Social media detection: {detect_prob.item():.3f}")  # > 0.85
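
The three pipelines above all apply attacks one after another. A small composition helper keeps such chains declarative. This is a sketch in plain Python, and `compose` is a hypothetical helper, not part of AudioSeal. With real audio, the attack functions would be bound to their parameters with `functools.partial`, for example `partial(updownresample, intermediate_freq=48000)`.

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")


def compose(*attacks: Callable[[T], T]) -> Callable[[T], T]:
    """Chain attacks left to right: compose(f, g)(x) == g(f(x))."""

    def pipeline(x: T) -> T:
        return reduce(lambda acc, attack: attack(acc), attacks, x)

    return pipeline


# Stand-in "attacks" on a plain float so the sketch runs without torch
halve = lambda x: x * 0.5
offset = lambda x: x + 1.0

print(compose(halve, offset)(4.0))  # halve first, then offset
print(compose()(4.0))               # empty pipeline returns the input unchanged
```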

Optimizing for Robustness

To maximize robustness:
1. Increase Alpha

Use higher alpha values (1.2-1.5) for maximum robustness. These exceed the 0.8-1.2 range recommended for imperceptibility, so check audio quality when raising alpha:
# More robust watermark
watermarked = generator(audio, alpha=1.4)

2. Train on Target Domain

Train custom models with attacks specific to your use case (see the Training Guide).

3. Test Your Pipeline

Simulate your actual audio processing pipeline and validate detection rates.

4. Use Consistent Messages

For streaming, use the same message across chunks to improve detection reliability.
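
One way to benefit from a consistent message is to combine the bits decoded from each chunk. The sketch below uses a per-bit majority vote. This is an assumption about how the chunk results might be aggregated, not a documented AudioSeal feature, and `majority_vote` is a hypothetical helper operating on plain lists of 0/1 bits.

```python
from typing import List, Sequence


def majority_vote(messages: Sequence[Sequence[int]]) -> List[int]:
    """Per-bit majority across chunk-level decoded messages (ties resolve to 1)."""
    if not messages:
        raise ValueError("at least one decoded message is required")
    n_bits = len(messages[0])
    if any(len(m) != n_bits for m in messages):
        raise ValueError("all messages must have the same number of bits")
    return [1 if 2 * sum(bits) >= len(messages) else 0 for bits in zip(*messages)]


# Three chunks carrying the same 4-bit message, with one bit error in two of them
chunks = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(majority_vote(chunks))
```

Isolated bit errors in individual chunks are outvoted as long as most chunks decode each bit correctly.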

Limitations

While AudioSeal is highly robust, some extreme modifications may affect detection:
  • Heavy distortion: Extreme clipping or non-linear effects
  • Pitch shifting: Large pitch changes (>10%) may reduce detection
  • Extreme time stretching: Speed changes beyond 0.5x-1.5x
  • Multiple cascaded attacks: Many attacks applied sequentially
  • Very short clips: Audio snippets < 0.5 seconds
For these cases, consider:
  • Increasing alpha during watermarking
  • Training models specifically for your attack profile
  • Using multiple watermark embeddings

Next Steps

Training Custom Models

Train models optimized for specific attack profiles

API Reference

Explore the full API documentation