Overview

The NormalizationProcessor class provides audio normalization utilities that improve watermark imperceptibility and detection robustness. It includes methods for fitting watermarks within audio envelopes and normalizing loudness levels.

Initialization

from audioseal.models import NormalizationProcessor

normalizer = NormalizationProcessor(
    window_size=5,
    reference_rms=0.1
)

Parameters

window_size
int
default: 5
Size of the processing window in samples. Smaller windows provide finer-grained control but may introduce artifacts. Typical values range from 3 to 10.
reference_rms
float
default: 0.1
Reference RMS (root mean square) value for loudness normalization. Audio will be scaled to match this target RMS level.

Methods

compute_rms

Compute the root mean square (RMS) energy of an audio signal.

import torch

audio = torch.randn(1, 1, 16000)  # 1 second at 16kHz
rms = normalizer.compute_rms(audio)
print(f"RMS energy: {rms.item():.4f}")

Parameters

signal
torch.Tensor
required
Input audio tensor of shape (batch, channels, timesteps).

Returns

rms
torch.Tensor
RMS value tensor of shape (batch, channels, 1). Represents the energy level of the signal.
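
As a rough illustration of what an RMS over the time axis looks like, the sketch below reimplements the calculation in NumPy. It is not the library's code: the function name rms_over_time and the small eps stabilizer are assumptions made for this example.

```python
import numpy as np

def rms_over_time(signal: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """RMS over the last (time) axis, keeping that axis with length 1."""
    mean_square = np.mean(signal ** 2, axis=-1, keepdims=True)
    return np.sqrt(mean_square + eps)  # eps is an assumed guard for silent input

# One second of a full-scale 440 Hz sine at 16 kHz, shaped (batch, channels, timesteps)
t = np.arange(16000) / 16000.0
sine = np.sin(2 * np.pi * 440.0 * t).reshape(1, 1, -1)

rms = rms_over_time(sine)
print(rms.shape)                      # (1, 1, 1)
print(round(float(rms[0, 0, 0]), 4))  # 0.7071, i.e. 1/sqrt(2) for a full-scale sine
```

Keeping the reduced axis means the result broadcasts directly against the original signal, which is convenient when the RMS is later used to compute a gain.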

Example

import torch
import torchaudio
from audioseal.models import NormalizationProcessor

normalizer = NormalizationProcessor()

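# Load audio; torchaudio.load returns a (channels, timesteps) tensor
audio, sr = torchaudio.load("audio.wav")
audio = audio.unsqueeze(0)  # add the batch dimension: (1, channels, timesteps)

# Compute RMS
rms = normalizer.compute_rms(audio)

# .item() requires a single value, so the prints below assume a mono file
print(f"Audio RMS: {rms.item():.4f}")
print(f"Audio dB: {20 * torch.log10(rms).item():.2f} dB")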
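
The decibel line in the example above uses the standard amplitude conversion 20 * log10(rms). A small standalone sketch (the helper names are hypothetical) shows how the default reference_rms of 0.1 maps to decibels relative to an amplitude of 1.0:

```python
import math

def rms_to_db(rms: float) -> float:
    """Convert a linear RMS amplitude to dB relative to an amplitude of 1.0."""
    return 20.0 * math.log10(rms)

def db_to_rms(db: float) -> float:
    """Inverse conversion: dB back to a linear RMS amplitude."""
    return 10.0 ** (db / 20.0)

reference_db = rms_to_db(0.1)         # reference_rms=0.1 corresponds to -20 dB
halved_db = rms_to_db(0.05)           # halving the amplitude lowers the level by about 6 dB
round_trip = db_to_rms(reference_db)  # back to roughly 0.1

print(f"{reference_db:.2f} dB, {halved_db:.2f} dB, {round_trip:.3f}")
```

An RMS of exactly zero has no finite dB value, so silent input needs a small floor before the logarithm is taken.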

fit_inside_envelope

Normalize a watermark signal to fit inside the envelope of the original audio.

# Ensure watermark doesn't exceed audio envelope
normalized_watermark = normalizer.fit_inside_envelope(
    wav1=original_audio,
    wav2=watermark
)

Parameters

wav1
torch.Tensor
required
Reference audio tensor of shape (batch, channels, timesteps). This defines the target envelope that wav2 should fit within.
wav2
torch.Tensor
required
Signal to be normalized of shape (batch, channels, timesteps). Typically the watermark signal that needs to be scaled down.

Returns

normalized_signal
torch.Tensor
Normalized version of wav2 with shape (batch, channels, timesteps). The signal is scaled to fit within the envelope of wav1.

How It Works

  1. Windowing: Divides both signals into overlapping windows using a Hann window for smooth transitions
  2. RMS Computation: Calculates RMS for each window in both signals
  3. Gain Calculation: Computes gain to fit wav2 inside wav1’s envelope (clamped between 0.01 and 1.0)
  4. Application: Applies gain to each window with Hann window weighting
  5. Reconstruction: Reconstructs the signal using overlap-add
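
The five steps above can be sketched for a single 1-D signal in NumPy. This is a simplified illustration, not AudioSeal's implementation: the function name, the gain formula (reference RMS divided by signal RMS), the zero padding at the edges, the eps stabilizer, and the even window size of 512 are all assumptions chosen to keep the example easy to follow.

```python
import numpy as np

def fit_inside_envelope_sketch(reference, signal, window_size=512, eps=1e-8):
    """Scale `signal` window by window so its RMS does not exceed that of `reference`."""
    assert window_size % 2 == 0, "this sketch assumes an even window size"
    hop = window_size // 2  # 50% overlap
    # Periodic Hann window: copies shifted by `hop` sum to exactly 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(window_size) / window_size)

    length = len(signal)
    n_frames = int(np.ceil(length / hop)) + 1
    padded_len = (n_frames + 1) * hop
    # Pad with zeros (one hop at the start, the remainder at the end) so that
    # every original sample is covered by exactly two windows.
    ref = np.zeros(padded_len)
    sig = np.zeros(padded_len)
    ref[hop:hop + length] = reference
    sig[hop:hop + length] = signal

    out = np.zeros(padded_len)
    for i in range(n_frames):
        frame = slice(i * hop, i * hop + window_size)
        ref_rms = np.sqrt(np.mean(ref[frame] ** 2) + eps)
        sig_rms = np.sqrt(np.mean(sig[frame] ** 2) + eps)
        gain = np.clip(ref_rms / sig_rms, 0.01, 1.0)  # attenuate only, never amplify
        out[frame] += gain * window * sig[frame]      # Hann-weighted overlap-add
    return out[hop:hop + length]

def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))

rng = np.random.default_rng(0)
audio = 0.1 * rng.standard_normal(4000)
loud_watermark = 10.0 * audio   # toy watermark, 10x louder than the audio
quiet_watermark = 0.5 * audio   # toy watermark already inside the envelope

fitted_loud = fit_inside_envelope_sketch(audio, loud_watermark)
fitted_quiet = fit_inside_envelope_sketch(audio, quiet_watermark)

print(f"audio RMS:        {rms(audio):.4f}")
print(f"loud, fitted RMS: {rms(fitted_loud):.4f}")
print(f"quiet, fitted RMS:{rms(fitted_quiet):.4f}")
```

In this toy setup the loud watermark is pulled down to the audio's level, while the quiet one passes through unchanged because its gain is capped at 1.0.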

Example

import torch
import torchaudio
from audioseal.models import NormalizationProcessor

normalizer = NormalizationProcessor(window_size=5)

# Load audio and generate watermark
audio, sr = torchaudio.load("audio.wav")  # (channels, timesteps)
audio = audio.unsqueeze(0)  # add the batch dimension: (1, channels, timesteps)
watermark = torch.randn_like(audio) * 0.1  # Random stand-in for a real watermark

# Fit watermark inside audio envelope
normalized_watermark = normalizer.fit_inside_envelope(
    wav1=audio,
    wav2=watermark
)

# Apply normalized watermark
watermarked = audio + normalized_watermark

# Save result; torchaudio.save expects a 2D (channels, timesteps) tensor
torchaudio.save("watermarked.wav", watermarked.squeeze(0), sr)

# .item() requires a single value, so the prints below assume a mono file
print(f"Original watermark RMS: {normalizer.compute_rms(watermark).item():.4f}")
print(f"Normalized watermark RMS: {normalizer.compute_rms(normalized_watermark).item():.4f}")
print(f"Audio RMS: {normalizer.compute_rms(audio).item():.4f}")

loudness_normalization

Normalize the loudness of an audio signal to match a reference RMS level.

# Normalize audio to consistent loudness
normalized_audio = normalizer.loudness_normalization(audio)

Parameters

wav
torch.Tensor
required
Input audio tensor of shape (batch, channels, timesteps) to be normalized.

Returns

normalized_audio
torch.Tensor
Loudness-normalized audio of shape (batch, channels, timesteps). The RMS level is adjusted to match the reference RMS value.

How It Works

  1. Windowing: Divides signal into overlapping windows with Hann window
  2. RMS Computation: Calculates RMS for each window
  3. Gain Calculation: Computes gain to achieve reference RMS (clamped between 1.0 and 10.0)
  4. Application: Applies gain to each window with Hann window weighting
  5. Reconstruction: Reconstructs using overlap-add
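
To see what the gain clamp in step 3 implies, the sketch below computes per-window gains for a few hypothetical window RMS values. The function name, the reference_rms / window_rms formula, and the eps floor are assumptions; only the 1.0 to 10.0 clamp and the 0.1 reference come from the description above.

```python
import numpy as np

def loudness_gain(window_rms, reference_rms=0.1, min_gain=1.0, max_gain=10.0, eps=1e-8):
    """Gain that moves each window toward reference_rms, limited to [min_gain, max_gain]."""
    raw_gain = reference_rms / np.maximum(window_rms, eps)  # eps avoids division by zero
    return np.clip(raw_gain, min_gain, max_gain)

# Hypothetical per-window RMS values: very quiet, quiet, at the reference, loud
window_rms = np.array([0.001, 0.02, 0.1, 0.4])
gain = loudness_gain(window_rms)
resulting_rms = gain * window_rms

for before, g, after in zip(window_rms, gain, resulting_rms):
    print(f"rms {before:.3f} -> gain {g:5.2f} -> rms {after:.3f}")
```

Under these assumptions the 0.02 and 0.1 windows end at the reference level. The 0.001 window hits the upper clamp and reaches only 0.01, and the 0.4 window keeps its original level because the gain cannot drop below 1.0.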

Example

import torch
import torchaudio
from audioseal.models import NormalizationProcessor

# Create normalizer with target RMS
normalizer = NormalizationProcessor(
    window_size=5,
    reference_rms=0.1
)

# Load audio files with different loudness levels
audio1, sr = torchaudio.load("quiet_audio.wav")  # (channels, timesteps)
audio2, sr = torchaudio.load("loud_audio.wav")

# Add the batch dimension: (1, channels, timesteps)
audio1 = audio1.unsqueeze(0)
audio2 = audio2.unsqueeze(0)

# .item() requires a single value, so the prints below assume mono files
print(f"Audio 1 RMS: {normalizer.compute_rms(audio1).item():.4f}")
print(f"Audio 2 RMS: {normalizer.compute_rms(audio2).item():.4f}")

# Normalize both toward the reference loudness
norm_audio1 = normalizer.loudness_normalization(audio1)
norm_audio2 = normalizer.loudness_normalization(audio2)

print(f"Normalized audio 1 RMS: {normalizer.compute_rms(norm_audio1).item():.4f}")
print(f"Normalized audio 2 RMS: {normalizer.compute_rms(norm_audio2).item():.4f}")

# Quiet audio is raised toward 0.1 (by at most 10x). Because the gain is
# clamped to a minimum of 1.0, audio already louder than 0.1 is not attenuated.

Attributes

window_size
int
Size of the processing window used for normalization.
reference_rms
float
Target RMS value for loudness normalization.

Integration in AudioSeal

The NormalizationProcessor is optionally used in both the generator and the detector:

In AudioSealWM (Generator)

# After generating watermark
watermark = self.decoder(hidden)

# Fit watermark inside audio envelope
if self.normalizer is not None:
    watermark = self.normalizer.fit_inside_envelope(x, watermark)

In AudioSealDetector (Detector)

# Before detection
if self.normalizer is not None:
    x = self.normalizer.loudness_normalization(x)

result = self.detector(x)

Complete Example

import torch
import torchaudio
from audioseal.models import NormalizationProcessor

# Initialize normalizer
normalizer = NormalizationProcessor(
    window_size=5,
    reference_rms=0.1
)

# Load original audio; torchaudio.load returns (channels, timesteps)
audio, sr = torchaudio.load("audio.wav")
audio = audio.unsqueeze(0)  # add the batch dimension: (1, channels, timesteps)
# .item() requires a single value, so the prints in this example assume a mono file
print(f"Original audio RMS: {normalizer.compute_rms(audio).item():.4f}")

# Step 1: Normalize audio loudness for consistent processing
normalized_audio = normalizer.loudness_normalization(audio)
print(f"Normalized audio RMS: {normalizer.compute_rms(normalized_audio).item():.4f}")

# Step 2: Generate a watermark signal (simulated)
watermark = torch.randn_like(audio) * 0.2
print(f"Raw watermark RMS: {normalizer.compute_rms(watermark).item():.4f}")

# Step 3: Fit watermark inside audio envelope
fitted_watermark = normalizer.fit_inside_envelope(
    wav1=normalized_audio,
    wav2=watermark
)
print(f"Fitted watermark RMS: {normalizer.compute_rms(fitted_watermark).item():.4f}")

# Step 4: Apply watermark
watermarked = normalized_audio + fitted_watermark

# Save results; torchaudio.save expects 2D (channels, timesteps) tensors
torchaudio.save("normalized.wav", normalized_audio.squeeze(0), sr)
torchaudio.save("watermarked.wav", watermarked.squeeze(0), sr)

print("\nProcessing complete!")
print(f"Final watermarked audio RMS: {normalizer.compute_rms(watermarked).item():.4f}")

Use Cases

Imperceptible Watermarking

# Scale the watermark so it stays within the audio's envelope
normalizer = NormalizationProcessor(window_size=5)
watermark = normalizer.fit_inside_envelope(audio, raw_watermark)

Robust Detection

# Normalize loudness before detection for consistency
normalizer = NormalizationProcessor(reference_rms=0.1)
normalized = normalizer.loudness_normalization(audio_to_detect)
result = detector(normalized)

Audio Preprocessing

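# Standardize audio files to same loudness level
from pathlib import Path

normalizer = NormalizationProcessor(reference_rms=0.15)
for audio_file in audio_files:
    audio, sr = torchaudio.load(audio_file)  # (channels, timesteps)
    normalized = normalizer.loudness_normalization(audio.unsqueeze(0))
    # Prefix only the file name so paths that include directories stay valid
    out_path = Path(audio_file).with_name(f"normalized_{Path(audio_file).name}")
    torchaudio.save(str(out_path), normalized.squeeze(0), sr)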

Technical Notes

  • Overlap-Add: Uses 50% overlap between windows for smooth reconstruction
  • Hann Windowing: Applies Hann window to avoid boundary artifacts
  • Gain Limiting: Clamps gain values to prevent extreme amplification or attenuation
  • TorchScript Support: Methods are JIT-exportable for optimized inference, with the exception noted in the next item
  • Eager Mode Only: fit_inside_envelope works only in eager mode (not with torch.jit.script)
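
The overlap-add and Hann windowing notes rest on a standard property: periodic Hann windows spaced at 50% overlap sum to a constant, so overlap-add reproduces the signal wherever the per-window gain is 1. A minimal NumPy check of that property follows. The window size of 8 is arbitrary, and nothing here is taken from AudioSeal's code.

```python
import numpy as np

window_size = 8
hop = window_size // 2  # 50% overlap

# Periodic Hann window (the denominator is window_size, not window_size - 1)
n = np.arange(window_size)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / window_size)

# Overlap-add several copies of the window itself
n_frames = 6
total = np.zeros((n_frames - 1) * hop + window_size)
for i in range(n_frames):
    total[i * hop:i * hop + window_size] += hann

# Away from the first and last half-window, every sample is covered twice
interior = total[hop:-hop]
print(np.allclose(interior, 1.0))  # True
```

torch.hann_window is periodic by default, whereas np.hanning returns the symmetric variant, which does not sum exactly to 1 at this hop.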
