Overview
The NormalizationProcessor class provides audio normalization utilities that improve watermark imperceptibility and detection robustness. It includes methods for fitting watermarks within audio envelopes and normalizing loudness levels.
Initialization
Parameters
Size of the processing window in samples. Smaller windows provide finer-grained control but may introduce artifacts. Typical values range from 3 to 10.
Reference RMS (root mean square) value for loudness normalization. Audio will be scaled to match this target RMS level.
Methods
compute_rms
Compute the root mean square (RMS) energy of an audio signal.
Parameters
Input audio tensor of shape (batch, channels, timesteps).
Returns
RMS value tensor of shape (batch, channels, 1). Represents the energy level of the signal.
Example
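As a rough illustration of what this method computes, the sketch below takes the RMS over the time axis of a nested list shaped (batch, channels, timesteps) and returns one value per channel. It is plain Python rather than the library's tensor code, and compute_rms_sketch is a hypothetical name used only here, not part of AudioSeal.

```python
import math

def compute_rms_sketch(signal):
    """RMS over the time axis of a nested list shaped (batch, channels, timesteps).

    Returns a nested list shaped (batch, channels, 1).
    Hypothetical helper for illustration; not the AudioSeal implementation.
    """
    return [
        [[math.sqrt(sum(x * x for x in channel) / len(channel))] for channel in item]
        for item in signal
    ]

# One batch item, two channels, four timesteps each.
audio = [[[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5]]]
rms = compute_rms_sketch(audio)
print(rms)
```

The trailing dimension of size 1 is kept so that the result can be broadcast against the original (batch, channels, timesteps) signal when a gain is applied.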
fit_inside_envelope
Normalize a watermark signal to fit inside the envelope of the original audio.
Parameters
wav1: Reference audio tensor of shape (batch, channels, timesteps). This defines the target envelope that wav2 should fit within.
wav2: Signal to be normalized, of shape (batch, channels, timesteps). Typically the watermark signal that needs to be scaled down.
Returns
Normalized version of wav2 with shape (batch, channels, timesteps). The signal is scaled to fit within the envelope of wav1.
How It Works
- Windowing: Divides both signals into overlapping windows using a Hann window for smooth transitions
- RMS Computation: Calculates RMS for each window in both signals
- Gain Calculation: Computes gain to fit wav2 inside wav1’s envelope (clamped between 0.01 and 1.0)
- Application: Applies gain to each window with Hann window weighting
- Reconstruction: Reconstructs the signal using overlap-add
Example
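The five steps listed above can be sketched in plain Python for a single channel. This is an illustration under stated assumptions, not the library's code: the gain is assumed to be the ratio of the two window RMS values, the signals are zero-padded at the edges so every sample is covered by two windows, a small epsilon guards against division by zero, and the window size is assumed to be even. The names fit_inside_envelope_sketch, periodic_hann, and rms are hypothetical.

```python
import math

def periodic_hann(n):
    # Periodic Hann window: with hop n // 2 and even n, overlapping copies sum to 1.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def fit_inside_envelope_sketch(wav1, wav2, window_size=8, min_gain=0.01, max_gain=1.0):
    """Scale wav2 window by window so its RMS stays at or below that of wav1.

    Illustrative only: single channel, plain lists, even window_size assumed.
    """
    hop = window_size // 2
    win = periodic_hann(window_size)
    n = len(wav2)
    # Zero-pad so every original sample is covered by exactly two windows.
    tail = hop + (-n) % hop
    ref = [0.0] * hop + list(wav1) + [0.0] * tail
    sig = [0.0] * hop + list(wav2) + [0.0] * tail
    out = [0.0] * len(sig)
    for start in range(0, len(sig) - window_size + 1, hop):
        ref_win = ref[start:start + window_size]
        sig_win = sig[start:start + window_size]
        # Assumed gain rule: ratio of window RMS values, clamped to [0.01, 1.0].
        gain = rms(ref_win) / (rms(sig_win) + 1e-8)
        gain = min(max(gain, min_gain), max_gain)
        # Apply the gain with Hann weighting and overlap-add into the output.
        for i in range(window_size):
            out[start + i] += gain * win[i] * sig_win[i]
    return out[hop:hop + n]

audio = [math.sin(0.3 * t) for t in range(32)]
watermark = [4.0 * x for x in audio]  # four times louder than the reference
fitted = fit_inside_envelope_sketch(audio, watermark)
```

Because the upper clamp is 1.0, a signal that is already quieter than the reference passes through unchanged in this sketch; only windows that exceed the reference envelope are scaled down.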
loudness_normalization
Normalize the loudness of an audio signal to match a reference RMS level.
Parameters
Input audio tensor of shape (batch, channels, timesteps) to be normalized.
Returns
Loudness-normalized audio of shape (batch, channels, timesteps). The RMS level is adjusted to match the reference RMS value.
How It Works
- Windowing: Divides signal into overlapping windows with Hann window
- RMS Computation: Calculates RMS for each window
- Gain Calculation: Computes gain to achieve reference RMS (clamped between 1.0 and 10.0)
- Application: Applies gain to each window with Hann window weighting
- Reconstruction: Reconstructs using overlap-add
Example
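The step that distinguishes this method is the gain rule, so the sketch below isolates it for a single window. It assumes the gain is the reference RMS divided by the window RMS, clamped to the range stated above; loudness_gain, the epsilon, and the reference level of 0.1 are hypothetical choices for illustration, not values taken from AudioSeal.

```python
import math

def loudness_gain(window, ref_rms, min_gain=1.0, max_gain=10.0, eps=1e-8):
    """Gain that moves one window's RMS toward ref_rms, clamped to [min_gain, max_gain].

    Hypothetical helper for illustration; not the AudioSeal implementation.
    """
    window_rms = math.sqrt(sum(x * x for x in window) / len(window))
    return min(max(ref_rms / (window_rms + eps), min_gain), max_gain)

ref_rms = 0.1  # hypothetical target level

quiet = [0.05, -0.05, 0.05, -0.05]          # RMS 0.05: unclamped gain is about 2
very_quiet = [0.001, -0.001, 0.001, -0.001]  # RMS 0.001: unclamped gain of about 100 is capped
loud = [0.5, -0.5, 0.5, -0.5]                # RMS 0.5: unclamped gain of about 0.2 is raised to 1

gains = [loudness_gain(w, ref_rms) for w in (quiet, very_quiet, loud)]
print(gains)
```

With a lower clamp of 1.0, this rule only ever boosts: windows already louder than the reference are left at their original level, and very quiet windows are amplified by at most a factor of 10.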
Attributes
Size of the processing window used for normalization.
Target RMS value for loudness normalization.
Integration in AudioSeal
The NormalizationProcessor is optionally used in both the generator and the detector:
In AudioSealWM (Generator)
In AudioSealDetector (Detector)
Complete Example
Use Cases
Imperceptible Watermarking
Robust Detection
Audio Preprocessing
Technical Notes
- Overlap-Add: Uses 50% overlap between windows for smooth reconstruction
- Hann Windowing: Applies Hann window to avoid boundary artifacts
- Gain Limiting: Clamps gain values to prevent extreme amplification or attenuation
- TorchScript Support: Methods are JIT-exportable for optimized inference
- Eager Mode Only: fit_inside_envelope only works in eager mode (not with torch.jit.script)
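The pairing of a Hann window with 50% overlap has a useful property: for a periodic Hann window of even length, overlapping copies shifted by half a window sum to exactly 1, so overlap-add reconstructs an unmodified signal without amplitude ripple. The check below demonstrates this numerically in plain Python. Whether the library uses the periodic or symmetric form is not stated here, so treat the window definition as an assumption (torch.hann_window defaults to the periodic form); periodic_hann and overlap_sum are hypothetical helper names.

```python
import math

def periodic_hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def overlap_sum(window_size, num_windows):
    """Sum of num_windows Hann windows placed with a hop of half the window size."""
    hop = window_size // 2
    win = periodic_hann(window_size)
    total = [0.0] * (hop * (num_windows + 1))
    for k in range(num_windows):
        for i, w in enumerate(win):
            total[k * hop + i] += w
    return total

total = overlap_sum(window_size=8, num_windows=6)
# The first and last half-window are covered by only one window, so skip them.
interior = total[4:-4]
print(max(abs(v - 1.0) for v in interior))
```

The first and last half-window do not benefit from this property, which is why implementations typically pad the signal or treat the edges separately.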
See Also
- AudioSealWM - Generator that uses normalization
- AudioSealDetector - Detector that uses normalization
- Attack Robustness Guide - Handling audio modifications
