Overview
The NormalizationProcessor class provides audio normalization utilities that improve watermark imperceptibility and detection robustness. It includes methods for fitting watermarks within audio envelopes and normalizing loudness levels.
Initialization
from audioseal.models import NormalizationProcessor
normalizer = NormalizationProcessor(
window_size=5,
reference_rms=0.1
)
Parameters
window_size: Size of the processing window in samples. Smaller windows give finer-grained control but may introduce artifacts. Typical values range from 3 to 10.
reference_rms: Reference RMS (root mean square) level for loudness normalization. Audio is scaled toward this target RMS.
Methods
compute_rms
Compute the root mean square (RMS) energy of an audio signal.
import torch
audio = torch.randn(1, 1, 16000) # 1 second at 16kHz
rms = normalizer.compute_rms(audio)
print(f"RMS energy: {rms.item():.4f}")
Parameters
Input audio tensor of shape (batch, channels, timesteps).
Returns
RMS tensor of shape (batch, channels, 1), giving the energy level of each channel computed over the time axis.
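The returned value is the usual RMS, sqrt(mean(x²)), reduced over the time axis, which is why the last dimension collapses to 1. As a minimal sketch, assuming the method performs only this reduction (the real implementation may, for example, add a small epsilon), the same computation on a NumPy array of shape (batch, channels, timesteps) looks like this; compute_rms_sketch is a hypothetical name, not part of the library:

```python
import numpy as np


def compute_rms_sketch(audio: np.ndarray) -> np.ndarray:
    """Root mean square over the last (time) axis, kept as a size-1 axis.

    (batch, channels, timesteps) -> (batch, channels, 1)
    """
    return np.sqrt(np.mean(np.square(audio), axis=-1, keepdims=True))


sr = 16000
t = np.arange(sr) / sr
sine = 0.5 * np.sin(2 * np.pi * 440 * t)  # 1 second of a 440 Hz sine, amplitude 0.5
audio = sine.reshape(1, 1, -1)            # (batch=1, channels=1, timesteps)
rms = compute_rms_sketch(audio)
print(rms.shape)   # (1, 1, 1)
print(rms.item())  # ≈ 0.3536, i.e. 0.5 / sqrt(2)
```

A sine's RMS equals its amplitude divided by √2, which makes it a convenient sanity check for any RMS implementation.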
Example
import torch
import torchaudio
from audioseal.models import NormalizationProcessor
normalizer = NormalizationProcessor()
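# Load audio: torchaudio returns (channels, timesteps)
audio, sr = torchaudio.load("audio.wav")
# Mix down to mono and add a batch dimension -> (1, 1, timesteps)
audio = audio.mean(dim=0, keepdim=True).unsqueeze(0)
# Compute RMS (shape (1, 1, 1), so .item() is valid)
rms = normalizer.compute_rms(audio)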
print(f"Audio RMS: {rms.item():.4f}")
print(f"Audio dB: {20 * torch.log10(rms).item():.2f} dB")
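The last line converts the linear RMS to decibels with 20·log10(rms), measured relative to an RMS of 1.0. Converting in both directions helps when choosing reference_rms from a target level in dB. A small sketch with hypothetical helper names; note that silence (RMS 0) has no finite dB level, so math.log10 raises ValueError there, whereas torch.log10 returns -inf:

```python
import math


def rms_to_db(rms: float) -> float:
    """Level in dB relative to an RMS of 1.0 (same formula as the example above)."""
    # math.log10 raises ValueError for rms == 0: silence has no finite dB level.
    return 20 * math.log10(rms)


def db_to_rms(db: float) -> float:
    """Inverse conversion: a target level in dB back to a linear RMS value."""
    return 10 ** (db / 20)


print(rms_to_db(0.1))    # ≈ -20.0
print(rms_to_db(0.15))   # ≈ -16.48
print(db_to_rms(-20.0))  # ≈ 0.1
```

For example, a reference_rms of 0.1 corresponds to about -20 dB on this scale.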
fit_inside_envelope
Normalize a watermark signal to fit inside the envelope of the original audio.
# Ensure watermark doesn't exceed audio envelope
normalized_watermark = normalizer.fit_inside_envelope(
wav1=original_audio,
wav2=watermark
)
Parameters
wav1: Reference audio tensor of shape (batch, channels, timesteps). It defines the target envelope that wav2 must fit within.
wav2: Signal to normalize, of shape (batch, channels, timesteps). Typically this is the watermark signal, which is scaled down as needed.
Returns
Normalized version of wav2 with shape (batch, channels, timesteps). The signal is scaled to fit within the envelope of wav1.
How It Works
- Windowing: Divides both signals into overlapping windows using a Hann window for smooth transitions
- RMS Computation: Calculates RMS for each window in both signals
- Gain Calculation: Computes a per-window gain that fits wav2 inside wav1's envelope, clamped between 0.01 and 1.0 so that wav2 is only attenuated, never amplified
- Application: Applies gain to each window with Hann window weighting
- Reconstruction: Reconstructs the signal using overlap-add
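The steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the library's code: frame_len is taken to be an even window length in samples (the real window_size semantics may differ), frames hop by half a window, a periodic Hann window is used so that overlapping windows sum to one, and both signals are zero-padded so every original sample lies under two frames. The function names and the eps guard are hypothetical.

```python
import numpy as np


def periodic_hann(n: int) -> np.ndarray:
    # Periodic Hann window: copies shifted by n // 2 sum to exactly 1 (n even).
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def fit_inside_envelope_sketch(wav1, wav2, frame_len=8, eps=1e-8):
    """Attenuate wav2 in windows where it is louder than wav1.

    wav1, wav2: arrays of identical shape (..., timesteps).
    frame_len: even window length in samples (an assumption of this sketch).
    """
    assert frame_len % 2 == 0, "50% overlap-add needs an even frame length"
    hop = frame_len // 2
    n = wav1.shape[-1]
    # Zero-pad so every original sample lies under exactly two frames.
    pad = [(0, 0)] * (wav1.ndim - 1) + [(hop, hop + (-n) % hop)]
    ref = np.pad(wav1, pad)
    sig = np.pad(wav2, pad)
    window = periodic_hann(frame_len)
    out = np.zeros(sig.shape)

    for start in range(0, sig.shape[-1] - frame_len + 1, hop):
        seg_ref = ref[..., start:start + frame_len]
        seg_sig = sig[..., start:start + frame_len]
        # RMS of this window in both signals.
        rms_ref = np.sqrt(np.mean(seg_ref ** 2, axis=-1, keepdims=True))
        rms_sig = np.sqrt(np.mean(seg_sig ** 2, axis=-1, keepdims=True))
        # Gain clamped to [0.01, 1.0]: the signal is only ever attenuated.
        gain = np.clip(rms_ref / (rms_sig + eps), 0.01, 1.0)
        # Apply the gain with Hann weighting and overlap-add into the output.
        out[..., start:start + frame_len] += gain * seg_sig * window

    # Drop the padding to return the original length.
    return out[..., hop:hop + n]
```

Because each frame's gain is at most 1.0, a quiet reference pulls the signal down while a loud one leaves it untouched, and the Hann weighting blends neighboring frame gains so the attenuation changes smoothly rather than in steps. To try it on PyTorch tensors, pass each tensor's .numpy() view.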
Example
import torch
import torchaudio
from audioseal.models import NormalizationProcessor
normalizer = NormalizationProcessor(window_size=5)
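# Load audio: torchaudio returns (channels, timesteps)
audio, sr = torchaudio.load("audio.wav")
# Mix down to mono and add a batch dimension -> (1, 1, timesteps)
audio = audio.mean(dim=0, keepdim=True).unsqueeze(0)
# Generate a random stand-in watermark with the same shape
watermark = torch.randn_like(audio) * 0.1
# Fit watermark inside audio envelope
normalized_watermark = normalizer.fit_inside_envelope(
    wav1=audio,
    wav2=watermark
)
# Apply normalized watermark
watermarked = audio + normalized_watermark
# Save result, dropping the batch dimension: torchaudio.save expects (channels, timesteps)
torchaudio.save("watermarked.wav", watermarked.squeeze(0), sr)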
print(f"Original watermark RMS: {normalizer.compute_rms(watermark).item():.4f}")
print(f"Normalized watermark RMS: {normalizer.compute_rms(normalized_watermark).item():.4f}")
print(f"Audio RMS: {normalizer.compute_rms(audio).item():.4f}")
loudness_normalization
Normalize the loudness of an audio signal to match a reference RMS level.
# Normalize audio to consistent loudness
normalized_audio = normalizer.loudness_normalization(audio)
Parameters
Input audio tensor of shape (batch, channels, timesteps) to be normalized.
Returns
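Loudness-normalized audio of shape (batch, channels, timesteps). Each window's RMS is raised toward the reference RMS value; because the gain is clamped between 1.0 and 10.0, windows already louder than the reference are left unchanged, and very quiet windows may not reach it.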
How It Works
- Windowing: Divides signal into overlapping windows with Hann window
- RMS Computation: Calculates RMS for each window
- Gain Calculation: Computes gain to achieve reference RMS (clamped between 1.0 and 10.0)
- Application: Applies gain to each window with Hann window weighting
- Reconstruction: Reconstructs using overlap-add
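Because the gain is clamped to [1.0, 10.0], this method can only raise loudness, and by at most a factor of 10. The per-window gain rule can be sketched as follows; loudness_gain and its eps guard are hypothetical names for illustration, and the real implementation may compute the ratio differently:

```python
import numpy as np


def loudness_gain(window_rms, reference_rms=0.1, eps=1e-8):
    """Gain that moves each window toward reference_rms, clamped to [1.0, 10.0]."""
    # eps (hypothetical) avoids dividing by zero on silent windows.
    return np.clip(reference_rms / (np.asarray(window_rms) + eps), 1.0, 10.0)


window_rms = np.array([0.02, 0.005, 0.3])  # quiet, very quiet, loud
gain = loudness_gain(window_rms)
print(gain)               # ≈ [5.0, 10.0, 1.0]
print(window_rms * gain)  # ≈ [0.1, 0.05, 0.3]: only the first reaches the target
```

A window at 0.005 ends at 0.05 and a window at 0.3 stays at 0.3, so material far from reference_rms in either direction does not land exactly on the target.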
Example
import torch
import torchaudio
from audioseal.models import NormalizationProcessor
# Create normalizer with target RMS
normalizer = NormalizationProcessor(
window_size=5,
reference_rms=0.1
)
# Load audio files with different loudness levels
audio1, sr = torchaudio.load("quiet_audio.wav")
audio2, sr = torchaudio.load("loud_audio.wav")
# Mix each down to mono with a batch dimension: (1, 1, timesteps)
audio1 = audio1.mean(dim=0, keepdim=True).unsqueeze(0)
audio2 = audio2.mean(dim=0, keepdim=True).unsqueeze(0)
print(f"Audio 1 RMS: {normalizer.compute_rms(audio1).item():.4f}")
print(f"Audio 2 RMS: {normalizer.compute_rms(audio2).item():.4f}")
# Normalize both toward the same loudness
norm_audio1 = normalizer.loudness_normalization(audio1)
norm_audio2 = normalizer.loudness_normalization(audio2)
print(f"Normalized audio 1 RMS: {normalizer.compute_rms(norm_audio1).item():.4f}")
print(f"Normalized audio 2 RMS: {normalizer.compute_rms(norm_audio2).item():.4f}")
# The quiet file is boosted toward 0.1 (by at most 10x). Because the gain is
# never below 1.0, windows already louder than 0.1 are left as they are,
# so the loud file is not turned down.
Attributes
window_size: Size of the processing window used for normalization.
reference_rms: Target RMS value for loudness normalization.
Integration in AudioSeal
The NormalizationProcessor is optional in both the generator and the detector; each applies it only when its normalizer attribute is set:
In AudioSealWM (Generator)
# After generating watermark
watermark = self.decoder(hidden)
# Fit watermark inside audio envelope
if self.normalizer is not None:
    watermark = self.normalizer.fit_inside_envelope(x, watermark)
In AudioSealDetector (Detector)
# Before detection
if self.normalizer is not None:
    x = self.normalizer.loudness_normalization(x)
result = self.detector(x)
Complete Example
import torch
import torchaudio
from audioseal.models import NormalizationProcessor
# Initialize normalizer
normalizer = NormalizationProcessor(
window_size=5,
reference_rms=0.1
)
# Load original audio: torchaudio returns (channels, timesteps)
audio, sr = torchaudio.load("audio.wav")
# Mix down to mono and add a batch dimension -> (1, 1, timesteps)
audio = audio.mean(dim=0, keepdim=True).unsqueeze(0)
print(f"Original audio RMS: {normalizer.compute_rms(audio).item():.4f}")
# Step 1: Normalize audio loudness for consistent processing
normalized_audio = normalizer.loudness_normalization(audio)
print(f"Normalized audio RMS: {normalizer.compute_rms(normalized_audio).item():.4f}")
# Step 2: Generate a watermark signal (simulated)
watermark = torch.randn_like(audio) * 0.2
print(f"Raw watermark RMS: {normalizer.compute_rms(watermark).item():.4f}")
# Step 3: Fit watermark inside audio envelope
fitted_watermark = normalizer.fit_inside_envelope(
    wav1=normalized_audio,
    wav2=watermark
)
print(f"Fitted watermark RMS: {normalizer.compute_rms(fitted_watermark).item():.4f}")
# Step 4: Apply watermark
watermarked = normalized_audio + fitted_watermark
# Save results, dropping the batch dimension: torchaudio.save expects (channels, timesteps)
torchaudio.save("normalized.wav", normalized_audio.squeeze(0), sr)
torchaudio.save("watermarked.wav", watermarked.squeeze(0), sr)
print("\nProcessing complete!")
print(f"Final watermarked audio RMS: {normalizer.compute_rms(watermarked).item():.4f}")
Use Cases
Imperceptible Watermarking
# Ensure watermark never exceeds audio signal
normalizer = NormalizationProcessor(window_size=5)
watermark = normalizer.fit_inside_envelope(audio, raw_watermark)
Robust Detection
# Normalize loudness before detection for consistency
normalizer = NormalizationProcessor(reference_rms=0.1)
normalized = normalizer.loudness_normalization(audio_to_detect)
result = detector(normalized)
Audio Preprocessing
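# Standardize audio files to the same loudness level
import os
normalizer = NormalizationProcessor(reference_rms=0.15)
for audio_file in audio_files:
    audio, sr = torchaudio.load(audio_file)
    # (channels, timesteps) -> (1, channels, timesteps) as the normalizer expects
    normalized = normalizer.loudness_normalization(audio.unsqueeze(0))
    # Prefix only the file name so inputs inside directories still give valid paths
    directory, name = os.path.split(audio_file)
    out_path = os.path.join(directory, f"normalized_{name}")
    torchaudio.save(out_path, normalized.squeeze(0), sr)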
Technical Notes
- Overlap-Add: Uses 50% overlap between windows for smooth reconstruction
- Hann Windowing: Applies Hann window to avoid boundary artifacts
- Gain Limiting: Clamps gain values to prevent extreme amplification or attenuation
- TorchScript Support: Methods other than fit_inside_envelope are JIT-exportable for optimized inference
- Eager Mode Only: fit_inside_envelope works only in eager mode (not with torch.jit.script)
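The overlap-add and Hann-window notes fit together through the constant overlap-add property: with 50% overlap, shifted copies of a periodic Hann window sum to exactly 1, so a constant gain passes through unchanged and reconstruction adds no amplitude ripple. The NumPy sketch below checks this. Note that np.hanning returns the symmetric form, whose copies do not sum to a constant, while PyTorch's torch.hann_window is periodic by default; which form the library itself uses is an assumption this sketch does not settle.

```python
import numpy as np


def overlap_add_window_sum(window: np.ndarray, hop: int, n_frames: int) -> np.ndarray:
    """Sum of shifted copies of `window`, as overlap-add would accumulate them."""
    total = np.zeros(hop * (n_frames - 1) + len(window))
    for k in range(n_frames):
        total[k * hop : k * hop + len(window)] += window
    return total


N = 8
hop = N // 2  # 50% overlap
periodic = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)
symmetric = np.hanning(N)  # NumPy's symmetric Hann (zeros at both ends)

# Skip the first and last hop samples, which only one frame covers.
interior = slice(hop, -hop)
periodic_sum = overlap_add_window_sum(periodic, hop, 6)[interior]
symmetric_sum = overlap_add_window_sum(symmetric, hop, 6)[interior]
print(periodic_sum)   # ≈ 1.0 everywhere
print(symmetric_sum)  # ripples below 1.0, not constant
```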
See Also