

AudioSeal supports embedding secret messages within watermarks, allowing you to encode up to 65,536 unique identifiers or metadata values.

What are Secret Messages?

Secret messages are optional 16-bit binary values embedded in the watermark alongside the detection signal. They enable:
  • Model versioning: Identify which model version generated the audio
  • Unique identifiers: Track individual audio files with unique IDs (0 to 65,535)
  • Metadata encoding: Embed timestamps, user IDs, or other information
  • Batch tracking: Identify batches of generated audio
The message is optional and has no influence on watermark detection. It’s only used to carry additional information.
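Because the message is just a 16-bit integer, you can also split it into smaller fields, e.g. a 4-bit model version plus a 12-bit file ID. A minimal sketch; `pack_fields` and `unpack_fields` are illustrative helpers, not part of the AudioSeal API:

```python
def pack_fields(version: int, file_id: int) -> int:
    """Pack a 4-bit version and a 12-bit file ID into one 16-bit value."""
    assert 0 <= version < 16, "version must fit in 4 bits"
    assert 0 <= file_id < 4096, "file_id must fit in 12 bits"
    return (version << 12) | file_id

def unpack_fields(value: int) -> tuple[int, int]:
    """Recover (version, file_id) from a packed 16-bit value."""
    return value >> 12, value & 0xFFF

packed = pack_fields(3, 1234)
print(unpack_fields(packed))  # (3, 1234)
```

The packed value can then be turned into a message tensor with the `int_to_message` helper shown below.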

Creating Messages

Messages are 16-bit binary tensors with shape [batch, 16]:

Random Messages

import torch

# Generate a random 16-bit message
batch_size = 1
message = torch.randint(0, 2, (batch_size, 16))

print(message)
# Example output (values are random and vary per run):
# tensor([[0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]])
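Since `torch.randint` draws from PyTorch's global RNG, seeding it first makes the "random" message reproducible across runs, which is useful in tests:

```python
import torch

# Seed the RNG so the random message is reproducible
torch.manual_seed(0)
message = torch.randint(0, 2, (1, 16))

# Re-seeding with the same value yields the same message
torch.manual_seed(0)
message_again = torch.randint(0, 2, (1, 16))

print(torch.equal(message, message_again))  # True
```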

Custom Messages from Integers

Convert an integer (0 to 65,535) to a 16-bit binary message:
def int_to_message(value: int, batch_size: int = 1) -> torch.Tensor:
    """
    Convert an integer to a 16-bit binary message.
    
    Args:
        value: Integer between 0 and 65,535
        batch_size: Number of copies in the batch
    
    Returns:
        Binary tensor of shape [batch_size, 16]
    """
    assert 0 <= value < 2**16, f"Value must be between 0 and {2**16-1}"
    
    # Convert to binary string and pad to 16 bits
    binary_str = format(value, '016b')
    
    # Convert to tensor
    bits = [int(b) for b in binary_str]
    message = torch.tensor([bits], dtype=torch.int32)
    
    # Repeat for batch
    if batch_size > 1:
        message = message.repeat(batch_size, 1)
    
    return message

# Example: Encode model version ID
model_version = 42
message = int_to_message(model_version)
print(f"Model version {model_version}: {message}")
# Output: Model version 42: tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]])
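The same conversion can be done without string formatting, using tensor bit shifts. This is an illustrative alternative (`int_to_message_vec` is a hypothetical name) that produces the same bits as `int_to_message`, MSB first:

```python
import torch

def int_to_message_vec(value: int, batch_size: int = 1) -> torch.Tensor:
    """Bit-shift variant of int_to_message; same MSB-first output."""
    assert 0 <= value < 2**16, f"Value must be between 0 and {2**16-1}"
    shifts = torch.arange(15, -1, -1)            # 15..0, MSB first
    bits = (torch.tensor(value) >> shifts) & 1   # shape [16]
    return bits.to(torch.int32).unsqueeze(0).repeat(batch_size, 1)

print(int_to_message_vec(42))  # Same bits as int_to_message(42)
```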

Decoding Messages to Integers

def message_to_int(message: torch.Tensor) -> int:
    """
    Convert a 16-bit binary message back to an integer.
    
    Args:
        message: Binary tensor of shape [16] or [1, 16]
    
    Returns:
        Integer value (0 to 65,535)
    """
    if message.ndim == 2:
        message = message.squeeze(0)
    
    # Convert binary tensor to integer
    binary_str = ''.join(str(bit.item()) for bit in message)
    return int(binary_str, 2)

# Decode a message
message = torch.tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]])
value = message_to_int(message)
print(f"Decoded value: {value}")  # Output: 42
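`message_to_int` decodes one message at a time; when you watermark batches, a vectorized variant can decode the whole `[batch, 16]` tensor at once. A sketch, with `messages_to_ints` as an illustrative helper:

```python
import torch

def messages_to_ints(messages: torch.Tensor) -> list[int]:
    """Decode a [batch, 16] binary tensor to a list of integers (MSB first)."""
    weights = 2 ** torch.arange(15, -1, -1)          # place values 32768..1
    values = (messages.long() * weights).sum(dim=-1)  # weighted sum per row
    return values.tolist()

batch = torch.stack([
    torch.tensor([0] * 10 + [1, 0, 1, 0, 1, 0]),  # 42
    torch.tensor([0] * 15 + [1]),                 # 1
])
print(messages_to_ints(batch))  # [42, 1]
```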

Embedding Messages

Embed a message when watermarking audio:
from audioseal import AudioSeal
import torch

# Load model
model = AudioSeal.load_generator("audioseal_wm_16bits")
model.eval()

# Your audio
wav = torch.randn(1, 1, 16000)  # 1 second at 16kHz

# Create a message
message = torch.randint(0, 2, (1, 16))

# Method 1: Using get_watermark
watermark = model.get_watermark(wav, message=message)
watermarked_audio = wav + watermark

# Method 2: Using forward
watermarked_audio = model(wav, message=message, alpha=1.0)
If no message is provided, AudioSeal will generate a random 16-bit message automatically.

Decoding Messages

Retrieve the embedded message from watermarked audio:

High-Level Detection

detector = AudioSeal.load_detector("audioseal_detector_16bits")
detector.eval()

# Detect watermark and decode message
detect_prob, decoded_message = detector.detect_watermark(watermarked_audio)

print(f"Detection probability: {detect_prob.item():.3f}")
print(f"Original message: {message}")
print(f"Decoded message:  {decoded_message}")

# Check if messages match
if torch.equal(message, decoded_message):
    print("✓ Message decoded successfully!")
else:
    print("✗ Message mismatch")

Low-Level Detection

# Get detailed detection results
result, decoded_message = detector(watermarked_audio)

# result shape: [batch, 2, frames]
# result[:, 1, :] contains watermark probabilities per frame

# decoded_message shape: [batch, 16]
# Each value is the probability of that bit being 1

print(f"Message probabilities: {decoded_message}")
# Convert to binary (threshold at 0.5)
binary_message = (decoded_message > 0.5).int()
print(f"Binary message: {binary_message}")
If the detector does not find a watermark, the decoded message will be random. Always check the detection probability before trusting the message.
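Because the low-level output gives a per-bit probability rather than a hard 0/1, you can also estimate how confident the detector is about each bit. A sketch using synthetic probabilities (not real detector output); the confidence formula is an illustrative choice, not an AudioSeal API:

```python
import torch

# Synthetic per-bit probabilities standing in for detector output
decoded_message = torch.tensor([[0.95, 0.02, 0.51, 0.88, 0.10, 0.49,
                                 0.97, 0.03, 0.92, 0.07, 0.85, 0.15,
                                 0.99, 0.01, 0.60, 0.40]])

# Hard decision at 0.5, as in the snippet above
binary = (decoded_message > 0.5).int()

# Distance from 0.5, rescaled: 0 = ambiguous, 1 = certain
confidence = (decoded_message - 0.5).abs() * 2

# Flag bit positions the detector is unsure about
uncertain_bits = (confidence < 0.2).nonzero(as_tuple=True)[1]
print(f"Uncertain bit positions: {uncertain_bits.tolist()}")
```

Bits near 0.5 (here positions 2 and 5) are the ones most likely to flip under noise or compression.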

Use Cases

1. Model Version Tracking

Embed model version IDs to track which model generated the audio:
model_id = 1  # Version 1
message = int_to_message(model_id)
watermarked = model(audio, message=message)
2. Unique File IDs

Assign a unique ID to each generated audio file:
import uuid

# Use lower 16 bits of UUID
file_id = uuid.uuid4().int & 0xFFFF  # Get 16-bit value
message = int_to_message(file_id)
watermarked = model(audio, message=message)
3. Timestamp Encoding

Embed a compact timestamp (limited to 16 bits):
import time

# Minutes since epoch, truncated to 16 bits (wraps roughly every 45.5 days)
timestamp = int(time.time() // 60) & 0xFFFF
message = int_to_message(timestamp)
watermarked = model(audio, message=message)
4. User or Session IDs

Associate audio with specific users or sessions:
user_id = 12345  # Your user ID (0-65535)
message = int_to_message(user_id)
watermarked = model(audio, message=message)

Complete Example

from audioseal import AudioSeal
import torch

def int_to_message(value: int, batch_size: int = 1) -> torch.Tensor:
    """Convert integer to 16-bit binary message."""
    assert 0 <= value < 2**16
    binary_str = format(value, '016b')
    bits = [int(b) for b in binary_str]
    message = torch.tensor([bits], dtype=torch.int32)
    if batch_size > 1:
        message = message.repeat(batch_size, 1)
    return message

def message_to_int(message: torch.Tensor) -> int:
    """Convert 16-bit binary message to integer."""
    if message.ndim == 2:
        message = message.squeeze(0)
    binary_str = ''.join(str(bit.item()) for bit in message)
    return int(binary_str, 2)

# Load models
generator = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")
generator.eval()
detector.eval()

# Create audio
audio = torch.randn(1, 1, 48000)  # 3 seconds at 16kHz

# Embed a secret ID
secret_id = 42
message = int_to_message(secret_id)
print(f"Embedding secret ID: {secret_id}")
print(f"Binary message: {message}")

# Watermark the audio
watermarked = generator(audio, message=message, alpha=1.0)

# Detect and decode
detect_prob, decoded_message = detector.detect_watermark(watermarked)

print(f"\nDetection probability: {detect_prob.item():.3f}")
print(f"Decoded message: {decoded_message}")

# Convert back to integer
decoded_id = message_to_int(decoded_message)
print(f"Decoded secret ID: {decoded_id}")

if secret_id == decoded_id:
    print("✓ Secret message successfully embedded and recovered!")
else:
    print(f"✗ Message mismatch: expected {secret_id}, got {decoded_id}")
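`torch.equal` gives an all-or-nothing answer; when evaluating robustness it is often more informative to count how many bits survived. A small sketch, with `bit_error_rate` as an illustrative helper:

```python
import torch

def bit_error_rate(original: torch.Tensor, decoded: torch.Tensor) -> float:
    """Fraction of mismatched bits between two [batch, 16] binary messages."""
    return (original != decoded).float().mean().item()

a = torch.tensor([[0, 1] * 8])
b = torch.tensor([[0, 1] * 7 + [1, 1]])  # one flipped bit

print(bit_error_rate(a, b))  # 0.0625 (1 of 16 bits wrong)
```

A BER of 0.0 means a perfect round trip; anything near 0.5 means the decoded bits are effectively random.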

Message Behavior

Without Watermark

# Test detection on clean (non-watermarked) audio
clean_audio = torch.randn(1, 1, 48000)

detect_prob, message = detector.detect_watermark(clean_audio)

print(f"Detection probability: {detect_prob.item():.3f}")  # ~0.0
print(f"Message: {message}")  # Random values!
When no watermark is detected (low detection probability), the message will be random. Always validate the detection probability before using the decoded message.

Handling Detection Failures

def safe_decode_message(audio, detector, threshold=0.5):
    """
    Safely decode a message, returning None if no watermark is detected.
    """
    detect_prob, message = detector.detect_watermark(audio)
    
    if detect_prob.item() < threshold:
        return None, detect_prob.item()
    
    # Convert to integer
    decoded_value = message_to_int(message)
    return decoded_value, detect_prob.item()

# Usage
value, prob = safe_decode_message(watermarked_audio, detector)

if value is not None:
    print(f"Decoded ID: {value} (confidence: {prob:.3f})")
else:
    print(f"No watermark detected (probability: {prob:.3f})")

Streaming with Messages

Use the same message across all chunks in streaming mode:
model = AudioSeal.load_generator("audioseal_wm_streaming")
model.eval()

# Create a consistent message for all chunks
secret_message = int_to_message(123)

audio_chunks = [...]  # Your chunks
watermarked_chunks = []

with model.streaming(batch_size=1):
    for chunk in audio_chunks:
        # Use same message for all chunks
        watermarked = model(
            chunk,
            message=secret_message,  # Same message
            alpha=1.0
        )
        watermarked_chunks.append(watermarked)

full_audio = torch.cat(watermarked_chunks, dim=-1)

# Decode from full audio
detect_prob, decoded = detector.detect_watermark(full_audio)
print(f"Decoded ID: {message_to_int(decoded)}")

Next Steps

Training Custom Models

Train your own watermarking models with custom configurations

Attack Robustness

Learn how secret messages survive audio attacks