# Training Custom Models

> Train your own AudioSeal watermarking models from scratch

This guide shows you how to train custom AudioSeal models using your own datasets and configurations. The training pipeline was developed using [AudioCraft](https://github.com/facebookresearch/audiocraft) (version 1.4.0a1 and later) with PyTorch 2.1.0 and torchaudio 2.1.0.

## Prerequisites

Before starting, ensure you have the required dependencies:

AudioCraft >=1.4.0a1 is required. Install from source for maximum flexibility:

```bash
git clone https://github.com/facebookresearch/audiocraft.git
cd audiocraft
pip install -e .
```

ffmpeg (version less than 5.0.0) is **mandatory** for AAC augmentation during training:

```bash
# On Ubuntu/Debian
sudo apt-get install ffmpeg

# Or with Anaconda/Miniconda
conda install "ffmpeg<5" -c conda-forge
```

Training will fail without ffmpeg, as AAC augmentation depends on it.

Check that AudioCraft and ffmpeg are properly installed:

```bash
python -c "import audiocraft; print(audiocraft.__version__)"
ffmpeg -version
```

## Dataset Preparation

AudioSeal requires datasets in AudioCraft's format.
Here's how to prepare them:

### Using VoxPopuli (Paper Dataset)

VoxPopuli is the dataset used in the AudioSeal paper:

```bash
# Download the VoxPopuli tools
git clone https://github.com/facebookresearch/voxpopuli.git
cd voxpopuli

# Download and segment the raw audio
python -m voxpopuli.download_audios --root [ROOT] --subset 400k
python -m voxpopuli.get_unlabelled_data --root [ROOT] --subset 400k

# Prepare the manifest with AudioCraft
cd [PATH_TO_AUDIOCRAFT]
python -m audiocraft.data.audio_dataset [ROOT] egs/voxpopuli/data.jsonl.gz
```

### Dataset Configuration File

Create a dataset configuration file at `[audiocraft_root]/configs/dset/audio/voxpopuli.yaml`:

```yaml voxpopuli.yaml
# @package __global__

datasource:
  max_sample_rate: 16000
  max_channels: 1

  train: egs/voxpopuli
  valid: egs/voxpopuli
  evaluate: egs/voxpopuli
  generate: egs/voxpopuli
```

### Using Custom Datasets

For your own dataset:

1. Collect your audio files in a directory structure.
2. Use AudioCraft's data tool to create a manifest:

   ```bash
   python -m audiocraft.data.audio_dataset \
     /path/to/your/audio/files \
     /path/to/output/manifest.jsonl.gz
   ```

3. Create a YAML config file in `configs/dset/audio/` with your dataset paths.

See the [AudioCraft dataset documentation](https://github.com/facebookresearch/audiocraft/blob/main/docs/DATASETS.md) for detailed instructions.

## Training with Dora

AudioSeal uses [Dora](https://github.com/facebookresearch/dora) for experiment management and hyperparameter tuning.

### Basic Training Command

Test the training pipeline locally:

```bash
# Navigate to AudioCraft directory
cd [PATH_TO_AUDIOCRAFT]

# Run training with example dataset
dora run solver=watermark/robustness dset=audio/example

# Run with VoxPopuli
dora run solver=watermark/robustness dset=audio/voxpopuli
```

By default, checkpoints and experiment files are stored in `/tmp/audiocraft_$USER/outputs`.
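Before committing to a long run, it can be worth sanity-checking the manifest that `audiocraft.data.audio_dataset` wrote during dataset preparation. The sketch below is not part of AudioCraft or AudioSeal: `summarize_manifest` is a hypothetical helper, and it assumes the manifest is a gzipped JSON-lines file whose entries carry `path`, `duration` (in seconds), and `sample_rate` keys.

```python
import gzip
import json
from collections import Counter


def summarize_manifest(manifest_path):
    """Summarize a gzipped JSON-lines audio manifest.

    Assumption: every non-empty line is a JSON object with `path`,
    `duration` (seconds) and `sample_rate` keys.
    """
    n_files = 0
    total_seconds = 0.0
    sample_rates = Counter()  # how many files exist per sample rate

    with gzip.open(manifest_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            n_files += 1
            total_seconds += float(entry["duration"])
            sample_rates[int(entry["sample_rate"])] += 1

    return {
        "files": n_files,
        "hours": total_seconds / 3600.0,
        "sample_rates": dict(sample_rates),
    }


# Example call (the path is illustrative):
# print(summarize_manifest("egs/voxpopuli/data.jsonl.gz"))
```

An empty manifest, or one whose sample rates differ from what the dataset config expects, is much cheaper to catch here than after a job has been queued.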
### Custom Dora Configuration

To customize output directories and run on a SLURM cluster, create a config file:

```yaml my_config.yaml
default:
  dora_dir: /path/to/your/dora/experiments
  partitions:
    global: your_slurm_partition
    team: your_slurm_partition
  reference_dir: /tmp

darwin:  # Mac-specific config for local testing
  dora_dir: /path/to/local/dora/experiments
  partitions:
    global: your_slurm_partition
    team: your_slurm_partition
  reference_dir: /path/to/reference
```

Run training with custom config:

```bash
AUDIOCRAFT_CONFIG=my_config.yaml dora run \
  solver=watermark/robustness \
  dset=audio/voxpopuli
```

### Training Parameters

Common parameters you can override:

```bash
# Train with specific number of bits
dora run solver=watermark/robustness \
  dset=audio/voxpopuli \
  +dummy_watermarker.nbits=16

# Adjust model architecture
dora run solver=watermark/robustness \
  dset=audio/voxpopuli \
  seanet.detector.output_dim=32

# Multi-GPU training
dora run solver=watermark/robustness \
  dset=audio/voxpopuli \
  device=cuda \
  ddp.world_size=4
```

### Running Training Grids

For hyperparameter sweeps, use Dora grids:

```bash
# Reproduce the HuggingFace AudioSeal model (from ICML paper)
AUDIOCRAFT_CONFIG=my_config.yaml \
AUDIOCRAFT_DSET=audio/voxpopuli \
dora grid watermarking.1315_kbits_seeds
```

This runs multiple experiments with different hyperparameter combinations. See the [AudioCraft watermarking grid](https://github.com/facebookresearch/audiocraft/blob/main/audiocraft/grids/watermarking/kbits.py) for details.

## Checkpoint Evaluation

After training completes, evaluate your checkpoints:

### Locate Checkpoints

Checkpoints are saved to:

```
[DORA_DIR]/xps/[HASH-ID]/checkpoint_XXX.th
```

The `HASH-ID` is shown in the output log when running `dora run`.
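If the log is no longer at hand, the directory layout above can be searched directly. The following is a minimal sketch that relies only on the `[DORA_DIR]/xps/[HASH-ID]/checkpoint_XXX.th` pattern; `find_checkpoints` is a hypothetical helper, not a Dora or AudioCraft function, and the example path is a placeholder.

```python
from pathlib import Path


def find_checkpoints(dora_dir):
    """Return checkpoint files under `<dora_dir>/xps/<hash-id>/`, newest first.

    Assumption: checkpoints follow the `checkpoint*.th` naming shown above.
    """
    xps_dir = Path(dora_dir) / "xps"
    checkpoints = list(xps_dir.glob("*/checkpoint*.th"))
    # Sort by modification time so the most recently written file comes first.
    return sorted(checkpoints, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Placeholder path: replace with the `dora_dir` from your own config.
    for ckpt in find_checkpoints("/path/to/your/dora/experiments"):
        # The parent directory name is the experiment's HASH-ID.
        print(ckpt.parent.name, ckpt)
```

The first entry is the most recently written checkpoint, which is not necessarily the best one; compare the candidates with the evaluation step before picking.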
### Evaluate a Checkpoint

```bash
AUDIOCRAFT_CONFIG=my_config.yaml dora run \
  solver=watermark/robustness \
  execute_only=evaluate \
  dset=audio/voxpopuli \
  continue_from=/path/to/checkpoint_XXX.th \
  +dummy_watermarker.nbits=16 \
  seanet.detector.output_dim=32
```

Evaluate with different `nbits` settings to find the best configuration for your use case.

## Converting Checkpoints for Inference

Training checkpoints contain both the generator and the detector. Extract them separately for use with the AudioSeal API:

### Run Conversion Script

```bash
python [AUDIOSEAL_PATH]/src/scripts/checkpoints.py \
  --checkpoint=/path/to/checkpoint_XXX.th \
  --outdir=/path/to/output \
  --suffix=my_model
```

This creates:

* `generator_my_model.pth`
* `detector_my_model.pth`

### Use Converted Checkpoints

```python
from audioseal import AudioSeal

# Load your custom models
model = AudioSeal.load_generator(
    "/path/to/output/generator_my_model.pth",
    nbits=16
)

detector = AudioSeal.load_detector(
    "/path/to/output/detector_my_model.pth",
    nbits=16
)

# Use as normal; `wav` is your input audio tensor (batch, channels, samples)
watermark = model.get_watermark(wav)
watermarked_audio = wav + watermark  # the watermark is added to the signal
result, message = detector.detect_watermark(watermarked_audio)
```

## Training Configuration

Key hyperparameters in the training configuration:

```yaml Model Architecture
# SEANet encoder/decoder configuration
seanet:
  channels: 1
  dimension: 128
  n_filters: 32
  n_residual_layers: 1
  ratios: [8, 5, 4, 2]
  activation: ELU
  norm: weight_norm

  encoder:
    output_dim: 128
  decoder:
    output_dim: 1
  detector:
    output_dim: 32  # detector feature channels; the final layer outputs 2 + nbits
```

```yaml Training Parameters
# Training configuration
optim:
  optimizer: adam
  lr: 1e-4
  beta1: 0.9
  beta2: 0.999
  weight_decay: 0.0

batch_size: 16
max_epochs: 100
gradient_clip_val: 1.0
```

```yaml Watermark Settings
# Watermark-specific settings
dummy_watermarker:
  nbits: 16  # Number of bits for secret message

# Augmentation settings
audio_augmentation:
  - aac_compression
  - mp3_compression
  - pink_noise
  - echo
  - lowpass_filter
```

## Troubleshooting

### Unsupported Formats Error (Linux)

If you encounter `Unsupported formats` errors:

```bash
# Add your conda environment libs to LD_LIBRARY_PATH
LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH \
AUDIOCRAFT_CONFIG=my_config.yaml \
dora run solver=watermark/robustness dset=audio/voxpopuli
```

### ffmpeg Not Found

Ensure ffmpeg is installed and accessible:

```bash
which ffmpeg     # Should show path to ffmpeg
ffmpeg -version  # Should show version < 5.0.0
```

### Out of Memory

Reduce batch size or sequence length:

```bash
dora run solver=watermark/robustness \
  dset=audio/voxpopuli \
  batch_size=8 \
  max_segment_length=10.0
```

### SLURM Issues

Verify partition names in your config:

```yaml
partitions:
  global: correct_partition_name
  team: correct_partition_name
```

## Complete Training Workflow

Here's the complete workflow from dataset to inference.

Prepare the dataset:

```bash
# Download and process VoxPopuli
python -m voxpopuli.download_audios --root /data/voxpopuli --subset 400k
python -m voxpopuli.get_unlabelled_data --root /data/voxpopuli --subset 400k

# Create manifest
python -m audiocraft.data.audio_dataset \
  /data/voxpopuli \
  /data/audiocraft/egs/voxpopuli/data.jsonl.gz
```

Create the dataset configuration:

```yaml
# configs/dset/audio/voxpopuli.yaml
# @package __global__

datasource:
  max_sample_rate: 16000
  max_channels: 1

  train: egs/voxpopuli
  valid: egs/voxpopuli
  evaluate: egs/voxpopuli
  generate: egs/voxpopuli
```

Train the model:

```bash
AUDIOCRAFT_CONFIG=my_config.yaml dora run \
  solver=watermark/robustness \
  dset=audio/voxpopuli \
  +dummy_watermarker.nbits=16
```

Evaluate the checkpoint:

```bash
AUDIOCRAFT_CONFIG=my_config.yaml dora run \
  solver=watermark/robustness \
  execute_only=evaluate \
  dset=audio/voxpopuli \
  continue_from=[DORA_DIR]/xps/[HASH]/checkpoint_best.th
```

Convert the checkpoint for inference:

```bash
# Run from the AudioSeal repository root
python src/scripts/checkpoints.py \
  --checkpoint=[DORA_DIR]/xps/[HASH]/checkpoint_best.th \
  --outdir=./models \
  --suffix=custom_16bit
```

Load the converted models:

```python
from audioseal import AudioSeal

model = AudioSeal.load_generator("./models/generator_custom_16bit.pth", nbits=16)
detector = AudioSeal.load_detector("./models/detector_custom_16bit.pth", nbits=16)
```

## Next Steps

* Learn how to evaluate your model's robustness against attacks
* Explore the complete API for model loading and usage