If you see a CUDA out of memory error during a sweep, refactor your code to use process-based execution: move your training logic into a Python script and launch the sweep agent from the CLI instead of the Python SDK.
  1. Add your training logic to a Python script (for example, train.py):
    if __name__ == "__main__":
        train()
    
  2. Reference the script in your YAML sweep configuration:
    program: train.py
    method: bayes
    metric:
      name: validation_loss
      goal: minimize
    parameters:
      learning_rate:
        min: 0.0001
        max: 0.1
      optimizer:
        values: ["adam", "sgd"]
    
  3. Initialize the sweep with the CLI:
    wandb sweep config.yaml
    
  4. Start the sweep agent with the CLI, replacing sweep_ID with the ID returned in the previous step:
    wandb agent sweep_ID
    
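Putting the steps above together, `train.py` might look like the following sketch. The model and data code are placeholders; calling `wandb.init()` inside the script lets the agent inject this run's hyperparameters, which you read back from `wandb.config`:

    # train.py -- minimal sketch; model/data logic is a placeholder
    import wandb

    def train():
        run = wandb.init()  # the sweep agent supplies this run's config
        lr = run.config.learning_rate
        opt_name = run.config.optimizer
        for epoch in range(5):
            ...  # build the model, train one epoch using lr and opt_name
            validation_loss = ...  # compute the metric named in the sweep config
            run.log({"validation_loss": validation_loss})

    if __name__ == "__main__":
        train()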
Using the CLI-based agent (wandb agent) instead of the Python SDK (wandb.agent) ensures each run is a separate process with its own memory allocation, preventing CUDA memory from accumulating across runs. For more information, see Sweeps troubleshooting.
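The memory behavior can be illustrated with a plain-Python analogue (a hypothetical sketch, not wandb code): each trial runs in its own interpreter process, so everything it allocates is returned to the operating system when the process exits, rather than accumulating in one long-lived interpreter. This mirrors how `wandb agent` launches your script as a fresh process for each run.

```python
import subprocess
import sys

# Each "run" is a separate Python process; its memory is freed on exit,
# so allocations cannot accumulate from one trial to the next.
snippet = "buf = bytearray(50 * 1024 * 1024); print(len(buf))"

for trial in range(3):
    out = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, check=True,
    )
    print(f"trial {trial}: allocated {out.stdout.strip()} bytes, freed on exit")
```

In a real sweep, the 50 MB buffer stands in for the model, activations, and CUDA allocations of one training run.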