If you hit a CUDA out of memory error during a sweep, refactor your code to use process-based execution: move your training code into a Python script and launch the sweep agent from the CLI instead of the Python SDK.
- Add your training logic to a Python script (for example, train.py).
- Reference the script in your YAML sweep configuration, under the program key.
- Initialize the sweep with the CLI (wandb sweep, followed by the path to the YAML file).
- Start the sweep agent with the CLI (wandb agent), replacing sweep_ID with the ID returned in the previous step.
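As a rough illustration of the first step, a train.py entry point might be organized as in the sketch below. This is a minimal sketch under stated assumptions, not the script from the original page: the function train_one_run, the parameter names learning_rate and epochs, and the dry-run fallback for machines without wandb are all hypothetical and would be replaced by your own training code and sweep parameters.

```python
"""train.py: hypothetical training entry point for a single sweep run."""

try:
    import wandb
except ImportError:  # assumption: allow a dry run where wandb is not installed
    wandb = None


def train_one_run(learning_rate, epochs):
    """Placeholder for real training: shrink a fake loss once per epoch."""
    loss = 1.0
    history = []
    for _ in range(epochs):
        loss *= 1.0 - learning_rate
        history.append(loss)
    return history


def main():
    if wandb is None:
        # Dry run with made-up values so the script can be smoke-tested.
        print(train_one_run(learning_rate=0.5, epochs=2))
        return
    # When launched by the sweep agent, wandb.init() receives the
    # hyperparameters chosen for this run and exposes them via run.config.
    with wandb.init() as run:
        history = train_one_run(
            learning_rate=run.config["learning_rate"],  # hypothetical sweep parameter
            epochs=run.config["epochs"],  # hypothetical sweep parameter
        )
        for epoch, loss in enumerate(history):
            run.log({"epoch": epoch, "loss": loss})


if __name__ == "__main__":
    main()
```

Keeping all the work behind main() means that everything the run allocates, including GPU memory in a real training loop, belongs to the process that the agent starts for that run.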
Using the CLI (wandb agent) instead of the Python SDK (wandb.agent) ensures that each run executes in a separate process with its own memory allocation. When that process exits, its GPU memory is released, which prevents CUDA memory from accumulating across runs.
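The effect of process-based execution can be illustrated with the standard library alone. The sketch below is an analogy, not the sweep agent's actual implementation: launch_runs and CHILD_CODE are hypothetical names, and the child snippet merely stands in for a training run by allocating a buffer and reporting its process ID.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for one training run: allocate a buffer, report the PID.
CHILD_CODE = "import os; buf = bytearray(10_000_000); print(os.getpid())"


def launch_runs(n_runs):
    """Start each run as its own Python process and collect the child PIDs."""
    pids = []
    for _ in range(n_runs):
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            capture_output=True,
            text=True,
            check=True,  # raise if a run exits with a non-zero status
        )
        pids.append(int(result.stdout.strip()))
    return pids


if __name__ == "__main__":
    child_pids = launch_runs(3)
    print("parent PID:", os.getpid())
    print("child PIDs:", child_pids)
```

Each child allocates its buffer in its own address space, and the operating system reclaims that memory when the child exits, so nothing carries over into the parent or into the next run.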
For more information, see Sweeps troubleshooting.