How can I fix an error like `AttributeError: module 'wandb' has no attribute ...`?

If you encounter an error like `AttributeError: module 'wandb' has no attribute 'init'` or AttributeError: module 'wandb' …
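The snippet above is cut off before naming a cause. In Python generally, `module 'X' has no attribute 'Y'` often means that `import X` picked up something other than the installed package, such as a local `wandb.py` file or `wandb/` folder in the project. The following is a hypothetical diagnostic sketch that reports where Python would import a module from. The helper names `locate_module` and `looks_shadowed` are illustrative only, not part of W&B, and the shadowing check is a heuristic.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_module(name: str) -> Optional[Path]:
    """Return the file or directory Python would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable at all
    if spec.origin and spec.origin not in ("namespace", "built-in", "frozen"):
        return Path(spec.origin).resolve()
    # Namespace packages (e.g. a bare `wandb/` folder with no __init__.py)
    # have no origin file, only search locations.
    locations = list(spec.submodule_search_locations or [])
    return Path(locations[0]).resolve() if locations else None


def looks_shadowed(name: str, project_dir: Optional[Path] = None) -> bool:
    """Heuristic: True if `name` resolves to something inside `project_dir`
    (default: the current directory) instead of an installed copy."""
    project_dir = (project_dir or Path.cwd()).resolve()
    location = locate_module(name)
    # Installed packages live under site-packages, even inside a local venv.
    if location is None or "site-packages" in location.parts:
        return False
    try:
        location.relative_to(project_dir)
    except ValueError:
        return False
    return True
```

For example, `print(locate_module("wandb"))` run from your project directory shows which `wandb` your script actually imports; a path pointing into your own project rather than `site-packages` suggests a name clash.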

How do I fix `Cuda out of memory` during a sweep?

If you see `Cuda out of memory` during a sweep, refactor your code to use process-based execution. Rewrite your code as a …
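The snippet is truncated before saying exactly how to restructure the code, so this is only a general sketch of process-based execution. It assumes your training code lives in a standalone script (hypothetically `train.py`) that accepts hyperparameters as `--name=value` flags. Each trial runs in a fresh Python interpreter; when that process exits, the operating system reclaims everything it held, including GPU memory, so one trial cannot leak memory into the next. The function names are illustrative, not a W&B API.

```python
import subprocess
import sys
from typing import Dict, List, Sequence


def build_command(script: str, params: Dict[str, object]) -> List[str]:
    """Turn one trial's hyperparameters into a command line, e.g.
    [python, train.py, --batch_size=32, --lr=0.01]."""
    args = [f"--{key}={value}" for key, value in sorted(params.items())]
    return [sys.executable, script, *args]


def run_trials(script: str, trials: Sequence[Dict[str, object]]) -> List[int]:
    """Run each trial in its own Python process and collect the exit codes."""
    return_codes = []
    for params in trials:
        # subprocess.run blocks until the child exits, so trials never overlap
        # and each one starts with whatever GPU memory the last one released.
        completed = subprocess.run(build_command(script, params))
        return_codes.append(completed.returncode)
    return return_codes
```

A non-zero exit code marks a failed trial without bringing down the loop that launches the remaining ones.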

How do I kill a job with wandb?

Press Ctrl+D on the keyboard to stop a script instrumented with W&B.

How do I resolve a run initialization timeout error in wandb?

To resolve a run initialization timeout error, follow these steps: Retry initialization: Attempt to restart the run. Che …
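The first step, retrying initialization, can be automated with a generic retry-with-exponential-backoff helper. This is a sketch, not a W&B feature: the `retry` helper is hypothetical, and because the snippet does not say which exception a timeout raises, it retries on any `Exception` by default; narrow `retry_on` if you know the specific error type.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on failure and waiting base_delay, 2*base_delay, ...
    seconds between attempts. Re-raises the last error once attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A possible usage, with a hypothetical project name: `run = retry(lambda: wandb.init(project="my-project"))`. The `sleep` parameter exists so the backoff can be tested without actually waiting.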

If wandb crashes, will it possibly crash my training run?

It is critical to avoid interference with training runs. W&B operates in a separate process, ensuring that training cont …
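To illustrate why running in a separate process protects the training loop, here is a generic sketch (not W&B's actual code): a helper process crashes immediately with an unhandled exception, yet the parent finishes all of its steps and only observes the helper's non-zero exit code. The function name `train` and the simulated crash are illustrative assumptions.

```python
import subprocess
import sys
from typing import Tuple


def train(steps: int) -> Tuple[int, int]:
    """Run a placeholder training loop alongside a helper process that crashes.

    Returns (completed_steps, helper_exit_code)."""
    # Stand-in for a background process that fails: an uncaught exception
    # terminates the child interpreter with exit status 1.
    helper = subprocess.Popen(
        [sys.executable, "-c", "raise RuntimeError('simulated crash')"],
        stderr=subprocess.DEVNULL,
    )
    completed_steps = 0
    for _ in range(steps):
        completed_steps += 1  # placeholder for one real training step
    helper_exit_code = helper.wait()
    return completed_steps, helper_exit_code
```

The exception stays inside the child interpreter; the parent never sees it as an exception, only as an exit status it can choose to inspect or ignore.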

InitStartError: Error communicating with wandb process

This error indicates that the library encountered an issue launching the process that synchronizes data to the server. Th …

My run's state is `crashed` in the UI but is still running on my machine. What do I do to get my data back?

You likely lost connection to your machine during training. Recover data by running `wandb sync PATH_TO_RUN` …
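To find a value for `PATH_TO_RUN`, the sketch below lists local run directories and builds one `wandb sync` command per run. It assumes the default local layout, in which each run is stored under `./wandb/` in a folder named like `run-<timestamp>-<run id>` (or `offline-run-...`); if your runs are stored elsewhere, pass that parent directory instead. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List


def find_local_runs(project_dir: Path) -> List[Path]:
    """Return run directories under project_dir / "wandb", oldest first.

    Assumes folders named run-<timestamp>-<id> or offline-run-<timestamp>-<id>;
    other entries (log files, the latest-run link) are skipped."""
    wandb_dir = project_dir / "wandb"
    if not wandb_dir.is_dir():
        return []
    runs = [
        p for p in wandb_dir.iterdir()
        if p.is_dir() and p.name.startswith(("run-", "offline-run-"))
    ]
    # Sort on the part after "run-", i.e. the timestamp, so online and
    # offline runs interleave chronologically.
    return sorted(runs, key=lambda p: p.name.split("run-", 1)[1])


def sync_commands(project_dir: Path) -> List[List[str]]:
    """Build a `wandb sync PATH_TO_RUN` command for every local run."""
    return [["wandb", "sync", str(run)] for run in find_local_runs(project_dir)]
```

Each returned list can be printed and pasted into a shell, or passed to `subprocess.run`, once the machine is back online.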

Why does my process hang when using Hydra with W&B?

If your process hangs when started with Hydra, this is likely caused by a multiprocessing conflict between Hydra and W&B …

Why does my training hang with distributed training?

There are two common reasons training hangs when using W&B with distributed training: 1. Hanging at the beginning of tra …

Why is a run marked crashed in W&B when it’s training fine locally?

This indicates a connection problem. If the server loses internet access and data stops syncing to W&B, the system marks …
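Because this state points to a connection problem, a quick first check is whether the training machine can open a TCP connection to the W&B server at all. The sketch below uses only the standard library; the helper name `can_reach` is hypothetical. A successful connection only rules out a basic network outage or firewall block; it does not guarantee that runs will sync.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For the hosted service you might call `can_reach("api.wandb.ai")`; for a self-hosted deployment, substitute your own server's hostname and port.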