Description
The readme says:
The model was not fine-tuned on a specific voice. Hence, you will get different voices every time you run the model. You can keep speaker consistency by either adding an audio prompt, or fixing the seed.
So one would expect that fixing the seed gives the same voice each time, but this is demonstrably false, at least on cpu and mps, even with an expanded seed-setting function that covers modern devices, like:
    import os
    import random

    import numpy as np
    import torch


    def set_seed(seed: int):
        """Sets the random seed for reproducibility."""
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.use_deterministic_algorithms(True)
        if hasattr(torch.backends, "cuda") and torch.cuda.is_available():
            torch.Generator(device=torch.device("cuda")).manual_seed(seed)
            torch.cuda.manual_seed(seed)
            torch.cuda.manual_seed_all(seed)
            # Ensure deterministic behavior for cuDNN (if used)
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False
            if os.getenv("CUBLAS_WORKSPACE_CONFIG") is None:
                os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
        elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
            torch.Generator(device=torch.device("mps")).manual_seed(seed)
            torch.mps.manual_seed(seed)
            # Enable deterministic behavior for MPS
            os.environ["PYTORCH_MPS_DETERMINISTIC"] = "1"
        else:
            torch.Generator(device=torch.device("cpu")).manual_seed(seed)

This does not work: the voice still changes across multiple calls to generate() on the same instance created with Dia.from_pretrained (this is using the nari-labs/Dia-1.6B-0626 model).
The specific use can be seen in https://github.com/robbiemu/bomdia, in the src/components/audio_generator/tts.py file.
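One possible contributing factor, offered as an assumption rather than a confirmed diagnosis: if the seed is set once and generate() is then called several times, each call continues from wherever the previous call left the random number generator. The calls would then not be expected to match each other, even on a fully deterministic backend. The sketch below illustrates this with Python's `random` module and a hypothetical `fake_generate` stand-in. It does not call Dia, and it does not show that re-seeding is sufficient on cpu or mps.

```python
import random


def fake_generate(n_tokens: int = 5) -> list[int]:
    """Hypothetical stand-in for a sampling-based generate().

    It draws from Python's global RNG, so its output depends on the RNG
    state at the moment it is called.
    """
    return [random.randrange(1000) for _ in range(n_tokens)]


def seed_once_then_call(seed: int, calls: int) -> list[list[int]]:
    """Seed a single time, then call the generator repeatedly."""
    random.seed(seed)
    return [fake_generate() for _ in range(calls)]


def reseed_before_each_call(seed: int, calls: int) -> list[list[int]]:
    """Reset the RNG to the same seed immediately before every call."""
    outputs = []
    for _ in range(calls):
        random.seed(seed)
        outputs.append(fake_generate())
    return outputs


once = seed_once_then_call(1234, 3)
each = reseed_before_each_call(1234, 3)

# Seeded once: later calls see an advanced RNG state.
print("seed once, calls identical:", once[0] == once[1] == once[2])
# Re-seeded every time: each call starts from the same RNG state.
print("re-seed each call, calls identical:", each[0] == each[1] == each[2])
```

If Dia's generate() draws from the global torch RNG in a similar way (not verified here), the analogous experiment would be to call set_seed(seed) immediately before each generate() call and compare the outputs.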
I realize that I can pass a list to generate() (and I'm about to check whether that is an improvement) instead of making a separate generate() call per "chunk" of the transcript, but I was also logging the seed for reproducibility/resumability, and that doesn't seem possible.