Releases: pyannote/pyannote-audio
Version 4.0.1
Version 4.0.0
TL;DR
Improved speaker assignment and counting
The pyannote/speaker-diarization-community-1 pretrained pipeline now relies on VBx clustering instead of agglomerative hierarchical clustering (as suggested by BUT Speech@FIT researchers Petr Pálka and Jiangyu Han).
Exclusive speaker diarization
The pyannote/speaker-diarization-community-1 pretrained pipeline now returns an exclusive speaker diarization, on top of the regular speaker diarization.
This feature, backported from our latest commercial model, simplifies the reconciliation between fine-grained speaker diarization timestamps and (sometimes not so precise) transcription timestamps.
```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1", token="huggingface-access-token")

output = pipeline("/path/to/conversation.wav")
print(output.speaker_diarization)            # regular speaker diarization
print(output.exclusive_speaker_diarization)  # exclusive speaker diarization
```
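For instance, exclusive speaker diarization makes it straightforward to assign each transcribed word to exactly one speaker. A minimal sketch, assuming `exclusive_speaker_diarization` behaves like a regular `pyannote.core.Annotation` and that the `words` list (illustrative, not part of the API) comes from a separate transcription step:

```python
from pyannote.core import Segment

# hypothetical output of a transcription step: (word, start_time, end_time)
words = [("hello", 0.5, 0.9), ("world", 1.0, 1.4)]

exclusive = output.exclusive_speaker_diarization
for word, start, end in words:
    # exclusive diarization never overlaps speakers, so the label with
    # the largest overlap with the word is unambiguous
    speaker = exclusive.argmax(Segment(start, end))
    print(f"{speaker}: {word}")
```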
Faster training

Metadata caching and optimized dataloaders make training on large-scale datasets much faster.
This led to a 15x speed-up on pyannoteAI's internal large-scale training.
pyannoteAI premium speaker diarization
Change one line of code to use pyannoteAI premium models and enjoy more accurate speaker diarization.
```diff
  from pyannote.audio import Pipeline
  pipeline = Pipeline.from_pretrained(
-     "pyannote/speaker-diarization-community-1", token="huggingface-access-token")
+     "pyannote/speaker-diarization-precision-2", token="pyannoteAI-api-key")
  diarization = pipeline("/path/to/conversation.wav")
```

Offline (air-gapped) use
Pipelines can now be stored alongside their internal models in the same repository, streamlining fully offline use.
1. Accept the pyannote/speaker-diarization-community-1 pipeline user agreement.

2. Clone the pipeline repository from Huggingface (if prompted for a password, use a Huggingface access token with correct permissions):

```bash
$ git lfs install
$ git clone https://hf.co/pyannote/speaker-diarization-community-1 /path/to/directory/pyannote-speaker-diarization-community-1
```

3. Enjoy!

```python
# load pipeline from disk (works without internet connection)
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('/path/to/directory/pyannote-speaker-diarization-community-1')

# run the pipeline locally on your computer
diarization = pipeline("audio.wav")
```
Telemetry
With the optional telemetry feature in pyannote.audio, you can choose to send anonymous usage metrics to help the pyannote team improve the library.
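Telemetry stays off unless you enable it. As a purely illustrative sketch (the import path and function below are hypothetical, not the actual API; refer to the pyannote.audio documentation for the real opt-in mechanism):

```python
# hypothetical API, for illustration only -- the real opt-in mechanism
# is described in the pyannote.audio documentation
from pyannote.audio.telemetry import set_telemetry_preferences  # assumed import path

# opt in to (or out of) anonymous usage metrics
set_telemetry_preferences(anonymous_usage_metrics=True)  # assumed signature
```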
Breaking changes
- BREAKING(io): remove support for `sox` and `soundfile` audio I/O backends (only `ffmpeg` or in-memory audio is supported)
- BREAKING(setup): drop support for Python < 3.10
- BREAKING(hub): rename `use_auth_token` to `token`
- BREAKING(hub): drop support for `{pipeline_name}@{revision}` syntax in `Model.from_pretrained(...)` and `Pipeline.from_pretrained(...)` -- use new `revision` keyword argument instead
- BREAKING(task): remove `OverlappedSpeechDetection` task (part of `SpeakerDiarization` task)
- BREAKING(pipeline): remove `OverlappedSpeechDetection` and `Resegmentation` unmaintained pipelines (part of `SpeakerDiarization`)
- BREAKING(cache): rely on `huggingface_hub` caching directory (`PYANNOTE_CACHE` is no longer used)
- BREAKING(inference): `Inference` now only supports already instantiated models
- BREAKING(task): drop support for `multilabel` training in `SpeakerDiarization` task
- BREAKING(task): drop support for `warm_up` option in `SpeakerDiarization` task
- BREAKING(task): drop support for `weigh_by_cardinality` option in `SpeakerDiarization` task
- BREAKING(task): drop support for `vad_loss` option in `SpeakerDiarization` task
- BREAKING(chore): switch to native namespace package
- BREAKING(cli): remove deprecated `pyannote-audio-train` CLI
New features
- feat(io): switch from `torchaudio` to `torchcodec` for audio I/O
- feat(pipeline): add support for VBx clustering (@Selesnyan and jyhan03)
- feat(pyannoteAI): add wrapper around pyannoteAI SDK
- improve(hub): add support for pipeline repos that also include underlying models
- feat(clustering): add support for `k-means` clustering
- feat(model): add `wav2vec_frozen` option to freeze/unfreeze `wav2vec` in `SSeRiouSS` architecture
- feat(task): add support for manual optimization in `SpeakerDiarization` task
- feat(utils): add `hidden` option to `ProgressHook`
- feat(utils): add `FilterByNumberOfSpeakers` protocol files filter
- feat(core): add `Calibration` class to calibrate logits/distances into probabilities
- feat(metric): add `DetectionErrorRate`, `SegmentationErrorRate`, `DiarizationPrecision`, and `DiarizationRecall` metrics
- feat(cli): add CLI to download, apply, benchmark, and optimize pipelines
- feat(cli): add CLI to strip checkpoints to their bare inference minimum
Improvements
- improve(model): improve WavLM (un)freezing support for `SSeRiouSS` architecture (@clement-pages)
- improve(task): improve `SpeakerDiarization` training with manual optimization (@clement-pages)
- improve(train): speed up dataloaders
- improve(setup): switch to `uv`
- improve(setup): switch to `lightning` from `pytorch-lightning`
- improve(utils): improve dependency check when loading pretrained models and/or pipeline
- improve(utils): add option to skip dependency check
- improve(utils): add option to load a pretrained model checkpoint from an `io.BytesIO` buffer
- improve(pipeline): add option to load a pretrained pipeline from a `dict` (@benniekiss)
Fixes
- fix(model): improve WavLM (un)freezing support for `ToTaToNet` architecture (@clement-pages)
- fix(separation): fix clipping issue in speech separation pipeline (@joonaskalda)
- fix(separation): fix alignment between separated sources and diarization (@Lebourdais and @clement-pages)
- fix(separation): prevent leakage removal collar from being applied to diarization (@clement-pages)
- fix(separation): fix `PixIT` training with manual optimization (@clement-pages)
- fix(doc): fix link to pytorch (@emmanuel-ferdman)
- fix(task): fix corner case with small (<9) number of validation samples (@antoinelaurent)
- fix(doc): fix default embedding in `SpeechSeparation` and `SpeakerDiarization` docstring (@razi-tm)
Version 3.4.0
Maintenance release
Upcoming major releases of the pyannote.{core,database,metrics,pipeline} dependencies will break the 3.x branch.
Version 3.4.0 pins those dependencies to compatible versions.
Version 3.3.1
Breaking changes
- setup: drop support for Python 3.8
Fixes
- fix: fix support for `numpy==2.x` (@ibevers)
- fix: fix support for `speechbrain==1.x` (@Adel-Moumen)
Version 3.3.0
TL;DR
pyannote.audio does speech separation: multi-speaker audio in, one audio channel per speaker out!
```bash
pip install pyannote.audio[separation]==3.3.0
```
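A minimal usage sketch of the resulting pipeline. The checkpoint name `pyannote/speech-separation-ami-1.0`, the `(diarization, sources)` return value, and the 16 kHz sample rate follow the model card as I recall it; treat them as assumptions and check the pipeline documentation:

```python
import scipy.io.wavfile
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speech-separation-ami-1.0", use_auth_token="huggingface-access-token")

# returns a diarization plus one separated waveform per speaker
diarization, sources = pipeline("audio.wav")

# write one wav file per speaker (sources.data has one column per speaker;
# 16000 assumes 16 kHz audio)
for s, speaker in enumerate(diarization.labels()):
    scipy.io.wavfile.write(f"{speaker}.wav", 16000, sources.data[:, s])
```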
New features

- feat(task): add `PixIT` joint speaker diarization and speech separation task (with @joonaskalda)
- feat(model): add `ToTaToNet` joint speaker diarization and speech separation model (with @joonaskalda)
- feat(pipeline): add `SpeechSeparation` pipeline (with @joonaskalda)
- feat(io): add option to select torchaudio `backend`
Fixes
- fix(task): fix wrong train/development split when training with (some) meta-protocols (#1709)
- fix(task): fix metadata preparation with missing validation subset (@clement-pages)
Improvements
- improve(io): when available, default to using `soundfile` backend
- improve(pipeline): do not extract embeddings when `max_speakers` is set to 1
- improve(pipeline): optimize memory usage of most pipelines (#1713 by @benniekiss)
Version 3.2.0
New features
- feat(task): add option to cache task training metadata to speed up training (with @clement-pages)
- feat(model): add `receptive_field`, `num_frames` and `dimension` to models (with @Bilal-Rahou)
- feat(model): add `fbank_only` property to `WeSpeaker` models
- feat(util): add `Powerset.permutation_mapping` to help with permutation in powerset space (with @FrenchKrab)
- feat(sample): add sample file at `pyannote.audio.sample.SAMPLE_FILE` (see the sketch after this list)
- feat(metric): add `reduce` option to `diarization_error_rate` metric (with @Bilal-Rahou)
- feat(pipeline): add `Waveform` and `SampleRate` preprocessors
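The bundled sample file makes quick experiments self-contained. A sketch, assuming `SAMPLE_FILE` mimics a pyannote.database protocol file and can therefore be fed to a pipeline directly (both assumptions, not confirmed by these notes):

```python
from pyannote.audio import Pipeline
from pyannote.audio.sample import SAMPLE_FILE

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="huggingface-access-token")

# assumed: SAMPLE_FILE behaves like a protocol file,
# so it can be passed to the pipeline as-is
diarization = pipeline(SAMPLE_FILE)
print(diarization)
```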
Fixes
- fix(task): fix random generators and their reproducibility (with @FrenchKrab)
- fix(task): fix estimation of training set size (with @FrenchKrab)
- fix(hook): fix `torch.Tensor` support in `ArtifactHook`
- fix(doc): fix typo in `Powerset` docstring (with @lukasstorck)
Improvements
- improve(metric): add support for number of speakers mismatch in `diarization_error_rate` metric
- improve(pipeline): track both `Model` and `nn.Module` attributes in `Pipeline.to(device)`
- improve(io): switch to `torchaudio >= 2.2.0`
- improve(doc): update tutorials (with @clement-pages)
Breaking changes
- BREAKING(model): get rid of `Model.example_output` in favor of `num_frames` method, `receptive_field` property, and `dimension` property
- BREAKING(task): custom tasks need to be updated (see "Add your own task" tutorial)
Community contributions
- community: add tutorial for offline use of `pyannote/speaker-diarization-3.1` (by @simonottenhauskenbun)
Version 3.1.1
TL;DR
Providing num_speakers to pyannote/speaker-diarization-3.1 now works as expected.
Full changelog
Fixes
- fix(pipeline): fix support for setting `num_speakers` in `pyannote/speaker-diarization-3.1` pipeline
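For instance, a recording known to contain exactly two speakers can now be processed as expected (`min_speakers` and `max_speakers` are the documented alternatives for lower/upper bounds):

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="huggingface-access-token")

# force the pipeline to detect exactly two speakers
diarization = pipeline("audio.wav", num_speakers=2)
```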
Version 3.1.0
TL;DR
pyannote/speaker-diarization-3.1 no longer requires the unpopular ONNX runtime.
Full changelog
New features
- feat(model): add WeSpeaker embedding wrapper based on PyTorch
- feat(model): add support for multi-speaker statistics pooling
- feat(pipeline): add `TimingHook` for profiling processing time
- feat(pipeline): add `ArtifactHook` for saving internal steps
- feat(pipeline): add support for list of hooks with `Hooks` (see the sketch after this list)
- feat(utils): add `"soft"` option to `Powerset.to_multilabel`
Fixes
- fix(pipeline): add missing "embedding" hook call in `SpeakerDiarization`
- fix(pipeline): fix `AgglomerativeClustering` to honor `num_clusters` when provided
- fix(pipeline): fix frame-wise speaker count exceeding `max_speakers` or detected `num_speakers` in `SpeakerDiarization` pipeline
Improvements
- improve(pipeline): compute `fbank` on GPU when requested
Breaking changes
- BREAKING(pipeline): rename `WeSpeakerPretrainedSpeakerEmbedding` to `ONNXWeSpeakerPretrainedSpeakerEmbedding`
- BREAKING(setup): remove `onnxruntime` dependency.
  You can still use ONNX `hbredin/wespeaker-voxceleb-resnet34-LM` but you will have to install `onnxruntime` yourself.
- BREAKING(pipeline): remove `logging_hook` (use `ArtifactHook` instead)
- BREAKING(pipeline): remove `onset` and `offset` parameters in `SpeakerDiarizationMixin.speaker_count`.
  You should now binarize segmentations before passing them to `speaker_count`.
Version 3.0.1
TL;DR
pyannote/speaker-diarization-3.0 is now much faster when sent to GPU.
```python
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0")
pipeline.to(torch.device("cuda"))
```

Full changelog
Fixes and improvements
- fix: fix WeSpeakerPretrainedSpeakerEmbedding GPU support
Dependencies update
- setup: switch from `onnxruntime` to `onnxruntime-gpu`
Version 3.0.0
TL;DR
Better pretrained pipeline and model
- Much better overlapping speech detection with powerset pyannote/segmentation-3.0
- Much better speaker diarization performance with pyannote/speaker-diarization-3.0
| Benchmark (DER %) | v2.1 | v3.0 |
|---|---|---|
| AISHELL-4 | 14.1 | 12.3 |
| AliMeeting (channel 1) | 27.4 | 24.3 |
| AMI (IHM) | 18.9 | 19.0 |
| AMI (SDM) | 27.1 | 22.2 |
| AVA-AVD | - | 49.1 |
| DIHARD 3 (full) | 26.9 | 21.7 |
| MSDWild | - | 24.6 |
| REPERE (phase2) | 8.2 | 7.8 |
| VoxConverse (v0.3) | 11.2 | 11.3 |
Major breaking changes
- BREAKING: pipelines now run on CPU by default.
  Use `pipeline.to(torch.device('cuda'))` to use GPU.
- BREAKING: removed `SpeakerSegmentation` pipeline.
  Use `SpeakerDiarization` pipeline instead.
- BREAKING: removed support for `prodi.gy` recipes
Full changelog
Features and improvements
- feat(pipeline): send pipeline to device with `pipeline.to(device)`
- feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline (see the sketch after this list)
- feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
- feat(pipeline): add progress hook to pipelines
- feat(task): add powerset support to `SpeakerDiarization` task
- feat(task): add support for multi-task models
- feat(task): add support for label scope in speaker diarization task
- feat(task): add support for missing classes in multi-label segmentation task
- feat(model): add segmentation model based on torchaudio self-supervised representation
- feat(pipeline): check version compatibility at load time
- improve(task): load metadata as tensors rather than pyannote.core instances
- improve(task): improve error message on missing specifications
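For example, the new `return_embeddings` option mentioned in the list above makes the `SpeakerDiarization` pipeline also return one embedding per detected speaker. A minimal sketch (the embedding ordering in the comment is my understanding, not stated in these notes):

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0", use_auth_token="huggingface-access-token")

# also return one embedding per detected speaker,
# presumably in the same order as diarization.labels()
diarization, embeddings = pipeline("audio.wav", return_embeddings=True)
```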
Breaking changes
- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
- BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
- BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
- BREAKING(task): remove support for variable chunk duration for segmentation tasks
- BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
- BREAKING(setup): drop support for Python 3.7
- BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
- BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
  You should update how `pyannote.audio.core.io.Audio` is instantiated:
  - replace `Audio()` by `Audio(mono="downmix")`;
  - replace `Audio(mono=True)` by `Audio(mono="downmix")`;
  - replace `Audio(mono=False)` by `Audio()`.
- BREAKING(model): get rid of (flaky) `Model.introspection`.
  If, for some weird reason, you wrote some custom code based on that, you should instead rely on `Model.example_output`.
- BREAKING(interactive): remove support for Prodigy recipes
Fixes and improvements
- fix(pipeline): fix reproducibility issue with Ampere CUDA devices
- fix(pipeline): fix support for IOBase audio
- fix(pipeline): fix corner case with no speaker
- fix(train): prevent metadata preparation to happen twice
- fix(task): fix support for "balance" option
- improve(task): shorten and improve structure of Tensorboard tags
Dependencies update
- setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
- setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
- setup: switch to speechbrain 0.5.14+