10 Oct 12:26

f8efe0a

4.0.1 Latest

Version 4.0.1

29 Sep 12:04

github-actions

4.0.0

a1ed3bb

pyannote.audio 4.0

Version 4.0.0

TL;DR

Improved speaker assignment and counting

pyannote/speaker-diarization-community-1 pretrained pipeline relies on VBx clustering instead of agglomerative hierarchical clustering (as suggested by BUT Speech@FIT researchers Petr Pálka and Jiangyu Han).

Exclusive speaker diarization

pyannote/speaker-diarization-community-1 pretrained pipeline returns a new exclusive speaker diarization, on top of the regular speaker diarization.
This feature, backported from our latest commercial model, simplifies the reconciliation between fine-grained speaker diarization timestamps and (sometimes not so precise) transcription timestamps.

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1", token="huggingface-access-token")
output = pipeline("/path/to/conversation.wav")
print(output.speaker_diarization)            # regular speaker diarization
print(output.exclusive_speaker_diarization)  # exclusive speaker diarization
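
To illustrate why an exclusive diarization (at most one speaker active at any time) eases reconciliation with transcription timestamps, here is a minimal sketch. It works on plain tuples rather than pyannote objects; `assign_speakers`, the tuple layout, and the sample data are hypothetical, and the maximum-overlap rule is only one reasonable choice, not necessarily what any particular tool does.

```python
def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    words: list of (start, end, text) from a transcription system
    turns: list of (start, end, speaker) from an exclusive diarization,
           i.e. turns never overlap each other (hypothetical tuple format)
    """
    labeled = []
    for w_start, w_end, text in words:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            # Duration shared by the word and the turn (negative if disjoint)
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        # Words that overlap no turn keep speaker None
        labeled.append((text, best_speaker))
    return labeled


# Made-up data: the second word straddles the speaker change at 2.0s
turns = [(0.0, 2.0, "SPEAKER_00"), (2.0, 4.5, "SPEAKER_01")]
words = [(0.2, 0.6, "hello"), (1.8, 2.4, "there"), (5.0, 5.3, "bye")]
labeled = assign_speakers(words, turns)
```

Because exclusive turns never overlap, each word gets at most one candidate speaker per instant, so imprecise word boundaries only matter near speaker changes.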

Faster training

Metadata caching and optimized dataloaders make training on large-scale datasets much faster.
This led to a 15x speedup on pyannoteAI internal large-scale training.

pyannoteAI premium speaker diarization

Change one line of code to use pyannoteAI premium models and enjoy more accurate speaker diarization.

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
-    "pyannote/speaker-diarization-community-1", token="huggingface-access-token")
+    "pyannote/speaker-diarization-precision-2", token="pyannoteAI-api-key")
diarization = pipeline("/path/to/conversation.wav")

Offline (air-gapped) use

Pipelines can now be stored alongside their internal models in the same repository, streamlining fully offline use.

Accept pyannote/speaker-diarization-community-1 pipeline user agreement

Clone the pipeline repository from Hugging Face (if prompted for a password, use a Hugging Face access token with the correct permissions)

$ git lfs install
$ git clone https://hf.co/pyannote/speaker-diarization-community-1 /path/to/directory/pyannote-speaker-diarization-community-1

Enjoy!

# load pipeline from disk (works without internet connection)
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('/path/to/directory/pyannote-speaker-diarization-community-1')

# run the pipeline locally on your computer
diarization = pipeline("audio.wav")

Telemetry

With the optional telemetry feature in pyannote.audio, you can choose to send anonymous usage metrics to help the pyannote team improve the library.

Breaking changes

BREAKING(io): remove support for sox and soundfile audio I/O backends (only ffmpeg or in-memory audio is supported)
BREAKING(setup): drop support for Python < 3.10
BREAKING(hub): rename use_auth_token to token
BREAKING(hub): drop support for {pipeline_name}@{revision} syntax in Model.from_pretrained(...) and Pipeline.from_pretrained(...) -- use new revision keyword argument instead
BREAKING(task): remove OverlappedSpeechDetection task (part of SpeakerDiarization task)
BREAKING(pipeline): remove OverlappedSpeechDetection and Resegmentation unmaintained pipelines (part of SpeakerDiarization)
BREAKING(cache): rely on huggingface_hub caching directory (PYANNOTE_CACHE is no longer used)
BREAKING(inference): Inference now only supports already instantiated models
BREAKING(task): drop support for multilabel training in SpeakerDiarization task
BREAKING(task): drop support for warm_up option in SpeakerDiarization task
BREAKING(task): drop support for weigh_by_cardinality option in SpeakerDiarization task
BREAKING(task): drop support for vad_loss option in SpeakerDiarization task
BREAKING(chore): switch to native namespace package
BREAKING(cli): remove deprecated pyannote-audio-train CLI
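
For the two hub-related changes above (the `use_auth_token` rename and the removal of the `{pipeline_name}@{revision}` syntax), a migration could be sketched as follows. The helpers `split_legacy_checkpoint` and `migrate_kwargs` are hypothetical and not part of pyannote.audio; the `@main` revision and the token string are placeholders.

```python
def split_legacy_checkpoint(checkpoint):
    """Split a legacy "name@revision" string into (name, revision).

    Hypothetical helper; revision is None when no "@" is present.
    """
    name, sep, revision = checkpoint.partition("@")
    return name, (revision if sep else None)


def migrate_kwargs(kwargs):
    """Return a copy of kwargs with `use_auth_token` renamed to `token`."""
    migrated = dict(kwargs)
    if "use_auth_token" in migrated:
        migrated["token"] = migrated.pop("use_auth_token")
    return migrated


# 3.x style arguments (placeholder values)
name, revision = split_legacy_checkpoint("pyannote/speaker-diarization-3.1@main")
kwargs = migrate_kwargs({"use_auth_token": "huggingface-access-token"})

# 4.x style call, left commented because it needs pyannote.audio and network access:
# from pyannote.audio import Pipeline
# pipeline = Pipeline.from_pretrained(name, revision=revision, **kwargs)
```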

New features

feat(io): switch from torchaudio to torchcodec for audio I/O
feat(pipeline): add support for VBx clustering (@Selesnyan and @jyhan03)
feat(pyannoteAI): add wrapper around pyannoteAI SDK
improve(hub): add support for pipeline repos that also include underlying models
feat(clustering): add support for k-means clustering
feat(model): add wav2vec_frozen option to freeze/unfreeze wav2vec in SSeRiouSS architecture
feat(task): add support for manual optimization in SpeakerDiarization task
feat(utils): add hidden option to ProgressHook
feat(utils): add FilterByNumberOfSpeakers protocol files filter
feat(core): add Calibration class to calibrate logits/distances into probabilities
feat(metric): add DetectionErrorRate, SegmentationErrorRate, DiarizationPrecision, and DiarizationRecall metrics
feat(cli): add CLI to download, apply, benchmark, and optimize pipelines
feat(cli): add CLI to strip checkpoints to their bare inference minimum

Improvements

improve(model): improve WavLM (un)freezing support for SSeRiouSS architecture (@clement-pages)
improve(task): improve SpeakerDiarization training with manual optimization (@clement-pages)
improve(train): speed up dataloaders
improve(setup): switch to uv
improve(setup): switch to lightning from pytorch-lightning
improve(utils): improve dependency check when loading pretrained models and/or pipeline
improve(utils): add option to skip dependency check
improve(utils): add option to load a pretrained model checkpoint from an io.BytesIO buffer
improve(pipeline): add option to load a pretrained pipeline from a dict (@benniekiss)

Fixes

fix(model): improve WavLM (un)freezing support for ToTaToNet architecture (@clement-pages)
fix(separation): fix clipping issue in speech separation pipeline (@joonaskalda)
fix(separation): fix alignment between separated sources and diarization (@Lebourdais and @clement-pages)
fix(separation): prevent leakage removal collar from being applied to diarization (@clement-pages)
fix(separation): fix PixIT training with manual optimization (@clement-pages)
fix(doc): fix link to pytorch (@emmanuel-ferdman)
fix(task): fix corner case with small (<9) number of validation samples (@antoinelaurent)
fix(doc): fix default embedding in SpeechSeparation and SpeakerDiarization docstring (@razi-tm).

09 Sep 07:11

hbredin

3.4.0

853b2ab

Version 3.4.0

Maintenance release

Upcoming major releases of pyannote.{core,database,metrics,pipeline} dependencies will break 3.x branch.
Version 3.4.0 pins those dependencies to compatible versions.

23 Jun 00:30

hbredin

3.3.1

4dd55a5

Version 3.3.1

Breaking changes

setup: drop support for Python 3.8

Fixes

fix: fix support for numpy==2.x (@ibevers)
fix: fix support for speechbrain==1.x (@Adel-Moumen)

14 Jun 08:41

hbredin

3.3.0

adaf770

Version 3.3.0

TL;DR

pyannote.audio does speech separation: multi-speaker audio in, one audio channel per speaker out!

pip install pyannote.audio[separation]==3.3.0

New features

feat(task): add PixIT joint speaker diarization and speech separation task (with @joonaskalda)
feat(model): add ToTaToNet joint speaker diarization and speech separation model (with @joonaskalda)
feat(pipeline): add SpeechSeparation pipeline (with @joonaskalda)
feat(io): add option to select torchaudio backend

Fixes

fix(task): fix wrong train/development split when training with (some) meta-protocols (#1709)
fix(task): fix metadata preparation with missing validation subset (@clement-pages)

Improvements

improve(io): when available, default to using soundfile backend
improve(pipeline): do not extract embeddings when max_speakers is set to 1
improve(pipeline): optimize memory usage of most pipelines (#1713 by @benniekiss)

08 May 09:51

hbredin

3.2.0

70a8507

Version 3.2.0

New features

feat(task): add option to cache task training metadata to speed up training (with @clement-pages)
feat(model): add receptive_field, num_frames and dimension to models (with @Bilal-Rahou)
feat(model): add fbank_only property to WeSpeaker models
feat(util): add Powerset.permutation_mapping to help with permutation in powerset space (with @FrenchKrab)
feat(sample): add sample file at pyannote.audio.sample.SAMPLE_FILE
feat(metric): add reduce option to diarization_error_rate metric (with @Bilal-Rahou)
feat(pipeline): add Waveform and SampleRate preprocessors

Fixes

fix(task): fix random generators and their reproducibility (with @FrenchKrab)
fix(task): fix estimation of training set size (with @FrenchKrab)
fix(hook): fix torch.Tensor support in ArtifactHook
fix(doc): fix typo in Powerset docstring (with @lukasstorck)

Improvements

improve(metric): add support for number of speakers mismatch in diarization_error_rate metric
improve(pipeline): track both Model and nn.Module attributes in Pipeline.to(device)
improve(io): switch to torchaudio >= 2.2.0
improve(doc): update tutorials (with @clement-pages)

Breaking changes

BREAKING(model): get rid of Model.example_output in favor of num_frames method, receptive_field property, and dimension property
BREAKING(task): custom tasks need to be updated (see "Add your own task" tutorial)

Community contributions

community: add tutorial for offline use of pyannote/speaker-diarization-3.1 (by @simonottenhauskenbun)

01 Dec 13:26

hbredin

3.1.1

6a972c0

Version 3.1.1

TL;DR

Providing num_speakers to pyannote/speaker-diarization-3.1 now works as expected.

Full changelog

Fixes

fix(pipeline): fix support for setting num_speakers in pyannote/speaker-diarization-3.1 pipeline

16 Nov 12:37

hbredin

3.1.0

f45da71

Version 3.1.0

TL;DR

pyannote/speaker-diarization-3.1 no longer requires the (unpopular) ONNX runtime.

Full changelog

New features

feat(model): add WeSpeaker embedding wrapper based on PyTorch
feat(model): add support for multi-speaker statistics pooling
feat(pipeline): add TimingHook for profiling processing time
feat(pipeline): add ArtifactHook for saving internal steps
feat(pipeline): add support for list of hooks with Hooks
feat(utils): add "soft" option to Powerset.to_multilabel

Fixes

fix(pipeline): add missing "embedding" hook call in SpeakerDiarization
fix(pipeline): fix AgglomerativeClustering to honor num_clusters when provided
fix(pipeline): fix frame-wise speaker count exceeding max_speakers or detected num_speakers in SpeakerDiarization pipeline

Improvements

improve(pipeline): compute fbank on GPU when requested

Breaking changes

BREAKING(pipeline): rename WeSpeakerPretrainedSpeakerEmbedding to ONNXWeSpeakerPretrainedSpeakerEmbedding
BREAKING(setup): remove onnxruntime dependency.
You can still use ONNX hbredin/wespeaker-voxceleb-resnet34-LM but you will have to install onnxruntime yourself.
BREAKING(pipeline): remove logging_hook (use ArtifactHook instead)
BREAKING(pipeline): remove onset and offset parameters in SpeakerDiarizationMixin.speaker_count
You should now binarize segmentations before passing them to speaker_count
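
As an illustration of "binarize first, then count", here is a minimal sketch on plain Python lists. It is not the library's implementation: `binarize_scores` and `count_speakers` are hypothetical names, the scores are made up, and a single fixed threshold stands in for whatever onset/offset logic you previously relied on.

```python
def binarize_scores(scores, threshold=0.5):
    """Turn per-frame, per-speaker scores into 0/1 activity decisions."""
    return [[1 if s > threshold else 0 for s in frame] for frame in scores]


def count_speakers(binary_frames):
    """Count the number of active speakers in each frame."""
    return [sum(frame) for frame in binary_frames]


# 4 frames x 3 speakers of soft activation scores (made-up values)
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.7, 0.1],
    [0.2, 0.6, 0.1],
    [0.1, 0.2, 0.3],
]
counts = count_speakers(binarize_scores(scores))
```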

28 Sep 19:47

hbredin

3.0.1

28fcf50

Version 3.0.1

TL;DR

pyannote/speaker-diarization-3.0 is now much faster when sent to GPU.

import torch
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0")
pipeline.to(torch.device("cuda"))

Full changelog

Fixes and improvements

fix: fix WeSpeakerPretrainedSpeakerEmbedding GPU support

Dependencies update

setup: switch from onnxruntime to onnxruntime-gpu

26 Sep 13:00

hbredin

3.0.0

795b92a

Version 3.0.0

TL;DR

Better pretrained pipeline and model

Much better overlapping speech detection with powerset pyannote/segmentation-3.0
Much better speaker diarization performance with pyannote/speaker-diarization-3.0

Benchmark (DER %)	v2.1	v3.0
AISHELL-4	14.1	12.3
AliMeeting (channel 1)	27.4	24.3
AMI (IHM)	18.9	19.0
AMI (SDM)	27.1	22.2
AVA-AVD	-	49.1
DIHARD 3 (full)	26.9	21.7
MSDWild	-	24.6
REPERE (phase2)	8.2	7.8
VoxConverse (v0.3)	11.2	11.3
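
To read the table in relative terms, the following sketch computes the percentage change in DER from v2.1 to v3.0 for each benchmark that has both scores. The numbers are copied from the table above; `relative_change` is a hypothetical helper name.

```python
# DER (%) as (v2.1, v3.0), copied from the table above.
# AVA-AVD and MSDWild are omitted because they have no v2.1 score.
der = {
    "AISHELL-4": (14.1, 12.3),
    "AliMeeting (channel 1)": (27.4, 24.3),
    "AMI (IHM)": (18.9, 19.0),
    "AMI (SDM)": (27.1, 22.2),
    "DIHARD 3 (full)": (26.9, 21.7),
    "REPERE (phase2)": (8.2, 7.8),
    "VoxConverse (v0.3)": (11.2, 11.3),
}


def relative_change(old, new):
    """Relative change in percent; negative means the error rate went down."""
    return 100.0 * (new - old) / old


changes = {name: relative_change(v21, v30) for name, (v21, v30) in der.items()}
for name, change in changes.items():
    print(f"{name:<24} {change:+.1f}%")
```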

Major breaking changes

BREAKING: pipelines now run on CPU by default
Use pipeline.to(torch.device('cuda')) to use GPU
BREAKING: removed SpeakerSegmentation pipeline
Use SpeakerDiarization pipeline instead
BREAKING: removed support for prodi.gy recipes

Full changelog

Features and improvements

feat(pipeline): send pipeline to device with pipeline.to(device)
feat(pipeline): add return_embeddings option to SpeakerDiarization pipeline
feat(pipeline): make segmentation_batch_size and embedding_batch_size mutable in SpeakerDiarization pipeline (they now default to 1)
feat(pipeline): add progress hook to pipelines
feat(task): add powerset support to SpeakerDiarization task
feat(task): add support for multi-task models
feat(task): add support for label scope in speaker diarization task
feat(task): add support for missing classes in multi-label segmentation task
feat(model): add segmentation model based on torchaudio self-supervised representation
feat(pipeline): check version compatibility at load time
improve(task): load metadata as tensors rather than pyannote.core instances
improve(task): improve error message on missing specifications

Breaking changes

BREAKING(task): rename Segmentation task to SpeakerDiarization
BREAKING(pipeline): pipeline defaults to CPU (use pipeline.to(device))
BREAKING(pipeline): remove SpeakerSegmentation pipeline (use SpeakerDiarization pipeline)
BREAKING(pipeline): remove segmentation_duration parameter from SpeakerDiarization pipeline (defaults to duration of segmentation model)
BREAKING(task): remove support for variable chunk duration for segmentation tasks
BREAKING(pipeline): remove support for FINCHClustering and HiddenMarkovModelClustering
BREAKING(setup): drop support for Python 3.7
BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
You should update how pyannote.audio.core.io.Audio is instantiated:
- replace Audio() by Audio(mono="downmix");
- replace Audio(mono=True) by Audio(mono="downmix");
- replace Audio(mono=False) by Audio().
BREAKING(model): get rid of (flaky) Model.introspection
If, for some weird reason, you wrote some custom code based on that,
you should instead rely on Model.example_output.
BREAKING(interactive): remove support for Prodigy recipes
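
The two I/O changes listed above (0-indexed channels and the new `mono` semantics) can be summarized as a small mapping. This is a sketch only: `migrate_mono` and `migrate_channel` are hypothetical helpers that encode the replacement rules stated above, and they do not import or call pyannote.audio.

```python
def migrate_mono(legacy_mono=None):
    """Map a pre-3.0 `mono` argument to keyword arguments for 3.0.

    Follows the three replacement rules listed above:
    Audio() and Audio(mono=True) become Audio(mono="downmix"),
    Audio(mono=False) becomes Audio().
    """
    if legacy_mono is None or legacy_mono is True:
        return {"mono": "downmix"}
    if legacy_mono is False:
        return {}
    raise ValueError(f"unexpected legacy mono value: {legacy_mono!r}")


def migrate_channel(legacy_channel):
    """Convert a 1-indexed channel number to the 0-indexed convention."""
    if legacy_channel < 1:
        raise ValueError("legacy channels start at 1")
    return legacy_channel - 1


# Usage (commented out because it requires pyannote.audio):
# from pyannote.audio.core.io import Audio
# audio = Audio(**migrate_mono(True))
```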

Fixes and improvements

fix(pipeline): fix reproducibility issue with Ampere CUDA devices
fix(pipeline): fix support for IOBase audio
fix(pipeline): fix corner case with no speaker
fix(train): prevent metadata preparation from happening twice
fix(task): fix support for "balance" option
improve(task): shorten and improve structure of Tensorboard tags

Dependencies update

setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
setup: switch to speechbrain 0.5.14+


Releases: pyannote/pyannote-audio
