
Releases: wenet-e2e/wespeaker

WeSpeaker 1.2.0

23 Jul 11:15
820acb4

What's Changed

  • Add a recipe for the validation set of VoxSRC-23's diarization track by @xx205 in #166
  • Support the SphereFace2 loss function by @Hunterhuan in #173
  • Support self-supervised learning (SSL) recipes on the VoxCeleb dataset, including DINO, MoCo, and SimCLR, by @czy97 and @Hunterhuan in #180 (see the DINO sketch after this list)
  • Support the NIST SRE16 recipe by @czy97 in #177
  • Support Kaldi-compatible PLDA and unsupervised adaptation by @wsstriving in #186
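Among the SSL methods above, DINO trains the speaker encoder without labels by matching a student's output distribution to a centered, sharpened teacher distribution. The snippet below is a minimal sketch of that objective in PyTorch; the function name, temperatures, and tensor shapes are illustrative assumptions, not WeSpeaker's actual implementation (see #180 for the real recipe).

```python
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center,
              student_temp=0.1, teacher_temp=0.04):
    """DINO-style cross-entropy between teacher and student views.

    student_out, teacher_out: (batch, dim) projection-head outputs.
    center: (dim,) running mean of teacher outputs, used to avoid collapse.
    Temperatures are common defaults, not values from the recipe.
    """
    student_logp = F.log_softmax(student_out / student_temp, dim=-1)
    # Center and sharpen the teacher distribution; no gradient flows here.
    teacher_p = F.softmax((teacher_out - center) / teacher_temp, dim=-1).detach()
    return -(teacher_p * student_logp).sum(dim=-1).mean()
```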

WeSpeaker 1.1.0

23 May 10:12
aff33ff

What's Changed

WeSpeaker 1.0.0

12 Nov 10:59
296449a

Highlights

  • Competitive results: comparable to SpeechBrain, ASV-Subtools, etc.
  • Light-weight: clean and simple code, with no Kaldi dependency
  • Unified IO (UIO): designed for large-scale training data
  • On-the-fly feature preparation: provides various data augmentation methods
  • Distributed training: multi-node, multi-GPU scalability
  • Production ready: supports TensorRT and ONNX export formats, with a Triton Inference Server demo (see the export sketch after this list)
  • Pre-trained models: Python bindings and an interactive Hugging Face demo for speaker verification
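As a rough illustration of the ONNX export path, the sketch below exports a stand-in embedding network with torch.onnx.export; the toy model, feature shapes, and file name are assumptions, not WeSpeaker's actual export tooling.

```python
import torch
import torch.nn as nn

class ToySpeakerNet(nn.Module):
    """Stand-in for a trained speaker-embedding network (an assumption,
    not a WeSpeaker model)."""
    def __init__(self, feat_dim=80, embed_dim=192):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, embed_dim)

    def forward(self, feats):                    # feats: (batch, frames, feat_dim)
        return self.encoder(feats).mean(dim=1)   # average over frames -> (batch, embed_dim)

model = ToySpeakerNet().eval()
dummy_feats = torch.randn(1, 200, 80)            # (batch, frames, mel bins), illustrative

torch.onnx.export(
    model, dummy_feats, "speaker_model.onnx",
    input_names=["feats"], output_names=["embedding"],
    dynamic_axes={"feats": {0: "batch", 1: "frames"},   # variable batch size / utterance length
                  "embedding": {0: "batch"}},
    opset_version=13,
)
```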

Overall Structure

[Architecture overview diagram in the original release notes]

Recipes

We provide three well-structured recipes:

  • Speaker Verification: VoxCeleb and CNCeleb (SOTA results)
  • Speaker Diarization: VoxConverse (an example of using a pre-trained speaker model; see the clustering sketch after this list)
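A diarization recipe of this kind typically extracts an embedding per speech segment with the pre-trained model and then clusters the segments by speaker. The sketch below shows one common choice, agglomerative clustering on cosine distance; the function name, threshold, and dimensions are illustrative assumptions, not the VoxConverse recipe itself.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_segments(embeddings, threshold=0.5):
    """Assign a speaker label to each segment embedding.

    embeddings: (num_segments, dim) array of speaker embeddings extracted
    with a pre-trained model; the cosine-distance threshold is a tuning
    assumption, not a value from the recipe.
    """
    Z = linkage(embeddings, method="average", metric="cosine")
    return fcluster(Z, t=threshold, criterion="distance")

# Toy usage: random vectors stand in for real segment embeddings.
labels = cluster_segments(np.random.randn(10, 192))
```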

Support List

  • SOTA Models: TDNN-based x-vector, ResNet-based r-vector, and ECAPA_TDNN
  • Pooling Functions: statistics-based TAP/TSDP/TSTP, and attention-based ASTP
  • Criteria: standard Softmax, and margin-based A-/AM-/AAM-Softmax (see the AAM sketch after this list)
  • Scoring: Cosine, PLDA, and Score Normalization (AS-Norm)
  • Metrics: EER, minDCF (DET curve), and DER
  • Online Augmentation: Resample, Noise & RIR, Speed Perturb, and SpecAug
  • Training strategies: well-designed learning-rate and margin schedulers, and large-margin fine-tuning
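Of the margin-based criteria, AAM-Softmax (the ArcFace-style loss) is the most widely used. Below is a minimal PyTorch sketch of the idea; the class name, initialization, and hyper-parameters are illustrative assumptions, not WeSpeaker's actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AAMSoftmax(nn.Module):
    """Additive Angular Margin (AAM) Softmax head: adds an angular margin
    to the target-class logit before a scaled cross-entropy."""

    def __init__(self, embed_dim, num_classes, margin=0.2, scale=32.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.margin, self.scale = margin, scale

    def forward(self, emb, labels):
        # Cosine similarity between normalized embeddings and class weights.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        # Apply the angular margin only to the target class.
        cos_m = torch.where(target, torch.cos(theta + self.margin), cos)
        return F.cross_entropy(self.scale * cos_m, labels)
```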