Investigation of Spatial Self-Supervised Learning and Its Application to Target Speaker Speech Recognition
This repository provides an implementation of guided neural fast full-rank spatial covariance analysis (guided neural FastFCA).
pip install git+https://github.com/b-sigpro/neural-gfca.git
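To verify the installation, try importing the package (a quick sanity check; neural_gfca is the module name used by the commands below):
python -c "import neural_gfca"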
Pre-trained models are available on the release page.
One utterance in a mixture recording [src_file].wav can be extracted to [dst_file].wav with the following command:
python -m neural_gfca.separate one ./neural-gfca.16ch-qini-nsfsim.Ns=6/ [src_file].wav [dst_file].wav --target --n_mic=16 --drop_context --normalize=exceed --use_mvdr
The script automatically reads [src_file].info, which must be a Python pickle file containing a dictionary in the following format:
{
"act": np.ndarray([T, N]), # binary activations of N speakers, the 1st speaker (n=0) is the target.
"start": int, # start time sample of the target,
"end": int, # end time sample of the target,
}
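For reference, here is a minimal sketch of how such an .info file could be created; the sampling rate, signal length, and activation resolution below are illustrative assumptions, not values mandated by the toolkit:

```python
import pickle

import numpy as np

# Illustrative values only: a 10 s mixture at an assumed 16 kHz
# sampling rate, with N=2 speakers and T=1000 activation frames.
T, N = 1000, 2
fs = 16000

act = np.zeros([T, N])
act[:, 0] = 1.0        # target speaker (n=0) is active throughout
act[400:800, 1] = 1.0  # interfering speaker is active in the middle

info = {
    "act": act,      # binary activations of N speakers
    "start": 0,      # start time sample of the target utterance
    "end": 10 * fs,  # end time sample of the target utterance
}

# The pickle must be stored next to the mixture as [src_file].info.
with open("mixture.info", "wb") as f:
    pickle.dump(info, f)
```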
If you run into an out-of-memory issue, you can use the following option:
task.encoder.diagonalizer._target_=neural_gfca.diagonalizers.iss_nrmxt_zhang3_cnt_fblk_diagonalizer.ISSDiagonalizer
This option diagonalizes the mixture block-by-block over frequency bins, which takes less memory but more computation time.
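For example, the option can be appended to the separation command above (assuming the script accepts such dotted-path overrides as trailing arguments, e.g. in the style of Hydra):
python -m neural_gfca.separate one ./neural-gfca.16ch-qini-nsfsim.Ns=6/ [src_file].wav [dst_file].wav --target --n_mic=16 --drop_context --normalize=exceed --use_mvdr task.encoder.diagonalizer._target_=neural_gfca.diagonalizers.iss_nrmxt_zhang3_cnt_fblk_diagonalizer.ISSDiagonalizer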
@inproceedings{bando2025investigation,
title={Investigation of Spatial Self-Supervised Learning and Its Application to Target Speaker Speech Recognition},
author={Yoshiaki Bando and Samuele Cornell and Satoru Fukayama and Shinji Watanabe},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2025}
}
This work is based on results obtained from a project, Programs for Bridging the gap between R&D and the IDeal society (society 5.0) and Generating Economic and social value (BRIDGE)/Practical Global Research in the AI × Robotics Services, implemented by the Cabinet Office, Government of Japan.