Create new task StreamingSpeakerDiarization #1544

Open · wants to merge 26 commits into base: develop
Changes from all commits (26 commits)
4578a3d
using the newest version of segmentation and diarization
Bilal-Rahou Oct 6, 2023
1760cf5
intro tutorial with pyannote version 3.0
Bilal-Rahou Oct 6, 2023
c23e7c0
setting pyannote import to version 3.0.1
Bilal-Rahou Oct 6, 2023
8f700ac
Using GPU in intro.ipynb when available
Bilal-Rahou Oct 10, 2023
1f19793
Using GPU in intro.ipynb when available
Bilal-Rahou Oct 10, 2023
81a0b48
Using GPU in intro.ipynb when available
Bilal-Rahou Oct 10, 2023
d36beac
Merge branch 'pyannote:develop' into develop
Bilal-Rahou Oct 20, 2023
191535b
Copying the content of SpeakerDiarization task into StreamingSpeakerD…
Bilal-Rahou Nov 15, 2023
1fd6e60
Create StreamingSpeakerDiarization pipeline to use models with latenc…
Bilal-Rahou Nov 30, 2023
0cb9cef
Merge branch 'pyannote:develop' into StreamingSpeakerDiarization
Bilal-Rahou Nov 30, 2023
0a874e8
implement multilatency model
Bilal-Rahou Jan 17, 2024
9ac7294
add latency_index parameter in inference.py to be able to use the Inf…
Bilal-Rahou Feb 8, 2024
9baefb5
Implement guided model and guided task
Bilal-Rahou Feb 8, 2024
b603690
create a guided inference to use with a guided model
Bilal-Rahou Feb 8, 2024
8838a6d
remove the second version of multilatency model to keep only the last…
Bilal-Rahou Feb 8, 2024
b2364db
make multilatency model more generic so that it can be use directly w…
Bilal-Rahou Mar 21, 2024
770ac8a
add a 'streaming' flag to SincNet
Bilal-Rahou Mar 25, 2024
542d104
change the MultilatencyPyanNet output structure from (num_latencies, …
Bilal-Rahou Mar 25, 2024
e90bfb4
add a StreamingInference class that concatenate the end of chunks ins…
Bilal-Rahou Mar 25, 2024
b6903c7
Merge branch 'develop' into StreamingSpeakerDiarization
Bilal-Rahou Mar 25, 2024
71043fc
add StreamingInference
Bilal-Rahou Mar 25, 2024
d038e62
remove unnecessary files (guided model and streaming pipelines)
Bilal-Rahou Mar 25, 2024
84d8887
SegmentationTaskMixin does not exist anymore, replace the heritage in…
Bilal-Rahou Mar 25, 2024
693e70f
adapting the new model and task to the newest pyannote implementation
Bilal-Rahou Mar 27, 2024
0c64b5a
add comments
Bilal-Rahou Mar 29, 2024
0cac2b1
add again the possibility to train a model with negative latency, and…
Bilal-Rahou Mar 29, 2024
3 changes: 2 additions & 1 deletion pyannote/audio/__init__.py
@@ -27,8 +27,9 @@
 
 
 from .core.inference import Inference
+from .core.streaming_inference import StreamingInference
 from .core.io import Audio
 from .core.model import Model
 from .core.pipeline import Pipeline
 
-__all__ = ["Audio", "Model", "Inference", "Pipeline"]
+__all__ = ["Audio", "Model", "Inference", "Pipeline", "StreamingInference"]
7 changes: 5 additions & 2 deletions pyannote/audio/core/inference.py
@@ -225,6 +225,7 @@ def infer(self, chunks: torch.Tensor) -> Union[np.ndarray, Tuple[np.ndarray]]:
         def __convert(output: torch.Tensor, conversion: nn.Module, **kwargs):
             return conversion(output).cpu().numpy()
 
+
         return map_with_specifications(
             self.model.specifications, __convert, outputs, self.conversion
         )
@@ -549,7 +550,7 @@ def aggregate(
         aggregated_scores : SlidingWindowFeature
             Aggregated scores. Shape is (num_frames, num_classes)
         """
-
+        print("aggregate")
         num_chunks, num_frames_per_chunk, num_classes = scores.data.shape
 
         chunks = scores.sliding_window
@@ -596,6 +597,7 @@ def aggregate(
             )
             + 1
         )
+
         aggregated_output: np.ndarray = np.zeros(
             (num_frames, num_classes), dtype=np.float32
         )
@@ -611,7 +613,6 @@
         aggregated_mask: np.ndarray = np.zeros(
             (num_frames, num_classes), dtype=np.float32
         )
-
         # loop on the scores of sliding chunks
         for (chunk, score), (_, mask) in zip(scores, masks):
             # chunk ~ Segment
@@ -620,6 +621,7 @@
 
             start_frame = frames.closest_frame(chunk.start + 0.5 * frames.duration)
 
+
             aggregated_output[start_frame : start_frame + num_frames_per_chunk] += (
                 score * mask * hamming_window * warm_up_window
             )
@@ -644,6 +646,7 @@
 
         return SlidingWindowFeature(average, frames)
 
+
     @staticmethod
     def trim(
         scores: SlidingWindowFeature,
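For context, the hunks above sit inside Inference.aggregate, which turns overlapping chunk-level scores into a single frame-level sequence by weighted overlap-add: each chunk's scores are multiplied by a Hamming window, summed into a full-length buffer, and the sum is normalized by the accumulated window weights. A minimal NumPy sketch of that idea (shapes, hop size, and the random chunk scores are illustrative, not the exact pyannote internals, which also apply masks and warm-up windows):

import numpy as np

num_frames, num_classes = 1000, 3      # length of the aggregated output
num_frames_per_chunk, hop = 200, 50    # chunk size and hop, in frames

aggregated = np.zeros((num_frames, num_classes), dtype=np.float32)
overlap = np.zeros((num_frames, num_classes), dtype=np.float32)
hamming = np.hamming(num_frames_per_chunk).reshape(-1, 1)

for start in range(0, num_frames - num_frames_per_chunk + 1, hop):
    score = np.random.rand(num_frames_per_chunk, num_classes)  # stand-in for one chunk's model output
    aggregated[start : start + num_frames_per_chunk] += score * hamming
    overlap[start : start + num_frames_per_chunk] += hamming

average = aggregated / np.maximum(overlap, 1e-12)  # normalize; epsilon avoids division by zero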