Support max_speakers configuration in diar_streaming_sortformer_4spk-v2.1 #15711

@livefantasia

Description

Is your feature request related to a problem? Please describe.

diar_streaming_sortformer_4spk-v2.1 supports up to 4 speakers in the audio, but there is no way to limit the maximum number of speakers in the output. For audio with only 2 speakers, it can still produce segments labeled as speaker 3 or 4, even in high-latency mode.

Describe the solution you'd like

Provide a max_speakers parameter that can be set per inference session, so that different audio inputs in the same batch can use different values of max_speakers.
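
To illustrate the requested behavior, here is a minimal sketch of a per-recording speaker cap applied as post-processing. It assumes the diarization output is available as per-frame activity probabilities for each of the 4 speaker slots. `limit_speakers` and `limit_speakers_batch` are hypothetical helpers written for this example, not part of NeMo, and the probabilities are made-up values.

```python
from typing import List

Probs = List[List[float]]  # one recording: frames x speaker slots


def limit_speakers(probs: Probs, max_speakers: int) -> Probs:
    """Zero out all but the `max_speakers` most active speaker slots."""
    num_slots = len(probs[0]) if probs else 0
    # Total activity per slot, used to rank the slots.
    activity = [sum(frame[s] for frame in probs) for s in range(num_slots)]
    ranked = sorted(range(num_slots), key=lambda s: -activity[s])
    keep = set(ranked[:max_speakers])
    return [[p if s in keep else 0.0 for s, p in enumerate(frame)]
            for frame in probs]


def limit_speakers_batch(batch: List[Probs], max_speakers: List[int]) -> List[Probs]:
    """Apply a separate cap to every recording in the batch."""
    return [limit_speakers(p, k) for p, k in zip(batch, max_speakers)]


# Hypothetical batch: 2 recordings, 3 frames each, 4 speaker slots.
batch = [
    [[0.9, 0.1, 0.0, 0.2], [0.8, 0.7, 0.1, 0.0], [0.1, 0.9, 0.0, 0.3]],
    [[0.2, 0.1, 0.9, 0.0], [0.1, 0.0, 0.8, 0.6], [0.0, 0.3, 0.1, 0.9]],
]
# Cap the first recording at 2 speakers, leave the second at 4.
limited = limit_speakers_batch(batch, max_speakers=[2, 4])
```

This sketch only suppresses the least active slots. It does not reassign their speech to the kept speakers, so it is not a substitute for a cap applied inside the model.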

Describe alternatives you've considered

N/A (it may be doable via post-processing, such as clustering and similarity checks, but that is inefficient and requires an additional model)

Additional context

I tested an audio recording with 2 speakers; the model starts to identify speakers 3 and 4 after 30 minutes.
