Is your feature request related to a problem? Please describe.
diar_streaming_sortformer_4spk-v2.1 supports up to 4 speakers per audio input, but there is no way to limit the maximum number of speakers in the output. For audio that contains only 2 speakers, the model can still produce segments labeled as speaker 3 or 4, even in high-latency mode.
Describe the solution you'd like
Provide a max_speaker parameter that can be set per inference session, so that different audio inputs in the same batch can use different max_speaker values.
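To illustrate the behavior I have in mind, here is a rough sketch in plain Python. It is not NeMo code: `limit_speakers`, its arguments, and the assumed output layout (per session, a list of frames, each holding one activity probability per speaker channel) are all hypothetical. It keeps the `max_speaker` most active channels per session and zeroes the rest. It also needs the whole session up front, so a streaming version would have to work from running totals instead.

```python
from typing import List, Sequence

# One session: frames x speaker channels (hypothetical layout, e.g. 4 channels).
Frames = List[List[float]]


def limit_speakers(batch: Sequence[Frames], max_speakers: Sequence[int]) -> List[Frames]:
    """Zero out all but the most active channels, with a separate limit per session."""
    if len(batch) != len(max_speakers):
        raise ValueError("need exactly one max_speaker value per session")

    limited: List[Frames] = []
    for frames, k in zip(batch, max_speakers):
        if not frames:
            limited.append([])
            continue

        num_channels = len(frames[0])
        if not 1 <= k <= num_channels:
            raise ValueError(f"max_speaker must be in [1, {num_channels}], got {k}")

        # Total activity per channel over the whole session.
        totals = [sum(frame[c] for frame in frames) for c in range(num_channels)]

        # Keep the k channels with the highest total activity.
        ranked = sorted(range(num_channels), key=lambda c: totals[c], reverse=True)
        keep = set(ranked[:k])

        limited.append(
            [[p if c in keep else 0.0 for c, p in enumerate(frame)] for frame in frames]
        )
    return limited


# Two sessions in one batch with different limits.
session = [[0.9, 0.1, 0.0, 0.2], [0.1, 0.8, 0.3, 0.0]]
result = limit_speakers([session, session], max_speakers=[2, 4])
```

Note that simply zeroing the extra channels drops that speech instead of reassigning it to one of the allowed speakers, which is why native support in the model would be preferable.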
Describe alternatives you've considered
N/A. It may be doable via post-processing, such as clustering and similarity checks, but that is inefficient and requires an additional model.
Additional context
I tested an audio file with 2 speakers; the model starts to identify speakers 3 and 4 after 30 minutes.