Version: 2.2.0-alpha.1

Speaker Profile

See the Mix documentation for more information on speaker profiles.

Speaker adaptation is a technique that adapts and improves speech recognition based on qualities of the speaker and channel. The best results are achieved by updating the data pack's acoustic model in real time based on the immediate utterance. Speaker profiles enable this behavior when transacting with the ASR service.

When the tool runs with a speaker profile, log.txt contains entries like the following:

2020-06-18 03:26:15,443.443 INFO [__main__:217] - xasr: start
2020-06-18 03:26:15,444.444 DEBUG [__main__:230] - xasr: +SPEAKER PROFILE: external_reference {
type: SPEAKER_PROFILE
uri: "urn:nuance:asr/speakerid/1"
}

The 1 at the end of the uri is the speaker id, which is taken from truth.tsv. If the speaker id column is empty for a particular test case, no speaker profile is used for that test case.
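The mapping from speaker id to profile URI can be sketched roughly as follows. This is a minimal illustration, not the testing tool's actual code: the function name speaker_profile_uri is hypothetical, and the URI prefix is assumed from the log excerpt above.

```python
from typing import Optional

# Assumption: prefix copied from the log excerpt above; the real tool may build it differently.
SPEAKER_URI_PREFIX = "urn:nuance:asr/speakerid/"


def speaker_profile_uri(speaker_id: Optional[str]) -> Optional[str]:
    """Return the speaker profile URI, or None when the speaker id is empty.

    Hypothetical helper: an empty or missing speaker id means that no
    speaker profile is used for that test case.
    """
    if speaker_id is None or not speaker_id.strip():
        return None
    return SPEAKER_URI_PREFIX + speaker_id.strip()


print(speaker_profile_uri("1"))  # urn:nuance:asr/speakerid/1
print(speaker_profile_uri(""))   # None
```

Returning None for an empty id keeps the "no profile" case explicit, so the caller can skip adding the external reference to the request.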

When running the same test set repeatedly, as is often done when iteratively improving machine learning models, you may opt to discard the speaker adaptation to avoid overfitting the adaptation model to the specific test cases in your test set. To do this, set discard_speaker_adaption in config.json to true. See RecognitionFlags in the ASRaaS gRPC API documentation for further information.

Additionally, the testing tool can be configured to ignore the speaker id values in your truth.tsv file for all test cases, and to not use speaker profiles at all. This is configured via the use_speaker_profile parameter, which defaults to true.

...
"asr": {
    "topic": "GEN",
    "dlm_weight": 0.2,
    "dlm_uri": "urn:nuance-mix:tag:model/coffee_demo_dlgaas/mix.asr?=language=eng-USA",
    "auto_punctuate": true,
    "use_speaker_profile": false,
    ...
}
...
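As a rough sketch of how these two settings might be read from config.json, consider the snippet below. It is illustrative only: the function speaker_settings is hypothetical, placing discard_speaker_adaption under "asr" is an assumption, and so is its default of false. Only the default of true for use_speaker_profile comes from the description above.

```python
import json


def speaker_settings(config_text: str) -> dict:
    """Extract the speaker-related settings from a config.json document.

    Hypothetical helper, not part of the testing tool.
    """
    asr = json.loads(config_text).get("asr", {})
    return {
        # Documented default: speaker profiles are used unless disabled.
        "use_speaker_profile": asr.get("use_speaker_profile", True),
        # Assumption: key lives under "asr" and defaults to false.
        "discard_speaker_adaption": asr.get("discard_speaker_adaption", False),
    }


example = '{"asr": {"topic": "GEN", "use_speaker_profile": false}}'
print(speaker_settings(example))
```

With use_speaker_profile set to false, the speaker id values in truth.tsv would be ignored for every test case, so the discard setting has no practical effect in that configuration.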