Version: Next

Confidence Thresholds, Intent Subtotaling, and ROC Curve

You can specify confidence thresholds, as well as in-domain and out-of-domain intents, in your input configuration file to produce additional scored output information.

Confidence Thresholds

When working with spoken dialog systems, acceptance and confirmation confidence thresholds are often used to tune the system's behavior and measure its performance.

Utterances at or above the acceptance threshold are those that the system has recognized with high confidence and that need no additional confirmation or clarification.
Utterances below the acceptance threshold but at or above the confirmation threshold are those that the system recognized with some confidence, but whose interpretation should be confirmed with the user.
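
The two thresholds split confidence scores into three bands. The following is a minimal sketch of that banding, not the tool's own code: the function name and the "accept", "confirm", and "retry" labels are illustrative assumptions (the "retry" label borrows from the ID Retry and OOD Retry rows of the ROC table).

```python
def classify_confidence(score: float, acceptance: float, confirmation: float) -> str:
    """Return the threshold band that a confidence score falls into.

    Hypothetical helper for illustration; the band labels are assumptions.
    """
    if score >= acceptance:
        # At or above the acceptance threshold: no confirmation needed.
        return "accept"
    if score >= confirmation:
        # Below acceptance but at or above confirmation: confirm with the user.
        return "confirm"
    # Below the confirmation threshold.
    return "retry"


if __name__ == "__main__":
    for score in (0.75, 0.6, 0.45, 0.3, 0.1):
        print(score, classify_confidence(score, acceptance=0.6, confirmation=0.3))
```

Note that both comparisons are inclusive, so a score exactly equal to a threshold falls into the higher band.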

You can specify acceptance_confidence_threshold and confirmation_confidence_threshold values in the nlu portion of config.json. Both are optional, but if you use one, you must specify the other as well.

	...
	"nlu": {
		"use_asr_results": true,
		"model_uri": "urn:nuance-mix:tag:model/coffee_demo/mix.nlu?=language=eng-USA",
		"wordset_path": null,
		"timeout": 15,
		"acceptance_confidence_threshold": 0.6,
		"confirmation_confidence_threshold": 0.3
	}
	...

The acceptance threshold must be greater than the confirmation threshold, and both thresholds must be provided together.
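
If you generate config.json programmatically, you may want to check these two constraints before running the tool. The sketch below is one way to do that in Python; validate_thresholds is a hypothetical helper, and how the tool itself reacts to an invalid configuration is not described here.

```python
def validate_thresholds(nlu_config: dict) -> None:
    """Check the threshold constraints on the "nlu" section of a config.

    Hypothetical pre-flight check; raises ValueError on a violation.
    """
    acceptance = nlu_config.get("acceptance_confidence_threshold")
    confirmation = nlu_config.get("confirmation_confidence_threshold")

    # Both thresholds are optional, so omitting both is valid.
    if acceptance is None and confirmation is None:
        return
    # If one is present, the other must be present too.
    if acceptance is None or confirmation is None:
        raise ValueError("Both thresholds must be provided together.")
    # The acceptance threshold must be strictly greater.
    if acceptance <= confirmation:
        raise ValueError(
            "acceptance_confidence_threshold must be greater than "
            "confirmation_confidence_threshold."
        )


if __name__ == "__main__":
    validate_thresholds({
        "acceptance_confidence_threshold": 0.6,
        "confirmation_confidence_threshold": 0.3,
    })
    print("thresholds OK")
```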

If both thresholds are provided, the HTML report will contain an additional section that shows accuracy information for the correct acceptance and correct confirmation threshold groups separately.

Intent Subtotaling

If you want to segment intents from your model ontology into in-domain and out-of-domain groups, you can use the subtotal_intents nlu parameter with OOD and ID keys (for out-of-domain and in-domain respectively) that list each set of intents:

	...
	"nlu": {
		"use_asr_results": true,
		"model_uri": "urn:nuance-mix:tag:model/coffee_demo/mix.nlu?=language=eng-USA",
		"subtotal_intents": {
			"ID": ["order_coffee", "order_tea"],
			"OOD": ["NO_MATCH"]
		},
		...
	},
	...

If in-domain and out-of-domain intents are not explicitly set, all intents are considered in-domain.
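
The grouping rule can be sketched as a small lookup. This is an illustration under stated assumptions, not the tool's implementation: domain_of is a hypothetical name, and returning None for an intent that appears in neither list is an assumption, since that case is not covered above.

```python
from typing import Optional


def domain_of(intent: str, subtotal_intents: Optional[dict] = None) -> Optional[str]:
    """Return "ID" or "OOD" for an intent, given a subtotal_intents mapping.

    Hypothetical helper for illustration only.
    """
    # With no groups configured, every intent counts as in-domain.
    if not subtotal_intents:
        return "ID"
    for group in ("ID", "OOD"):
        if intent in subtotal_intents.get(group, []):
            return group
    # Assumption: an intent listed in neither group has no defined domain.
    return None


if __name__ == "__main__":
    groups = {"ID": ["order_coffee", "order_tea"], "OOD": ["NO_MATCH"]}
    print(domain_of("order_tea", groups))
    print(domain_of("NO_MATCH", groups))
    print(domain_of("order_tea"))
```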

ROC Curve

The Receiver Operating Characteristic (ROC) curve and accompanying tables display accuracy rates across different confidence thresholds.

[Figure: ROC curve]

These numbers are populated from the acceptance_confidence_threshold and confirmation_confidence_threshold values given in the config, as described above. If those thresholds are not present in the config when the report is generated, some information will be absent from the table.

To try out new confidence thresholds, enter new numbers in the boxes for each threshold, and hit Submit for each.

[Figure: ROC curve, adjusting thresholds]

New values will be displayed based on the new thresholds, along with the original values and thresholds for comparison (if they were present in your config when the report was created).

Descriptions of the different values are given below.

Variable Name	Description
Total	The number of utterances in your truth file
In Domain	The percentage of utterances in your truth file that belong to in-domain intents
OOD Speech	Percentage of out-of-domain utterances that contain speech which could be recognized. `N/A` will be displayed if in-domain and out-of-domain groups are not specified.
OOD Noise	Percentage of out-of-domain utterances that do not include any intelligible speech. These include actual noises (such as a cough or a dog barking), background speech, side conversations (where the caller is talking to someone else, e.g. off the phone), and digital noises (e.g. dial pulses and hang-ups). `N/A` will be displayed if in-domain and out-of-domain groups are not specified.
ID Accuracy	NLU sentence-level accuracy for utterances belonging to in-domain intents
CA	Percentage of utterances that were scored as correct with a confidence at or above the acceptance confidence threshold
CC	Percentage of utterances that were scored as correct with a confidence below the acceptance threshold but at or above the confirmation threshold
OOD Retry	Percentage of out-of-domain utterances with confidence scores lower than the confirmation threshold. `N/A` will be displayed if in-domain and out-of-domain groups are not specified.
ID Retry	Percentage of in-domain utterances with confidence scores lower than the confirmation threshold
FC	Percentage of utterances that were scored as incorrect with a confidence below the acceptance threshold but at or above the confirmation threshold
FA	Percentage of utterances that were scored as incorrect with a confidence at or above the acceptance threshold
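
To make the threshold-dependent rows concrete, the sketch below computes CA, CC, FC, FA, ID Retry, and OOD Retry from a list of scored utterances. It is an illustration only, not the report's code. The record keys (correct, confidence, in_domain) are hypothetical, and the denominators are assumptions: CA, CC, FC, and FA are taken over all utterances, ID Retry over in-domain utterances, and OOD Retry over out-of-domain utterances.

```python
def threshold_rates(utterances, acceptance, confirmation):
    """Compute threshold-based percentages from scored utterances.

    Each utterance is a dict with hypothetical keys:
    "correct" (bool), "confidence" (float), "in_domain" (bool).
    """
    counts = {"CA": 0, "CC": 0, "FC": 0, "FA": 0}
    id_total = ood_total = id_retry = ood_retry = 0

    for utt in utterances:
        if utt["in_domain"]:
            id_total += 1
        else:
            ood_total += 1

        if utt["confidence"] >= acceptance:
            counts["CA" if utt["correct"] else "FA"] += 1
        elif utt["confidence"] >= confirmation:
            counts["CC" if utt["correct"] else "FC"] += 1
        elif utt["in_domain"]:
            id_retry += 1
        else:
            ood_retry += 1

    def pct(part, whole):
        # None stands in for a value that cannot be computed (empty group).
        return 100.0 * part / whole if whole else None

    total = len(utterances)
    rates = {name: pct(count, total) for name, count in counts.items()}
    rates["ID Retry"] = pct(id_retry, id_total)
    rates["OOD Retry"] = pct(ood_retry, ood_total)
    return rates


if __name__ == "__main__":
    sample = [
        {"correct": True, "confidence": 0.9, "in_domain": True},
        {"correct": False, "confidence": 0.7, "in_domain": True},
        {"correct": True, "confidence": 0.4, "in_domain": True},
        {"correct": False, "confidence": 0.1, "in_domain": False},
    ]
    print(threshold_rates(sample, acceptance=0.6, confirmation=0.3))
```

Rerunning the function with different acceptance and confirmation values mirrors what the adjustable threshold boxes do in the report: the same utterances are redistributed across the bands.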
