Version: Next

Entity Evaluation

The Mix Testing Tool offers several options for how entity evaluation is performed.

See Interpretation Results: Entities for additional information about how entity information is returned from NLUaaS.

Entity F1 Scores​

The output report can include F1, precision, and recall scores for entities. To enable this, set the useEntityF1 parameter in the nlu section of the config to true:

"nlu": {
    "use_asr_results": true,
    "useEntityF1": true,
    ...
}
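For reference, entity precision, recall, and F1 relate as follows. The sketch below is illustrative Python; the counting conventions (what counts as a true/false positive) are an assumption, not the tool's exact implementation:

```python
def entity_scores(true_positives: int, false_positives: int, false_negatives: int):
    """Compute entity precision, recall, and F1 from match counts.

    true_positives:  entities predicted correctly
    false_positives: entities predicted but not in the reference
    false_negatives: reference entities the model missed
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# e.g. 8 correctly predicted entities, 2 spurious, 2 missed:
p, r, f1 = entity_scores(8, 2, 2)  # precision and recall are both 0.8
```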

Simplified Entity Scoring Behavior​

If the reference value for a particular entity in the truth file is given as either a string or a number, the behavior of the tool is:

  • Score against the returned canonical value if there is one and it is not empty;
  • Otherwise use the struct_value field if it is present (if the entity is defined as a relationship entity with a predefined entity such as nuance_CALENDARX underneath it);
  • Lastly, fall back on and use the literal value.
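The fallback order above can be sketched as follows. `reference_value_to_compare` is a hypothetical helper, not part of the tool; the field names follow the NLUaaS response:

```python
def reference_value_to_compare(predicted: dict):
    """Pick which predicted field a plain string/number reference is scored
    against: canonical first, then struct_value, then literal."""
    canonical = predicted.get("canonical")
    if canonical not in (None, ""):
        return canonical
    if "struct_value" in predicted:
        return predicted["struct_value"]
    return predicted.get("literal")

print(reference_value_to_compare({"canonical": "lg", "literal": "large"}))  # lg
print(reference_value_to_compare({"canonical": "", "literal": "large"}))    # large
```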

Outputting and Evaluating on Canonicals and Literals Together​

You can optionally have the tool output and score against a dictionary for each entity that contains one or more of the literal, formattedLiteral, and canonical values explicitly.

To enable this behavior, in the truth file, use a dictionary as the value for each entity key, and include at least one of literal, canonical, and formattedLiteral keys with their expected values inside that dictionary, as in this example:

| speaker | codedWvnm | transcription | annotation | intent |
| --- | --- | --- | --- | --- |
| 1 | 1-1.wav | I would like a large coffee | {"COFFEE_SIZE": {"canonical": "lg", "literal": "large", "formattedLiteral": "large"}} | ORDER_COFFEE |
| 1 | 1-2.wav | I'd like a large iced latte | {"COFFEE_SIZE": {"canonical": "lg", "literal": "large"}, "COFFEE_TYPE": {"canonical": "latte", "literal": "iced latte"}} | ORDER_COFFEE |
| ... | | | | |

The scoring behavior is driven by the content of the entity dictionaries in the truth file. For each key (literal, formattedLiteral, and/or canonical) present in the reference dictionary for a particular entity in a particular test case, the returned value is compared against the reference value; if any key present in the reference fails to match, the entity is scored as incorrect.

If a particular key is not specified in the reference dictionary, a returned value for that key will be ignored. For example, if a literal is not specified for the COFFEE_SIZE entity in a test case, the returned literal for COFFEE_SIZE will not come into play when that test case is scored.

In the case of an explicitly empty entity dictionary in the reference, the evaluation for that entity will only be counted as correct if the returned value for that entity is also empty (i.e. the prediction does not include that entity).
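The per-key scoring described above might be sketched like this; `entity_matches` is a hypothetical helper, not the tool's actual code:

```python
def entity_matches(reference: dict, predicted: dict) -> bool:
    """Score one entity against a reference dictionary.

    Only the keys present in the reference (literal, formattedLiteral,
    canonical) are compared; an empty reference dict requires the entity
    to be absent or empty in the prediction.
    """
    if not reference:
        return not predicted
    return all(predicted.get(key) == expected for key, expected in reference.items())

# Only 'canonical' and 'literal' are checked; the returned formattedLiteral is ignored.
ref = {"canonical": "lg", "literal": "large"}
pred = {"canonical": "lg", "literal": "large", "formattedLiteral": "Large"}
print(entity_matches(ref, pred))  # True
print(entity_matches({}, pred))   # False: empty reference requires an empty prediction
```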

Multiple Instances of the Same Entity​

If you expect the model to return multiple, discontiguous instances of the same entity, you can specify this in the truth file by using a list as the expected value for that entity, with each item in the list representing one expected occurrence.

For example:

| speaker | codedWvnm | transcription | annotation | intent |
| --- | --- | --- | --- | --- |
| 1 | 1-1.wav | Send a note to Allison and James | {"PERSON": [{"literal": "Allison"}, {"literal": "James"}]} | SEND_NOTE |
| ... | | | | |

The ordering of the list matters: NLUaaS returns multiple occurrences of the same entity in the order in which they are tagged.
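Assuming pairwise, in-order comparison of the occurrences, list matching could look like this sketch (`entities_match_in_order` is a hypothetical helper):

```python
def entities_match_in_order(reference: list, predicted: list) -> bool:
    """Compare multiple occurrences of the same entity pairwise, in order.
    Each reference item only constrains the keys it contains."""
    if len(reference) != len(predicted):
        return False
    return all(
        all(pred.get(key) == value for key, value in ref.items())
        for ref, pred in zip(reference, predicted)
    )

ref = [{"literal": "Allison"}, {"literal": "James"}]
print(entities_match_in_order(ref, [{"literal": "Allison"}, {"literal": "James"}]))  # True
print(entities_match_in_order(ref, [{"literal": "James"}, {"literal": "Allison"}]))  # False: order matters
```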

Ignoring Predicted Entities​

You may choose to have the tool ignore entity output when evaluating against the reference. Entities matching a regular expression pattern can be ignored globally across all intents, for specific intents only, or both.

  • The _GLOBAL_ key is a special reserved word, and entities matching the corresponding pattern are ignored across all intents.
  • Intent names can also be listed individually, and entity names matching the specified pattern are ignored just for that intent.
  • If ignore patterns are specified for individual intents and a global pattern is specified, the global pattern also applies to the specific intents (that is, the global pattern is combined with the entity-specific pattern).

Use the predicted_entities_to_ignore parameter under the nlu section of the config file, and provide a dictionary with intent names as keys (and/or the _GLOBAL_ key which applies to all intents) and regex ignore patterns as values, as in this example:

"nlu": {
    "use_asr_results": true,
    "predicted_entities_to_ignore": {
        "_GLOBAL_": "^.+_SIZE$",
        "ORDER_COFFEE": "COFFEE_TYPE"
    },
    ...
}

Any ignored entities will also not be displayed in the HTML report.
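The way the global and per-intent patterns combine can be illustrated with this sketch; `should_ignore` is a hypothetical helper mirroring the semantics described above, not the tool's code:

```python
import re

def should_ignore(entity: str, intent: str, ignore_config: dict) -> bool:
    """True if the entity should be ignored for this intent.

    The _GLOBAL_ pattern applies to every intent and is combined with
    any pattern listed for the specific intent.
    """
    patterns = []
    if "_GLOBAL_" in ignore_config:
        patterns.append(ignore_config["_GLOBAL_"])
    if intent in ignore_config:
        patterns.append(ignore_config[intent])
    return any(re.search(pattern, entity) for pattern in patterns)

config = {"_GLOBAL_": "^.+_SIZE$", "ORDER_COFFEE": "COFFEE_TYPE"}
print(should_ignore("COFFEE_SIZE", "ORDER_TEA", config))     # True: global pattern applies
print(should_ignore("COFFEE_TYPE", "ORDER_COFFEE", config))  # True: intent-specific pattern
print(should_ignore("COFFEE_TYPE", "ORDER_TEA", config))     # False
```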

Predefined Entities​

The NLUaaS response for predefined entities such as nuance_CALENDARX (and custom entities that have a predefined entity underneath them via an entity relationship such as isA) contains a special key named struct_value: a JSON dictionary representing the information that the entity contains (for nuance_CALENDARX, calendar event information canonicalized in terms of date, time, or both). If testing on the canonical value for such entities, the Mix Testing Tool will by default flatten the nested predefined entity representation.

For example, if the ontology has a BIRTH_DATE entity defined in terms of an isA nuance_CALENDARX relationship, the NLUaaS response will be similar to the following when "July twenty first" is tagged as BIRTH_DATE:

{
    "BIRTH_DATE": {
        "entities": [{
            "textRange": {
                "startIndex": 23,
                "endIndex": 30
            },
            "confidence": 0.6452816724777222,
            "origin": "STATISTICAL",
            "entities": {
                "nuance_CALENDARX": {
                    "entities": [{
                        "text_range": {
                            "start_index": 23,
                            "end_index": 30
                        },
                        "confidence": 1.0,
                        "origin": "STATISTICAL",
                        "struct_value": {
                            "nuance_CALENDAR": {
                                "nuance_DATE": {
                                    "nuance_DATE_ABS": {
                                        "nuance_MONTH": 7.0,
                                        "nuance_DAY": 21.0
                                    }
                                }
                            }
                        },
                        "literal": "July twenty first"
                    }]
                }
            },
            "literal": "July twenty first"
        }]
    }
}

The Mix Testing Tool will (by default) flatten that representation to:

{
    "BIRTH_DATE": {
        "nuance_CALENDAR.nuance_DATE.nuance_DATE_ABS.nuance_MONTH": 7.0,
        "nuance_CALENDAR.nuance_DATE.nuance_DATE_ABS.nuance_DAY": 21.0
    }
}

As such, the truth file should contain flattened representations for each leaf node in the NLUaaS response as shown above.
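The flattening can be reproduced with a short recursive helper; `flatten` below is a sketch, not the tool's implementation:

```python
def flatten(nested: dict, prefix: str = "") -> dict:
    """Flatten a nested struct_value into dotted keys, one per leaf node."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

struct_value = {
    "nuance_CALENDAR": {
        "nuance_DATE": {
            "nuance_DATE_ABS": {"nuance_MONTH": 7.0, "nuance_DAY": 21.0}
        }
    }
}
print(flatten(struct_value))
# {'nuance_CALENDAR.nuance_DATE.nuance_DATE_ABS.nuance_MONTH': 7.0,
#  'nuance_CALENDAR.nuance_DATE.nuance_DATE_ABS.nuance_DAY': 21.0}
```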

flattened_predefined_output Parameter​

As described above, the Mix Testing Tool will by default flatten nested predefined entity representations. This can be controlled via the flattened_predefined_output NLU property. If this is set to false, the nested dictionary structure is preserved in the output, as in this example:

{
    "BIRTH_DATE": {
        "nuance_CALENDAR": {
            "nuance_DATE": {
                "nuance_DATE_ABS": {
                    "nuance_MONTH": 7.0,
                    "nuance_DAY": 21.0
                }
            }
        }
    }
}

CALENDARX Formatting​

Another option available with the Mix Testing Tool is to output an abbreviated format for nuance_CALENDARX canonicals.

Set the format_calendar_canonicals parameter under the nlu section of the config file to true, and the tool will attempt to return specially formatted values for different nuance_CALENDARX values.

"nlu": {
    "use_asr_results": true,
    "format_calendar_canonicals": true,
    ...
}

The example above of "July twenty first" would be returned as yyyy/7/21. Additional examples are:

| Literal | Formatted Output |
| --- | --- |
| July twenty first | yyyy/7/21 |
| July twenty first 2017 | 2017/7/21 |
| today | day/0 |
| next Tuesday | tuesday/+1 |
| last week | week/-1 |
| May | yyyy/5/1 - yyyy/5/31 |
| the eighth through the tenth | yyyy/mm/8 - yyyy/mm/10 |
| five o'clock | 5:00 |
| five pm | 5:00 PM |
| in three hours | hour/+3 |
| ten minutes ago | minute/-10 |
| from five until six | 5:00 - 6:00 |
| from nine am until noon | 9:00 AM - 12:00 PM |
| in the morning | MORNING |
| this afternoon | AFTERNOON |
| this evening | EVENING |
| tonight | TONIGHT |