Entity Evaluation
The Mix Testing Tool offers several options for how entity evaluation is performed.
See Interpretation Results: Entities for additional information about how entity information is returned from NLUaaS.
Entity F1 Scores
The output report can include F1, precision, and recall scores for entities. To enable this, set the useEntityF1 parameter in the nlu section of the config to true:
"nlu": {
"use_asr_results": true,
"useEntityF1": true,
...
}
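To illustrate what slot-level precision, recall, and F1 mean here, the following is a minimal sketch, not the tool's actual implementation (which is not documented in this section). It assumes each entity prediction is reduced to an (entity name, value) pair and scored as a multiset; the function name `entity_prf` is hypothetical.

```python
# Hypothetical sketch of slot-level entity scoring; the Mix Testing
# Tool's real metric computation may differ.
from collections import Counter

def entity_prf(reference, predicted):
    """Precision, recall, and F1 over (entity, value) pairs."""
    ref = Counter(reference)
    pred = Counter(predicted)
    tp = sum((ref & pred).values())  # slots matching in both name and value
    precision = tp / sum(pred.values()) if pred else 0.0
    recall = tp / sum(ref.values()) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One correct slot predicted out of two expected slots.
p, r, f = entity_prf(
    [("COFFEE_SIZE", "lg"), ("COFFEE_TYPE", "latte")],
    [("COFFEE_SIZE", "lg")],
)
```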
Simplified Entity Scoring Behavior
If the reference value for a particular entity in the truth file is given as either a string or a number, the tool scores as follows:
- Score against the returned canonical value if one is present and non-empty.
- Otherwise, use the struct_value field if it is present (for example, if the entity is defined as a relationship entity with a predefined entity such as nuance_CALENDARX underneath it).
- Lastly, fall back on the literal value.
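The fallback order above can be sketched as a small selection function. This is an illustrative sketch only; `reference_value_for_scoring` is a hypothetical name, and the input is assumed to be one returned entity as a dict.

```python
def reference_value_for_scoring(entity):
    """Pick the value to score against when the truth value is a plain
    string or number: canonical first, then struct_value, then literal."""
    canonical = entity.get("canonical")
    if canonical:                    # present and non-empty
        return canonical
    if "struct_value" in entity:     # e.g. relationship entities
        return entity["struct_value"]
    return entity.get("literal")
```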
Outputting and Evaluating on Canonicals and Literals Together
You can optionally have the tool output and score against a dictionary for each entity that explicitly contains one or more of the literal, formattedLiteral, and canonical values.
To enable this behavior, in the truth file use a dictionary as the value for each entity key, and include at least one of the literal, canonical, and formattedLiteral keys with their expected values inside that dictionary, as in this example:
speaker | codedWvnm | transcription | annotation | intent |
---|---|---|---|---|
1 | 1-1.wav | I would like a large coffee | {"COFFEE_SIZE": {"canonical": "lg", "literal": "large", "formattedLiteral": "large"}} | ORDER_COFFEE |
1 | 1-2.wav | I'd like a large iced latte | {"COFFEE_SIZE": {"canonical": "lg", "literal": "large"}, "COFFEE_TYPE": {"canonical": "latte", "literal": "iced latte"}} | ORDER_COFFEE |
...
The scoring behavior is driven by the content of the entity dictionaries in the truth file. For each key (literal, formattedLiteral, and/or canonical) present in the dictionary for a particular entity (and for a particular test case), the returned value is compared against the reference value. If any value fails to match for any key present in the reference, the entity is scored as incorrect.
If a particular key is not specified in the reference dictionary, the returned value for that key is ignored. For example, if a literal is not specified for the COFFEE_SIZE entity in a test case, the returned literal for COFFEE_SIZE will not come into play when that test case is scored.
In the case of an explicitly empty entity dictionary in the reference, the evaluation for that entity will only be counted as correct if the returned value for that entity is also empty (i.e. the prediction does not include that entity).
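The key-wise comparison rules, including the empty-dictionary case, can be sketched as follows. This is an illustrative sketch, not the tool's code; `entity_matches` is a hypothetical name, and `returned` is assumed to be None when the prediction does not include the entity.

```python
def entity_matches(reference, returned):
    """Score one entity against its reference dictionary from the truth file.

    Only keys present in the reference (literal, formattedLiteral,
    canonical) are compared; an empty reference dict means the entity
    must be absent from the prediction.
    """
    if reference == {}:
        return returned is None      # entity must not be predicted at all
    if returned is None:
        return False
    return all(returned.get(key) == expected
               for key, expected in reference.items())
```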
Multiple Instances of the Same Entity
If you expect the model to return multiple, discontiguous instances of the same entity, you can specify this in the truth file by using a list as the expected value for that entity, with each item in the list representing one expected occurrence.
For example:
speaker | codedWvnm | transcription | annotation | intent |
---|---|---|---|---|
1 | 1-1.wav | Send a note to Allison and James | {"PERSON": [{"literal": "Allison"}, {"literal": "James"}]} | SEND_NOTE |
...
The ordering of the list does matter, as NLUaaS returns multiple occurrences of the same entity according to the order in which they are tagged.
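Since NLUaaS returns occurrences in tagging order, the comparison is positional: the i-th returned occurrence is scored against the i-th reference item. A minimal sketch (hypothetical `multi_entity_matches` function, not the tool's code):

```python
def multi_entity_matches(reference_list, returned_list):
    """Compare multiple occurrences of one entity in order: the i-th
    returned occurrence is scored against the i-th reference item."""
    if len(reference_list) != len(returned_list):
        return False
    return all(
        all(returned.get(key) == expected for key, expected in ref.items())
        for ref, returned in zip(reference_list, returned_list)
    )
```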
Ignoring Predicted Entities
You may choose to have the tool ignore entity output for purposes of evaluating against the reference. Entities matching a regular expression pattern can be ignored globally across all intents, for specific intents, or both.
- The _GLOBAL_ key is a special reserved word; entities matching the corresponding pattern are ignored across all intents.
- Intent names can also be listed individually, and entity names matching the specified pattern are ignored just for that intent.
- If ignore patterns are specified for individual intents and a global pattern is also specified, the global pattern applies to those intents as well (that is, the global pattern is combined with the intent-specific pattern).
Use the predicted_entities_to_ignore parameter under the nlu section of the config file, and provide a dictionary with intent names as keys (and/or the _GLOBAL_ key, which applies to all intents) and regex ignore patterns as values, as in this example:
"nlu": {
"use_asr_results": true,
"predicted_entities_to_ignore": {
"_GLOBAL_": "^.+_SIZE$",
"ORDER_COFFEE": "COFFEE_TYPE"
},
...
}
Any ignored entities will also not be displayed in the HTML report.
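The combination of the global and intent-specific patterns can be sketched like this. It is an illustrative sketch under the assumption that patterns are applied as unanchored regex searches against entity names; `ignored` is a hypothetical name, not the tool's API.

```python
import re

def ignored(entity_name, intent, patterns):
    """True if the entity should be dropped before scoring, combining the
    _GLOBAL_ pattern with any intent-specific pattern."""
    for key in ("_GLOBAL_", intent):
        pattern = patterns.get(key)
        if pattern and re.search(pattern, entity_name):
            return True
    return False

# Patterns from the config example above.
patterns = {"_GLOBAL_": "^.+_SIZE$", "ORDER_COFFEE": "COFFEE_TYPE"}
```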
Predefined Entities
The NLUaaS response for predefined entities such as nuance_CALENDARX (and for custom entities that have a predefined entity underneath them via an entity relationship such as isA) contains a special key named struct_value, with a JSON dictionary representing the particular information that the entity contains (such as calendar event information canonicalized in terms of date, time, or both for nuance_CALENDARX). If testing on the canonical value for such entities, the Mix Testing Tool will by default flatten the nested predefined entity representation.
For example, if the ontology has a BIRTH_DATE entity defined in terms of an isA nuance_CALENDARX relationship, the NLUaaS response will be similar to the following when "July twenty first" is tagged as BIRTH_DATE:
{
"BIRTH_DATE": {
"entities": [{
"textRange": {
"startIndex": 23,
"endIndex": 30
},
"confidence": 0.6452816724777222,
"origin": "STATISTICAL",
"entities": {
"nuance_CALENDARX": {
"entities": [{
"text_range": {
"start_index": 23,
"end_index": 30
},
"confidence": 1.0,
"origin": "STATISTICAL",
"struct_value": {
"nuance_CALENDAR": {
"nuance_DATE": {
"nuance_DATE_ABS": {
"nuance_MONTH": 7.0,
"nuance_DAY": 21.0
}
}
}
},
"literal": "July twenty first"
}]
}
},
"literal": "July twenty first"
}]
}
}
The Mix Testing Tool will (by default) flatten that representation to:
{
"BIRTH_DATE": {
"nuance_CALENDAR.nuance_DATE.nuance_DATE_ABS.nuance_MONTH": 7.0,
"nuance_CALENDAR.nuance_DATE.nuance_DATE_ABS.nuance_DAY": 21.0
}
}
As such, the truth file should contain flattened representations for each leaf node in the NLUaaS response as shown above.
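The flattening shown above (dotted paths down to each leaf node) can be sketched with a short recursive helper. This is an illustrative sketch, not the tool's code; `flatten_struct_value` is a hypothetical name.

```python
def flatten_struct_value(struct_value, prefix=""):
    """Flatten a nested struct_value dictionary into dotted leaf-node
    keys, mirroring the Mix Testing Tool's default output."""
    flat = {}
    for key, value in struct_value.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_struct_value(value, path))  # recurse
        else:
            flat[path] = value                              # leaf node
    return flat

# The nuance_CALENDARX struct_value from the example above.
nested = {"nuance_CALENDAR": {"nuance_DATE": {"nuance_DATE_ABS": {
    "nuance_MONTH": 7.0, "nuance_DAY": 21.0}}}}
flat = flatten_struct_value(nested)
```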
flattened_predefined_output Parameter
As described above, the Mix Testing Tool by default flattens nested predefined entity representations. This can be controlled via the flattened_predefined_output NLU property. If this is set to false, the nested dictionary structure is preserved in the output, as in this example:
{
"BIRTH_DATE": {
"nuance_CALENDAR": {
"nuance_DATE": {
"nuance_DATE_ABS": {
"nuance_MONTH": 7.0,
"nuance_DAY": 21.0
}
}
}
}
}
CALENDARX Formatting
Another option available with the Mix Testing Tool is to output an abbreviated format for nuance_CALENDARX canonicals.
Set the format_calendar_canonicals parameter under the nlu section of the config file to true, and the tool will attempt to return specially formatted values for different nuance_CALENDARX values.
"nlu": {
"use_asr_results": true,
"format_calendar_canonicals": true,
...
}
The example above of "July twenty first" would be returned as yyyy/7/21. Additional examples:
Literal | Formatted Output |
---|---|
July twenty first | yyyy/7/21 |
July twenty first 2017 | 2017/7/21 |
today | day/0 |
next Tuesday | tuesday/+1 |
last week | week/-1 |
May | yyyy/5/1 - yyyy/5/31 |
the eighth through the tenth | yyyy/mm/8 - yyyy/mm/10 |
five o'clock | 5:00 |
five pm | 5:00 PM |
in three hours | hour/+3 |
ten minutes ago | minute/-10 |
from five until six | 5:00 - 6:00 |
from nine am until noon | 9:00 AM - 12:00 PM |
in the morning | MORNING |
this afternoon | AFTERNOON |
this evening | EVENING |
tonight | TONIGHT |