Character Error Rate
Character error rate is an alternative ASR error rate metric.
By default, the tool uses the Word Error Rate metric for ASR scoring, which treats each whitespace-separated word as its own token when comparing the ASR output hypothesis to the reference transcription from the truth file. Character Error Rate can be used instead; it treats each individual character as its own token, so the ASR hypothesis is compared to the reference on a character-by-character basis. This is useful for languages like Japanese and Mandarin, where character sequences usually do not contain whitespace, and where spoken words can span several characters with no delimiters between them.
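As a rough illustration of the difference, the sketch below scores a hypothesis against a reference with a plain Levenshtein edit distance, once over whitespace-separated words and once over individual characters. This is a minimal sketch, not the tool's actual scoring code, and the example sentences are invented for illustration.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(dp[j] + 1,         # deletion: ref token missing from hyp
                      dp[j - 1] + 1,     # insertion: extra token in hyp
                      prev + (r != h))   # substitution, or match if equal
            prev, dp[j] = dp[j], cur
    return dp[-1]

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def character_error_rate(reference, hypothesis):
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# English: whitespace-separated words are natural tokens.
print(word_error_rate("the cat sat", "the cat sits"))   # 1 substitution / 3 words = 0.33

# Japanese: no whitespace, so each character is a token.
print(character_error_rate("今日は晴れ", "今日は雨"))      # 1 sub + 1 del / 5 chars = 0.4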
To use the Character Error Rate metric instead of Word Error Rate, set the use_character_error_rate parameter in the asr section of the config to true:
"asr": {
"topic": "GEN",
"auto_punctuate": false,
...
"use_character_error_rate": "true",
...
}
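If you prefer to flip this setting programmatically rather than editing the file by hand, a small script along these lines will do it. The config.json path is only a placeholder for wherever your configuration file lives, and the value is written as a JSON boolean to match the other boolean fields in the example above.

import json

# Enable Character Error Rate scoring in an existing config file.
# "config.json" is a placeholder path, not a fixed name.
with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

config.setdefault("asr", {})["use_character_error_rate"] = True

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)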
If use_character_error_rate is set to true, the generated report displays "Character Error Rate" instead of "Word Error Rate" (and "CER" instead of "WER" as the abbreviation). In the .json output, the keys keep the same variable names (such as word_error_rate) regardless of which metric is used, but when the character error rate metric is enabled their values pertain to character error rate.
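For example, reading the score back out of the report might look like the sketch below. The report filename and the flat key layout are assumptions made for illustration; only the word_error_rate key name comes from the output described above.

import json

# Read an ASR scoring report. The filename and structure are assumed
# for illustration; the key is still named "word_error_rate" even when
# its value is a character error rate.
with open("asr_report.json", "r", encoding="utf-8") as f:
    report = json.load(f)

cer = report["word_error_rate"]  # holds CER when use_character_error_rate is true
print(f"Character error rate: {cer:.3f}")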