Character Error Rate
Character error rate is an alternative ASR error rate metric.
By default, the tool uses the Word Error Rate metric for ASR scoring, which treats each whitespace-separated word as its own token when comparing the ASR output hypothesis to the reference transcription from the truth file. Character Error Rate can be used instead; it treats each individual character as its own token, so the ASR hypothesis is compared to the reference on a character-by-character basis. This is useful for languages like Japanese and Mandarin, where character sequences usually do not contain whitespace, and where spoken words can span several characters with no delimiters between them.
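As a rough illustration of the difference, the sketch below scores a hypothesis against a reference with a plain Levenshtein edit distance, once over whitespace-separated words and once over individual characters. This is a minimal sketch, not the tool's actual scoring code, and the example sentences are invented for illustration.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(dp[j] + 1,         # deletion: ref token missing from hyp
                      dp[j - 1] + 1,     # insertion: extra token in hyp
                      prev + (r != h))   # substitution, or match if equal
            prev, dp[j] = dp[j], cur
    return dp[-1]

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def character_error_rate(reference, hypothesis):
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# English: whitespace-separated words are natural tokens.
print(word_error_rate("the cat sat", "the cat sits"))   # 1 substitution / 3 words = 0.33

# Japanese: no whitespace, so each character is a token.
print(character_error_rate("今日は晴れ", "今日は雨"))      # 1 sub + 1 del / 5 chars = 0.4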
To use the Character Error Rate metric instead of Word Error Rate, set the use_character_error_rate parameter in the asr section of the config to true:
"asr": {
"topic": "GEN",
"auto_punctuate": false,
...
"use_character_error_rate": "true",
...
}
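If you prefer to flip this setting programmatically rather than editing the file by hand, a small script along these lines will do it. The config.json path is only a placeholder for wherever your configuration file lives, and the value is written as a JSON boolean to match the other boolean fields in the example above.

import json

# Enable Character Error Rate scoring in an existing config file.
# "config.json" is a placeholder path, not a fixed name.
with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

config.setdefault("asr", {})["use_character_error_rate"] = True

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)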
If use_character_error_rate is set to true, the generated report displays "Character Error Rate" instead of "Word Error Rate" (and "CER" instead of "WER" as the abbreviation). In the .json output, the keys keep the same variable names (such as word_error_rate) regardless of which metric is used, but when the character error rate metric is enabled their values pertain to character error rate.
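For example, reading the score back out of the report might look like the sketch below. The report filename and the flat key layout are assumptions made for illustration; only the word_error_rate key name comes from the output described above.

import json

# Read an ASR scoring report. The filename and structure are assumed
# for illustration; the key is still named "word_error_rate" even when
# its value is a character error rate.
with open("asr_report.json", "r", encoding="utf-8") as f:
    report = json.load(f)

cer = report["word_error_rate"]  # holds CER when use_character_error_rate is true
print(f"Character error rate: {cer:.3f}")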