
Creating Test Sets

Note

You may already have a methodology for building test sets. The following describes a lean approach to help if you are just getting started.

Bootstrapping Data

  • Step 1. Create the test set TSV file, i.e., truth.tsv
    • For each sample, record:
      • 1a. Speaker ID
      • 1b. Audio file
      • 1c. Transcription (text input)
        • -> listen to the audio and transcribe what is said
      • 1d. Annotation (entities)
        • -> define the expected value of each entity mention, in JSON form: {"ENTITY": "VALUE"}
      • 1e. Intent classification
        • -> label the sample with its expected intent
  • Step 2. Define wordsets (optional)
    • Map dynamic entities to their values in wordsets.json
  • Step 3. Define an adjudication map (optional)
    • In mappings.adjmap, define any rules for what to omit from scoring
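The Step 1 build can be sketched in Python. This is a minimal sketch, not the testing tool's own format specification: it assumes truth.tsv has no header row and that its columns follow the order of items 1a to 1e (speaker ID, audio file, transcription, annotation JSON, intent). The `Sample` class, the `to_row` and `write_truth_tsv` functions, and any example intent or entity names are hypothetical; check the column layout your tool expects before relying on it.

```python
import json
from dataclasses import dataclass


@dataclass
class Sample:
    """One test sample; fields mirror items 1a-1e (hypothetical structure)."""
    speaker_id: str     # 1a
    audio_file: str     # 1b
    transcription: str  # 1c: what the speaker actually said
    entities: dict      # 1d: expected values, e.g. {"ENTITY": "VALUE"}
    intent: str         # 1e


def to_row(sample: Sample) -> list:
    """Return the TSV columns for one sample (column order is an assumption)."""
    return [
        sample.speaker_id,
        sample.audio_file,
        sample.transcription,
        # Annotation serialized as single-line JSON: {"ENTITY": "VALUE"}
        json.dumps(sample.entities, ensure_ascii=False),
        sample.intent,
    ]


def write_truth_tsv(path: str, samples: list) -> None:
    """Write one tab-separated line per sample, with no header row."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for sample in samples:
            row = to_row(sample)
            for value in row:
                # Tabs or line breaks inside a field would corrupt the TSV layout
                if any(ch in value for ch in "\t\r\n"):
                    raise ValueError(f"field contains a tab or line break: {value!r}")
            f.write("\t".join(row) + "\n")
```

For example, `write_truth_tsv("truth.tsv", [Sample("spk01", "file.wav", "a large coffee please", {"COFFEE_SIZE": "large"}, "ORDER_COFFEE")])` writes a single row. Joining fields by hand rather than using the `csv` module keeps the double quotes in the JSON annotation unescaped; the sketch rejects fields containing tabs or line breaks instead.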

End Result

Once assembled, your test set directory should look like this:

    test-set/
        mappings.adjmap
        project.trsx
        truth.tsv
        wordsets.json
        audio/
            file.wav
            ...
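Before running tests, a quick check of the layout above can catch missing files. The sketch below is hypothetical: it treats truth.tsv and project.trsx as required (wordsets.json and mappings.adjmap are optional per Steps 2 and 3), and it assumes truth.tsv has no header row and that its second column holds an audio file name relative to audio/. The `check_test_set` function and its constants are illustrative names; adjust the assumptions to your actual format.

```python
from pathlib import Path

# Assumption: these files from the layout are required; wordsets.json and
# mappings.adjmap are optional (Steps 2 and 3) and are not checked here.
REQUIRED_FILES = ("truth.tsv", "project.trsx")

# Assumption: the audio file name is the second column of truth.tsv (item 1b),
# given relative to the audio/ directory.
AUDIO_COLUMN = 1


def check_test_set(root: str) -> list:
    """Return a list of problems found in a test-set directory (empty if none)."""
    base = Path(root)
    problems = []
    for name in REQUIRED_FILES:
        if not (base / name).is_file():
            problems.append(f"missing required file: {name}")

    audio_dir = base / "audio"
    if not audio_dir.is_dir():
        problems.append("missing audio/ directory")

    truth = base / "truth.tsv"
    if truth.is_file():
        lines = truth.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, start=1):
            if not line.strip():
                continue  # skip blank lines
            columns = line.split("\t")
            if len(columns) <= AUDIO_COLUMN:
                problems.append(f"truth.tsv line {line_no}: no audio column")
                continue
            audio = columns[AUDIO_COLUMN]
            # Every referenced recording should exist under audio/
            if not (audio_dir / audio).is_file():
                problems.append(f"truth.tsv line {line_no}: audio not found: {audio}")
    return problems
```

Calling `check_test_set("test-set")` returns an empty list when every required file is present and every audio file referenced in truth.tsv exists under audio/.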