Creating Test Sets
note
You likely already have a methodology in place. The information below offers a lean approach to setting up a test set if you are just getting started.
Bootstrapping Data
- Step 1. Create a test set TSV file (truth.tsv)
- Build: repeat for each sample
- 1a. speaker id
- 1b. audio file
- -> record if needed (WAV: 16 kHz, 16-bit linear PCM, or 8 kHz PCM), for example with Audacity: https://www.audacityteam.org/download/
 
- 1c. transcription (text input)
- -> listen to the audio and transcribe what was said
 
- 1d. annotation (entities)
- -> define expected entity mention values (in JSON form: {"ENTITY": "VALUE"})
 
- 1e. intent classification
- -> assign the expected intent to the sample
 
 
 
- Step 2. Define wordsets (optional)
- map dynamic entities to their values in wordsets.json
 
- Step 3. Adjudication Map (optional)
- in mappings.adjmap, define rules for anything that should be omitted from scoring
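
As an illustration of Step 1, the following minimal Python sketch writes one row per sample to truth.tsv. The column names, their order, and the presence of a header row are assumptions made for this example, since the steps above list the fields but not the exact file layout; the sample values are hypothetical as well. Check them against the format your scoring tool expects.

```python
import json
from pathlib import Path

# Assumed column names and order (hypothetical; the steps above name the
# fields but do not define a header row).
COLUMNS = ["speaker_id", "audio_file", "transcription", "annotation", "intent"]


def sample_to_row(sample):
    """Turn one sample dict into a tab-separated line (without a newline)."""
    fields = [
        sample["speaker_id"],
        sample["audio_file"],
        sample["transcription"],
        # Entities are stored in JSON form: {"ENTITY": "VALUE"}
        json.dumps(sample["entities"], sort_keys=True),
        sample["intent"],
    ]
    for field in fields:
        # A tab or newline inside a field would corrupt the TSV structure.
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    return "\t".join(fields)


def write_truth_tsv(samples, path):
    """Write a header line plus one line per sample; return the output path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = ["\t".join(COLUMNS)] + [sample_to_row(s) for s in samples]
    with path.open("w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    # Hypothetical sample used only to show the shape of the data.
    write_truth_tsv(
        [
            {
                "speaker_id": "spk01",
                "audio_file": "audio/file.wav",
                "transcription": "book a flight to boston",
                "entities": {"CITY": "boston"},
                "intent": "BOOK_FLIGHT",
            }
        ],
        "test-set/truth.tsv",
    )
```

Joining fields by hand, rather than using a CSV writer, keeps the JSON annotation free of extra quoting, at the cost of rejecting any field that contains a tab or newline.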
 
End Result
Once compiled, your test set should look like this:
    test-set/
        mappings.adjmap
        project.trsx
        truth.tsv
        wordsets.json
        audio/
            file.wav
            ...
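
A quick sanity check of this layout can be scripted. The sketch below, using only Python's standard library, reports missing files and WAV files that do not match the recording formats mentioned in Step 1b. Two assumptions are made here: truth.tsv and project.trsx are treated as required (wordsets.json and mappings.adjmap are optional per the steps above), and any 8 kHz PCM file is accepted regardless of bit depth.

```python
import wave
from pathlib import Path

# Assumption: these two files are always expected; wordsets.json and
# mappings.adjmap are optional and therefore not checked.
EXPECTED_FILES = ["truth.tsv", "project.trsx"]


def check_wav(path):
    """Return (ok, sample_rate_hz, bits_per_sample) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
    # Accept 16 kHz 16-bit PCM, or 8 kHz PCM (bit depth assumed unrestricted).
    ok = (rate == 16000 and bits == 16) or rate == 8000
    return ok, rate, bits


def check_test_set(root):
    """Return a list of human-readable problems found under the test set root."""
    root = Path(root)
    problems = []
    for name in EXPECTED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    audio_dir = root / "audio"
    if not audio_dir.is_dir():
        problems.append("missing directory: audio/")
        return problems
    for wav_path in sorted(audio_dir.glob("*.wav")):
        try:
            ok, rate, bits = check_wav(wav_path)
        except (wave.Error, EOFError) as exc:
            # The wave module rejects files that are not readable PCM WAV.
            problems.append(f"unreadable audio: {wav_path.name} ({exc})")
            continue
        if not ok:
            problems.append(
                f"unexpected format: {wav_path.name} ({rate} Hz, {bits}-bit)"
            )
    return problems


if __name__ == "__main__":
    for problem in check_test_set("test-set"):
        print(problem)
```

An empty result means the expected files are present and every .wav file under audio/ matches one of the accepted formats.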