Creating Test Sets
note
You likely already have a methodology in place. The information below offers a lean approach to setting up a test set if you are just getting started.
Bootstrapping Data
- Step 1. Create the test set TSV file (truth.tsv)
- Build: repeat for each sample
- 1a. speaker id
- 1b. audio file
- -> record one if needed (WAV: 16 kHz linear PCM, 16-bit, or 8 kHz PCM), e.g. with Audacity: https://www.audacityteam.org/download/
- 1c. transcription (text input)
- -> listen and transcribe the text
- 1d. annotation (entities)
- -> define expected entity mention values in JSON form: {"ENTITY": "VALUE"}
- 1e. intent classification
- -> classify the sample
- Step 2. Define wordsets (optional)
- map dynamic entities to their values in wordsets.json
- Step 3. Adjudication Map (optional)
- define in mappings.adjmap any rules for what to omit from scoring
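As a rough illustration of Step 1, the sketch below assembles one row per sample and renders it as tab-separated text. The column names and their order (speaker_id, audio_file, transcription, annotation, intent) are assumptions taken from items 1a to 1e, not a documented header, and the sample values are invented. Check the layout your scoring tool expects before relying on it.

```python
import json

# Hypothetical header: the steps above name the fields (1a-1e) but do not
# specify column names or their order.
COLUMNS = ["speaker_id", "audio_file", "transcription", "annotation", "intent"]


def to_row(fields):
    """Join fields with tabs, rejecting values that would break the TSV."""
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    return "\t".join(fields)


def build_truth_tsv(samples):
    """Return the TSV text: one header line, then one line per sample."""
    rows = [to_row(COLUMNS)]
    for s in samples:
        rows.append(to_row([
            s["speaker_id"],
            s["audio_file"],
            s["transcription"],
            # Entities are stored as a compact JSON object, e.g. {"ENTITY": "VALUE"}
            json.dumps(s["annotation"], sort_keys=True),
            s["intent"],
        ]))
    return "\n".join(rows) + "\n"


# Invented example sample, for illustration only.
samples = [{
    "speaker_id": "spk01",
    "audio_file": "audio/file.wav",
    "transcription": "book a flight to Boston",
    "annotation": {"CITY": "Boston"},
    "intent": "BOOK_FLIGHT",
}]

truth_tsv = build_truth_tsv(samples)
print(truth_tsv, end="")
```

Writing the result to truth.tsv is then a single `open("truth.tsv", "w", encoding="utf-8").write(truth_tsv)` call. The rows are joined by hand rather than through a CSV writer so that the JSON annotation is written without extra quoting.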
End Result
Once compiled, your test set should look like this:
test-set/
    mappings.adjmap
    project.trsx
    truth.tsv
    wordsets.json
    audio/
        file.wav
        ...
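A quick way to confirm a folder matches this layout is a small check script. The sketch below is one possible approach, not part of any official tooling. It assumes truth.tsv, project.trsx and audio/ are required, and it treats wordsets.json and mappings.adjmap as optional because Steps 2 and 3 are marked optional. The function name `check_layout` is hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed split between required and optional entries (see the note above).
REQUIRED = ["truth.tsv", "project.trsx", "audio"]
OPTIONAL = ["wordsets.json", "mappings.adjmap"]


def check_layout(root):
    """Return (missing_required, absent_optional) for a test-set folder."""
    root = Path(root)
    missing_required = [n for n in REQUIRED if not (root / n).exists()]
    absent_optional = [n for n in OPTIONAL if not (root / n).exists()]
    return missing_required, absent_optional


# Demonstration on a throwaway folder that only has truth.tsv and audio/.
with tempfile.TemporaryDirectory() as tmp:
    test_set = Path(tmp) / "test-set"
    (test_set / "audio").mkdir(parents=True)
    (test_set / "truth.tsv").write_text("", encoding="utf-8")
    missing, absent = check_layout(test_set)

print("missing required:", missing)
print("absent optional:", absent)
```

Run against a real test set by passing its path to `check_layout`. An empty first list means every entry assumed to be required is present.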