Version: Next

Creating Test Sets

note

You likely already have a methodology in place. The information below offers a lean approach to setting up a test set if you are just getting started.

Bootstrapping Data

Step 1. Create a test set TSV file, i.e. truth.tsv
- Build: repeat for each sample
  - 1a. speaker id
  - 1b. audio file
    - -> record if needed (WAV: 16 kHz, 16-bit linear PCM, or 8 kHz PCM): https://www.audacityteam.org/download/
  - 1c. transcription (text input)
    - -> listen and transcribe the text
  - 1d. annotation (entities)
    - -> define expected entity mention values (in JSON form: {"ENTITY": "VALUE"})
  - 1e. intent classification
    - -> classify the sample
Step 2. Define wordsets (optional)
- map dynamic entities to their values in wordsets.json
Step 3. Adjudication Map (optional)
- define any rules for what to omit from scoring in mappings.adjmap
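
The per-sample fields from Step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the format the scoring tool defines: the column order (1a through 1e), the absence of a header row, the helper names `format_row` and `wav_format_ok`, and the sample values are all hypothetical. Check the expected truth.tsv layout before relying on it.

```python
import json
import wave


def wav_format_ok(path):
    """Check a recording against the formats listed in step 1b.

    Accepts 16 kHz 16-bit PCM, or 8 kHz PCM. The source does not state a
    bit depth for 8 kHz audio, so none is enforced here (assumption).
    """
    try:
        with wave.open(str(path), "rb") as wav:
            rate, width = wav.getframerate(), wav.getsampwidth()
    except (wave.Error, EOFError):
        # Not a readable PCM WAV file.
        return False
    return (rate == 16000 and width == 2) or rate == 8000


def format_row(speaker_id, audio_file, transcription, entities, intent):
    """Build one tab-separated line for a sample.

    Column order is an assumption: speaker id, audio file, transcription,
    annotation (JSON object of entity -> value), intent.
    """
    fields = [speaker_id, audio_file, transcription, json.dumps(entities), intent]
    for field in fields:
        # Tabs or newlines inside a field would corrupt the TSV structure.
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    return "\t".join(fields)


def write_truth_tsv(path, rows):
    """Write already-formatted rows to the test set file, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(row + "\n")


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    row = format_row("spk01", "audio/file.wav", "fly to boston",
                     {"CITY": "boston"}, "BOOK_FLIGHT")
    write_truth_tsv("truth.tsv", [row])
```

The rows are joined by hand rather than with the `csv` module so that the JSON annotation keeps its double quotes unescaped; whether the consuming tool expects quoted fields is something to confirm.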

End Result

Once compiled, your test set should look like this:

    test-set/
        mappings.adjmap
        project.trsx
        truth.tsv
        wordsets.json
        audio/
            file.wav
            ...
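
A quick way to confirm a directory matches the layout above is a small check script. The following is a sketch only: the function name `check_layout` is hypothetical, and treating project.trsx as expected (while wordsets.json and mappings.adjmap are optional, per Steps 2 and 3) is an assumption based on the tree shown here.

```python
from pathlib import Path

# File names come from the layout above. Which ones are mandatory is an
# assumption: Steps 2 and 3 mark the wordsets and adjudication map optional.
EXPECTED_FILES = ["truth.tsv", "project.trsx"]
OPTIONAL_FILES = ["wordsets.json", "mappings.adjmap"]


def check_layout(root):
    """Return (missing_expected, missing_optional) for a test set directory."""
    root = Path(root)
    missing_expected = [n for n in EXPECTED_FILES if not (root / n).is_file()]
    if not (root / "audio").is_dir():
        missing_expected.append("audio/")
    missing_optional = [n for n in OPTIONAL_FILES if not (root / n).is_file()]
    return missing_expected, missing_optional


if __name__ == "__main__":
    expected, optional = check_layout("test-set")
    for name in expected:
        print(f"missing: {name}")
    for name in optional:
        print(f"not present (optional): {name}")
```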
