Version: 2.2.0-alpha.1

Introduction

Overview

Perform tests against the Nuance Mix SaaS offering, specifically the Speech Recognition (ASRaaS) and Natural Language Understanding (NLUaaS) services.

This tool uses a truth file (tab-separated values, TSV) and references to audio files (PCM/WAV) to call the services, process the responses, and evaluate the results, giving insight into the performance of your model(s).

Transactions are executed in real time against the services, and reports are then generated: raw JSON data and a visually rendered HTML report.

Eager to dive right in? Check out the QuickStart.

```mermaid
graph TD;
  A[Mix Test Tool]-->A1[config.json <br/> truth.tsv <br/> mappings.adjmap <br/> audio data];
  A1-->|Run Test|X[Validate truth file and resources];
  subgraph Execute[Execute];
    X-->|Start with Config & Inputs|C[Runner: testaccuracy];
    C-->|Get token|A2[OAuth];
    A2-->C;
    C-.->|Get audio data|D1[Audio WAV 8kHz, 16kHz];
    D1-.->|Perform Speech Recognition|D[ASRaaS];
    C-.->|Perform Interpretation|E[NLUaaS];
    D-->|Write results|B1;
  end
  subgraph Analyze;
    E-->|Write results|B1[results.tsv];
    B1-->|Score Results|F[Scoring Library: score];
    F-->G[results.nuanscored.tsv];
  end
  subgraph Report;
    G-->|Generate Report|I[Report Generator: report];
    I-->J[overview.json];
    I-->K[report.html];
    I-->K1[out.json];
  end
  classDef primary fill:#30C4D6,color:#fff,stroke-width:0px;
  classDef dark fill:#333,color:#fefefe,stroke-width:0px;
  classDef middark fill:#888,color:#fefefe,stroke-width:0px;
  classDef regular fill:#1C6C98,color:#fff,stroke-width:0px;
  classDef section fill:#f4f4f4,color:#333,stroke-width:0px;
  classDef section2 fill:#CAEAEE,color:#333,stroke-width:0px;
  class A primary;
  class A2,B,D,E regular;
  class C,F,I,X section2;
  class Execute,Analyze,Report section;
  class B1,D1,G,H,J,K,K1 dark;
  class A1,D1 middark;
```

How It Works

Truth File (.tsv)

Provide a truth file with references to audio files to use when performing tests.

Data

The truth file (TSV) must include the following (a small sketch follows this list):

  • Speaker profile identification
  • Reference to the audio file (path); only linear PCM at 8 kHz or 16 kHz is supported for now
  • Transcription - text; note that the adjudication map applies when scoring
  • Annotation - in JSON form (e.g. {"ENTITY": "LITERAL|VALUE"})
  • Intent
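
For illustration, a truth file along these lines could be assembled as follows. This is a minimal sketch, assuming tab-separated columns matching the fields above; the column names, their order, and the presence of a header row are illustrative assumptions rather than the tool's documented format.

```python
# Minimal sketch of writing a truth file row (column names, order, and
# the header row are illustrative assumptions, not the documented format).
import csv
import json

rows = [
    {
        "speaker_profile": "speaker_001",                 # speaker profile identification
        "audio": "audio/order_coffee_001.wav",            # linear PCM, 8 kHz or 16 kHz
        "transcription": "I would like a large coffee",   # expected transcription
        "annotation": json.dumps({"SIZE": "large|LG"}),   # {"ENTITY": "LITERAL|VALUE"}
        "intent": "ORDER_DRINK",                          # expected intent
    }
]

with open("truth.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
```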

Rows

Each row in the TSV is exercised through the respective service based on your configuration (i.e. ASR-only, NLU-only, or ASR + NLU).

  • The speaker profile ID is used when executing ASR.
  • The transcription applies to ASR scoring and is used as the NLU input in an NLU-only configuration.
  • The annotation and intent are leveraged in the evaluation of NLU.

Execution Considerations

Test execution is driven by a configuration.

Various configurations leverage the truth file in different ways (see the config sketch after this list):

  • ASR-only (nlu: null)
    • will rely on the speaker profile, audio file and transcription as input
  • NLU-only (asr: null)
    • will rely on the transcription as input
  • ASR and NLU
    • will rely on the speaker profile and audio file as input to ASR; intent and annotation to NLU
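
As a sketch of how this selection might be expressed, the asr and nlu sections of config.json can be set to null to disable a service. Only the asr: null / nlu: null convention comes from the list above; every other field below is a hypothetical placeholder.

```python
# Sketch only: the asr/nlu "null" convention is from the text above;
# the "language" and "model_urn" fields are hypothetical placeholders.
import json

asr_only    = {"asr": {"language": "en-US"}, "nlu": None}         # nlu: null
nlu_only    = {"asr": None, "nlu": {"model_urn": "urn:example"}}  # asr: null
asr_and_nlu = {"asr": {"language": "en-US"}, "nlu": {"model_urn": "urn:example"}}

# json.dump serializes Python's None as JSON null.
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(asr_only, f, indent=2)
```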

Scoring and Test Results

Once execution is complete, the results are scored and reports are created.

The scored results include the following (the rate calculations are illustrated after this list):

  • transactions
    • count - total number of transactions
    • passed - how many passed
    • failed - how many failed
    • server_failed - number failed due to server
    • tool_failed - number failed due to tool
    • error_rate - rate of errors
  • transcriptions
    • count - total number of transcriptions
    • passed - how many passed
    • failed - how many failed
    • word_error_rate - the rate of errors
    • substitutions - number of substitutions
    • casing - number of casing differences
    • insertions - number of insertion differences
    • deletion - number of deletions
  • nlu + nlu_stats & nlu_grammar
    • count - total nlu transactions
    • passed - how many passed
    • failed - how many failed
    • error_rate - rate of errors
  • intents
    • count - total scoring tasks for intent testing
    • passed - how many passed
    • failed - how many failed
    • error_rate - rate of errors
    • confusion_matrix - helps decipher which intents are being confused (statistical model and grammar)
  • slots
    • count - total scoring tasks for entity mentions
    • passed - how many passed
    • failed - how many failed
    • error_rate - rate of errors
    • slots[].. - each of the slots and their respective metrics
  • rows
    • index - item in tsv
    • perfect - test complete success
    • error - test error
    • audio - reference to audio path leveraged
    • word_error_rate
    • slot_error_rate
    • ....
      • transcription (expected, actual, score),
      • intent (expected, actual, score),
      • slots (passed, failed, slots[]..)
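
To make the rates concrete, the sketch below shows the conventional way such aggregates are computed: failed over count for error_rate, and substitutions plus insertions plus deletions over reference words for word_error_rate. Whether casing differences count toward word_error_rate is not specified here, so treat the formulas as an approximation.

```python
# Conventional formulas, shown for illustration; the tool's exact
# treatment of casing differences in word_error_rate is not specified here.
def error_rate(failed: int, count: int) -> float:
    return failed / count if count else 0.0

def word_error_rate(substitutions: int, insertions: int, deletions: int,
                    reference_words: int) -> float:
    return ((substitutions + insertions + deletions) / reference_words
            if reference_words else 0.0)

print(error_rate(failed=3, count=120))                # 0.025
print(word_error_rate(5, 2, 1, reference_words=400))  # 0.02
```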

Report & Raw Results

Final test results are compiled in raw JSON form, and accompanied by a rendered HTML report for convenience.

Sample Report

See Using the Report for more about the reports.

For information about the raw JSON output, see overview.json.
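
If you want to inspect the raw data programmatically, a few headline numbers can be pulled out of overview.json along these lines. This sketch assumes the top-level keys mirror the scored-result categories listed above, which is an assumption rather than a documented schema.

```python
# Sketch: print a few headline numbers from overview.json.
# Assumes the top-level keys mirror the scored-result categories above.
import json

with open("overview.json", encoding="utf-8") as f:
    overview = json.load(f)

for section in ("transactions", "transcriptions", "intents", "slots"):
    stats = overview.get(section, {})
    print(f"{section}: {stats.get('passed')}/{stats.get('count')} passed, "
          f"error_rate={stats.get('error_rate')}")
```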

Errors

If errors arise, the tool will mark those results as such:

  • Tool Errors
  • Server Errors

See Errors for more details.

Disclaimer

Performing accuracy tests successfully requires knowledge of how language AI technologies operate.

To compensate for acceptable formatting differences, the tool offers an adjudication map, which is domain-specific and requires customization.
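
As an illustration of the idea (not the actual mappings.adjmap format, which is not described on this page), adjudication can be thought of as normalizing acceptable formatting differences on both sides before comparing expected and actual text:

```python
# Illustration only: the real mappings.adjmap format is not described here.
# The pairs below are domain-specific examples of acceptable formatting
# differences that should not count as errors.
ADJUDICATIONS = {
    "twenty five": "25",
    "st.": "street",
    "dr.": "doctor",
}

def adjudicate(text: str) -> str:
    """Normalize a transcription before the expected/actual comparison."""
    normalized = text.lower()
    for variant, canonical in ADJUDICATIONS.items():
        normalized = normalized.replace(variant, canonical)
    return normalized

# "Twenty five Main St." and "25 Main Street" now score as equivalent.
assert adjudicate("Twenty five Main St.") == adjudicate("25 Main Street")
```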

Please consult your Nuance Technical Expert for assistance in this area.

Known Limitations

Note: Current limitations will be addressed in future releases.

The following features are not currently supported.

They are specifically noted because they apply depending on the complexity of the tests being performed.

  • Scoring for...
    • multiple entities (including contiguous entities)
    • hierarchical entities (hasA)
    • operators, e.g. and/or/not