There are several minor changes needed to migrate from TTv2 to TTv3.

External VAD Required

TTv2 included its own internal Voice Activity Detection (VAD) logic. TTv3 requires an external VAD and relies on the integration to determine when voice is present, so TTv3 benefits from any VAD optimizations the integration already uses. The integration simply passes either true (voice detected) or false (voice not detected) to the process API.
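
To illustrate the kind of boolean the integration is expected to supply, here is a minimal sketch of a stand-in VAD. It is not part of TTv3: the function name frame_has_voice, the RMS-energy approach, and the 0.02 threshold are all assumptions chosen for illustration. A real integration would normally reuse whatever VAD it already runs.

```python
import math
from typing import Sequence


def frame_has_voice(samples: Sequence[float], rms_threshold: float = 0.02) -> bool:
    """Hypothetical stand-in VAD (not part of TTv3).

    Returns True when the frame's RMS energy exceeds rms_threshold.
    Samples are assumed to be floats normalized to the range [-1.0, 1.0].
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > rms_threshold


# Two illustrative 10 ms frames at an assumed 16 kHz sample rate.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]

voice_flags = [frame_has_voice(silence), frame_has_voice(tone)]
```

Each flag is the value that would be passed as the voice-detected argument for the corresponding frame.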

API Changes

The process API now takes 3 parameters instead of 1:

inputSamples - Same as TTv2: the frame of audio to process.
voiceDetected - Set by the external VAD: true (user is speaking) or false (no voice activity).
botSpeaking - true when the remote/bot side is currently speaking, false otherwise. Setting it to true resets the internal processor state between conversational turns.
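
The shape of the new call can be sketched as follows. StubTurnProcessor is a hypothetical placeholder, not the real TTv3 class, and its scoring logic is invented purely so the example runs. Only the three-argument call shape and the reset when the bot is speaking mirror the description above. The Python argument names are snake_case renderings of inputSamples, voiceDetected, and botSpeaking.

```python
from typing import Iterable, List, Sequence, Tuple


class StubTurnProcessor:
    """Hypothetical stand-in for the TTv3 processor (NOT the real SDK class)."""

    def __init__(self) -> None:
        self.silent_frames = 0

    def process(
        self,
        input_samples: Sequence[float],
        voice_detected: bool,
        bot_speaking: bool,
    ) -> float:
        # Mimic the documented reset: bot speech clears the internal state.
        if bot_speaking:
            self.silent_frames = 0
            return 0.0
        # Invented placeholder scoring: the score grows with consecutive
        # silent frames. The real model's behavior is not described here.
        self.silent_frames = 0 if voice_detected else self.silent_frames + 1
        return min(1.0, self.silent_frames / 10)


Frame = Tuple[Sequence[float], bool, bool]  # (samples, voice_detected, bot_speaking)


def run_frames(processor: StubTurnProcessor, frames: Iterable[Frame]) -> List[float]:
    """Feed each frame with its two flags and collect the returned scores."""
    return [processor.process(samples, voice, bot) for samples, voice, bot in frames]


frame = [0.0] * 160
scores = run_frames(
    StubTurnProcessor(),
    [
        (frame, True, False),   # user speaking
        (frame, False, False),  # user silent
        (frame, False, False),  # user still silent
        (frame, False, True),   # bot starts speaking: state resets
    ],
)
```

In a TTv2 integration, the equivalent call passed only the audio frame. The migration consists of supplying the two extra flags on every call.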

The output is the probability, between 0 and 1, that the user has finished their input. The recommended end-of-speech threshold is still between 0.3 and 0.5: 0.3 is more aggressive and triggers sooner, while 0.5 waits a bit longer but produces fewer false positives. The API no longer returns -1; instead, it returns the previous value until a new result is ready.
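
A minimal sketch of the threshold check follows. The helper name user_finished, the 0.4 default (a midpoint of the recommended range), and the use of >= rather than > are assumptions for illustration. Because -1 is no longer returned, the sketch treats any value outside 0 to 1 as an error rather than as a "not ready" sentinel.

```python
def user_finished(probability: float, threshold: float = 0.4) -> bool:
    """Hypothetical helper: decide end of speech from the returned probability.

    threshold is expected to sit in the recommended 0.3 to 0.5 range.
    """
    if not 0.0 <= probability <= 1.0:
        # TTv2-style sentinel values such as -1 should no longer appear.
        raise ValueError(f"probability out of range: {probability}")
    return probability >= threshold


# The same score can give different decisions depending on the threshold.
aggressive = user_finished(0.35, threshold=0.3)    # triggers sooner
conservative = user_finished(0.35, threshold=0.5)  # waits longer
```

Any TTv2 code path that special-cased a -1 return value can be removed during migration.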