VAD

Overview

Voice Activity Detection (VAD) is a component in audio processing that determines whether the incoming audio contains speech. In this implementation, VAD helps control the noise cancellation process and optimizes audio transmission.

KrispSDK constructor arguments

Property                Type    Default    Description
params.models.modelNC   String  undefined  Path to the NC model used for sampling rates above 8 kHz
params.models.modelVAD  String  undefined  Path to the VAD model

const krispSDK = new KrispSDK({
  params: {
    // ... other params
    models: {
      // ... other models
      modelNC: "/dist/models/c6.f.s.da1785.kef",
      modelVAD: "/dist/models/vad_2.0.0_1.0.kef",
    },
  },
});

Noise Filter Creation

const audioSettings = {
  audio: {
    echoCancellation: false,
    noiseSuppression: false,
    autoGainControl: false,
  },
};

const stream = await navigator.mediaDevices.getUserMedia(audioSettings);

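// Assumption: no AudioContext exists yet; reuse your own if you already have one.
const audioContext = new AudioContext();
const source = audioContext.createMediaStreamSource(stream);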
const destination = audioContext.createMediaStreamDestination();

const filterParam = {
  audioContext,
  useVAD: true, // Toggle VAD with this parameter
  vad: {
    // The VAD threshold determines whether an audio signal is classified as speech or silence.
    // It is compared against the output of the VAD processing function to decide whether speech is present.
    threshold: 0.5,
  },
};

// onReady is a callback supplied by your application.
const filterNode = await krispSDK.createNoiseFilter(filterParam, onReady);

source.connect(filterNode).connect(destination);

destination.stream; // destination.stream is the resulting stream, which you can use in your business logic
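
One possible use of the resulting stream is to send it over a WebRTC connection. The following is a minimal sketch, not part of the Krisp SDK: `sendFilteredAudio` is a hypothetical helper, and it assumes your application already has an `RTCPeerConnection`. It relies only on the standard `MediaStream.getAudioTracks()` and `RTCPeerConnection.addTrack()` browser APIs.

```javascript
// Hypothetical helper (not part of the Krisp SDK): attaches every audio track
// of the filtered stream to an existing RTCPeerConnection.
function sendFilteredAudio(peerConnection, filteredStream) {
  // addTrack() returns an RTCRtpSender for each track it attaches.
  return filteredStream
    .getAudioTracks()
    .map((track) => peerConnection.addTrack(track, filteredStream));
}

// Usage in the browser (peerConnection is assumed to be created elsewhere in your app):
// const senders = sendFilteredAudio(peerConnection, destination.stream);
```

The returned senders can be kept if you later need to replace or remove the tracks.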

The accuracy of VAD depends on the chosen threshold value. A lower threshold may classify more background noise as speech, while a higher threshold might cause speech to be ignored.
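
The trade-off can be illustrated with a small standalone sketch. It does not use the Krisp SDK: `classifyFrames` is a hypothetical helper, the scores are made-up values standing in for per-frame VAD output, and treating a score equal to the threshold as speech (`>=`) is an assumption rather than documented SDK behavior.

```javascript
// Hypothetical helper (not part of the Krisp SDK): marks a frame as speech
// when its score reaches the threshold. The >= comparison is an assumption.
function classifyFrames(scores, threshold) {
  return scores.map((score) => score >= threshold);
}

// Made-up per-frame scores, for illustration only.
const scores = [0.1, 0.35, 0.55, 0.8, 0.45];

const countSpeech = (flags) => flags.filter(Boolean).length;

console.log(countSpeech(classifyFrames(scores, 0.3))); // 4 frames treated as speech
console.log(countSpeech(classifyFrames(scores, 0.5))); // 2 frames treated as speech
console.log(countSpeech(classifyFrames(scores, 0.7))); // 1 frame treated as speech
```

With the same scores, lowering the threshold lets borderline frames through as speech, while raising it drops them.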