Noise Cancellation (NC)

The Noise Cancellation algorithm is designed to remove background noise during real-time
communication. The Krisp SDK includes technologies for both Outbound (Microphone) and Inbound (Speaker) Noise Cancellation.

:microphone: Outbound NC

The AI models are designed primarily for close-proximity, near-field acoustic environments with a sound source radius of less than 50 cm (mouth-to-microphone distance). The effectiveness of the algorithm in more distant acoustic environments depends on a variety of factors, such as distance, echo levels, signal-to-noise ratio (SNR), and the characteristics of the audio system or device.

:speaker: Inbound NC

Distinct AI models are used when Noise Cancellation is applied to the Inbound (Speaker) stream, because the technology must not only remove noise but also handle network and app-specific codec degradations as well as multiple overlapping speakers. The algorithms are also robust enough for scenarios where low-bandwidth audio needs to be supported, such as landline inbound streams.

General Specs

Both technologies are available in the form of two AI models: Small (the default in the SDK) and Big (on demand).

  • The Small models are designed to integrate into lower-end devices and run 7x faster than the Big models while delivering good Noise Cancellation quality.
  • The Big models deliver the highest Speech and Noise quality but come at a higher CPU cost.

The SDK natively supports only 32 kHz, 16 kHz, and 8 kHz audio streams. Higher sample rates are
downsampled, processed, and then upsampled back to the original rate.
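
The rate handling described above can be sketched as a small planning helper. This is an illustration only, not part of the Krisp SDK: the name `plan_processing_rate` is hypothetical, and the choice to process rates above 32 kHz at 32 kHz is an assumption, since the text does not say which native rate is used internally.

```python
from typing import Tuple

# Rates the text lists as natively supported.
NATIVE_RATES_HZ = (8000, 16000, 32000)


def plan_processing_rate(input_rate_hz: int) -> Tuple[int, bool]:
    """Return (processing_rate_hz, needs_resampling) for an input stream.

    Hypothetical helper for illustration; it is not a Krisp SDK function.
    """
    if input_rate_hz in NATIVE_RATES_HZ:
        # Native rate: no resampling is needed.
        return input_rate_hz, False
    if input_rate_hz > max(NATIVE_RATES_HZ):
        # Assumption: higher rates are downsampled to the highest native
        # rate, processed there, then upsampled back to input_rate_hz.
        return max(NATIVE_RATES_HZ), True
    # The text does not describe rates below or between the native ones.
    raise ValueError(f"unsupported sample rate: {input_rate_hz} Hz")


if __name__ == "__main__":
    for rate in (16000, 44100, 48000):
        print(rate, plan_processing_rate(rate))
```

Under these assumptions, a 48 kHz stream would be flagged for resampling while a 16 kHz stream would be processed as is.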

The library takes an audio frame as input and returns a denoised frame of the same size. The amount of algorithmic latency depends on the sampling rate and frame duration of the audio stream.

For example:

  • For 10 ms frame duration and 16000 Hz sample rate, the expected algorithmic latency will be ≈25 ms.
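
To relate these figures to buffer sizes, the arithmetic can be written out as a short sketch. The function names are hypothetical and are not part of the SDK; the 25 ms value is the approximate latency quoted in the example above, not something the code derives.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples in one frame of the given duration."""
    total = sample_rate_hz * frame_ms
    if total % 1000 != 0:
        # Frame duration must map to a whole number of samples.
        raise ValueError("frame duration is not a whole number of samples")
    return total // 1000


def latency_in_samples(sample_rate_hz: int, latency_ms: float) -> int:
    """Convert a latency in milliseconds to a sample count (rounded)."""
    return round(sample_rate_hz * latency_ms / 1000)


if __name__ == "__main__":
    # 10 ms frames at 16000 Hz, as in the example above.
    print(samples_per_frame(16000, 10))   # 160
    # The quoted ~25 ms latency expressed in samples at 16000 Hz.
    print(latency_in_samples(16000, 25))  # 400
```

At 16000 Hz, a 10 ms frame holds 160 samples, so the quoted latency of roughly 25 ms corresponds to about 400 samples, or 2.5 frames.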