Models

Accent Conversion (AC)

Accent Conversion (AC) is a real-time AI voice conversion model designed to enhance communication clarity by neutralizing accents in call center environments.

AC dynamically converts the user's accent into an accent that the customer natively understands.

In addition to modifying the accent, the model also removes background noise and background voices.

AC requires no user enrollment and is compatible with all headsets used in call center environments.

Specs

  • Accents: Conversion of Indian, Philippine, African, and Latin American English accents to a North American English accent.
  • Platforms: Windows, Mac, Linux
  • CPU Specs: Intel Core i5 (10th generation) or equivalent CPU.
  • Latency: Low latency (~180 ms)

Noise Cancellation (NC)

The Noise Cancellation algorithm is designed to remove background noise during real-time communication. The Krisp SDK includes technologies for both Outbound (Microphone) and Inbound (Speaker) Noise Cancellation.

🎤 Outbound NC

The AI models are primarily designed for near-field acoustic environments with a sound source radius of less than 50 cm (mouth-to-microphone distance). In more distant acoustic environments, the effectiveness of the algorithm depends on a variety of factors, such as distance, echo levels, signal-to-noise ratio (SNR), and the characteristics of the audio system or device.

🔈 Inbound NC

Dedicated AI models are used when Noise Cancellation is applied to the Inbound (Speaker) stream, because the technology must not only remove noise but also handle network- and application-specific codec degradations, as well as multiple overlapping speakers. The algorithms are also robust enough for scenarios that require low-bandwidth audio, such as inbound landline streams.

De-Reverberation

When the noise cancellation algorithm runs, it also automatically performs de-reverberation, removing room echo from the audio.

The technical specs and more details about the algorithms can be found here.

Specs

Models come in Small (default in SDK) and Big (on-demand) versions.

  • The Small models are designed to integrate into lower-end devices; they run 7x faster than the Big models while still delivering good Noise Cancellation quality.
  • The Big models deliver the highest speech and noise quality but come at a higher CPU cost.

The SDK natively supports only 32 kHz, 16 kHz, and 8 kHz audio streams. Streams with higher sample rates are downsampled, processed, and then upsampled back to the original rate.
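The downsample, process, upsample flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: `native_rate_for`, `resample_linear`, and `process_with_resampling` are hypothetical helpers, and the `denoise` callable stands in for the SDK's own processing call. The text only says that "higher" rates are downsampled, so the sketch applies that only to rates above 32 kHz and rejects in-between rates such as 24 kHz. Plain linear interpolation without an anti-aliasing filter is used purely for brevity.

```python
from typing import Callable, List

NATIVE_RATES_HZ = (32000, 16000, 8000)


def native_rate_for(rate_hz: int) -> int:
    """Rate at which a stream is processed (assumption: anything above 32 kHz maps to 32 kHz)."""
    if rate_hz in NATIVE_RATES_HZ:
        return rate_hz
    if rate_hz > max(NATIVE_RATES_HZ):
        return max(NATIVE_RATES_HZ)
    # Rates between the native ones (e.g. 24 kHz) are not covered by the text.
    raise ValueError(f"unsupported sample rate: {rate_hz} Hz")


def resample_linear(samples: List[float], src_hz: int, dst_hz: int) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    n_out = round(len(samples) * dst_hz / src_hz)
    out: List[float] = []
    for i in range(n_out):
        pos = i * src_hz / dst_hz  # fractional position in the source signal
        j = int(pos)
        frac = pos - j
        if j + 1 < len(samples):
            out.append(samples[j] * (1.0 - frac) + samples[j + 1] * frac)
        else:
            out.append(samples[-1])  # hold the last sample at the right edge
    return out


def process_with_resampling(frame: List[float], rate_hz: int,
                            denoise: Callable[[List[float], int], List[float]]) -> List[float]:
    """Process natively if possible; otherwise downsample, process, and upsample back."""
    native = native_rate_for(rate_hz)
    if native == rate_hz:
        return denoise(frame, rate_hz)
    down = resample_linear(frame, rate_hz, native)   # e.g. 48 kHz -> 32 kHz
    cleaned = denoise(down, native)
    up = resample_linear(cleaned, native, rate_hz)   # back to the original rate
    # Pad or trim so the output frame has exactly the input frame's length.
    if len(up) < len(frame):
        up.extend([up[-1] if up else 0.0] * (len(frame) - len(up)))
    return up[:len(frame)]
```

For example, a 10 ms frame at 48 kHz (480 samples) would be processed as a 320-sample frame at 32 kHz and returned as 480 samples again.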

The library takes an audio frame as input and returns a noise-free frame of the same size. The algorithmic latency depends on the sampling rate and frame duration of the audio stream.

For example:

  • For 10 ms frame duration and 16000 Hz sample rate, the expected algorithmic latency will be ≈25 ms.
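The frame size in samples follows from the same two parameters that determine latency: sample rate multiplied by frame duration, so a 10 ms frame at 16000 Hz holds 160 samples. The sketch below computes that size and illustrates the frame-in, same-size-frame-out contract. `process_stream` and the `process_frame` callable are hypothetical placeholders for whatever per-frame call the SDK exposes, and the sketch does not model the ≈25 ms latency figure, which is taken from the example above rather than derived.

```python
from typing import Callable, List, Tuple


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Samples in one frame, e.g. 16000 Hz * 10 ms = 160 samples."""
    total = sample_rate_hz * frame_ms
    if total % 1000 != 0:
        raise ValueError("frame duration does not map to a whole number of samples")
    return total // 1000


def process_stream(samples: List[float], sample_rate_hz: int, frame_ms: int,
                   process_frame: Callable[[List[float]], List[float]]
                   ) -> Tuple[List[float], List[float]]:
    """Feed fixed-size frames to `process_frame`; return (processed output, leftover samples)."""
    n = samples_per_frame(sample_rate_hz, frame_ms)
    usable = len(samples) - len(samples) % n
    output: List[float] = []
    for start in range(0, usable, n):
        cleaned = process_frame(samples[start:start + n])
        if len(cleaned) != n:
            raise RuntimeError("processed frame must match the input frame size")
        output.extend(cleaned)
    # The trailing partial frame is returned so the caller can prepend it to the next chunk.
    return output, samples[usable:]
```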

Background Voice Cancellation (BVC)

Background Voice Cancellation (BVC) technology is designed to cancel all background voices. It also removes all background noise and reverberation. The technology does not require user voice enrollment or training on user voice data. Krisp has deployed this technology in its desktop applications, where it addresses the problem of cross-talk in call centers and offices.

BVC technology is designed to work with any headset or earbuds. It works best with wired USB headsets with a boom microphone and is also compatible with most Bluetooth headsets, including AirPods. The list of devices tested by Krisp for BVC can be found here.

The model may also perform acceptably with high-quality built-in microphones, such as those in Apple MacBooks, or with standalone external microphones.

Specs

The BVC technology is compatible with Krisp Audio SDK v6.0 and later versions. The SDK includes a CPU-efficient Low Power BVC model as standard.

BVC LP Model Name: hs.c6.f.s.de56df

NC FB Model Name: c6.f.s.ced125

Combined model size: 20.1MB

Network size: BVC LP - 3.5M weights, NC FB - 1.5M weights
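As a rough consistency check, the listed weight counts can be compared with the combined model size. This rests on two assumptions that the spec does not state: that weights are stored as 32-bit floats (4 bytes each) and that 1 MB means 10^6 bytes. `estimated_size_mb` is a hypothetical helper for the arithmetic.

```python
BVC_LP_WEIGHTS = 3.5e6  # from the spec above
NC_FB_WEIGHTS = 1.5e6   # from the spec above
BYTES_PER_WEIGHT = 4    # ASSUMPTION: 32-bit float weights; not stated in the spec


def estimated_size_mb(weight_count: float, bytes_per_weight: int) -> float:
    """Raw weight storage in megabytes, using 1 MB = 10**6 bytes."""
    return weight_count * bytes_per_weight / 1e6


total_weights = BVC_LP_WEIGHTS + NC_FB_WEIGHTS
print(f"{total_weights / 1e6:.1f}M weights -> ~{estimated_size_mb(total_weights, BYTES_PER_WEIGHT):.1f} MB")
# prints "5.0M weights -> ~20.0 MB"
```

Under these assumptions the weights alone come to about 20.0 MB, close to the stated 20.1MB; a different storage format would change the estimate proportionally.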

Larger BVC models with the highest-quality results can be made available on demand.

Sampling Rate

The BVC model operates at a 32 kHz sampling rate. It works well with streams sampled above 8 kHz; such streams are resampled to the 32 kHz working rate.

The BVC technology is incompatible with narrowband devices, whose sampling rates are 8 kHz or lower. For sampling rates of 8 kHz and below, narrowband Krisp NC models should be used.
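The routing rule above (BVC at 32 kHz for streams above 8 kHz, narrowband NC at 8 kHz and below) can be sketched as a small decision function. `ModelChoice`, `choose_model`, and the model labels are hypothetical names for illustration only; they are not SDK identifiers.

```python
from dataclasses import dataclass

BVC_WORKING_RATE_HZ = 32000
NARROWBAND_MAX_HZ = 8000


@dataclass(frozen=True)
class ModelChoice:
    model: str                # hypothetical labels: "bvc" or "nc_narrowband"
    processing_rate_hz: int
    needs_resampling: bool


def choose_model(device_rate_hz: int) -> ModelChoice:
    """Pick a model family from the device sampling rate."""
    if device_rate_hz <= NARROWBAND_MAX_HZ:
        # BVC is incompatible with narrowband (<= 8 kHz) streams; use narrowband NC instead.
        return ModelChoice("nc_narrowband", device_rate_hz, False)
    # Above 8 kHz, BVC runs at 32 kHz, so any other rate is resampled to it.
    return ModelChoice("bvc", BVC_WORKING_RATE_HZ, device_rate_hz != BVC_WORKING_RATE_HZ)
```

For example, a 16 kHz headset would be routed to BVC with resampling, while an 8 kHz landline stream would go to narrowband NC.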

Note: Some devices report a higher sampling rate but actually operate in narrowband or at a lower sampling rate. With such devices, the performance and quality of BVC may be sub-optimal, and voice suppression may occur. It is recommended to add these devices to a blocklist.
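One way an application might apply this recommendation is to check a device blocklist before trusting the reported sampling rate. The sketch assumes devices are identified by a case-insensitive name string; `bvc_allowed` and the example device name are hypothetical, and what to run on a blocked device instead (for example, a standard NC model) is left to the application.

```python
from typing import Iterable


def bvc_allowed(device_name: str, reported_rate_hz: int, blocklist: Iterable[str]) -> bool:
    """Return True if BVC should be enabled for this device (hypothetical policy)."""
    blocked = {name.strip().lower() for name in blocklist}
    if device_name.strip().lower() in blocked:
        # The reported rate is not trusted for blocklisted devices.
        return False
    # Otherwise apply the narrowband rule: BVC only above 8 kHz.
    return reported_rate_hz > 8000


BLOCKLIST = ["Example USB Headset X"]  # hypothetical device name
```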

Performance

The performance of the algorithm varies by platform, within a range of 5-13%.

Performance metrics for the latest versions can be found in platform-specific Introduction pages.