SDK Features
All audio filters are language-independent.
Real-Time Noise Cancellation (NC)
The Noise Cancellation algorithm is designed to remove background noise during real-time communication.
The AI models were primarily designed for near-field acoustic environments, where the sound source is within 50 cm of the microphone (mouth-to-microphone distance). In more distant acoustic environments, the algorithm's effectiveness depends on factors such as distance, echo level, signal-to-noise ratio (SNR), and the characteristics of the audio system or device.
The SDK natively supports only 32 kHz, 16 kHz, and 8 kHz audio streams. Streams with higher sample rates are downsampled, processed, and then upsampled back to the original rate.
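As a rough illustration of this rate handling, the sketch below decides whether a stream would be processed at its own rate or resampled. It assumes, since the text does not say, that higher rates are brought down to 32 kHz, and it simply rejects non-native rates below 32 kHz. `processing_rate` and `needs_resampling` are hypothetical helpers, not SDK functions.

```python
# Sketch only: classify a stream against the documented native rates.
# Assumptions (not stated by the SDK docs): rates above 32 kHz are
# processed at 32 kHz, and other non-native rates are rejected.
NATIVE_RATES_HZ = (8000, 16000, 32000)


def processing_rate(sample_rate_hz: int) -> int:
    """Return the rate the audio is assumed to be processed at."""
    if sample_rate_hz in NATIVE_RATES_HZ:
        return sample_rate_hz  # native: processed as-is, no resampling
    if sample_rate_hz > max(NATIVE_RATES_HZ):
        # Assumed: downsample to the highest native rate, process,
        # then upsample back to the original rate.
        return max(NATIVE_RATES_HZ)
    raise ValueError(f"unsupported sample rate in this sketch: {sample_rate_hz} Hz")


def needs_resampling(sample_rate_hz: int) -> bool:
    """True if the stream would go through the down/upsampling path."""
    return processing_rate(sample_rate_hz) != sample_rate_hz
```

For example, a 48 kHz stream would take the resampling path under these assumptions, while a 16 kHz stream would be processed directly.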
The library takes an audio frame as input and returns a denoised frame of the same size. The resulting latency depends on the sampling rate and frame duration of the audio stream.
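The frame-in, frame-out contract can be sketched as follows. A frame's length in samples is the sample rate multiplied by the frame duration, so a 30 ms frame at 16000 Hz holds 480 samples and a 10 ms frame holds 160. The helper names and the `denoise_frame` callback, which stands in for the SDK's actual processing call, are hypothetical.

```python
# Sketch only: frame arithmetic and the frame-in/frame-out contract.
# `denoise_frame` is a hypothetical stand-in for the SDK's processing call.


def frame_samples(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples in one frame: sample rate x duration."""
    samples, remainder = divmod(sample_rate_hz * frame_ms, 1000)
    if remainder:
        raise ValueError("frame duration does not map to a whole number of samples")
    return samples


def split_into_frames(samples, sample_rate_hz, frame_ms):
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    n = frame_samples(sample_rate_hz, frame_ms)
    for start in range(0, len(samples) - n + 1, n):
        yield samples[start:start + n]


def process_stream(samples, sample_rate_hz, frame_ms, denoise_frame):
    """Run each frame through `denoise_frame`, checking the size is preserved."""
    output = []
    for frame in split_into_frames(samples, sample_rate_hz, frame_ms):
        cleaned = denoise_frame(frame)
        if len(cleaned) != len(frame):
            raise ValueError("processed frame must match the input frame size")
        output.extend(cleaned)
    return output
```

For instance, `frame_samples(16000, 30)` returns 480, matching the 30 ms, 16000 Hz case below.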
For example:
- For a 30 ms frame duration and a 16000 Hz sample rate, the latency is 15 ms.
- For a 10 ms frame duration and a 16000 Hz sample rate, the latency is 25 ms.
The algorithm introduces at most 32 ms of latency.
The SDK comes with two models, Main and Light:
- The Main model delivers the highest speech and noise quality but has higher CPU usage.
- The Light model is designed for lower-end devices. It runs 7x faster than the Main model while still delivering decent speech and noise quality.
Real-Time De-Reverb
When the noise cancellation algorithm runs, it also automatically performs de-reverberation, removing room echo from the audio.
Real-Time Voice Activity Detection (VAD)
The Voice Activity Detection (VAD) algorithm predicts whether an audio frame contains speech. It can identify speech even in high-noise conditions.
The VAD model operates on 10 ms audio frames at an 8 kHz sampling rate. Audio streams with higher sample rates are downsampled to 8 kHz for processing.
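A 10 ms frame at 8 kHz holds 80 samples. The sketch below shows that arithmetic plus a deliberately crude downsampling step for rates that are integer multiples of 8 kHz, averaging each block of samples. Block averaging is a stand-in chosen for brevity, not the SDK's resampler, which would normally use a proper anti-aliasing filter; all names here are hypothetical.

```python
# Sketch only: VAD frame arithmetic and a crude decimation to 8 kHz.
# Hypothetical helpers; the SDK's real resampler is not shown here.
VAD_RATE_HZ = 8000
VAD_FRAME_MS = 10


def vad_frame_samples() -> int:
    """Samples in one 10 ms VAD frame at 8 kHz (80)."""
    return VAD_RATE_HZ * VAD_FRAME_MS // 1000


def downsample_to_8k(frame, sample_rate_hz):
    """Average each block of `factor` samples (a weak low-pass + decimation)."""
    if sample_rate_hz % VAD_RATE_HZ:
        raise ValueError("this sketch only handles integer multiples of 8 kHz")
    factor = sample_rate_hz // VAD_RATE_HZ
    if len(frame) % factor:
        raise ValueError("frame length must be a multiple of the decimation factor")
    return [sum(frame[i:i + factor]) / factor for i in range(0, len(frame), factor)]
```

A 10 ms frame at 16 kHz (160 samples) or at 48 kHz (480 samples) both reduce to 80 samples this way.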
The aggressiveness of the algorithm can be tuned with a threshold in the range (0, 1); the default is 0.5. Adjust the threshold to fit the specific use case: the higher the threshold, the lower the aggressiveness of the VAD algorithm.
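The sketch below shows one way a threshold could be applied to per-frame results. It assumes, which the text does not state, that the VAD produces a speech score in [0, 1] for each frame and that frames scoring at or above the threshold count as speech. `classify_frames` and `validate_threshold` are hypothetical helpers.

```python
# Sketch only: apply a VAD threshold to hypothetical per-frame speech scores.
# Assumed: score >= threshold means "speech"; the SDK's rule may differ.
DEFAULT_VAD_THRESHOLD = 0.5


def validate_threshold(threshold: float) -> float:
    """Reject thresholds outside the open interval (0, 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return threshold


def classify_frames(speech_scores, threshold=DEFAULT_VAD_THRESHOLD):
    """Label each frame as speech (True) or non-speech (False)."""
    validate_threshold(threshold)
    return [score >= threshold for score in speech_scores]
```

Under this assumption, raising the threshold from 0.5 to 0.7 turns a frame scored 0.55 from speech into non-speech, so fewer frames are flagged as speech.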
Real-Time Noise and Voice Statistics
This real-time algorithm reports per-frame statistics on the level of processed voice and removed noise. Each statistic is a value from 0 to 100 indicating the amount of voice, or of removed noise, in that frame.
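To show how per-frame values on the 0 to 100 scale might be consumed, the sketch below holds them in a hypothetical `FrameStats` record and averages them over a stream. The field names and the averaging step are illustrative assumptions, not part of the SDK.

```python
# Sketch only: hypothetical per-frame statistics record and a stream average.
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameStats:
    """Hypothetical container for one frame's statistics (0-100 scale)."""
    voice: int
    noise: int

    def __post_init__(self):
        # Enforce the documented 0-100 range for both values.
        for name in ("voice", "noise"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be in [0, 100], got {value}")


def mean_stats(frames):
    """Average voice and removed-noise levels over a sequence of frames."""
    if not frames:
        raise ValueError("no frames")
    n = len(frames)
    return (sum(f.voice for f in frames) / n, sum(f.noise for f in frames) / n)
```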
In addition to per-frame statistics, the algorithm provides an end-of-stream feature that reports the amount of removed noise, classified into four categories: no noise, low, medium, and high. It also reports the total talk time accumulated from the start of processing until the statistics are retrieved.
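The end-of-stream summary can be pictured as an accumulator that buckets each frame's removed-noise level into the four categories and adds up talk time. The category cutoffs (1, 34, 67), the voice threshold that decides what counts as talk time, and the `StreamSummary` class are all hypothetical; the text does not say how the SDK draws these boundaries.

```python
# Sketch only: hypothetical end-of-stream accumulator. The cutoffs and the
# talk-time rule are illustrative assumptions, not the SDK's definitions.
from collections import Counter

CATEGORIES = ("no noise", "low", "medium", "high")


def noise_category(noise_level, low=1, medium=34, high=67):
    """Map a 0-100 removed-noise level to a category (hypothetical cutoffs)."""
    if noise_level < low:
        return "no noise"
    if noise_level < medium:
        return "low"
    if noise_level < high:
        return "medium"
    return "high"


class StreamSummary:
    """Accumulates per-frame values from the start of processing."""

    def __init__(self, frame_ms):
        self.frame_ms = frame_ms
        self.counts = Counter({c: 0 for c in CATEGORIES})
        self.talk_time_ms = 0

    def add_frame(self, voice_level, noise_level, voice_threshold=50):
        self.counts[noise_category(noise_level)] += 1
        # Assumed rule: a frame counts as talk time when its voice level
        # reaches the hypothetical voice_threshold.
        if voice_level >= voice_threshold:
            self.talk_time_ms += self.frame_ms

    def snapshot(self):
        """Return category counts and talk time accumulated so far."""
        return dict(self.counts), self.talk_time_ms
```

Because `snapshot` can be called at any point, it mirrors the idea that talk time is accumulated up to the moment the statistics are retrieved.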