Alessio Bazzica 253f8369bb AGC2 RNN VAD: safe SIMD optimizations scheme + AVX2 kill switch
In preparation for adding AVX2 code, a safe scheme to support
different SIMD optimizations is added.

Safety features:
- AVX2 kill switch to stop using it even if supported by the
  architecture
- struct indicating the available CPU features propagated from
  AGC2 to each component; in this way
  - better control over the unit tests
  - no need to propagate individual kill switches but just
    set to false features that are turned off

Note that (i) this CL does not change the performance of the RNN VAD
and (ii) no AVX2 optimization is added yet.

Bug: webrtc:10480
Change-Id: I0e61f3311ecd140f38369cf68b6e5954f3dc1f5a
Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/193140
Reviewed-by: Per Åhgren <peah@webrtc.org>
Commit-Queue: Alessio Bazzica <alessiob@webrtc.org>
Cr-Commit-Position: refs/heads/master@{#32739}
2020-12-02 10:09:24 +00:00

61 lines
2.0 KiB
C++

/*
* Copyright (c) 2018 The WebRTC project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef MODULES_AUDIO_PROCESSING_AGC2_VAD_WITH_LEVEL_H_
#define MODULES_AUDIO_PROCESSING_AGC2_VAD_WITH_LEVEL_H_
#include <memory>
#include "modules/audio_processing/agc2/cpu_features.h"
#include "modules/audio_processing/include/audio_frame_view.h"
namespace webrtc {
// Class to analyze voice activity and audio levels.
class VadLevelAnalyzer {
public:
struct Result {
float speech_probability; // Range: [0, 1].
float rms_dbfs; // Root mean square power (dBFS).
float peak_dbfs; // Peak power (dBFS).
};
// Voice Activity Detector (VAD) interface.
class VoiceActivityDetector {
public:
virtual ~VoiceActivityDetector() = default;
// Analyzes an audio frame and returns the speech probability.
virtual float ComputeProbability(AudioFrameView<const float> frame) = 0;
};
// Ctor. Uses the default VAD.
VadLevelAnalyzer();
VadLevelAnalyzer(float vad_probability_attack,
const AvailableCpuFeatures& cpu_features);
// Ctor. Uses a custom `vad`.
VadLevelAnalyzer(float vad_probability_attack,
std::unique_ptr<VoiceActivityDetector> vad);
VadLevelAnalyzer(const VadLevelAnalyzer&) = delete;
VadLevelAnalyzer& operator=(const VadLevelAnalyzer&) = delete;
~VadLevelAnalyzer();
// Computes the speech probability and the level for `frame`.
Result AnalyzeFrame(AudioFrameView<const float> frame);
private:
std::unique_ptr<VoiceActivityDetector> vad_;
const float vad_probability_attack_;
float vad_probability_ = 0.f;
};
} // namespace webrtc
#endif // MODULES_AUDIO_PROCESSING_AGC2_VAD_WITH_LEVEL_H_