This CL brings a large improvement to the RNN VAD CPU performance
by finally using `VectorMath::DotProduct()` for pitch search.
The realtime factor improved from about 390x to 570x for SSE2
(+180x, 45% faster) and to 610x for AVX2 (+235x, 60% faster).
RNN VAD benchmark results:
```
+-----+-------+------+------+
| run | none* | SSE2 | AVX2 |
+-----+-------+------+------+
| 1 | 393x | 572x | 618x |
| 2 | 388x | 568x | 607x |
| 3 | 393x | 564x | 599x |
+-----+-------+------+------+
```
*: baseline, no SIMD used for pitch search, but SSE2 used for the RNN
Results obtained as follows:
1. Force SSE2 in `DISABLED_RnnVadPerformance` for the RNN part in
order to measure the baseline correctly:
```
RnnBasedVad rnn_vad({/*sse2=*/true, /*avx2=*/true, /*neon=*/false});
```
2. Run the test:
```
$ ./out/release/modules_unittests \
--gtest_filter=*RnnVadTest*DISABLED_RnnVadPerformance* \
--gtest_also_run_disabled_tests --logs
```
Bug: webrtc:10480
Change-Id: I89a2bd420265540026944b9c0f1fdd4bfda7f475
Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/195001
Reviewed-by: Gustaf Ullberg <gustaf@webrtc.org>
Commit-Queue: Alessio Bazzica <alessiob@webrtc.org>
Cr-Commit-Position: refs/heads/master@{#32755}