Abstract: Starting from the causes of nonlinear acoustic echo, the current state of research, and the technical difficulties, this article introduces the dual-coupling acoustic echo cancellation algorithm of the Huawei Cloud audio and video team and its experimental results.
Nonlinear acoustic echo is very common in real acoustic systems and tricky to deal with, and so far there is no particularly effective way to cancel it. Public literature on nonlinear acoustic echo cancellation is also scarce. Huawei Cloud has focused on the audio and video industry for more than 20 years. How does it handle nonlinear acoustic echo cancellation, and how well does it work? Fan Zhan, an audio and video expert at Huawei Cloud, starts from the causes of nonlinear acoustic echo, the current state of research, and the technical difficulties, and then introduces the dual-coupling acoustic echo cancellation algorithm of the Huawei Cloud audio and video team and its experimental results. The following is an edited transcript of the talk:
Opening: Why share nonlinear acoustic echo cancellation technology
My topic today is nonlinear acoustic echo cancellation technology. I chose this direction for two main reasons. First, nonlinear acoustic echo cancellation is a technical problem that has plagued the industry for many years. It is very common in real acoustic systems and also tricky, and so far there is no particularly effective solution. I expect many of you will be interested in this topic.
The second reason is that, when I surveyed the existing public literature, I found very little material on nonlinear acoustic echo cancellation. So I want to take this opportunity to introduce Huawei Cloud's latest progress in this field. I hope it will be helpful for your follow-up research, and I also hope to exchange technical ideas with fellow experts.
Today's presentation consists of four parts:
1. The first part covers what nonlinear acoustic echo is, how it arises, the state of research, and the technical difficulties;
2. The second part focuses on the dual-coupling acoustic echo cancellation algorithm;
3. The third part tests the performance of the algorithm through experiments;
4. Finally, I draw some brief conclusions.
I. Nonlinear acoustic echo
1. What is nonlinear acoustic echo
Let's go straight to the first part: what is nonlinear acoustic echo? Here is a diagram of the acoustic echo path. The left side corresponds to the transmitting end and the right side to the receiving end. The signal we send first goes through D/A conversion, from digital to analog, and then through a power amplifier that drives the loudspeaker, which produces sound. The sound travels through the air channel and is picked up by the microphone at the receiving end. It then passes through another power amplifier and finally through A/D conversion, from analog back to digital. The y[k] here is the echo signal we receive.
2. How to judge linear echo and nonlinear echo
So here comes the question: is the echo y[k] that we receive a linear echo or a nonlinear echo? And how should we judge?
I think the key to answering this question is to understand every link in the path and check whether each one is linear or nonlinear. If all links are linear, then y[k] is naturally a linear echo. Otherwise, as long as one link is nonlinear, the echo is nonlinear.
Here I divide the entire echo path into four parts: A, B, C, and D. Which of them is most likely to be nonlinear? The answer is B, the power amplifier and loudspeaker in the echo path. The specific reasons are analyzed in detail later.
Now let me explain why A, C, and D are not nonlinear. A and D are the easier ones to judge: both are linear time-invariant systems. C is harder to judge. In some complicated scenes, the acoustic echo reaches the receiving end after multiple reflections along different paths and carries strong reverberation. In more extreme cases, the loudspeaker and the microphone also move relative to each other, so the echo path changes rapidly over time. When all these factors add up, the performance of an echo cancellation algorithm degrades quickly, and it may even fail completely.
Some of you may ask: isn't such a complicated situation nonlinear? In my view, C should be treated as a linear time-varying acoustic system. The main basis for distinguishing linear from nonlinear is the superposition principle, and the complex scenarios just mentioned still satisfy superposition, so C is a linear system.
One more remark. Careful listeners will notice that B contains a power amplifier and C contains one as well. Why can the amplifier in B cause nonlinear distortion while the one in C does not? The main difference is that B's amplifier outputs a large signal in order to drive the loudspeaker, whereas the output of C's amplifier is still a small signal, which usually does not suffer nonlinear distortion.
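The superposition test used above to separate linear from nonlinear links can be sketched numerically. This is a minimal illustration, not part of the talk: the three-tap impulse response `room`, the clipping limit, and the helper names are all assumptions, with hard clipping standing in for amplifier saturation in part B.

```python
def fir(x, h):
    """Linear FIR filter: y[k] = sum_i h[i] * x[k - i]."""
    return [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0)
            for k in range(len(x))]

def clip(x, limit):
    """Hard clipping, a simple stand-in for amplifier saturation."""
    return [max(-limit, min(limit, v)) for v in x]

def satisfies_superposition(system, x1, x2, a=2.0, b=-3.0, tol=1e-9):
    """Check system(a*x1 + b*x2) == a*system(x1) + b*system(x2)."""
    mixed = system([a * u + b * v for u, v in zip(x1, x2)])
    y1, y2 = system(x1), system(x2)
    return all(abs(m - (a * p + b * q)) <= tol
               for m, p, q in zip(mixed, y1, y2))

x1 = [0.1, 0.4, -0.3, 0.8, -0.6, 0.2]
x2 = [0.5, -0.2, 0.7, -0.9, 0.3, 0.1]
room = [0.6, 0.3, 0.1]  # hypothetical multi-path impulse response (part C)

# A multi-path (reverberant) channel alone obeys superposition ...
linear_ok = satisfies_superposition(lambda x: fir(x, room), x1, x2)
# ... but placing a saturating stage in front of it breaks superposition.
clipped_ok = satisfies_superposition(lambda x: fir(clip(x, 0.5), room), x1, x2)
```

Here `linear_ok` comes out true and `clipped_ok` false: reflections and reverberation keep the path linear, while a single saturating stage makes the whole path nonlinear.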
3. The cause of nonlinear acoustic echo
I have listed two causes of nonlinear acoustic echo. The first is the miniaturization and low cost of acoustic devices. The acoustic devices here are the power amplifier and loudspeaker in part B mentioned earlier.
Why does miniaturization make acoustic devices prone to nonlinear distortion? We need to start from the basic principle of how a loudspeaker produces sound. Sound waves are by nature a physical vibration. A loudspeaker uses current to drive its diaphragm, the diaphragm drives the surrounding air molecules to vibrate, and that produces sound. To produce a louder sound, more electrical energy per unit time is needed to drive more air molecules.
Suppose you have two loudspeakers of different sizes driven with the same power. The large loudspeaker has a larger contact area with the air, so it can drive more air molecules per unit time and therefore sounds louder. If the small loudspeaker is to be as loud as the large one, the drive power has to be increased, and that brings a problem: the power amplifier goes into saturation, which produces nonlinear distortion. This is one of the main reasons why miniaturized acoustic devices are prone to nonlinear distortion. The effect of low cost is easier to understand, so I will not elaborate.
The second cause is unreasonable acoustic structure design. The most typical example is poor vibration isolation in the acoustic system. Vibration isolation is usually required between the loudspeaker unit and the microphone unit. Without it, the vibration generated while the loudspeaker is sounding is transmitted to the microphone through the physical structure and modulates the acoustic signal the microphone receives. This vibration is essentially random and nonlinear, so it inevitably introduces nonlinear distortion.
We previously surveyed the main mobile phone models on the market, looking mainly at their acoustic properties. We were surprised to find that more than half of the models have less than ideal acoustic properties, falling into the "relatively poor" and "poor" grades in this chart. When we play games or make voice calls on our phones, we often run into echo leakage and double-talk clipping, and these problems are directly related to the phone's poor acoustics.
Of course, this data covers only mobile phones. There are many similar electronic products on the market, and they probably have similar problems. The data tells us that nonlinear distortion is common in the electronic products we use every day, so I believe research on this problem is a valuable and meaningful direction.
4. Research status of nonlinear acoustic echo cancellation technology
Earlier, I searched the IEEE digital library for literature on "acoustic echo cancellation" and found 3402 items, including conference papers, journals, magazines, and books. Using the same method to search for "nonlinear acoustic echo cancellation" turned up only 254 items, less than 1/10 of the former. This means that nonlinear acoustic echo cancellation is a relatively unpopular research direction within the acoustic echo cancellation field.
Since this direction is valuable and meaningful, why is it so unpopular? One answer I can think of is that it is too difficult and very challenging. Let's take a look at the technical difficulties.
5. The technical difficulties of nonlinear acoustic echo cancellation
I compare the linear and nonlinear echo cancellation problems along six dimensions. The first dimension is the system transfer function. In a linear system, we regard the system transfer function as slowly time-varying, so we can approximate it by adaptive filtering and suppress the echo effectively. In a nonlinear system, the transfer function usually changes quickly and abruptly. If we approximate it in a linear way, the filter's update speed cannot keep up with the speed at which the system transfer function changes, and acoustic echo cancellation performs poorly.
The second dimension is the optimization model. For linear systems we have a very complete linear optimization model: from constructing the objective function to solving the optimization problem, the whole line of reasoning is clear. For nonlinear systems, there is currently no effective model to support this.
The next four dimensions correspond to four problems. They are four difficult problems that are ubiquitous in linear echo cancellation, and they exist in the nonlinear case as well. The first is strong reverberation. If we hold a video conference in a small meeting room, the sound is reflected off the walls many times, which produces strong reverberation with a long tail. To suppress this strongly reverberant echo we need a longer linear filter, which brings a new problem: according to Widrow's theory of adaptive filtering, the longer the filter, the slower the convergence and the greater the noise, so echo cancellation under strong reverberation is not ideal.
The second problem is delay. In real-time audio and video calls, delay jumps are common. The delay between the signal captured by the microphone and the echo reference signal jumps, the signals have to be realigned after each jump, and some echo leaks through while that happens.
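The realignment step mentioned above is commonly done by searching for the lag that maximizes the cross-correlation between the reference and the microphone signal. The sketch below illustrates that general idea only; it is not the talk's method, and the function name, the 37-sample delay, and the 0.6 echo gain are made up for the example.

```python
import random

def estimate_delay(ref, mic, max_delay):
    """Return the lag (in samples) that maximizes the cross-correlation
    sum_k ref[k] * mic[k + lag], searched over 0 <= lag <= max_delay."""
    def score(lag):
        return sum(r * m for r, m in zip(ref, mic[lag:]))
    return max(range(max_delay + 1), key=score)

random.seed(2)
ref = [random.uniform(-1.0, 1.0) for _ in range(500)]   # echo reference
# Hypothetical microphone signal: the reference delayed by 37 samples
# and attenuated, truncated to the same length as the reference.
mic = ([0.0] * 37 + [0.6 * v for v in ref])[:500]

delay = estimate_delay(ref, mic, max_delay=100)
```

After a delay jump, the estimate has to be refreshed before the adaptive filter sees aligned signals again, which is why some echo leaks through in the meantime.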
The third problem is howling. Detecting and suppressing howling is a recognized classic problem in the echo field.
The last is double-talk. Double-talk performance is an important measure of an echo cancellation algorithm, and it is also very hard to handle, because double-talk easily causes the filter coefficients to diverge.
Taking these dimensions together, we can see that nonlinear acoustic echo cancellation is a challenging research direction.
II. Dual-coupling acoustic echo cancellation algorithm
This is an algorithm proposed by our team. Its main feature is that the filter model is built around some characteristics of nonlinear acoustic echo, which gives it an inherent advantage in suppressing nonlinear echo.
1. Modeling of nonlinear acoustic echo systems
Let's go back to the earlier acoustic echo path diagram and simplify the model. We represent the loudspeaker side on the left with a transfer function Wn and assume it is the nonlinear echo path transfer function. Everything to the right of the loudspeaker, through to the microphone, is represented by a single Wl, a linear echo path transfer function. Under this assumption, the received signal y can be expressed as the transmitted signal x convolved with these two transfer functions.
Next, we simplify the model. The simplification is a mathematical decomposition: we assume that the nonlinear transfer function can be decomposed into a combination of a linear system function and a nonlinear system function, which gives the middle equation.
We then apply a change of variables to the middle equation and obtain the last expression. Its physical meaning is clear: the whole echo path can be expressed as the sum of a linear echo path and a nonlinear echo path.
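The slide equations are not reproduced in this transcript. One way to write the three steps described above is sketched below. The cascade form, the superscripts, and the tilde symbols are notation assumed here for illustration, and writing a nonlinear stage as a convolution is the same loose shorthand the talk uses.

```latex
% Notation assumed for illustration; * denotes convolution.
\begin{align}
y   &= x * W_n * W_l \\
W_n &= W_n^{(\mathrm{lin})} + W_n^{(\mathrm{nl})} \\
y   &= x * \underbrace{\bigl(W_n^{(\mathrm{lin})} * W_l\bigr)}_{\tilde{W}_l}
     + x * \underbrace{\bigl(W_n^{(\mathrm{nl})} * W_l\bigr)}_{\tilde{W}_n}
     = x * \tilde{W}_l + x * \tilde{W}_n
\end{align}
```

In this reading, the Wl and Wn filters of the following sections play the roles of the two summed paths in the last line.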
2. Dual-coupling adaptive filter
Based on this mathematical model, we build a new filter structure called the dual-coupling adaptive filter. Compared with a traditional linear adaptive filter, it differs in two main ways. First, a traditional linear filter has only one learning unit, whereas our filter has two: a linear echo path filter, denoted Wl, and a nonlinear echo path filter, denoted Wn.
The second difference is that we add a coupling factor between the two filters. Its purpose is to coordinate the two so that each works to its full potential, and ideally the combination is better than the sum of its parts (1 + 1 > 2).
3. Dual-coupling filter design
Once the structure of the filter is determined, we design the filter coefficients. The design process has three steps: first, build the optimization criterion; second, solve for the filter weights Wl and Wn; third, build the coupling mechanism.
The first step is to build the optimization criterion. I think this is the most important step in filter design, because it determines the upper limit of filter performance. What makes a good optimization criterion? I think it has to match the physical characteristics of the problem. So before building the criterion, we first analyze nonlinear acoustic echo, hoping to find some of its physical characteristics.
Our analysis is based on the function shown above, which we call the short-term correlation. It measures the similarity of the waveforms of two signals within a short observation window of length T. Note that the function is statistical, because we take the mathematical expectation. We also add a phase correction factor to the last term of the numerator, in order to align the initial phases of the two signals.
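The exact formula is on a slide that this transcript does not include, so the following is only a rough sketch of the idea. It assumes a normalized correlation per window, averages over consecutive windows in place of the expectation, and uses a small lag search as a crude stand-in for the phase correction factor. All function names and test signals are hypothetical.

```python
import math

def window_corr(a, b):
    """Normalized correlation of two equal-length windows, in [-1, 1]."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0

def short_term_correlation(x, y, T, max_lag=0):
    """Average, over consecutive length-T windows, of the best |correlation|
    found within +/- max_lag samples (lag search replaces phase correction)."""
    scores = []
    for start in range(max_lag, len(x) - T - max_lag + 1, T):
        xw = x[start:start + T]
        best = max(abs(window_corr(xw, y[start + lag:start + lag + T]))
                   for lag in range(-max_lag, max_lag + 1))
        scores.append(best)
    return sum(scores) / len(scores)

n = 400
x = [math.sin(0.05 * k) + 0.5 * math.sin(0.31 * k) for k in range(n)]
linear_echo = [0.0, 0.0] + [0.7 * v for v in x[:-2]]       # delayed, scaled copy
distorted = [max(-0.3, min(0.3, v)) for v in linear_echo]  # hard clipping

r_linear = short_term_correlation(x, linear_echo, T=40, max_lag=2)
r_distorted = short_term_correlation(x, distorted, T=40, max_lag=2)
```

With these toy signals the purely linear echo scores essentially 1, and the clipped echo scores lower, which is the qualitative behavior the measure is meant to capture.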
Using the short-term correlation function constructed above, we analyzed a large amount of acoustic echo data and selected several typical groups. The green curve corresponds to echo data with very good linearity: over the whole range of T its short-term correlation is very high, above 0.97 and close to 1. The yellow curve corresponds to data with relatively weak nonlinear distortion: as T grows, the short-term correlation gradually decreases and finally settles at a relatively stable value. The red curve corresponds to data with strong nonlinear distortion. To put these three data sets in context, we also show a blue curve, the short-term correlation between the signal and noise, which is very small over the whole range of T.
Comparing these curves leads to two conclusions. First, the short-term correlation function we constructed objectively reflects the linearity of the acoustic system: the better the linearity, the larger its value. Second, even for systems with strong nonlinear distortion, there is still a strong correlation within a short observation window (for example, T < 100 ms), as the red curve shows.
Based on this characteristic, we construct a new error function, called the short-time cumulative error function. Note that the residuals are accumulated within an observation window T.
Based on this error function, we further construct a new optimization criterion, called the minimum mean short-time cumulative error criterion. We want this criterion to constrain the filter weights to have two properties. First, the filter should be statistically optimal, that is, globally optimal, so we apply the mathematical expectation in the objective function. Second, it should also be optimal within the short time window, that is, locally optimal, so inside the expectation we also integrate the error over a short time.
This optimization criterion is fundamentally different from that of the traditional linear adaptive filter, which is based on the minimum mean square error criterion: it is only statistically optimal and has no local optimality constraint.
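Since the slide formulas are missing from the transcript, the contrast between the two criteria can only be sketched. The form below is one plausible reading of the description; the squared magnitude, the discrete sum in place of the integral, and all symbols are assumptions rather than the team's published definition.

```latex
% Assumed notation: e(k) is the residual after both filters,
% T is the short observation window length in samples.
\begin{align}
e(k) &= y(k) - \hat{y}_l(k) - \hat{y}_n(k) \\
J_{\mathrm{MMSE}} &= E\bigl[\, |e(k)|^2 \,\bigr] \\
J_{T} &= E\Bigl[\, \sum_{i=0}^{T-1} |e(k-i)|^2 \Bigr]
\end{align}
```

Under this reading, for a fixed filter and stationary signals the two objectives differ only by the factor T. The window becomes meaningful when part of the filter is re-solved inside each window, which is consistent with the nonlinear filter later taking a least-squares form.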
4. Dual-coupling filter design (continued)
Let's first solve for Wl, the linear filter. The approach is to assume that the nonlinear filter Wn is already at its optimal solution and substitute that optimal solution into the earlier optimization equation, which gives the simplified objective function shown above.
Here we make some prior assumptions: we assume that the first- and second-order statistics of the nonlinear filter are equal to 0. This simplifies the optimization problem further, and we obtain a very familiar equation, the Wiener-Hopf equation. The result tells us that the optimal linear filter is the same as that of a traditional adaptive filter: both are the theoretical optimal solution of the Wiener-Hopf equation. So we can solve it iteratively with existing mature algorithms such as NLMS or RLS. That is the design of Wl.
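For readers less familiar with NLMS, here is a minimal sketch of the standard normalized LMS update that iteratively approaches the Wiener-Hopf solution. It shows textbook NLMS only, not the team's implementation; the four-tap `true_path`, the step size, and the white-noise input are made up for the example.

```python
import random

def nlms(x, d, order, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt w so that w . [x[k], x[k-1], ...] tracks d[k].
    Returns the final weights and the error signal."""
    w = [0.0] * order
    buf = [0.0] * order
    errors = []
    for xk, dk in zip(x, d):
        buf = [xk] + buf[:-1]                        # newest sample first
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))
        e = dk - y_hat                               # a-priori error
        norm = sum(b * b for b in buf) + eps         # input energy
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors

random.seed(0)
true_path = [0.5, -0.3, 0.2, 0.1]                    # hypothetical linear echo path
x = [random.uniform(-1.0, 1.0) for _ in range(3000)]
d = [sum(true_path[i] * x[k - i] for i in range(4) if k - i >= 0)
     for k in range(len(x))]

w_est, err = nlms(x, d, order=4)
```

In this noiseless linear case `w_est` converges to `true_path`. The per-sample cost is proportional to the filter order, which is the O(M) figure quoted later in the comparison.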
Now let's look at the design of Wn. It is similar to the design of Wl: we substitute the optimal linear filter into the original optimization problem, which simplifies to the following equation. After a series of variable substitutions, we finally obtain the optimal solution of the nonlinear filter, which has the form of a least-squares estimate.
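The transcript does not give the regressor or the exact normal equations, so the sketch below only illustrates what a least-squares estimate over a short window looks like. It assumes a small FIR fit to the residual left by the linear filter, with a tiny regularization term added for numerical safety. The function names, window position, and coefficients are hypothetical.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

def window_least_squares(x, residual, start, T, order, reg=1e-6):
    """Least-squares FIR fit of `residual` from `x` over samples
    [start, start + T): solves (X^T X + reg * I) w = X^T r."""
    rows = [[x[k - i] if k - i >= 0 else 0.0 for i in range(order)]
            for k in range(start, start + T)]
    r = residual[start:start + T]
    A = [[sum(row[i] * row[j] for row in rows) + (reg if i == j else 0.0)
          for j in range(order)] for i in range(order)]
    b = [sum(row[i] * rk for row, rk in zip(rows, r)) for i in range(order)]
    return solve(A, b)

random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(200)]
# Hypothetical residual left over after the linear filter has done its part.
residual = [0.4 * x[k] - 0.2 * (x[k - 1] if k > 0 else 0.0) for k in range(200)]

w_n = window_least_squares(x, residual, start=50, T=32, order=2)
```

Because the fit uses only the 32 samples in the window, it can follow a fast change immediately, at the price of building and solving an N-by-N system for every window.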
The third step is to build the coupling mechanism. Before introducing it, let me describe the behavior we expect from it. When the linearity of the acoustic system is very good, the linear filter should play the leading role and the nonlinear filter should be dormant or switched off. Conversely, when the nonlinearity of the acoustic system is very strong, the nonlinear filter should play the leading role and the linear filter should be in a semi-dormant state. A real acoustic system is usually a continuous alternation and superposition of nonlinear and linear states, so we want a mechanism that controls the coupling between the two.
To design the coupling mechanism, we need a way to measure linearity and nonlinearity. We therefore define two factors, a linearity factor and a nonlinearity factor, given by the two equations on the left. The basic idea of coupling control is to feed the values of these two factors into the NLMS algorithm and the least-squares algorithm to adjust the learning speed of each.
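The two factor equations are on a slide that the transcript does not reproduce, so the mapping below is purely illustrative. It assumes a single linearity factor in [0, 1] (for example, derived from the short-term correlation) and turns it into a step size for the linear filter and a gain for the nonlinear filter. The function name, `mu_max`, and `floor` are hypothetical.

```python
def coupling_gains(linearity, mu_max=0.5, floor=0.05):
    """Map a linearity factor in [0, 1] to (linear step size, nonlinear gain).

    High linearity: the linear filter learns at full speed and the
    nonlinear filter sleeps. Low linearity: the linear filter is slowed to
    a small floor (semi-dormant) and the nonlinear filter takes over.
    """
    rho = min(1.0, max(0.0, linearity))     # clamp to the valid range
    mu_linear = mu_max * max(floor, rho)    # slowed, but never fully frozen
    g_nonlinear = 1.0 - rho                 # zero when the system is linear
    return mu_linear, g_nonlinear

mu_lin_good, g_nl_good = coupling_gains(1.0)   # very linear system
mu_lin_bad, g_nl_bad = coupling_gains(0.0)     # strongly nonlinear system
```

A real controller would presumably smooth the factor over time and use separate linearity and nonlinearity measures, as the talk describes. The point here is only the direction of the adjustment.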
To give you a qualitative understanding of the dual-coupling acoustic echo cancellation algorithm, I drew another set of curves. The pictures on the left correspond to a linear echo scenario. Let's look at the NLMS algorithm first. The yellow curve represents the real system transfer function and the red curve is the result of the NLMS algorithm. In a linear scenario, the linear filter obtained by NLMS approximates the real transfer function well and therefore suppresses linear acoustic echo effectively.
Now the dual-coupling algorithm. In a linear echo scenario, its nonlinear filter is dormant, so its value tends to 0, and the linear filter plays the leading role.
Next, the nonlinear acoustic echo scenario on the right. Assume that the nonlinear distortion occurs mainly between t1 and t2. The yellow line shows an abrupt change in that interval. With the NLMS algorithm, the linear filter tries to approximate the nonlinear distortion when it occurs, but because its learning speed cannot keep up with how fast the transfer function changes, a large gap always remains between it and the true value. When the nonlinear distortion disappears, it also takes some time to return to normal. So echo leakage occurs throughout this period.
With the dual-coupling algorithm, once nonlinear distortion appears, the linear filter enters a relatively dormant state. This is the coupling mechanism mentioned above: it slows down the linear filter's update, so its value changes slowly for the whole period in which the nonlinearity is present.
Once the system enters the nonlinear distortion state, the nonlinear filter starts to work and quickly tracks the changes in the nonlinear characteristics. When the nonlinear distortion disappears, the nonlinear filter goes dormant again. Together, the two filters track the changes of the acoustic echo path effectively. This is only an example, and real situations are often much more complicated.
Next, we compare the characteristics of these two filters along four dimensions. The first is the optimization criterion. The NLMS algorithm is based on the minimum mean square error criterion, while the dual-coupling algorithm is based on the minimum mean short-time cumulative error criterion, so their optimization criteria are different.
The second is the theoretical optimal solution. The NLMS algorithm has the solution of the Wiener-Hopf equation. The linear filter of the dual-coupling algorithm also has the Wiener-Hopf solution, while the nonlinear filter has a least-squares solution.
The third is the amount of computation. NLMS costs O(M), where M is the order of the filter. The dual-coupling algorithm adds an O(N^2) term because it has two filters, where N is the order of the nonlinear filter. The square comes from the matrix inversion required by least squares, so the algorithm is considerably more expensive than linear NLMS.
The fourth is the control mechanism. The NLMS algorithm has only one filter, and it is controlled mainly by adjusting the step size, which is relatively easy. The dual-coupling algorithm has to control two sets of filters, so its control is much more complex.
III. Analysis of experimental results
I use two experimental scenarios to compare the dual-coupling algorithm with the NLMS algorithm: a single-talk test scenario and a double-talk test scenario.
First, the single-talk test scenario. The first example is a case of strong nonlinear distortion. The three pictures on the left are the spectrogram of the original signal, the spectrogram after NLMS echo cancellation, and the spectrogram after the dual-coupling algorithm. The darker the color, the more energy. The graph on the right shows the echo suppression ratio, where higher is better. The red curve is the echo suppression ratio of the dual-coupling algorithm and the black line is that of the standard NLMS algorithm.
We can see that after convergence the NLMS algorithm reaches an echo suppression ratio of only about 10 dB, which is relatively low. The dual-coupling algorithm reaches more than 25 dB after convergence, more than 15 dB better than NLMS, which is a clear advantage.
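To make the dB figures concrete, an echo suppression ratio is typically computed as the energy of the microphone signal divided by the energy of the residual after cancellation, on a logarithmic scale. The sketch below assumes that definition, which the talk does not state explicitly, and uses a made-up 160-sample frame.

```python
import math

def echo_suppression_db(mic, out, eps=1e-12):
    """10 * log10(microphone energy / residual energy) over one frame, in dB."""
    p_in = sum(v * v for v in mic)
    p_out = sum(v * v for v in out)
    return 10.0 * math.log10((p_in + eps) / (p_out + eps))

mic = [math.sin(0.1 * k) for k in range(160)]   # echo-only frame (single-talk)
residual = [0.1 * v for v in mic]               # amplitude reduced by a factor of 10
ratio_db = echo_suppression_db(mic, residual)   # about 20 dB
```

On this scale, a 15 dB advantage corresponds to roughly 31.6 times less residual echo energy, since 10^(15/10) is about 31.6.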
The second example is a case of weak nonlinear distortion. On the left are the spectrograms and on the right is the echo suppression ratio. We evaluate single-talk performance mainly by echo suppression ratio and convergence speed. Look at the NLMS algorithm first: after convergence it suppresses about 22 to 25 dB. It also converges very slowly, reaching a relatively converged state only after more than 100 frames.
Now the dual-coupling algorithm: once it stabilizes, it suppresses 35 to 40 dB, an improvement of roughly 15 to 20 dB in echo suppression ratio over NLMS. It also has another clear advantage: it converges quickly and is in a converged state almost as soon as the echo arrives.
The next figure compares the echo suppression ratios on different mobile phone models. Red is the dual-coupling algorithm and blue is the NLMS algorithm. This data shows that the dual-coupling algorithm generally improves the echo suppression ratio by more than 10 dB over NLMS, which is a big advantage.
Finally, the double-talk test scenario. Here is a test example. The data comes from a video conference. The picture on the left is the spectrogram of the original microphone signal and the picture on the right is the spectrogram of the echo reference signal. Comparing the two shows that double-talk occurs mainly in the middle section.
We evaluate double-talk performance mainly by the echo suppression ratio and the distortion of near-end speech. The three pictures above are spectrograms after echo cancellation. The middle one is the result of the NLMS algorithm: its echo suppression is not ideal, with a lot of residual echo in both single-talk and double-talk. The bottom one is the result of the dual-coupling algorithm: echo is suppressed fairly cleanly in both single-talk and double-talk, and the damage to near-end speech during double-talk is very small. Because this data comes from a video conference scenario, a final nonlinear processing (NLP) step is still needed.
The figure above shows the output after NLP is applied on top of the dual-coupling algorithm. After processing, the whole spectrogram is very clear: the echo is removed cleanly, the spectrum suffers little damage, and near-end speech comes through transparently during double-talk.
IV. Summary
To summarize briefly, today I covered three things. The first is an understanding of nonlinear acoustic echo: its causes, the state of research, and the technical difficulties.
The second is the dual-coupling acoustic echo cancellation algorithm of Huawei Cloud audio and video. Our main contribution has two aspects: constructing the dual-coupling adaptive filter structure, and proposing and solving the minimum mean short-time cumulative error criterion. Solving it shows that the linear filter of the dual-coupling filter has the optimal solution of the Wiener-Hopf equation, while the nonlinear filter has a least-squares solution.
Finally, we tested the performance of the algorithm through experiments. We found that it achieves significant performance improvements in strong nonlinear distortion scenarios, in linear scenarios, and in double-talk scenarios: the echo suppression ratio improves by more than 10 dB, and convergence is faster, within about 30 milliseconds. The algorithm also has shortcomings: the amount of computation is large, and there are many coupling control stages, which makes it somewhat more complicated.