

Open Access 2019 | OriginalPaper | Chapter

6. Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)

Authors : Franz Zotter, Matthias Frank

Published in: Ambisonics

Publisher: Springer International Publishing

Abstract

Unlike pressure-gradient transducers, single-transducer microphones with higher-order directivity apparently turned out to be difficult to manufacture at reasonable audio quality. Therefore nowadays, higher-order Ambisonic recording with compact devices is based on compact spherical arrays of pressure transducers. To prepare for higher-order Ambisonic recording based on arrays, we first need a model of the sound pressure that the individual transducers of such an array would receive in an arbitrary surrounding sound field. The lossless, linear wave equation is the most suitable model to describe how sound propagates when the sound field is composed of surrounding sound sources. Fundamentally, the wave equation models sound propagation by how small packages of air react (i) when being expanded or compressed by a change of the internal pressure, and (ii) to directional differences in the outside pressure by starting to move. Based thereupon, the inhomogeneous solutions of the wave equation describe how an entire free sound field builds up if being excited by an omnidirectional sound source, as a simplified model of an arbitrary physical source, such as a loudspeaker, human talker, or musical instrument. After addressing these basics, the chapter shows a way to get Ambisonic signals of high spatial and timbral quality from the array signals, considering the necessary diffuse-field equalization, side-lobe suppression, and trade-off between spatial resolution and low-frequency noise boost. The chapter concludes with application examples.

...a turning point has been the design of HOA microphones, opening an exciting experimental field in terms of real 3D sound field recording ...

Jérôme Daniel [1] at Ambisonics Symposium 2009.

Gary Elko and Jens Meyer are the well-known inventors of the first commercially available compact spherical microphone array that is able to record higher-order Ambisonics [2], the Eigenmike. There are several inspiring scientific works with valuable contributions that can be recommended for further reading [3‐12], above all Boaz Rafaely’s excellent introductory book [13].

This mathematical theory might appear extensive, but it cannot be avoided when aiming at an in-depth understanding of higher-order Ambisonic microphones. The theory enables processing of the microphone signals received such that the surrounding sound field excitation is retrieved in terms of an Ambisonic signal. Some readers may want to skip the physical introduction and resume in Sect. 6.5 on spherical scattering or Sect. 6.6 on the processing of the array signals.

6.1 Equation of Compression

Wave propagation involves reversible short-term temperature fluctuations that become effective when air is compressed by sound and that cause the specific stiffness of air in sound propagation. Appendix A.6.1 shows how to derive this adiabatic compression relation based on the first law of thermodynamics and the ideal gas law. It relates the relative volume change $\frac{V}{V_0}$ to the pressure change $p=-K\,\frac{V}{V_0}$ by the bulk modulus $K$ of air. After expressing the bulk modulus by more common constants¹ $K=\rho \,c^2$ and differentially formulating the volume change over time using the change of the sound particle velocity in space, e.g. in one dimension $\dot{p} = -\rho \,c^2\;\frac{\partial v_x}{\partial x}$, cf. Appendix A.6.1, we get the three-dimensional compression equation:

$$\begin{aligned} \frac{\partial p}{\partial t}&=-\rho \,c^2\;\varvec{\nabla }^\mathrm {T}\varvec{v}. \end{aligned}$$

(6.1)

Here the inner product of the Del symbol $\varvec{\nabla }^\mathrm {T}=(\frac{\partial }{\partial x},\frac{\partial }{\partial y},\frac{\partial }{\partial z})$ with $\varvec{v}$ yields what is called divergence $\mathrm {div}(\varvec{v})=\varvec{\nabla }^\mathrm {T}\varvec{v}=\frac{\partial v_\mathrm {x}}{\partial x}+\frac{\partial v_\mathrm {y}}{\partial y}+\frac{\partial v_\mathrm {z}}{\partial z}$. The equation means: regardless of any common velocity at which the outer boundaries of a small package of air travel, if there are directions into which their velocity spatially increases, the resulting gradual volume expansion over time causes a proportional decrease of the interior pressure over time.

6.2 Equation of Motion

The equation of motion is relatively simple to understand from the Newtonian equation of motion: e.g. for the x direction, $F_\mathrm {x}=m\,\frac{\partial v_\mathrm {x}}{\partial t}$ equates the external force to mass m times acceleration, i.e. the increase in velocity $\frac{\partial v_\mathrm {x}}{\partial t}$. For a small package of air with constant volume $V_0=\Delta x\Delta y\Delta z$, the mass is obtained from the air density, $m=\rho \,V_0$, and the force equals the decrease in pressure along the respective space direction times the corresponding partial surface, e.g. for the x direction $F_\mathrm {x}=-[p(x+\Delta x)-p(x)]\Delta y\Delta z$. For the x direction, this yields after expanding by $\frac{\Delta x}{\Delta x}$

$$ -\frac{\Delta p}{\Delta x}\,V_0=\rho \,V_0\,\frac{\partial v_\mathrm {x}}{\partial t}. $$

Dividing by $-V_0$ and letting $V_0\rightarrow 0$, we obtain the typical shape of the equation of motion for all three space directions

$$\begin{aligned} \varvec{\nabla }\,p&=-\rho \,\frac{\partial \varvec{v}}{\partial t}. \end{aligned}$$

(6.2)

The equation of motion means: Independently of the common exterior pressure load on all the outer boundaries of a small air package, an outer pressure decrease into any direction implies a corresponding pushing force on the package causing a proportional acceleration into this direction.

6.3 Wave Equation

We can combine the compression equation $\frac{\partial p}{\partial t}=-\rho \,c^2\;\varvec{\nabla }^\mathrm {T}\varvec{v}$ with the equation of motion $\varvec{\nabla }\,p=-\rho \,\frac{\partial \varvec{v}}{\partial t}$ by differentiating the first one with respect to time, $\frac{\partial ^2 p}{\partial t^2}=-\rho \,c^2\,\varvec{\nabla }^\mathrm {T}\frac{\partial \varvec{v}}{\partial t}$, and applying the divergence $\varvec{\nabla }^\mathrm {T}$ to the second one, which yields the Laplacian $\varvec{\nabla }^\mathrm {T}\varvec{\nabla }=\bigtriangleup $ on its left-hand side, hence $\bigtriangleup p=-\rho \varvec{\nabla }^\mathrm {T}\frac{\partial \varvec{v}}{\partial t}$. Dividing the first result by $c^2$ and equating both expressions yields the lossless wave equation $\bigtriangleup p = \frac{1}{c^2}\frac{\partial ^2}{\partial t^2}p$ that is typically written as

$$\begin{aligned} \Bigl (\bigtriangleup -\frac{1}{c^2}\frac{\partial ^2}{\partial t^2}\Bigr )p&=0. \end{aligned}$$

(6.3)

Obviously, the wave equation relates the curvature in space (expressed by the Laplacian) to curvature in time (expressed by the second-order derivative).
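As a numerical illustration of how Eqs. (6.1) and (6.2) together produce wave propagation, both equations can be stepped alternately in time on a one-dimensional staggered grid. The following is only a minimal sketch: the grid spacing, the time step (chosen so that sound travels one cell per step), the air constants, and the Gaussian start pulse are assumptions made for this example and are not taken from the chapter.

```python
import math

def simulate_1d(steps=100, cells=400, c=343.0, rho=1.2, dx=0.01):
    """Alternate the 1D equation of motion and the compression equation (leapfrog)."""
    dt = dx / c                      # assumed time step: one cell per step
    center = cells // 2
    # pressure at the cell centers: Gaussian pulse (illustrative initial condition)
    p = [math.exp(-((i - center) / 8.0) ** 2) for i in range(cells)]
    # particle velocity at the interfaces between cells i and i+1, initially at rest
    v = [0.0] * (cells - 1)
    for _ in range(steps):
        # equation of motion, Eq. (6.2): rho dv/dt = -dp/dx
        for i in range(cells - 1):
            v[i] -= dt / (rho * dx) * (p[i + 1] - p[i])
        # compression equation, Eq. (6.1): dp/dt = -rho c^2 dv/dx (v = 0 at both ends)
        for i in range(cells):
            v_right = v[i] if i < cells - 1 else 0.0
            v_left = v[i - 1] if i > 0 else 0.0
            p[i] -= rho * c * c * dt / dx * (v_right - v_left)
    return p, center

p, center = simulate_1d(steps=100)
right_peak = max(range(center, len(p)), key=lambda i: p[i])
left_peak = max(range(0, center), key=lambda i: p[i])
# distance of both pulse maxima from the start position, in cells
print(right_peak - center, center - left_peak)
```

The initial pulse splits into two pulses of roughly half the amplitude that travel in opposite directions, each by about one cell per time step, which is the behavior the wave equation (6.3) predicts for the speed of sound c.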

If p is a pure sinusoidal oscillation $\sin (\omega \,t+\phi _0)$, the second derivative in time corresponds to a factor $-\omega ^2$, and by substitution with the wave-number $k=\frac{\omega }{c}$, we can write the frequency-domain wave equation as

$$\begin{aligned} (\bigtriangleup +k^2)\,p&=0,&\text {Helmholtz equation.} \end{aligned}$$

(6.4)

6.3.1 Elementary Inhomogeneous Solution: Green’s Function (Free Field)

The Green’s function is an elementary prototype for solutions to inhomogeneous problems $(\bigtriangleup +k^2)p=-q$, which is defined as

$$\begin{aligned} \bigl (\bigtriangleup +k^2\bigr )G=-\delta . \end{aligned}$$

A general excitation q of the equation can be represented by its convolution with the Dirac delta distribution $\int q(\varvec{s})\,\delta (\varvec{r}-\varvec{s})\, \mathrm {d}V(\varvec{s})=q(\varvec{r})$. As the wave equation is linear, the general solution must therefore equal the convolution of the Green’s function with the excitation function $p(\varvec{r})=\int q(\varvec{s})\,G(\varvec{r}-\varvec{s})\,\mathrm {d}V(\varvec{s})$ over space and, if formulated in the time domain, also over time. The integral superimposes the acoustical responses to every point in time and space of the source phenomenon, weighted by the corresponding source strength in space and time.

The Green’s function in three dimensions is derived in Appendix A.6.3, Eq. (A.91),

$$\begin{aligned} G&=\frac{e^{-\mathrm {i}k\,r}}{4\pi r}, \end{aligned}$$

(6.5)

with the wave number $k=\frac{\omega }{c}$ and distance between source and receiver $r=\sqrt{\Vert \varvec{r}-\varvec{r}_\mathrm {s}\Vert ^2}$.

Acoustic source phenomena are characterized by the behavior of the Green’s function: far away, the amplitude decays with $\frac{1}{r}$ and the phase $-kr=-\omega \frac{r}{c}$ corresponds to the radially increasing delay $\frac{r}{c}$. Both are expressed in Sommerfeld’s radiation condition $\lim _{r\rightarrow \infty }r\bigl (\frac{\partial }{\partial r}p+\mathrm {i}k\,p\bigr )=0$.
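The properties of Eq. (6.5) can be checked numerically. The sketch below evaluates the Green’s function and applies the radial part of the Laplacian, $\frac{\partial ^2}{\partial r^2}+\frac{2}{r}\frac{\partial }{\partial r}$, by central differences to confirm that $(\bigtriangleup +k^2)G$ vanishes away from the source. The speed of sound, the frequency, the distance, and the step size are assumptions chosen for illustration only.

```python
import cmath
import math

def green(r, k):
    """Free-field Green's function of Eq. (6.5) for a distance r > 0."""
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)

def helmholtz_residual(r, k, h=1e-4):
    """(d^2/dr^2 + 2/r d/dr + k^2) applied to G, using central differences."""
    g_m, g_0, g_p = green(r - h, k), green(r, k), green(r + h, k)
    d1 = (g_p - g_m) / (2 * h)
    d2 = (g_p - 2 * g_0 + g_m) / (h * h)
    return d2 + 2 / r * d1 + k * k * g_0

c = 343.0                       # assumed speed of sound in m/s
k = 2 * math.pi * 1000.0 / c    # wave number at 1 kHz
r = 2.0                         # distance in m

# residual relative to the size of the individual terms, close to zero
print(abs(helmholtz_residual(r, k)) / (k * k * abs(green(r, k))))
# doubling the distance halves the magnitude (1/r decay)
print(abs(green(2 * r, k)) / abs(green(r, k)))
```

Because G only depends on the distance r, the directional part of the Laplacian does not contribute here, so the radial operator alone is sufficient for this check.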

Plane waves. The radius coordinate of the Green’s function is the distance between two Cartesian position vectors $\varvec{r}_\mathrm {s}$ and $\varvec{r}$, the source and receiver location. Letting one of them become large is denoted by re-expressing it in terms of radius and direction vector $\varvec{r}_\mathrm {s}=r_\mathrm {s}\varvec{\theta }_\mathrm {s}$. This permits far-field approximation

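$$\begin{aligned} \Vert \varvec{r}-\varvec{r}_\mathrm {s}\Vert&=r_\mathrm {s}\,\sqrt{1-2\,\frac{\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{r}}{r_\mathrm {s}}+\frac{\Vert \varvec{r}\Vert ^2}{r_\mathrm {s}^2}}\approx r_\mathrm {s}-\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{r},&\text {for } r_\mathrm {s}\gg \Vert \varvec{r}\Vert . \end{aligned}$$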

(6.6)

For the phase approximation, precision matters: at a wave length of 30 cm, even a relatively small distance difference, e.g. between 15 m and 15 m $+$ 15 cm, amounts to half a wave length and therefore inverts the sign of the wave. To approximate the phase of the Green’s function, we must therefore at least use $r_\mathrm {s}-\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{r}$ as approximation. By contrast, this level of precision is irrelevant for the magnitude approximation, e.g., the error would be negligible if we used $\frac{1}{15\,\mathrm {m}}$ instead of the magnitude $\frac{1}{15\,\mathrm {m}+15\,\mathrm {cm}}$.

At a large distance $r_\mathrm {s}$ assumed to be constant, the Green’s function is proportional to a plane wave from the source direction $\varvec{\theta }_\mathrm {s}$

$$\begin{aligned} \lim _{r_\mathrm {s}\rightarrow \infty } G = {\textstyle \frac{e^{-\mathrm {i}k\,r_\mathrm {s}}}{4\pi \,r_\mathrm {s}}}\; e^{\mathrm {i}k\,\varvec{\theta }_\mathrm {s}^\mathrm {T}\,\varvec{r}}. \end{aligned}$$

(6.7)

The plane-wave part is of unit magnitude $|p|=1$

$$\begin{aligned} p=e^{\mathrm {i}k\,\varvec{\theta }_\mathrm {s}^\mathrm {T}\,\varvec{r}} \end{aligned}$$

(6.8)

and its phase evaluates the projection of the position vector onto the plane-wave arrival direction $\varvec{\theta }_\mathrm {s}$. Towards the direction $\varvec{\theta }_\mathrm {s}$, the phase grows positive, i.e. the signal arrives earlier. Towards the plane-wave propagation direction $-\varvec{\theta }_\mathrm {s}$ the phase grows negatively, implying an increasing time delay, which is constant on any plane perpendicular to $\varvec{\theta }_\mathrm {s}$.

Plane waves are an invaluable tool to locally approximate sound fields from sources that are sufficiently far away, within a small region.²
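How well the plane-wave form of Eq. (6.7) matches the exact Green’s function of Eq. (6.5) can be tried out with a few lines. This is a sketch under assumed values: the source direction and the receiver position are arbitrary choices, while the wave length of 30 cm and the distance of 15 m follow the example in the text.

```python
import cmath
import math

def green(r_vec, s_vec, k):
    """Green's function of Eq. (6.5) between receiver r_vec and source s_vec."""
    d = math.dist(r_vec, s_vec)
    return cmath.exp(-1j * k * d) / (4 * math.pi * d)

def plane_wave_approx(r_vec, theta_s, r_s, k):
    """Far-field form of Eq. (6.7): spherical-wave factor times a unit-magnitude plane wave."""
    proj = sum(t * x for t, x in zip(theta_s, r_vec))    # theta_s^T r
    return cmath.exp(-1j * k * r_s) / (4 * math.pi * r_s) * cmath.exp(1j * k * proj)

k = 2 * math.pi / 0.3             # wave number for a wave length of 30 cm
theta_s = (1.0, 0.0, 0.0)         # assumed source direction (unit vector)
r_s = 15.0                        # source distance in m
source = tuple(r_s * t for t in theta_s)
receiver = (0.05, 0.08, -0.03)    # assumed receiver position close to the origin

exact = green(receiver, source, k)
approx = plane_wave_approx(receiver, theta_s, r_s, k)
# relative error of the approximation, small as long as r_s is much larger than |r|
print(abs(approx - exact) / abs(exact))
```

Moving the receiver further away from the origin, or the source closer to it, lets the relative error grow, which indicates the size of the region in which the plane-wave model is a reasonable local approximation.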

6.4 Basis Solutions in Spherical Coordinates

Figure 4.11 shows spherical coordinates [14, 15] using radius r, azimuth $\varphi $, and zenith $\vartheta $. For simplification, zenith is replaced by $\zeta =\cos \vartheta =\frac{z}{r}$, here. We may solve the Helmholtz equation $(\bigtriangleup +k^2)p=0$ in spherical coordinates by the radial and directional parts of the Laplacian $\bigtriangleup =\bigtriangleup _\mathrm {r}+\bigtriangleup _{\upvarphi ,\upzeta }$, as identified in Appendix A.3

$$\begin{aligned} \bigtriangleup _\mathrm {r}&=\frac{\partial ^2}{\partial r^2}+\frac{2}{r}\frac{\partial }{\partial r},&\bigtriangleup _{\upvarphi ,\upzeta }&=\frac{1-\zeta ^2}{r^2}\frac{\partial ^2}{\partial \zeta ^2} - \frac{2}{r^2}\zeta \frac{\partial }{\partial \zeta }+\frac{1}{r^2(1-\zeta ^2)}\frac{\partial ^2}{\partial \varphi ^2}. \end{aligned}$$

(6.9)

We already know the spherical harmonics as directional eigensolutions from Sect. 4.7

$$\begin{aligned} \bigtriangleup _{\upvarphi ,\upzeta }Y_n^m=-\frac{n(n+1)}{r^2}\,Y_n^m \end{aligned}$$

(6.10)

and assume them to be a factor of the solution $p_n^m=R\,Y_n^m$ determining the value of $\bigtriangleup _{\upvarphi ,\upzeta }$ in $ (\bigtriangleup _\mathrm {r}+k^2+\bigtriangleup _{\upvarphi ,\upzeta })p_n^m=0$. We find a separated radial differential equation after insertion, multiplication by $\frac{r^2}{Y_n^m}$, and re-expressing the differentials $\frac{\partial }{\partial r}=k\frac{\partial }{\partial kr}$ and $\frac{\partial ^2}{\partial r^2}=k^2\frac{\partial ^2}{\partial (kr)^2}$

$$\begin{aligned} \left[ (kr)^2\frac{\partial ^2}{\partial (kr)^2}+2(kr)\frac{\partial }{\partial (kr)}+(kr)^2-n(n+1)\right] R&=0. \end{aligned}$$

(6.11)

Appendix A.6.4 shows how to get physical solutions for R of this so-called spherical Bessel differential equation. Spherical Hankel functions of the second kind $h_n^{(2)}(kr)$ are able to represent radiation (radially outgoing into every direction), consistently with Green’s function G; they diverge with an $(n+1)$-fold pole at $kr=0$, a physical behavior that would also be observed after spatially differentiating G, see Fig. 6.1. Spherical Bessel functions $j_n(kr)=\mathfrak {R}\{h_n^{(2)}(kr)\}$ are real-valued, converge everywhere, exhibit an n-fold zero at $kr=0$, and cannot represent radiation. Implementations typically rely on accurate standard libraries implementing cylindrical Bessel and Hankel functions:

$$\begin{aligned} j_n(kr)&=\sqrt{\frac{\pi }{2}\frac{1}{kr}}\,J_{n+\frac{1}{2}}(kr),&h_n^{(2)}(kr)&=\sqrt{\frac{\pi }{2}\frac{1}{kr}}\,H^{(2)}_{n+\frac{1}{2}}(kr). \end{aligned}$$

(6.12)
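Where no library for cylindrical Bessel and Hankel functions is at hand, the radial solutions of low order can also be evaluated directly. The sketch below is not the library-based route of Eq. (6.12); it instead assumes the well-known closed-form finite sum for $h_n^{(2)}(x)$, takes $j_n$ as its real part, and checks by central differences that the result satisfies the spherical Bessel differential equation (6.11). The function names and the step size are illustrative choices.

```python
import cmath
import math

def sph_hankel2(n, x):
    """Spherical Hankel function of the second kind, closed-form finite sum (x > 0)."""
    s = sum((-1j) ** m * math.factorial(n + m)
            / (math.factorial(m) * math.factorial(n - m) * (2 * x) ** m)
            for m in range(n + 1))
    return 1j ** (n + 1) * cmath.exp(-1j * x) / x * s

def sph_bessel(n, x):
    """Spherical Bessel function as the real part of h_n^(2); adequate for moderate n and x."""
    return sph_hankel2(n, x).real

def ode_residual(n, x, h=1e-4):
    """Left-hand side of Eq. (6.11) applied to h_n^(2), using central differences."""
    f_m, f_0, f_p = (sph_hankel2(n, x + d) for d in (-h, 0.0, h))
    d1 = (f_p - f_m) / (2 * h)
    d2 = (f_p - 2 * f_0 + f_m) / (h * h)
    return x * x * d2 + 2 * x * d1 + (x * x - n * (n + 1)) * f_0

x = 2.0
print(sph_bessel(0, x), math.sin(x) / x)    # both values agree: j_0(x) = sin(x)/x
print(abs(ode_residual(3, x)))              # close to zero
```

Taking the real part loses precision when the order is high and the argument small, because the imaginary part then dominates by many orders of magnitude; in that range a library routine or a power series for $j_n$ would be the safer choice.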

Wave spectra and spherical basis solutions. Any sound field evaluated at a radius r where the air is source-free and homogeneous in any direction can be represented by spherical basis functions for enclosed fields $j_n(kr)Y_n^m(\varvec{\theta })$ and radiating fields $h_n(kr)Y_n^m(\varvec{\theta })$, where $h_n$ abbreviates $h_n^{(2)}$

$$\begin{aligned} p&=\sum _{n=0}^\infty \sum _{m=-n}^n\bigl [b_{nm}j_n\left( kr\right) +c_{nm}h_n\left( kr\right) \bigr ]Y_n^m\left( \varvec{\theta }\right) . \end{aligned}$$

(6.13)

Here, $b_{nm}$ are the coefficients for incoming waves that pass through and emanate from radii larger than r and $c_{nm}$ are the coefficients of outgoing waves radiating from sources at radii smaller than r; the coefficients are called wave spectra of the incoming and outgoing waves, cf. [16].

Ambisonic plane-wave spectrum, plane wave. Plane waves only use the coefficients $b_{nm}$, while $c_{nm}=0$ in Eq. (6.13). The sum of incoming plane waves from all directions, whose amplitudes are given by the spherical harmonic coefficients $\chi _{nm}$ as a set of Ambisonic signals, is described by the incoming wave spectrum, see Appendix A.6.5, Eq. (A.119)

$$\begin{aligned} b_{nm}&=4\pi \,\mathrm {i}^n\;\chi _{nm}. \end{aligned}$$

(6.14)

Figure 6.2 shows a single plane wave incoming from the direction $\varvec{\theta }_\mathrm {s}$ represented by

$$\begin{aligned} b_{nm}=4\pi \,\mathrm {i}^n\;Y_n^m(\varvec{\theta }_\mathrm {s}) \end{aligned}$$

(6.15)

at four different time steps corresponding to phase shifts of $0^\circ $, $60^\circ $, $120^\circ $ and $180^\circ $ for the two wave lengths shown.
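Equations (6.13) and (6.15) can be checked against the plane wave of Eq. (6.8). The sketch below assumes orthonormal spherical harmonics, for which the sum over m collapses by the addition theorem to $\frac{2n+1}{4\pi }P_n(\cos \gamma )$, with $\gamma $ denoting the angle between $\varvec{\theta }_\mathrm {s}$ and the observation direction. The expansion then reads $\sum _n(2n+1)\,\mathrm {i}^n\,j_n(kr)\,P_n(\cos \gamma )$. The power series used for $j_n$, the truncation orders, and the test values are illustrative assumptions.

```python
import cmath

def sph_bessel(n, x, terms=60):
    """Spherical Bessel function j_n(x) by its power series (small to moderate x)."""
    term = x ** n
    for m in range(1, n + 1):
        term /= 2 * m + 1                     # leading term x^n / (2n+1)!!
    total = term
    for m in range(1, terms):
        term *= -(x * x / 2) / (m * (2 * n + 2 * m + 1))
        total += term
    return total

def legendre(n, z):
    """Legendre polynomial P_n(z) by the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, z
    for m in range(1, n):
        p_prev, p = p, ((2 * m + 1) * z * p - m * p_prev) / (m + 1)
    return p

def plane_wave_series(kr, cos_gamma, order):
    """Expansion of Eqs. (6.13) and (6.15), truncated at the given order."""
    return sum((2 * n + 1) * 1j ** n * sph_bessel(n, kr) * legendre(n, cos_gamma)
               for n in range(order + 1))

kr, cos_gamma = 3.0, 0.4                      # assumed test point
exact = cmath.exp(1j * kr * cos_gamma)        # plane wave of Eq. (6.8)
for order in (2, 4, 20):
    print(order, abs(plane_wave_series(kr, cos_gamma, order) - exact))
```

The printed error shrinks as the truncation order grows beyond kr, which reflects that a finite order describes the plane wave well only within a limited radius around the origin.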

6.5 Scattering by Rigid Higher-Order Microphone Surface

Higher-order Ambisonic microphone arrays are typically mounted on a rigid sphere of some radius $r=\mathrm {a}$, such as the Eigenmike EM32, see Fig. 6.3. The physical boundary of the rigid spherical surface is expressed as a vanishing radial component of the sound particle velocity. The radial sound particle velocity is obtained via the equation of motion Eq. (6.2) by differentiating Eq. (6.13) with respect to the radius. This requires evaluating the differentiated spherical radial solutions $j_n'(x)$ as well as $h_n'^{(2)}(x)$, which is implemented by $f'_n(x)=\frac{n}{x}f_n(x)-f_{n+1}(x)$ for either of the functions, cf. e.g. [16]. A sound-hard boundary condition at the radius $\mathrm {a}$ requires

$$\begin{aligned} v_\mathrm {r}\bigr |_{r=\mathrm {a}}&=\frac{\mathrm {i}}{\rho \,c}\sum _{n=0}^\infty \sum _{m=-n}^n\bigl [b_{nm}\,j_n'(kr) +c_{nm}\,h_n'^{(2)}(kr) \bigr ]_{r=\mathrm {a}}Y_n^m(\varvec{\theta })=0, \end{aligned}$$

which is fulfilled by a vanishing term in square brackets. The rigid boundary responds to incoming surround-sound by velocity-canceling outgoing waves $h_n'^{(2)}(k\mathrm {a})\,c_{nm}=-{j_n'(k\mathrm {a})}\,b_{nm}.$ The coefficients $\psi _{nm}$ yield the sound pressure in Fig. 6.4,

$$\begin{aligned} p&=\sum _{n=0}^\infty \sum _{m=-n}^n\psi _{nm}\,Y_n^m(\varvec{\theta }),&\text {with } \psi _{nm}&=\Bigl [j_n(kr)-h_n^{(2)}(kr){\frac{j_n'(k\mathrm {a})}{h_n'^{(2)}(k\mathrm {a})}}\Bigr ]_{r=\mathrm {a}}\,b_{nm}. \end{aligned}$$

(6.16)

The two terms of the bracket are typically further simplified by a common denominator and recognizing the Wronskian Eq. (A.97) in the numerator $\frac{j_n(x)h_n'(x)-j_n'(x)h_n(x)}{h_n'(x)}=\frac{\mathrm {i}}{x^2h_n'(x)}$

$$\begin{aligned} \psi _{nm}|_{r=\mathrm {a}}&=\frac{\mathrm {i}}{(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})}\,b_{nm}. \end{aligned}$$

(6.17)

Relation of recorded sound pressure to Ambisonic signal. The scattering equation relates the recorded sound pressure expanded in spherical harmonics to the Ambisonic signal of the surround sound scene, see frequency responses in Fig. 6.5,

$$\begin{aligned} \psi _{nm}|_{r=\mathrm {a}}&=\frac{4\pi \,\mathrm {i}^{n+1}}{(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})}\,\chi _{nm}. \end{aligned}$$

(6.18)

It is formally convenient that as soon as the sound pressure is given in terms of its spherical harmonic coefficient signals $\psi _{nm}$, the Ambisonic signals $\chi _{nm}$ of a concentric playback system are obviously just an inversely filtered version thereof, with no need for further unmixing/matrixing.

Recognizable from Fig. 6.6 and following our intuition, waves of lengths larger than the diameter $2\mathrm {a}$ of the sphere will only weakly map to complicated high-order patterns. It is therefore easily understood that the transfer function $\mathrm {i}^{n+1}[(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})]^{-1}$ attenuates the reception of high-order Ambisonic signals at low frequencies, see Fig. 6.5.
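The attenuation of the higher orders at low frequencies can be made tangible by evaluating the magnitude of the transfer function numerically. The sketch below computes $\bigl |(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})\bigr |^{-1}$ in dB, i.e. Eq. (6.18) without the factor $4\pi \,\mathrm {i}^{n+1}$, using the derivative recurrence quoted above. The closed-form sum for $h_n^{(2)}$, the radius of 4.2 cm, the speed of sound, and the frequencies are assumptions made for illustration.

```python
import cmath
import math

def sph_hankel2(n, x):
    """Spherical Hankel function of the second kind, closed-form finite sum (x > 0)."""
    s = sum((-1j) ** m * math.factorial(n + m)
            / (math.factorial(m) * math.factorial(n - m) * (2 * x) ** m)
            for m in range(n + 1))
    return 1j ** (n + 1) * cmath.exp(-1j * x) / x * s

def sph_hankel2_deriv(n, x):
    """Derivative via the recurrence f_n'(x) = n/x f_n(x) - f_{n+1}(x)."""
    return n / x * sph_hankel2(n, x) - sph_hankel2(n + 1, x)

def rigid_sphere_gain_db(n, f, a=0.042, c=343.0):
    """Magnitude in dB of 1 / [(ka)^2 h_n'^(2)(ka)] at frequency f in Hz."""
    ka = 2 * math.pi * f / c * a
    return -20 * math.log10(abs(ka * ka * sph_hankel2_deriv(n, ka)))

# one row per frequency, one column per order n = 0..4
for f in (100.0, 1000.0, 5000.0):
    print(f, [round(rigid_sphere_gain_db(n, f), 1) for n in range(5)])
```

At the lowest frequency the zeroth order stays near 0 dB while every further order is attenuated much more strongly than the previous one; towards high frequencies the orders move closer together.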

6.6 Higher-Order Microphone Array Encoding

The block diagram of Ambisonic encoding of higher-order microphone array signals is shown in Fig. 6.7. The first processing step decomposes the pressure samples $\varvec{p}(t) $ from the microphone array into their spherical harmonic coefficients $\varvec{\psi }_\mathrm {N}(t)$: it determines to which amount the samples contain omnidirectional, figure-of-eight, and other spherical harmonic patterns, up to the order that the microphone arrangement allows to decompose. The frequency-independent matrix $(\varvec{Y}_\mathrm {N}^\mathrm {T})^\dagger $ does the conversion. It is the left-inverse to the spherical harmonics sampled at the microphone positions, as shown in the upcoming section.

The second step then sharpens the sound pressure image to an Ambisonic signal by filtering the spherical harmonic coefficient signals. The basic relation between sound pressure coefficients and Ambisonic signals is given in Eq. (6.18) and describes a filter for every coefficient signal, differing only in filter characteristics for different spherical harmonic orders. Robustness to noise and to errors in microphone matching and positioning is key here, and it is only achieved by the careful design of these filters, as shown in the sections below. The design considers a gradually increasing sharpening over frequency, for which it moreover employs a filter bank with separate, max-$\varvec{r}_\mathrm {E}$ weighted and E normalized bands, in order to provide (i) limitation of noise and errors, (ii) a frequency response perceived as flat, and (iii) optimal suppression of the sidelobes.

6.7 Discrete Sound Pressure Samples in Spherical Harmonics

To determine the Ambisonic signals $\chi _{nm}$, we obviously need to find $\psi _{nm}$ based on all sound pressure samples $p(\varvec{\uptheta }_i)$ recorded by the microphones distributed on the rigid-sphere array. To accomplish this, we set up a system of model equations equating the pressure samples to the unknown coefficients $\psi _{nm}$ expanded over the spherical harmonics $Y_n^m(\varvec{\uptheta }_i)$ sampled at every microphone position. A vector and matrix notation $\varvec{p}=[p(\varvec{\uptheta }_i)]_i$ and $\varvec{Y}_\mathrm {N}^\mathrm {T}= [\varvec{y}(\varvec{\uptheta }_i)^\mathrm {T}]_{i,nm}$ is helpful

$$\begin{aligned} \begin{bmatrix} p(\varvec{\uptheta }_1)\\\vdots \\p(\varvec{\uptheta }_\mathrm {M})\end{bmatrix}&=\begin{bmatrix} Y_0^0(\varvec{\theta }_1)&\dots&Y_\mathrm {N}^\mathrm {N}(\varvec{\theta }_1)\\ \vdots&\vdots&\vdots \\ Y_0^0(\varvec{\theta }_\mathrm {M})&\dots&Y_\mathrm {N}^\mathrm {N}(\varvec{\theta }_\mathrm {M}) \end{bmatrix} \begin{bmatrix} \psi _{00}\\ \vdots \\ \psi _{\mathrm {NN}} \end{bmatrix}\nonumber \\ \varvec{p}_\mathrm {N}&=\varvec{Y}_\mathrm {N}^\mathrm {T}\,\varvec{\psi }_\mathrm {N}. \end{aligned}$$

(6.19)

Left inverse (MMSE). The equation can be (pseudo-)inverted if the matrix $\varvec{Y}_\mathrm {N}$ is well conditioned. Typically more microphones are used than coefficients searched $\mathrm {M}\ge (\mathrm {N}+1)^2$. Inversion is a matter of mean-square error minimization: As the $\mathrm {M}$ dimensions may contain more degrees of freedom than $(\mathrm {N}+1)^2$, the coefficient vector $\varvec{\psi }_\mathrm {N}$ giving the closest model $\varvec{p}_\mathrm {N}$ to the measurement $\varvec{p}$ is searched,

$$\begin{aligned} \min _{\varvec{\psi }_\mathrm {N}}&\Vert \varvec{e}\Vert ^2,&\text {with } \varvec{e}&=\varvec{p}_\mathrm {N}-\varvec{p}=\varvec{Y}_\mathrm {N}^\mathrm {T}\,\varvec{\psi }_\mathrm {N}-\varvec{p}. \end{aligned}$$

(6.20)

The minimum-mean-square-error (MMSE) solution is, see Appendix A.4, Eq. (A.65),

$$\begin{aligned} \varvec{\psi }_\mathrm {N}&=(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}\varvec{Y}_\mathrm {N}\;\varvec{p}=(\varvec{Y}_\mathrm {N}^\mathrm {T})^\dagger \;\varvec{p}. \end{aligned}$$

(6.21)

The resulting left inverse $(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}\varvec{Y}_\mathrm {N}$ inverts the thin matrix $\varvec{Y}_\mathrm {N}^\mathrm {T}$ from the left. $(\varvec{Y}_\mathrm {N}^\mathrm {T})^\dagger $ symbolizes the pseudo inverse; it is left-inverse for thin matrices.

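If the microphones are arranged in a t-design and the order $\mathrm {N}$ is chosen suitably, then the transpose matrix times $\frac{4\pi }{\mathrm {M}}$ is equivalent to the left inverse, $(\varvec{Y}_\mathrm {N}^\mathrm {T})^\dagger =\frac{4\pi }{\mathrm {M}}\,\varvec{Y}_\mathrm {N}$, with $\mathrm {M}$ denoting the number of microphones. A more thorough discussion on spherical point sets can be found in [17‐19].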
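The t-design shortcut can be tried out on the smallest useful case. The sketch below assumes six microphones at the vertices of an octahedron (a spherical 3-design), first order $\mathrm {N}=1$, and orthonormal real spherical harmonics in an assumed channel ordering; it synthesizes pressure samples according to Eq. (6.19) and recovers the coefficients with the transpose scaled by $4\pi $ over the number of microphones, without any matrix inversion. The test coefficients are arbitrary.

```python
import math

def real_sh_order1(x, y, z):
    """Orthonormal real spherical harmonics up to order 1 for a unit vector (assumed convention)."""
    c0 = 1 / math.sqrt(4 * math.pi)
    c1 = math.sqrt(3 / (4 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]

# six microphone directions at the vertices of an octahedron
mics = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
Y_T = [real_sh_order1(*m) for m in mics]     # M x (N+1)^2 matrix, one row per microphone
M = len(mics)

def encode_t_design(p):
    """psi = (4 pi / M) Y_N p: scaled transpose in place of the left inverse."""
    return [4 * math.pi / M * sum(Y_T[i][q] * p[i] for i in range(M)) for q in range(4)]

psi_true = [0.7, -0.2, 0.5, 0.1]             # arbitrary test coefficients
# pressure samples modeled by Eq. (6.19): p = Y_N^T psi
p = [sum(Y_T[i][q] * psi_true[q] for q in range(4)) for i in range(M)]
psi_est = encode_t_design(p)
print([round(v, 12) for v in psi_est])       # recovers psi_true
```

For arrangements that are not t-designs, the general left inverse of Eq. (6.21) is required instead; a numerical library routine for the pseudo inverse would typically be used for that purpose.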

The maximum determinant points [20] are a particular kind of critical directional sampling scheme that allows using exactly as few microphones $\mathrm {M}=(\mathrm {N}+1)^2$ as spherical harmonic coefficients obtained, yielding a well-conditioned square matrix $\varvec{Y}_\mathrm {N}$, so that it can be inverted directly without left/pseudo-inversion. The 25 maximum-determinant points for $\mathrm {N}=4$ are used in the simulation example below.³

Finite-order assumption and spatial aliasing. An important implication of estimating $\psi _{nm}$ is that we need to assume that the distribution of the sound pressure is of limited spherical harmonic order on the measurement surface. This could be done by restricting the frequency range, as high-order harmonics are attenuated well enough below suitable frequency limits, cf. Fig. 6.5. However, low-pass filtered signals are unacceptable in practice. Instead, one has to accept spatial aliasing at high frequencies, i.e. directional mapping errors and direction-specific comb filters. Figure 6.8 shows spatial aliasing of $\varvec{\psi }_\mathrm {N}=(\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}\,\varvec{p}$ in the angular domain $p=\sum \psi _{nm}Y_n^m$.

6.8 Regularizing Filter Bank for Radial Filters

The filters $\mathrm {i}^{n+1}\bigl [(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})\bigr ]^{-1}$ of Fig. 6.5 exhibit an $n\mathrm {th}$-order zero at 0 Hz, $k\mathrm {a}=0$. To retrieve the Ambisonic signals $\chi _{nm}$ from the sound pressure signals $\psi _{nm}$, their inverse is required, which would have an n-fold (unstable) pole at 0 Hz. Moreover, microphone self-noise and array imperfections cause erroneous signals that are louder than the acoustically expected, $n\mathrm {th}$-order vanishing signals around 0 Hz, so that the inverse filter shapes cause an excessive boost of erroneous signals unless implemented with precaution. Filters of the different orders n must therefore be stabilized by high-pass slopes of at least the order n, see also [6, 9, 21‐25]. With $(n+1)\mathrm {th}$-order high-pass slopes, see Fig. 6.9, such errors are effectively cut off by first-order high-pass slopes; exemplary cut-on frequencies of 90, 680, 1650, 2600 Hz for the Ambisonic orders 1, 2, 3, 4 yield a noise boost of at most 20 dB for a $4\mathrm {th}$-order microphone with $\mathrm {a}=4.2$ cm.

However, just cutting on the frequencies of each order is not enough: every cut-on frequency causes a noticeable loudness drop below due to the discarded signal contributions. It is better to design a filter bank with crossovers instead, which allows compensation for the loudness loss in every band. A zero-phase, $n\mathrm {th}$-order Butterworth high-pass response is defined by $H_\mathrm {hi}=\frac{\omega ^n}{1+\omega ^n}$, with $\omega $ normalized to the cut-on frequency, and it is amplitude-complementary to the low pass $H_\mathrm {lo}=\frac{1}{1+\omega ^n}$, so that $H_\mathrm {hi}+H_\mathrm {lo}=1$.

Using this filter type, the filter bank in Fig. 6.10 can be constructed as follows: The band-pass filters $H_b(\omega )$ are composed of a $(b+1)\mathrm {th}$-order high- and $(b+2)\mathrm {th}$-order low-pass skirt at $\omega _b$, and $\omega _{b+1}$, respectively, except for the band $b=0$ (low-pass) and $b=\mathrm {N}$ (high-pass)

$$\begin{aligned} \hat{H}_0(\omega )&=\frac{1}{1+\bigl (\frac{\omega }{\omega _{1}}\bigr )^{2}} ,&\hat{H}_b(\omega )&=\frac{\bigl (\frac{\omega }{\omega _{b}}\bigr )^{b+1}}{1+\bigl (\frac{\omega }{\omega _{b}}\bigr )^{b+1}}\frac{1}{1+\bigl (\frac{\omega }{\omega _{b+1}}\bigr )^{b+2}} ,&\hat{H}_\mathrm {N}(\omega )&=\frac{\bigl (\frac{\omega }{\omega _{\mathrm {N}}}\bigr )^{\mathrm {N}+1}}{1+\bigl (\frac{\omega }{\omega _{\mathrm {N}}}\bigr )^{\mathrm {N}+1}}. \end{aligned}$$

(6.22)

To make the bands perfectly reconstructing, filters are normalized by the sum response

$$\begin{aligned} H_b(\omega )=\frac{\hat{H}_b(\omega )}{\sum _{b'=0}^\mathrm {N} \hat{H}_{b'}(\omega )}. \end{aligned}$$

(6.23)

By adjusting the cut-on frequencies $\omega _b$ of the different orders $b=1,\dots ,\mathrm {N}$, the noise and mapping behavior of the microphone array is adjusted; only the zeroth order is present in every band down to 0 Hz.

This filter bank design moreover allows adjusting loudness and sidelobe suppression in every frequency band separately.
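The construction of Eqs. (6.22) and (6.23) translates into a few lines of code. The following sketch evaluates the zero-phase band responses at a single frequency; it uses the exemplary cut-on frequencies of 90, 680, 1650, 2600 Hz mentioned above, and the function name and argument layout are illustrative assumptions.

```python
def filter_bank(f, cut_on_hz):
    """Band responses H_0..H_N of Eqs. (6.22) and (6.23) at the frequency f in Hz.

    cut_on_hz = [f_1, ..., f_N] holds the cut-on frequencies of the orders 1..N.
    """
    N = len(cut_on_hz)
    w = [None] + list(cut_on_hz)      # w[b] is the cut-on frequency of band b
    raw = []
    for b in range(N + 1):
        # (b+1)th-order high-pass skirt at w[b]; band 0 has none (low-pass band)
        hi = 1.0 if b == 0 else (f / w[b]) ** (b + 1) / (1 + (f / w[b]) ** (b + 1))
        # (b+2)th-order low-pass skirt at w[b+1]; band N has none (high-pass band)
        lo = 1.0 if b == N else 1 / (1 + (f / w[b + 1]) ** (b + 2))
        raw.append(hi * lo)
    total = sum(raw)                  # sum response used for normalization, Eq. (6.23)
    return [h / total for h in raw]

cut_on = [90.0, 680.0, 1650.0, 2600.0]
for f in (20.0, 300.0, 1000.0, 2000.0, 10000.0):
    bands = filter_bank(f, cut_on)
    print(f, [round(h, 3) for h in bands], round(sum(bands), 12))
```

Since the frequency only enters as a ratio to the cut-on frequencies, it makes no difference whether angular frequencies or frequencies in Hz are used, as long as both are given in the same unit. By construction, the normalized band responses add up to one at every frequency.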

6.9 Loudness-Normalized Sub-band Side-Lobe Suppression

The filter bank design shown above would only yield Ambisonic signals whose order increases with the frequency band. This variation of the order calls for an individual max-$\varvec{r}_\mathrm {E}$ sidelobe suppression in every band. Moreover, Ambisonic signals of different orders are differently loud, so diffuse-field equalization of the E measure is also desirable in every band.

To fulfill the above constraints, we propose to use the following set of FIR filter responses as given in [26, 27], which are modified by a filter bank employing diffuse-field normalized max-$\varvec{r}_\mathrm {E}$-weights in separate frequency bands $b=0,\dots ,\mathrm {N}$, cf. Fig. 6.11, with the $n\mathrm {th}$ order discarded for the bands $b<n$:

$$\begin{aligned} \rho _n(\omega )&=\left[ \sum _{b=n}^{\mathrm {N}}a_{n,b}\,H_b(\omega )\right] \,\mathrm {i}^{-n-1}\,(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})\,e^{\mathrm {i}k\mathrm {a}}. \end{aligned}$$

(6.24)

Here, $e^{\mathrm {i}k\mathrm {a}}$ removes the linear phase of $h_n'^{(2)}$, and $a_{n,b}$ is the set of diffuse-field ($\sqrt{E}$) equalized max-$\varvec{r}_\mathrm {E}$ weights for the band b in which the Ambisonic orders retrieved are $0\le n\le b$

$$\begin{aligned} a_{n,b}&={\left\{ \begin{array}{ll} P_n\bigl (\cos \frac{137.9^\circ }{b+1.51}\bigr )\;\sqrt{\frac{\sum _{n=0}^\mathrm {N}(2n+1)\bigl [P_n\bigl (\cos \frac{137.9^\circ }{\mathrm {N}+1.51}\bigr )\bigr ]^2}{\sum _{n=0}^b(2n+1)\bigl [P_n\bigl (\cos \frac{137.9^\circ }{b+1.51}\bigr )\bigr ]^2}} , &{} \text {for }n\le b\\ 0, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

(6.25)

Figure 6.12 shows the polar patterns of the corresponding direction-spread functions.
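The weights of Eq. (6.25) are easy to tabulate. The sketch below evaluates $a_{n,b}$ with Legendre polynomials computed by the three-term recurrence and reports the diffuse-field energy $\sum _n(2n+1)\,a_{n,b}^2$ of every band; the function names are illustrative and $\mathrm {N}=4$ is assumed as in the simulation example.

```python
import math

def legendre(n, z):
    """Legendre polynomial P_n(z) by the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, z
    for m in range(1, n):
        p_prev, p = p, ((2 * m + 1) * z * p - m * p_prev) / (m + 1)
    return p

def max_re_energy(order):
    """Sum of (2n+1) P_n^2 for the max-rE weights of the given order."""
    z = math.cos(math.radians(137.9 / (order + 1.51)))
    return sum((2 * n + 1) * legendre(n, z) ** 2 for n in range(order + 1))

def max_re_weights(b, N):
    """Weights a_{n,b} of Eq. (6.25) for n = 0..N in the band b (zero for n > b)."""
    z_b = math.cos(math.radians(137.9 / (b + 1.51)))
    gain = math.sqrt(max_re_energy(N) / max_re_energy(b))   # diffuse-field equalization
    return [legendre(n, z_b) * gain if n <= b else 0.0 for n in range(N + 1)]

N = 4
for b in range(N + 1):
    a = max_re_weights(b, N)
    energy = sum((2 * n + 1) * a[n] ** 2 for n in range(N + 1))
    print(b, [round(v, 3) for v in a], round(energy, 6))
```

The energy in the last column is the same for every band, which is what the square-root factor in Eq. (6.25) is meant to achieve, while the number of non-zero weights grows with the band index.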

For the implementation of $\rho _n(\omega )$ by fast block filtering, $\omega =2\pi \,f$ and $k=\omega /c$ are uniformly sampled with frequency, and the inverse discrete Fourier transform yields the associated impulse responses (attention: the value at 0 Hz must be replaced for stable results, and cyclic time-domain shifts and windows are necessary).

The direction-spread function of a plane-wave sound pressure mapped to a directional Ambisonic signal becomes frequency-dependent as shown in Fig. 6.13, and it has minimal side lobes.

6.10 Influence of Gain Matching, Noise, Side-Lobe Suppression

Gain matching between the microphones is typically not more accurate than 0.5 dB. As a result, the physically dominant omnidirectional signal leaks into the higher-order signals through directionally random gain variations, whereas acoustically, higher-order components are expected to be weak and to require amplification. The effect on mapping is equivalent to that of microphone self-noise; however, gain mismatch yields a correlated signal exciting the microphones, whereas self-noise yields low-frequency noise.

If the cut-on frequencies of the regularization filters were set to 50, 160, 500, 1600 Hz and sidelobe suppression were turned off for testing, one would get the poor image in Fig. 6.14a, where high-order signals at low frequencies are highly boosted.

If a noise-free case is assumed, and only the max-$\varvec{r}_\mathrm {E}$ side-lobe suppression of the highest band is used for all bands, one gets the image in Fig. 6.14b, which improves with individual max-$\varvec{r}_\mathrm {E}$ weights in Fig. 6.14c.

Self-noise behavior. Assuming that self-noise of the microphones is uncorrelated, it will also remain uncorrelated and of equal strength after decomposing the $\mathrm {M}$ microphone signals $p_i=\mathcal {N}$ into the $(\mathrm {N}+1)^2$ spherical harmonic coefficient signals $\psi _{nm}=\frac{(\mathrm {N}+1)^2}{\mathrm {M}}\mathcal {N}$, if $\mathrm {M}\approx (\mathrm {N}+1)^2$ and the microphone arrangement permits a well-conditioned pseudo inversion $\varvec{Y}_\mathrm {N}^\dagger $. The spectral change of the microphone self noise due to the radial filters $\rho _n(\omega )$ can be described by the noise of the $(2n+1)$ signals of the same order, amplified by $|\rho _n(\omega )|^2$, in comparison to the zeroth-order signal:

$$\begin{aligned} |G(\omega )|^2&=\frac{ \sum _{n=0}^\mathrm {N}(2n+1)|\rho _n(\omega )|^2}{ |(k\mathrm {a})^2\,h_0'^{(2)}(k\mathrm {a})|^2}. \end{aligned}$$

(6.26)

Figure 6.15 analyzes the noise amplification for the simulation example (max-$\varvec{r}_\mathrm {E}$ weighting in each sub-band, $\mathrm {a}=4.2$ cm) and shows the dependency on exemplary cut-on frequencies configured to tune the filter bank to 0, 5, 10, 15, and 20 dB noise boosts. The trade-off here is: the more noise boost one can allow, the more directional resolution one gets, see Fig. 6.16.

Open measurement data (SOFA format) characterizing the directivity patterns of the 32 Eigenmike em32 transducers are provided at http://phaidra.kug.ac.at/o:69292. They are measured on a $12^\circ \times 11.25^\circ $ azimuth $\times $ zenith grid, yielding $480\times 256$ pt impulse responses (480 directions, 256 samples each) for each of the 32 transducers.

6.11 Practical Free-Software Examples

6.11.1 Eigenmike em32 Encoding Using mcfx and IEM Plug-In Suites

We give a practical signal processing example for the Eigenmike em32 that is applicable, e.g., in digital audio workstations. First, the 32 signals are encoded by matrix multiplication (IEM MatrixMultiplier), cf. Fig. 6.17a, yielding 25 fourth-order signals. The preset (json file) is provided online at http://phaidra.kug.ac.at/o:79231. The radial filtering that sharpens the surround sound image uses mcfx-convolver, see Fig. 6.17b, with 25 SISO filters, one for each Ambisonic signal, using the 5 different filter curves for the orders $n=0,\dots ,4$ as defined above. The convolver presets (wav files and config files for mcfx-convolver) are provided online at http://phaidra.kug.ac.at/o:79231 and are available for the different noise boosts 0, 5, 10, 15, 20 dB.
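The two processing stages described above (matrix encoding, then one SISO filter per Ambisonic signal) can be sketched offline as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the channel ordering is assumed to be ACN (so that channel index $q$ belongs to order $n=\lfloor \sqrt{q}\rfloor $), and the encoder matrix and FIR filters in the usage part are random placeholders, not the contents of the published presets.

```python
import numpy as np

def acn_orders(max_order):
    """Order n of every channel, assuming ACN ordering: (2n+1) channels per order."""
    return [n for n in range(max_order + 1) for _ in range(2 * n + 1)]

def encode_and_filter(mic_signals, encoder, radial_firs):
    """Encode transducer signals and apply one radial filter per order.

    mic_signals: (M, T) array of transducer signals, e.g. M = 32
    encoder:     ((N+1)^2, M) encoding matrix, e.g. 25 x 32
    radial_firs: (N+1, L) array, one FIR filter per order n = 0..N
    """
    # Stage 1: matrix multiplication yields the (N+1)^2 Ambisonic signals.
    sh_signals = encoder @ mic_signals
    orders = acn_orders(radial_firs.shape[0] - 1)
    assert len(orders) == sh_signals.shape[0]
    # Stage 2: each signal is convolved with the filter of its own order.
    filtered = [np.convolve(sh_signals[ch], radial_firs[n])
                for ch, n in enumerate(orders)]
    return np.stack(filtered)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mics = rng.standard_normal((32, 480))   # placeholder recordings
    enc = rng.standard_normal((25, 32))     # placeholder encoder matrix
    firs = rng.standard_normal((5, 64))     # placeholder radial filters
    print(encode_and_filter(mics, enc, firs).shape)
```

Only five distinct filters are stored, and the order lookup distributes them over the 25 channels, mirroring the 25 SISO filters built from 5 filter curves.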

As found in [28], the em32 transducers exhibit a frequency response that favors low frequencies and attenuates high frequencies. This behavior is sufficiently well equalized in practice using two parametric shelving filters, a low shelf at 500 Hz with a gain of $-5$ dB, and a high shelf at 5 kHz using a gain of $+5$ dB, see Fig. 6.18.
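To make the equalization above concrete, the sketch below builds the two shelving filters as second-order sections and evaluates their combined magnitude response. The text does not specify the filter topology, so the widely used audio-EQ-cookbook biquad formulas with shelf slope $S=1$ and a sampling rate of 48 kHz are assumptions of this example, as are all function names.

```python
import cmath
import math

def shelf_biquad(kind, f0, gain_db, fs):
    """Return normalized (b0, b1, b2, a1, a2) of a 'low' or 'high' shelf."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    c = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt(2.0)  # shelf slope S = 1 (assumed)
    k = 2.0 * math.sqrt(A) * alpha
    if kind == "low":
        b = (A * ((A + 1) - (A - 1) * c + k),
             2 * A * ((A - 1) - (A + 1) * c),
             A * ((A + 1) - (A - 1) * c - k))
        a = ((A + 1) + (A - 1) * c + k,
             -2 * ((A - 1) + (A + 1) * c),
             (A + 1) + (A - 1) * c - k)
    elif kind == "high":
        b = (A * ((A + 1) + (A - 1) * c + k),
             -2 * A * ((A - 1) + (A + 1) * c),
             A * ((A + 1) + (A - 1) * c - k))
        a = ((A + 1) - (A - 1) * c + k,
             2 * ((A - 1) - (A + 1) * c),
             (A + 1) - (A - 1) * c - k)
    else:
        raise ValueError("kind must be 'low' or 'high'")
    # Normalize so that the leading denominator coefficient is 1.
    return tuple(x / a[0] for x in b) + tuple(x / a[0] for x in a[1:])

def response_db(coeffs, f, fs):
    """Magnitude response in dB of one biquad at frequency f."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^{-1} on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (1.0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))

def em32_eq_db(f, fs=48000.0):
    """Combined response: low shelf 500 Hz / -5 dB plus high shelf 5 kHz / +5 dB."""
    low = shelf_biquad("low", 500.0, -5.0, fs)
    high = shelf_biquad("high", 5000.0, 5.0, fs)
    return response_db(low, f, fs) + response_db(high, f, fs)

if __name__ == "__main__":
    for f in (50.0, 500.0, 1600.0, 5000.0, 16000.0):
        print(f, round(em32_eq_db(f), 2))
```

In a digital audio workstation, the same settings would simply be entered into any parametric equalizer; the code only shows how the two shelves combine into a gently rising tilt.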

6.11.2 SPARTA Array2SH

The SPARTA suite by Aalto University includes the Array2SH plug-in shown in Fig. 6.19 to convert the transducer signals of a microphone array into Ambisonics. It provides both the encoding of the signals and the calculation and application of radial-focusing filters based on the geometry of the array. It supports rigid and open arrays and comes with presets for several arrays, such as the Eigenmike em32. The plug-in allows adjusting the radial filters in terms of regularization type and maximum gain. The Reg. Type called Z-Style corresponds to the linear-phase design of Sect. 6.9.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Typical constants are: density $\rho =1.2$ kg/m$^3$, speed of sound $c=343$ m/s.

This is because, strictly speaking, an entire plane-wave sound field is unphysical and of infinite energy: either the exhaustive in-phase vibration of an infinite plane is required, or an infinite-amplitude point-source infinitely far away is required with infinite anticipation $t_\mathrm {s}\rightarrow +\infty $ (non-causal).

md04.0025 on https://web.maths.unsw.edu.au/~rsw/Sphere/Images/MD/md_data.html.

1. J. Daniel, Evolving views on HOA: from technological to pragmatic concerns, in Proceedings of the 1st Ambisonics Symposium (Graz, 2009)

2. G.W. Elko, R.A. Kubli, J. Meyer, Audio system based on at least second-order eigenbeams, in PCT Patent, vol. WO 03/061336, no. A1 (2003)

3. G.W. Elko, Superdirectional microphone arrays, in Acoustic Signal Processing for Telecommunication, ed. by J. Benesty, S.L. Gay (Kluwer Academic Publishers, Dordrecht, 2000)

4. J. Meyer, G. Elko, A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield, in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002. Proceedings (ICASSP'02), vol. 2 (Orlando, 2002)

5. G.W. Elko, Differential microphone arrays, in Audio Signal Processing for Next-Generation Multimedia Communication Systems, ed. by Y. Huang, J. Benesty (Springer, Berlin, 2004)

6. J. Daniel, S. Moreau, Further study of sound field coding with higher order ambisonics, in 116th AES Convention (2004)

7. S.-O. Petersen, Localization of sound sources using 3d microphone array, M. Thesis, University of South Denmark, Odense (2004). www.oscarpetersen.dk/speciale/Thesis.pdf

8. B. Rafaely, Analysis and design of spherical microphone arrays. IEEE Trans. Speech Audio Process. (2005)

9. S. Moreau, Étude et réalisation d’outils avancés d’encodage spatial pour la technique de spatialisation sonore Higher Order Ambisonics: microphone 3d et contrôle de distance, Ph.D. Thesis, Université du Maine (2006)

10. H. Teutsch, Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition (Springer, Berlin, 2007)

11. Z. Li, R. Duraiswami, Flexible and optimal design of spherical microphone arrays for beamforming. IEEE Trans. ASLP 15(2) (2007)

12. W. Song, W. Ellermeier, J. Hald, Using beamforming and binaural synthesis for the psychoacoustical evaluation of target sources in noise. J. Acoust. Soc. Am. 123(2) (2008)

13. B. Rafaely, Fundamentals of Spherical Array Processing, 2nd edn. (Springer, Berlin, 2019)

14. ISO 31-11:1978, Mathematical signs and symbols for use in physical sciences and technology (1978)

15. ISO 80000-2, Quantities and units, Part 2: Mathematical signs and symbols to be used in the natural sciences and technology (2009)

16. E.G. Williams, Fourier Acoustics (Academic, Cambridge, 1999)

17. B. Rafaely, B. Weiss, E. Bachmat, Spatial aliasing in spherical microphone arrays. IEEE Trans. Signal Process. 55(3) (2007)

18. F. Zotter, Sampling strategies for acoustic holography/holophony on the sphere, in NAG-DAGA, Rotterdam (2009)

19. P. Lecomte, P.-A. Gauthier, C. Langrenne, A. Berry, A. Garcia, A fifty-node Lebedev grid and its applications to ambisonics. J. Audio Eng. Soc. 64(11) (2016)

20. I.H. Sloan, R.S. Womersley, Extremal systems of points and numerical integration on the sphere. Adv. Comput. Math. 21, 107–125 (2004)

21. B. Bernschütz, C. Pörschmann, S. Spors, Soft-limiting bei modaler amplitudenverstärkung bei sphärischen mikrofonarrays im plane-wave decomposition verfahren, in Fortschritte der Akustik - DAGA (2011)

22. T. Rettberg, S. Spors, On the impact of noise introduced by spherical beamforming techniques on data-based binaural synthesis, in Fortschritte der Akustik - DAGA (2013)

23. T. Rettberg, S. Spors, Time-domain behaviour of spherical microphone arrays at high orders, in Fortschritte der Akustik - DAGA (2014)

24. B. Rafaely, Fundamentals of Spherical Array Processing, 1st edn. (Springer, Berlin, 2015)

25. D.L. Alon, B. Rafaely, Spatial decomposition by spherical array processing, in Parametric Time-Frequency Domain Spatial Audio, ed. by V. Pulkki, S. Delikaris-Manias, A. Politis (Wiley, New Jersey, 2017)

26. S. Lösler, F. Zotter, Comprehensive radial filter design for practical higher-order ambisonic recording, in Fortschritte der Akustik - DAGA (Nürnberg, 2015)

27. F. Zotter, M. Zaunschirm, M. Frank, M. Kronlachner, A beamformer to play with wall reflections: The icosahedral loudspeaker. Comput. Music J. 41(3) (2017)

28. F. Zotter, M. Frank, C. Haar, Spherical microphone array equalization for ambisonics, in Fortschritte der Akustik - DAGA (Nürnberg, 2015)

Title: Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)
Authors: Franz Zotter, Matthias Frank
Publisher: Springer International Publishing
Book: Ambisonics
Print ISBN: 978-3-030-17206-0

Electronic ISBN: 978-3-030-17207-7

Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-17207-7_6