WO1994019791A1

WO1994019791A1 - Improved filter for use in audio compression and decompression systems

Info

Publication number: WO1994019791A1
Application number: PCT/US1994/002026
Authority: WO
Inventors: Sriram Jayasimha
Original assignee: Aware, Inc.
Priority date: 1993-02-18
Filing date: 1994-02-16
Publication date: 1994-09-01
Also published as: AU6249994A; IL108683A0

Abstract

A method and apparatus for improving the computational efficiency of perfect reconstruction audio filter banks is disclosed. The invention implements an analysis filter bank (350) for generating (324) a plurality of sub-band signals from an input (320) audio signal utilizing a modified windowing transformation followed by a discrete cosine transform. Since the discrete cosine transform can be performed in a computationally efficient manner, a net savings in the computational complexity of the system is obtained relative to prior art filter systems which use a cosine modulation of a set of polyphase components obtained from a windowing transformation. The synthesis filter bank which recombines the sub-band signals to regenerate the original audio signal is implemented with the aid of the inverse discrete cosine transformation, thereby providing a significant computational savings.

Description

IMPROVED FILTER FOR USE IN AUDIO COMPRESSION AND DECOMPRESSION SYSTEMS

Field of the Invention

The present invention relates to audio compression and decompression systems, and more particularly, to an improved filter design which reduces the computational workload inherent in audio compression and decompression systems.

Background of the Invention

While digital audio recordings provide many advantages over analog systems, the data storage requirements for high-fidelity recordings are substantial. A high-fidelity recording typically requires more than one million bits per second of playback time. The total storage needed for even a short recording is too high for many computer applications. In addition, the digital bit rates inherent in non-compressed high-fidelity audio recordings make the transmission of such audio tracks over limited bandwidth transmission systems difficult. Hence, systems for compressing audio sound tracks to reduce the storage and bandwidth requirements are in great demand.

One class of audio compression systems divides the sound track into a series of segments. Over the time interval represented by each segment, the sound track is analyzed to determine the signal components in each of a plurality of frequency bands. The measured components are then replaced by approximations requiring fewer bits to represent, but which preserve features of the sound track that are important to a human listener. At the receiver, an approximation to the original sound track is generated by reversing the analysis process with the approximations in place of the original signal components.

The analysis and synthesis operations are normally carried out with the aid of perfect, or near perfect, reconstruction filter banks. The systems in question include an analysis filter bank which generates a set of decimated sub-band outputs from a segment of the sound track. Each decimated sub-band output represents the signal in a predetermined frequency range. The inverse operation is carried out by a synthesis filter bank which accepts a set of decimated sub-band outputs and generates therefrom a segment of audio sound track. In practice, the synthesis and analysis filter banks are implemented on digital computers which may be general purpose computers or special computers designed to more efficiently carry out the operations. If the analysis and synthesis operations are carried out with sufficient precision, the segment of audio sound track generated by the synthesis filter bank will match the original segment of audio sound track that was inputted to the analysis filter bank. The differences between the reconstructed audio sound track and the original sound track can be made arbitrarily small.

The use of high quality audio on computer platforms has been limited by the enormous data rate required for storage and playback. Although some increase in performance has been gained using conventional audio filter banks, these improvements have not been sufficient to allow playback of high fidelity recordings on the commonly used computer platforms without the addition of expensive special purpose hardware or significant reduction in audio quality.

Broadly, it is the object of the present invention to provide an improved audio filter bank system.

It is a further object of the present invention to provide an audio filter bank system which requires less computational resources to analyze and synthesize audio waveforms.

These and other objects of the present invention will become apparent from the following detailed description of the invention and the accompanying drawings.

Summary of the Invention

The present invention is based on the observation that the method by which polyphase quadrature filter banks operate may be reorganized to allow the cosine modulation of the polyphase components to be accomplished by discrete cosine transformation of a set of modified polyphase components. In the present invention, the most recently received W input digital audio signal values of an audio signal are stored in the apparatus. These signal values are used to generate M sub-band component signals, where W≥M by generating M modified polyphase components from the stored input digital audio signal values and then forming differences of sums of the weighted values. The sub-band component signals are then generated from the discrete cosine transform of the modified polyphase components. Since the discrete cosine transformation may be implemented in a computationally more efficient manner than the cosine modulation used in the prior art, a net computational savings is achieved. The resynthesis of the input signal from a set of sub-band analyzed signals may also be accomplished with the aid of discrete cosine transforms which results in a similar computational savings in the synthesis operations.

Brief Description of the Drawings

Figure 1 is a block diagram of an audio compression system.

Figure 2 illustrates the relationship of two overlapping audio segments.

Figure 3(a) is a block diagram of a single filter constructed from a low-frequency bandpass filter and a mixer.

Figure 3(b) is a block diagram of a sub-band analysis filter for generating a set of M frequency components, S_i, from a W sample window.

Figure 4 is a block diagram of a sub-band analysis filter according to the present invention.

Figure 5 is a block diagram of a synthesis filter bank according to the present invention.

Detailed Description of the Invention

The manner in which the present invention obtains its advantages may be more easily understood with reference to the manner in which a conventional audio compression system operates. Figure 1 is a block diagram of an audio compression system 10 using a conventional sub-band analysis system. The audio compression system accepts an input signal 11 which is divided into a plurality of segments 19. Each segment is analyzed by a filter bank 12 which provides the frequency components for the segment. Each frequency component is a time average of the amplitude of the signal in a corresponding frequency band. The time average is, in general, a weighted average. The frequencies of the sub-bands are uniformly distributed between a minimum and maximum value which depend on the number of samples in each segment 19 and the rate at which samples are taken. The input signal is preferably digital in nature; however, it will be apparent to those skilled in the art that an analog signal may be used by including an analog-to-digital converter prior to filter bank 12.

The component waveforms generated by filter bank 12 are replaced by digital approximations by quantizer 14. The number of bits assigned to each amplitude is determined by a psycho-acoustic analyzer 16 which utilizes information about the auditory system to minimize the distortions introduced by the quantization. The quantized frequency components are then further coded by coder 18 which makes use of the redundancy in the quantized components to further reduce the number of bits needed to represent the coded coefficients. Coder 18 does not introduce further errors into the frequency components. Coding algorithms are well known to those skilled in the signal compression arts, and hence, will not be discussed in more detail here.

The manner in which the input signal is divided into segments can affect the quality of the regenerated audio signal. Consider the case in which the signal is analyzed on segments that do not overlap. This analysis is equivalent to employing a model in which the regenerated signal is produced by summing the signals of a number of harmonic oscillators whose amplitudes remain constant over the duration of the segment on which each amplitude was calculated. In general, this model is a poor approximation to an actual audio track, since the amplitudes of the various frequency components would be expected to change over the duration of the segments in question. Models that do not take this change into account will have significantly greater distortions than models in which the amplitudes can change over the duration of the segment, since there will be abrupt changes in the amplitudes of the frequency components at each segment boundary.

One method for reducing the discontinuities in the frequency component amplitudes at the segment boundaries is to employ a sub-band analysis filter that utilizes overlapping segments to generate successive frequency component amplitudes. The relationship of the segments is shown in Figure 2 for a signal 301. The sub-band analysis filter generates M frequency components for signal 301 for each M signal values. However, each frequency component is generated over a segment having a duration much greater than M. Each component is generated over a segment having a length of W sample values, where W>M. Typical segments are shown at 312 and 313. It should be noted that successive segments overlap by (W-M) samples.
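The overlap relationship described above can be illustrated with a minimal sketch. The function name `overlapping_segments` and the small values of W and M are hypothetical and chosen only for illustration; the text does not prescribe any particular implementation.

```python
def overlapping_segments(signal, W, M):
    """Yield successive W-sample windows that advance by M samples.

    Illustrative helper (not from the source): consecutive windows
    share W - M samples, as described for segments 312 and 313.
    """
    for start in range(0, len(signal) - W + 1, M):
        yield signal[start:start + W]


signal = list(range(20))          # stand-in for audio sample values
segments = list(overlapping_segments(signal, W=8, M=2))

# Number of samples shared by two successive windows.
overlap = len(set(segments[0]) & set(segments[1]))
```

With W=8 and M=2, each new window discards the 2 oldest samples and takes in 2 new ones, so successive windows share W-M = 6 samples.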

The various frequency bands in a sub-band analysis filter bank preferably have the same shape but are shifted relative to one another. This arrangement guarantees that all frequency bands have the same aliasing properties. Such a filter bank can be constructed from a single low-frequency bandpass filter having the desired band shape. The manner in which the various filter bands are constructed is most easily understood with reference to Figure 3(a) which is a block diagram of a single filter constructed from a low-frequency bandpass filter 377 and a mixer 376. Assume that the low-frequency bandpass filter 377 has a center frequency of F_c and that the desired center frequency of filter 350 is to be F. Then by shifting the input audio signal by a frequency of F-F_c prior to analyzing the signal with low-frequency bandpass filter 377, the output of low-frequency bandpass filter 377 will be the amplitude of the audio signal in a band having a center frequency of F. Mixer 376 accomplishes this frequency shift.

A filter bank can then be constructed from a single prototype low-frequency bandpass filter by using different modulation frequencies to shift the incoming audio signal prior to analysis by the prototype filter. While such a filter bank can be constructed from analog circuit components, it is difficult to obtain filter performance of the type needed. Hence, the preferred embodiment of the present invention utilizes digital filter techniques.

A block diagram of a sub-band analysis filter 350 for generating a set of M frequency components, S_i, from a W sample window is shown in Figure 3(b). The M audio samples are clocked into a W-sample shift register 320 by controller 325. The oldest M samples in shift register 320 are shifted out the end of the shift register and discarded. The contents of the shift register are then used to generate 2M polyphase components P_k, for k=0 to 2M-1. The polyphase components are generated by a windowing operation followed by partial summation. The windowing operation generates a W-component array Z_i from the contents of shift register 320 by multiplying each entry in the shift register by a corresponding weight, i.e.,

$$Z_i = h_i\,x_i \qquad (1)$$

where the x_i, for i=0,...,W-1, are the values stored in shift register 320, and the h_i are coefficients of a low-pass prototype filter which are stored in controller 325. For those wishing a more detailed explanation of the process for generating sets of filter coefficients, see J. Rothweiler, "POLYPHASE QUADRATURE FILTERS - A NEW SUB-BAND CODING TECHNIQUE", IEEE Proceedings of the 1983 ICASSP Conference, pp. 1280-1283. The polyphase components are then generated from the Z_i by the following summing operations:

The frequency components, S_i, are obtained via the following matrix multiplication from the polyphase components:

This operation is equivalent to passing the polyphase components through M finite impulse response filters of length 2M. The cosine modulation of the polyphase components shown in Eq. (3a) may be replaced by other such modulation terms. The form shown in Eq. (3a) leads to near-perfect reconstruction. An alternative modulation scheme which allows for perfect reconstruction is as follows:

It can be seen by comparison to Figure 3(a) that the matrix multiplication provides an operation analogous to the modulation of the incoming audio signal. The windowing operation performs the analysis with the prototype low-frequency filter.
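The prior art chain of windowing, partial summation, and cosine modulation can be sketched as follows. Eq. (1) is taken from the text. The forms used here for Eq. (2) (partial sums with stride 2M) and Eq. (3a) (cosine modulation with a phase offset of M/2) are assumptions, since those equations are not legible in this text; the function names are hypothetical and W is assumed to be a multiple of 2M.

```python
import math


def window(x, h):
    # Eq. (1): Z_i = h_i * x_i for the W values in the shift register.
    return [hi * xi for hi, xi in zip(h, x)]


def polyphase(z, M):
    # Assumed form of Eq. (2): P_k = sum over j of Z[k + 2*M*j],
    # for k = 0 .. 2M-1 (W must be a multiple of 2M).
    W = len(z)
    return [sum(z[k + 2 * M * j] for j in range(W // (2 * M)))
            for k in range(2 * M)]


def cosine_modulate(p, M):
    # Assumed form of Eq. (3a): an M x 2M matrix multiplication,
    # S_i = sum_k cos((2i+1)(k - M/2) pi / (2M)) * P_k.
    return [sum(math.cos((2 * i + 1) * (k - M / 2) * math.pi / (2 * M)) * p[k]
                for k in range(2 * M))
            for i in range(M)]


M, W = 4, 16                       # small illustrative sizes
x = [1.0] * W                      # stand-in input samples
h = [1.0] * W                      # stand-in prototype filter weights
P = polyphase(window(x, h), M)     # 2M polyphase components
S = cosine_modulate(P, M)          # M sub-band values
```

The modulation step is a direct matrix multiplication, so it costs 2M multiply-adds for each of the M outputs, which is the 2M² term discussed in the text that follows.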

As noted above, the computational workload in analyzing and synthesizing audio tracks is of great importance in providing systems that can operate on general purpose computing platforms. It will be apparent from the above discussion that the computational workload inherent in generating M frequency components from a window of W audio sample values is approximately (W+2M²) multiplies and adds.

The method of the present invention utilizes an observation that if the above equations are re-written, the cosine modulation of the polyphase components may be carried out by performing a discrete cosine transform (DCT) on a vector constructed from the polyphase components. Since computationally efficient methods for carrying out DCT transformations which involve a computational complexity of order Mlog₂M multiplies and adds are known, a substantial computational saving is obtained compared to the 2M² multiplies and adds needed by the prior art method to perform the cosine modulation.
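The size of the saving can be made concrete with a small calculation. The values W=512 and M=32 are illustrative assumptions, not figures stated in the text, and the DCT cost is taken as exactly M·log₂M with a constant factor of one, which is only an order-of-magnitude stand-in for a real fast DCT.

```python
import math


def prior_art_ops(W, M):
    # W multiplies for the windowing plus 2*M*M multiply-adds
    # for the direct cosine modulation.
    return W + 2 * M * M


def dct_based_ops(W, M):
    # Same windowing cost, with the modulation replaced by a fast
    # DCT of order M*log2(M) (constant factor assumed to be 1).
    return W + M * math.log2(M)


W, M = 512, 32                      # illustrative sizes (assumed)
before = prior_art_ops(W, M)        # 512 + 2048
after = dct_based_ops(W, M)         # 512 + 160
ratio = before / after
```

Under these assumptions the count falls from 2560 to 672 operations per block of M outputs, and the windowing term W becomes the dominant cost.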

It can be shown that Eq. (3a), for M even, may be re-written as follows

where

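$$
P'_k = \begin{cases}
P_{k+M/2} & \text{for } k = 0, 1, \ldots, \tfrac{3M}{2}-1 \\
-P_{k-3M/2} & \text{for } k = \tfrac{3M}{2}, \ldots, 2M-1
\end{cases} \qquad (4b)
$$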
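The rewritten form can be motivated by a short derivation. The sketch below assumes that Eq. (3a) is a cosine modulation with a phase offset of M/2 (an assumption, since that equation is not legible in this text) and uses the rotated sequence P'_n, equal to P_{n+M/2} for n below 3M/2 and to -P_{n-3M/2} otherwise. It is an illustration of why a DCT-type sum over M points results, not a transcription of the original Eq. (4a).

```latex
\begin{align*}
S_i &= \sum_{k=0}^{2M-1} \cos\!\left[\frac{(2i+1)(k - M/2)\,\pi}{2M}\right] P_k
      && \text{(assumed form of Eq. (3a))} \\
    &= \sum_{n=0}^{2M-1} \cos\!\left[\frac{(2i+1)\,n\,\pi}{2M}\right] P'_n
      && \text{(set } n = k - M/2 \text{; terms wrapped by } 2M \text{ change sign)} \\
    &= P'_0 + \sum_{n=1}^{M-1} \cos\!\left[\frac{(2i+1)\,n\,\pi}{2M}\right]
       \left(P'_n - P'_{2M-n}\right)
      && \text{(kernel is anti-symmetric about } n = M \text{ and vanishes there)}
\end{align*}
```

The second step uses cos(θ + (2i+1)π) = -cos θ, and the third uses cos[(2i+1)(2M-n)π/(2M)] = -cos[(2i+1)nπ/(2M)]. Only M cosine-weighted terms remain, which is what allows a length-M DCT routine to replace the M by 2M matrix multiplication.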

Eq. (4a) will be recognized by those skilled in the art as the inverse DCT-II transform of the first M points of the anti-symmetric part of the 2M-point sequence, P'_k. The P' sequence is related to the original polyphase components by a rotation. Fast implementations requiring of the order of Mlog₂M multiplies and adds for computing DCTs and their inverses are known to the art, and hence, will not be discussed in more detail here. Readers interested in more details of these implementations are referred to K.R. Rao and P. Yip, DISCRETE COSINE TRANSFORM, Algorithms, Advantages, Applications, Academic Press.

A block diagram of a sub-band analysis filter 380 for generating a set of M frequency components, S_i, from a W sample window is shown in Figure 4. The M audio samples are clocked into a W-sample shift register 382 by controller 381. The oldest M samples in shift register 382 are shifted out the end of the shift register and discarded. The contents of the shift register are then used to generate the M anti-symmetrized rotated polyphase components (differences of mirror-image pairs of the P'_k). These are then transformed by a DCT transform generator 384. Transform generator 384 may be constructed from a general purpose digital computer or from special purpose hardware. Dedicated hardware for carrying out DCT transformation is known to those skilled in the art, and hence, will not be discussed further here.
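The equivalence between the direct cosine modulation and the rotate, subtract, and transform route can be checked numerically with a minimal sketch. It assumes the M/2-offset form of Eq. (3a) and pairs P'_n with P'_{2M-n}; both are assumptions, since the equations are not legible in this text. All function names are hypothetical, and the cosine sum is evaluated naively here, where a real system would call a fast DCT routine.

```python
import math


def direct_modulation(p, M):
    # Assumed Eq. (3a): M x 2M matrix multiplication (2*M*M multiply-adds).
    return [sum(math.cos((2 * i + 1) * (k - M / 2) * math.pi / (2 * M)) * p[k]
                for k in range(2 * M))
            for i in range(M)]


def rotate(p, M):
    # Eq. (4b): P'_k = P_{k+M/2} for k < 3M/2, else -P_{k-3M/2} (M even).
    half = M // 2
    return [p[k + half] if k < 3 * half else -p[k - 3 * half]
            for k in range(2 * M)]


def antisymmetric_part(pr, M):
    # a_0 = P'_0 and a_n = P'_n - P'_{2M-n} for n = 1 .. M-1 (assumed pairing).
    return [pr[0]] + [pr[n] - pr[2 * M - n] for n in range(1, M)]


def cosine_sum(a, M):
    # Naive evaluation of sum_n a_n cos((2i+1) n pi / (2M)); a fast DCT
    # of order M*log2(M) would replace this loop in practice.
    return [sum(a[n] * math.cos((2 * i + 1) * n * math.pi / (2 * M))
                for n in range(M))
            for i in range(M)]


def dct_analysis(p, M):
    return cosine_sum(antisymmetric_part(rotate(p, M), M), M)


M = 4
p = [1.0, -2.0, 3.5, 0.25, -1.5, 2.0, 0.75, -0.5]   # arbitrary 2M values
S_direct = direct_modulation(p, M)
S_dct = dct_analysis(p, M)
max_error = max(abs(a - b) for a, b in zip(S_direct, S_dct))
```

The two routes agree to rounding error, while the second needs only a length-M transform after M-1 subtractions.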

It should be noted that the computation of each polyphase component involves multiply and add operations. A further subtract per pair of polyphase components is involved in computing the anti-symmetric part of the rotated polyphase component which is then transformed by the DCT generator. In the preferred embodiment of the present invention, this subtraction operation is integrated into the windowing operation to directly obtain the desired anti-symmetric part of the rotated polyphase components. This integration allows one to take advantage of pipelined multiply-add hardware in those embodiments in which the windowing operation is performed on special purpose hardware.

It should also be noted that a computationally efficient method may also be used if the alternate cosine modulation shown in Eq. (3b) is used. In this case, it can be shown that, for M even,

It will be apparent to those skilled in the art that this is the DCT-IV of the first M points of the anti-symmetric part of the 2M-point sequence, P'_k, shown in Eq. (4b). As was described above, the subtraction operation shown in Eq. (5) is preferably carried out as part of the windowing operation that generates the polyphase components P_k, because the efficiencies of pipelined multiply and add hardware can be utilized.

Given the sub-band components S_i, the original time domain audio samples may be recovered by first recovering the polyphase components and then performing an inverse of the windowing operation described above. The polyphase components are recovered by using the inverse of the DCT operation used to generate the sub-band components to generate a set of components Q_i. For example, if the sub-band components were generated using Eq. (4), then one first computes

for i=0,1,...,M-1. It will be apparent to those skilled in the art that this is equivalent to transforming the sub-band components using DCT-II. Hence, the computation may be carried out using the computationally efficient methods for DCT transforms. The original polyphase components, for M even, are then generated from the Q_i as follows:

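$$
P_i = \begin{cases}
P''_{i-M/2} & \text{for } i = \tfrac{M}{2}, \ldots, 2M-1 \\
-P''_{i+3M/2} & \text{for } i = 0, 1, \ldots, \tfrac{M}{2}-1
\end{cases} \qquad (7)
$$

where

$$
P''_i = \begin{cases}
Q_i & \text{for } i = 0, 1, \ldots, M-1 \\
0 & \text{for } i = M \\
-Q_{2M-i} & \text{for } i = M+1, M+2, \ldots, 2M-1
\end{cases} \qquad (8)
$$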
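The recovery of the 2M polyphase components from M transformed values can be sketched as below. The DCT-II form used for the Q_i and the direct cosine sum used as a cross-check (the transpose of an M/2-offset analysis modulation) are assumptions, since the corresponding equations are not legible in this text. The extension and rotation steps follow the piecewise pattern of an anti-symmetric extension about index M followed by a rotation of M/2; the function names and the intermediate sequence name `ext` are hypothetical.

```python
import math


def dct2(s):
    # Assumed Eq. (6): Q_i = sum_k S_k cos((2k+1) i pi / (2M)), a DCT-II,
    # evaluated naively here in place of a fast routine.
    M = len(s)
    return [sum(s[k] * math.cos((2 * k + 1) * i * math.pi / (2 * M))
                for k in range(M))
            for i in range(M)]


def extend(q):
    # Eq. (8) pattern: Q_i for i < M, 0 at i = M, -Q_{2M-i} for i > M.
    M = len(q)
    return list(q) + [0.0] + [-q[2 * M - i] for i in range(M + 1, 2 * M)]


def unrotate(ext):
    # Eq. (7) pattern: ext[i - M/2] for i >= M/2, else -ext[i + 3M/2].
    M = len(ext) // 2
    half = M // 2
    return [-ext[i + 3 * half] if i < half else ext[i - half]
            for i in range(2 * M)]


def direct_synthesis_modulation(s):
    # Cross-check (assumed): P_i = sum_k S_k cos((2k+1)(i - M/2) pi / (2M)).
    M = len(s)
    return [sum(s[k] * math.cos((2 * k + 1) * (i - M / 2) * math.pi / (2 * M))
                for k in range(M))
            for i in range(2 * M)]


s = [1.0, -0.5, 2.0, 0.25]                      # arbitrary M = 4 sub-band values
P_fast = unrotate(extend(dct2(s)))
P_direct = direct_synthesis_modulation(s)
max_error = max(abs(a - b) for a, b in zip(P_fast, P_direct))
```

Only sign changes and index shuffling follow the length-M transform, so the 2M outputs cost no multiplications beyond those of the DCT itself.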

The time domain samples x_i may be generated from the polyphase components, P_k, by the inverse of the windowing transform described above. However, a computationally more efficient method may be utilized.

A block diagram of a synthesis filter bank according to the present invention is shown in Figure 5 at 500. M frequency components are received and stored in a register 502. The contents of register 502 are then transformed utilizing the inverse of the DCT transform that was used to generate the frequency components as shown at 508. For example, if the analysis filter bank used the DCT of Eq. (4a), then the inverse DCT generator would use the transform given in Eq. (6). In the preferred embodiment of the present invention, inverse DCT generator 508 utilizes one of the fast computational methods to perform the transformation, i.e., a method having a computational complexity of order Mlog₂M. The transformation can be performed in special purpose hardware or on a general purpose computer. The output of generator 508 is then converted to the polyphase components by generator 510 according to Eqs. (7) and (8). The resultant 2M polyphase components are then shifted into a 2W entry shift register 512 and the oldest 2M values in the shift register are shifted out and discarded. The contents of the shift register are inputted to array generator 513 which builds a W value array 514 by iterating the following loop 8 times: take the first M samples from shift register 512, ignore the next 2M samples, then take the next M samples. The contents of array 514 are then multiplied by W weight coefficients, h'_i, which are related to the h_i used in the corresponding sub-band analysis filter, to generate a set of weighted values w_i = h'_i*u_i, which are stored in array 516. Here, the u_i are the contents of array 514. The M time domain samples, x_j for j=0,...,M-1, are then generated by summing circuit 518 which sums the appropriate w_i values, i.e.,

It should be noted that the weighted values may be obtained directly from the sub-band components, Q_i, using a W entry shift register in place of shift register 512. The M values of Q obtained from the inverse DCT generator are directly shifted sequentially into a W-entry shift register, and the oldest M values in the shift register are discarded. That is, the polyphase components are not regenerated. Denote the contents of the shift register by u_i. The weighted array, w_i, used in Eq. (9), for M even, may then be obtained by forming sums of products of u and the prototype filter coefficients:

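where w_{2Ml+i} is defined piecewise, as products of prototype filter coefficients h and shift register values u, over the three index ranges i = 0, 1, ..., M/2-1; i = M/2, ..., 3M/2-1; and i = 3M/2, ..., 2M-1 (10)

for l = 0, 1, 2, ..., W/(2M)-1. [The right-hand sides of Eq. (10) are illegible in the source text.]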

The M time domain samples are then generated according to Eq. (9). This procedure uses fewer additions than the procedure described above with reference to Figure 5, and hence, is the preferred method for regenerating the time domain samples.
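The array-building loop performed by array generator 513 in the Figure 5 arrangement (take M samples, ignore 2M, take M, repeated 8 times) can be sketched as follows. The function name is hypothetical, and the sizes M=32 and W=512 are illustrative assumptions chosen so that a 2W-entry register yields exactly 8 iterations; the text itself states only the loop count.

```python
def build_array(shift_register, M):
    """Select a W-value array from a 2W-entry shift register.

    Each iteration consumes 4*M register entries and keeps 2*M of them:
    the first M and the last M, skipping the 2*M in between.
    """
    out = []
    pos = 0
    for _ in range(len(shift_register) // (4 * M)):
        out.extend(shift_register[pos:pos + M])   # take the first M samples
        pos += M
        pos += 2 * M                              # ignore the next 2M samples
        out.extend(shift_register[pos:pos + M])   # take the next M samples
        pos += M
    return out


M, W = 32, 512                                    # illustrative sizes (assumed)
register = list(range(2 * W))                     # stand-in register contents
u = build_array(register, M)
iterations = len(register) // (4 * M)
```

With these sizes the loop runs 8 times and keeps half of the register, giving the W values that are then weighted by the h'_i coefficients.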

While the above-described embodiments of sub-band analysis and synthesis filters are described in terms of special purpose hardware for carrying out the various operations, it will be apparent to those skilled in the art that the entire operation may be carried out on a general purpose digital computer.

Accordingly, improved sub-band analysis and synthesis filter banks have been described. Various modifications to the present invention will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Hence, the present invention is to be limited solely by the scope of the following claims.

Claims

WHAT IS CLAIMED IS:

1. An analysis filter bank for generating M sub-band audio signals from an input digital audio signal, said analysis filter bank comprising: means for receiving said digital input audio signal; means for storing the most recently received W input digital audio signal values, wherein W≥M; means for generating M modified polyphase components from said stored input digital audio signal values comprising means for weighting said W stored input digital audio signal values and means for forming the sums and differences of said weighted values; and means for generating a discrete cosine transform of said modified polyphase components to obtain said M sub-band audio signal values.

2. A synthesis filter bank for generating an audio signal from M sub-band audio signals, said synthesis filter bank comprising: means for receiving and storing said M sub-band audio signals; means for generating M signal values from said sub-band audio signals by performing an inverse discrete cosine transform of said M sub-band audio signals; means for storing the W most recently generated said signal values, where W≥M; and means for generating weighted sums of said stored signal values.