WO2001039174A1

WO2001039174A1 - Low memory digital audio effects using down-sampling up-sampling technique

Info

Publication number: WO2001039174A1
Application number: PCT/SG1999/000133
Authority: WO
Inventors: Mohammed Javed Absar; Sapna George
Original assignee: Stmicroelectronics Asia Pacific Pte Ltd.
Priority date: 1999-11-25
Filing date: 1999-11-25
Publication date: 2001-05-31

Abstract

A method of introducing digital audio effects in an audio signal including: receiving an input stream at an input sampling frequency; down-sampling the input stream so that a selected number of sample data are retained; applying a digital audio effect to the selected number of sample data; and up-sampling the sample data to a predetermined output frequency.

Description

Low Memory Digital Audio Effects Using Down-sampling Up-sampling Technique

Field of the Invention

This invention is applicable in the field of Digital Audio Effects Algorithms (e.g. echo, chorus, reverberation, flanging) implemented on a DSP with reduced memory usage requirement.

Background of the Invention

Audio effects, such as echo, chorus, reverberation and flanging are indispensable in systems such as music production, home entertainment (Hi-Fi), car audio and Karaoke Systems. Often these effects are implemented using digital signal processors, with associated memory, input-output peripherals, analogue-to-digital and digital-to-analogue converters.

The processor takes in the "dry" input, produced by an instrument such as a keyboard or previously recorded on some analogue medium, and samples it at an appropriate rate. It is also possible that the input comes from a digitally recorded source (e.g. an Audio CD sampled at 44.1 kHz), in which case no additional sampling is required. Whatever the source, the final input stream is in digital form so that it can be subjected to a DSP effects algorithm. The resulting "wet" stream is reconstructed to analogue form, to be sent to the next unit in the audio chain, such as a speaker system, a recording channel, a mixer, or another effects processor.

In all digital audio effects the basic element is the delay buffer, several of which may be combined to form complicated effects such as reverberation. Traditionally, the delay buffer stores data at the same rate as the input sampling rate. Therefore the delay buffer size is a function not only of the maximum delay allowed in the system but also of the sampling frequency:

Delay Buffer Size (words) = Max. Delay (ms.) x Sampling Frequency (kHz.)

The memory requirement can be prohibitive when it comes to implementing effects such as reverb, where delays of over 200 ms. are often desired.

This invention attempts to decrease the memory requirements of effects algorithms without affecting quality appreciably.

Summary of the Invention

In accordance with the invention, there is provided a method of introducing digital audio effects in an audio signal including: receiving an input stream at an input sampling frequency; down-sampling the input stream so that a selected number of sample data are retained; applying a digital audio effect to the selected number of sample data; and up-sampling the sample data to a predetermined output frequency.

In another aspect, there is provided a digital signal processor including an audio effect engine for introducing a digital audio effect in an audio signal, including: a down-sampler for down-sampling an input signal to a selected number of sample data; an audio effects engine for applying the digital audio effect to the sample data; and an up-sampler for up-sampling the sample data, to which the audio effect has been applied, to a predetermined output frequency.

The input data may be converted to a lower sampling frequency at, for example, a ratio of 4:1. At the lower sampling rate, less data is needed to represent the same duration of the signal. The algorithm introduces the effect at this frequency. To generate the output, the effect-added samples are reconverted to the desired output frequency.

Brief Description of the Drawings

The invention is more fully described by way of non-limiting example only, with reference to the accompanying drawings, in which:

Figure 1 illustrates audio effects operating at a constant sampling rate; and

Figure 2 illustrates a down-sampling/up-sampling technique to decrease delay-buffer size.

Detailed Description of a Preferred Embodiment

Consider the effects engine in Figure 1. The input stream arrives at a rate of 44.1 kHz. If a delay of 200 ms. is required, the buffer size for a single channel would be

Delay Buffer Size (words) = Max. Delay (ms.) x Sampling Frequency (kHz.) = 200 ms. x 44.1 kHz. = 8820 words
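The arithmetic above, and the saving obtained from the 4:1 ratio mentioned in the Summary, can be checked with a short sketch. The function name and its parameters are illustrative assumptions and do not appear in the specification; the sampling frequency is taken in Hz so that the calculation stays in integers.

```python
def delay_buffer_words(max_delay_ms: int, sampling_hz: int, decimation: int = 1) -> int:
    """Words needed to hold max_delay_ms of one channel.

    Hypothetical helper: the buffer is assumed to store samples at
    sampling_hz / decimation, one word per sample.
    """
    if decimation < 1:
        raise ValueError("decimation factor must be >= 1")
    numerator = max_delay_ms * sampling_hz
    denominator = 1000 * decimation
    # Ceiling division, so the full delay always fits in the buffer.
    return -(-numerator // denominator)


full_rate = delay_buffer_words(200, 44100)       # buffer at the input rate
decimated = delay_buffer_words(200, 44100, 4)    # buffer after 4:1 down-sampling
print(full_rate, decimated)
```

Under these assumptions a 4:1 decimation reduces the single-channel buffer from 8820 words to 2205 words.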

This would be considered a large amount for a Karaoke System. One way of avoiding such a large buffer would be to keep the data in the buffer in a compressed format. However, this comes with several difficulties. If a lossy compressor such as AC-3 or MPEG is used, the computational requirement for encoding and decoding is very high. Lossless compression will not give a high compression ratio, and in addition the compression ratio is not fixed.

The buffer size above can also be decreased if the amount of data necessary for representing the signal for the same duration (Max. Delay) is decreased. This can be realised by performing sampling rate conversion. At a lower sampling rate the same duration of the signal can be represented using fewer samples. High-frequency contents of the signal will have to be discarded, but this may be acceptable in cases where the listening and singing environment (microphone, analogue-to-digital converters etc.) is of commercial-level quality only.

Figure 2 shows the Down-Sampling Up-Sampling Technique for reducing buffer size. Prior to storage or any action by the audio effects engine, the input stream passes through an anti-aliasing filter and decimator. The anti-aliasing filter removes frequency components above π/N, where N is the decimation factor. Decimation by N means that one out of every N samples is retained and the rest are thrown away. Decimation causes the high-frequency components (above sampling_frequency/(2*N)) to wrap around and appear as ghost frequency components at lower frequencies. To avoid these ghost components, the high frequencies are suppressed so that the wrap-around is not audible. 16-bit PCM (pulse code modulated) audio has a 96 dB SNR, therefore an anti-aliasing filter with stop-band attenuation of around 100 dB would be sufficient.
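A minimal sketch of the filter-and-decimate stage follows. The specification does not state how the anti-aliasing filter is designed, so the Blackman-windowed sinc FIR, the tap count of 63 and the function names below are all assumptions made for illustration. A filter this short reaches roughly 74 dB of stop-band attenuation, not the 100 dB figure quoted above, which would need a longer or differently designed filter.

```python
import math
from typing import List


def design_lowpass(num_taps: int, cutoff: float) -> List[float]:
    """Windowed-sinc FIR low-pass; cutoff is a fraction of pi (0 < cutoff < 1).

    Assumed design method (Blackman window), not taken from the specification.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        # Ideal low-pass impulse response sin(pi*cutoff*t) / (pi*t).
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        phase = 2 * math.pi * k / (num_taps - 1)
        window = 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2 * phase)
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalise to unity gain at DC


def decimate(x: List[float], n: int, taps: List[float]) -> List[float]:
    """Filter x with taps, then retain one out of every n samples."""
    out = []
    for i in range(0, len(x), n):  # compute only the outputs that are kept
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * x[i - k]
        out.append(acc)
    return out


taps = design_lowpass(63, 1 / 4)  # cut-off at pi/N for N = 4
tone_above_cutoff = [math.cos(0.75 * math.pi * i) for i in range(400)]
kept = decimate(tone_above_cutoff, 4, taps)
print(len(kept), max(abs(v) for v in kept[20:]))
```

Because only every N-th filter output is ever used, the loop skips the discarded outputs entirely, which keeps the filtering cost proportional to the decimated rate.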

After decimation, data from the input stream is ready for storage in the buffer and is also available to the audio effects engine. The manner in which the effects algorithm acts upon this data depends on the actual effect it implements. Simple effects such as echo are implemented as y[n] = a*x[n] + (1-a)*x[n-D], 0 ≤ a ≤ 1, x[n] being the current input, y[n] the output and D the delay. The output is a simple function of the current input and a delayed version of it.
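The echo equation can be sketched with a circular delay buffer of D words, as follows. The function name, the use of a Python list as the buffer and the assumption that the buffer starts out silent (all zeros) are illustrative choices, not details given in the specification.

```python
from typing import List


def echo(x: List[float], delay: int, a: float) -> List[float]:
    """Apply y[n] = a*x[n] + (1-a)*x[n-D] using a circular buffer of D words."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    buf = [0.0] * delay  # holds the last D inputs; assumed silent at start
    pos = 0
    y = []
    for sample in x:
        delayed = buf[pos]  # x[n-D], written D iterations ago
        y.append(a * sample + (1.0 - a) * delayed)
        buf[pos] = sample   # overwrite the oldest slot with x[n]
        pos = (pos + 1) % delay
    return y


# A unit impulse comes back D = 3 samples later, scaled by (1 - a).
print(echo([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], delay=3, a=0.75))
```

When the effect runs on decimated data, D is counted in decimated samples, so the buffer above is N times smaller for the same delay in milliseconds.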

Complex algorithms such as reverb require the current input as well as pre-processed input at various delay times. All such delay times may have to be scaled to obtain the equivalent time in terms of the decimated samples. Moreover, most effects algorithms have lattice-filter coefficients adjusted to pre-defined frequencies of operation (e.g. 44.1 kHz). These have to be adjusted to the operating sampling frequency of the delay buffer. Once these adjustments are done, running the effects algorithm is straightforward, as it effectively operates at the down-sampled frequency.
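The delay scaling described here might look like the sketch below, which converts delays expressed in full-rate samples to the nearest count of decimated samples. The function name, the round-half-up rule, the one-sample minimum and the example delay values are assumptions for illustration; the retuning of filter coefficients is not shown.

```python
from typing import List


def scale_delays(delays_full_rate: List[int], decimation: int) -> List[int]:
    """Convert delays in full-rate samples to decimated-rate samples."""
    if decimation < 1:
        raise ValueError("decimation factor must be >= 1")
    scaled = []
    for d in delays_full_rate:
        # Integer round-half-up of d / decimation.
        nearest = (2 * d + decimation) // (2 * decimation)
        # Assumed policy: never let a delay collapse to zero samples.
        scaled.append(max(1, nearest))
    return scaled


# Hypothetical tap delays, in samples at 44.1 kHz, rescaled for N = 4.
print(scale_delays([8820, 4410, 1000, 2], 4))
```

Rounding introduces a timing error of at most half a decimated sample per tap, which at 44.1 kHz and N = 4 is about 45 microseconds.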

The output from the effects engine has to be reconverted to the desired output frequency through an up-sampling process. Up-sampling consists of two steps, the first being the expansion stage, wherein each sample is preceded by N-1 zeros, where N is the up-sampling ratio. Zero insertion causes the spectrum to shrink by a factor of N. Therefore frequency images at 2π intervals (created by the sampling process) come within the 0 ≤ ω ≤ π boundary. They are removed by an anti-aliasing filter with cut-off at π/N. Looking equivalently at the time-domain behaviour of the filtering process, the inserted zeros are changed to new values, obtained through the interpolation behaviour of the filter.
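The two up-sampling steps can be sketched as below. To keep the example short and easy to verify by hand, a triangular (linear-interpolation) kernel stands in for the sharper low-pass filter with cut-off at π/N that the text describes; that substitution, and all function names, are assumptions rather than the method of the specification.

```python
from typing import List


def expand(x: List[float], n: int) -> List[float]:
    """Expansion stage: precede each sample with n-1 zeros."""
    out = []
    for sample in x:
        out.extend([0.0] * (n - 1))
        out.append(sample)
    return out


def triangular_kernel(n: int) -> List[float]:
    """Linear-interpolation FIR of length 2n-1 with a peak value of 1.

    Stand-in for the pi/N low-pass filter; its DC gain of n makes up for
    the energy lost to the inserted zeros.
    """
    return [1.0 - abs(k - (n - 1)) / n for k in range(2 * n - 1)]


def fir(x: List[float], taps: List[float]) -> List[float]:
    """Causal FIR filter with zero initial state."""
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * x[i - k]
        out.append(acc)
    return out


def upsample(x: List[float], n: int) -> List[float]:
    """Zero insertion followed by the interpolation filter."""
    return fir(expand(x, n), triangular_kernel(n))


print(expand([1.0, 2.0], 3))
print(upsample([0.0, 4.0, 8.0], 4))
```

In the second call the inserted zeros between the original values 0, 4 and 8 are replaced by the ramp 1, 2, 3 and so on, delayed by the N-1 samples of filter latency, which is the time-domain interpolation behaviour described above.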

It is to be noted that if the decimation and expansion ratios are the same, the same filter may be used for both the decimation and expansion steps.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:

1. A method of introducing digital audio effects in an audio signal including: receiving an input stream at an input sampling frequency; down-sampling the input stream so that a selected number of sample data are retained; applying a digital audio effect to the selected number of sample data; and up-sampling the sample data to a predetermined output frequency.

2. A method as claimed in claim 1, wherein the down-sampling includes removing frequency components from the input stream and applying a decimation factor such that only the selected number of sample data are retained.

3. A method as claimed in any one of the preceding claims, wherein the down-sampling includes passing the input stream through an anti-aliasing filter for the removal of frequency components, the filter being operable to remove frequency components above π/N, where N is the decimation factor, in that one out of every N samples are retained.

4. A method as claimed in claim 1, wherein the up-sampling includes expansion of the sample data by insertion of zeros.

5. A method as claimed in claim 4, wherein the up-sampling further includes passing the expanded sample data through an anti-aliasing filter with cut-off at π/N, where N is the up-sampling ratio.

6. A method as claimed in claim 5, wherein the zeros introduced with expansion of the sample data are changed to new values, based on interpolation behaviour of the filter.

7. A method as claimed in any one of the preceding claims, wherein the retained sample data are stored in a buffer prior to application of the audio effect.

8. A digital signal processor, including: a down-sampler for down-sampling an input signal to a selected number of sample data; an audio effects engine for applying a digital audio effect to the sample data; and an up-sampler for up-sampling the sample data, to which the audio effect has been applied, to a predetermined output frequency.

9. A digital signal processor as claimed in claim 8, wherein the down-sampler includes an anti-aliasing filter and a decimator, the decimator being arranged to retain the selected number of sample data by applying a decimation factor N and retaining one out of every N sample data input to the down-sampler, and wherein the anti-aliasing filter is effective to remove frequency components above π/N.

10. A digital signal processor as claimed in claim 8, wherein the up-sampler includes an anti-aliasing filter and an expansion means for inserting N-1 zeros before each sample data, to which the audio effect has been applied, where N is the up-sampling ratio, and wherein the anti-aliasing filter has a cut-off at π/N.

11. A digital signal processor as claimed in any one of claims 8 to 10, further including a buffer in which the selected number of sample data are stored prior to application of the audio effect.