US6496795B1 - Modulated complex lapped transform for integrated signal enhancement and coding - Google Patents


Info

Publication number
US6496795B1
Authority
US
United States
Prior art keywords
real
coefficients
transform
imaginary
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US09/305,690
Inventor
Henrique S. Malvar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Assigned to Microsoft Corporation (assignment of assignors interest; assignor: Malvar, Henrique S.)
Priority to US09/305,690
Priority to AU36084/00A
Priority to PCT/US2000/004868
Priority to AU37073/00A
Priority to AU36060/00A
Priority to PCT/US2000/004867
Priority to PCT/US2000/004996
Publication of US6496795B1
Application granted
Assigned to Microsoft Technology Licensing, LLC (assignment of assignors interest; assignor: Microsoft Corporation)
Anticipated expiration
Current status: Expired - Lifetime

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032Quantisation or dequantisation of spectral components
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Definitions

  • the present invention relates to a system and method for producing modulated complex lapped transforms (MCLTs), and in particular, a system and method for incorporating complex coefficients into modulated lapped transforms (MLTs) to derive MCLTs.
  • the Fourier transform is a classical tool used for frequency decomposition of a signal.
  • the Fourier transform breaks a signal down to component frequencies.
  • its usefulness is limited to signals that are stationary, i.e., signals whose spectral patterns do not change appreciably with time. Since most real-world signals, such as audio and video signals, are not stationary, localized frequency decompositions, such as time-frequency transforms, are used instead. These transforms provide spectral information that is localized in time.
  • the discrete cosine transform (DCT) breaks a signal down into component frequencies. For instance, a block of M samples of the signal can be mapped to a block of M frequency components via a matrix of M × M coefficients. To ensure good energy compaction performance, the DCT approximates the eigenvectors of the autocorrelation matrix of typical signal blocks.
  • a nk is the element of the transformation matrix A in the nth row and kth column, or equivalently, the nth sample of the kth basis function.
  • the DCT can be used for convolution and correlation, because it satisfies a modified shift property.
  • Typical uses of the DCT are in transform coding, spectral analysis, and frequency-domain adaptive filtering.
  • in the type-IV discrete cosine transform (DCT-IV), the scaling factor is identical for all basis functions. It should be noted that the DCT-IV basis functions have a frequency shift when compared to the DCT basis. Nevertheless, these transforms still lead to orthogonal bases.
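  • for reference, the standard orthonormal forms of these two transforms are X(k) = c(k) · √(2/M) · Σ_{n=0..M−1} x(n) cos[(n + 1/2)kπ/M], with c(0) = 1/√2 and c(k) = 1 for k > 0, for the DCT, and X(k) = √(2/M) · Σ_{n=0..M−1} x(n) cos[(n + 1/2)(k + 1/2)π/M] for the DCT-IV; the (k + 1/2) term is the frequency shift noted above, and the absence of the c(k) factor is why the DCT-IV scaling is identical for all basis functions.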
  • the DCT and DCT-IV are useful tools for frequency-domain signal decomposition. However, they suffer from blocking artifacts.
  • the transform coefficients X(k) are processed in some desired way: quantization, filtering, noise reduction, etc.
  • Reconstructed signal blocks are obtained by applying the inverse transform to such modified coefficients. When such reconstructed signal blocks are pasted together to form the reconstructed signal (e.g. a decoded audio or video signal), there will be discontinuities at the block boundaries.
  • the modulated lapped transform eliminates such discontinuities.
  • the MLT is a particular form of a cosine-modulated filter bank that allows for perfect reconstruction. For example, a signal can be recovered exactly from its MLT coefficients. Also, the MLT does not have blocking artifacts, namely, the MLT provides a reconstructed signal that decays smoothly to zero at its boundaries, avoiding discontinuities along block boundaries. In addition, the MLT has almost optimal performance for transform coding of a wide variety of signals. Because of these properties, the MLT is being used in many applications, such as many modern audio and video coding systems, including Dolby AC-3, MPEG-2 Layer III, and others.
  • digital audio representations are now commonplace. For example, music compact discs (CDs), Internet audio clips, satellite television, digital video discs (DVDs), and telephony (wired or cellular) all rely on digital audio techniques.
  • Digital representation of an audio signal is achieved by converting the analog audio signal into a digital signal with an analog-to-digital (A/D) converter.
  • the digital representation can then be encoded, compressed, stored, transferred, utilized, etc.
  • the digital signal can then be converted back to an analog signal with a digital-to-analog (D/A) converter, if desired.
  • the A/D and D/A converters sample the analog signal periodically, usually at one of the following standard frequencies: 8 kHz for telephony, Internet, and videoconferencing; 11.025 kHz for Internet and CD-ROMs; 16 kHz for videoconferencing, long-distance audio broadcasting, Internet, and future telephony; 22.05 kHz for CD-ROMs and Internet; 32 kHz for CD-ROMs, videoconferencing, and ISDN audio; 44.1 kHz for audio CDs; and 48 kHz for studio audio production.
  • raw bits produced by the A/D are usually formatted at 16 bits per audio sample.
  • for a compact disc (CD), the storage capacity is about 700 megabytes (5,600 megabits).
  • MiniDiscs can only store about 140 megabytes, and so a compression of about 4:1 is necessary to fit 30 min to 1 hour of audio in a 2.5′′ MiniDisc.
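  • as a quick check of these figures: stereo CD audio at 44.1 kHz and 16 bits per sample amounts to 44,100 × 16 × 2 ≈ 1.41 megabits per second, so one hour occupies roughly 1.41 Mbit/s × 3,600 s ≈ 5,080 megabits ≈ 635 megabytes; this fits on a 700-megabyte CD but would require roughly 635/140 ≈ 4.5:1 compression to fit on a 140-megabyte MiniDisc, consistent with the approximately 4:1 figure above.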
  • the raw bit rate is too high for most current channel capacities.
  • an efficient encoder/decoder (commonly referred to as a coder/decoder, or codec) with good compression is used.
  • the raw bit rate is 64 kbps, but the desired channel rate varies between 5 and 10 kbps. Therefore, a codec needs to compress the bit rate by a factor between 5 and 15, with minimum loss of perceived audio signal quality.
  • codecs can be implemented either in dedicated hardware, typically with programmable digital signal processor (DSP) chips, or in software in a general-purpose computer.
  • examples include the ITU-T telephony standards G.711, G.726, G.722, G.728, G.723.1, and G.729; other telephony standards such as GSM, half-rate GSM, and cellular CDMA (IS-733); high-fidelity audio codecs such as Dolby AC-2 and AC-3, MPEG Layer II and Layer III, and Sony MiniDisc; Internet audio codecs such as ACELP-Net, DolbyNet, PictureTel Siren, and RealAudio; and military vocoders such as LPC-10 and USFS-1016.
  • the present invention is embodied in a system and method for performing spectral analysis of a digital signal having a discrete duration.
  • the present invention performs spectral analysis by spectrally decomposing the digital signal at predefined frequencies uniformly distributed over a sampling frequency interval into complex frequency coefficients so that magnitude and phase information at each frequency is immediately available.
  • the system of the present invention produces a modulated complex lapped transform (MCLT) and includes real and imaginary window processors and real and imaginary transform processors.
  • Each window processor has window functions and operators.
  • the real window processor receives the input signal as sample blocks and applies and computes butterfly coefficients for the real part of the signal to produce resulting real vectors.
  • the imaginary window processor receives the input signal as sample blocks and applies and computes butterfly coefficients for the imaginary part of the signal to produce resulting imaginary vectors.
  • the real transform processor computes a spatial transform on the real vectors to produce a real transform coefficient for the MCLT.
  • the imaginary transform processor computes a spatial transform on the imaginary vectors to produce an imaginary transform coefficient for the MCLT.
  • the system can include an inverse transform module for inverse transformation of the encoded output.
  • the inverse transform module can include inverse real and imaginary transform processors and real and imaginary inverse window processors that are the exact inverses of the corresponding forward components.
  • the encoded output is received and processed by inverse real and imaginary transform processors, and then received and processed by real and imaginary inverse window processors to produce an output signal that substantially matches the input signal.
  • FIG. 1 is a block diagram illustrating an apparatus for carrying out the invention
  • FIG. 2 is a general block diagram illustrating a system for computing and encoding modulated complex lapped transforms in accordance with the present invention
  • FIG. 3 is a general block/flow diagram illustrating a system and method for computing modulated lapped transforms in accordance with the present invention
  • FIG. 4 is a detailed block/flow diagram illustrating computation of a modulated complex lapped transform in accordance with the present invention
  • FIG. 5 is a detailed diagram illustrating the window operation of the modulated complex lapped transform of FIG. 4;
  • FIG. 6 is a flow diagram illustrating operational computation of a modulated complex lapped transform in accordance with the present invention
  • FIG. 7 is a general block diagram of a full-band adaptive filter
  • FIG. 8 is a general block diagram of a frequency-domain MCLT-based adaptive filter in accordance with the present invention.
  • FIG. 9 is a block diagram of a working example of the adaptive filter of FIG. 8 of the present invention in the form of an acoustic echo cancellation device;
  • FIG. 10 is a general block diagram of an acoustic echo cancellation device with MCLT-based adaptive filters in accordance with the present invention.
  • FIG. 11 is a wave signal illustrating sample results of the working example of FIG. 9;
  • FIG. 12 is a general block diagram of a noise reduction device with MCLT-based adaptive filters in accordance with the present invention.
  • FIG. 13 is a flow diagram illustrating operational computation of a noise reduction device incorporating the modulated complex lapped transform of the present invention.
  • FIG. 14 is a wave signal illustrating sample results of the working example of FIGS. 12-13.
  • FIG. 15 is a block diagram of a working system of the present invention illustrated as an integrated signal enhancer and noise reducer with a codec.
  • the MCLT of the present invention can achieve short-time spectral decomposition of signals with explicit magnitude and phase information and perfect signal reconstruction.
  • the MCLT of the present invention can use sine functions at defined frequencies and phases to generate an additional orthogonal decomposition.
  • the defined frequencies and phases are preferably the same that the MLT basis functions use for cosine modulation of a particular window function with certain properties.
  • the MCLT of the present invention is easily integrated with MLT-based systems. Once the MCLT of a signal has been computed, its MLT can be trivially obtained simply by discarding the imaginary parts.
  • the present invention can use both the cosine and sine modulating functions for producing a frame decomposition with desirable properties. Further, the cosine and sine modulations can be used to compute the real and imaginary parts of a transform that has all the magnitude/phase properties of the short-time Fourier transform, while allowing for perfect signal reconstruction. Consequently, the novel MCLT of the present invention can be used in applications such as high-fidelity audio coding, adaptive filtering, acoustic echo cancellation, noise reduction, or any other application where high-fidelity signal reconstruction is required.
  • FIG. 1 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented.
  • the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the invention may be practiced with a variety of computer system configurations, including personal computers, server computers, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.
  • the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located on both local and remote memory storage devices.
  • an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional computer 100 , including a processing unit 102 , a system memory 104 , and a system bus 106 that couples various system components including the system memory 104 to the processing unit 102 .
  • the system bus 106 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • the system memory includes computer storage media in the form of read only memory (ROM) 110 and random access memory (RAM) 112 .
  • a basic input/output system 114 (BIOS), containing the basic routines that help to transfer information between elements within computer 100 , such as during start-up, is stored in ROM 110 .
  • the computer 100 may include a hard disk drive 116 for reading from and writing to a hard disk, not shown, a magnetic disk drive 118 for reading from or writing to a removable magnetic disk 120 , and an optical disk drive 122 for reading from or writing to a removable optical disk 124 such as a CD ROM or other optical media.
  • the hard disk drive 116 , magnetic disk drive 118 , and optical disk drive 122 are connected to the system bus 106 by a hard disk drive interface 126 , a magnetic disk drive interface 128 , and an optical drive interface 130 , respectively.
  • the drives and their associated computer-readable media provide storage of computer readable instructions, data structures, program modules and other data for the computer 100 .
  • although the exemplary environment described herein employs a hard disk, a removable magnetic disk 120 , and a removable optical disk 124 , it should be appreciated by those skilled in the art that other types of computer readable media can store data that is accessible by a computer.
  • Such computer readable media can be any available media that can be accessed by computer 100 .
  • Such computer readable media may comprise communication media and computer storage media.
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • the term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
  • computer storage media includes media implemented in any method or technology for the storage of information such as computer readable instructions, data structures, program modules or other data.
  • such storage media includes RAM, ROM, EPROM, flash memory or other memory technology, CD-ROM, digital video disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 100 . Combinations of any of the above should also be included within the scope of computer readable media.
  • a number of program modules may be stored on the hard disk, magnetic disk 120 , optical disk 124 , ROM 110 or RAM 112 , including an operating system 132 , one or more application programs 134 , other program modules 136 , and program data 138 .
  • a user may enter commands and information into the computer 100 through input devices such as a keyboard 140 and pointing device 142 .
  • Other input devices may include a microphone, joystick, game pad, satellite dish, scanner, or the like.
  • These and other input devices are often connected to the processing unit 102 through a serial port interface 144 that is coupled to the system bus 106 , but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB).
  • a monitor 146 or other type of display device is also connected to the system bus 106 via an interface, such as a video adapter 148 .
  • computers may also include other peripheral output devices (not shown), such as speakers and printers.
  • the computer 100 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 150 .
  • the remote computer 150 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 100 , although only a memory storage device 152 has been illustrated in FIG. 1 .
  • the logical connections depicted in FIG. 1 include a local area network (LAN) 154 and a wide area network (WAN) 156 .
  • when used in a LAN networking environment, the computer 100 is connected to the local network 154 through a network interface or adapter 158 . When used in a WAN networking environment, the computer 100 typically includes a modem 160 or other means for establishing communications over the wide area network 156 , such as the Internet.
  • the modem 160 which may be internal or external, is connected to the system bus 106 via the serial port interface 144 .
  • program modules depicted relative to the computer 100 may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
  • FIG. 2 is a general block diagram illustrating a system for computing and encoding modulated complex lapped transforms in accordance with the present invention.
  • an input signal 206 is received by a sampling device 208 , which breaks the signal into blocks.
  • Each block contains L samples, and each new block is formed by discarding the M oldest samples of the block and adding the M newest input samples to the block.
  • L = 2M.
  • Also included in the system 200 are real and imaginary window processors 210 , 212 for reducing blocking effects, and real and imaginary transformation processors 220 , 222 for coding each block.
  • one window processor with dual real and imaginary computational devices can be used instead of separate real and imaginary window processors.
  • one transform processor with dual real and imaginary computational devices can be used instead of separate real and imaginary transform processors.
  • the real and imaginary window processors 210 , 212 receive and process the input block by applying and computing butterfly coefficients for the real and imaginary parts of the signal, respectively, to produce resulting real and imaginary vectors.
  • the butterfly coefficients are determined by a given window function, which will be discussed in detail below.
  • the real and imaginary transformation processors 220 , 222 compute spatial transforms on the resulting real and imaginary vectors to produce real and imaginary transform coefficients of the MCLT, respectively.
  • FIG. 3 is a general block/flow diagram illustrating a system and method for computing modulated lapped transforms in accordance with the present invention.
  • the MCLT computation system 300 first receives an input signal 310 .
  • a single window processor 312 with real and imaginary computational devices or dual real and imaginary window processors receives a block of M samples of the input signal (box 314 ).
  • the window processor 312 applies and computes butterfly coefficients, for the real and imaginary parts of the signal (boxes 315 , 316 ), respectively, to produce real and imaginary resulting vectors (boxes 318 , 320 ), respectively.
  • a single transform processor 322 with real and imaginary computational devices or dual real and imaginary transform processors receives the real and imaginary resulting vectors (box 323 ).
  • the transform processor 322 performs a discrete cosine transform (DCT) operation on the real vectors (box 324 ) and a discrete sine transform (DST) operation on the imaginary vectors (box 326 ).
  • real and imaginary output signals are respectively produced as vectors with real and imaginary MCLT coefficients corresponding to the input block of samples (boxes 328 , 330 ).
  • the output signal can be processed by transmitting, storing, enhancing, filtering, etc. the signal (box 332 ). For example, interference within the signal can be reduced with a noise reducer, echo canceller, etc., compression can be achieved by scalar or vector quantization of the MLT coefficients, etc., as desired.
  • FIG. 4 is a detailed block/flow diagram illustrating a modulated complex lapped transform (MCLT) extended from a modulated lapped transform processor (MLT) in accordance with the present invention.
  • a transform processor such as a modulated complex lapped transform processor (MCLT) of the present invention, that is preferably an extension of a modulated lapped transform processor (MLT).
  • the MLT is preferably the basis for the MCLT because, among other things, other transform processors, such as the discrete cosine transforms (DCT and DCT-IV), although useful tools for frequency-domain signal decomposition, suffer from blocking artifacts.
  • transform coefficients X(k) are processed by DCT and DCT-IV transform processors in some desired way, such as quantization, filtering, noise reduction, etc.
  • Reconstructed signal blocks are obtained by applying the inverse transform to such modified coefficients.
  • reconstructed signal blocks are pasted together to form the reconstructed signal (e.g. a decoded audio or video signal)
  • the modulated lapped transform (MLT) eliminates such discontinuities by extending the length of the basis functions to twice the block size, i.e. 2M.
  • the MLT can lead to orthogonal or biorthogonal bases and can achieve short-time decomposition of signals as a superposition of overlapping windowed cosine and sine functions. Such functions provide a more efficient tool for localized frequency decomposition of signals than the DCT or DCT-IV.
  • the MLT is a particular form of a cosine-modulated filter bank that allows for perfect reconstruction. For example, a signal can be recovered exactly from its MLT coefficients. Also, the MLT does not have blocking artifacts, namely, the MLT provides a reconstructed signal that decays smoothly to zero at its boundaries, avoiding discontinuities along block boundaries. In addition, the MLT has almost optimal performance, in a rate/distortion sense, for transform coding of a wide variety of signals.
  • the MLT is based on the oddly-stacked time-domain aliasing cancellation (TDAC) filter bank.
  • p a (n,k) and p s (n,k) are the basis functions for the direct (analysis) and inverse (synthesis) transforms
  • h a (n) and h s (n) are the analysis and synthesis windows, respectively.
  • the time index n varies from 0 to 2M ⁇ 1 and the frequency index k varies from 0 to M ⁇ 1, where M is the block size.
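  • in the common orthogonal case with a symmetric window (a standard form of the MLT basis, stated here for reference; compare the sine-modulated counterpart in eqn. (6) below), the analysis basis functions are p a (n,k) = h a (n) · √(2/M) · cos[(n + (M + 1)/2)(k + 1/2)π/M], and the synthesis functions p s (n,k) are defined in the same way using h s (n).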
  • the direct transform matrix P a has the entry p a (n,k) in the n-th row and k-th column.
  • the inverse transform matrix P s has entries p s (n,k).
  • the MLT can be compared with the DCT-IV.
  • Λ M {·} is the M-sample (one block) delay operator.
  • the MLT can be computed from a standard DCT-IV.
  • a modulated complex lapped transform (MCLT) is derived.
  • p a,s (n,k) = h a (n) · √(2/M) · sin[(n + (M + 1)/2)(k + 1/2)π/M]  (6)
  • the MCLT can be viewed as providing additional sine-modulated functions, i.e. a 2x oversampling in the frequency domain, because for every M new real-valued input samples the MCLT computes M complex frequency components.
  • the MCLT functions above form an overcomplete basis. Consequently, the MCLT is in fact a 2x oversampled DFT filter bank (using a doubly-odd DFT instead of the traditional DFT), in which the DFT length is 2M and the frame (block) size is M.
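  • as a concrete illustration of this definition, the following sketch computes the MCLT of one 2M-sample block directly from the cosine- and sine-modulated basis functions, assuming the commonly used MLT sine window h(n) = sin[(n + 1/2)π/(2M)]; it is a minimal direct reference implementation, not the fast butterfly/DCT-IV algorithm described below, and the function names are illustrative only:

```python
import numpy as np

def mclt_basis(M):
    """Cosine and sine MCLT basis functions, shape (2M, M): p_c(n, k) and p_s(n, k)."""
    n = np.arange(2 * M)
    h = np.sin((n + 0.5) * np.pi / (2 * M))        # MLT sine window (satisfies the PR condition)
    arg = np.outer(n + (M + 1) / 2.0, np.arange(M) + 0.5) * np.pi / M
    scale = np.sqrt(2.0 / M)
    pc = h[:, None] * scale * np.cos(arg)          # real (MLT) basis
    ps = h[:, None] * scale * np.sin(arg)          # imaginary (sine-modulated) basis
    return pc, ps

def mclt_analyze(x_block, pc, ps):
    """Direct MCLT of one 2M-sample block: U(k) = real part, V(k) = imaginary part."""
    return pc.T @ x_block, ps.T @ x_block

# Example: the real part U is exactly the MLT of the same block.
M = 8
x = np.random.default_rng(0).standard_normal(2 * M)
pc, ps = mclt_basis(M)
U, V = mclt_analyze(x, pc, ps)
X = U - 1j * V        # one common convention for forming the complex coefficients
```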
  • the lowest-frequency subband (the “DC” subband) is complex-valued.
  • mapping from the input block x to the reconstructed block y is done via a diagonal matrix of order 2M.
  • this is in contrast to the MLT, for which the product P s P a T is not diagonal.
  • the off-diagonal terms of P s P a T for the MLT are the time-domain aliasing terms, which are cancelled when the overlapped blocks are superimposed.
  • if the transform coefficients are modified, the time-domain aliasing terms will not cancel exactly, producing artifacts.
  • the MCLT because of its 2x oversampling, does not rely on time-domain aliasing cancellation.
  • the MCLT can be computed via the type-IV discrete cosine transform (DCT-IV).
  • the frequencies of the cosine functions that form the DCT-IV basis are (k+1/2) ⁇ /M, the same as those of the MLT and MCLT.
  • Λ M {·} is the M-sample (one block) delay operator.
  • the MCLT can be computed from a MCLT computational system 400 having a window processor 410 , which receives M sample blocks 412 of an input signal 414 .
  • the window processor 410 has real and imaginary window operators, and the real and imaginary transform processors 416 , 418 are preferably a length-M DCT-IV and a length-M DST-IV, respectively, as shown in the simplified block diagram of FIG. 4 .
  • the real and imaginary window operators of the window processor 410 apply and compute real and imaginary butterfly coefficients, respectively, to produce resulting real and imaginary vectors.
  • the length-M DCT-IV 416 receives the real vectors and the length-M DST-IV 418 receives the imaginary vectors.
  • the real transform processor 416 performs a discrete cosine transform (DCT) operation on the real vectors and the imaginary transform processor 418 performs a discrete sine transform (DST) operation on the imaginary vectors.
  • Output signals with real and imaginary parts 424 , 426 are produced as vectors with MCLT coefficients corresponding to the input block of samples.
  • the DCT-IV and DST-IV can be implemented with known fast techniques.
  • the inverse MCLT can be computed by simply transposing the components, moving the delays to the bottom half outputs of the DCT-IV and DST-IV, replacing the coefficients h a (n) by h s (n), and multiplying the contents of the final buffer by 1/2.
  • the fast MCLT computation shown in FIG. 4 does not assume identical analysis and synthesis windows. Therefore, it can be used to compute a biorthogonal MCLT, as long as the windows satisfy the perfect reconstruction condition in eqn. (3).
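  • a minimal sketch of this inverse, assuming the direct-form basis of the earlier analysis sketch and identical analysis and synthesis windows: each block is reconstructed as one half of the adjoint applied to the real and imaginary coefficients, and consecutive 2M-sample blocks, offset by M samples, are overlap-added; the interior of the reconstruction then matches the input exactly:

```python
import numpy as np

def mclt_basis(M):
    # Same cosine/sine basis and sine window as in the analysis sketch above.
    n = np.arange(2 * M)
    h = np.sin((n + 0.5) * np.pi / (2 * M))
    arg = np.outer(n + (M + 1) / 2.0, np.arange(M) + 0.5) * np.pi / M
    s = np.sqrt(2.0 / M)
    return h[:, None] * s * np.cos(arg), h[:, None] * s * np.sin(arg)

def mclt_synthesize_block(U, V, pc, ps):
    """Inverse of one block: 0.5 * (Pc U + Ps V), which equals h(n)^2 * x(n) before overlap-add."""
    return 0.5 * (pc @ U + ps @ V)

# Perfect-reconstruction check with 50% overlap-add.
M = 8
x = np.random.default_rng(1).standard_normal(6 * M)
pc, ps = mclt_basis(M)
y = np.zeros_like(x)
for start in range(0, len(x) - 2 * M + 1, M):
    blk = x[start:start + 2 * M]
    U, V = pc.T @ blk, ps.T @ blk                  # forward MCLT of the block
    y[start:start + 2 * M] += mclt_synthesize_block(U, V, pc, ps)
assert np.allclose(y[M:-M], x[M:-M])               # interior samples are recovered exactly
```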
  • the MCLT computational system 400 of FIG. 4 includes real and imaginary window operators 502 , 504 . Initially, a first block of M samples of an input signal x(n) is obtained. Second, for each window operator 502 , 504 , butterfly coefficients 512 , 513 are applied and computed to produce resulting real vectors u(n) and imaginary vectors v(n). The butterfly coefficients are determined by a window function {h(n)} 514 , 515 .
  • the real and imaginary vectors are received by the real and imaginary transform processors 520 , 522 , which are preferably discrete cosine transform (DCT) and discrete sine transform (DST) processors to produce vectors with real and imaginary MCLT transform coefficients corresponding to the input signal.
  • FIG. 6 is a flow diagram illustrating operational computation of a working example of a modulated complex lapped transform in accordance with the present invention.
  • an input buffer x containing M signal samples is read by the MCLT system (box 610 ).
  • h(n) based butterflies are computed for a cosine (real) part u(n) (box 612 ), a top half of u(n) is stored in a buffer 616 for use in the next block (box 614 ) and the top half of u(n) is read from a previous block (box 618 ).
  • h(n) based butterflies are computed for a sine (imaginary) part v(n) (box 620 ), a top half of v(n) is stored in a buffer 624 for use in the next block (box 622 ) and the top half of v(n) is read from a previous block (box 626 ).
  • a discrete cosine transform, type IV (DCT IV) is then computed on u(n) (box 628 ) and a discrete sine transform, type IV (DST IV) is computed on v(n) (box 630 ).
  • an output buffer U(k) containing the real part of the MCLT is produced (box 632 ) and an output buffer V(k) containing the imaginary part of the MCLT is produced (box 634 ).
  • the structure described above allows the present invention to provide perfect reconstruction, in that a signal x(n) (where n denotes the discrete-time index) can be recovered exactly from its MCLT coefficients.
  • the MCLT is a linear operator that projects the input block into a frame containing 2M basis functions.
  • the MCLT corresponds to a tight frame (all blocks of same energy are mapped into transform vectors with the same energy), with a magnitude amplification factor equal to two.
  • An advantage of the novel MCLT of the present invention is that the MCLT maps a block of M input signal samples into M complex frequency coefficients. As a result, magnitude and phase information at each frequency is immediately available with the MCLT.
  • the real part of the MCLT is the MLT, which makes for simplified computation of the MLT of a signal whose MCLT is known. This allows for efficient integration with multimedia applications, such as acoustic echo cancellation and audio coding.
  • because the MCLT maps a block of M input signal samples into M complex frequency coefficients, the MCLT leads to a data expansion factor of two, in other words, an oversampling factor of two. This oversampling actually provides good performance in acoustic echo cancellation applications.
  • the MCLT of the present invention provides short-time spectral decomposition of signals with explicit magnitude and phase information, perfect signal reconstruction, and fast computability by means of butterflies followed by discrete cosine transform operators. Also, the real part of the MCLT can be computed directly by the techniques discussed above, and the imaginary part can be computed with simple modifications. In addition, the MCLT is easily integrated with MLT-based systems. Further, once the MCLT of a signal has been computed, its MLT can be trivially obtained simply by discarding the imaginary part.
  • FIG. 7 is a general block diagram of a full-band adaptive filter using an adaptive FIR filtering approach.
  • a filtering system 700 such as a full-band adaptive filtering system, includes an input signal x(n) 710 and reference signal r(n) 712 received by an adaptive filter 720 .
  • the adaptive filter produces an output signal y(n) 722 , which is sent back into the adaptive filter 720 for providing automatic refinement adjustments to the filtering process until the output signal y(n) approximates as closely as possible the reference signal r(n).
  • the parameter μ controls the speed of adaptation.
  • the LMS adaptation rule increases a coefficient whenever the error e(n) has the same sign as the corresponding input sample x(n), i.e. whenever their product is positive.
  • the adaptation rule above corresponds to adjusting the coefficient vector w in the negative direction of the gradient of the error with respect to w, i.e. a steepest descent update.
  • the adaptive filter of FIG. 7 can be used in many applications where the response of the filter needs to change in view of varying conditions. Examples include modems (to equalize the signal distortions caused by the telephone lines) and acoustic echo cancellers (to remove the feedback from loudspeakers into microphones). If the input signal has a frequency spectrum that contains many peaks and valleys (i.e., if it is heavily colored), the parameter ⁇ in the LMS update equation has to be set to a very low value, which reduces the speed of adaptation, i.e., the speed in which the error signal e(n) converges to values near zero. On the other hand, if the input signal has a flat (white) spectrum, the LMS update equation is optimal, in the sense that it will lead to the fastest possible convergence
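  • for illustration, a minimal time-domain normalized LMS (NLMS) sketch of the structure in FIG. 7; normalizing the update by the input energy is the usual way of reducing the step-size sensitivity to colored inputs discussed above, and the names, step size, and test signal here are illustrative only:

```python
import numpy as np

def nlms_filter(x, r, num_taps, mu=0.5, eps=1e-8):
    """Full-band NLMS: adapt w so that y(n) = w . [x(n), ..., x(n-L+1)] tracks r(n)."""
    w = np.zeros(num_taps)
    y = np.zeros_like(x)
    e = np.zeros_like(x)
    for n in range(num_taps - 1, len(x)):
        x_vec = x[n - num_taps + 1:n + 1][::-1]    # [x(n), x(n-1), ..., x(n-L+1)]
        y[n] = w @ x_vec                           # filter output
        e[n] = r[n] - y[n]                         # error against the reference
        # Steepest-descent update, normalized by the input energy in the window.
        w += (mu / (eps + x_vec @ x_vec)) * e[n] * x_vec
    return y, e, w

# Example: identify a short echo path h from its input and output signals.
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
h = np.array([0.6, -0.3, 0.15, 0.05])
r = np.convolve(x, h)[:len(x)]                     # "microphone" signal: echoed input
y, e, w = nlms_filter(x, r, num_taps=4)            # w converges toward h, e toward zero
```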
  • FIG. 8 is a general block diagram of a frequency-domain MCLT-based adaptive filter in accordance with the present invention.
  • the performance of the LMS adaptive filter of FIG. 7 discussed above can be improved for colored input signals by using the new structure in FIG. 8 .
  • the signals can be broken into frequency subbands and an adaptive LMS filter can be performed in each subband, as shown in FIG. 8 .
  • although FIG. 8 depicts the MCLT as the transform operator that performs the frequency decomposition, other transforms could be used, such as a modulated lapped transform (MLT). If real transforms such as the MLT are used, the adaptive filters of each of the subbands have real coefficients. With a complex-valued transform such as the MCLT, the filter coefficients will have complex values.
  • the frequency-domain adaptive filter of FIG. 8 includes a first MCLT processor 810 for receiving and processing an input signal x(n) for producing input signal vectors, such as X( 0 ) through X(M ⁇ 1) and a second MCLT processor 812 for receiving and processing a reference signal r(n) for producing reference signal vectors, such as R( 0 ) through R(M ⁇ 1).
  • Also included in system 800 are plural adaptive filters 814 for receiving the input signal vectors X( 0 ) through X(M ⁇ 1) and the reference signal vectors R( 0 ) through R(M ⁇ 1) for producing corrected signal vectors, such as Y( 0 ) through Y(M ⁇ 1) and an inverse modulated complex lapped transform processor (IMCLT) 816 .
  • the IMCLT 816 receives and processes the corrected signal vectors Y( 0 ) through Y(M ⁇ 1) for producing a final output signal y(n) that substantially matches the input signal x(n).
  • in the adaptive filter of FIG. 8 , there is an adaptive filter for each subband k. Consequently, the subband signals are modified according to the adaptive filter learning computation.
  • the final output y(n) is obtained by applying an inverse MCLT (IMCLT) on the corrected subband/transform coefficients ⁇ Y(k) ⁇ . If the original adaptive filter of FIG. 7 had L coefficients, each adaptive filter in FIG. 8 needs only to have L/M coefficients, for the same time span.
  • the advantages of using the frequency-domain adaptive filter of FIG. 8 include faster convergence, because the signals within each subband are approximately white, even for a heavily colored input. Also, the device of FIG. 8 provides improved error control, because the ⁇ factors for the adaptive filters in each subband can be adjusted independently. Finally, the system in FIG. 8 can have a reduced computational complexity, because of the fast FFT-based algorithms available to compute the transforms.
  • acoustic echo cancellation (AEC) can use an adaptive filter that estimates the feedback transfer function from the loudspeaker to the microphone. The estimated echo return is then subtracted from the microphone signal.
  • Simple FIR filters are not ideal because of the length of the impulse response necessary to obtain a reasonable amount of echo reduction (for a 16 kHz sampling rate and an echo window of 100 ms, a 1,600-point impulse response is needed).
  • subband adaptive filtering the long FIR full-band filter is replaced by a collection of short FIR filters, one for each subband.
  • a critically sampled filter bank such as the MLT can be used for adaptive filtering, but the uncancelled aliasing due to subband processing may limit the amount of echo reduction to 10 dB or less. Performance can be improved by using cross-filters among neighboring subbands, but the extra degrees of freedom in such adaptive cross-filters usually slows down convergence significantly.
  • with the MCLT, subband acoustic echo cancellation (AEC) can be performed without cross-filters.
  • Each subband can be processed by a short FIR filter with complex taps, as shown in FIG. 8 . With a large number of subbands, the subband signals are essentially white, and so each adaptive filter can be adjusted via the normalized LMS computation.
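  • a minimal sketch of this per-subband filtering, assuming the loudspeaker and microphone signals have already been decomposed into MCLT coefficient sequences X k (m) and R k (m), where m is the block index; each subband runs an independent short complex-tap NLMS filter and no cross-filters are used. Function and variable names are illustrative only:

```python
import numpy as np

def subband_nlms(X, R, taps=4, mu=0.5, eps=1e-8):
    """
    X, R: complex arrays of shape (M, num_blocks) holding MCLT coefficients of the
    loudspeaker (input) and microphone (reference) signals, one row per subband k.
    Returns E: the microphone coefficients with the estimated echo subtracted.
    """
    M, num_blocks = X.shape
    W = np.zeros((M, taps), dtype=complex)          # one short complex filter per subband
    E = np.array(R, dtype=complex)
    for m in range(taps - 1, num_blocks):
        Xm = X[:, m - taps + 1:m + 1][:, ::-1]      # last `taps` blocks, newest first
        echo = np.sum(W * Xm, axis=1)               # echo estimate for every subband at once
        E[:, m] = R[:, m] - echo                    # echo-cancelled subband coefficients
        norm = eps + np.sum(np.abs(Xm) ** 2, axis=1)
        W += (mu / norm)[:, None] * E[:, m][:, None] * np.conj(Xm)   # complex NLMS update
    return E
```

  • as in FIG. 8 , a time-domain output can then be obtained by applying the inverse MCLT to the corrected coefficients.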
  • FIG. 9 is a block diagram of a speakerphone working example of the adaptive filter of FIG. 8 of the present invention in the form of an acoustic echo cancellation device.
  • the speakerphone system 900 of FIG. 9 includes communication equipment 910 comprising microphone input signals 912 received from a microphone 914 , which can be amplified by an amplifier 915 and speaker output signals 916 transmitted to a speaker 918 .
  • the system 900 also includes a filter 920 , such as the MCLT-based adaptive filter discussed above, for receiving input signals produced by the speaker 918 and reference signals received by the microphone 914 .
  • the local microphone not only captures audio signals intended to be transmitted (such as voice signals of a local person), it also captures, as feedback, audio signals that are being reproduced at the local loudspeaker (such as voice signals from a remote person's transmission). Unless the speaker feedback is cancelled, that feedback signal is sent back to the remote person. As such, the remote person will hear an echo of his or her own voice transmitted to the local person.
  • the adaptive filter of the present invention includes an MCLT-based adaptive filter for processing and filtering the input and reference signals to produce an output signal with information indicating the estimated echo portion of the signal.
  • the estimated echo portion of the output signal is removed or canceled and a resulting clean output signal is sent to the microphone input of the communication equipment 910 . Consequently, after an initial audio signal is sent through the system 900 , subsequent audio signals with feedback or echoes produced by the loudspeaker 918 are canceled by the adaptive filter 920 before the microphone input is received.
  • the output of the adaptive filter will be a good estimate of the portion of the microphone signal that is dependent on the loudspeaker signal, which is precisely the echo.
  • the echo is subtracted from the signal of the microphone 914 , as shown in FIG. 9, only the part of the microphone signal that is not correlated with the loudspeaker signal will remain.
  • the remaining part corresponds to the other local sounds, such as the voice of the person speaking and other ambient sounds.
  • FIG. 10 is a general block diagram of an acoustic echo cancellation device (AEC) with MCLT-based adaptive filters in accordance with the present invention.
  • the AEC 1000 of FIG. 10 includes a first MCLT processor 1010 for receiving and processing a loudspeaker signal as an input signal x(n) for producing input signal vectors X( 0 ) through X(M ⁇ 1) and a second MCLT processor 1012 for receiving and processing a microphone signal as a reference signal r(n) for producing reference signal vectors R( 0 ) through R(M ⁇ 1).
  • also included in the AEC 1000 are adaptive filters 1014 for receiving the input signal vectors X( 0 ) through X(M−1) and the reference signal vectors R( 0 ) through R(M−1).
  • the adaptive filters estimate the echo within the signals, which are then combined with the reference signals for canceling the echoes and producing cleaned and corrected signal vectors, such as Y( 0 ) through Y(M ⁇ 1).
  • An inverse modulated complex lapped transform processor (IMCLT) 1016 receives and processes the corrected signal vectors Y( 0 ) through Y(M ⁇ 1), which have MCLT coefficients without echo, for producing a final output signal with the echo canceled.
  • Subtraction of the estimated echo from the microphone signal is preferably performed for each subband, resulting in a set of subband signals Y(k) with the echo substantially removed.
  • the AEC and spectral subtraction can be combined using a single MCLT decomposition. For example, spectral subtraction can be applied to the subband signals immediately after the AEC adaptive filters. If the resulting signal is to be encoded by an MLT-based codec, then the MLT coefficients for the audio codec can be obtained by simply taking the real part of the outputs of the spectral subtraction. Therefore, only a single transformation step with the MCLT is necessary to perform simultaneous signal enhancement and coding.
  • an inverse modulated complex transform can be performed on Y(k), as shown in FIG. 8 .
  • the MCLT-based adaptive filters use short windows, which leads to low processing delay.
  • Another advantage is that the MCLT allows for perfect signal reconstruction. Also, integrating an MCLT adaptive filter with an MLT-based processing system (for example, an audio codec) is very easy, since the MLT is obtained directly as the real part of the MCLT. Further, for a given number of subbands M (which is also the block size), a windowed Fourier transform decomposes the signal into M/2+1 distinct subbands. The MCLT breaks the signal into M subbands, and so it provides essentially twice the frequency resolution. Therefore, an MCLT-based adaptive filter will converge faster, because narrower subbands tend to have a flatter spectrum.
  • FIG. 11 is a wave signal illustrating sample results of the working example of FIG. 9 .
  • the first wave signal (Wave A) is the microphone signal as a recorded echo return.
  • the second wave signal (Wave B) is the output of a full-band AEC, echo reduction ratio (ERR) ≈ 26 dB (it should be noted that the vertical scale is different).
  • the third wave signal (Wave C) is an output of a 512-band MLT AEC without cross filters, ERR ≈ 5 dB.
  • the bottom wave signal (Wave D) is an output of a 512-band MCLT AEC without cross filters, ERR ≈ 20 dB.
  • the original signal is an actual echo return recorded at 16 kHz sampling from a microphone located at about 20′′ from the loudspeaker (using a 4′′ driver).
  • the signals in FIG. 11 show the cancelled echo after convergence of each AEC (which takes a few seconds in all cases).
  • the MLT and MCLT AECs used M = 512 subbands and a four-tap adaptive filter in each band (corresponding to an echo window of about 128 ms).
  • the echo attenuation for the MCLT is about 20 dB, which is adequate for many practical teleconferencing applications.
  • the MCLT of the present invention is amenable to other types of frequency-domain processing while allowing for perfect signal reconstruction.
  • another kind of processing that can be efficiently performed in the frequency domain, especially with the MCLT, is noise reduction.
  • the AEC usually removes feedback from the loudspeaker, but may not remove other noises, such as ambient noises that may be generated by computers and fans in an office.
  • the signal Y(k) is considered as having a desired signal component and a noise component.
  • noise reduction is achieved by estimating the average noise magnitude
  • N e (k) is the noise level estimate for the kth subband, which can be subtracted from Y(k) as shown below.
  • Y f (k) is the filtered signal.
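  • in one plausible form of the expressions referenced above (the notation S(k), N(k), and α(k) is assumed here, not quoted from the patent), the model is Y(k) = S(k) + N(k), with S(k) the desired component and N(k) the noise, and the subtraction rule of eqn. (A) is |Y f (k)| = |Y(k)| − α(k) · N e (k), with the phase of Y f (k) kept equal to the phase of Y(k); equivalently, Y f (k) = Y(k) · [1 − α(k) · N e (k) / |Y(k)|].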
  • a portion of the estimated magnitude noise is subtracted from the magnitude of each subband signal.
  • the phase is not affected, since the average noise phase is always zero.
  • the parameters α(k) control how much of the noise estimate is subtracted from each subband signal, and so 0 ≤ α(k) ≤ 1 is preferably set. These parameters are preferably adjusted depending on the quality of the noise estimates. For example, if the noise estimate is significantly above the true noise level, the subtraction in eqn. (A) will also remove part of the signal, leading to noticeable artifacts.
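  • a minimal sketch of this magnitude-domain subtraction for one block of MCLT coefficients, assuming a noise estimate N e (k) maintained, for example, as a slow average of |Y(k)| over frames judged to be silence; the flooring of the result at zero (or at a small spectral floor) is a common practical safeguard and is an assumption here, not a requirement stated above:

```python
import numpy as np

def spectral_subtract(Y, noise_est, alpha=0.9, floor=0.0):
    """Subtract alpha(k) * Ne(k) from each subband magnitude; keep the phase unchanged."""
    mag = np.abs(Y)
    phase = Y / np.maximum(mag, 1e-12)             # unit-magnitude phase factors
    new_mag = np.maximum(mag - alpha * noise_est, floor)
    return new_mag * phase

def update_noise_estimate(noise_est, Y, is_silence, beta=0.95):
    """Slowly track the average noise magnitude during frames classified as silence."""
    if is_silence:
        return beta * noise_est + (1 - beta) * np.abs(Y)
    return noise_est
```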
  • FIG. 12 is a general block diagram of a noise reduction device with MCLT-based adaptive filters in accordance with the present invention.
  • the noise reduction device 1200 of FIG. 12 includes an MCLT processor 1210 for receiving and processing an input signal x(n) corrupted by noise for producing input signal vectors X( 0 ) through X(M−1), plural subtraction devices 1212 , such as subband noise subtraction devices, and an inverse modulated complex lapped transform processor (IMCLT) 1216 .
  • the plural subband noise subtraction devices 1212 receive the input signal vectors X( 0 ) through X(M ⁇ 1) and compute magnitude, Xmag( 0 ) through Xmag(M ⁇ 1), and phase, Xph( 0 ) through Xph(M ⁇ 1), information (box 1218 ).
  • Noise levels, Ne( 0 ) through Ne(M ⁇ 1) are estimated from the magnitude, Xmag( 0 ) through Xmag(M ⁇ 1), information (box 1220 ).
  • the noise level estimates are combined with the magnitude information for reducing the noise based on the noise level estimated to produce cleaned and corrected magnitude information, which is then sent to a recovery device 1222 for recovering the real and imaginary parts of this information.
  • An inverse modulated complex lapped transform processor (IMCLT) 1216 receives and processes the corrected information as signal vectors Y( 0 ) through Y(M ⁇ 1), which have MCLT coefficients with reduced noise, for producing a final output signal with noise reduction.
  • FIG. 13 is a flow diagram illustrating operational computation of a noise reduction device incorporating the modulated complex lapped transform of the present invention.
  • an input buffer x containing M new signal samples is read (box 1310 ) by the MCLT processor 1210 of FIG. 12 .
  • the MCLT coefficients X(k) are computed (box 1312 ) and this information is sent to the subband subtraction device 1212 of FIG. 12 .
  • spectral subtraction is performed (box 1320 ) by the subband subtraction device 1212 of FIG. 12 , preferably using the magnitude-domain subtraction rule discussed above.
  • the noise reduction process is preferably performed right after the echo cancellation process of FIG. 10, otherwise the loudspeaker echo would interfere with the noise estimate and make it less reliable.
  • FIG. 14 is a wave signal illustrating sample results of the working example of FIGS. 12-13 using the MCLT of the present invention with spectral subtraction.
  • the top wave signal (Wave A) is the original speech, corrupted by PC noise, SNR ≈ 15 dB, and the bottom wave signal (Wave B) is the processed speech, SNR ≈ 30 dB.
  • FIG. 15 is a block diagram of a working system of the present invention shown as a signal enhancer and noise reducer integrated with a codec.
  • the MCLT, AEC and noise reducer of the present invention as described above, can be integrated with an audio codec for use by a real time communication system, such as audio applications including Internet telephony or other forms of hands-free teleconferencing or telephony.
  • the integration of the MCLT, AEC, noise reducer and codec leads to improved performance, reduced computational complexity, and reduced memory usage and processing delay over current systems.
  • the AEC (element 800 of FIG. 8) and noise reducer (element 1200 of FIG. 12) of the present invention can be integrated with an audio codec to form a novel integrated processor 1500 .
  • the integrated processor 1500 of the present invention includes a MCLT processor 1510 (similar to MCLT processor 810 of FIG. 8 ), an AEC processor 1512 (similar to the AEC processor 800 of FIG. 8 ), a noise reducer 1514 (similar to the noise reducer 1200 of FIG. 12 ), a coefficient filter 1516 , a magnitude processor 1518 , and a codec 1520 , which can be any suitable codec.
  • the noise reducer 1516 is preferably included in the codec 1520 , as shown in FIG. 15 .
  • the codec can be the audio codec (with suitable modifications in accordance with the present invention) described in co-pending U.S. patent application Ser. No. 09/085,620, filed on May 27, 1998 by Henrique Malvar, entitled “Scalable Audio Coder and Decoder” and assigned to the current assignee, which is herein incorporated by reference.
  • the audio that is captured can be enhanced by operations of the AEC 1512 and noise reducer 1514 of the present invention.
  • the audio signal is preferably coded (compressed) by the codec 1520 to limit the bit rate to a rate that is adequate for the communication channel.
  • the MCLT processor 1510 receives and processes an input signal x(n) for producing input signal vectors, such as X( 0 ) through X(M ⁇ 1) and the AEC processor 1512 receives and processes a reference signal for producing reference signal vectors 1522 .
  • the noise reducer 1516 then receives the signal vectors 1514 from the AEC processor 1512 and produces enhanced MCLT coefficients (real and imaginary parts), such as coefficients Y( 0 ) through Y(M ⁇ 1), in accordance with the noise reducer 1200 of FIG. 12 .
  • the AEC does not need to perform an inverse MCLT after the adaptive filters in each subband, if a codec operates in the frequency domain.
  • the same principle applies when the noise reduction process of eqn. (A) is added.
  • the noise reduction step is applied to generate the filtered subband coefficients Y f (k), which are then sent directly to the codec, without the need to return to the corresponding time-domain signal.
  • the echo-cancelled and noise filtered signal y f (n) can be obtained simply by computing an inverse MCLT on the subband signals Y f (k).
  • the present invention uses the MCLT as the first processing step in order to avoid performing inverse MCLT computations.
  • the MLT is the real part of the MCLT.
  • coefficients R(k) can be obtained by taking the real parts of the enhanced MCLT coefficients, i.e., R(k) = Re{Y(k)}.
  • the coefficient filter 1516 and the magnitude processor 1518 receive coefficients Y(0) through Y(M−1).
  • the coefficient filter 1516 processes the MCLT coefficients (real and imaginary parts) and discards the imaginary parts of the MCLT coefficients.
  • the integrated system 1500 of FIG. 15 produces accurate masking functions.
  • the audio codec of U.S. patent application Ser. No. 09/085,620 computes weighting functions based on hearing thresholds, defined by functions that approximate the masking phenomena in the human auditory system.
  • Such masking functions can be computed based on the power spectrum of the incoming audio, i.e., the power values at each frequency index k. As such, the spectral magnitudes are approximated by magnitudes of the MLT coefficients.
  • because the MLT coefficients are obtained by projecting the signal onto modulated cosines, their magnitudes are typically not directly proportional to the actual physical r.m.s. (root-mean-square) power contained in the signal at each frequency subband.
  • the magnitudes can be computed directly from the real and imaginary parts (cosine and sine projections, respectively), and such magnitudes are then directly proportional to the physical r.m.s. power at each frequency subband k. In that way, the computation of the masking functions is more precise as compared to computations based solely on the MLT (real part) coefficients.
  • the magnitude processor 1518 computes the magnitudes of the MCLT coefficients, such as U(0) through U(M−1). Computation of the magnitudes U(k) can be performed as U(k) = √(Re{Y(k)}² + Im{Y(k)}²); a short computational sketch of this step appears after this list.
  • the codec 1520 further includes a weighting processor 1524 , a masking functions processor 1526 and an encoding processor 1528 .
  • the masking functions processor 1526 receives the magnitude coefficients produced by the magnitude processor 1518 and computes masking functions.
  • the weighting processor 1524 receives the masking functions and the real part of the signal from the coefficient filter 1516 , such as R(0) through R(M−1), for producing the weighted signal, as described above.
  • the encoding processor 1528 performs quantization and encoding processing to produce the output bitstream.
  • the input signal is transformed from the time domain to the frequency domain once, by means of an MCLT. Therefore, the enhancement functions of the AEC 1512 and the noise reducer 1514 are computed on the MCLT coefficients, while the codec 1520 uses the real part of the MCLT coefficients for quantization and encoding and the magnitudes of the MCLT coefficients for computation of precise auditory masking functions and weighting functions.
  • the computational load can be minimized by performing the MCLT computation once by integrating several components that process the signal in the frequency domain.
  • a speech recognizer 1530 , having a coefficient processor 1532 and a recognition engine 1534 , can also be included.
  • the speech recognizer 1530 can be located after the magnitudes are computed by the magnitude processor 1518 .
  • the incoming speech signal is divided into blocks of 10 to 30 ms duration.
  • a cepstrum vector is computed, and cepstral coefficients are used for the next step of statistical and language pattern analysis.
  • the parameter N (the number of spectral coefficients computed) can be set between 10 and 20.
  • a fast Fourier transform (FFT) operator is preferably computed on the incoming block.
  • {U(k)} is the set of MCLT coefficients.
  • the speech recognition engine 1534 receives these coefficients for performing speech recognition.
  • although this new cepstral vector C(r) is not identical to the original cepstral vector V(r), the patterns present in V(r) will also be present in C(r).
  • Re-training of the speech recognition engine can be performed, so that it will re-adapt to the typical patterns in C(r).
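The coefficient-filter and magnitude-processor operations referenced in the list above can be illustrated with a short sketch. The following Python fragment is a minimal illustration only, not the patent's implementation; the function name split_for_codec and the test coefficients are hypothetical. It shows how the real (MLT) coefficients R(k) for quantization and the magnitudes U(k) for the masking computation would be derived from the enhanced MCLT coefficients Y(k):

import numpy as np

def split_for_codec(Y):
    """Y: complex array of M enhanced MCLT coefficients Y(k)."""
    R = np.real(Y)   # MLT coefficients: discard the imaginary parts
    U = np.abs(Y)    # magnitudes: sqrt(Re^2 + Im^2), proportional to r.m.s. power per subband
    return R, U

# Example with arbitrary placeholder coefficients
M = 8
Y = np.exp(1j * np.linspace(0.0, np.pi, M))
R, U = split_for_codec(Y)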

Abstract

The present invention is embodied in a system and method for performing spectral analysis of a digital signal having a discrete duration by spectrally decomposing the digital signal at predefined frequencies uniformly distributed over a sampling frequency interval into complex frequency coefficients so that magnitude and phase information at each frequency is immediately available to produce a modulated complex lapped transform (MCLT). The present invention includes a MCLT processor, an acoustic echo cancellation device and a noise reducer integrated with an encoder/decoder device.

Description

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a system and method for producing modulated complex lapped transforms (MCLTs), and in particular, a system and method for incorporating complex coefficients into modulated lapped transforms (MLTs) to derive MCLTs.
2. Related Art
In many engineering and scientific applications, it is desirable to analyze a signal in the frequency domain or represent the signal as a linear superposition of various sinusoids. The analysis of the amplitudes and phases of such sinusoids (the signal spectrum) can be useful in multimedia applications for operations such as noise reduction, compression, and pattern recognition, among other things. The Fourier transform is a classical tool used for frequency decomposition of a signal. The Fourier transform breaks a signal down to component frequencies. However, its usefulness is limited to signals that are stationary, i.e., spectral patterns of signals that do not change appreciably with time. Since most real-world signals, such as audio and video signals, are not stationary signals, localized frequency decompositions are used, such as time-frequency transforms. These transforms provide spectral information that is localized in time.
One such transform is the discrete cosine transform (DCT). The DCT breaks a signal down to component frequencies. For instance, a block of M samples of the signal can be mapped to a block of M frequency components via a matrix of M×M coefficients. To ensure a good energy compaction performance, the DCT approximates the eigenvectors of the autocorrelation matrix of typical signal blocks. Basis functions for the DCT (for type II) can be defined as:

a_{nk} = c(k)\,\sqrt{\frac{2}{M}}\,\cos\left[\left(n+\frac{1}{2}\right)\frac{k\pi}{M}\right]
where a_{nk} is the element of the A transformation matrix in the nth row and kth column, or equivalently, the nth sample of the kth basis function. For orthonormality, the scaling factors are chosen as:

c(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & \text{otherwise} \end{cases}
The transform coefficients X(k) are computed from the signal block samples x(n) by:

X(k) = \sum_{n=0}^{M-1} a_{nk}\, x(n)
The DCT can be used for convolution and correlation, because it satisfies a modified shift property. Typical uses of the DCT are in transform coding, spectral analysis, and frequency-domain adaptive filtering.
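As an illustration of the definitions above, the following Python sketch evaluates the type-II DCT directly from its basis functions. It is an O(M²) reference computation, not a fast algorithm, and the function name dct_ii is arbitrary:

import numpy as np

def dct_ii(x):
    """Orthonormal type-II DCT, computed directly from the basis definition above."""
    M = len(x)
    n = np.arange(M)
    X = np.zeros(M)
    for k in range(M):
        c = 1.0 / np.sqrt(2.0) if k == 0 else 1.0
        a_nk = c * np.sqrt(2.0 / M) * np.cos((n + 0.5) * k * np.pi / M)
        X[k] = np.sum(a_nk * x)
    return X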
An alternative transform for spectral analysis is the discrete cosine transform, type IV (DCT-IV). The DCT-IV is obtained by shifting the frequencies of the DCT basis functions in eqn. (A) by π/2M, in the form:

a_{nk} = \sqrt{\frac{2}{M}}\,\cos\left[\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]
Unlike the DCT, the scaling factor is identical for all basis functions. It should be noted that the DCT-IV basis functions have a frequency shift when compared to the DCT basis. Nevertheless, these transforms still lead to orthogonal bases.
The DCT and DCT-IV are useful tools for frequency-domain signal decomposition. However, they suffer from blocking artifacts. In typical applications, the transform coefficients X(k) are processed in some desired way: quantization, filtering, noise reduction, etc. Reconstructed signal blocks are obtained by applying the inverse transform to such modified coefficients. When such reconstructed signal blocks are pasted together to form the reconstructed signal (e.g. a decoded audio or video signal), there will be discontinuities at the block boundaries.
The modulated lapped transform (MLT) eliminates such discontinuities. The MLT is a particular form of a cosine-modulated filter bank that allows for perfect reconstruction. For example, a signal can be recovered exactly from its MLT coefficients. Also, the MLT does not have blocking artifacts, namely, the MLT provides a reconstructed signal that decays smoothly to zero at its boundaries, avoiding discontinuities along block boundaries. In addition, the MLT has almost optimal performance for transform coding of a wide variety of signals. Because of these properties, the MLT is being used in many applications, such as many modern audio and video coding systems, including Dolby AC-3, MPEG-2 Layer III, and others.
However, one disadvantage of the MLT for some applications is that its transform coefficients are real, and so they do not explicitly carry phase information. In some multimedia applications, such as audio processing, complex subbands are typically needed by noise reduction devices (via spectral subtraction) and acoustic echo cancellation devices. Moreover, in many audio processing applications digital audio representations are commonplace. For example, music compact discs (CDs), Internet audio clips, satellite television, digital video discs (DVDs), and telephony (wired or cellular) rely on digital audio techniques.
Digital representation of an audio signal is achieved by converting the analog audio signal into a digital signal with an analog-to-digital (A/D) converter. The digital representation can then be encoded, compressed, stored, transferred, utilized, etc. The digital signal can then be converted back to an analog signal with a digital-to-analog (D/A) converter, if desired. The A/D and D/A converters sample the analog signal periodically, usually at one of the following standard frequencies: 8 kHz for telephony, Internet, videoconferencing; 11.025 kHz for Internet, CD-ROMs; 16 kHz for videoconferencing, long-distance audio broadcasting, Internet, future telephony; 22.05 kHz for CD-ROMs, Internet; 32 kHz for CD-ROMs, videoconferencing, ISDN audio; 44.1 kHz for audio CDs; and 48 kHz for studio audio production.
Typically, if the audio signal is to be encoded or compressed after conversion, raw bits produced by the A/D are usually formatted at 16 bits per audio sample. For audio CDs, for example, the raw bit rate is 44.1 kHz×16 bits/sample=705.6 kbps (kilobits per second). For telephony, the raw rate is 8 kHz×8 bits/sample=64 kbps. For audio CDs, where the storage capacity is about 700 megabytes (5,600 megabits), the raw bits can be stored, and there is no need for compression. MiniDiscs, however, can only store about 140 megabytes, and so a compression of about 4:1 is necessary to fit 30 min to 1 hour of audio in a 2.5″ MiniDisc.
For Internet telephony and most other applications, the raw bit rate is too high for most current channel capacities. As such, an efficient encoder/decoder (commonly referred to as coder/decoder, or codec) with good compression is used. For example, for Internet telephony, the raw bit rate is 64 kbps, but the desired channel rate varies between 5 and 10 kbps. Therefore, a codec needs to compress the bit rate by a factor between 5 and 15, with minimum loss of perceived audio signal quality.
With the recent advances in processing chips, codecs can be implemented either in dedicated hardware, typically with programmable digital signal processor (DSP) chips, or in software in a general-purpose computer. Currently, commercial systems use many different digital audio technologies. Some examples include: ITU-T standards: G.711, G.726, G.722, G.728, G.723.1, and G.729; other telephony standards: GSM, half-rate GSM, cellular CDMA (IS-733); high-fidelity audio: Dolby AC-2 and AC-3, MPEG LII and LIII, Sony MiniDisc; Internet audio: ACELP-Net, DolbyNet, PictureTel Siren, RealAudio; and military applications: LPC-10 and USFS-1016 vocoders.
It is desirable to have codecs that can achieve low computational complexity and exhibit robustness to signal variations, allowing the codec to handle a wider range of signals, i.e., clean speech, noisy speech, multiple talkers, music, etc., without unduly compromising performance. Therefore, what is needed is a new audio processing system that integrates an acoustic echo cancellation device and noise reducer with a codec for improving performance, reducing computational complexity, and reducing memory usage and processing delay. Whatever the merits of the above mentioned systems and methods, they do not achieve the benefits of the present invention.
SUMMARY OF THE INVENTION
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention is embodied in a system and method for performing spectral analysis of a digital signal having a discrete duration. The present invention performs spectral analysis by spectrally decomposing the digital signal at predefined frequencies uniformly distributed over a sampling frequency interval into complex frequency coefficients so that magnitude and phase information at each frequency is immediately available.
Namely, the system of the present invention produces a modulated complex lapped transform (MCLT) and includes real and imaginary window processors and real and imaginary transform processors. Each window processor has window functions and operators. The real window processor receives the input signal as sample blocks and applies and computes butterfly coefficients for the real part of the signal to produce resulting real vectors. The imaginary window processor receives the input signal as sample blocks and applies and computes butterfly coefficients for the imaginary part of the signal to produce resulting imaginary vectors. The real transform processor computes a spatial transform on the real vectors to produce a real transform coefficient for the MCLT. The imaginary transform processor computes a spatial transform on the imaginary vectors to produce an imaginary transform coefficient for the MCLT.
In addition, the system can include an inverse transform module for inverse transformation of the encoded output. The inverse transform module can include components that are the exact inverse of those described above, namely, inverse real and imaginary transform processors and real and imaginary inverse window processors. The encoded output is received and processed by the inverse real and imaginary transform processors, and then received and processed by the real and imaginary inverse window processors to produce an output signal that substantially matches the input signal.
The foregoing and still further features and advantages of the present invention as well as a more complete understanding thereof will be made apparent from a study of the following detailed description of the invention in connection with the accompanying drawings and appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
FIG. 1 is a block diagram illustrating an apparatus for carrying out the invention;
FIG. 2 is a general block diagram illustrating a system for computing and encoding modulated complex lapped transforms in accordance with the present invention;
FIG. 3 is a general block/flow diagram illustrating a system and method for computing modulated lapped transforms in accordance with the present invention;
FIG. 4 is a detailed block/flow diagram illustrating computation of a modulated complex lapped transform in accordance with the present invention;
FIG. 5 is a detailed diagram illustrating the window operation of the modulated complex lapped transform of FIG. 4;
FIG. 6 is a flow diagram illustrating operational computation of a modulated complex lapped transform in accordance with the present invention;
FIG. 7 is a general block diagram of a full-band adaptive filter;
FIG. 8 is a general block diagram of a frequency-domain MCLT-based adaptive filter in accordance with the present invention;
FIG. 9 is a block diagram of a working example of the adaptive filter of FIG. 8 of the present invention in the form of an acoustic echo cancellation device;
FIG. 10 is a general block diagram of an acoustic echo cancellation device with MCLT-based adaptive filters in accordance with the present invention;
FIG. 11 is a wave signal illustrating sample results of the working example of FIG. 9;
FIG. 12 is a general block diagram of a noise reduction device with MCLT-based adaptive filters in accordance with the present invention.
FIG. 13 is a flow diagram illustrating operational computation of a noise reduction device incorporating the modulated complex lapped transform of the present invention; and
FIG. 14 is a wave signal illustrating sample results of the working example of FIGS. 12-13.
FIG. 15 is a block diagram of a working system of the present invention illustrated as an integrated signal enhancer and noise reducer with a codec.
DETAILED DESCRIPTION OF THE INVENTION
In the following description of the invention, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration a specific example in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.
Introduction
The MCLT of the present invention can achieve short-time spectral decomposition of signals with explicit magnitude and phase information and perfect signal reconstruction. For instance, the MCLT of the present invention can use sine functions at defined frequencies and phases to generate an additional orthogonal decomposition. The defined frequencies and phases are preferably the same as those used by the MLT basis functions for cosine modulation of a particular window function with certain properties.
In addition, the MCLT of the present invention is easily integrated with MLT-based systems. Once the MCLT of a signal has been computed, its MLT can be trivially obtained simply by discarding the imaginary parts. The present invention can use both the cosine and sine modulating functions for producing a frame decomposition with desirable properties. Further, the cosine and sine modulations can be used to compute the real and imaginary parts of a transform that has all the magnitude/phase properties of the short-time Fourier transform, while allowing for perfect signal reconstruction. Consequently, the novel MCLT of the present invention can be used in applications such as high-fidelity audio coding, adaptive filtering, acoustic echo cancellation, noise reduction, or any other application where high-fidelity signal reconstruction is required.
Exemplary Operating Environment
FIG. 1 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with a variety of computer system configurations, including personal computers, server computers, hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located on both local and remote computer storage media including memory storage devices.
With reference to FIG. 1, an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional computer 100, including a processing unit 102, a system memory 104, and a system bus 106 that couples various system components including the system memory 104 to the processing unit 102. The system bus 106 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes computer storage media in the form of read only memory (ROM) 110 and random access memory (RAM) 112. A basic input/output system 114 (BIOS), containing the basic routines that help to transfer information between elements within computer 100, such as during start-up, is stored in ROM 110. The computer 100 may include a hard disk drive 116 for reading from and writing to a hard disk, not shown, a magnetic disk drive 118 for reading from or writing to a removable magnetic disk 120, and an optical disk drive 122 for reading from or writing to a removable optical disk 124 such as a CD ROM or other optical media. The hard disk drive 116, magnetic disk drive 118, and optical disk drive 122 are connected to the system bus 106 by a hard disk drive interface 126, a magnetic disk drive interface 128, and an optical drive interface 130, respectively. The drives and their associated computer-readable media provide storage of computer readable instructions, data structures, program modules and other data for the computer 100. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 120 and a removable optical disk 124, it should be appreciated by those skilled in the art that other types of computer readable media can store data that is accessible by a computer. Such computer readable media can be any available media that can be accessed by computer 100. By way of example, and not limitation, such computer readable media may comprise communication media and computer storage media. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Computer storage media includes any method or technology for the storage of information such as computer readable instructions, data structures, program modules or other data. By way of example, such storage media includes RAM, ROM, EPROM, flash memory or other memory technology, CD-ROM, digital video disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 100. 
Combinations of any of the above should also be included within the scope of computer readable media.
A number of program modules may be stored on the hard disk, magnetic disk 120, optical disk 124, ROM 110 or RAM 112, including an operating system 132, one or more application programs 134, other program modules 136, and program data 138. A user may enter commands and information into the computer 100 through input devices such as a keyboard 140 and pointing device 142. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 102 through a serial port interface 144 that is coupled to the system bus 106, but may be connected by other interfaces, such as a parallel port, game port or a universal serial bus (USB). A monitor 146 or other type of display device is also connected to the system bus 106 via an interface, such as a video adapter 148. In addition to the monitor 146, computers may also include other peripheral output devices (not shown), such as speakers and printers.
The computer 100 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 150. The remote computer 150 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 100, although only a memory storage device 152 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 include a local area network (LAN) 154 and a wide area network (WAN) 156. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and Internet.
When used in a LAN networking environment, the computer 100 is connected to the local network 154 through a network interface or adapter 158. When used in a WAN networking environment, the computer 100 typically includes a modem 160 or other means for establishing communications over the wide area network 156, such as the Internet. The modem 160, which may be internal or external, is connected to the system bus 106 via the serial port interface 144. In a networked environment, program modules depicted relative to the computer 100, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
Overview of Components and Operation
FIG. 2 is a general block diagram illustrating a system for computing and encoding modulated complex lapped transforms in accordance with the present invention. In the system 200 of the present invention, an input signal 206 is received by a sampling device 208, which breaks the signal into blocks. Each block contains L samples, and each new block is formed by discarding the M oldest samples of the block and adding the M newest input samples to the block. In a typical implementation, L=2M. Also included in the system 200 are real and imaginary window processors 210, 212 for reducing blocking effects, and real and imaginary transformation processors 220, 222 for coding each block. It should be noted that one window processor with dual real and imaginary computational devices can be used instead of separate real and imaginary window processors. Similarly, one transform processor with dual real and imaginary computational devices can be used instead of separate real and imaginary transform processors.
The real and imaginary window processors 210, 212 receive and process the input block by applying and computing butterfly coefficients for the real and imaginary parts of the signal, respectively, to produce resulting real and imaginary vectors. The butterfly coefficients are determined by a given window function, which will be discussed in detail below. The real and imaginary transformation processors 220, 222 compute spatial transforms on the resulting real and imaginary vectors to produce real and imaginary transform coefficients of the MCLT, respectively.
FIG. 3 is a general block/flow diagram illustrating a system and method for computing modulated lapped transforms in accordance with the present invention. In general, the MCLT computation system 300 first receives an input signal 310. Second, a single window processor 312 with real and imaginary computational devices or dual real and imaginary window processors receives a block of M samples of the input signal (box 314). The window processor 312 applies and computes butterfly coefficients, for the real and imaginary parts of the signal (boxes 315, 316), respectively, to produce real and imaginary resulting vectors (boxes 318, 320), respectively.
Third, a single transform processor 322 with real and imaginary computational devices or dual real and imaginary transform processors receives the real and imaginary resulting vectors (box 323). The transform processor 322 performs a discrete cosine transform (DCT) operation on the real vectors (box 324) and a discrete sine transform (DST) operation on the imaginary vectors (box 326). Fourth, real and imaginary output signals are respectively produced as vectors with real and imaginary MCLT coefficients corresponding to the input block of samples (boxes 328, 330). Fifth, the output signal can be processed by transmitting, storing, enhancing, filtering, etc. the signal (box 332). For example, interference within the signal can be reduced with a noise reducer, echo canceller, etc., compression can be achieved by scalar or vector quantization of the MLT coefficients, etc., as desired.
Structural and Operational Details of the System
FIG. 4 is a detailed block/flow diagram illustrating a modulated complex lapped transform (MCLT) extended from a modulated lapped transform processor (MLT) in accordance with the present invention. Referring back to FIGS. 2-3 along with FIG. 4, the incoming signal is decomposed into frequency components by a transform processor, such as a modulated complex lapped transform processor (MCLT) of the present invention, that is preferably an extension of a modulated lapped transform processor (MLT). An MLT is preferably the basis for the MCLT because, among other things, other transform processors, such as discrete cosine transforms (DCT and DCT-IV), are useful tools for frequency-domain signal decomposition but suffer from blocking artifacts. For example, transform coefficients X(k) are processed by DCT and DCT-IV transform processors in some desired way, such as quantization, filtering, noise reduction, etc.
Reconstructed signal blocks are obtained by applying the inverse transform to such modified coefficients. When such reconstructed signal blocks are pasted together to form the reconstructed signal (e.g. a decoded audio or video signal), there will be discontinuities at the block boundaries. In contrast, the modulated lapped transform (MLT) eliminates such discontinuities by extending the length of the basis functions to twice the block size, i.e. 2M.
The basis functions of the MLT are obtained by extending the DCT-IV functions and multiplying them by an appropriate window, in the form:

a_{nk} = h(n)\,\cos\left[\left(n+\frac{M+1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]
where k varies from 0 to M−1, but n now varies from 0 to 2M−1.
Thus, the MLT can lead to orthogonal or biorthogonal bases and can achieve short-time decomposition of signals as a superposition of overlapping windowed cosine and sine functions. Such functions provide a more efficient tool for localized frequency decomposition of signals than the DCT or DCT-IV. The MLT is a particular form of a cosine-modulated filter bank that allows for perfect reconstruction. For example, a signal can be recovered exactly from its MLT coefficients. Also, the MLT does not have blocking artifacts, namely, the MLT provides a reconstructed signal that decays smoothly to zero at its boundaries, avoiding discontinuities along block boundaries. In addition, the MLT has almost optimal performance, in a rate/distortion sense, for transform coding of a wide variety of signals.
Specifically, the MLT is based on the oddly-stacked time-domain aliasing cancellation (TDAC) filter bank. In general, in the standard MLT transformation, a vector containing 2M samples of an input signal x(n), n = 0, 1, 2, . . . , 2M−1 (which are determined by shifting in the latest M samples of the input signal and combining them with the previously acquired M samples), is transformed into another vector containing M coefficients X(k), k = 0, 1, 2, . . . , M−1. The transformation can be defined by the standard MLT computation:

X(k) = \sqrt{\frac{2}{M}}\,\sum_{n=0}^{2M-1} x(n)\, h(n)\, \cos\left[\left(n+\frac{M+1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]
where h(n) is the MLT window.
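For reference, the standard MLT computation above can be evaluated directly, as in the following Python sketch. It is an unoptimized O(M·2M) illustration only, assuming the sine window given in eqn. (2) below; the function name mlt_direct is arbitrary:

import numpy as np

def mlt_direct(x_block, M):
    """Direct (slow) MLT of one 2M-sample block, following the expression above."""
    assert len(x_block) == 2 * M
    n = np.arange(2 * M)
    h = np.sin((n + 0.5) * np.pi / (2 * M))   # sine window of eqn. (2) below
    X = np.zeros(M)
    for k in range(M):
        basis = np.cos((n + (M + 1) / 2.0) * (k + 0.5) * np.pi / M)
        X[k] = np.sqrt(2.0 / M) * np.sum(x_block * h * basis)
    return X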
Window functions are primarily employed for reducing blocking effects. For example, Signal Processing with Lapped Transforms, by H. S. Malvar, Boston: Artech House, 1992, which is herein incorporated by reference, shows that the MLT basis functions are obtained by cosine modulation of smooth window operators, in the form:

p_a(n,k) = h_a(n)\,\sqrt{\frac{2}{M}}\,\cos\left[\left(n+\frac{M+1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]

p_s(n,k) = h_s(n)\,\sqrt{\frac{2}{M}}\,\cos\left[\left(n+\frac{M+1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]   (1)
where p_a(n,k) and p_s(n,k) are the basis functions for the direct (analysis) and inverse (synthesis) transforms, and h_a(n) and h_s(n) are the analysis and synthesis windows, respectively. The time index n varies from 0 to 2M−1 and the frequency index k varies from 0 to M−1, where M is the block size. The MLT is the TDAC for which the windows generate a lapped transform with maximum DC concentration, that is:

h_a(n) = h_s(n) = \sin\left[\left(n+\frac{1}{2}\right)\frac{\pi}{2M}\right]   (2)
The direct transform matrix P_a has the entry p_a(n,k) in its n-th row and k-th column. Similarly, the inverse transform matrix P_s has entries p_s(n,k). For a block x of 2M input samples of a signal x(n), its corresponding vector X of transform coefficients is computed by X = P_a^T x. For a vector Y of processed transform coefficients, the reconstructed 2M-sample vector y is given by y = P_s Y. Reconstructed y vectors are superimposed with M-sample overlap, generating the reconstructed signal y(n).
The MLT can be compared with the DCT-IV. For a signal u(n), its length-M orthogonal DCT-IV is defined by:

U(k) = \sqrt{\frac{2}{M}}\,\sum_{n=0}^{M-1} u(n)\,\cos\left[\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]
The frequencies of the cosine functions that form the DCT-IV basis are (k+1/2)π/M, the same as those of the MLT. Therefore, a simple relationship between the two transforms exists. For instance, for a signal x(n) with MLT coefficients X(k), it can be shown that X(k)=U(k) if u(n) is related to x(n), for n = 0, 1, . . . , M/2−1, by:
u(n + M/2) = \Delta_M\{\,x(M-1-n)\,h_a(M-1-n) - x(n)\,h_a(n)\,\}

u(M/2 - 1 - n) = x(M-1-n)\,h_a(n) + x(n)\,h_a(M-1-n)
where \Delta_M\{\cdot\} is the M-sample (one block) delay operator. For illustrative purposes, by combining a DCT-IV with the above, the MLT can be computed from a standard DCT-IV. An inverse MLT can be obtained in a similar way. For example, if Y(k)=X(k), i.e., without any modification of the transform coefficients (or subband signals), then cascading the direct and inverse MLT processed signals leads to y(n)=x(n−2M), where M samples of delay come from the blocking operators and another M samples come from the internal overlapping operators of the MLT (the z^{-M} operators).
Assuming symmetrical analysis and synthesis windows, i.e., h_a(n) = h_a(2M−1−n) and h_s(n) = h_s(2M−1−n), it is easy to verify that perfect reconstruction is obtained with:

h_a(n) = \frac{h_s(n)}{h_s^2(n) + h_s^2(M-1-n)}   (3)
Consider the product window h_p(n) = h_a(n)\,h_s(n). From eqn. (3), it follows that:
h_p(n) + h_p(n+M) = h_p(n) + h_p(M-1-n) = 1   (4)
With either the MLT window in (2) or the biorthogonal windows, the product window satisfies:

h_p(n) = \sin^2\left[\left(n+\frac{1}{2}\right)\frac{\pi}{2M}\right] = \frac{1}{2} - \frac{1}{2}\cos\left[\left(n+\frac{1}{2}\right)\frac{\pi}{M}\right]   (5)
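The window identities in eqns. (4) and (5) are easy to verify numerically. The following short Python check is illustrative only and assumes the sine window of eqn. (2) with identical analysis and synthesis windows:

import numpy as np

M = 16
n = np.arange(2 * M)
h = np.sin((n + 0.5) * np.pi / (2 * M))   # sine window of eqn. (2)
hp = h * h                                # product window with h_a = h_s

n0 = np.arange(M)
# eqn (4): h_p(n) + h_p(n+M) = 1 and h_p(n) + h_p(M-1-n) = 1 for n = 0..M-1
assert np.allclose(hp[n0] + hp[n0 + M], 1.0)
assert np.allclose(hp[n0] + hp[M - 1 - n0], 1.0)
# eqn (5): h_p(n) = 1/2 - 1/2 cos[(n+1/2) pi / M]
assert np.allclose(hp, 0.5 - 0.5 * np.cos((n + 0.5) * np.pi / M))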
In accordance with the present invention, a modulated complex lapped transform (MCLT) is derived. The basis functions of the MCLT are defined by cosine and sine modulation of the analysis and synthesis windows, in the form:

p_a(n,k) = p_a^c(n,k) - j\,p_a^s(n,k)

p_a^c(n,k) = h_a(n)\,\sqrt{\frac{2}{M}}\,\cos\left[\left(n+\frac{M+1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]

p_a^s(n,k) = h_a(n)\,\sqrt{\frac{2}{M}}\,\sin\left[\left(n+\frac{M+1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]   (6)
with j ≡ \sqrt{-1}, and

p_s(n,k) = \frac{1}{2}\left[p_s^c(n,k) + j\,p_s^s(n,k)\right]

p_s^c(n,k) = h_s(n)\,\sqrt{\frac{2}{M}}\,\cos\left[\left(n+\frac{M+1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]

p_s^s(n,k) = h_s(n)\,\sqrt{\frac{2}{M}}\,\sin\left[\left(n+\frac{M+1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]   (7)
The MCLT transform coefficients X(k) are computed from the input signal block x(n) by X = P_a^T x, or:

X(k) = \sum_{n=0}^{2M-1} x(n)\, p_a(n,k)   (8)
Comparing (1) and (6), it is clear that the MLT of a signal is given by the real part of its MCLT.
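A direct, unoptimized evaluation of eqns. (6) and (8) can be sketched as follows in Python. This is illustrative only (O(M²) per block), assumes the sine window of eqn. (2), and the function name mclt_direct is arbitrary; the real part of the result is the MLT of the block:

import numpy as np

def mclt_direct(x_block, M):
    """Direct MCLT of one 2M-sample block via eqns. (6) and (8); reference only."""
    assert len(x_block) == 2 * M
    n = np.arange(2 * M)
    h_a = np.sin((n + 0.5) * np.pi / (2 * M))      # analysis window, eqn. (2)
    scale = np.sqrt(2.0 / M)
    X = np.zeros(M, dtype=complex)
    for k in range(M):
        phase = (n + (M + 1) / 2.0) * (k + 0.5) * np.pi / M
        p_c = h_a * scale * np.cos(phase)          # cosine (real) basis p_a^c
        p_s = h_a * scale * np.sin(phase)          # sine (imaginary) basis p_a^s
        X[k] = np.sum(x_block * (p_c - 1j * p_s))  # p_a(n,k) = p_a^c - j p_a^s
    return X

# The MLT of the same block is simply np.real(mclt_direct(x_block, M)).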
Construction of the MCLT can be viewed as providing additional sine-modulated functions as a 2x oversampling in the frequency domain, because for every new M real-valued input samples the MCLT computes M complex frequency components. In addition, the MCLT functions above form an over-complete basis. Consequently, the MCLT is in fact a 2x oversampled DFT filter bank (using a doubly-odd DFT instead of the traditional DFT), in which the DFT length size is 2M and the frame (block) size is M. It should be noted that, unlike in DFT filter banks, the lowest-frequency subband (the “DC” subband) is complex-valued. With the MCLT, if the direct and inverse transforms are cascaded for a block, without modifying the transform coefficients, the following is obtained:
X = P_a^T x, \quad Y = X, \quad y = P_s Y = P_s P_a^T x   (9)
with
P_s P_a^T = \operatorname{diag}\{h_p(n)\}   (10)
Thus, it should be noted that the mapping from the input block x to the reconstructed block y is done via a diagonal matrix of order 2M. This is in contrast to the MLT, for which the product P_s P_a^T is not diagonal. In fact, the off-diagonal terms of P_s P_a^T for the MLT are the time-domain aliasing terms, which are cancelled when the overlapped blocks are superimposed. When the subband signals are processed such that Y ≠ X, then the time-domain aliasing terms will not cancel exactly, producing artifacts. The MCLT, because of its 2x oversampling, does not rely on time-domain aliasing cancellation. Moreover, another property of the MCLT is that signal reconstruction is achieved with the formula:

y(n) = \sum_{k=0}^{M-1} Y(k)\, p_s(n,k)   (11)
Perfect reconstruction (with X(k) = Y(k), of course) can also be achieved with the choices:

y^c(n) = \sum_{k=0}^{M-1} \operatorname{Re}\{Y(k)\}\, p_s^c(n,k)   (12)

y^s(n) = \sum_{k=0}^{M-1} \operatorname{Im}\{Y(k)\}\, p_s^s(n,k)   (13)
In eqn. (12), an inverse MLT is recognized. Although y(n), y^c(n), and y^s(n) in eqns. (11)-(13) are not block-by-block identical, they build exactly the same reconstructed signal after overlapping.
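The perfect-reconstruction property of the inverse-MLT path in eqn. (12) can be demonstrated with a short overlap-add sketch. The Python fragment below is illustrative only; it assumes the sine window of eqn. (2) and 50% block overlap, and the function names are arbitrary. Interior samples, which are covered by two overlapping blocks, are reconstructed exactly:

import numpy as np

def mlt_analysis(x, M):
    """Forward MLT (real part of the MCLT) over 50%-overlapped blocks of 2M samples."""
    n = np.arange(2 * M)
    h = np.sin((n + 0.5) * np.pi / (2 * M))
    basis = np.sqrt(2.0 / M) * np.cos(
        np.outer(n + (M + 1) / 2.0, (np.arange(M) + 0.5) * np.pi / M))  # 2M x M
    blocks = [x[t:t + 2 * M] for t in range(0, len(x) - 2 * M + 1, M)]
    coeffs = [(b * h) @ basis for b in blocks]
    return coeffs, basis, h

def mlt_synthesis(coeffs, basis, h, M, length):
    """Inverse MLT with M-sample overlap-add, following eqn. (12)."""
    y = np.zeros(length)
    for t, X in enumerate(coeffs):
        y[t * M : t * M + 2 * M] += h * (basis @ X)
    return y

M = 64
x = np.random.randn(16 * M)
coeffs, basis, h = mlt_analysis(x, M)
y = mlt_synthesis(coeffs, basis, h, M, len(x))
# interior samples (covered by two overlapping blocks) are recovered exactly
assert np.allclose(x[M:-M], y[M:-M])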
The magnitude frequency responses of the MCLT filter bank are the same as those of the MLT. For each frequency ωk=(k+1/2)π/M there are two subbands with the same magnitude frequency response but π/2 radians out of phase. As such, there is significant overlap among the frequency responses of neighboring subbands, and the stopband attenuation is around −22 dB with the sine window in eqn. (2).
Fast Computation
As with the MLT, the MCLT can be computed via the type-IV discrete cosine transform (DCT-IV). For a signal u(n), its length-M orthogonal DCT-IV is defined by:

U(k) = \sqrt{\frac{2}{M}}\,\sum_{n=0}^{M-1} u(n)\,\cos\left[\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]   (14)
The frequencies of the cosine functions that form the DCT-IV basis are (k+1/2)π/M, the same as those of the MLT and MCLT. The type-IV discrete sine transform (DST-IV) of a signal v(n) is defined by:

V(k) = \sqrt{\frac{2}{M}}\,\sum_{n=0}^{M-1} v(n)\,\sin\left[\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{M}\right]   (15)
For a signal x(n) with MCLT coefficients X(k) determined by eqn. (8), Re{X(k)} = U(k) and Im{X(k)} = V(k), if u(n) in eqn. (14) is related to x(n), for n = 0, 1, . . . , M/2−1, by:
u(n + M/2) = \Delta_M\{\,x(M-1-n)\,h_a(M-1-n) - x(n)\,h_a(n)\,\}

u(M/2 - 1 - n) = x(M-1-n)\,h_a(n) + x(n)\,h_a(M-1-n)
and v(n) in (15) is related to x(n) by
v(n + M/2) = \Delta_M\{\,x(M-1-n)\,h_a(M-1-n) + x(n)\,h_a(n)\,\}

v(M/2 - 1 - n) = -x(M-1-n)\,h_a(n) + x(n)\,h_a(M-1-n)
where \Delta_M\{\cdot\} is the M-sample (one block) delay operator.
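The two transform kernels used by this computation can be written down directly from eqns. (14) and (15), as in the following Python sketch. These are O(M²) reference implementations for illustration only; in practice they would be computed with fast FFT-based algorithms, and the butterfly/one-block-delay stage of FIGS. 5-6 is not reproduced here. Both kernels are their own inverses, consistent with the inverse MCLT being obtained by transposing the direct structure:

import numpy as np

def dct_iv(u):
    """Orthonormal type-IV DCT, eqn. (14), computed directly from the definition."""
    M = len(u)
    n = np.arange(M)
    C = np.sqrt(2.0 / M) * np.cos(np.outer(np.arange(M) + 0.5, n + 0.5) * np.pi / M)
    return C @ u

def dst_iv(v):
    """Orthonormal type-IV DST, eqn. (15), computed directly from the definition."""
    M = len(v)
    n = np.arange(M)
    S = np.sqrt(2.0 / M) * np.sin(np.outer(np.arange(M) + 0.5, n + 0.5) * np.pi / M)
    return S @ v

# Both transforms are involutions: applying each one twice returns the input.
M = 32
u = np.random.randn(M)
assert np.allclose(dct_iv(dct_iv(u)), u)
assert np.allclose(dst_iv(dst_iv(u)), u)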
Thus, the MCLT can be computed by an MCLT computational system 400 having a window processor 410, which receives M-sample blocks 412 of an input signal 414. The window processor 410 has real and imaginary window operators, and the system 400 includes real and imaginary transform processors 416, 418, such as a length-M DCT-IV and a length-M DST-IV, respectively, as shown in the simplified block diagram of FIG. 4. The real and imaginary window operators of the window processor 410 apply and compute real and imaginary butterfly coefficients, respectively, to produce resulting real and imaginary vectors.
After a predefined delay of the real and imaginary vectors, such as a one-block delay, from a real delay block 420 and an imaginary delay block 422, respectively, the length-M DCT-IV 416 receives the real vectors and the length-M DST-IV 418 receives the imaginary vectors. The real transform processor 416 performs a discrete cosine transform (DCT) operation on the real vectors and the imaginary transform processor 418 performs a discrete sine transform (DST) operation on the imaginary vectors. Output signals with real and imaginary parts 424, 426 are produced as vectors with MCLT coefficients corresponding to the input block of samples.
As shown in FIG. 4, for the fast direct MCLT, n=0, 1, . . . , M/2−1, k=0, 1, . . . , M/2−1. The DCT-IV and DST-IV can be implemented with fast techniques. The inverse MCLT can be computed by simply transposing the components, moving the delays to the bottom half outputs of the DCT-IV and DST-IV, replacing the coefficients h_a(n) by h_s(n), and multiplying the contents of the final buffer by ½. The fast MCLT computation shown in FIG. 4 does not assume identical analysis and synthesis windows. Therefore, it can be used to compute a biorthogonal MCLT, as long as the windows satisfy the perfect reconstruction condition in eqn. (3).
FIGS. 5A and 5B are detailed diagrams illustrating the window operation of the modulated complex lapped transform of FIG. 4 for the case M=8. It is easy to infer from those diagrams the general structure for any choice of the block size M. In general, as shown in FIGS. 5A and 5B, the MCLT computational system 400 of FIG. 4 includes real and imaginary window operators 502, 504. Initially, a first block of M samples of an input signal x(n) is obtained. Second, for each window operator 502, 504, butterfly coefficients 512, 513 are applied and computed to produce resulting real vectors u(n) and imaginary vectors v(n). The butterfly coefficients are determined by a window function {h(n)} 514, 515.
For each window operator 502, 504 half of the resulting vectors are stored in a buffer of a one block delay 516, 518 to be used for the next block, while the current contents of the buffer are recovered. Next, the real and imaginary vectors are received by the real and imaginary transform processors 520, 522, which are preferably discrete cosine transform (DCT) and discrete sine transform (DST) processors to produce vectors with real and imaginary MCLT transform coefficients corresponding to the input signal.
Working Operational Example
FIG. 6 is a flow diagram illustrating operational computation of a working example of a modulated complex lapped transform in accordance with the present invention. Referring to FIGS. 3, 4, 5A and 5B along with FIG. 6, first, an input buffer x containing M signal samples is read by an MCLT system (box 610). Second, h(n)-based butterflies are computed for a cosine (real) part u(n) (box 612), a top half of u(n) is stored in a buffer 616 for use in the next block (box 614) and the top half of u(n) is read from a previous block (box 618). Next, h(n)-based butterflies are computed for a sine (imaginary) part v(n) (box 620), a top half of v(n) is stored in a buffer 624 for use in the next block (box 622) and the top half of v(n) is read from a previous block (box 626). A discrete cosine transform, type IV (DCT IV) is then computed on u(n) (box 628) and a discrete sine transform, type IV (DST IV) is computed on v(n) (box 630). Last, an output buffer U(k) containing the real part of the MCLT is produced (box 632) and an output buffer V(k) containing the imaginary part of the MCLT is produced (box 634).
This allows the present invention to provide perfect reconstruction, in that a signal x(n) (where n denotes the discrete-time index) can be recovered exactly from its MCLT coefficients. The MCLT is a linear operator that projects the input block into a frame containing 2M basis functions. The MCLT corresponds to a tight frame (all blocks of same energy are mapped into transform vectors with the same energy), with a magnitude amplification factor equal to two.
An advantage of the novel MCLT of the present invention is that the MCLT maps a block of M input signal samples into M complex frequency coefficients. As a result, magnitude and phase information at each frequency is immediately available with the MCLT. In addition, the real part of the MCLT is the MLT, which makes for simplified computation of the MLT of a signal whose MCLT is known. This allows for efficient integration with multimedia applications, such as acoustic echo cancellation and audio coding. Moreover, because the MCLT maps a block of M input signal samples into M complex frequency coefficients, the MCLT leads to a data expansion by a factor of two; in other words, an oversampling factor of two. This oversampling actually provides good performance in acoustic echo cancellation applications.
In summary, the MCLT of the present invention provides short-time spectral decomposition of signals with explicit magnitude and phase information, perfect signal reconstruction, and fast computability by means of butterflies followed by discrete cosine transform operators. Also, the real part of the MCLT can be computed directly by the techniques discussed above, and the imaginary part can be computed with simple modifications. In addition, the MCLT is easily integrated with MLT-based systems. Further, once the MCLT of a signal has been computed, its MLT can be trivially obtained simply by discarding the imaginary part.
MCLT Used as an Adaptive Filter
General Overview
FIG. 7 is a general block diagram of a full-band adaptive filter using an adaptive FIR filtering approach. In general, a filtering system 700, such as a full-band adaptive filtering system, includes an input signal x(n) 710 and reference signal r(n) 712 received by an adaptive filter 720. The adaptive filter produces an output signal y(n) 722, which is sent back into the adaptive filter 720 for providing automatic refinement adjustments to the filtering process until the output signal y(n) approximates as closely as possible the reference signal r(n).
Specifically, the adaptive filter is preferably a filter with time-varying coefficients, which are automatically adjusted such that the output of the filter approximates as closely as possible a prescribed reference signal. If the adaptive filter has a finite impulse response (FIR), the output signal y(n) is computed from the input signal x(n) by:

y(n) = \sum_{l=0}^{L-1} w_l(n)\, x(n-l)
where L is the length of the filter and {w_l(n), l = 0, 1, . . . , L−1} are the time-varying filter coefficients. The adaptive filter allows the output y(n) to approximate a reference signal r(n), or equivalently, drives the error signal e(n) = r(n) − y(n) as close to zero as possible.
Given an initial setting for the filter coefficient vector w_l(0), the coefficients can be updated by using the least-mean-square (LMS) update equation:
w_l(n+1) = w_l(n) + 2\mu\, e(n)\, x(n-l)
where μ is a parameter that controls the speed of adaptation. For any coefficient position l, the LMS update acts as an intuitive adaptation rule: if the error e(n) has the same sign as the input x(n−l), i.e., if their product is positive, then y(n) is too small, and thus w_l should be increased. The adaptation rule above corresponds to adjusting the coefficient vector w in the negative direction of the gradient of the error with respect to w, i.e. a steepest descent update.
The adaptive filter of FIG. 7 can be used in many applications where the response of the filter needs to change in view of varying conditions. Examples include modems (to equalize the signal distortions caused by the telephone lines) and acoustic echo cancellers (to remove the feedback from loudspeakers into microphones). If the input signal has a frequency spectrum that contains many peaks and valleys (i.e., if it is heavily colored), the parameter μ in the LMS update equation has to be set to a very low value, which reduces the speed of adaptation, i.e., the speed with which the error signal e(n) converges to values near zero. On the other hand, if the input signal has a flat (white) spectrum, the LMS update equation is optimal, in the sense that it will lead to the fastest possible convergence.
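The full-band LMS recursion described above can be sketched as follows in Python. This is a minimal illustration, not an implementation of the patent; the function name lms_filter and the toy system-identification setup are hypothetical:

import numpy as np

def lms_filter(x, r, L, mu):
    """Full-band LMS adaptive FIR filter following the update equation above.
    x: input signal, r: reference signal, L: filter length, mu: adaptation step."""
    w = np.zeros(L)                         # time-varying coefficients w_l(n)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_vec = np.array([x[n - l] if n - l >= 0 else 0.0 for l in range(L)])
        y[n] = w @ x_vec                    # filter output
        e[n] = r[n] - y[n]                  # error against the reference
        w = w + 2 * mu * e[n] * x_vec       # w_l(n+1) = w_l(n) + 2 mu e(n) x(n-l)
    return y, e, w

# Toy example: the reference is a filtered version of a white input, so the
# coefficients converge toward the (assumed) true impulse response.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
true_w = np.array([0.5, -0.3, 0.2, 0.1])
r = np.convolve(x, true_w)[:len(x)]
y, e, w = lms_filter(x, r, L=4, mu=0.01)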
Frequency-Domain Adaptive Filters
FIG. 8 is a general block diagram of a frequency-domain MCLT-based adaptive filter in accordance with the present invention. The performance of the LMS adaptive filter of FIG. 7 discussed above can be improved for colored input signals by using the new structure in FIG. 8. For instance, the signals can be broken into frequency subbands and an adaptive LMS filter can be performed in each subband, as shown in FIG. 8. Although FIG. 8 depicts the MCLT as the transform operator that performs the frequency decomposition, other transforms could be used, such as a modulated lapped transform (MLT). If real transforms such as the MLT are used, the adaptive filters of each of the subbands have real coefficients. With a complex-valued transform such as the MCLT, the filter coefficients will have complex values.
In general, the frequency-domain adaptive filter of FIG. 8 includes a first MCLT processor 810 for receiving and processing an input signal x(n) for producing input signal vectors, such as X(0) through X(M−1) and a second MCLT processor 812 for receiving and processing a reference signal r(n) for producing reference signal vectors, such as R(0) through R(M−1). Also included in system 800 are plural adaptive filters 814 for receiving the input signal vectors X(0) through X(M−1) and the reference signal vectors R(0) through R(M−1) for producing corrected signal vectors, such as Y(0) through Y(M−1) and an inverse modulated complex lapped transform processor (IMCLT) 816. The IMCLT 816 receives and processes the corrected signal vectors Y(0) through Y(M−1) for producing a final output signal y(n) that substantially matches the input signal x(n).
Thus, in the frequency-domain adaptive filter of FIG. 8, there is an adaptive filter for each subband k. Consequently, the subband signals are modified according to the adaptive filter learning computation. The final output y(n) is obtained by applying an inverse MCLT (IMCLT) on the corrected subband/transform coefficients {Y(k)}. If the original adaptive filter of FIG. 7 had L coefficients, each adaptive filter in FIG. 8 needs only to have L/M coefficients, for the same time span.
The advantages of using the frequency-domain adaptive filter of FIG. 8 include faster convergence, because the signals within each subband are approximately white, even for a heavily colored input. Also, the device of FIG. 8 provides improved error control, because the μ factors for the adaptive filters in each subband can be adjusted independently. Finally, the system in FIG. 8 can have a reduced computational complexity, because of the fast FFT-based algorithms available to compute the transforms.
Acoustic Echo Cancellation (AEC)
One application of the adaptive filter of FIG. 8 is in acoustic echo cancellation (AEC), such as for real-time full-duplex communication systems (for instance, speakerphones and videoconference systems). For instance, in a speakerphone system, the AEC can use an adaptive filter that estimates the feedback transfer function from the loudspeaker to the microphone. The estimated echo return is then subtracted from the microphone signal. Simple FIR filters are not ideal because of the length of the impulse response necessary to obtain a reasonable amount of echo reduction (for a 16 kHz sampling rate and an echo window of 100 ms, a 1,600-point impulse response is needed). With subband adaptive filtering, the long FIR full-band filter is replaced by a collection of short FIR filters, one for each subband.
A critically sampled filter bank such as the MLT can be used for adaptive filtering, but the uncancelled aliasing due to subband processing may limit the amount of echo reduction to 10 dB or less. Performance can be improved by using cross-filters among neighboring subbands, but the extra degrees of freedom in such adaptive cross-filters usually slow down convergence significantly. With the MCLT, subband acoustic echo cancellation (AEC) can be performed without cross-filters. Each subband can be processed by a short FIR filter with complex taps, as shown in FIG. 8. With a large number of subbands, the subband signals are essentially white, and so each adaptive filter can be adjusted via the normalized LMS computation.
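A per-subband adaptive filter with complex taps, updated by the normalized LMS rule mentioned above, might look like the following Python sketch. It is illustrative only; the function name subband_nlms, the frame-by-frame data layout, and the parameter values are assumptions, not the patent's implementation:

import numpy as np

def subband_nlms(X, R, taps=4, mu=0.5, eps=1e-6):
    """Per-subband complex NLMS. X, R: arrays of shape (num_frames, M) holding the
    MCLT coefficients of the loudspeaker (input) and microphone (reference) signals.
    Returns the echo-cancelled subband signals Y of the same shape."""
    num_frames, M = X.shape
    W = np.zeros((M, taps), dtype=complex)           # complex filter taps per subband
    Y = np.zeros_like(R)
    # the first taps-1 frames are left unprocessed in this sketch
    for t in range(taps - 1, num_frames):
        for k in range(M):
            x_vec = X[t - taps + 1 : t + 1, k][::-1]  # most recent frame first
            echo_est = np.vdot(W[k], x_vec)           # estimated echo in subband k
            err = R[t, k] - echo_est                  # echo-cancelled coefficient
            norm = np.real(np.vdot(x_vec, x_vec)) + eps
            W[k] += (mu / norm) * np.conj(err) * x_vec  # normalized LMS update
            Y[t, k] = err
    return Y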
FIG. 9 is a block diagram of a speakerphone working example of the adaptive filter of FIG. 8 of the present invention in the form of an acoustic echo cancellation device. The speakerphone system 900 of FIG. 9 includes communication equipment 910 comprising microphone input signals 912 received from a microphone 914, which can be amplified by an amplifier 915, and speaker output signals 916 transmitted to a speaker 918. The system 900 also includes a filter 920, such as the MCLT-based adaptive filter discussed above, for receiving input signals produced by the speaker 918 and reference signals received by the microphone 914.
For example, in typical speakerphone systems, the local microphone not only captures audio signals intended to be transmitted (such as voice signals of a local person), it also captures, as feedback, audio signals that are being reproduced by the local loudspeaker (such as voice signals from a remote person's transmission). Unless the speaker feedback is cancelled, that feedback signal is sent back to the remote person. As such, the remote person will hear an echo of his or her own voice transmitted to the local person.
To solve this problem, the adaptive filter of the present invention includes an MCLT-based adaptive filter for processing and filtering the input and reference signals for producing an output signal with information indicating the estimated echo portion of the signal. The estimated echo portion of the output signal is removed or canceled and a resulting clean output signal is sent to the microphone input of the communication equipment 910. Consequently, after an initial audio signal is sent through the system 900, subsequent audio signals with feedback or echoes produced by the loudspeaker 918 are canceled by the adaptive filter 920 before the microphone input is received.
Since the input to the adaptive filter 920 is the signal from the speaker 918 and the reference input is the signal from the microphone 914, the output of the adaptive filter will be a good estimate of the portion of the microphone signal that is dependent on the loudspeaker signal, which is precisely the echo. When the echo is subtracted from the signal of the microphone 914, as shown in FIG. 9, only the part of the microphone signal that is not correlated with the loudspeaker signal will remain. The remaining part (which is the "cleaned" microphone signal 912 in FIG. 9) corresponds to the other local sounds, such as the voice of the person speaking and other ambient sounds.
FIG. 10 is a general block diagram of an acoustic echo cancellation device (AEC) with MCLT-based adaptive filters in accordance with the present invention. In general, referring to FIGS. 8-9 along with FIG. 10, the AEC 1000 of FIG. 10 includes a first MCLT processor 1010 for receiving and processing a loudspeaker signal as an input signal x(n) for producing input signal vectors X(0) through X(M−1) and a second MCLT processor 1012 for receiving and processing a microphone signal as a reference signal r(n) for producing reference signal vectors R(0) through R(M−1).
Also included in system 1000 are plural adaptive filters 1014 for receiving the input signal vectors X(0) through X(M−1) and the reference signal vectors R(0) through R(M−1). The adaptive filters estimate the echo within the signals; the echo estimates are then combined with the reference signals to cancel the echoes and produce cleaned and corrected signal vectors Y(0) through Y(M−1). An inverse modulated complex lapped transform processor (IMCLT) 1016 receives and processes the corrected signal vectors Y(0) through Y(M−1), which have MCLT coefficients without echo, for producing a final output signal with the echo canceled.
Subtraction of the estimated echo from the microphone signal is preferably performed for each subband, resulting in a set of subband signals Y(k) with the echo substantially removed.
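Continuing the illustration, a frame-level sketch of the FIG. 10 structure might process all M subbands at once, assuming the MCLT coefficient vectors X(k) and R(k) for the current frame are already available; the array shapes, names, and parameter values below are assumptions, not taken from the patent.

import numpy as np

def aec_frame(W, X_hist, X_new, R, mu=0.5, eps=1e-8):
    # Shift in the newest loudspeaker MCLT coefficient for every subband.
    X_hist = np.roll(X_hist, 1, axis=1)
    X_hist[:, 0] = X_new
    # Per-subband echo estimate (no cross-filters) and subtraction from the
    # microphone (reference) coefficients to obtain Y(k).
    echo = np.sum(np.conj(W) * X_hist, axis=1)
    Y = R - echo
    # Normalized LMS update of every subband filter.
    power = np.sum(np.abs(X_hist) ** 2, axis=1) + eps
    W = W + (mu / power)[:, None] * X_hist * np.conj(Y)[:, None]
    return Y, W, X_hist

# Illustrative state for M = 512 subbands and 4 complex taps per subband:
M, L = 512, 4
W = np.zeros((M, L), dtype=complex)
X_hist = np.zeros((M, L), dtype=complex)
X_new = np.zeros(M, dtype=complex)   # current-frame loudspeaker coefficients
R = np.zeros(M, dtype=complex)       # current-frame microphone coefficients
Y, W, X_hist = aec_frame(W, X_hist, X_new, R)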
The AEC and spectral subtraction can be combined using a single MCLT decomposition. For example, spectral subtraction can be applied to the subband signals immediately after the AEC adaptive filters. If the resulting signal is to be encoded by an MLT-based codec, then the MLT coefficients for the audio codec can be obtained by simply taking the real part of the outputs of the spectral subtraction. Therefore, only a single transformation step with the MCLT is necessary to perform simultaneous signal enhancement and coding.
If the waveform y(n) corresponding to the echo-cancelled subband signals Y(k) in FIG. 10 is desired, then an inverse modulated complex lapped transform (IMCLT) can be performed on Y(k), as shown in FIG. 8. However, if the signals are to be encoded with an MLT-based coder/decoder (codec), such as MSAudio, then y(n) need not be computed, since a codec can work directly with the Y(k) subband/transform coefficients.
One advantage of using the MCLT-based adaptive filters is that the MCLT uses short windows, which leads to low processing delay. Another advantage is that the MCLT allows for perfect signal reconstruction. Also, integrating an MCLT adaptive filter with an MLT-based processing system (for example, an audio codec) is very easy, since the MLT is obtained directly as the real part of the MCLT. Further, for a given number of subbands M (which is also the block size), a windowed Fourier transform decomposes the signal into M/2+1 distinct subbands. The MCLT breaks the signal into M subbands, and so it provides essentially twice the frequency resolution. Therefore, an MCLT-based adaptive filter will converge faster, because narrower subbands tend to have a flatter spectrum.
FIG. 11 illustrates wave signals showing sample results of the working example of FIG. 9. The first wave signal (Wave A) is the microphone signal as a recorded echo return. The second wave signal (Wave B) is the output of a full-band AEC, echo reduction ratio (ERR)≈26 dB (it should be noted that the vertical scale is different). The third wave signal (Wave C) is an output of a 512-band MLT AEC without cross filters, ERR≈5 dB. The bottom wave signal (Wave D) is an output of a 512-band MCLT AEC without cross filters, ERR≈20 dB.
Specifically, the original signal is an actual echo return recorded at 16 kHz sampling from a microphone located at about 20″ from the loudspeaker (using a 4″ driver). The signals in FIG. 11 show the cancelled echo after convergence of each AEC (which takes a few seconds in all cases). The MLT and MCLT AECs used M=512 subbands and a four-tap adaptive filter in each band (corresponding to an echo window of about 128 ms). The echo attenuation for the MCLT is about 20 dB, which is adequate for many practical teleconferencing applications.
Noise Reduction
In addition, the MCLT of the present invention is amenable to other types of frequency-domain processing while allowing for perfect signal reconstruction. For instance, another kind of processing that can be efficiently performed in the frequency domain, especially with the MCLT, is noise reduction. For the audio/voice communication system in FIG. 9, even after the loudspeaker echo is cancelled the signal may still be noisy. The AEC usually removes feedback from the loudspeaker, but may not remove other noises, such as ambient noises that may be generated by computers and fans in an office.
An efficient approach to noise reduction is spectral subtraction. For each subband k, the signal Y(k) is modeled as the sum of a desired signal component and a noise component, in the form:
Y(k)=S(k)+N(k)
where S(k) is the desired signal and N(k) is the interfering noise. Assuming the signal and noise are uncorrelated, the energy of the subband signal is just the sum of the signal and noise energies:
|Y(k)|² = |S(k)|² + |N(k)|²
With spectral subtraction, noise reduction is achieved by estimating the average noise magnitude |N(k)| during low-amplitude signals, i.e., during periods where |S(k)| is assumed to be zero. The variable Ne(k) is the noise level estimate for the kth subband, which can be subtracted from Y(k), in the form
|Yf(k)| = |Y(k)| − α(k)|Ne(k)|,  (A)
where Yf(k) is the filtered signal. As such, a portion of the estimated noise magnitude is subtracted from the magnitude of each subband signal. The phase is not affected, since the average noise phase is always zero. The parameters α(k) control how much of the noise estimate is subtracted from each subband signal, and so they are preferably set in the range 0<α(k)<1. These parameters are preferably adjusted depending on the quality of the noise estimates. For example, if the noise estimate is significantly above the true noise level, the subtraction in eqn. (A) will also remove part of the signal, leading to noticeable artifacts.
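As a minimal sketch of the magnitude subtraction in eqn. (A), with the phase of each subband coefficient kept unchanged as described above; the default value of α and the clamping of the result at zero are assumptions added for robustness, not details taken from the text.

import numpy as np

def spectral_subtract(Y, N_est, alpha=0.8):
    # Y:     complex MCLT coefficients of the noisy subband signals, shape (M,)
    # N_est: estimated noise magnitudes |Ne(k)|, shape (M,)
    mag = np.abs(Y)
    phase = np.exp(1j * np.angle(Y))
    # Subtract a fraction of the noise estimate from each subband magnitude;
    # clipping at zero prevents negative magnitudes (an assumed safeguard).
    mag_f = np.maximum(mag - alpha * N_est, 0.0)
    # The phase is left untouched, as described above.
    return mag_f * phase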
Specifically, FIG. 12 is a general block diagram of a noise reduction device with MCLT-based adaptive filters in accordance with the present invention. In general, the noise reduction device 1200 of FIG. 12 includes an MCLT processor 1210 for receiving and processing an input signal x(n) corrupted by noise for producing input signal vectors X(0) through X(M−1), plural subtraction devices 1212, such as subband noise subtraction devices, and an inverse modulated complex lapped transform processor (IMCLT) 1216.
The plural subband noise subtraction devices 1212 receive the input signal vectors X(0) through X(M−1) and compute magnitude, Xmag(0) through Xmag(M−1), and phase, Xph(0) through Xph(M−1), information (box 1218). Noise levels, Ne(0) through Ne(M−1), are estimated from the magnitude information Xmag(0) through Xmag(M−1) (box 1220). The noise level estimates are combined with the magnitude information to reduce the noise and produce cleaned and corrected magnitude information, which is then sent to a recovery device 1222 for recovering the real and imaginary parts of this information. An inverse modulated complex lapped transform processor (IMCLT) 1216 receives and processes the corrected information as signal vectors Y(0) through Y(M−1), which have MCLT coefficients with reduced noise, for producing a final output signal with noise reduction.
FIG. 13 is a flow diagram illustrating operational computation of a noise reduction device incorporating the modulated complex lapped transform of the present invention. Referring to FIG. 12 along with FIG. 13, first, an input buffer x of input samples is read (box 1310) by the MCLT processor 1210 of FIG. 12. Second, the MCLT coefficients X(k) are computed (box 1312) and this information is sent to the subband subtraction device 1212 of FIG. 12. Third, a first subband k=0 is selected (box 1314) by the subband subtraction device 1212 of FIG. 12. Fourth, it is determined whether the transform coefficient magnitude |X(k)| is less than a threshold value Th (box 1316). If it is, the noise level estimate is adjusted (box 1318), for example with an update function such as:
|Ne(k)|² ← β|Ne(k)|² + (1−β)|X(k)|²
If the coefficient |X(k)| is not less than the threshold value Th, or once the above update has been performed, spectral subtraction is performed (box 1320) by the subband subtraction device 1212 of FIG. 12, preferably with the following expression:
|Yf(k)| = |X(k)| − α(k)|Ne(k)|
Next, the subband subtraction device processes the next subband k=k+1 (box 1322). It is then determined whether k=M (box 1324). If not, the process returns to step 1316. Otherwise, last, an output buffer containing the M filtered subband coefficients Y(k) with reduced noise is produced (box 1326) and processed by the IMCLT 1216 of FIG. 12.
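The per-frame flow of FIG. 13 might be sketched as follows, with all M subbands processed in a vectorized manner; the threshold Th, the smoothing factor β, and the subtraction depth α are illustrative assumptions rather than values given in the text.

import numpy as np

def noise_reduce_frame(X, N2_est, Th=0.01, beta=0.95, alpha=0.8):
    # X:      complex MCLT coefficients of the current frame, shape (M,)
    # N2_est: running noise-energy estimates |Ne(k)|^2, shape (M,), updated here
    mag = np.abs(X)
    # Update the noise-energy estimate only in subbands whose level is below
    # the threshold, i.e. where the desired signal is assumed to be absent.
    low = mag < Th
    N2_est[low] = beta * N2_est[low] + (1.0 - beta) * mag[low] ** 2
    # Spectral subtraction on every subband, keeping the original phase.
    mag_f = np.maximum(mag - alpha * np.sqrt(N2_est), 0.0)
    Yf = mag_f * np.exp(1j * np.angle(X))
    return Yf, N2_est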
In practice, the noise reduction process is preferably performed right after the echo cancellation process of FIG. 10, otherwise the loudspeaker echo would interfere with the noise estimate and make it less reliable.
Although the spectral subtraction as in eqn. (A) can be performed with subband signals derived from a windowed Fourier transform, there are several advantages of using an MCLT instead of the Fourier transform. First, with the MCLT, perfect reconstruction of the signal can be obtained, which is important in low-noise, high-fidelity applications; with a windowed Fourier transform, long windows are usually needed for sufficiently accurate signal reconstruction, which increases the processing delay. Next, as discussed above, for a given number of subbands M (which is also the block size), a windowed Fourier transform decomposes the signal into M/2+1 distinct subbands. In contrast, the MCLT breaks the signal into M subbands, and so it provides essentially twice the frequency resolution. Therefore, an MCLT-based noise reducer allows for finer discrimination, which is important in reducing noise with periodic components, such as high-frequency tones generated by computer hard disks.
FIG. 14 illustrates wave signals showing sample results of the working example of FIGS. 12-13 using the MCLT of the present invention with spectral subtraction. The top wave signal (Wave A) is the original speech, corrupted by PC noise, SNR≈15 dB, and the bottom wave signal (Wave B) is the processed speech, SNR≈30 dB.
Specifically, an original 8-second speech signal was captured at a 16 kHz sampling rate, with the microphone near a very noisy personal computer (PC) whose noise spectrum is approximately pink. The depth of subtraction was adjusted for a noise reduction of about 15 dB. The results are shown in FIG. 14, where the signal-to-noise ratio (SNR) was successfully increased from 15 dB to 30 dB. More importantly, the processed file has fewer artifacts than the results obtained using a commercial product that uses standard DFT filter banks for spectral subtraction.
Integrating AEC, Noise Reduction and Codec
FIG. 15 is a block diagram of a working system of the present invention shown as a signal enhancer and noise reducer integrated with a codec. The MCLT, AEC and noise reducer of the present invention as described above, can be integrated with an audio codec for use by a real time communication system, such as audio applications including Internet telephony or other forms of hands-free teleconferencing or telephony. The integration of the MCLT, AEC, noise reducer and codec leads to improved performance, reduced computational complexity, and reduced memory usage and processing delay over current systems.
In general, referring to FIG. 15, the AEC (element 800 of FIG. 8) and noise reducer (element 1200 of FIG. 12) of the present invention can be integrated with an audio codec to form a novel integrated processor 1500. The integrated processor 1500 of the present invention includes an MCLT processor 1510 (similar to MCLT processor 810 of FIG. 8), an AEC processor 1512 (similar to the AEC processor 800 of FIG. 8), a noise reducer 1514 (similar to the noise reducer 1200 of FIG. 12), a coefficient filter 1516, a magnitude processor 1518, and a codec 1520, which can be any suitable codec. The noise reducer 1514 is preferably included in the codec 1520, as shown in FIG. 15. The codec can be the audio codec (with suitable modifications in accordance with the present invention) described in co-pending U.S. patent application Ser. No. 09/085,620, filed on May 27, 1998 by Henrique Malvar, entitled "Scalable Audio Coder and Decoder" and assigned to the current assignee, which is herein incorporated by reference.
During operation, in a real-time communication system (i.e. an application that utilizes a digital communication channel), such as a digital network or online Internet communications for dynamically transmitting audio and video signals, the audio that is captured can be enhanced by operations of the AEC 1512 and noise reducer 1514 of the present invention. After enhancement, the audio signal is preferably coded (compressed) by the codec 1520 to limit the bit rate to a rate that is adequate for the communication channel.
As shown in FIG. 15, the MCLT processor 1510 receives and processes an input signal x(n) for producing input signal vectors, such as X(0) through X(M−1), and the AEC processor 1512 receives and processes a reference signal for producing reference signal vectors 1522. The noise reducer 1514 then receives the signal vectors from the AEC processor 1512 and produces enhanced MCLT coefficients (real and imaginary parts), such as coefficients Y(0) through Y(M−1), in accordance with the noise reducer 1200 of FIG. 12.
Specifically, in a communication system, the AEC does not need to perform an inverse MCLT after the adaptive filters in each subband if the codec operates in the frequency domain. The same principle applies when the noise reduction process of eqn. (A) is added. For example, after computing the subband signals Y(k) at the output of the AEC, the noise reduction step is applied to generate the filtered subband coefficients Yf(k), which are then sent directly to the codec, without the need to return to the corresponding time-domain signal. Alternatively, if the codec does not operate in the frequency domain (as is the case for many telephony codecs), then the echo-cancelled and noise-filtered signal yf(n) can be obtained simply by computing an inverse MCLT on the subband signals Yf(k).
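As a small sketch of this routing decision (the imclt routine and the frequency-domain flag below are placeholders assumed for illustration, not elements defined in this description):

import numpy as np

def route_enhanced_frame(Yf, codec_in_frequency_domain, imclt=None):
    # If the codec operates in the frequency domain (e.g., an MLT-based coder),
    # its input is simply the real part of the enhanced MCLT coefficients.
    if codec_in_frequency_domain:
        return np.real(Yf)
    # Otherwise, recover the echo-cancelled, noise-filtered waveform yf(n)
    # with an inverse MCLT (imclt is a placeholder for that routine).
    return imclt(Yf)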
It should be noted that instead of using the modulated lapped transform (MLT) as the first processing step (as in the audio coder of co-pending U.S. patent application Ser. No. 09/085,620), the present invention uses the MCLT as the first processing step in order to avoid performing inverse MCLT computations. This is because, as described above, the MLT is the real part of the MCLT. Thus, if the enhanced audio signal is available in the MCLT domain, it is not necessary to compute the inverse MCLT to recover the time-domain waveform and then compute its MLT. Instead, the imaginary part of the MCLT coefficients is discarded, thereby allowing the system 1500 to obtain the MLT coefficients directly from the real part of the MCLT.
From the Y(k) enhanced MCLT coefficients produced by the AEC 1512 and noise reducer 1514, coefficients R(k) can be obtained by:
R(k)=Re{Y(k)}
where Re{x} denotes taking the real part. In particular, as shown in FIG. 15, the coefficient filter 1516 and the magnitude processor 1518 receive coefficients Y(0) through Y(M−1). The coefficient filter 1516 processes the MCLT coefficients (real and imaginary parts) and discards the imaginary parts of the MCLT coefficients.
Improving Computation of the Auditory Masking Functions
In addition to obtaining the MLT coefficients needed by the codec directly from the real part of the MCLT coefficients, the integrated system 1500 of FIG. 15 produces accurate masking functions. For example, the audio codec of U.S. patent application Ser. No. 09/085,620 computes weighting functions based on hearing thresholds, defined by functions that approximate the masking phenomena in the human auditory system. Such masking functions can be computed based on the power spectrum of the incoming audio, i.e., the power values at each frequency index k. As such, the spectral magnitudes are approximated by magnitudes of the MLT coefficients.
However, since the MLT coefficients are obtained by projecting the signal onto modulated cosines, their magnitudes are typically not directly proportional to the actual physical r.m.s. (root mean-square) power contained in the signal at each frequency subband. With the MCLT, the magnitudes can be computed directly from the real and imaginary parts (cosine and sine projections, respectively), and such magnitudes are then directly proportional to the physical r.m.s. power at each frequency subband k. In that way, the computation of the masking functions is more precise as compared to computations based solely on the MLT (real part) coefficients.
As shown in FIG. 15, the magnitude processor 1518 computes the magnitudes of the MCLT coefficients, such as U(0) through U(M−1). Computation of the magnitudes U(k) can be performed by the following expression:
U(k) = √(Re{Y(k)}² + Im{Y(k)}²)
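A sketch of how the coefficient filter 1516 and the magnitude processor 1518 might derive R(k) and U(k) from the enhanced coefficients is given below; the function name is a placeholder for illustration.

import numpy as np

def split_for_codec(Y):
    # R(k): real parts of the enhanced MCLT coefficients, i.e., the MLT
    # coefficients used by the codec for quantization and encoding.
    R = np.real(Y)
    # U(k): magnitudes, proportional to the r.m.s. power in each subband,
    # used to compute the auditory masking and weighting functions.
    U = np.sqrt(np.real(Y) ** 2 + np.imag(Y) ** 2)   # equivalent to np.abs(Y)
    return R, U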
The codec 1520 further includes a weighting processor 1524, a masking functions processor 1526 and an encoding processor 1528. The masking functions processor 1526 receives the magnitude coefficients produced by the magnitude processor 1518 and computes masking functions. The weighting processor 1524 receives the masking functions and the real part of the signal from the coefficient filter 1516, such as R(0) through R(M−1) for producing the weighted signal, as described above. Last, the encoding processor 1528 performs quantization and encoding processing to produce the output bitstream.
As shown in FIG. 15, the input signal is transformed from the time domain to the frequency domain once, by means of an MCLT. Therefore, the enhancement functions of the AEC 1512 and the noise reducer 1514 are computed on the MCLT coefficients, while the codec 1520 uses the real part of the MCLT coefficients for quantization and encoding and the magnitudes of the MCLT coefficients for computation of precise auditory masking functions and weighting functions.
Integrating AEC, Noise Reduction, Coding, and Speech Recognition
For speech recognition applications, the computational load can be minimized by performing the MCLT computation once and integrating several components that process the signal in the frequency domain. As shown in FIG. 15, a speech recognizer 1530 having a coefficient processor 1532 and a recognition engine 1534 can be located after the magnitudes are computed by the magnitude processor 1518.
In general, in most automatic speech recognition (ASR) systems, the incoming speech signal is divided into blocks of 10 to 30 ms duration. For each block, a cepstrum vector is computed, and cepstral coefficients are used for the next step of statistical and language pattern analysis. With the set of Fourier transform coefficients for the input signal block {X(k)}, the cepstral coefficients V(r) can be defined by:
V(r) = Σ_{k=0}^{M−1} log|X(k)| e^{j2πrk/M},  r = 0, 1, …, N
which is the inverse Fourier transform of the log magnitude spectrum of the block. The parameter N (the number of cepstral coefficients computed) can be set between 10 and 20.
To compute the Fourier transform coefficients X(k), a fast Fourier transform (FFT) is preferably applied to the incoming block. When the incoming signal has already been processed by an MCLT-based AEC and noise reducer, however, a spectral representation of the signal is already computed, namely, the MCLT coefficients.
Thus, an approximate cepstral vector C(r) can be computed by the coefficient processor 1532 using the MCLT coefficients instead of other coefficients (such as FFT coefficients), such that:
C(r) = Σ_{k=0}^{M−1} log U(k) e^{j2πrk/M},  r = 0, 1, …, N
where {U(k)} is the set of MCLT coefficient magnitudes. The speech recognition engine 1534 receives these coefficients for performing speech recognition. Although this new cepstral vector C(r) is not identical to the original cepstral vector V(r), the patterns present in V(r) will also be present in C(r). Re-training of the speech recognition engine can be performed so that it will re-adapt to the typical patterns in C(r).
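A sketch of computing the approximate cepstral vector C(r) from the MCLT magnitudes U(k) via an inverse FFT is shown below; the magnitude floor and the choice to keep only the real part of the first N+1 coefficients are assumptions added for numerical robustness, not details taken from the text.

import numpy as np

def mclt_cepstrum(U, N=13, floor=1e-10):
    # U: MCLT coefficient magnitudes U(k) for one block, shape (M,)
    M = len(U)
    log_mag = np.log(np.maximum(U, floor))   # floor avoids log(0)
    # np.fft.ifft computes (1/M) * sum_k x[k] * exp(j*2*pi*r*k/M); scaling by M
    # matches the unnormalized sum in the definition of C(r) above.
    c_full = np.fft.ifft(log_mag) * M
    # Keep the first N+1 coefficients (r = 0..N) for the recognition engine.
    return np.real(c_full[:N + 1])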
The foregoing description of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.

Claims (21)

What is claimed is:
1. A computer-implemented method for processing an audio signal, the method comprising:
applying butterfly coefficients determined by a real and an imaginary window function to a received input signal to respectively produce real and imaginary resulting vectors;
computing real and imaginary spatial transforms of the real and imaginary resulting vectors, respectively, to produce a modulated complex lapped transform having real and imaginary transform coefficients as an encoded output; and
producing enhanced complex frequency coefficients from the transform coefficients and discarding the imaginary portions of the enhanced complex frequency coefficients to produce filtered real coefficients.
2. The method of claim 1, wherein the encoded output is produced as a vector with modulated complex lapped transform coefficients corresponding to the input signal.
3. The method of claim 1, further comprising processing the encoded output by at least one of transmitting the output, storing the output, compressing the output, enhancing the output, and filtering the output.
4. The method of claim 1, wherein producing enhanced complex frequency coefficients is performed with an acoustic echo canceller.
5. The method of claim 3, wherein filtering the output comprises reducing interference within the input signal with a cancellation device.
6. The method of claim 3, wherein compressing the output is achieved by at least one of scalar and vector quantization.
7. The method of claim 1, wherein the window functions are adapted for reducing blocking effects.
8. The method of claim 1, wherein the real spatial transform is performed by a discrete cosine transform operation.
9. The method of claim 1, wherein the imaginary spatial transform is performed by a discrete sine transform operation.
10. The method of claim 1, wherein half of the resulting vectors are stored in a memory of a one block delay buffer.
11. An audio processor, comprising:
a modulated complex lapped transform processor receiving and spectrally decomposing an input signal into modulated complex lapped transform coefficients having real and imaginary portions associated with the input signal;
an enhancer comprising an acoustic echo cancellation processor and a noise reducer, wherein the acoustic echo cancellation processor and the noise reducer receive the complex frequency coefficients and produce enhanced complex frequency coefficients; and
an encoder device receiving the enhanced complex coefficients and encoding them.
12. The audio processor of claim 11, further comprising producing an output signal as a vector with the complex frequency coefficients corresponding to the input signal.
13. The audio processor of claim 12, further comprising an external module for processing the output signal by at least one of transmitting the output, storing the output, compressing the output, enhancing the output, and filtering the output.
14. The audio processor of claim 11, wherein the modulated lapped transform processor includes a window processor capable of applying butterfly coefficients determined by a real window function and an imaginary window function to respectively produce real and imaginary resulting vectors.
15. The audio processor of claim 14, wherein the real and imaginary window functions are adapted for reducing blocking effects.
16. The audio processor of claim 11, wherein the modulated lapped transform processor has a real transform module with a discrete cosine transform operator and an imaginary transform module with a discrete sine transform operator.
17. The audio processor of claim 12, wherein the output signal is produced as a vector with biorthogonal modulated complex lapped transform coefficients corresponding to the input signal.
18. The audio processor of claim 14, wherein the window processor further comprises a memory of a one block delay buffer for storing a portion of the respective resulting vectors in the memory of a one block delay buffer and for recovering current contents of the delay buffer.
19. The audio processor of claim 11, wherein the encoder device encodes the real portion of the complex frequency coefficients.
20. The audio processor of claim 11, wherein the encoder device computes quantization weighting functions from the magnitudes of the complex frequency coefficients.
21. A method for recognizing human speech, comprising:
applying butterfly coefficients determined by a real and an imaginary window function to a received input signal representing human speech to respectively produce real and imaginary resulting vectors;
computing real and imaginary spatial transforms of the real and imaginary resulting vectors, respectively, to produce a modulated complex lapped transform having real and imaginary transform coefficients as an encoded output;
producing enhanced complex frequency coefficients from the transform coefficients and discarding the imaginary portions of the enhanced complex frequency coefficients to produce filtered real coefficients;
computing an approximate cepstral vector from the enhanced complex frequency coefficients and the filtered real coefficients; and
performing human speech recognition from the approximate cepstral vector.
US09/305,690 1999-02-26 1999-05-05 Modulated complex lapped transform for integrated signal enhancement and coding Expired - Lifetime US6496795B1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
US09/305,690 US6496795B1 (en) 1999-05-05 1999-05-05 Modulated complex lapped transform for integrated signal enhancement and coding
AU36060/00A AU3606000A (en) 1999-02-26 2000-02-25 Modulated complex lapped transform for integrated signal enhancement and coding
PCT/US2000/004868 WO2000051014A2 (en) 1999-02-26 2000-02-25 Modulated complex lapped transform for integrated signal enhancement and coding
AU37073/00A AU3707300A (en) 1999-02-26 2000-02-25 A system and method for producing modulated complex lapped transforms
AU36084/00A AU3608400A (en) 1999-02-26 2000-02-25 An adaptive filtering system and method for cancelling echoes and reducing noisein digital signals
PCT/US2000/004867 WO2000051013A2 (en) 1999-02-26 2000-02-25 A system and method for producing modulated complex lapped transforms
PCT/US2000/004996 WO2000051015A1 (en) 1999-02-26 2000-02-25 An adaptive filtering system and method for cancelling echoes and reducing noise in digital signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/305,690 US6496795B1 (en) 1999-05-05 1999-05-05 Modulated complex lapped transform for integrated signal enhancement and coding

Publications (1)

Publication Number Publication Date
US6496795B1 true US6496795B1 (en) 2002-12-17

Family

ID=23181902

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/305,690 Expired - Lifetime US6496795B1 (en) 1999-02-26 1999-05-05 Modulated complex lapped transform for integrated signal enhancement and coding

Country Status (1)

Country Link
US (1) US6496795B1 (en)

Cited By (94)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010051966A1 (en) * 2000-02-26 2001-12-13 Walter Pelton Apparatus, methods, and computer program products for accurately determining the coefficients of a function
US20020016808A1 (en) * 2000-06-09 2002-02-07 Walter Pelton Apparatus, methods and computer program products for performing high speed division calculations
US20020038326A1 (en) * 2000-06-09 2002-03-28 Walter Pelton Apparatus, methods, and computer program products for reducing the number of computations and number of required stored values for information processing methods
US20020138263A1 (en) * 2001-01-31 2002-09-26 Ibm Corporation Methods and apparatus for ambient noise removal in speech recognition
US20020156928A1 (en) * 2000-04-24 2002-10-24 Goodman David A. Universal interface system
US20030107503A1 (en) * 2000-01-12 2003-06-12 Juergen Herre Device and method for determining a coding block raster of a decoded signal
US20030115052A1 (en) * 2001-12-14 2003-06-19 Microsoft Corporation Adaptive window-size selection in transform coding
US6615227B2 (en) * 1999-09-17 2003-09-02 Globespanvirata, Inc. Circuit and method for computing a fast fourier transform
US20030206582A1 (en) * 2002-05-02 2003-11-06 Microsoft Corporation 2-D transforms for image and video coding
US6654716B2 (en) * 2000-10-20 2003-11-25 Telefonaktiebolaget Lm Ericsson Perceptually improved enhancement of encoded acoustic signals
US20040004995A1 (en) * 2002-07-03 2004-01-08 Commasic, Inc. Buffering method and apparatus for processing digital communication signals
US6687672B2 (en) * 2002-03-15 2004-02-03 Matsushita Electric Industrial Co., Ltd. Methods and apparatus for blind channel estimation based upon speech correlation structure
US20040064310A1 (en) * 2001-05-07 2004-04-01 Yutaka Banba Sub-band adaptive differential pulse code modulation/encoding apparatus, sub-band adaptive differential pulse code modulation/encoding method, wireless transmission system, sub-band adaptive differential pulse code modulation/decoding apparatus, sub-band adaptive differential pulse code modulation/decoding method, and wirel
US20040162866A1 (en) * 2003-02-19 2004-08-19 Malvar Henrique S. System and method for producing fast modulated complex lapped transforms
US20040184400A1 (en) * 2002-11-25 2004-09-23 Hisao Koga Multicarrier transmitter, multicarrier receiver, and multicarrier communications apparatus
US20040193663A1 (en) * 1999-04-29 2004-09-30 Pelton Walter E. Apparatus, methods, and computer program products for determining the coefficients of a function with decreased latency
US20040220805A1 (en) * 2001-06-18 2004-11-04 Ralf Geiger Method and device for processing time-discrete audio sampled values
US20050008086A1 (en) * 2003-06-18 2005-01-13 Matsushita Electric Industrial Co., Ltd. Receiver and method for digital multi carrier transmission
US20050013359A1 (en) * 2003-07-15 2005-01-20 Microsoft Corporation Spatial-domain lapped transform in digital media compression
US20050024981A1 (en) * 2002-12-05 2005-02-03 Intel Corporation. Byte aligned redundancy for memory array
US20050031048A1 (en) * 2003-07-03 2005-02-10 Matsushita Electric Industrial Co., Ltd. Transmitter and method for digital multi-carrier transmission
US20050053150A1 (en) * 2003-09-07 2005-03-10 Microsoft Corporation Conditional lapped transform
US20050089148A1 (en) * 2003-10-24 2005-04-28 Stokes Jack W.Iii Systems and methods for echo cancellation with arbitrary playback sampling rates
US20050141609A1 (en) * 2001-09-18 2005-06-30 Microsoft Corporation Block transform and quantization for image and video coding
US20050169483A1 (en) * 2004-02-04 2005-08-04 Microsoft Corporation Analog preamplifier measurement for a microphone array
US20050175179A1 (en) * 2004-02-10 2005-08-11 Mustafa Kesal Media watermarking by biasing randomized statistics
US20050175190A1 (en) * 2004-02-09 2005-08-11 Microsoft Corporation Self-descriptive microphone array
WO2005073959A1 (en) * 2004-01-28 2005-08-11 Koninklijke Philips Electronics N.V. Audio signal decoding using complex-valued data
US20050195988A1 (en) * 2004-03-02 2005-09-08 Microsoft Corporation System and method for beamforming using a microphone array
US20050238109A1 (en) * 2004-04-16 2005-10-27 Matsushita Electric Industrial Co., Ltd. Communication apparatus and communication method using digital wavelet multi carrier transmission system
US20050238089A1 (en) * 2004-04-16 2005-10-27 Matsushita Electric Industrial Co., Ltd. Communication apparatus and communication method using digital wavelet multi carrier transmission system
US20050256916A1 (en) * 2004-05-14 2005-11-17 Microsoft Corporation Fast video codec transform implementations
US20060116481A1 (en) * 2004-12-01 2006-06-01 Morgan William E Castable liquid rubber compositions for golf balls
US20060133528A1 (en) * 2004-12-16 2006-06-22 Matsushita Electric Industrial Co., Ltd. Receiver and communication method for digital multi-carrier communication
US20060133682A1 (en) * 2004-12-17 2006-06-22 Microsoft Corporation Reversible overlap operator for efficient lossless data compression
US20060133684A1 (en) * 2004-12-17 2006-06-22 Microsoft Corporation Reversible 2-dimensional pre-/post-filtering for lapped biorthogonal transform
US20060133683A1 (en) * 2004-12-17 2006-06-22 Microsoft Corporation Reversible transform for lossy and lossless 2-D data compression
US7164724B2 (en) 2002-09-25 2007-01-16 Matsushita Electric Industrial Co., Ltd. Communication apparatus
US20070016405A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Coding with improved time resolution for selected segments via adaptive block transformation of a group of samples from a subband decomposition
US20070036225A1 (en) * 2005-08-12 2007-02-15 Microsoft Corporation SIMD lapped transform-based digital media encoding/decoding
US20070081734A1 (en) * 2005-10-07 2007-04-12 Microsoft Corporation Multimedia signal processing using fixed-point approximations of linear transforms
US20070276656A1 (en) * 2006-05-25 2007-11-29 Audience, Inc. System and method for processing an audio signal
WO2008014853A1 (en) * 2006-07-31 2008-02-07 Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for processing a real subband signal for reducing aliasing effects
US7412102B2 (en) 2003-09-07 2008-08-12 Microsoft Corporation Interlace frame lapped transform
US20080198935A1 (en) * 2007-02-21 2008-08-21 Microsoft Corporation Computational complexity and precision control in transform-based digital media codec
US20080259828A1 (en) * 2007-04-23 2008-10-23 Microsoft Corporation Method and apparatus for reducing timestamp noise in audio echo cancellation
US20090157785A1 (en) * 2007-12-13 2009-06-18 Qualcomm Incorporated Fast algorithms for computation of 5-point dct-ii, dct-iv, and dst-iv, and architectures
US20090290736A1 (en) * 2008-05-21 2009-11-26 Daniel Alfsmann Filter bank system for hearing aids
US20090297054A1 (en) * 2008-05-27 2009-12-03 Microsoft Corporation Reducing dc leakage in hd photo transform
US20090299754A1 (en) * 2008-05-30 2009-12-03 Microsoft Corporation Factorization of overlapping tranforms into two block transforms
US20090319278A1 (en) * 2008-06-20 2009-12-24 Microsoft Corporation Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (mclt)
US7761290B2 (en) 2007-06-15 2010-07-20 Microsoft Corporation Flexible frequency and time partitioning in perceptual transform coding of audio
US20100241423A1 (en) * 2009-03-18 2010-09-23 Stanley Wayne Jackson System and method for frequency to phase balancing for timbre-accurate low bit rate audio encoding
US20110191111A1 (en) * 2010-01-29 2011-08-04 Polycom, Inc. Audio Packet Loss Concealment by Transform Interpolation
US8143620B1 (en) 2007-12-21 2012-03-27 Audience, Inc. System and method for adaptive classification of audio sources
US8180064B1 (en) 2007-12-21 2012-05-15 Audience, Inc. System and method for providing voice equalization
US8189766B1 (en) 2007-07-26 2012-05-29 Audience, Inc. System and method for blind subband acoustic echo cancellation postfiltering
US8194880B2 (en) 2006-01-30 2012-06-05 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US8194882B2 (en) 2008-02-29 2012-06-05 Audience, Inc. System and method for providing single microphone noise suppression fallback
US8204253B1 (en) 2008-06-30 2012-06-19 Audience, Inc. Self calibration of audio device
US8204252B1 (en) 2006-10-10 2012-06-19 Audience, Inc. System and method for providing close microphone adaptive array processing
US8259926B1 (en) 2007-02-23 2012-09-04 Audience, Inc. System and method for 2-channel and 3-channel acoustic echo cancellation
US8275209B2 (en) 2008-10-10 2012-09-25 Microsoft Corporation Reduced DC gain mismatch and DC leakage in overlap transform processing
RU2464540C2 (en) * 2007-12-13 2012-10-20 Квэлкомм Инкорпорейтед Fast algorithms for computation of 5-point dct-ii, dct-iv, and dst-iv, and architectures
US8345890B2 (en) 2006-01-05 2013-01-01 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
US8355511B2 (en) 2008-03-18 2013-01-15 Audience, Inc. System and method for envelope-based acoustic echo cancellation
JP2013120253A (en) * 2011-12-07 2013-06-17 Nippon Telegr & Teleph Corp <Ntt> Spectrum vector storage device, spectrum vector storage method and spectrum vector storage program
US8521530B1 (en) 2008-06-30 2013-08-27 Audience, Inc. System and method for enhancing a monaural audio signal
US20130246492A1 (en) * 2010-03-11 2013-09-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window
TWI412022B (en) * 2010-12-30 2013-10-11 Univ Nat Cheng Kung Recursive discrete cosine transform and inverse discrete cosine transform system
TWI421858B (en) * 2007-05-24 2014-01-01 Audience Inc System and method for processing an audio signal
TWI423046B (en) * 2010-12-30 2014-01-11 Univ Nat Cheng Kung Recursive modified discrete cosine transform and inverse discrete cosine transform system with a computing kernel of rdft
US8744844B2 (en) 2007-07-06 2014-06-03 Audience, Inc. System and method for adaptive intelligent noise suppression
US8774423B1 (en) 2008-06-30 2014-07-08 Audience, Inc. System and method for controlling adaptivity of signal modification using a phantom coefficient
US8849231B1 (en) 2007-08-08 2014-09-30 Audience, Inc. System and method for adaptive power control
US8934641B2 (en) 2006-05-25 2015-01-13 Audience, Inc. Systems and methods for reconstructing decomposed audio signals
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
US20150100312A1 (en) * 2013-10-04 2015-04-09 At&T Intellectual Property I, L.P. System and method of using neural transforms of robust audio features for speech processing
US9008329B1 (en) 2010-01-26 2015-04-14 Audience, Inc. Noise reduction using multi-feature cluster tracker
US20150149160A1 (en) * 2012-06-18 2015-05-28 Goertek, Inc. Method And Device For Dereverberation Of Single-Channel Speech
US9185487B2 (en) 2006-01-30 2015-11-10 Audience, Inc. System and method for providing noise suppression utilizing null processing noise subtraction
US20150371656A1 (en) * 2014-06-19 2015-12-24 Yang Gao Acoustic Echo Preprocessing for Speech Enhancement
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
US20170243593A1 (en) * 2002-09-18 2017-08-24 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US9799330B2 (en) 2014-08-28 2017-10-24 Knowles Electronics, Llc Multi-sourced noise suppression
US9820042B1 (en) 2016-05-02 2017-11-14 Knowles Electronics, Llc Stereo separation and directional suppression with omni-directional microphones
US9838784B2 (en) 2009-12-02 2017-12-05 Knowles Electronics, Llc Directional audio capture
US9953646B2 (en) 2014-09-02 2018-04-24 Belleau Technologies Method and system for dynamic speech recognition and tracking of prewritten script
US9978388B2 (en) 2014-09-12 2018-05-22 Knowles Electronics, Llc Systems and methods for restoration of speech components
US10236008B2 (en) 2015-03-09 2019-03-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Decoder for decoding an encoded audio signal and encoder for encoding an audio signal
US10403295B2 (en) 2001-11-29 2019-09-03 Dolby International Ab Methods for improving high frequency reconstruction
RU2719285C1 (en) * 2016-07-29 2020-04-17 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Reduced overlapping of spectra in time domain for non-uniform filter banks, which use spectral analysis with subsequent partial synthesis
CN115392325A (en) * 2022-10-26 2022-11-25 中国人民解放军国防科技大学 Multi-feature noise reduction modulation identification method based on cycleGan

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4114189A (en) * 1975-12-09 1978-09-12 The United States Of America As Represented By The Secretary Of The Navy Digital signal synthesis system
US4837828A (en) * 1982-05-12 1989-06-06 Nec Corporation Pattern feature extracting system
US5339265A (en) * 1992-08-31 1994-08-16 University Of Maryland At College Park Optimal unified architectures for the real-time computation of time-recursive discrete sinusoidal transforms
DE4337653A1 (en) 1993-11-04 1995-05-11 Deutsche Bundespost Telekom Method and circuit arrangement for improving the quality of voice communication in hands-free telephone facilities
WO1996024128A1 (en) * 1995-01-30 1996-08-08 Telefonaktiebolaget Lm Ericsson Spectral subtraction noise suppression method
US5727119A (en) * 1995-03-27 1998-03-10 Dolby Laboratories Licensing Corporation Method and apparatus for efficient implementation of single-sideband filter banks providing accurate measures of spectral magnitude and phase
US5859788A (en) 1997-08-15 1999-01-12 The Aerospace Corporation Modulated lapped transform method

Non-Patent Citations (15)

* Cited by examiner, † Cited by third party
Title
A. Gilloire and M. Vetterli, "Adaptive filtering in subbands with critical sampling: analysis, experiments, and applications to acoustic echo cancellation," IEEE Trans. Signal Processing, vol. 40, pp. 1862-1875, Aug. 1992.
D. Kunz and T. Aach, "Lapped Directional Transform: A New Transform for Spectral Image Analysis," Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Phoenix, AZ, 1999.
H. Malvar, "Enhancing the Performance of Subband Audio Coders for Speech Signals," presented at the IEEE International Symposium on Circuits and Systems, pp. 1-4, Monterey, CA, Jun. 1998.
H.S. Malvar, "Biorthogonal and nonuniform lapped transforms for transform coding with reduced blocking and ringing artifacts," IEEE Trans. Signal Processing, vol. 46, pp. 1043-1053, Apr. 1998.
H.S. Malvar, Signal Processing with Lapped Transforms. Boston: Artech House, 1992, pp. vii-xi.
J.P. Princen, A.W. Johnson, and A.B. Bradley. "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation." IEEE Intl. Conf. on Acoustic., Speech and Signal Proc., pp. 2161-2164, Dallas, 1987.
K.R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, and Applications. New York: Academic Press, 1990, pp. v-viii.
M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice Hall, 1995, pp. vii-xii.
Malvar, Henrique S., "Lapped Transforms for Efficient Transform/Subband Coding," IEEE Trans. Acoust., Speech, and Sig. Proc., vol. 38, No. 6, Jun. 1990, pp. 969-978.* *
P.L. De Leon and D.M. Etter, "Acoustic echo cancellation using subband adaptive filtering," in Subband and Wavelet Transforms, A.N. Akansu and M.J.T. Smith, Eds. Boston, MA: Kluwer, 1996, pp. 347-367.
P.P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice Hall, 1993, pp. vii-xi.
R.W. Young and N.G. Kingsbury, "Frequency domain motion estimation using a complex lapped transform," IEEE Trans. Image Processing, vol. 2, pp. 2-17, Jan. 1993.
S. Godsill, P. Rayner, and O. Cappe, "Digital audio restoration," in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds. Boston, MA: Kluwer, 1998, Chap 4, pp. 133-194.
S. Shlien, "The modulated lapped transform, its time-varying forms, and applications to audio coding," IEEE Trans. Speech Audio Processing, vol. 5, pp. 359-366, Jul. 1997.
S. Wyrsch and A. Kaelin, "Adaptive Feedback Cancelling in Subbands for Hearing Aids," Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Phoenix, AZ, 1999.

Cited By (190)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7120659B2 (en) 1999-04-29 2006-10-10 Pelton Walter E Apparatus, methods, and computer program products for determining the coefficients of a function with decreased latency
US20040193663A1 (en) * 1999-04-29 2004-09-30 Pelton Walter E. Apparatus, methods, and computer program products for determining the coefficients of a function with decreased latency
US8005883B2 (en) 1999-04-29 2011-08-23 Pelton Walter E Apparatus, methods, and computer program products for determining the coefficients of a function with decreased latency
US20070260661A1 (en) * 1999-04-29 2007-11-08 Pelton Walter E Apparatus, methods, and computer program products for determining the coefficients of a function with decreased latency
US6615227B2 (en) * 1999-09-17 2003-09-02 Globespanvirata, Inc. Circuit and method for computing a fast fourier transform
US6750789B2 (en) * 2000-01-12 2004-06-15 Fraunhofer-Gesellschaft Zur Foerderung, Der Angewandten Forschung E.V. Device and method for determining a coding block raster of a decoded signal
US20030107503A1 (en) * 2000-01-12 2003-06-12 Juergen Herre Device and method for determining a coding block raster of a decoded signal
US20010051966A1 (en) * 2000-02-26 2001-12-13 Walter Pelton Apparatus, methods, and computer program products for accurately determining the coefficients of a function
US6922712B2 (en) * 2000-02-26 2005-07-26 Walter E. Pelton Apparatus, methods, and computer program products for accurately determining the coefficients of a function
US20020156928A1 (en) * 2000-04-24 2002-10-24 Goodman David A. Universal interface system
US6817017B2 (en) * 2000-04-24 2004-11-09 Leotel, Technologies, Inc. Universal interface system
US6952710B2 (en) 2000-06-09 2005-10-04 Walter Eugene Pelton Apparatus, methods and computer program products for performing high speed division calculations
US6820104B2 (en) 2000-06-09 2004-11-16 Walter Eugene Pelton Apparatus, methods, and computer program products for reducing the number of computations and number of required stored values for information processing methods
US20020038326A1 (en) * 2000-06-09 2002-03-28 Walter Pelton Apparatus, methods, and computer program products for reducing the number of computations and number of required stored values for information processing methods
US20020016808A1 (en) * 2000-06-09 2002-02-07 Walter Pelton Apparatus, methods and computer program products for performing high speed division calculations
US6654716B2 (en) * 2000-10-20 2003-11-25 Telefonaktiebolaget Lm Ericsson Perceptually improved enhancement of encoded acoustic signals
US6754623B2 (en) * 2001-01-31 2004-06-22 International Business Machines Corporation Methods and apparatus for ambient noise removal in speech recognition
US20020138263A1 (en) * 2001-01-31 2002-09-26 Ibm Corporation Methods and apparatus for ambient noise removal in speech recognition
US20040064310A1 (en) * 2001-05-07 2004-04-01 Yutaka Banba Sub-band adaptive differential pulse code modulation/encoding apparatus, sub-band adaptive differential pulse code modulation/encoding method, wireless transmission system, sub-band adaptive differential pulse code modulation/decoding apparatus, sub-band adaptive differential pulse code modulation/decoding method, and wirel
US7512539B2 (en) * 2001-06-18 2009-03-31 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method and device for processing time-discrete audio sampled values
US20040220805A1 (en) * 2001-06-18 2004-11-04 Ralf Geiger Method and device for processing time-discrete audio sampled values
US7773671B2 (en) 2001-09-18 2010-08-10 Microsoft Corporation Block transform and quantization for image and video coding
US20050141609A1 (en) * 2001-09-18 2005-06-30 Microsoft Corporation Block transform and quantization for image and video coding
US20050180503A1 (en) * 2001-09-18 2005-08-18 Microsoft Corporation Block transform and quantization for image and video coding
US20050213659A1 (en) * 2001-09-18 2005-09-29 Microsoft Corporation Block transform and quantization for image and video coding
US20050175097A1 (en) * 2001-09-18 2005-08-11 Microsoft Corporation Block transform and quantization for image and video coding
US8971405B2 (en) 2001-09-18 2015-03-03 Microsoft Technology Licensing, Llc Block transform and quantization for image and video coding
US7106797B2 (en) 2001-09-18 2006-09-12 Microsoft Corporation Block transform and quantization for image and video coding
US7839928B2 (en) 2001-09-18 2010-11-23 Microsoft Corporation Block transform and quantization for image and video coding
US7881371B2 (en) 2001-09-18 2011-02-01 Microsoft Corporation Block transform and quantization for image and video coding
US20110116543A1 (en) * 2001-09-18 2011-05-19 Microsoft Corporation Block transform and quantization for image and video coding
US10403295B2 (en) 2001-11-29 2019-09-03 Dolby International Ab Methods for improving high frequency reconstruction
US20030115052A1 (en) * 2001-12-14 2003-06-19 Microsoft Corporation Adaptive window-size selection in transform coding
US7460993B2 (en) 2001-12-14 2008-12-02 Microsoft Corporation Adaptive window-size selection in transform coding
US6687672B2 (en) * 2002-03-15 2004-02-03 Matsushita Electric Industrial Co., Ltd. Methods and apparatus for blind channel estimation based upon speech correlation structure
US20030206582A1 (en) * 2002-05-02 2003-11-06 Microsoft Corporation 2-D transforms for image and video coding
US7242713B2 (en) 2002-05-02 2007-07-10 Microsoft Corporation 2-D transforms for image and video coding
WO2004006426A3 (en) * 2002-07-03 2004-04-01 Commasic Inc Flexible method and apparatus for performing digital modulation and demodulation
US7702035B2 (en) 2002-07-03 2010-04-20 Freescale Semiconductor, Inc. Searching method and apparatus for processing digital communication signals
US20040004995A1 (en) * 2002-07-03 2004-01-08 Commasic, Inc. Buffering method and apparatus for processing digital communication signals
US7596134B2 (en) 2002-07-03 2009-09-29 Freescale Semiconductor, Inc. Flexible method and apparatus for performing digital modulation and demodulation
US20040004997A1 (en) * 2002-07-03 2004-01-08 Commasic, Inc. Searching method and apparatus for processing digital communication signals
US20040047405A1 (en) * 2002-07-03 2004-03-11 Commasic, Inc. Flexible method and apparatus for performing digital modulation and demodulation
US20040071199A1 (en) * 2002-07-03 2004-04-15 Commasic, Inc. Virtual finger method and apparatus for processing digital communication signals
US7912999B2 (en) 2002-07-03 2011-03-22 Freescale Semiconductor, Inc. Buffering method and apparatus for processing digital communication signals
US20190362729A1 (en) * 2002-09-18 2019-11-28 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US10685661B2 (en) * 2002-09-18 2020-06-16 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US10157623B2 (en) * 2002-09-18 2018-12-18 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US9990929B2 (en) * 2002-09-18 2018-06-05 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US9842600B2 (en) * 2002-09-18 2017-12-12 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US11423916B2 (en) * 2002-09-18 2022-08-23 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US20180061427A1 (en) * 2002-09-18 2018-03-01 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US20170243593A1 (en) * 2002-09-18 2017-08-24 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US20180053517A1 (en) * 2002-09-18 2018-02-22 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US10418040B2 (en) * 2002-09-18 2019-09-17 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US10115405B2 (en) * 2002-09-18 2018-10-30 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US10013991B2 (en) * 2002-09-18 2018-07-03 Dolby International Ab Method for reduction of aliasing introduced by spectral envelope adjustment in real-valued filterbanks
US20070076809A1 (en) * 2002-09-25 2007-04-05 Matsushita Electric Industrial Co., Ltd. Communication apparatus
US7164724B2 (en) 2002-09-25 2007-01-16 Matsushita Electric Industrial Co., Ltd. Communication apparatus
US8189698B2 (en) 2002-09-25 2012-05-29 Panasonic Corporation Communication apparatus
US7590185B2 (en) 2002-09-25 2009-09-15 Panasonic Corporation Communication apparatus
US20090110101A1 (en) * 2002-09-25 2009-04-30 Panasonic Corporation Communication apparatus
US20040184400A1 (en) * 2002-11-25 2004-09-23 Hisao Koga Multicarrier transmitter, multicarrier receiver, and multicarrier communications apparatus
US20050024981A1 (en) * 2002-12-05 2005-02-03 Intel Corporation. Byte aligned redundancy for memory array
US20040162866A1 (en) * 2003-02-19 2004-08-19 Malvar Henrique S. System and method for producing fast modulated complex lapped transforms
US7760810B2 (en) 2003-06-18 2010-07-20 Panasonic Corporation Receiver and method for digital multi carrier transmission
US20050008086A1 (en) * 2003-06-18 2005-01-13 Matsushita Electric Industrial Co., Ltd. Receiver and method for digital multi carrier transmission
US20050031048A1 (en) * 2003-07-03 2005-02-10 Matsushita Electric Industrial Co., Ltd. Transmitter and method for digital multi-carrier transmission
US7526044B2 (en) 2003-07-03 2009-04-28 Panasonic Corporation Transmitter and method for digital multi-carrier transmission
US7471726B2 (en) 2003-07-15 2008-12-30 Microsoft Corporation Spatial-domain lapped transform in digital media compression
US20050013359A1 (en) * 2003-07-15 2005-01-20 Microsoft Corporation Spatial-domain lapped transform in digital media compression
US20050053150A1 (en) * 2003-09-07 2005-03-10 Microsoft Corporation Conditional lapped transform
US7369709B2 (en) 2003-09-07 2008-05-06 Microsoft Corporation Conditional lapped transform
US7412102B2 (en) 2003-09-07 2008-08-12 Microsoft Corporation Interlace frame lapped transform
US20050089148A1 (en) * 2003-10-24 2005-04-28 Stokes Jack W.Iii Systems and methods for echo cancellation with arbitrary playback sampling rates
US7251322B2 (en) 2003-10-24 2007-07-31 Microsoft Corporation Systems and methods for echo cancellation with arbitrary playback sampling rates
US20080249765A1 (en) * 2004-01-28 2008-10-09 Koninklijke Philips Electronic, N.V. Audio Signal Decoding Using Complex-Valued Data
WO2005073959A1 (en) * 2004-01-28 2005-08-11 Koninklijke Philips Electronics N.V. Audio signal decoding using complex-valued data
US7428309B2 (en) 2004-02-04 2008-09-23 Microsoft Corporation Analog preamplifier measurement for a microphone array
US20050169483A1 (en) * 2004-02-04 2005-08-04 Microsoft Corporation Analog preamplifier measurement for a microphone array
US7515721B2 (en) 2004-02-09 2009-04-07 Microsoft Corporation Self-descriptive microphone array
US20050175190A1 (en) * 2004-02-09 2005-08-11 Microsoft Corporation Self-descriptive microphone array
US20050175179A1 (en) * 2004-02-10 2005-08-11 Mustafa Kesal Media watermarking by biasing randomized statistics
US7539870B2 (en) * 2004-02-10 2009-05-26 Microsoft Corporation Media watermarking by biasing randomized statistics
AU2005200699B2 (en) * 2004-03-02 2009-05-14 Microsoft Technology Licensing, Llc A system and method for beamforming using a microphone array
US7415117B2 (en) * 2004-03-02 2008-08-19 Microsoft Corporation System and method for beamforming using a microphone array
US20050195988A1 (en) * 2004-03-02 2005-09-08 Microsoft Corporation System and method for beamforming using a microphone array
US7561614B2 (en) 2004-04-16 2009-07-14 Panasonic Corporation Communication apparatus and communication method using digital wavelet multi carrier transmission system
US7529308B2 (en) 2004-04-16 2009-05-05 Panasonic Corporation Communication apparatus and communication method using digital wavelet multi carrier transmission system
US20050238109A1 (en) * 2004-04-16 2005-10-27 Matsushita Electric Industrial Co., Ltd. Communication apparatus and communication method using digital wavelet multi carrier transmission system
US20050238089A1 (en) * 2004-04-16 2005-10-27 Matsushita Electric Industrial Co., Ltd. Communication apparatus and communication method using digital wavelet multi carrier transmission system
US20080273581A1 (en) * 2004-04-16 2008-11-06 Matsushita Electric Industrial Co., Ltd. Communication apparatus and communication method using digital wavelet multi carrier transmission system
US7487193B2 (en) 2004-05-14 2009-02-03 Microsoft Corporation Fast video codec transform implementations
US20050256916A1 (en) * 2004-05-14 2005-11-17 Microsoft Corporation Fast video codec transform implementations
US20060116481A1 (en) * 2004-12-01 2006-06-01 Morgan William E Castable liquid rubber compositions for golf balls
US20060133528A1 (en) * 2004-12-16 2006-06-22 Matsushita Electric Industrial Co., Ltd. Receiver and communication method for digital multi-carrier communication
US7552162B2 (en) 2004-12-16 2009-06-23 Panasonic Corporation Receiver and communication method for digital multi-carrier communication
US7428342B2 (en) 2004-12-17 2008-09-23 Microsoft Corporation Reversible overlap operator for efficient lossless data compression
US7305139B2 (en) 2004-12-17 2007-12-04 Microsoft Corporation Reversible 2-dimensional pre-/post-filtering for lapped biorthogonal transform
US7471850B2 (en) 2004-12-17 2008-12-30 Microsoft Corporation Reversible transform for lossy and lossless 2-D data compression
US20060133684A1 (en) * 2004-12-17 2006-06-22 Microsoft Corporation Reversible 2-dimensional pre-/post-filtering for lapped biorthogonal transform
US20060133682A1 (en) * 2004-12-17 2006-06-22 Microsoft Corporation Reversible overlap operator for efficient lossless data compression
US20060133683A1 (en) * 2004-12-17 2006-06-22 Microsoft Corporation Reversible transform for lossy and lossless 2-D data compression
US7551789B2 (en) 2004-12-17 2009-06-23 Microsoft Corporation Reversible overlap operator for efficient lossless data compression
US20080317368A1 (en) * 2004-12-17 2008-12-25 Microsoft Corporation Reversible overlap operator for efficient lossless data compression
US20070016405A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Coding with improved time resolution for selected segments via adaptive block transformation of a group of samples from a subband decomposition
US7546240B2 (en) * 2005-07-15 2009-06-09 Microsoft Corporation Coding with improved time resolution for selected segments via adaptive block transformation of a group of samples from a subband decomposition
US8036274B2 (en) 2005-08-12 2011-10-11 Microsoft Corporation SIMD lapped transform-based digital media encoding/decoding
US20070036225A1 (en) * 2005-08-12 2007-02-15 Microsoft Corporation SIMD lapped transform-based digital media encoding/decoding
US20070081734A1 (en) * 2005-10-07 2007-04-12 Microsoft Corporation Multimedia signal processing using fixed-point approximations of linear transforms
US7689052B2 (en) 2005-10-07 2010-03-30 Microsoft Corporation Multimedia signal processing using fixed-point approximations of linear transforms
US8345890B2 (en) 2006-01-05 2013-01-01 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
US8867759B2 (en) 2006-01-05 2014-10-21 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
US8194880B2 (en) 2006-01-30 2012-06-05 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US9185487B2 (en) 2006-01-30 2015-11-10 Audience, Inc. System and method for providing noise suppression utilizing null processing noise subtraction
US8150065B2 (en) * 2006-05-25 2012-04-03 Audience, Inc. System and method for processing an audio signal
US8934641B2 (en) 2006-05-25 2015-01-13 Audience, Inc. Systems and methods for reconstructing decomposed audio signals
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
US9830899B1 (en) 2006-05-25 2017-11-28 Knowles Electronics, Llc Adaptive noise cancellation
US20070276656A1 (en) * 2006-05-25 2007-11-29 Audience, Inc. System and method for processing an audio signal
US8411731B2 (en) 2006-07-31 2013-04-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Device and method for processing a real subband signal for reducing aliasing effects
US9893694B2 (en) 2006-07-31 2018-02-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Device and method for processing a real subband signal for reducing aliasing effects
WO2008014853A1 (en) * 2006-07-31 2008-02-07 Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for processing a real subband signal for reducing aliasing effects
CN101501760B (en) * 2006-07-31 2012-06-27 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device and method for processing a real subband signal for reducing aliasing effects
US20100013987A1 (en) * 2006-07-31 2010-01-21 Bernd Edler Device and Method for Processing a Real Subband Signal for Reducing Aliasing Effects
US8204252B1 (en) 2006-10-10 2012-06-19 Audience, Inc. System and method for providing close microphone adaptive array processing
US20080198935A1 (en) * 2007-02-21 2008-08-21 Microsoft Corporation Computational complexity and precision control in transform-based digital media codec
US8942289B2 (en) 2007-02-21 2015-01-27 Microsoft Corporation Computational complexity and precision control in transform-based digital media codec
US8259926B1 (en) 2007-02-23 2012-09-04 Audience, Inc. System and method for 2-channel and 3-channel acoustic echo cancellation
US20080259828A1 (en) * 2007-04-23 2008-10-23 Microsoft Corporation Method and apparatus for reducing timestamp noise in audio echo cancellation
US8259928B2 (en) 2007-04-23 2012-09-04 Microsoft Corporation Method and apparatus for reducing timestamp noise in audio echo cancellation
TWI421858B (en) * 2007-05-24 2014-01-01 Audience Inc System and method for processing an audio signal
US7761290B2 (en) 2007-06-15 2010-07-20 Microsoft Corporation Flexible frequency and time partitioning in perceptual transform coding of audio
US8886525B2 (en) 2007-07-06 2014-11-11 Audience, Inc. System and method for adaptive intelligent noise suppression
US8744844B2 (en) 2007-07-06 2014-06-03 Audience, Inc. System and method for adaptive intelligent noise suppression
US8189766B1 (en) 2007-07-26 2012-05-29 Audience, Inc. System and method for blind subband acoustic echo cancellation postfiltering
US8849231B1 (en) 2007-08-08 2014-09-30 Audience, Inc. System and method for adaptive power control
US20090157785A1 (en) * 2007-12-13 2009-06-18 Qualcomm Incorporated Fast algorithms for computation of 5-point DCT-II, DCT-IV, and DST-IV, and architectures
US8631060B2 (en) 2007-12-13 2014-01-14 Qualcomm Incorporated Fast algorithms for computation of 5-point DCT-II, DCT-IV, and DST-IV, and architectures
RU2464540C2 (en) * 2007-12-13 2012-10-20 Qualcomm Incorporated Fast algorithms for computation of 5-point DCT-II, DCT-IV, and DST-IV, and architectures
US9076456B1 (en) 2007-12-21 2015-07-07 Audience, Inc. System and method for providing voice equalization
US8180064B1 (en) 2007-12-21 2012-05-15 Audience, Inc. System and method for providing voice equalization
US8143620B1 (en) 2007-12-21 2012-03-27 Audience, Inc. System and method for adaptive classification of audio sources
US8194882B2 (en) 2008-02-29 2012-06-05 Audience, Inc. System and method for providing single microphone noise suppression fallback
US8355511B2 (en) 2008-03-18 2013-01-15 Audience, Inc. System and method for envelope-based acoustic echo cancellation
US8085960B2 (en) * 2008-05-21 2011-12-27 Siemens Medical Instruments Pte Ltd Filter bank system for hearing aids
US20090290736A1 (en) * 2008-05-21 2009-11-26 Daniel Alfsmann Filter bank system for hearing aids
US8369638B2 (en) 2008-05-27 2013-02-05 Microsoft Corporation Reducing DC leakage in HD photo transform
US8724916B2 (en) 2008-05-27 2014-05-13 Microsoft Corporation Reducing DC leakage in HD photo transform
US20090297054A1 (en) * 2008-05-27 2009-12-03 Microsoft Corporation Reducing dc leakage in hd photo transform
US20090299754A1 (en) * 2008-05-30 2009-12-03 Microsoft Corporation Factorization of overlapping transforms into two block transforms
US8447591B2 (en) 2008-05-30 2013-05-21 Microsoft Corporation Factorization of overlapping transforms into two block transforms
US9037454B2 (en) 2008-06-20 2015-05-19 Microsoft Technology Licensing, Llc Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (MCLT)
US20090319278A1 (en) * 2008-06-20 2009-12-24 Microsoft Corporation Efficient coding of overcomplete representations of audio using the modulated complex lapped transform (MCLT)
US8521530B1 (en) 2008-06-30 2013-08-27 Audience, Inc. System and method for enhancing a monaural audio signal
US8774423B1 (en) 2008-06-30 2014-07-08 Audience, Inc. System and method for controlling adaptivity of signal modification using a phantom coefficient
US8204253B1 (en) 2008-06-30 2012-06-19 Audience, Inc. Self calibration of audio device
US8275209B2 (en) 2008-10-10 2012-09-25 Microsoft Corporation Reduced DC gain mismatch and DC leakage in overlap transform processing
US20100241423A1 (en) * 2009-03-18 2010-09-23 Stanley Wayne Jackson System and method for frequency to phase balancing for timbre-accurate low bit rate audio encoding
US9838784B2 (en) 2009-12-02 2017-12-05 Knowles Electronics, Llc Directional audio capture
US9008329B1 (en) 2010-01-26 2015-04-14 Audience, Inc. Noise reduction using multi-feature cluster tracker
US20110191111A1 (en) * 2010-01-29 2011-08-04 Polycom, Inc. Audio Packet Loss Concealment by Transform Interpolation
US8428959B2 (en) 2010-01-29 2013-04-23 Polycom, Inc. Audio packet loss concealment by transform interpolation
US9252803B2 (en) * 2010-03-11 2016-02-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window
US20130246492A1 (en) * 2010-03-11 2013-09-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window
TWI423046B (en) * 2010-12-30 2014-01-11 Univ Nat Cheng Kung Recursive modified discrete cosine transform and inverse discrete cosine transform system with a computing kernel of rdft
TWI412022B (en) * 2010-12-30 2013-10-11 Univ Nat Cheng Kung Recursive discrete cosine transform and inverse discrete cosine transform system
JP2013120253A (en) * 2011-12-07 2013-06-17 Nippon Telegr & Teleph Corp <Ntt> Spectrum vector storage device, spectrum vector storage method and spectrum vector storage program
US20150149160A1 (en) * 2012-06-18 2015-05-28 Goertek, Inc. Method And Device For Dereverberation Of Single-Channel Speech
US9269369B2 (en) * 2012-06-18 2016-02-23 Goertek, Inc. Method and device for dereverberation of single-channel speech
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
US10096318B2 (en) 2013-10-04 2018-10-09 Nuance Communications, Inc. System and method of using neural transforms of robust audio features for speech processing
US9754587B2 (en) 2013-10-04 2017-09-05 Nuance Communications, Inc. System and method of using neural transforms of robust audio features for speech processing
US9280968B2 (en) * 2013-10-04 2016-03-08 At&T Intellectual Property I, L.P. System and method of using neural transforms of robust audio features for speech processing
US20150100312A1 (en) * 2013-10-04 2015-04-09 At&T Intellectual Property I, L.P. System and method of using neural transforms of robust audio features for speech processing
US9508359B2 (en) * 2014-06-19 2016-11-29 Yang Gao Acoustic echo preprocessing for speech enhancement
US20150371656A1 (en) * 2014-06-19 2015-12-24 Yang Gao Acoustic Echo Preprocessing for Speech Enhancement
US9799330B2 (en) 2014-08-28 2017-10-24 Knowles Electronics, Llc Multi-sourced noise suppression
US9953646B2 (en) 2014-09-02 2018-04-24 Belleau Technologies Method and system for dynamic speech recognition and tracking of prewritten script
US9978388B2 (en) 2014-09-12 2018-05-22 Knowles Electronics, Llc Systems and methods for restoration of speech components
US11335354B2 (en) 2015-03-09 2022-05-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Decoder for decoding an encoded audio signal and encoder for encoding an audio signal
US10706864B2 (en) 2015-03-09 2020-07-07 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Decoder for decoding an encoded audio signal and encoder for encoding an audio signal
US10236008B2 (en) 2015-03-09 2019-03-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Decoder for decoding an encoded audio signal and encoder for encoding an audio signal
US11854559B2 (en) 2015-03-09 2023-12-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Decoder for decoding an encoded audio signal and encoder for encoding an audio signal
US9820042B1 (en) 2016-05-02 2017-11-14 Knowles Electronics, Llc Stereo separation and directional suppression with omni-directional microphones
RU2719285C1 (en) * 2016-07-29 2020-04-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time domain aliasing reduction for non-uniform filter banks which use spectral analysis followed by partial synthesis
US10978082B2 (en) 2016-07-29 2021-04-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Time domain aliasing reduction for non-uniform filterbanks which use spectral analysis followed by partial synthesis
CN115392325A (en) * 2022-10-26 2022-11-25 National University of Defense Technology Multi-feature noise reduction modulation identification method based on CycleGAN
CN115392325B (en) * 2022-10-26 2023-08-18 National University of Defense Technology Multi-feature noise reduction modulation identification method based on CycleGAN

Similar Documents

Publication Publication Date Title
US6496795B1 (en) Modulated complex lapped transform for integrated signal enhancement and coding
US6487574B1 (en) System and method for producing modulated complex lapped transforms
US6473409B1 (en) Adaptive filtering system and method for adaptively canceling echoes and reducing noise in digital signals
Hermansky et al. RASTA-PLP speech analysis
US6144937A (en) Noise suppression of speech by signal processing including applying a transform to time domain input sequences of digital signals representing audio information
Malvar A modulated complex lapped transform and its applications to audio processing
USRE43191E1 (en) Adaptive Weiner filtering using line spectral frequencies
Tribolet et al. Frequency domain coding of speech
US5706395A (en) Adaptive weiner filtering using a dynamic suppression factor
CA2550654C (en) Frequency extension of harmonic signals
US20080091415A1 (en) System and method for canceling acoustic echoes in audio-conference communication systems
CN100361405C (en) Scalable audio coder and decoder
CN1223166C (en) Methods and apparatus for improved sub-band adaptive filtering in echo cancellation systems
US8218780B2 (en) Methods and systems for blind dereverberation
CA2157708A1 (en) Adaptive finite impulse response filtering method and apparatus
EP1526510B1 (en) Systems and methods for echo cancellation with arbitrary playback sampling rates
US20040162866A1 (en) System and method for producing fast modulated complex lapped transforms
US6718036B1 (en) Linear predictive coding based acoustic echo cancellation
US7062039B1 (en) Methods and apparatus for improving adaptive filter performance by inclusion of inaudible information
US20020177995A1 (en) Method and arrangement for performing a fourier transformation adapted to the transfer function of human sensory organs as well as a noise reduction facility and a speech recognition facility
WO1998006090A1 (en) Speech/audio coding with non-linear spectral-amplitude transformation
WO2000051014A2 (en) Modulated complex lapped transform for integrated signal enhancement and coding
EP1395982B1 (en) ADPCM speech coding system with phase-smearing and phase-desmearing filters
Stokes et al. Acoustic echo cancellation with arbitrary playback sampling rate
Sinha et al. Low bit rate transparent audio compression using a dynamic dictionary and optimized wavelets

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MALVAR, HENRIQUE S.;REEL/FRAME:009952/0046

Effective date: 19990504

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034541/0001

Effective date: 20141014