US20110014981A1 - Tracking device with sound emitter for use in obtaining information for controlling game program execution - Google Patents

Info

Publication number
US20110014981A1
Authority
US
United States
Prior art keywords
tracking device
sound
signal
controller
listening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/891,633
Inventor
Xiadong Mao
Richard L. Marks
Gary M. Zalewski
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Interactive Entertainment Inc
Original Assignee
Sony Computer Entertainment Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/382,256 (US7803050B2)
Application filed by Sony Computer Entertainment Inc
Priority to US12/891,633
Publication of US20110014981A1
Assigned to SONY INTERACTIVE ENTERTAINMENT INC. reassignment SONY INTERACTIVE ENTERTAINMENT INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: SONY COMPUTER ENTERTAINMENT INC.

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/005: Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00: Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/10: Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
    • A63F2300/1081: Input via voice recognition
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00: Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60: Methods for processing data by generating or executing the game program
    • A63F2300/6063: Methods for processing data by generating or executing the game program for sound processing
    • A63F2300/6072: Methods for processing data by generating or executing the game program for sound processing of an input signal, e.g. pitch and rhythm extraction, voice recognition
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00: Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20: Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
    • H04R2430/23: Direction finding using a sum-delay beam-former

Definitions

  • 11/382,033 entitled “SYSTEM, METHOD, AND APPARATUS FOR THREE-DIMENSIONAL INPUT CONTROL”, (Attorney Docket SCEA06INRT1), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,035, entitled “INERTIALLY TRACKABLE HAND-HELD CONTROLLER”, (Attorney Docket SCEA06INRT2), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,036, entitled “METHOD AND SYSTEM FOR APPLYING GEARING EFFECTS TO VISUAL TRACKING”, (Attorney Docket SONYP058A), filed on May 6, 2006; U.S.
  • Embodiments of the present invention are directed to audio signal processing and more particularly to processing of audio signals from microphone arrays.
  • Microphone arrays are often used to provide beam-forming for either noise reduction or echo-position, or both, by detecting the sound source direction or location.
  • a typical microphone array has two or more microphones in fixed positions relative to each other with adjacent microphones separated by a known geometry, e.g., a known distance and/or known layout of the microphones.
  • a sound originating from a source remote from the microphone array can arrive at different microphones at different times. Differences in time of arrival at different microphones in the array can be used to derive information about the direction or location of the source.
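  • As a rough illustration of how a difference in time of arrival relates to source direction, the following minimal sketch handles a single pair of microphones. It assumes a far-field source, a speed of sound of 343 m/s, and a hypothetical helper name `arrival_angle_deg`; none of these come from the text above.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def arrival_angle_deg(delay_s: float, spacing_m: float) -> float:
    """Far-field bearing, in degrees from broadside, for one microphone pair.

    delay_s is the difference in arrival time between the two microphones
    and spacing_m is the known distance between them.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))
```

  • A zero delay corresponds to a source directly in front of the pair, and the largest physically possible delay (spacing divided by the speed of sound) corresponds to a source in line with the two microphones.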
  • Conventional microphone direction detection techniques analyze the correlation between signals from different microphones to determine the direction to the location of the source. Although effective, this technique is computationally intensive and is not robust. Such drawbacks make such techniques unsuitable for use in hand-held devices and consumer electronic applications, such as video game controllers.
  • a particular pre-calibrated listening zone is selected at a runtime by applying to the plurality of filters a set of filter coefficients corresponding to the particular pre-calibrated listening zone.
  • the microphone array may detect sounds originating within the particular listening sector and filter out sounds originating outside the particular listening zone. Sounds are detected with the microphone array.
  • a particular listening zone containing a source of the sound is identified. The sound or the source of the sound is characterized and the sound is emphasized or filtered out depending on how the sound is characterized.
  • FIG. 1A is a schematic diagram of a microphone array according to an embodiment of the present invention.
  • FIG. 1B is a flow diagram illustrating a method for targeted sound detection according to an embodiment of the present invention.
  • FIG. 1C is a schematic diagram illustrating targeted sound detection according to a preferred embodiment of the present invention.
  • FIG. 1D is a flow diagram illustrating a method for targeted sound detection according to the preferred embodiment of the present invention.
  • FIG. 1E is a top plan view of a sound source location and characterization apparatus according to an embodiment of the present invention.
  • FIG. 1F is a flow diagram illustrating a method for sound source location and characterization according to an embodiment of the present invention.
  • FIG. 1G is a top plan view schematic diagram of an apparatus having a camera and a microphone array for targeted sound detection from within a field of view of the camera according to an embodiment of the present invention.
  • FIG. 1H is a front elevation view of the apparatus of FIG. 1E .
  • FIGS. 1I-1J are plan view schematic diagrams of an audio-video apparatus according to an alternative embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a microphone array and filter apparatus according to an embodiment of the present invention.
  • FIG. 3 is a flow diagram of a method for processing a signal from an array of two or more microphones according to an embodiment of the present invention.
  • FIG. 5 is a block diagram of a cell processor implementation of a signal processing system according to an embodiment of the present invention.
  • a microphone array 102 may include four microphones M 0 , M 1 , M 2 , and M 3 that are coupled to corresponding signal filters F 0 , F 1 , F 2 and F 3 .
  • Each of the filters may implement some combination of finite impulse response (FIR) filtering and time delay of arrival (TDA) filtering.
  • the microphones M 0 , M 1 , M 2 , and M 3 may be omni-directional microphones, i.e., microphones that can detect sound from essentially any direction. Omni-directional microphones are generally simpler in construction and less expensive than microphones having a preferred listening direction.
  • the microphones M 0 , M 1 , M 2 , and M 3 produce corresponding outputs x 0 (t), x 1 (t), x 2 (t), x 3 (t). These outputs serve as inputs to the filters F 0 , F 1 , F 2 and F 3 .
  • Each filter may apply a time delay of arrival (TDA) and/or a finite impulse response (FIR) to its input.
  • the outputs of the filters may be combined into a filtered output y(t).
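  • The signal path described above (a per-microphone time delay and FIR filter, followed by combining the filter outputs into y(t)) might be sketched as follows. The function names, the restriction to whole-sample delays, and the plain summation used to combine the outputs are assumptions made for illustration only.

```python
from typing import List, Sequence


def delay_signal(x: Sequence[float], delay: int) -> List[float]:
    """Delay x by a whole number of samples (0 <= delay <= len(x)), zero-padded."""
    return [0.0] * delay + list(x[: len(x) - delay])


def fir(x: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: out[t] = sum over i of taps[i] * x[t - i]."""
    return [
        sum(b * x[t - i] for i, b in enumerate(taps) if t - i >= 0)
        for t in range(len(x))
    ]


def filter_and_sum(
    signals: Sequence[Sequence[float]],
    delays: Sequence[int],
    taps_per_mic: Sequence[Sequence[float]],
) -> List[float]:
    """Apply each microphone's delay and FIR taps, then sum into one output."""
    out = [0.0] * len(signals[0])
    for x, d, taps in zip(signals, delays, taps_per_mic):
        filtered = fir(delay_signal(x, d), taps)
        out = [a + b for a, b in zip(out, filtered)]
    return out
```

  • With delays chosen so that sound from one direction lines up across the channels, that sound adds coherently in the sum while sound from other directions does not.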
  • Each signal x m generally includes subcomponents due to different sources of sounds. The subscript m ranges from 0 to 3 in this example and is used to distinguish among the different microphones in the array.
  • the filters F 0 , F 1 , F 2 and F 3 are pre-calibrated with filter parameters (e.g., FIR filter coefficients and/or TDA values) that define one or more pre-calibrated listening zones Z.
  • the parameters are chosen such that sounds originating from a source 104 located within the listening zone Z are detected while sounds originating from a source 106 located outside the listening zone Z are filtered out, i.e., substantially attenuated.
  • the listening zone Z is depicted as being a more or less wedge-shaped sector having an origin located at or proximate the center of the microphone array 102 .
  • the listening zone Z may be a discrete volume, e.g., a rectangular, spherical, conical or arbitrarily-shaped volume in space. Wedge-shaped listening zones can be robustly established using a linear array of microphones.
  • Robust listening zones defined by arbitrarily-shaped volumes may be established using a planar array or an array of at least four microphones wherein at least one microphone lies in a different plane from the others. Such an array is referred to herein as a “concave” microphone array.
  • a method 110 for targeted voice detection using the microphone array 102 may proceed as follows. As indicated at 112 , one or more sets of the filter coefficients for the filters F 0 , F 1 , F 2 and F 3 are determined corresponding to one or more pre-calibrated listening zones Z. Each set of filter coefficients is selected to detect portions of the input signals corresponding to sounds originating within a given listening sector and to filter out sounds originating outside the given listening sector. To pre-calibrate the listening sectors S, one or more known calibration sound sources may be placed at several different known locations within and outside the sector S.
  • the calibration source(s) may emit sounds characterized by known spectral distributions similar to sounds the microphone array 102 is likely to encounter at runtime. The known locations and spectral characteristics of the sources may then be used to select the values of the filter parameters for the filters F 0 , F 1 , F 2 and F 3 .
  • Blind Source Separation may be used to pre-calibrate the filters F 0 , F 1 , F 2 and F 3 to define the listening zones Z.
  • Blind source separation separates a set of signals into a set of other signals, such that the regularity of each resulting signal is maximized, and the regularity between the signals is minimized (i.e., statistical independence is maximized or decorrelation is minimized).
  • the blind source separation may involve an independent component analysis (ICA) that is based on second-order statistics.
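  • $$\begin{bmatrix} x_{m1} \\ \vdots \\ x_{mn} \end{bmatrix} = \begin{bmatrix} a_{m11} & \cdots & a_{m1n} \\ \vdots & \ddots & \vdots \\ a_{mn1} & \cdots & a_{mnn} \end{bmatrix} \cdot \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix}$$
  • Cal_Cov(j,k) = E((X′ jk)^T * X′ jk), where E refers to the operation of determining the expectation value and (X′ jk)^T is the transpose of the vector X′ jk .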
  • the vector X′ jk is a M+1 dimensional vector representing the Fourier transform of calibration signals for the j th frame and the k th frequency bin.
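  • A calibration covariance matrix of this kind might be estimated as sketched below. The sketch assumes that the expectation is approximated by averaging over repeated calibration observations of the same frame and frequency bin, and that the conjugate transpose is used because the Fourier coefficients are complex; both choices, and the function name, are assumptions rather than details given in the text.

```python
from typing import List, Sequence


def calibration_covariance(
    observations: Sequence[Sequence[complex]],
) -> List[List[complex]]:
    """Average the outer products conj(X)^T * X over the given observations.

    Each observation is one (M+1)-element vector of Fourier coefficients,
    one per microphone, for a fixed frame j and frequency bin k.
    """
    n = len(observations[0])
    cov = [[0j] * n for _ in range(n)]
    for x in observations:
        for r in range(n):
            for c in range(n):
                cov[r][c] += x[r].conjugate() * x[c]
    count = len(observations)
    return [[value / count for value in row] for row in cov]
```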
  • Each calibration covariance matrix Cal_Cov(j,k) may be decomposed by means of “Principal Component Analysis”(PCA) and its corresponding eigenmatrix C may be generated.
  • the inverse C⁻¹ of the eigenmatrix C may thus be regarded as a “listening direction” that essentially contains the most information to de-correlate the covariance matrix, and is saved as a calibration result.
  • the term “eigenmatrix” of the calibration covariance matrix Cal_Cov(j,k) refers to a matrix having columns (or rows) that are the eigenvectors of the covariance matrix.
  • this inverse eigenmatrix C⁻¹ may be used to de-correlate the mixing matrix A by a simple linear transformation. After de-correlation, A is well approximated by its diagonal principal vector, thus the computation of the unmixing matrix (i.e., A⁻¹) is reduced to computing a linear vector inverse of:
  • A 1 is the new transformed mixing matrix in independent component analysis (ICA).
  • the process may be refined by repeating the above procedure with the user standing at different locations within the listening zone Z.
  • In microphone-array noise reduction, it is preferred for the user to move around inside the listening sector during calibration so that the beamforming has a certain tolerance (essentially forming a listening cone area) that provides a user some flexible moving space while talking.
  • voice/sound detection need not be calibrated for the entire cone area of the listening sector S. Instead the listening sector is preferably calibrated for a very narrow beam B along the center of the listening zone Z, so that the final sector determination based on noise suppression ratio becomes more robust.
  • the process may be repeated for one or more additional listening zones.
  • Recalibration in runtime may follow the preceding steps.
  • the default calibration in manufacture takes a very large amount of recording data (e.g., tens of hours of clean voices from hundreds of persons) to ensure an unbiased, person-independent statistical estimation.
  • the recalibration at runtime requires only a small amount of recording data from a particular person; the resulting estimation of C⁻¹ is thus biased and person-dependent.
  • SBSS semi-blind source separation
  • Embodiments of the present invention may also make use of anti-causal filtering.
  • To understand anti-causal filtering, consider a situation in which one microphone, e.g., M 0 , is chosen as a reference microphone for the microphone array 102 .
  • signals from the source 104 must arrive at the reference microphone M 0 first.
  • M 0 cannot be used as a reference microphone.
  • the signal will arrive first at the microphone closest to the source 104 .
  • Embodiments of the present invention adjust for variations in the position of the source 104 by switching the reference microphone among the microphones M 0 , M 1 , M 2 , M 3 in the array 102 so that the reference microphone always receives the signal first.
  • this anti-causality may be accomplished by artificially delaying the signals received at all the microphones in the array except for the reference microphone while minimizing the length of the delay filter used to accomplish this.
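  • The two ideas above (picking as reference whichever microphone receives the signal first, and delaying every other channel) might be sketched as follows. The function names and the use of a whole-sample delay are illustrative assumptions; the text does not specify how arrival times are measured or how the delay filter is built.

```python
from typing import List, Sequence


def choose_reference(arrival_samples: Sequence[float]) -> int:
    """Index of the microphone whose signal arrives first."""
    return min(range(len(arrival_samples)), key=lambda m: arrival_samples[m])


def delay_non_reference(
    signals: Sequence[Sequence[float]], reference: int, delay: int
) -> List[List[float]]:
    """Delay every channel except the reference by `delay` samples."""
    delayed = []
    for m, x in enumerate(signals):
        if m == reference:
            delayed.append(list(x))
        else:
            delayed.append([0.0] * delay + list(x[: len(x) - delay]))
    return delayed
```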
  • the fractional delay Δt m may be adjusted based on a change in the signal to noise ratio (SNR) of the system output y(t).
  • the delay is chosen in a way that maximizes SNR.
  • Appropriate configuration of the filters F 0 , F 1 , F 2 and F 3 and the delays Δt 0 , Δt 1 , Δt 2 , and Δt 3 may be used to establish the pre-calibrated listening sector S.
  • a particular pre-calibrated listening zone Z may be selected at a runtime by applying to the filters F 0 , F 1 , F 2 and F 3 a set of filter parameters corresponding to the particular pre-calibrated listening zone Z.
  • the microphone array may detect sounds originating within the particular listening sector and filter out sounds originating outside the particular listening sector.
  • Although a single listening sector is shown in FIG. 1A , embodiments of the present invention may be extended to situations in which a plurality of different listening sectors are pre-calibrated.
  • the microphone array 102 can then track between two or more pre-calibrated sectors at runtime to determine in which sector a sound source resides.
  • the space surrounding the microphone array 102 may be divided into multiple listening zones in the form of eighteen different pre-calibrated 20 degree wedge-shaped listening sectors S 0 . . . S 17 that encompass about 360 degrees surrounding the microphone array 102 by repeating the calibration procedure outlined above for each of the different sectors and associating a different set of FIR filter coefficients and TDA values with each different sector.
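  • The division into eighteen 20 degree sectors can be illustrated with a small lookup sketch. It assumes that sector S 0 begins at a bearing of 0 degrees and that sector numbers increase with bearing; the text does not state where the numbering starts, so both the convention and the function names are hypothetical.

```python
SECTOR_COUNT = 18
SECTOR_WIDTH_DEG = 360.0 / SECTOR_COUNT  # 20 degrees per sector


def sector_for_bearing(bearing_deg: float) -> int:
    """Index of the wedge-shaped sector containing the given bearing."""
    return int((bearing_deg % 360.0) // SECTOR_WIDTH_DEG)


def sector_center_deg(sector: int) -> float:
    """Bearing of the center line of a sector under the assumed numbering."""
    return (sector + 0.5) * SECTOR_WIDTH_DEG
```

  • A table keyed by the sector index could then hold the FIR filter coefficients and TDA values determined for that sector during calibration.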
  • an appropriate set of pre-determined filter settings e.g., FIR filter coefficients and/or TDA values determined during calibration as described above
  • any of the listening sectors S 0 . . . S 17 may be selected.
  • the microphone array 102 can switch from one sector to another to track a sound source 104 from one sector to another. For example, referring again to FIG. 1C , consider a situation where the sound source 104 is located in sector S 7 and the filters F 0 , F 1 , F 2 , F 3 are set to select sector S 4 . Since the filters are set to filter out sounds coming from outside sector S 4 the input energy E of sounds from the sound source 104 will be attenuated.
  • the input energy E may be defined as a dot product:
  • $$E = \frac{1}{M} \sum_m x_m^T(t)\, x_m(t)$$
  • where x m T (t) is the transpose of the vector x m (t), which represents microphone output x m (t), and the sum is an average taken over all M microphones in the array.
  • the attenuation of the input energy E may be determined from the ratio of the input energy E to the filter output energy, i.e.:
  • $$\text{Attenuation} = \frac{\frac{1}{M} \sum_m x_m^T(t)\, x_m(t)}{y^T(t)\, y(t)}.$$
  • When the filters are set to select the sector containing the sound source 104 , the attenuation is approximately equal to 1.
  • the sound source 104 may be tracked by switching the settings of the filters F 0 , F 1 , F 2 , F 3 from one sector setting to another and determining the attenuation for different sectors.
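  • The attenuation ratio described above (average microphone input energy divided by filter output energy) can be computed as in the following minimal sketch. The function names are assumptions, and returning infinity for a silent output is an added guard that the text does not discuss.

```python
import math
from typing import Sequence


def input_energy(mic_signals: Sequence[Sequence[float]]) -> float:
    """Average over microphones of the dot product of each signal with itself."""
    total = sum(sum(v * v for v in x) for x in mic_signals)
    return total / len(mic_signals)


def attenuation(mic_signals: Sequence[Sequence[float]], output: Sequence[float]) -> float:
    """Ratio of the input energy E to the energy of the filtered output y(t)."""
    output_energy = sum(v * v for v in output)
    if output_energy == 0.0:
        return math.inf  # guard: a silent output counts as fully attenuated
    return input_energy(mic_signals) / output_energy
```

  • A value near 1 suggests that the selected sector contains the source, while a larger value suggests that the filters are rejecting it.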
  • a targeted voice detection 120 method using determination of attenuation for different listening sectors may proceed as depicted in the flow diagram of FIG. 1D .
  • any pre-calibrated listening sector may be selected initially. For example, sector S 4 , which corresponds roughly to a forward listening direction, may be selected as a default initial listening sector.
  • an input signal energy attenuation is determined for the initial listening sector. If, at 126 , the attenuation is not an optimum value, another pre-calibrated sector may be selected at 128 . If, at 126 , the attenuation is an optimum value, the tracking is stopped at 129 .
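  • The select, measure, and test loop of this flow might be sketched as follows. The sketch simply steps through the sectors in order rather than using any particular search strategy, and the callback `measure_attenuation`, the optimum value of 1.0, and the tolerance are all hypothetical parameters introduced for illustration.

```python
from typing import Callable, Optional


def track_sector(
    measure_attenuation: Callable[[int], float],
    sector_count: int = 18,
    start: int = 4,
    optimum: float = 1.0,
    tolerance: float = 0.1,
) -> Optional[int]:
    """Return the first sector whose attenuation is close enough to optimum."""
    sector = start
    for _ in range(sector_count):
        if abs(measure_attenuation(sector) - optimum) <= tolerance:
            return sector  # attenuation is acceptable, so tracking stops
        sector = (sector + 1) % sector_count  # select another sector
    return None  # no sector gave an acceptable attenuation
```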
  • the mounting of the microphone array may introduce a built-in attenuation of sounds coming from these sectors such that there is a minimum attenuation, e.g., of about 1 dB, when the source 104 is located in any of these sectors. Consequently it may be determined from the input signal attenuation whether the source 104 is “in front” or “behind” the microphone array 102 .
  • the sound source 104 might be expected to be closer to the microphone having the larger input signal energy.
  • the right hand microphone M 3 would have the larger input signal energy and, by process of elimination, the sound source 104 would be in one of sectors S 6 , S 7 , S 8 , S 9 , S 10 , S 11 , S 12 .
  • the next sector selected is one that is approximately 90 degrees away from the initial sector S 4 in a direction toward the right hand microphone M 3 , e.g., sector S 8 .
  • the input signal energy attenuation for sector S 8 may be determined as indicated at 124 .
  • next sector may be one that is approximately 45 degrees away from the previous sector in the direction back toward the initial sector, e.g., sector S 6 .
  • the input signal energy attenuation may be determined and compared to the optimum attenuation. If the input signal energy is not close to the optimum, only two sectors remain in this example. Thus, for the example depicted in FIG. 1C , in a maximum of four sector switches, the correct sector may be determined. The process of determining the input signal energy attenuation and switching between different listening sectors may be accomplished in about 100 milliseconds if the input signal is sufficiently strong.
  • FIG. 1E depicts an example of a sound source location and characterization apparatus 130 having a microphone array 102 described above coupled to an electronic device 132 having a processor 134 and memory 136 .
  • the device may be a video game, television or other consumer electronic device.
  • the processor 134 may execute instructions that implement the FIR filters and time delays described above.
  • the memory 136 may contain data 138 relating to pre-calibration of a plurality of listening zones.
  • the pre-calibrated listening zones may include wedge shaped listening sectors S 0 , S 1 , S 2 , S 3 , S 4 , S 5 , S 6 , S 7 , S 8 .
  • the instructions run by the processor 134 may operate the apparatus 130 according to a method as set forth in the flow diagram 131 of FIG. 1F .
  • Sound sources 104 , 105 within the listening zones can be detected using the microphone array 102 .
  • One sound source 104 may be of interest to the device 132 or a user of the device.
  • Another sound source 105 may be a source of background noise or otherwise not of interest to the device 132 or its user.
  • the apparatus 130 determines which listening zone contains the sound's source 104 as indicated at 133 of FIG. 1F .
  • the iterative sound source sector location routine described above with respect to FIGS. 1C-1D may be used to determine the pre-calibrated listening zones containing the sound sources 104 , 105 (e.g., sectors S 3 and S 6 respectively).
  • the microphone array may be refocused on the sound source, e.g., using adaptive beam forming.
  • Adaptive beam forming techniques are described, e.g., in US Patent Application Publication Number 2005/0047611 A1 to Xiadong Mao, which is incorporated herein by reference.
  • the sound source 104 may then be characterized as indicated at 135 , e.g., through analysis of an acoustic spectrum of the sound signals originating from the sound source. Specifically, a time domain signal from the sound source may be analyzed over a predetermined time window and a fast Fourier transform (FFT) may be performed to obtain a frequency distribution characteristic of the sound source.
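  • Obtaining a frequency distribution from a time window might look like the sketch below. For brevity it evaluates the discrete Fourier transform directly, which yields the same values an FFT would compute more quickly; the function name and the choice to keep only the non-negative frequency bins are assumptions.

```python
import cmath
from typing import List, Sequence


def magnitude_spectrum(window: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 for a real-valued time window."""
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            window[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc))
    return spectrum
```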
  • the detected frequency distribution may be compared to a known acoustic model.
  • the known acoustic model may be a frequency distribution generated from training data obtained from a known source of sound.
  • a number of different acoustic models may be stored as part of the data 138 in the memory 136 or other storage medium and compared to the detected frequency distribution. By comparing the detected sounds from the sources 104 , 105 against these acoustic models a number of different possible sound sources may be identified.
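  • Comparing a detected frequency distribution against stored acoustic models might be sketched as follows. Cosine similarity is used here only as one plausible comparison measure; the text does not say how the comparison is made, and the function names and model labels are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two frequency distributions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_model(spectrum: Sequence[float], models: Dict[str, Sequence[float]]) -> str:
    """Name of the stored acoustic model most similar to the spectrum."""
    return max(models, key=lambda name: cosine_similarity(spectrum, models[name]))
```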
  • the apparatus 132 may take appropriate action depending upon whether the sound source is of interest or not. For example, if the sound source 104 is determined to be one of interest to the device 132 , the apparatus may emphasize or amplify sounds coming from sector S 3 and/or take other appropriate action. For example, if the device 132 is a video game controller and the source 104 is a video game player, the device 132 may execute game instructions such as “jump” or “swing” in response to sounds from the source 104 that are interpreted as game commands. Similarly, if the sound source 105 is determined not to be of interest to the device 132 or its user, the device may filter out sounds coming from sector S 6 or take other appropriate action. In some embodiments, for example, an icon may appear on a display screen indicating the listening zone containing the sound source and the type of sound source.
  • amplifying sound or taking other appropriate action may include reducing noise disturbances associated with a source of sound.
  • a noise disturbance of an audio signal associated with sound source 104 may be magnified relative to a remaining component of the audio signal.
  • a sampling rate of the audio signal may be decreased and an even order derivative is applied to the audio signal having the decreased sampling rate to define a detection signal.
  • the noise disturbance of the audio signal may be adjusted according to a statistical average of the detection signal.
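  • The three steps above (decreasing the sampling rate, applying an even order derivative, and taking a statistical average) might be sketched as follows. The sketch assumes plain decimation without an anti-aliasing filter, a second-order central difference as the even order derivative, and the mean absolute value as the statistical average; these specific choices and the function names are assumptions, not details from the text.

```python
from typing import List, Sequence


def decimate(x: Sequence[float], factor: int) -> List[float]:
    """Keep every `factor`-th sample (no anti-aliasing filter in this sketch)."""
    return list(x[::factor])


def second_difference(x: Sequence[float]) -> List[float]:
    """Discrete second derivative: x[i+1] - 2*x[i] + x[i-1]."""
    return [x[i + 1] - 2 * x[i] + x[i - 1] for i in range(1, len(x) - 1)]


def detection_level(x: Sequence[float], factor: int = 2) -> float:
    """Mean absolute value of the detection signal."""
    detection = second_difference(decimate(x, factor))
    if not detection:
        return 0.0
    return sum(abs(v) for v in detection) / len(detection)
```

  • A smooth ramp produces a detection level of zero, while an isolated spike produces a clearly non-zero level, which is the sense in which the disturbance is magnified relative to the rest of the signal.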
  • a system capable of canceling disturbances associated with an audio signal, a video game controller, and an integrated circuit for reducing noise disturbances associated with an audio signal are included. Details of such a technique are described, e.g., in commonly-assigned U.S. patent application Ser.
  • the apparatus 130 may be used in a baby monitoring application.
  • an acoustic model stored in the memory 136 may include a frequency distribution characteristic of a baby or even of a particular baby. Such a sound may be identified as being of interest to the device 130 or its user. Frequency distributions for other known sound sources, e.g., a telephone, television, radio, computer, persons talking, etc., may also be stored in the memory 136 . These sound sources may be identified as not being of interest.
  • Sound source location and characterization apparatus and methods may be used in ultrasonic- and sonic-based consumer electronic remote controls, e.g., as described in commonly assigned U.S. patent application Ser. No. 11/418,993 to Steven Osman, entitled “SYSTEM AND METHOD FOR CONTROL BY AUDIBLE DEVICE” (attorney docket no. SCEAJP 3.0-001), the entire disclosures of which are incorporated herein by reference.
  • a sound received by the microphone array 102 may be analyzed to determine whether or not it has one or more predetermined characteristics. If it is determined that the sound does have one or more predetermined characteristics, at least one control signal may be generated for the purpose of controlling at least one aspect of the device 132 .
  • the pre-calibrated listening zone Z may correspond to the field-of-view of a camera.
  • an audio-video apparatus 140 may include a microphone array 102 and signal filters F 0 , F 1 , F 2 , F 3 , e.g., as described above, and an image capture unit 142 .
  • the image capture unit 142 may be a digital camera.
  • An example of a suitable digital camera is a color digital camera sold under the name “EyeToy” by Logitech of Fremont, Calif.
  • the image capture unit 142 may be mounted in a fixed position relative to the microphone array 102 , e.g., by attaching the microphone array 102 to the image capture unit 142 or vice versa. Alternatively, both the microphone array 102 and image capture unit 142 may be attached to a common frame or mount (not shown). Preferably, the image capture unit 142 is oriented such that an optical axis 144 of its lens system 146 is aligned parallel to an axis perpendicular to a common plane of the microphones M 0 , M 1 , M 2 , M 3 of the microphone array 102 .
  • the lens system 146 may be characterized by a volume of focus FOV that is sometimes referred to as the field of view of the image capture unit.
  • the listening zone Z may be said to “correspond” to the field of view FOV if there is a significant overlap between the field of view FOV and the listening zone Z.
  • there is “significant overlap” if an object within the field of view FOV is also within the listening zone Z and an object outside the field of view FOV is also outside the listening zone Z. It is noted that the foregoing definitions of the terms “correspond” and “significant overlap” within the context of the embodiment depicted in FIGS. 1G-1H allow for the possibility that an object may be within the listening zone Z and outside the field of view FOV.
  • the listening zone Z may be pre-calibrated as described above, e.g., by adjusting FIR filter coefficients and TDA values for the filters F 0 , F 1 , F 2 , F 3 using one or more known sources placed at various locations within the field of view FOV during the calibration stage.
  • the FIR filter coefficients and TDA values are selected (e.g., using ICA) such that sounds from a source 104 located within the FOV are detected and sounds from a source 106 outside the FOV are filtered out.
  • the apparatus 140 allows for improved processing of video and audio images.
  • sounds originating from sources within the FOV may be enhanced while those originating outside the FOV may be attenuated.
  • Applications for such an apparatus include audio-video (AV) chat.
  • FIGS. 1I-1J depict an apparatus 150 having a microphone array 102 and an image capture unit 152 (e.g., a digital camera) that is mounted to one or more pointing actuators 154 (e.g., servo-motors).
  • the microphone array 102 , image capture unit 152 and actuators may be coupled to a controller 156 having a processor 157 and memory 158 .
  • Software data 155 stored in the memory 158 and instructions 159 stored in the memory 158 and executed by the processor 157 may implement the signal filter functions described above.
  • the actuators 154 may point the image capture unit 152 in a direction of the particular pre-calibrated listening zone containing the sound source 104 as shown in FIG. 1J .
  • the microphone array 102 may remain in a fixed position while the pointing actuators point the camera in the direction of a selected listening zone.
  • FIG. 2 depicts a system 200 having microphone array 102 of M+1 microphones M 0 , M 1 , . . . , M M . Each microphone is connected to one of M+1 corresponding filters 202 0 , 202 1 , . . . , 202 M .
  • Each of the filters 202 0 , 202 1 , . . . , 202 m includes a corresponding set of N+1 filter taps 204 00 , . . .
  • the delays and filter taps may be implemented in hardware or software or a combination of both hardware and software.
  • An output y m (t) from a given filter tap 204 mi is just the convolution of the input signal to filter tap 204 mi with the corresponding finite impulse response coefficient b mi . It is noted that for all filter taps 204 mi except for the first one 204 m0 the input to the filter tap is just the output of the delay section z⁻¹ of the preceding filter tap 204 mi-1 .
  • the symbol “*” represents the convolution operation. Convolution between two discrete time functions f(t) and g(t) is defined as
  • the general problem in audio signal processing is to select the values of the finite impulse response filter coefficients b m0 , b m1 , . . . , b mN that best separate out different sources of sound from the signal y m (t).
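  • The tapped structure described above, in which each tap takes its input from the delay section of the preceding tap, might be sketched as a sample-by-sample filter. The class name is hypothetical and each z⁻¹ section is assumed to be a one-sample delay.

```python
from collections import deque
from typing import Sequence


class TappedDelayLine:
    """FIR filter for one microphone, built from one-sample delay sections."""

    def __init__(self, coefficients: Sequence[float]) -> None:
        self.coefficients = list(coefficients)  # b_m0, b_m1, ..., b_mN
        size = len(self.coefficients)
        self.state = deque([0.0] * size, maxlen=size)

    def step(self, sample: float) -> float:
        """Push one input sample and return the corresponding output sample."""
        # With maxlen set, appendleft discards the oldest sample on the right.
        self.state.appendleft(sample)
        return sum(b * x for b, x in zip(self.coefficients, self.state))
```

  • Feeding a unit impulse through the filter returns the coefficients one at a time, which is a quick way to check a set of taps.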
  • each delay z ⁇ 1 is necessarily an integer delay and the size of the delay is inversely related to the maximum frequency of the microphone. This ordinarily limits the resolution of the system 200 . A higher than normal resolution may be obtained if it is possible to introduce a fractional time delay A into the signal y m (t) so that:
  • $y_m(t+\Delta) = x_m(t+\Delta) * b_{m0} + x_m(t-1+\Delta) * b_{m1} + x_m(t-2+\Delta) * b_{m2} + \ldots + x_m(t-N+\Delta) * b_{mN},$
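  • $y_m(t) = \begin{bmatrix} x_m(t) \\ x_m(t-1) \\ \vdots \\ x_m(t-J) \end{bmatrix}^{T} * \begin{bmatrix} b_{m00} \\ b_{m01} \\ \vdots \\ b_{m0J} \end{bmatrix} + \begin{bmatrix} x_m(t-1) \\ x_m(t-2) \\ \vdots \\ x_m(t-J-1) \end{bmatrix}^{T} * \begin{bmatrix} b_{m10} \\ b_{m11} \\ \vdots \\ b_{m1J} \end{bmatrix} + \ldots + \begin{bmatrix} x_m(t-N-J) \\ x_m(t-N-J+1) \\ \vdots \\ x_m(t-N) \end{bmatrix}^{T} * \begin{bmatrix} b_{mN0} \\ b_{mN1} \\ \vdots \\ b_{mNJ} \end{bmatrix}$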
  • the expected statistical precision of the fractional value $\Delta$ is inversely proportional to J+1, which is the number of “rows” in the immediately preceding expression for $y_m(t)$.
  • FT( ) represents the operation of taking the Fourier transform of the quantity in parentheses.
  • $X_{10} = \mathrm{FT}([x_1(t-0),\ x_1(t-1),\ x_1(t-2),\ \ldots,\ x_1(t-N-1+0)])$
  • $X_{20} = \mathrm{FT}([x_2(t-0),\ x_2(t-1),\ x_2(t-2),\ \ldots,\ x_2(t-N-1+0)])$
  • $X_{30} = \mathrm{FT}([x_3(t-0),\ x_3(t-1),\ x_3(t-2),\ \ldots,\ x_3(t-N-1+0)])$
  • $X_{31} = \mathrm{FT}([x_3(t-1),\ x_3(t-2),\ x_3(t-3),\ \ldots,\ x_3(t-N-1+1)])$
  • $X_{39} = \mathrm{FT}([x_3(t-9),\ x_3(t-10),\ x_3(t-11),\ \ldots,\ x_3(t-N-1+9)])$
  • 10 frames may be used to construct a fractional delay.
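  • The frame construction above can be sketched as follows for a single microphone: ten frames are taken from the same sample stream, each shifted by one more sample, and each is transformed to the frequency domain. The helper names `dft`, `shifted_frames` and `frame_spectra`, the fixed frame length, and the use of a plain discrete Fourier transform are assumptions for illustration only.

```python
import cmath

def dft(frame):
    """Plain O(n^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(frame[s] * cmath.exp(-2j * cmath.pi * k * s / n) for s in range(n))
        for k in range(n)
    ]

def shifted_frames(x, t, frame_len, num_frames=10):
    """Frame j holds x[t-j], x[t-j-1], ... (frame_len samples, newest first)."""
    if t >= len(x) or t - (num_frames - 1) - (frame_len - 1) < 0:
        raise ValueError("not enough samples around index t")
    return [[x[t - j - s] for s in range(frame_len)] for j in range(num_frames)]

def frame_spectra(x, t, frame_len, num_frames=10):
    """Return the spectrum of each shifted frame, j = 0 .. num_frames - 1."""
    return [dft(f) for f in shifted_frames(x, t, frame_len, num_frames)]
```

  • Each entry of the result plays the role of one $X_{mj}$ for a fixed microphone m; repeating the call per microphone gives the full set of spectra.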
  • $X_{jk} = [X_{0j}(k),\ X_{1j}(k),\ X_{2j}(k),\ X_{3j}(k)]$
  • ICA independent component analysis
  • $b_{jk} = [b_{0j}(k),\ b_{1j}(k),\ b_{2j}(k),\ b_{3j}(k)]$.
  • the independent frequency-domain components of the individual sound sources making up each vector X jk may be determined from:
  • each $S(j,k)^T$ is a 1×4 vector containing the independent frequency-domain components of the original input signal x(t).
  • the unmixing matrix A becomes a vector $A_1$, since it has already been decorrelated by the inverse eigenmatrix $C^{-1}$, which is the result of the prior calibration described above.
  • Multiplying the run-time covariance matrix Cov(j,k) with the pre-calibrated inverse eigenmatrix $C^{-1}$ essentially picks up the diagonal elements of A and makes them into a vector $A_1$.
  • Each element of $A_1$ is the strongest cross-correlation; the inverse of A will essentially remove this correlation.
  • $Y_i = \begin{bmatrix} X_{i0} & X_{i1} & \ldots & X_{iJ} \end{bmatrix} \cdot \begin{bmatrix} b_{i0} \\ b_{i1} \\ \vdots \\ b_{iJ} \end{bmatrix}$
  • FIG. 3 depicts a flow diagram of a signal processing method 300 that utilizes the concepts described above with respect to FIG. 2 .
  • a discrete time domain input signal x m (t) may be produced from microphones M 0 . . . M M as indicated at 302 .
  • a listening direction may be determined for the microphone array as indicated at 304, e.g., by computing an inverse eigenmatrix $C^{-1}$ for a calibration covariance matrix as described above.
  • the listening direction (e.g., one or more listening sectors) may be determined during calibration of the microphone array during design or manufacture, or may be re-calibrated at runtime.
  • a signal from a source located within a defined listening sector with respect to the microphone array may be recorded for a predetermined period of time.
  • Analysis frames of the signal may be formed at predetermined intervals and the analysis frames may be transformed into the frequency domain.
  • a calibration covariance matrix may be estimated from a vector of the analysis frames that have been transformed into the frequency domain.
  • An eigenmatrix C of the calibration covariance matrix may be computed and an inverse of the eigenmatrix provides the listening direction.
  • one or more fractional delays may optionally be applied to selected input signals x m (t) other than an input signal x 0 (t) from a reference microphone M 0 .
  • Each fractional delay is selected to optimize a signal to noise ratio of a discrete time domain output signal y(t) from the microphone array.
  • the fractional delays are selected such that a signal from the reference microphone $M_0$ is first in time relative to signals from the other microphone(s) of the array.
  • the listening direction (e.g., the inverse eigenmatrix $C^{-1}$) determined at 304 is used in a semi-blind source separation to select the finite impulse response filter coefficients $b_0, b_1, \ldots, b_N$ to separate out different sound sources from input signal $x_m(t)$.
  • filter coefficients for each microphone m, each frame j and each frequency bin k, [b 0j (k), b 1j (k), . . . b Mj (k)] may be computed that best separate out two or more sources of sound from the input signals x m (t).
  • a runtime covariance matrix may be generated from each frequency domain input signal vector X jk .
  • the runtime covariance matrix may be multiplied by the inverse $C^{-1}$ of the eigenmatrix C to produce a mixing matrix A and a mixing vector may be obtained from a diagonal of the mixing matrix A.
  • the values of filter coefficients may be determined from one or more components of the mixing vector.
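  • A minimal sketch of the runtime step above, for one frame and one frequency bin, is given below. It assumes that the inverse eigenmatrix has already been obtained from calibration, that the runtime covariance is estimated from a single vector as an outer product, and that the product is taken as covariance times inverse eigenmatrix; none of these details is fixed by the description, and the function names are hypothetical.

```python
def outer_covariance(x):
    """Rank-one covariance estimate x * x^H from a single input vector (assumption)."""
    return [[a * b.conjugate() for b in x] for a in x]

def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]

def mixing_vector(x_jk, c_inv):
    """Diagonal of A = Cov * C^{-1} for one frequency-domain input vector."""
    a = matmul(outer_covariance(x_jk), c_inv)
    return [a[i][i] for i in range(len(a))]
```

  • With an identity matrix standing in for the calibrated inverse eigenmatrix, the result reduces to the per-microphone signal power, which gives a simple sanity check.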
  • a signal processing method of the type described above with respect to FIGS. 1A-1J , 2 and 3 operating as described above may be implemented as part of a signal processing apparatus 400 , as depicted in FIG. 4 .
  • the apparatus 400 may include a processor 401 and a memory 402 (e.g., RAM, DRAM, ROM, and the like).
  • the signal processing apparatus 400 may have multiple processors 401 if parallel processing is to be implemented.
  • the memory 402 includes data and code configured as described above.
  • the memory 402 may include signal data 406 which may include a digital representation of the input signals x m (t), and code and/or data implementing the filters 202 0 . . .
  • the memory 402 may also contain calibration data 408, e.g., data representing one or more inverse eigenmatrices $C^{-1}$ for one or more corresponding pre-calibrated listening zones obtained from calibration of a microphone array 422 as described above.
  • the memory 402 may contain eigenmatrices for eighteen 20 degree sectors that encompass a microphone array 422.
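  • As an illustration of how eighteen 20 degree sectors could index stored calibration data, the sketch below maps an angle around the array to a sector number and looks up the entry for that sector. The names `sector_index` and `lookup_calibration`, and the convention that sector 0 starts at 0 degrees, are assumptions.

```python
SECTOR_WIDTH_DEG = 20
NUM_SECTORS = 360 // SECTOR_WIDTH_DEG  # eighteen sectors

def sector_index(angle_deg):
    """Map any angle in degrees to a sector number 0 .. NUM_SECTORS - 1."""
    return int((angle_deg % 360) // SECTOR_WIDTH_DEG)

def lookup_calibration(calibration, angle_deg):
    """Return the stored calibration entry for the sector containing angle_deg."""
    return calibration[sector_index(angle_deg)]
```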
  • the apparatus 400 may also include well-known support functions 410 , such as input/output (I/O) elements 411 , power supplies (P/S) 412 , a clock (CLK) 413 and cache 414 .
  • the apparatus 400 may optionally include a mass storage device 415 such as a disk drive, CD-ROM drive, tape drive, or the like to store programs and/or data.
  • the apparatus 400 may also optionally include a display unit 416 and user interface unit 418 to facilitate interaction between the apparatus 400 and a user.
  • the display unit 416 may be in the form of a cathode ray tube (CRT) or flat panel screen that displays text, numerals, graphical symbols or images.
  • the user interface 418 may include a keyboard, mouse, joystick, light pen or other device.
  • the user interface 418 may include a microphone, video camera or other signal transducing device to provide for direct capture of a signal to be analyzed.
  • the processor 401 , memory 402 and other components of the system 400 may exchange signals (e.g., code instructions and data) with each other via a system bus 420 as shown in FIG. 4 .
  • the microphone array 422 may be coupled to the apparatus 400 through the I/O functions 411 .
  • the microphone array may include between about 2 and about 8 microphones, preferably about 4 microphones with neighboring microphones separated by a distance of less than about 4 centimeters, preferably between about 1 centimeter and about 2 centimeters.
  • the microphones in the array 422 are omni-directional microphones.
  • the apparatus 400 may include an optional image capture unit 423 (e.g., a digital camera).
  • One or more pointing actuators 425 that are mechanically coupled to the camera may exchange signals with the processor 401 via the I/O functions 411 .
  • I/O generally refers to any program, operation or device that transfers data to or from the system 400 and to or from a peripheral device. Every data transfer may be regarded as an output from one device and an input into another.
  • Peripheral devices include input-only devices, such as keyboards and mice, output-only devices, such as printers, as well as devices such as a writable CD-ROM that can act as both an input and an output device.
  • The term “peripheral device” includes external devices, such as a mouse, keyboard, printer, monitor, microphone, game controller, camera, external Zip drive or scanner, as well as internal devices, such as a CD-ROM drive, CD-R drive or internal modem, or other peripherals such as a flash memory reader/writer or hard drive.
  • the apparatus 400 may be a video game unit, which may include a game controller 430 coupled to the processor via the I/O functions 411 either through wires (e.g., a USB cable) or wirelessly.
  • the game controller 430 may be mountable to a user's body.
  • the game controller 430 may have analog joystick controls 431 and conventional buttons 433 that provide control signals commonly used during playing of video games.
  • Such video games may be implemented as processor readable data and/or instructions which may be stored in the memory 402 or other processor readable medium such as one associated with the mass storage device 415 .
  • the joystick controls 431 may generally be configured so that moving a control stick left or right signals movement along the X axis, and moving it forward (up) or back (down) signals movement along the Y axis. In joysticks that are configured for three-dimensional movement, twisting the stick left (counter-clockwise) or right (clockwise) may signal movement along the Z axis.
  • X, Y and Z are often referred to as roll, pitch, and yaw, respectively, particularly in relation to an aircraft.
  • the game controller 430 may include a communications interface operable to conduct digital communications with at least one of the processor 401, a game controller 430 or both.
  • the communications interface may include a universal asynchronous receiver transmitter (“UART”).
  • the UART may be operable to receive a control signal for controlling an operation of a tracking device, or for transmitting a signal from the tracking device for communication with another device.
  • the communications interface includes a universal serial bus (“USB”) controller.
  • The USB controller may be operable to receive a control signal for controlling an operation of the tracking device, or for transmitting a signal from the tracking device for communication with another device.
  • the game controller 430 may include one or more inertial sensors 432 , which may provide position and/or orientation information to the processor 401 via an inertial signal. Orientation information may include angular information such as a tilt, roll or yaw of the game controller 430 .
  • the inertial sensors 432 may include any number and/or combination of accelerometers, gyroscopes or tilt sensors.
  • the inertial sensors 432 include tilt sensors adapted to sense orientation of the game controller with respect to tilt and roll axes, a first accelerometer adapted to sense acceleration along a yaw axis and a second accelerometer adapted to sense angular acceleration with respect to the yaw axis.
  • An accelerometer may be implemented, e.g., as a MEMS device including a mass mounted by one or more springs with sensors for sensing displacement of the mass relative to one or more directions. Signals from the sensors that are dependent on the displacement of the mass may be used to determine an acceleration of the game controller 430 .
  • Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401 .
  • an accelerometer suitable as the inertial sensor 432 may be a simple mass elastically coupled at three or four points to a frame, e.g., by springs.
  • Pitch and roll axes lie in a plane that intersects the frame, which is mounted to the game controller 430 .
  • the mass will displace under the influence of gravity and the springs will elongate or compress in a way that depends on the angle of pitch and/or roll.
  • the displacement of the mass can be sensed and converted to a signal that is dependent on the amount of pitch and/or roll.
  • Angular acceleration about the yaw axis or linear acceleration along the yaw axis may also produce characteristic patterns of compression and/or elongation of the springs or motion of the mass that can be sensed and converted to signals that are dependent on the amount of angular or linear acceleration.
  • Such an accelerometer device can measure tilt, roll, angular acceleration about the yaw axis and linear acceleration along the yaw axis by tracking movement of the mass or compression and expansion forces of the springs.
  • The movement of the mass and/or the forces on the springs may be tracked in a number of ways, including resistive strain gauge material, photonic sensors, magnetic sensors, hall-effect devices, piezoelectric devices, capacitive sensors, and the like.
  • the game controller 430 may include one or more light sources 434 , such as light emitting diodes (LEDs).
  • the light sources 434 may be used to distinguish one controller from the other.
  • one or more LEDs can accomplish this by flashing or holding an LED pattern code.
  • 5 LEDs can be provided on the game controller 430 in a linear or two-dimensional pattern.
  • the LEDs may alternatively be arranged in a rectangular pattern or an arcuate pattern to facilitate determination of an image plane of the LED array when analyzing an image of the LED pattern obtained by the image capture unit 423.
  • the LED pattern codes may also be used to determine the positioning of the game controller 430 during game play.
  • the LEDs can assist in identifying tilt, yaw and roll of the controllers. This detection pattern can assist in providing a better user feel in games, such as aircraft flying games, etc.
  • the image capture unit 423 may capture images containing the game controller 430 and light sources 434 . Analysis of such images can determine the location and/or orientation of the game controller. Such analysis may be implemented by program code instructions 404 stored in the memory 402 and executed by the processor 401 . To facilitate capture of images of the light sources 434 by the image capture unit 423 , the light sources 434 may be placed on two or more different sides of the game controller 430 , e.g., on the front and on the back (as shown in phantom). Such placement allows the image capture unit 423 to obtain images of the light sources 434 for different orientations of the game controller 430 depending on how the game controller 430 is held by a user.
  • the light sources 434 may provide telemetry signals to the processor 401 , e.g., in pulse code, amplitude modulation or frequency modulation format. Such telemetry signals may indicate which joystick buttons are being pressed and/or how hard such buttons are being pressed. Telemetry signals may be encoded into the optical signal, e.g., by pulse coding, pulse width modulation, frequency modulation or light intensity (amplitude) modulation. The processor 401 may decode the telemetry signal from the optical signal and execute a game command in response to the decoded telemetry signal. Telemetry signals may be decoded from analysis of images of the game controller 430 obtained by the image capture unit 423 .
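  • One way such optical telemetry might be decoded is sketched below for a simple pulse code: the brightness of one light source is measured in successive images, grouped into bit periods, and thresholded into bits that are then mapped to button names. The encoding (two image frames per bit, one bit per button), the threshold, and the function names are all hypothetical and are not taken from the description.

```python
def decode_pulse_code(brightness, threshold=0.5, frames_per_bit=2):
    """Turn per-image brightness samples into bits, one bit per frames_per_bit images."""
    bits = []
    for start in range(0, len(brightness) - frames_per_bit + 1, frames_per_bit):
        window = brightness[start:start + frames_per_bit]
        mean = sum(window) / frames_per_bit
        bits.append(1 if mean >= threshold else 0)
    return bits

def bits_to_buttons(bits, button_names):
    """Return the set of buttons whose bit is set (hypothetical one-bit-per-button code)."""
    return {name for name, bit in zip(button_names, bits) if bit}
```

  • Averaging over each bit period makes the decision less sensitive to a single noisy image, at the cost of a lower bit rate.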
  • the apparatus 400 may include a separate optical sensor dedicated to receiving telemetry signals from the light sources 434.
  • the use of LEDs in conjunction with determining an intensity amount in interfacing with a computer program is described, e.g., in commonly-assigned U.S. patent application Ser. No. 11/429,414, to Richard L. Marks et al., entitled “USE OF COMPUTER IMAGE AND AUDIO PROCESSING IN DETERMINING AN INTENSITY AMOUNT WHEN INTERFACING WITH A COMPUTER PROGRAM” (Attorney Docket No. SONYP052), which is incorporated herein by reference in its entirety.
  • analysis of images containing the light sources 434 may be used for both telemetry and determining the position and/or orientation of the game controller 430 .
  • Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401 .
  • the processor 401 may use the inertial signals from the inertial sensor 432 in conjunction with optical signals from light sources 434 detected by the image capture unit 423 and/or sound source location and characterization information from acoustic signals detected by the microphone array 422 to deduce information on the location and/or orientation of the game controller 430 and/or its user.
  • “acoustic radar” sound source location and characterization may be used in conjunction with the microphone array 422 to track a moving voice while motion of the game controller is independently tracked (through the inertial sensor 432 and/or light sources 434).
  • Any number of different combinations of different modes of providing control signals to the processor 401 may be used in conjunction with embodiments of the present invention.
  • Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401 .
  • Signals from the inertial sensor 432 may provide part of a tracking information input and signals generated from the image capture unit 423 from tracking the one or more light sources 434 may provide another part of the tracking information input.
  • such “mixed mode” signals may be used in a football type video game in which a quarterback pitches the ball to the right after a head fake to the left.
  • a game player holding the controller 430 may turn his head to the left and make a sound while making a pitch movement swinging the controller out to the right like it was the football.
  • the microphone array 422 in conjunction with “acoustic radar” program code can track the user's voice.
  • the image capture unit 423 can track the motion of the user's head or track other commands that do not require sound or use of the controller.
  • the sensor 432 may track the motion of the game controller (representing the football).
  • the image capture unit 423 may also track the light sources 434 on the controller 430 .
  • the user may release the “ball” upon reaching a certain amount and/or direction of acceleration of the game controller 430 or upon a key command triggered by pressing a button on the game controller 430.
  • an inertial signal, e.g., from an accelerometer or gyroscope, may be used to determine a location of the game controller 430.
  • an acceleration signal from an accelerometer may be integrated once with respect to time to determine a change in velocity and the velocity may be integrated with respect to time to determine a change in position. If values of the initial position and velocity at some time are known then the absolute position may be determined using these values and the changes in velocity and position.
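  • The double integration described above can be sketched numerically as follows, assuming uniformly spaced acceleration samples along one axis and a simple rectangular (Euler-style) integration rule. The function name `integrate_motion` and its parameters are illustrative assumptions.

```python
def integrate_motion(accels, dt, v0=0.0, p0=0.0):
    """Integrate acceleration once for velocity and again for position.

    accels: acceleration samples along one axis, spaced dt apart.
    v0, p0: known initial velocity and position.
    Returns (positions, final_velocity).
    """
    v, p = v0, p0
    positions = []
    for a in accels:
        v += a * dt  # change in velocity
        p += v * dt  # change in position
        positions.append(p)
    return positions, v
```

  • Because each step adds to the previous result, any bias in the acceleration samples grows in the position estimate over time.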
  • the inertial sensor 432 may be subject to a type of error known as “drift” in which errors that accumulate over time can lead to a discrepancy D between the position of the game controller 430 calculated from the inertial signal (shown in phantom) and the actual position of the game controller 430.
  • Embodiments of the present invention allow a number of ways to deal with such errors.
  • the drift may be cancelled out manually by re-setting the initial position of the game controller 430 to be equal to the current calculated position.
  • a user may use one or more of the buttons on the game controller 430 to trigger a command to re-set the initial position.
  • image-based drift compensation may be implemented by re-setting the current position to a position determined from an image obtained from the image capture unit 423 as a reference.
  • image-based drift compensation may be implemented manually, e.g., when the user triggers one or more of the buttons on the game controller 430 .
  • image-based drift compensation may be implemented automatically, e.g., at regular intervals of time or in response to game play.
  • Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401 .
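  • The manual and automatic image-based drift compensation described above might be combined as in the following sketch. The inertially integrated position is replaced by an image-derived position either at a regular interval or when the user presses a button. The class name, the one-dimensional state, and the step-count interval are assumptions for illustration.

```python
class DriftCompensatedTracker:
    """Tracks position from acceleration and re-sets it from an image reference."""

    def __init__(self, position=0.0, velocity=0.0, correction_interval=10):
        self.position = position
        self.velocity = velocity
        self.correction_interval = correction_interval  # steps between automatic re-sets
        self.steps = 0

    def step(self, accel, dt, image_position=None, button_pressed=False):
        # Inertial update (this is where drift accumulates).
        self.velocity += accel * dt
        self.position += self.velocity * dt
        self.steps += 1
        # Re-set to the image-derived position automatically or on user request.
        due = self.steps % self.correction_interval == 0
        if image_position is not None and (due or button_pressed):
            self.position = image_position
        return self.position
```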
  • the signal from the inertial sensor 432 may be oversampled and a sliding average may be computed from the oversampled signal to remove spurious data from the inertial sensor signal.
  • other data sampling and manipulation techniques may be used to adjust the signal from the inertial sensor to remove or reduce the significance of spurious data. The choice of technique may depend on the nature of the signal, computations to be performed with the signal, the nature of game play or some combination of two or more of these.
  • Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401 .
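  • A minimal sketch of oversampling followed by a sliding average is shown below. The window length and the decimation factor (how many oversampled values are reduced to one output value) are assumptions chosen for illustration.

```python
from collections import deque

def sliding_average(samples, window):
    """Average of the most recent `window` samples at each step."""
    buf = deque(maxlen=window)
    out = []
    for s in samples:
        buf.append(s)
        out.append(sum(buf) / len(buf))
    return out

def oversample_and_smooth(samples, window, decimation):
    """Smooth an oversampled signal, then keep every `decimation`-th value."""
    return sliding_average(samples, window)[decimation - 1::decimation]
```

  • In the usage below, a single spurious spike of 8 in an otherwise zero signal is reduced to 2 after averaging over four samples.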
  • the processor 401 may perform digital signal processing on signal data 406 as described above in response to the data 406 and program code instructions of a program 404 stored and retrieved by the memory 402 and executed by the processor module 401 .
  • Code portions of the program 404 may conform to any one of a number of different programming languages such as Assembly, C++, JAVA or a number of other languages.
  • the processor module 401 forms a general-purpose computer that becomes a specific purpose computer when executing programs such as the program code 404 .
  • Although the program code 404 is described herein as being implemented in software and executed upon a general purpose computer, those skilled in the art will realize that the method of task management could alternatively be implemented using hardware such as an application specific integrated circuit (ASIC) or other hardware circuitry.
  • the program code 404 may include a set of processor readable instructions that implement a method having features in common with the method 110 of FIG. 1B , the method 120 of FIG. 1D , the method 140 of FIG. 1F , the method 300 of FIG. 3 or some combination of two or more of these.
  • the program code 404 may generally include one or more instructions that direct the one or more processors to select a pre-calibrated listening zone at runtime and filter out sounds originating from sources outside the pre-calibrated listening zone.
  • the pre-calibrated listening zones may include a listening zone that corresponds to a volume of focus or field of view of the image capture unit 423 .
  • the program code may include one or more instructions which, when executed, cause the apparatus 400 to select a pre-calibrated listening sector that contains a source of sound. Such instructions may cause the apparatus to determine whether a source of sound lies within an initial (default) sector or on a particular side of the initial sector. If the source of sound does not lie within the initial sector, the instructions may, when executed, select a different sector on the particular side of the initial sector. The different sector may be characterized by an attenuation of the input signals that is closest to an optimum value. These instructions may, when executed, calculate an attenuation of input signals from the microphone array 422 and compare the attenuation to an optimum value. The instructions may, when executed, cause the apparatus 400 to determine a value of an attenuation of the input signals for one or more sectors and select a sector for which the attenuation is closest to an optimum value.
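  • The sector selection logic above can be sketched as follows, assuming the attenuation for each pre-calibrated sector has already been measured. The `tolerance` used to decide whether the source lies within the initial sector, and the `side` argument (+1 or -1) indicating which side to search, are hypothetical inputs; how they are obtained is not specified here.

```python
def select_sector(attenuations, optimum, initial, tolerance, side):
    """Pick the sector whose attenuation is closest to the optimum value.

    attenuations: measured attenuation per sector, indexed by sector number.
    initial: index of the initial (default) sector.
    side: +1 to search higher-numbered sectors, -1 to search lower-numbered ones.
    """
    if abs(attenuations[initial] - optimum) <= tolerance:
        return initial  # source is taken to lie within the initial sector
    if side > 0:
        candidates = range(initial + 1, len(attenuations))
    else:
        candidates = range(0, initial)
    return min(candidates, key=lambda i: abs(attenuations[i] - optimum), default=initial)
```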
  • the program code 404 may optionally include one or more instructions that direct the one or more processors to produce a discrete time domain input signal x m (t) from the microphones M 0 . . . M M , determine a listening sector, and use the listening sector in a semi-blind source separation to select the finite impulse response filter coefficients to separate out different sound sources from input signal x m (t).
  • the program 404 may also include instructions to apply one or more fractional delays to selected input signals x m (t) other than an input signal x 0 (t) from a reference microphone M 0 . Each fractional delay may be selected to optimize a signal to noise ratio of a discrete time domain output signal y(t) from the microphone array.
  • the fractional delays may be selected such that a signal from the reference microphone $M_0$ is first in time relative to signals from the other microphone(s) of the array.
  • the program code 404 may optionally include processor executable instructions including one or more instructions which, when executed, cause the image capture unit 423 to monitor a field of view in front of the image capture unit 423, identify one or more of the light sources 434 within the field of view, detect a change in light emitted from the light source(s) 434 and, in response to detecting the change, trigger an input command to the processor 401.
  • the program code 404 may optionally include processor executable instructions including one or more instructions which, when executed, use signals from the inertial sensor and signals generated from the image capture unit from tracking the one or more light sources as inputs to a game system, e.g., as described above.
  • the program code 404 may optionally include processor executable instructions including one or more instructions which, when executed compensate for drift in the inertial sensor 432 .
  • the program code 404 may optionally include processor executable instructions including one or more instructions which, when executed, adjust the gearing and mapping of controller manipulations to a game environment.
  • Such a feature allows a user to change the “gearing” of manipulations of the game controller 430 to game state.
  • a 45 degree rotation of the game controller 430 may be geared to a 45 degree rotation of a game object.
  • this 1:1 gearing ratio may be modified so that an X degree rotation (or tilt or yaw or “manipulation”) of the controller translates to a Y degree rotation (or tilt or yaw or “manipulation”) of the game object.
  • Gearing may be 1:1 ratio, 1:2 ratio, 1:X ratio or X:Y ratio, where X and Y can take on arbitrary values.
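  • An X:Y gearing ratio as described above amounts to a simple scaling of the controller manipulation, as in this sketch. The function name `apply_gearing` is an assumption.

```python
def apply_gearing(controller_degrees, x, y):
    """Map a controller rotation to a game-object rotation under an X:Y gearing ratio."""
    if x == 0:
        raise ValueError("X must be non-zero")
    return controller_degrees * y / x
```

  • With 1:1 gearing a 45 degree rotation of the controller stays 45 degrees; with 1:2 gearing it becomes 90 degrees.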
  • mapping of input channel to game control may also be modified over time or instantly. Modifications may comprise changing gesture trajectory models, modifying the location, scale, threshold of gestures, etc. Such mapping may be programmed, random, tiered, staggered, etc., to provide a user with a dynamic range of manipulatives. Modification of the mapping, gearing or ratios can be adjusted by the program code 404 according to game play, game state, through a user modifier button (key pad, etc.) located on the game controller 430 , or broadly in response to the input channel.
  • the input channel may include, but may not be limited to elements of user audio, audio generated by controller, tracking audio generated by the controller, controller button state, video camera output, controller telemetry data, including accelerometer data, tilt, yaw, roll, position, acceleration and any other data from sensors capable of tracking a user or the user manipulation of an object.
  • the program code 404 may change the mapping or gearing over time from one scheme or ratio to another scheme, respectively, in a predetermined time-dependent manner.
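  • A time-dependent change of gearing could, for example, move linearly from one ratio to another over a fixed duration. The linear profile and the function name `gearing_at` are assumptions; any predetermined schedule would fit the description equally well.

```python
def gearing_at(t, start_ratio, end_ratio, duration):
    """Gearing ratio at time t, moving linearly from start_ratio to end_ratio."""
    if duration <= 0 or t >= duration:
        return end_ratio
    if t <= 0:
        return start_ratio
    return start_ratio + (end_ratio - start_ratio) * t / duration
```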
  • Gearing and mapping changes can be applied to a game environment in various ways.
  • a video game character may be controlled under one gearing scheme when the character is healthy and, as the character's health deteriorates, the system may gear the controller commands so the user is forced to exaggerate the movements of the controller to gesture commands to the character.
  • a video game character who becomes disoriented may force a change of mapping of the input channel as users, for example, may be required to adjust input to regain control of the character under a new mapping.
  • Mapping schemes that modify the translation of the input channel to game commands may also change during gameplay. This translation may occur in various ways in response to game state or in response to modifier commands issued under one or more elements of the input channel.
  • Gearing and mapping may also be configured to influence the configuration and/or processing of one or more elements of the input channel.
  • a sound emitter 436 e.g., a speaker, a buzzer, a horn or a pipe, may be mounted to the game controller 430 .
  • the sound emitter may be detachably mounted to a “body” of the game controller 430 .
  • the sound emitter 436 may provide an audio signal that can be detected by the microphone array 422 and used by the program code 404 to track the position of the game controller 430 .
  • the sound emitter 436 may also be used to provide an additional “input channel” from the game controller 430 to the processor 401 .
  • Audio signals from the sound emitter 436 may be periodically pulsed to provide a beacon for the acoustic radar to track location.
  • the audio signals (pulsed or otherwise) may be audible or ultrasonic.
  • the acoustic radar may track the user manipulation of the game controller 430, and such manipulation tracking may include information about the position and orientation (e.g., pitch, roll or yaw angle) of the game controller 430.
  • the pulses may be triggered at an appropriate duty cycle as one skilled in the art is capable of applying. Pulses may be initiated based on a control signal arbitrated from the system.
  • the apparatus 400 (through the program code 404 ) may coordinate the dispatch of control signals amongst two or more game controllers 430 coupled to the processor 401 to assure that multiple controllers can be tracked.
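  • One simple way to coordinate pulses among several controllers is to give each controller its own repeating time slot, so that only one sound emitter is active at a time. The round-robin scheme, the slot length in milliseconds, and the function names below are assumptions for illustration and are not stated in the description.

```python
def pulse_schedule(controller_ids, slot_ms):
    """Assign each controller a (start offset, period) pair in milliseconds."""
    period = slot_ms * len(controller_ids)
    return {cid: (i * slot_ms, period) for i, cid in enumerate(controller_ids)}

def controller_for_time(schedule, t_ms, slot_ms):
    """Return the controller whose slot contains time t_ms, or None if no controllers."""
    for cid, (offset, period) in schedule.items():
        if offset <= t_ms % period < offset + slot_ms:
            return cid
    return None
```

  • A pulse detected at a given time can then be attributed to the controller that owns that slot.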
  • FIG. 5 illustrates a type of cell processor 500 according to an embodiment of the present invention.
  • the cell processor 500 may be used as the processor 401 of FIG. 4 .
  • the cell processor 500 includes a main memory 502 , power processor element (PPE) 504 , and a number of synergistic processor elements (SPEs) 506 .
  • the cell processor 500 includes a single PPE 504 and eight SPEs 506.
  • a cell processor may alternatively include multiple groups of PPEs (PPE groups) and multiple groups of SPEs (SPE groups). In such a case, hardware resources can be shared between units within a group. However, the SPEs and PPEs must appear to software as independent elements. As such, embodiments of the present invention are not limited to use with the configuration shown in FIG. 5 .
  • the main memory 502 typically includes both general-purpose and nonvolatile storage, as well as special-purpose hardware registers or arrays used for functions such as system configuration, data-transfer synchronization, memory-mapped I/O, and I/O subsystems.
  • a signal processing program 503 and a signal 509 may be resident in main memory 502.
  • the signal processing program 503 may be configured as described with respect to FIG. 1B, 1D, 1F or 3 above or some combination of two or more of these.
  • the signal processing program 503 may run on the PPE.
  • the program 503 may be divided up into multiple signal processing tasks that can be executed on the SPEs and/or PPE.
  • the PPE 504 may be a 64-bit PowerPC Processor Unit (PPU) with associated caches L1 and L2.
  • the PPE 504 is a general-purpose processing unit, which can access system management resources (such as the memory-protection tables, for example). Hardware resources may be mapped explicitly to a real address space as seen by the PPE. Therefore, the PPE can address any of these resources directly by using an appropriate effective address value.
  • a primary function of the PPE 504 is the management and allocation of tasks for the SPEs 506 in the cell processor 500.
  • the cell processor 500 may have multiple PPEs organized into PPE groups, of which there may be more than one. These PPE groups may share access to the main memory 502. Furthermore, the cell processor 500 may include two or more groups of SPEs. The SPE groups may also share access to the main memory 502. Such configurations are within the scope of the present invention.
  • CBEA cell broadband engine architecture
  • Each SPE 506 includes a synergistic processor unit (SPU) and its own local storage area LS.
  • the local storage LS may include one or more separate areas of memory storage, each one associated with a specific SPU.
  • Each SPU may be configured to only execute instructions (including data load and data store operations) from within its own associated local storage domain.
  • data transfers between the local storage LS and elsewhere in a system 500 may be performed by issuing direct memory access (DMA) commands from the memory flow controller (MFC) to transfer data to or from the local storage domain (of the individual SPE).
  • the SPUs are less complex computational units than the PPE 504 in that they do not perform any system management functions.
  • the SPUs generally have a single instruction, multiple data (SIMD) capability and typically process data and initiate any required data transfers (subject to access properties set up by the PPE) in order to perform their allocated tasks.
  • the purpose of the SPU is to enable applications that require a higher computational unit density and can effectively use the provided instruction set.
  • a significant number of SPEs in a system managed by the PPE 504 allow for cost-effective processing over a wide range of applications.
  • Each SPE 506 may include a dedicated memory flow controller (MFC) that includes an associated memory management unit that can hold and process memory-protection and access-permission information.
  • MFC provides the primary method for data transfer, protection, and synchronization between main storage of the cell processor and the local storage of an SPE.
  • An MFC command describes the transfer to be performed. Commands for transferring data are sometimes referred to as MFC direct memory access (DMA) commands (or MFC DMA commands).
  • Each MFC may support multiple DMA transfers at the same time and can maintain and process multiple MFC commands.
  • Each MFC DMA data transfer command request may involve both a local storage address (LSA) and an effective address (EA).
  • the local storage address may directly address only the local storage area of its associated SPE.
  • the effective address may have a more general application, e.g., it may be able to reference main storage, including all the SPE local storage areas, if they are aliased into the real address space.
  • the SPEs 506 and PPE 504 may include signal notification registers that are tied to signaling events.
  • the PPE 504 and SPEs 506 may be coupled by a star topology in which the PPE 504 acts as a router to transmit messages to the SPEs 506.
  • each SPE 506 and the PPE 504 may have a one-way signal notification register referred to as a mailbox.
  • the mailbox can be used by an SPE 506 to host operating system (OS) synchronization.
  • the cell processor 500 may include an input/output (I/O) function 508 through which the cell processor 500 may interface with peripheral devices, such as a microphone array 512 and optional image capture unit 513.
  • An Element Interconnect Bus 510 may connect the various components listed above.
  • Each SPE and the PPE can access the bus 510 through a bus interface unit BIU.
  • the cell processor 500 may also include two controllers typically found in a processor: a Memory Interface Controller MIC that controls the flow of data between the bus 510 and the main memory 502, and a Bus Interface Controller BIC, which controls the flow of data between the I/O 508 and the bus 510.
  • the cell processor 500 may also include an internal interrupt controller IIC.
  • the IIC component manages the priority of the interrupts presented to the PPE.
  • the IIC allows interrupts from the other components of the cell processor 500 to be handled without using a main system interrupt controller.
  • the IIC may be regarded as a second level controller.
  • the main system interrupt controller may handle interrupts originating external to the cell processor.
  • certain computations, such as the fractional delays described above, may be performed in parallel using the PPE 504 and/or one or more of the SPEs 506.
  • Each fractional delay calculation may be run as one or more separate tasks that different SPEs 506 may take as they become available.
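  • By way of illustration only, the idea of running each fractional delay calculation as a separate task that any available processing element may pick up can be sketched with a worker pool. The thread pool below merely stands in for the SPEs, the linear-interpolation delay is a simplified stand-in for the fractional delay filters described above, and the names `delay_linear` and `run_delay_tasks` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def delay_linear(samples, delay):
    """Delay a sequence by a non-negative fractional number of samples.

    Uses linear interpolation between neighbouring samples; samples
    outside the sequence are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        pos = n - delay
        i = math.floor(pos)
        frac = pos - i
        a = samples[i] if 0 <= i < len(samples) else 0.0
        b = samples[i + 1] if 0 <= i + 1 < len(samples) else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out


def run_delay_tasks(samples, delays, workers=8):
    """Run one delay task per requested delay on a pool of workers.

    The pool of eight workers loosely mirrors eight SPEs; results are
    returned in the same order as the requested delays.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: delay_linear(samples, d), delays))


results = run_delay_tasks([0.0, 2.0, 4.0, 6.0], [0.5, 1.0])
```

  • Each task is independent of the others, which is what allows different workers to take tasks as they become free.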
  • Embodiments of the present invention may utilize arrays of between about 2 and about 8 microphones in an array characterized by a microphone spacing d between about 0.5 cm and about 2 cm.
  • the microphones may have a frequency range from about 120 Hz to about 16 kHz. It is noted that the introduction of fractional delays in the output signal y(t) as described above allows for much greater resolution in the source separation than would otherwise be possible with a digital processor limited to applying discrete integer time delays to the output signal. It is the introduction of such fractional time delays that allows embodiments of the present invention to achieve high resolution with such small microphone spacing and relatively inexpensive microphones.
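  • By way of illustration only, the benefit of fractional delays for closely spaced microphones can be seen from a short calculation. The largest possible arrival-time difference between two microphones is the spacing divided by the speed of sound. The sampling rate of 48 kHz, the speed of sound of 343 m/s and the 0.1-sample step used below are assumptions chosen for the example and are not taken from the disclosure.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def max_delay_in_samples(spacing_m, sample_rate_hz):
    """Largest inter-microphone delay, expressed in sample periods."""
    return spacing_m / SPEED_OF_SOUND_M_S * sample_rate_hz


def distinct_steering_delays(spacing_m, sample_rate_hz, step_samples=1.0):
    """Count the delays of the given step that fit within +/- the maximum."""
    steps = int(max_delay_in_samples(spacing_m, sample_rate_hz) / step_samples)
    return 2 * steps + 1


# 1 cm spacing sampled at an assumed 48 kHz.
integer_only = distinct_steering_delays(0.01, 48000.0)
tenth_sample = distinct_steering_delays(0.01, 48000.0, step_samples=0.1)
```

  • Under these assumptions the maximum delay is only about 1.4 sample periods, so integer delays offer three usable settings while delays in steps of a tenth of a sample offer twenty-seven.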
  • Embodiments of the invention may also be applied to ultrasonic position tracking by adding an ultrasonic emitter to the microphone array and tracking object locations through analysis of the time delay of arrival of echoes of ultrasonic pulses from the emitter.
  • Although FIG. 1 depicts linear arrays of microphones, embodiments of the invention are not limited to such configurations.
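  • By way of illustration only, the relation between an echo's time delay of arrival and the distance to the reflecting object can be sketched as follows. The sketch assumes that the emitter and the microphones are co-located, so that the pulse travels to the object and back, and assumes a speed of sound of 343 m/s; the function name is hypothetical.

```python
def echo_distance_m(round_trip_s, speed_m_s=343.0):
    """Distance to a reflecting object from the round-trip echo delay.

    The pulse covers the distance twice (out and back), hence the
    division by two.
    """
    if round_trip_s < 0:
        raise ValueError("delay must be non-negative")
    return speed_m_s * round_trip_s / 2.0


# An echo arriving 10 ms after the pulse was emitted.
distance = echo_distance_m(0.010)
```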
  • three or more microphones may be arranged in a two-dimensional array, or four or more microphones may be arranged in a three-dimensional array.
  • a system based on 2-microphone array may be incorporated into a controller unit for a video game.
  • Signal processing systems of the present invention may use microphone arrays that are small enough to be utilized in portable hand-held devices such as cell phones, personal digital assistants, video/digital cameras, and the like.
  • increasing the number of microphones in the array has no beneficial effect and in some cases fewer microphones may work better than more.
  • a four-microphone array has been observed to work better than an eight-microphone array.
  • Embodiments of the present invention may be used as presented herein or in combination with other user input mechanisms and notwithstanding mechanisms that track or profile the angular direction or volume of sound and/or mechanisms that track the position of the object actively or passively, mechanisms using machine vision, combinations thereof and where the object tracked may include ancillary controls or buttons that manipulate feedback to the system and where such feedback may include but is not limited to light emission from light sources, sound distortion means, or other suitable transmitters and modulators as well as controls, buttons, pressure pad, etc. that may influence the transmission or modulation of the same, encode state, and/or transmit commands from or to a device, including devices that are tracked by the system and whether such devices are part of, interacting with or influencing a system used in connection with embodiments of the present invention.

Abstract

A tracking device may include a body, a sound emitter operable to emit a sound, an array of two or more microphones adapted to produce discrete time domain input signals at a runtime, one or more processors coupled to the array of two or more microphones, and a memory coupled to the microphones and the processor. The memory has a set of processor readable instructions embodied therein. The instructions include one or more instructions for:
determining a listening direction of the sound source relative to the microphone array from the discrete time domain input signals; using the listening direction in a semi-blind source separation to select filtering functions that separate out sounds from the sound emitter from the discrete time domain input signals; identifying a listening zone containing the sound emitter from the filtering functions; and determining a location of the sound emitter from the identified listening zone.

Description

    PRIORITY CLAIM
  • This application is a continuation of co-pending U.S. patent application Ser. No. 11/382,256, entitled “TRACKING DEVICE WITH SOUND EMITTER FOR USE IN OBTAINING INFORMATION FOR CONTROLLING GAME PROGRAM EXECUTION”, filed May 8, 2006, which is hereby incorporated by reference.
  • This application also claims priority benefit of U.S. Provisional Patent Application No. 60/718,145, entitled “AUDIO, VIDEO, SIMULATION, AND USER INTERFACE PARADIGMS”, filed Sep. 15, 2005, which is hereby incorporated by reference.
  • This application also claims priority benefit of U.S. patent application Ser. No. 10/207,677, entitled, “MAN-MACHINE INTERFACE USING A DEFORMABLE DEVICE”, filed on Jul. 27, 2002; U.S. patent application Ser. No. 10/650,409, entitled, “AUDIO INPUT SYSTEM”, filed on Aug. 27, 2003; U.S. patent application Ser. No. 10/663,236, entitled “METHOD AND APPARATUS FOR ADJUSTING A VIEW OF A SCENE BEING DISPLAYED ACCORDING TO TRACKED HEAD MOTION”, filed on Sep. 15, 2003; U.S. patent application Ser. No. 10/759,782, entitled “METHOD AND APPARATUS FOR LIGHT INPUT DEVICE”, filed on Jan. 16, 2004; U.S. patent application Ser. No. 10/820,469, entitled “METHOD AND APPARATUS TO DETECT AND REMOVE AUDIO DISTURBANCES”, filed on Apr. 7, 2004; and U.S. patent application Ser. No. 11/301,673, entitled “METHOD FOR USING RELATIVE HEAD AND HAND POSITIONS TO ENABLE A POINTING INTERFACE VIA CAMERA TRACKING”, filed on Dec. 12, 2005, all of which are hereby incorporated by reference.
  • This application also claims priority benefit of U.S. patent application Ser. No. 11/381,729, to Xiao Dong Mao, entitled ULTRA SMALL MICROPHONE ARRAY, (Attorney Docket SCEA05062US00), filed on May 4, 2006, application Ser. No. 11/381,728, to Xiao Dong Mao, entitled ECHO AND NOISE CANCELLATION, (Attorney Docket SCEA05064US00), filed on May 4, 2006, U.S. patent application Ser. No. 11/381,725, to Xiao Dong Mao, entitled “METHODS AND APPARATUS FOR TARGETED SOUND DETECTION”, (Attorney Docket SCEA05072US00), filed on May 4, 2006, U.S. patent application Ser. No. 11/381,727, to Xiao Dong Mao, entitled “NOISE REMOVAL FOR ELECTRONIC DEVICE WITH FAR FIELD MICROPHONE ON CONSOLE”, (Attorney Docket SCEA05073US00), filed on May 4, 2006, U.S. patent application Ser. No. 11/381,724, to Xiao Dong Mao, entitled “METHODS AND APPARATUS FOR TARGETED SOUND DETECTION AND CHARACTERIZATION”, (Attorney Docket SCEA05079US00), filed on May 4, 2006, U.S. patent application Ser. No. 11/381,721, to Xiao Dong Mao, entitled “SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING”, (Attorney Docket SCEA04005JUMBOUS), filed on May 4, 2006; all of which are hereby incorporated by reference.
  • This application also claims priority benefit of: co-pending application Ser. No. 11/418,988, to Xiao Dong Mao, entitled “METHODS AND APPARATUSES FOR ADJUSTING A LISTENING AREA FOR CAPTURING SOUNDS”, (Attorney Docket SCEA-00300) filed on May 4, 2006; co-pending application Ser. No. 11/418,989, to Xiao Dong Mao, entitled “METHODS AND APPARATUSES FOR CAPTURING AN AUDIO SIGNAL BASED ON VISUAL IMAGE”, (Attorney Docket SCEA-00400), filed on May 4, 2006; co-pending application Ser. No. 11/429,047, to Xiao Dong Mao, entitled “METHODS AND APPARATUSES FOR CAPTURING AN AUDIO SIGNAL BASED ON A LOCATION OF THE SIGNAL”, (Attorney Docket SCEA-00500), filed on May 4, 2006; co-pending application Ser. No. 11/429,133, to Richard Marks et al., entitled “SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING”, (Attorney Docket SCEA04005US01-SONYP045), filed on May 4, 2006; and co-pending application Ser. No. 11/429,414, to Richard Marks et al., entitled “Computer Image and Audio Processing of Intensity and Input Devices for Interfacing With A Computer Program”, (Attorney Docket SONYP052), filed on May 4, 2006, all of the entire disclosures of which are incorporated herein by reference.
  • This application also claims priority benefit of U.S. patent application Ser. No. 11/382,031, entitled “MULTI-INPUT GAME CONTROL MIXER”, (Attorney Docket SCEA06MXR1), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,032, entitled “SYSTEM FOR TRACKING USER MANIPULATIONS WITHIN AN ENVIRONMENT”, (Attorney Docket SCEA06MXR2), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,033, entitled “SYSTEM, METHOD, AND APPARATUS FOR THREE-DIMENSIONAL INPUT CONTROL”, (Attorney Docket SCEA06INRT1), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,035, entitled “INERTIALLY TRACKABLE HAND-HELD CONTROLLER”, (Attorney Docket SCEA06INRT2), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,036, entitled “METHOD AND SYSTEM FOR APPLYING GEARING EFFECTS TO VISUAL TRACKING”, (Attorney Docket SONYP058A), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,041, entitled “METHOD AND SYSTEM FOR APPLYING GEARING EFFECTS TO INERTIAL TRACKING”, (Attorney Docket SONYP058B), filed on May 7, 2006; U.S. patent application Ser. No. 11/382,038, entitled “METHOD AND SYSTEM FOR APPLYING GEARING EFFECTS TO ACOUSTICAL TRACKING”, (Attorney Docket SONYP058C), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,040, entitled “METHOD AND SYSTEM FOR APPLYING GEARING EFFECTS TO MULTI-CHANNEL MIXED INPUT”, (Attorney Docket SONYP058D), filed on May 7, 2006; U.S. patent application Ser. No. 11/382,034, entitled “SCHEME FOR DETECTING AND TRACKING USER MANIPULATION OF A GAME CONTROLLER BODY”, (Attorney Docket 86321 SCEA05082US00), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,037, entitled “SCHEME FOR TRANSLATING MOVEMENTS OF A HAND-HELD CONTROLLER INTO INPUTS FOR A SYSTEM”, (Attorney Docket 86324), filed on May 6, 2006; U.S. patent application Ser. No. 11/382,043, entitled “DETECTABLE AND TRACKABLE HAND-HELD CONTROLLER”, (Attorney Docket 86325), filed on May 7, 2006; U.S. patent application Ser. No. 11/382,039, entitled “METHOD FOR MAPPING MOVEMENTS OF A HAND-HELD CONTROLLER TO GAME COMMANDS”, (Attorney Docket 86326), filed on May 7, 2006; U.S. Design patent application Ser. No. 29/259,349, entitled “CONTROLLER WITH INFRARED PORT”, (Attorney Docket SCEA06007US00), filed on May 6, 2006; U.S. Design patent application Ser. No. 29/259,350, entitled “CONTROLLER WITH TRACKING SENSORS”, (Attorney Docket SCEA06008US00), filed on May 6, 2006; U.S. Patent Application No. 60/798,031, entitled “DYNAMIC TARGET INTERFACE”, (Attorney Docket SCEA06009US00), filed on May 6, 2006; and U.S. Design patent application Ser. No. 29/259,348, entitled “TRACKED CONTROLLER DEVICE”, (Attorney Docket SCEA06010US00), filed on May 6, 2006; all of which are hereby incorporated herein by reference in their entireties.
  • This application is also related to co-pending U.S. patent application Ser. No. 11/430,594, to Gary Zalewski and Riley R. Russell, entitled “Profile Detection”, (Attorney Docket SCEA05059US00), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. patent application Ser. No. 11/430,593, to Gary Zalewski and Riley R. Russell, entitled “Using Audio/Visual Environment To Select Ads On Game Platform”, (Attorney Docket SCEAJP 3.0-003 CIP V), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. patent application Ser. No. 11/400,997, filed on Apr. 10, 2006, to Larsen and Chen, entitled “System And Method For Obtaining User Information From Voices”, (Attorney Docket SCEA05040US00), the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. patent application Ser. No. 11/382,259, to Gary Zalewski et al., entitled “Method and apparatus for use in determining lack of user activity in relation to a system”, (Attorney Docket 86327), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. patent application Ser. No. 11/382,258, to Gary Zalewski et al., entitled “Method and apparatus for use in determining an activity level of a user in relation to a system”, (Attorney Docket 86328), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. patent application Ser. No. 11/382,251, to Gary Zalewski et al., entitled “Hand-held controller having detectable elements for tracking purposes”, (Attorney Docket 86329), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. patent application Ser. No. 11/382,252, entitled “TRACKING DEVICE FOR USE IN OBTAINING INFORMATION FOR CONTROLLING GAME PROGRAM EXECUTION”, (Attorney Docket SCEA06INRT3), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. patent application Ser. No. 11/382,250, entitled “OBTAINING INPUT FOR CONTROLLING EXECUTION OF A GAME PROGRAM”, (Attorney Docket SCEA06COMB), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,744, entitled “VIDEO GAME CONTROLLER FRONT FACE”, (Attorney Docket SCEACTR-D3), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,743, entitled “VIDEO GAME CONTROLLER”, (Attorney Docket SCEACTRL-D2), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,767, entitled “VIDEO GAME CONTROLLER”, (Attorney Docket SONYP059A), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,768, entitled “VIDEO GAME CONTROLLER”, (Attorney Docket SONYP059B), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,763, entitled “ERGONOMIC GAME CONTROLLER DEVICE WITH LEDS AND OPTICAL PORTS”, (Attorney Docket PA3760US), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,759, entitled “GAME CONTROLLER DEVICE WITH LEDS AND OPTICAL PORTS”, (Attorney Docket PA3761US), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,765, entitled “DESIGN FOR OPTICAL GAME CONTROLLER INTERFACE”, (Attorney Docket PA3762US), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,766, entitled “DUAL GRIP GAME CONTROL DEVICE WITH LEDS AND OPTICAL PORTS”, (Attorney Docket PA3763US), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,764, entitled “GAME INTERFACE DEVICE WITH LEDS AND OPTICAL PORTS”, (Attorney Docket PA3764US), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • This application is also related to co-pending U.S. Design patent application Ser. No. 29/246,762, entitled “ERGONOMIC GAME INTERFACE DEVICE WITH LEDS AND OPTICAL PORTS”, (Attorney Docket PA3765US), filed May 8, 2006, the entire disclosure of which is incorporated herein by reference.
  • FIELD OF THE INVENTION
  • Embodiments of the present invention are directed to audio signal processing and more particularly to processing of audio signals from microphone arrays.
  • BACKGROUND OF THE INVENTION
  • Many consumer electronic devices could benefit from a directional microphone that filters out sounds coming from outside a relatively narrow listening zone. Although such directional microphones are available they tend to be either bulky or expensive or both. Consequently such directional microphones are unsuitable for applications in consumer electronics.
  • Microphone arrays are often used to provide beam-forming for either noise reduction or echo-position, or both, by detecting the sound source direction or location. A typical microphone array has two or more microphones in fixed positions relative to each other with adjacent microphones separated by a known geometry, e.g., a known distance and/or known layout of the microphones. Depending on the orientation of the array, a sound originating from a source remote from the microphone array can arrive at different microphones at different times. Differences in time of arrival at different microphones in the array can be used to derive information about the direction or location of the source. Conventional microphone direction detection techniques analyze the correlation between signals from different microphones to determine the direction to the location of the source. Although effective, this technique is computationally intensive and is not robust. Such drawbacks make such techniques unsuitable for use in hand-held devices and consumer electronic applications, such as video game controllers.
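  • By way of illustration only, the conventional correlation-based approach described above can be sketched for two microphones: the lag that maximizes the cross-correlation of the two signals gives the difference in time of arrival, and the direction follows from the microphone spacing. The brute-force search, the far-field assumption, the speed of sound of 343 m/s and the function names are assumptions made for the example.

```python
import math


def best_lag(x0, x1, max_lag):
    """Lag in samples of x1 relative to x0 that maximizes cross-correlation.

    A positive result means the sound reached microphone 0 first.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x0)):
            m = n + lag
            if 0 <= m < len(x1):
                score += x0[n] * x1[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(lag_samples, sample_rate_hz, spacing_m, speed_m_s=343.0):
    """Far-field arrival angle from broadside for a two-microphone array."""
    s = speed_m_s * (lag_samples / sample_rate_hz) / spacing_m
    s = max(-1.0, min(1.0, s))  # guard against rounding outside asin's domain
    return math.degrees(math.asin(s))


x0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
x1 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # same pulse, one sample later
lag = best_lag(x0, x1, max_lag=2)
angle = arrival_angle_deg(lag, 48000.0, 0.02)
```

  • The nested loops make the cost grow with both the signal length and the number of candidate lags, which gives a sense of why the text calls the technique computationally intensive.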
  • Thus, there is a need in the art for a microphone array technique that overcomes the above disadvantages.
  • SUMMARY OF THE INVENTION
  • Embodiments of the invention are directed to methods and apparatus for targeted sound detection. Embodiments of the invention may be implemented with a microphone array having two or more microphones M0 . . . MM. Each microphone is coupled to a plurality of filters. The filters are configured to filter input signals corresponding to sounds detected by the microphones thereby generating a filtered output. One or more sets of filter parameters for the plurality of filters are pre-calibrated to determine one or more corresponding pre-calibrated listening zones. Each set of filter parameters is selected to detect portions of the input signals corresponding to sounds originating within a given listening zone and filter out sounds originating outside the given listening zone. A particular pre-calibrated listening zone is selected at a runtime by applying to the plurality of filters a set of filter coefficients corresponding to the particular pre-calibrated listening zone. As a result, the microphone array may detect sounds originating within the particular listening zone and filter out sounds originating outside the particular listening zone. Sounds are detected with the microphone array. A particular listening zone containing a source of the sound is identified. The sound or the source of the sound is characterized and the sound is emphasized or filtered out depending on how the sound is characterized.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
  • FIG. 1A is a schematic diagram of a microphone array according to an embodiment of the present invention.
  • FIG. 1B is a flow diagram illustrating a method for targeted sound detection according to an embodiment of the present invention.
  • FIG. 1C is a schematic diagram illustrating targeted sound detection according to a preferred embodiment of the present invention.
  • FIG. 1D is a flow diagram illustrating a method for targeted sound detection according to the preferred embodiment of the present invention.
  • FIG. 1E is a top plan view of a sound source location and characterization apparatus according to an embodiment of the present invention.
  • FIG. 1F is a flow diagram illustrating a method for sound source location and characterization according to an embodiment of the present invention.
  • FIG. 1G is a top plan view schematic diagram of an apparatus having a camera and a microphone array for targeted sound detection from within a field of view of the camera according to an embodiment of the present invention.
  • FIG. 1H is a front elevation view of the apparatus of FIG. 1E.
  • FIGS. 1I-1J are plan view schematic diagrams of an audio-video apparatus according to an alternative embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a microphone array and filter apparatus according to an embodiment of the present invention.
  • FIG. 3 is a flow diagram of a method for processing a signal from an array of two or more microphones according to an embodiment of the present invention.
  • FIG. 4 is a block diagram illustrating a signal processing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a block diagram of a cell processor implementation of a signal processing system according to an embodiment of the present invention.
  • DESCRIPTION OF THE SPECIFIC EMBODIMENTS
  • Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
  • As depicted in FIG. 1A, a microphone array 102 may include four microphones M0, M1, M2, and M3 that are coupled to corresponding signal filters F0, F1, F2 and F3. Each of the filters may implement some combination of finite impulse response (FIR) filtering and time delay of arrival (TDA) filtering. In general, the microphones M0, M1, M2, and M3 may be omni-directional microphones, i.e., microphones that can detect sound from essentially any direction. Omni-directional microphones are generally simpler in construction and less expensive than microphones having a preferred listening direction. The microphones M0, M1, M2, and M3 produce corresponding outputs x0(t), x1(t), x2(t), x3(t). These outputs serve as inputs to the filters F0, F1, F2 and F3. Each filter may apply a time delay of arrival (TDA) and/or a finite impulse response (FIR) to its input. The outputs of the filters may be combined into a filtered output y(t). Although four microphones M0, M1, M2 and M3 and four filters F0, F1, F2 and F3 are depicted in FIG. 1A for the sake of example, those of skill in the art will recognize that embodiments of the present invention may include any number of microphones greater than two and any corresponding number of filters.
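  • By way of illustration only, the signal path of FIG. 1A, in which each microphone output is filtered and the filter outputs are combined into y(t), can be sketched as a filter-and-sum operation. The sketch assumes that each filter is a plain FIR filter (an integer delay is expressed as a run of leading zero taps) and that the combination is a simple sum; the function names are hypothetical.

```python
def fir(signal, taps):
    """Apply an FIR filter: out[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(inputs, filters):
    """Filter each microphone signal with its own taps and sum the results."""
    filtered = [fir(x, h) for x, h in zip(inputs, filters)]
    return [sum(column) for column in zip(*filtered)]


# Two microphones; the pulse reaches microphone 0 one sample earlier.
x0 = [1.0, 0.0, 0.0, 0.0]
x1 = [0.0, 1.0, 0.0, 0.0]
# Taps [0, 1] delay x0 by one sample so that both pulses line up.
y = filter_and_sum([x0, x1], [[0.0, 1.0], [1.0]])
```

  • When the taps compensate for the arrival-time difference, the two pulses add coherently in the output.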
  • An audio signal arriving at the microphone array 102 from one or more sources 104, 106 may be expressed as a vector x=[x0, x1, x2, x3], where x0, x1, x2 and x3 are the signals received by the microphones M0, M1, M2 and M3 respectively. Each signal xm generally includes subcomponents due to different sources of sounds. The subscript m ranges from 0 to 3 in this example and is used to distinguish among the different microphones in the array. The subcomponents may be expressed as a vector s=[s1, s2, . . . sK], where K is the number of different sources.
  • To separate out sounds from the signal s originating from different sources one must determine the best TDA filter for each of the filters F0, F1, F2 and F3. To facilitate separation of sounds from the sources 104, 106, the filters F0, F1, F2 and F3 are pre-calibrated with filter parameters (e.g., FIR filter coefficients and/or TDA values) that define one or more pre-calibrated listening zones Z. Each listening zone Z is a region of space proximate the microphone array 102. The parameters are chosen such that sounds originating from a source 104 located within the listening zone Z are detected while sounds originating from a source 106 located outside the listening zone Z are filtered out, i.e., substantially attenuated. In the example depicted in FIG. 1A, the listening zone Z is depicted as being a more or less wedge-shaped sector having an origin located at or proximate the center of the microphone array 102. Alternatively, the listening zone Z may be a discrete volume, e.g., a rectangular, spherical, conical or arbitrarily-shaped volume in space. Wedge-shaped listening zones can be robustly established using a linear array of microphones. Robust listening zones defined by arbitrarily-shaped volumes may be established using a planar array or an array of at least four microphones wherein at least one microphone lies in a different plane from the others. Such an array is referred to herein as a “concave” microphone array.
  • As depicted in the flow diagram of FIG. 1B, a method 110 for targeted voice detection using the microphone array 102 may proceed as follows. As indicated at 112, one or more sets of the filter coefficients for the filters F0, F1, F2 and F3 are determined corresponding to one or more pre-calibrated listening zones Z. Each set of filter coefficients is selected to detect portions of the input signals corresponding to sounds originating within a given listening zone and filter out sounds originating outside the given listening zone. To pre-calibrate the listening zones Z, one or more known calibration sound sources may be placed at several different known locations within and outside each zone Z. During calibration, the calibration source(s) may emit sounds characterized by known spectral distributions similar to sounds the microphone array 102 is likely to encounter at runtime. The known locations and spectral characteristics of the sources may then be used to select the values of the filter parameters for the filters F0, F1, F2 and F3.
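  • By way of illustration only, the use of pre-calibrated parameter sets at runtime might be sketched as a table that maps each listening zone to a set of filter parameters, with the zone whose parameters yield the strongest combined output taken as the zone containing the source. The integer per-microphone delays, the energy criterion, the zone names and the function names are assumptions made for the example and are not taken from the disclosure.

```python
def steer(inputs, delays):
    """Delay-and-sum using one integer delay (in samples) per microphone."""
    length = len(inputs[0])
    out = [0.0] * length
    for x, d in zip(inputs, delays):
        for t in range(d, length):
            out[t] += x[t - d]
    return out


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


def select_zone(inputs, zone_table):
    """Return the zone whose pre-calibrated delays give the largest energy."""
    return max(zone_table, key=lambda z: energy(steer(inputs, zone_table[z])))


# Hypothetical pre-calibrated table for a two-microphone array.
zones = {"left": (1, 0), "center": (0, 0), "right": (0, 1)}
# The pulse reaches microphone 0 one sample before microphone 1.
mic_signals = [[0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
chosen = select_zone(mic_signals, zones)
```

  • In this example only the "left" parameter set aligns the two pulses, so its output energy is the largest of the three.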
  • By way of example, and without limitation, Blind Source Separation (BSS) may be used to pre-calibrate the filters F0, F1, F2 and F3 to define the listening zones Z. Blind source separation separates a set of signals into a set of other signals, such that the regularity of each resulting signal is maximized, and the regularity between the signals is minimized (i.e., statistical independence is maximized or correlation is minimized). The blind source separation may involve an independent component analysis (ICA) that is based on second-order statistics. In such a case, the data for the signal arriving at each microphone may be represented by the random vector xm=[x1, . . . xn] and the components as a random vector s=[s1, . . . sn]. The task is to transform the observed data xm, using a linear static transformation s=Wx, into maximally independent components s measured by some function F(s1, . . . sn) of independence.
  • The components xmi of the observed random vector xm=(xm1, . . . , xmn) are generated as a sum of the independent components smk, k=1, . . . , n, xmi=ami1sm1+ . . . +amiksmk+ . . . +aminsmn, weighted by the mixing weights amik. In other words, the data vector xm can be written as the product of a mixing matrix A with the source vector sT, i.e., xm=A·sT or
  • $$\begin{bmatrix} x_{m1} \\ \vdots \\ x_{mn} \end{bmatrix} = \begin{bmatrix} a_{m11} & \cdots & a_{m1n} \\ \vdots & \ddots & \vdots \\ a_{mn1} & \cdots & a_{mnn} \end{bmatrix} \cdot \begin{bmatrix} s_{1} \\ \vdots \\ s_{n} \end{bmatrix}$$
  • The original sources s can be recovered by multiplying the observed signal vector xm with the inverse of the mixing matrix W=A−1, also known as the unmixing matrix. Determination of the unmixing matrix A−1 may be computationally intensive. Embodiments of the invention use blind source separation (BSS) to determine a listening direction for the microphone array. The listening zones Z of the microphone array 102 can be calibrated prior to run time (e.g., during design and/or manufacture of the microphone array) and may optionally be re-calibrated at run time.
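  • The relationship between the mixing matrix A and the unmixing matrix W=A−1 can be illustrated with a minimal sketch. The two-source, two-microphone setup, the numerical values and the helper names `mat_vec` and `invert_2x2` are assumptions chosen for illustration only; in practice A is not known in advance and must be estimated.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def invert_2x2(matrix):
    """Invert a 2x2 matrix; raises ValueError if it is singular."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical mixing weights and source samples (illustrative values only).
A = [[1.0, 0.5],
     [0.25, 1.0]]            # mixing matrix A
s = [2.0, -4.0]              # source samples s_1, s_2
x = mat_vec(A, s)            # observed microphone samples, x = A s
W = invert_2x2(A)            # unmixing matrix W = A^-1
s_recovered = mat_vec(W, x)  # multiplying by W recovers the sources
```

  • The sketch assumes A is known and invertible, which is the idealized case; the calibration procedure described in the following paragraphs addresses the situation in which A must be estimated.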
  • By way of example, the listening zone Z may be pre-calibrated as follows. A user standing within the listening zone Z may record speech for about 10 to 30 seconds. Preferably, the recording room does not contain transient interferences, such as competing speech, background music, etc. Pre-determined intervals, e.g., about every 8 milliseconds, of the recorded voice signal may be formed into analysis frames, and transformed from the time domain into the frequency domain. Voice-Activity Detection (VAD) may be performed over each frequency-bin component in this frame. Only bins that contain strong voice signals are collected in each frame and used to estimate its 2nd-order statistics, for each frequency bin within the frame, i.e., a “Calibration Covariance Matrix” Cal_Cov(j,k)=E((X′jk)T*X′jk), where E refers to the operation of determining the expectation value and (X′jk)T is the transpose of the vector X′jk. The vector X′jk is an (M+1)-dimensional vector representing the Fourier transform of calibration signals for the jth frame and the kth frequency bin.
  • The accumulated covariance matrix then contains the strongest signal correlation that is emitted from the target listening direction. Each calibration covariance matrix Cal_Cov(j,k) may be decomposed by means of “Principal Component Analysis”(PCA) and its corresponding eigenmatrix C may be generated. The inverse C−1 of the eigenmatrix C may thus be regarded as a “listening direction” that essentially contains the most information to de-correlate the covariance matrix, and is saved as a calibration result. As used herein, the term “eigenmatrix” of the calibration covariance matrix Cal_Cov(j,k) refers to a matrix having columns (or rows) that are the eigenvectors of the covariance matrix.
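  • The accumulation of a calibration covariance matrix and the extraction of its dominant direction can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the complex conjugate transpose is used in place of the plain transpose, and power iteration stands in for a full principal component analysis, so only the principal eigenvector is found rather than the complete eigenmatrix C.

```python
def calibration_covariance(frames):
    """Average the outer product conj(X)^T X over calibration frames.

    Each frame is a list of complex values, one per microphone, for a
    single frequency bin. Returns an n x n matrix as a list of rows.
    """
    n = len(frames[0])
    cov = [[0j] * n for _ in range(n)]
    for frame in frames:
        for r in range(n):
            for c in range(n):
                cov[r][c] += complex(frame[r]).conjugate() * frame[c]
    return [[value / len(frames) for value in row] for row in cov]


def principal_eigenvector(cov, iterations=100):
    """Estimate the dominant eigenvector of cov by power iteration."""
    n = len(cov)
    vector = [1 + 0j] * n
    for _ in range(iterations):
        product = [sum(cov[r][c] * vector[c] for c in range(n)) for r in range(n)]
        norm = sum(abs(z) ** 2 for z in product) ** 0.5
        if norm == 0:
            raise ValueError("covariance is zero or start vector is orthogonal")
        vector = [z / norm for z in product]
    return vector


# Hypothetical two-microphone calibration frames for one frequency bin.
frames = [[1 + 0j, 1j], [2 + 0j, 2j], [-1 + 0j, -1j]]
cov = calibration_covariance(frames)
direction = principal_eigenvector(cov)
```

  • A complete implementation would compute all eigenvectors of each Cal_Cov(j,k) and store the inverse of the resulting eigenmatrix as the calibration result.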
  • At run time, this inverse eigenmatrix C−1 may be used to de-correlate the mixing matrix A by a simple linear transformation. After de-correlation, A is well approximated by its diagonal principal vector, thus the computation of the unmixing matrix (i.e., A−1) is reduced to computing a linear vector inverse of:

  • $$A_1 = A * C^{-1}$$
  • A1 is the new transformed mixing matrix in independent component analysis (ICA). The principal vector is just the diagonal of the matrix A1.
  • The process may be refined by repeating the above procedure with the user standing at different locations within the listening zone Z. In microphone-array noise reduction it is preferred for the user to move around inside the listening sector during calibration so that the beamforming has a certain tolerance (essentially forming a listening cone area) that provides a user some flexible moving space while talking. In embodiments of the present invention, by contrast, voice/sound detection need not be calibrated for the entire cone area of the listening sector S. Instead the listening sector is preferably calibrated for a very narrow beam B along the center of the listening zone Z, so that the final sector determination based on noise suppression ratio becomes more robust. The process may be repeated for one or more additional listening zones.
  • Recalibration at runtime may follow the preceding steps. However, the default calibration at manufacture takes a very large amount of recording data (e.g., tens of hours of clean voices from hundreds of persons) to ensure an unbiased, person-independent statistical estimation. Recalibration at runtime, by contrast, requires only a small amount of recording data from a particular person, so the resulting estimate of C−1 is biased and person-dependent.
  • As described above, a principal component analysis (PCA) may be used to determine eigenvalues that diagonalize the mixing matrix A. The prior knowledge of the listening direction allows the energy of the mixing matrix A to be compressed to its diagonal. This procedure, referred to herein as semi-blind source separation (SBSS), greatly simplifies the calculation of the independent component vector sT.
  • Embodiments of the present invention may also make use of anti-causal filtering. To illustrate anti-causal filtering, consider a situation in which one microphone, e.g., M0 is chosen as a reference microphone for the microphone array 102. In order for the signal x(t) from the microphone array to be causal, signals from the source 104 must arrive at the reference microphone M0 first. However, if the signal arrives at any of the other microphones first, M0 cannot be used as a reference microphone. Generally, the signal will arrive first at the microphone closest to the source 104. Embodiments of the present invention adjust for variations in the position of the source 104 by switching the reference microphone among the microphones M0, M1, M2, M3 in the array 102 so that the reference microphone always receives the signal first. Specifically, this anti-causality may be accomplished by artificially delaying the signals received at all the microphones in the array except for the reference microphone while minimizing the length of the delay filter used to accomplish this.
  • For example, if microphone M0 is the reference microphone, the signals at the other three (non-reference) microphones M1, M2, M3 may be adjusted by a fractional delay Δtm, (m=1, 2, 3) based on the system output y(t). The fractional delay Δtm may be adjusted based on a change in the signal to noise ratio (SNR) of the system output y(t). Generally, the delay is chosen in a way that maximizes SNR. For example, in the case of a discrete time signal the delay Δtm for the signal from each non-reference microphone at time sample t may be calculated according to: Δtm(t)=Δtm(t−1)+μΔSNR, where ΔSNR is the change in SNR between t−2 and t−1 and μ is a pre-defined step size, which may be empirically determined. If Δtm(t)>1 the delay has been increased by 1 sample. In embodiments of the invention using such delays for anti-causality, the total delay (i.e., the sum of the Δtm) is typically 2-3 integer samples. This may be accomplished by use of 2-3 filter taps. This is a relatively small amount of delay when one considers that typical digital signal processors may use digital filters with up to 512 taps. However, switching between different pre-calibrated listening sectors may be more robust when significantly fewer filter taps are used. For example, 128 taps may be used for the array beamforming filter for this voice detection, 512 taps may be used for array beamforming for noise-reduction purposes, and about 2 to 5 taps may be used for delay filters in both cases. It is noted that applying the artificial delays Δtm to the non-reference microphones is the digital equivalent of physically orienting the array 102 such that the reference microphone M0 is closest to the sound source 104. Appropriate configuration of the filters F0, F1, F2 and F3 and the delays Δt0, Δt1, Δt2, and Δt3 may be used to establish the pre-calibrated listening sector S.
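  • The delay update rule Δtm(t)=Δtm(t−1)+μΔSNR can be sketched as follows. The function names, the step size and the SNR values are assumptions chosen for illustration; how the SNR of the output y(t) is measured is outside the scope of this sketch.

```python
import math


def update_delay(previous_delay, snr_latest, snr_earlier, step_size):
    """Apply delta_t(t) = delta_t(t-1) + mu * delta_SNR.

    snr_latest is the SNR at sample t-1 and snr_earlier the SNR at
    sample t-2, so their difference is the change in SNR.
    """
    return previous_delay + step_size * (snr_latest - snr_earlier)


def split_delay(delay):
    """Split a delay into whole samples and a fractional remainder."""
    whole = math.floor(delay)
    return whole, delay - whole


# Hypothetical values: the SNR rose from 10 dB to 12 dB, step size 0.1.
delay = update_delay(0.5, 12.0, 10.0, 0.1)
whole_samples, fraction = split_delay(1.25)
```

  • Splitting the result into whole and fractional parts reflects the observation that a delay exceeding 1 corresponds to an additional integer sample.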
  • Referring again to FIG. 1B, as indicated at 114 a particular pre-calibrated listening zone Z may be selected at runtime by applying to the filters F0, F1, F2 and F3 a set of filter parameters corresponding to the particular pre-calibrated listening zone Z. As a result, the microphone array may detect sounds originating within the particular listening sector and filter out sounds originating outside the particular listening sector. Although a single listening sector is shown in FIG. 1A, embodiments of the present invention may be extended to situations in which a plurality of different listening sectors are pre-calibrated. As indicated at 116 of FIG. 1B, the microphone array 102 can then track between two or more pre-calibrated sectors at runtime to determine in which sector a sound source resides. For example, as illustrated in FIG. 1C, the space surrounding the microphone array 102 may be divided into multiple listening zones in the form of eighteen different pre-calibrated 20 degree wedge-shaped listening sectors S0 . . . S17 that encompass about 360 degrees surrounding the microphone array 102 by repeating the calibration procedure outlined above for each of the different sectors and associating a different set of FIR filter coefficients and TDA values with each different sector. By applying an appropriate set of pre-determined filter settings (e.g., FIR filter coefficients and/or TDA values determined during calibration as described above) to the filters F0, F1, F2, F3, any of the listening sectors S0 . . . S17 may be selected.
  • By switching from one set of pre-determined filter settings to another, the microphone array 102 can switch from one sector to another to track a sound source 104 from one sector to another. For example, referring again to FIG. 1C, consider a situation where the sound source 104 is located in sector S7 and the filters F0, F1, F2, F3 are set to select sector S4. Since the filters are set to filter out sounds coming from outside sector S4 the input energy E of sounds from the sound source 104 will be attenuated. The input energy E may be defined as a dot product:
  • $$E = \frac{1}{M} \sum_{m} x_m^T(t) \cdot x_m(t)$$
  • where xmT(t) is the transpose of the vector xm(t), which represents the microphone output xm(t), and the sum is an average taken over all M microphones in the array.
  • The attenuation of the input energy E may be determined from the ratio of the input energy E to the filter output energy, i.e.:
  • $$\text{Attenuation} = \frac{\frac{1}{M} \sum_{m} x_m^T(t) \cdot x_m(t)}{y^T(t) \cdot y(t)}$$
  • If the filters are set to select the sector containing the sound source 104 the attenuation is approximately equal to 1. Thus, the sound source 104 may be tracked by switching the settings of the filters F0, F1, F2, F3 from one sector setting to another and determining the attenuation for different sectors. A targeted voice detection method 120 using determination of attenuation for different listening sectors may proceed as depicted in the flow diagram of FIG. 1D. At 122 any pre-calibrated listening sector may be selected initially. For example, sector S4, which corresponds roughly to a forward listening direction, may be selected as a default initial listening sector. At 124 an input signal energy attenuation is determined for the initial listening sector. If, at 126, the attenuation is not an optimum value, another pre-calibrated sector may be selected at 128. If, at 126, the attenuation is an optimum value, the tracking is stopped at 129.
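  • The input energy and attenuation defined above can be computed as in the following minimal sketch. The function names and sample values are hypothetical; each microphone signal is assumed to be a list of real-valued samples over the same time window, and the filter output y(t) is assumed to be available as a similar list.

```python
def input_energy(mic_frames):
    """E = (1/M) * sum over the M microphones of x_m^T x_m."""
    total = sum(sum(v * v for v in frame) for frame in mic_frames)
    return total / len(mic_frames)


def attenuation(mic_frames, output_frame):
    """Ratio of the averaged input energy to the filter output energy."""
    output_energy = sum(v * v for v in output_frame)
    if output_energy == 0:
        return float("inf")  # the output is fully suppressed
    return input_energy(mic_frames) / output_energy


# Hypothetical two-microphone input and two candidate filter outputs.
mics = [[1.0, 2.0], [2.0, 1.0]]
in_sector = attenuation(mics, [1.0, 2.0])      # output keeps the energy
out_of_sector = attenuation(mics, [0.5, 1.0])  # output is attenuated
```

  • An attenuation near 1 suggests the selected sector contains the source, while a larger value suggests the source lies outside it.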
  • There are a number of different ways to search through the sectors S0 . . . S17 for the sector containing the sound source 104. For example, by comparing the input signal energies for the microphones M0 and M3 at the far ends of the array it is possible to determine whether the sound source 104 is to one side or the other of the default sector S4. For example, in some cases the correct sector may be “behind” the microphone array 102, e.g., in sectors S9 . . . S17. In many cases the mounting of the microphone array may introduce a built-in attenuation of sounds coming from these sectors such that there is a minimum attenuation, e.g., of about 1 dB, when the source 104 is located in any of these sectors. Consequently it may be determined from the input signal attenuation whether the source 104 is “in front” or “behind” the microphone array 102.
  • As a first approximation, the sound source 104 might be expected to be closer to the microphone having the larger input signal energy. In the example depicted in FIG. 1C, it would be expected that the right hand microphone M3 would have the larger input signal energy and, by process of elimination, the sound source 104 would be in one of sectors S6, S7, S8, S9, S10, S11, S12. Preferably, the next sector selected is one that is approximately 90 degrees away from the initial sector S4 in a direction toward the right hand microphone M3, e.g., sector S8. The input signal energy attenuation for sector S8 may be determined as indicated at 124. If the attenuation is not the optimum value another sector may be selected at 126. By way of example, the next sector may be one that is approximately 45 degrees away from the previous sector in the direction back toward the initial sector, e.g., sector S6. Again the input signal energy attenuation may be determined and compared to the optimum attenuation. If the input signal energy is not close to the optimum only two sectors remain in this example. Thus, for the example depicted in FIG. 1C, in a maximum of four sector switches, the correct sector may be determined. The process of determining the input signal energy attenuation and switching between different listening sectors may be accomplished in about 100 milliseconds if the input signal is sufficiently strong.
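  • The tracking loop of FIG. 1D can be sketched as follows. For simplicity this sketch steps through the sectors sequentially from the default sector rather than using the coarse-to-fine search described above, so it may need more sector switches. The function name, the tolerance and the callable `measure_attenuation` are assumptions; the callable stands for applying a sector's filter settings and measuring the resulting attenuation.

```python
def find_source_sector(measure_attenuation, num_sectors=18, start=4,
                       tolerance=0.1):
    """Return the first sector whose attenuation is close to 1.

    measure_attenuation is a hypothetical callable that takes a sector
    index and returns the attenuation measured for that sector.
    Returns None if no sector is within the tolerance.
    """
    for offset in range(num_sectors):
        sector = (start + offset) % num_sectors
        if abs(measure_attenuation(sector) - 1.0) <= tolerance:
            return sector
    return None


# Hypothetical measurement: attenuation grows with distance from sector S7.
found = find_source_sector(lambda sector: 1.0 + abs(sector - 7))
not_found = find_source_sector(lambda sector: 5.0)
```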
  • Sound source location as described above may be used in conjunction with a sound source location and characterization technique referred to herein as “acoustic radar”. FIG. 1E depicts an example of a sound source location and characterization apparatus 130 having a microphone array 102 described above coupled to an electronic device 132 having a processor 134 and memory 136. The device may be a video game, television or other consumer electronic device. The processor 134 may execute instructions that implement the FIR filters and time delays described above. The memory 136 may contain data 138 relating to pre-calibration of a plurality of listening zones. By way of example the pre-calibrated listening zones may include wedge shaped listening sectors S0, S1, S2, S3, S4, S5, S6, S7, S8.
  • The instructions run by the processor 134 may operate the apparatus 130 according to a method as set forth in the flow diagram 131 of FIG. 1F. Sound sources 104, 105 within the listening zones can be detected using the microphone array 102. One sound source 104 may be of interest to the device 132 or a user of the device. Another sound source 105 may be a source of background noise or otherwise not of interest to the device 132 or its user. Once the microphone array 102 detects a sound the apparatus 130 determines which listening zone contains the sound's source 104 as indicated at 133 of FIG. 1F. By way of example, the iterative sound source sector location routine described above with respect to FIGS. 1C-1D may be used to determine the pre-calibrated listening zones containing the sound sources 104, 105 (e.g., sectors S3 and S6 respectively).
  • Once a listening zone containing the sound source has been identified, the microphone array may be refocused on the sound source, e.g., using adaptive beam forming. The use of adaptive beam forming techniques is described, e.g., in US Patent Application Publication Number 2005/0047611 A1 to Xiadong Mao, which is incorporated herein by reference. The sound source 104 may then be characterized as indicated at 135, e.g., through analysis of an acoustic spectrum of the sound signals originating from the sound source. Specifically, a time domain signal from the sound source may be analyzed over a predetermined time window and a fast Fourier transform (FFT) may be performed to obtain a frequency distribution characteristic of the sound source. The detected frequency distribution may be compared to a known acoustic model. The known acoustic model may be a frequency distribution generated from training data obtained from a known source of sound. A number of different acoustic models may be stored as part of the data 138 in the memory 136 or other storage medium and compared to the detected frequency distribution. By comparing the detected sounds from the sources 104, 105 against these acoustic models a number of different possible sound sources may be identified.
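  • The comparison of a detected frequency distribution against stored acoustic models can be sketched as follows. This is an illustration under stated assumptions: a direct discrete Fourier transform is used in place of an FFT to keep the example self-contained, the models are normalized magnitude spectra, and the squared-difference distance, the function names and the model labels are hypothetical choices.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..n/2 of a real-valued window."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def normalise(spectrum):
    """Scale a spectrum so its bins sum to 1 (unchanged if all zero)."""
    total = sum(spectrum)
    return [v / total for v in spectrum] if total else list(spectrum)


def closest_model(samples, models):
    """Return the name of the model nearest to the detected spectrum."""
    detected = normalise(magnitude_spectrum(samples))

    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(detected, models[name]))

    return min(models, key=distance)


def tone(bin_index, amplitude=1.0, n=16):
    """Hypothetical training signal: a cosine centred on one DFT bin."""
    return [amplitude * math.cos(2 * math.pi * bin_index * t / n)
            for t in range(n)]


models = {"low": normalise(magnitude_spectrum(tone(2))),
          "high": normalise(magnitude_spectrum(tone(5)))}
label = closest_model(tone(5, amplitude=0.3), models)
```

  • Normalizing the spectra makes the comparison insensitive to overall loudness, which is one possible design choice among many.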
  • Based upon the characterization of the sound source 104, 105, the apparatus 130 may take appropriate action depending upon whether the sound source is of interest or not. For example, if the sound source 104 is determined to be one of interest to the device 132, the apparatus may emphasize or amplify sounds coming from sector S3 and/or take other appropriate action. For example, if the device 132 is a video game controller and the source 104 is a video game player, the device 132 may execute game instructions such as “jump” or “swing” in response to sounds from the source 104 that are interpreted as game commands. Similarly, if the sound source 105 is determined not to be of interest to the device 132 or its user, the device may filter out sounds coming from sector S6 or take other appropriate action. In some embodiments, for example, an icon may appear on a display screen indicating the listening zone containing the sound source and the type of sound source.
  • In some embodiments, amplifying sound or taking other appropriate action may include reducing noise disturbances associated with a source of sound. For example, a noise disturbance of an audio signal associated with sound source 104 may be magnified relative to a remaining component of the audio signal. Then, a sampling rate of the audio signal may be decreased and an even order derivative is applied to the audio signal having the decreased sampling rate to define a detection signal. Then, the noise disturbance of the audio signal may be adjusted according to a statistical average of the detection signal. A system capable of canceling disturbances associated with an audio signal, a video game controller, and an integrated circuit for reducing noise disturbances associated with an audio signal are included. Details of such a technique are described, e.g., in commonly-assigned U.S. patent application Ser. No. 10/820,469, to Xiadong Mao entitled “METHOD AND APPARATUS TO DETECT AND REMOVE AUDIO DISTURBANCES”, which was filed Apr. 7, 2004 and published on Oct. 13, 2005 as US Patent Application Publication 20050226431, the entire disclosures of which are incorporated herein by reference.
  • By way of example, the apparatus 130 may be used in a baby monitoring application. Specifically, an acoustic model stored in the memory 136 may include a frequency distribution characteristic of a baby or even of a particular baby. Such a sound may be identified as being of interest to the device 132 or its user. Frequency distributions for other known sound sources, e.g., a telephone, television, radio, computer, persons talking, etc., may also be stored in the memory 136. These sound sources may be identified as not being of interest.
  • Sound source location and characterization apparatus and methods may be used in ultrasonic- and sonic-based consumer electronic remote controls, e.g., as described in commonly assigned U.S. patent application Ser. No. 11/418,993 to Steven Osman, entitled “SYSTEM AND METHOD FOR CONTROL BY AUDIBLE DEVICE” (attorney docket no. SCEAJP 3.0-001), the entire disclosures of which are incorporated herein by reference. Specifically, a sound received by the microphone array 102 may be analyzed to determine whether or not it has one or more predetermined characteristics. If it is determined that the sound does have one or more predetermined characteristics, at least one control signal may be generated for the purpose of controlling at least one aspect of the device 132.
  • In some embodiments of the present invention, the pre-calibrated listening zone Z may correspond to the field-of-view of a camera. For example, as illustrated in FIGS. 1G-1H an audio-video apparatus 140 may include a microphone array 102 and signal filters F0, F1, F2, F3, e.g., as described above, and an image capture unit 142. By way of example, the image capture unit 142 may be a digital camera. An example of a suitable digital camera is a color digital camera sold under the name “EyeToy” by Logitech of Fremont, Calif. The image capture unit 142 may be mounted in a fixed position relative to the microphone array 102, e.g., by attaching the microphone array 102 to the image capture unit 142 or vice versa. Alternatively, both the microphone array 102 and image capture unit 142 may be attached to a common frame or mount (not shown). Preferably, the image capture unit 142 is oriented such that an optical axis 144 of its lens system 146 is aligned parallel to an axis perpendicular to a common plane of the microphones M0, M1, M2, M3 of the microphone array 102. The lens system 146 may be characterized by a volume of focus FOV that is sometimes referred to as the field of view of the image capture unit. In general, objects outside the field of view FOV do not appear in images generated by the image capture unit 142. The settings of the filters F0, F1, F2, F3 may be pre-calibrated such that the microphone array 102 has a listening zone Z that corresponds to the field of view FOV of the image capture unit 142. As used herein, the listening zone Z may be said to “correspond” to the field of view FOV if there is a significant overlap between the field of view FOV and the listening zone Z. As used herein, there is “significant overlap” if an object within the field of view FOV is also within the listening zone Z and an object outside the field of view FOV is also outside the listening zone Z. 
It is noted that the foregoing definitions of the terms “correspond” and “significant overlap” within the context of the embodiment depicted in FIGS. 1G-1H allow for the possibility that an object may be within the listening zone Z and outside the field of view FOV.
  • The listening zone Z may be pre-calibrated as described above, e.g., by adjusting FIR filter coefficients and TDA values for the filters F0, F1, F2, F3 using one or more known sources placed at various locations within the field of view FOV during the calibration stage. The FIR filter coefficients and TDA values are selected (e.g., using ICA) such that sounds from a source 104 located within the FOV are detected and sounds from a source 106 outside the FOV are filtered out. The apparatus 140 allows for improved processing of video and audio images. By pre-calibrating a listening zone Z to correspond to the field of view FOV of the image capture unit 142 sounds originating from sources within the FOV may be enhanced while those originating outside the FOV may be attenuated. Applications for such an apparatus include audio-video (AV) chat.
  • Although only a single pre-calibrated listening sector is depicted in FIGS. 1G-1H, embodiments of the present invention may use multiple pre-calibrated listening sectors in conjunction with a camera. For example, FIGS. 1I-1J depict an apparatus 150 having a microphone array 102 and an image capture unit 152 (e.g., a digital camera) that is mounted to one or more pointing actuators 154 (e.g., servo-motors). The microphone array 102, image capture unit 152 and actuators may be coupled to a controller 156 having a processor 157 and memory 158. Software data 155 stored in the memory 158 and instructions 159 stored in the memory 158 and executed by the processor 157 may implement the signal filter functions described above. The software data may include FIR filter coefficients and TDA values that correspond to a set of pre-calibrated listening zones, e.g., nine wedge-shaped sectors S0 . . . S8 of twenty degrees each covering a 180 degree region in front of the microphone array 102. The pointing actuators 154 may point the image capture unit 152 in a viewing direction in response to signals generated by the processor 157. In embodiments of the present invention a listening zone containing a sound source 104 may be determined, e.g., as described above with respect to FIGS. 1C-1D. Once the sector containing the sound source 104 has been determined, the actuators 154 may point the image capture unit 152 in a direction of the particular pre-calibrated listening zone containing the sound source 104 as shown in FIG. 1J. The microphone array 102 may remain in a fixed position while the pointing actuators point the camera in the direction of a selected listening zone.
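  • Pointing the image capture unit at a selected sector requires converting a sector index into a viewing angle. The following minimal sketch assumes nine 20 degree sectors spanning the 180 degree region in front of the array, with angles measured from the array's forward axis and negative values to one side; the function name and this angle convention are assumptions, and the actuator interface is not modeled.

```python
def sector_center_angle(sector, num_sectors=9, sector_width=20.0):
    """Return the center angle in degrees of a wedge-shaped sector.

    Angles are measured from the forward axis of the array, so the
    middle sector of an odd number of sectors is centered on 0 degrees.
    """
    if not 0 <= sector < num_sectors:
        raise ValueError("sector index out of range")
    span = num_sectors * sector_width
    return -span / 2.0 + (sector + 0.5) * sector_width


# Pan angles for the first, middle and last sectors S0, S4 and S8.
angles = [sector_center_angle(index) for index in (0, 4, 8)]
```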
  • Part of the preceding discussion refers to filtering of the input signals xm(t) from the microphones M0 . . . M3 with the filters F0 . . . F3 to produce an output signal y(t). By way of example, and without limitation, such filtering may proceed as discussed below with respect to FIGS. 2-3. FIG. 2 depicts a system 200 having a microphone array 102 of M+1 microphones M0, M1, . . . , MM. Each microphone is connected to one of M+1 corresponding filters 202 0, 202 1, . . . , 202 M. Each of the filters 202 0, 202 1, . . . , 202 M includes a corresponding set of N+1 filter taps 204 00, . . . , 204 0N, 204 10, . . . , 204 1N, 204 M0, . . . , 204 MN. Each filter tap 204 mi includes a finite impulse response filter bmi, where m=0 . . . M, i=0 . . . N. Except for the first filter tap 204 m0 in each filter 202 m, the filter taps 204 mi also include delays indicated by z-transforms z−1. Each delay section introduces a unit integer delay to the input signal xm(t). The delays and filter taps may be implemented in hardware or software or a combination of both hardware and software. Each filter 202 m produces a corresponding output ym(t), which may be regarded as the components of a combined output y(t) of the filters 202 m. Fractional delays may be applied to each of the output signals ym(t) as follows.
  • An output ym(t) from a given filter tap 204 mi is just the convolution of the input signal to filter tap 204 mi with the corresponding finite impulse response coefficient bmi. It is noted that for all filter taps 204 mi except for the first one 204 m0 the input to the filter tap is just the output of the delay section z−1 of the preceding filter tap 204 m(i−1). The input signal from the microphones in the array 102 may be represented as an (M+1)-dimensional vector: x(t)=(x0(t), x1(t), . . . , xM(t)), where M+1 is the number of microphones in the array.
  • Thus, the output of a given filter 202 m may be represented by: ym(t)=xm(t)*bm0+xm(t−1)*bm1+xm(t−2)*bm2+ . . . +xm(t−N)*bmN, where the symbol “*” represents the convolution operation. Convolution between two discrete time functions f(t) and g(t) is defined as
  • $$(f * g)(t) = \sum_{n} f(n)\, g(t - n).$$
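  • The filter output defined above can be evaluated directly for scalar coefficients, as in the following minimal sketch. The function name and the sample values are hypothetical, and samples before the start of the signal are assumed to be zero.

```python
def fir_output(samples, coefficients, t):
    """Compute y(t) = sum over i of b_i * x(t - i) for one microphone.

    samples holds x(0), x(1), ... and coefficients holds b_0 ... b_N.
    Terms that would index before the start of the signal are skipped,
    which is equivalent to assuming zero input before time 0.
    """
    return sum(b * samples[t - i]
               for i, b in enumerate(coefficients) if t - i >= 0)


# Hypothetical input samples and a two-tap filter.
x = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.25]
y2 = fir_output(x, b, 2)  # 0.5 * x(2) + 0.25 * x(1)
y0 = fir_output(x, b, 0)  # only the first tap contributes
```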
  • The general problem in audio signal processing is to select the values of the finite impulse response filter coefficients bm0, bm1, . . . , bmN that best separate out different sources of sound from the signal ym(t).
  • If the signals xm(t) and ym(t) are discrete time signals each delay z−1 is necessarily an integer delay and the size of the delay is inversely related to the maximum frequency of the microphone. This ordinarily limits the resolution of the system 200. A higher than normal resolution may be obtained if it is possible to introduce a fractional time delay Δ into the signal ym(t) so that:

  • $$y_m(t+\Delta) = x_m(t+\Delta) * b_{m0} + x_m(t-1+\Delta) * b_{m1} + x_m(t-2+\Delta) * b_{m2} + \ldots + x_m(t-N+\Delta) * b_{mN},$$
  • where Δ is between zero and ±1. In embodiments of the present invention, a fractional delay, or its equivalent, may be obtained as follows. First, the signal xm(t) is delayed by j samples. Each of the finite impulse response filter coefficients bmi (where i=0, 1, . . . N) may be represented as a (J+1)-dimensional column vector
  • $$b_{mi} = \begin{bmatrix} b_{mi0} \\ b_{mi1} \\ \vdots \\ b_{miJ} \end{bmatrix}$$
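  • and ym(t) may be rewritten as:
  • $$y_m(t) = \begin{bmatrix} x_m(t) \\ x_m(t-1) \\ \vdots \\ x_m(t-J) \end{bmatrix}^T * \begin{bmatrix} b_{m00} \\ b_{m01} \\ \vdots \\ b_{m0J} \end{bmatrix} + \begin{bmatrix} x_m(t-1) \\ x_m(t-2) \\ \vdots \\ x_m(t-J-1) \end{bmatrix}^T * \begin{bmatrix} b_{m10} \\ b_{m11} \\ \vdots \\ b_{m1J} \end{bmatrix} + \ldots + \begin{bmatrix} x_m(t-N-J) \\ x_m(t-N-J+1) \\ \vdots \\ x_m(t-N) \end{bmatrix}^T * \begin{bmatrix} b_{mN0} \\ b_{mN1} \\ \vdots \\ b_{mNJ} \end{bmatrix}$$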
  • When ym(t) is represented in the form shown above one can interpolate the value of ym(t) for any fractional value of t=t+Δ. Specifically, three values of ym(t) can be used in a polynomial interpolation. The expected statistical precision of the fractional value Δ is inversely proportional to J+1, which is the number of “rows” in the immediately preceding expression for ym(t).
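  • A polynomial interpolation from three values can be sketched as follows. The choice of the three neighboring samples ym(t−1), ym(t) and ym(t+1), the quadratic fit and the function name are assumptions made for illustration.

```python
def interpolate_fractional(y_prev, y_now, y_next, delta):
    """Evaluate the quadratic through three samples at offset delta.

    y_prev, y_now and y_next are the values at t-1, t and t+1, and
    delta is a fractional offset from t between -1 and +1.
    """
    if not -1.0 <= delta <= 1.0:
        raise ValueError("delta must lie between -1 and +1")
    slope = (y_next - y_prev) / 2.0
    curvature = (y_next - 2.0 * y_now + y_prev) / 2.0
    return y_now + slope * delta + curvature * delta * delta


# Samples of y = t^2 at t = -1, 0, 1, evaluated half a sample after t = 0.
value = interpolate_fractional(1.0, 0.0, 1.0, 0.5)
```

  • At delta equal to −1, 0 or +1 the function returns the original samples, so the interpolation is consistent with the integer-delay case.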
  • The quantity t+Δ may be regarded as a mathematical abstraction to explain the idea in the time domain. In practice, one need not estimate the exact “t+Δ”. Instead, the signal ym(t) may be transformed into the frequency domain, so there is no such explicit “t+Δ”. Instead an estimation of a frequency-domain function F(bi) is sufficient to provide the equivalent of a fractional delay Δ. The above equation for the time domain output signal ym(t) may be transformed from the time domain to the frequency domain, e.g., by taking a Fourier transform, and the resulting equation may be solved for the frequency domain output signal Ym(k). This is equivalent to performing a Fourier transform (e.g., with a fast Fourier transform (FFT)) for J+1 frames where each frequency bin in the Fourier transform is a (J+1)×1 column vector. The number of frequency bins is equal to N+1.
  • The finite impulse response filter coefficients bmij for each row of the equation above may be determined by taking a Fourier transform of x(t) and determining the bmij through semi-blind source separation. Specifically, each “row” of the above equation becomes:
  • $$\begin{aligned} X_{m0} &= \mathrm{FT}(x_m(t, t-1, \ldots, t-N)) = [X_{00}, X_{01}, \ldots, X_{0N}] \\ X_{m1} &= \mathrm{FT}(x_m(t-1, t-2, \ldots, t-(N+1))) = [X_{10}, X_{11}, \ldots, X_{1N}] \\ &\ \ \vdots \\ X_{mJ} &= \mathrm{FT}(x_m(t-J, t-J-1, \ldots, t-(N+J))) = [X_{J0}, X_{J1}, \ldots, X_{JN}] \end{aligned}$$
  • where FT( ) represents taking the Fourier transform of the quantity in parentheses.
  • For an array having M+1 microphones, the quantities Xmj are generally the components of (M+1)-dimensional vectors. By way of example, for a 4-channel microphone array, there are 4 input signals: x0(t), x1(t), x2(t), and x3(t). The 4-channel inputs xm(t) are transformed into the frequency domain, and collected as a 1×4 vector “Xjk”. The outer product of the vector Xjk becomes a 4×4 matrix, and the statistical average of this matrix becomes a “Covariance” matrix, which shows the correlation between every vector element.
  • By way of example, the four input signals x0(t), x1(t), x2(t) and x3(t) may be transformed into the frequency domain with J+1=10 blocks. Specifically:
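  • For channel 0:
  • $$\begin{aligned} X_{00} &= \mathrm{FT}([x_0(t-0),\ x_0(t-1),\ x_0(t-2),\ \ldots,\ x_0(t-N-1+0)]) \\ X_{01} &= \mathrm{FT}([x_0(t-1),\ x_0(t-2),\ x_0(t-3),\ \ldots,\ x_0(t-N-1+1)]) \\ &\ \ \vdots \\ X_{09} &= \mathrm{FT}([x_0(t-9),\ x_0(t-10),\ x_0(t-11),\ \ldots,\ x_0(t-N-1+9)]) \end{aligned}$$
  • For channel 1:
  • $$\begin{aligned} X_{10} &= \mathrm{FT}([x_1(t-0),\ x_1(t-1),\ x_1(t-2),\ \ldots,\ x_1(t-N-1+0)]) \\ X_{11} &= \mathrm{FT}([x_1(t-1),\ x_1(t-2),\ x_1(t-3),\ \ldots,\ x_1(t-N-1+1)]) \\ &\ \ \vdots \\ X_{19} &= \mathrm{FT}([x_1(t-9),\ x_1(t-10),\ x_1(t-11),\ \ldots,\ x_1(t-N-1+9)]) \end{aligned}$$
  • For channel 2:
  • $$\begin{aligned} X_{20} &= \mathrm{FT}([x_2(t-0),\ x_2(t-1),\ x_2(t-2),\ \ldots,\ x_2(t-N-1+0)]) \\ X_{21} &= \mathrm{FT}([x_2(t-1),\ x_2(t-2),\ x_2(t-3),\ \ldots,\ x_2(t-N-1+1)]) \\ &\ \ \vdots \\ X_{29} &= \mathrm{FT}([x_2(t-9),\ x_2(t-10),\ x_2(t-11),\ \ldots,\ x_2(t-N-1+9)]) \end{aligned}$$
  • For channel 3:
  • $$\begin{aligned} X_{30} &= \mathrm{FT}([x_3(t-0),\ x_3(t-1),\ x_3(t-2),\ \ldots,\ x_3(t-N-1+0)]) \\ X_{31} &= \mathrm{FT}([x_3(t-1),\ x_3(t-2),\ x_3(t-3),\ \ldots,\ x_3(t-N-1+1)]) \\ &\ \ \vdots \\ X_{39} &= \mathrm{FT}([x_3(t-9),\ x_3(t-10),\ x_3(t-11),\ \ldots,\ x_3(t-N-1+9)]) \end{aligned}$$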
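  • The construction of the shifted frequency-domain blocks for one channel can be sketched as follows. This is an illustration under stated assumptions: a direct discrete Fourier transform replaces the FFT to keep the example self-contained, each block is taken to contain `block_length` consecutive samples ending at t−j, and the function names and sample values are hypothetical.

```python
import cmath


def dft(block):
    """Direct discrete Fourier transform of a list of samples."""
    n = len(block)
    return [sum(block[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                for i in range(n))
            for k in range(n)]


def shifted_spectra(channel, t, block_length, num_frames):
    """Return the spectra X_m0 ... X_m(num_frames-1) for one channel.

    Frame j transforms the block x(t-j), x(t-j-1), ...,
    x(t-j-block_length+1), so successive frames are shifted by one
    sample.
    """
    oldest = t - (num_frames - 1) - (block_length - 1)
    if oldest < 0 or t >= len(channel):
        raise ValueError("not enough samples for the requested frames")
    spectra = []
    for j in range(num_frames):
        block = [channel[t - j - i] for i in range(block_length)]
        spectra.append(dft(block))
    return spectra


# Hypothetical channel: a ramp of 32 samples, three frames of four samples.
channel = [float(v) for v in range(32)]
spectra = shifted_spectra(channel, t=20, block_length=4, num_frames=3)
```

  • Repeating the same call for each microphone channel yields the quantities from which the per-frame, per-bin vectors are assembled.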
  • By way of example, 10 frames may be used to construct a fractional delay. For every frame j, where j=0:9, and for every frequency bin k, where k=0:N−1, one can construct a 1×4 vector:

  • $$X_{jk} = [X_{0j}(k),\ X_{1j}(k),\ X_{2j}(k),\ X_{3j}(k)]$$
  • The vector Xjk is fed into the SBSS algorithm to find the filter coefficients bjk. The SBSS algorithm is an independent component analysis (ICA) based on 2nd-order independence, but the mixing matrix A (e.g., a 4×4 matrix for a 4-mic array) is replaced with a 4×1 mixing weight vector bjk, which is the diagonal of A1=A*C−1 (i.e., bjk=Diagonal(A1)), where C−1 is the inverse eigenmatrix obtained from the calibration procedure described above. It is noted that the frequency domain calibration signal vectors X′jk may be generated as described in the preceding discussion.
  • The mixing matrix A may be approximated by a runtime covariance matrix Cov(j,k)=E((Xjk)T*Xjk), where E refers to the operation of determining the expectation value and (Xjk)T is the transpose of the vector Xjk. The components of each vector bjk are the corresponding filter coefficients for each frame j and each frequency bin k, i.e.,

  • b jk =[b 0j(k), b 1j(k), b 2j(k), b 3j(k)].
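  • The runtime covariance estimate can be illustrated with a short sketch. The following assumes that the expectation E is approximated by a plain average over a set of observed vectors, and it conjugates one factor of the outer product, which reduces to the transpose written above for real-valued data. The function name covariance is hypothetical.

```python
def covariance(vectors):
    """Average the outer products of equal-length (possibly complex) vectors.

    Entry [a][b] is the mean of conj(x[a]) * x[b] over all vectors x.
    """
    m = len(vectors[0])
    cov = [[0j] * m for _ in range(m)]
    for x in vectors:
        for a in range(m):
            for b in range(m):
                cov[a][b] += x[a].conjugate() * x[b]
    count = len(vectors)
    return [[value / count for value in row] for row in cov]


# Two uncorrelated observations give a diagonal covariance matrix.
cov = covariance([[1, 0], [0, 1]])
```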
  • The independent frequency-domain components of the individual sound sources making up each vector Xjk may be determined from:

  • S(j,k)T =b jk −1 ·X jk=[(b 0j(k))−1 X 0j(k), (b 1j(k))−1 X 1j(k), (b 2j(k))−1 X 2j(k), (b 3j(k))−1 X 3j(k)]
  • where each S(j,k)T is a 1×4 vector containing the independent frequency-domain components of the original input signal x(t).
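  • Because bjk is a vector, its “inverse” is applied element by element. A minimal sketch, assuming every coefficient is non-zero (the function name separate is hypothetical):

```python
def separate(b, x):
    """Return [x_m / b_m] for one frame j and one frequency bin k."""
    return [x_m / b_m for b_m, x_m in zip(b, x)]


# Four channels: each component is scaled by the reciprocal of its coefficient.
s = separate([2, 4, 1, 0.5], [2, 2, 3, 1])
```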
  • The ICA algorithm is based on “Covariance” independence, in the microphone array 102. It is assumed that there are always M+1 independent components (sound sources) and that their 2nd-order statistics are independent. In other words, the cross-correlations between the signals x0(t), x1(t), x2(t) and x3(t) should be zero. As a result, the non-diagonal elements in the covariance matrix Cov(j,k) should be zero as well.
  • Conversely, if it is known that there are M+1 signal sources, one can determine their cross-correlation “covariance matrix” and look for a matrix A that de-correlates it, i.e., a matrix A that makes the covariance matrix Cov(j,k) diagonal (all non-diagonal elements equal to zero). Such a matrix A is the “unmixing matrix” that holds the recipe to separate out the 4 sources.
  • Because solving for the “unmixing matrix” A is an “inverse problem”, it is actually very complicated, and there is normally no deterministic mathematical solution for A. Instead, an initial guess of A is made, and then for each signal vector xm(t) (m=0,1 . . . M), A is adaptively updated in small amounts (called the adaptation step size). In the case of a four-microphone array, the adaptation of A normally involves determining the inverse of a 4×4 matrix in the original ICA algorithm. Ideally, the adapted A converges toward the true A. According to embodiments of the present invention, through the use of semi-blind source separation, the unmixing matrix A becomes a vector A1, since it has already been decorrelated by the inverse eigenmatrix C−1, which is the result of the prior calibration described above.
  • Multiplying the run-time covariance matrix Cov(j,k) with the pre-calibrated inverse eigenmatrix C−1 essentially picks up the diagonal elements of A and makes them into a vector A1. Each element of A1 is the strongest cross-correlation, and the inverse of A will essentially remove this correlation. Thus, embodiments of the present invention simplify the conventional ICA adaptation procedure: in each update, the inverse of A becomes a vector inverse b−1. It is noted that computing a matrix inverse has N-cubic complexity, while computing a vector inverse has N-linear complexity. Specifically, for the case of N=4, the matrix inverse computation requires 64 times more computation than the vector inverse computation.
  • Also, by cutting an (M+1)×(M+1) matrix to an (M+1)×1 vector, the adaptation becomes much more robust, because it requires far fewer parameters (referred to mathematically as “degrees of freedom”) and has considerably fewer problems with numeric stability. Since SBSS reduces the number of degrees of freedom by a factor of (M+1), the adaptation converges faster. This is highly desirable since, in a real-world acoustic environment, sound sources keep changing, i.e., the unmixing matrix A changes very fast. The adaptation of A has to be fast enough to track this change and converge to its true value in real time. If instead of SBSS one uses a conventional ICA-based BSS algorithm, it is almost impossible to build a real-time application with an array of more than two microphones. Although some simple microphone arrays use BSS, most, if not all, use only two microphones, and no true BSS system with a 4-microphone array can run in real time on presently available computing platforms.
  • The frequency domain output Y(k) may be expressed as an N+1 dimensional vector Y=[Y0, Y1, . . . ,YN], where each component Yi may be calculated by:
  • $Y_i = [\,X_{i0}\ \ X_{i1}\ \ \cdots\ \ X_{iJ}\,] \cdot [\,b_{i0}\ \ b_{i1}\ \ \cdots\ \ b_{iJ}\,]^{T}$
  • Each component Yi may be normalized to achieve a unit response for the filters.
  • $Y_i = \dfrac{Y_i}{\sum_{j=0}^{J} (b_{ij})^{2}}$
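  • The two steps above, the dot product over the J+1 frames followed by normalization, can be combined in a few lines. This sketch follows the formulas literally, dividing by the sum of the squared coefficients; the function name filter_output is hypothetical.

```python
def filter_output(x_i, b_i):
    """Compute Y_i = sum_j X_ij * b_ij, normalized by sum_j (b_ij)^2."""
    y = sum(x * b for x, b in zip(x_i, b_i))
    norm = sum(b * b for b in b_i)
    return y / norm


# Three frames with unit coefficients: the output is the mean of the inputs.
y_i = filter_output([1, 2, 3], [1, 1, 1])
```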
  • Although in embodiments of the invention N and J may take on any values, it has been shown in practice that N=511 and J=9 provides a desirable level of resolution, e.g., about 1/10 of a wavelength for an array containing 16 kHz microphones.
  • Signal processing methods that utilize various combinations of the above-described concepts may be implemented in embodiments of the present invention. For example, FIG. 3 depicts a flow diagram of a signal processing method 300 that utilizes the concepts described above with respect to FIG. 2. In the method 300 a discrete time domain input signal xm(t) may be produced from microphones M0 . . . MM as indicated at 302. A listening direction may be determined for the microphone array as indicated at 304, e.g., by computing an inverse eigenmatrix C−1 for a calibration covariance matrix as described above. As discussed above, the listening direction, e.g., one or more listening sectors, may be determined during calibration of the microphone array during design or manufacture or may be re-calibrated at runtime. Specifically, a signal from a source located within a defined listening sector with respect to the microphone array may be recorded for a predetermined period of time. Analysis frames of the signal may be formed at predetermined intervals and the analysis frames may be transformed into the frequency domain. A calibration covariance matrix may be estimated from a vector of the analysis frames that have been transformed into the frequency domain. An eigenmatrix C of the calibration covariance matrix may be computed and an inverse of the eigenmatrix provides the listening direction.
  • At 306, one or more fractional delays may optionally be applied to selected input signals xm(t) other than an input signal x0(t) from a reference microphone M0. Each fractional delay is selected to optimize a signal to noise ratio of a discrete time domain output signal y(t) from the microphone array. The fractional delays are selected such that a signal from the reference microphone M0 is first in time relative to signals from the other microphone(s) of the array. At 308 a fractional time delay Δ may optionally be introduced into the output signal y(t) so that: y(t+Δ)=x(t+Δ)*b0+x(t−1+Δ)*b1+x(t−2+Δ)*b2+ . . . +x(t−N+Δ)*bN, where Δ is between zero and ±1. The fractional delay may be introduced as described above with respect to FIG. 2. Specifically, each time domain input signal xm(t) may be delayed by j+1 frames and the resulting delayed input signals may be transformed to a frequency domain to produce a frequency domain input signal vector Xjk for each of k=0:N frequency bins.
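  • The expression for y(t+Δ) can be illustrated in the time domain. The sketch below evaluates x at fractional positions by linear interpolation between neighboring samples, which is an assumption made only for illustration; the embodiments above construct the fractional delay from delayed frames in the frequency domain. The names sample_at and fir_output are hypothetical.

```python
import math


def sample_at(x, pos):
    """Linearly interpolate the sequence x at a fractional index pos."""
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return x[lo]
    return (1 - frac) * x[lo] + frac * x[lo + 1]


def fir_output(x, b, t, delta=0.0):
    """y(t + delta) = sum_i x(t - i + delta) * b_i for i = 0..N."""
    return sum(sample_at(x, t - i + delta) * b_i for i, b_i in enumerate(b))


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_plain = fir_output(ramp, [0.5, 0.5], t=3)                # no fractional delay
y_shifted = fir_output(ramp, [0.5, 0.5], t=3, delta=0.25)  # quarter-sample shift
```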
  • At 310 the listening direction (e.g., the inverse eigenmatrix C−1) determined at 304 is used in a semi-blind source separation to select the finite impulse response filter coefficients b0, b1 . . . , bN to separate out different sound sources from input signal xm(t). Specifically, filter coefficients for each microphone m, each frame j and each frequency bin k, [b0j(k), b1j(k), . . . bMj(k)] may be computed that best separate out two or more sources of sound from the input signals xm(t). Specifically, a runtime covariance matrix may be generated from each frequency domain input signal vector Xjk. The runtime covariance matrix may be multiplied by the inverse C−1 of the eigenmatrix C to produce a mixing matrix A and a mixing vector may be obtained from a diagonal of the mixing matrix A. The values of filter coefficients may be determined from one or more components of the mixing vector.
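  • The last three steps (multiply the runtime covariance matrix by C−1, take the diagonal, use it as the mixing vector) can be sketched as follows. The matrices are small illustrative values rather than calibration data, and the names matmul and mixing_vector are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[r][i] * b[i][c] for i in range(inner)) for c in range(cols)]
        for r in range(rows)
    ]


def mixing_vector(runtime_cov, c_inv):
    """Return the diagonal of runtime_cov * c_inv."""
    a1 = matmul(runtime_cov, c_inv)
    return [a1[i][i] for i in range(len(a1))]


# 2x2 example: only the diagonal of the product is kept.
b_jk = mixing_vector([[2, 1], [1, 2]], [[1, 0], [0, 0.5]])
```

  • Only the diagonal is retained, so each subsequent update needs element-wise reciprocals rather than a full matrix inverse.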
  • According to embodiments of the present invention, a signal processing method of the type described above with respect to FIGS. 1A-1J, 2 and 3 operating as described above may be implemented as part of a signal processing apparatus 400, as depicted in FIG. 4. The apparatus 400 may include a processor 401 and a memory 402 (e.g., RAM, DRAM, ROM, and the like). In addition, the signal processing apparatus 400 may have multiple processors 401 if parallel processing is to be implemented. The memory 402 includes data and code configured as described above. Specifically, the memory 402 may include signal data 406 which may include a digital representation of the input signals xm(t), and code and/or data implementing the filters 202 0 . . . 202 M with corresponding filter taps 204 mi having delays z−1 and finite impulse response filter coefficients bmi as described above. The memory 402 may also contain calibration data 408, e.g., data representing one or more inverse eigenmatrices C−1 for one or more corresponding pre-calibrated listening zones obtained from calibration of a microphone array 422 as described above. By way of example, the memory 402 may contain eigenmatrices for eighteen 20 degree sectors that encompass a microphone array 422.
  • The apparatus 400 may also include well-known support functions 410, such as input/output (I/O) elements 411, power supplies (P/S) 412, a clock (CLK) 413 and cache 414. The apparatus 400 may optionally include a mass storage device 415 such as a disk drive, CD-ROM drive, tape drive, or the like to store programs and/or data. The apparatus 400 may also optionally include a display unit 416 and user interface unit 418 to facilitate interaction between the apparatus 400 and a user. The display unit 416 may be in the form of a cathode ray tube (CRT) or flat panel screen that displays text, numerals, graphical symbols or images. The user interface 418 may include a keyboard, mouse, joystick, light pen or other device. In addition, the user interface 418 may include a microphone, video camera or other signal transducing device to provide for direct capture of a signal to be analyzed. The processor 401, memory 402 and other components of the system 400 may exchange signals (e.g., code instructions and data) with each other via a system bus 420 as shown in FIG. 4.
  • The microphone array 422 may be coupled to the apparatus 400 through the I/O functions 411. The microphone array may include between about 2 and about 8 microphones, preferably about 4 microphones with neighboring microphones separated by a distance of less than about 4 centimeters, preferably between about 1 centimeter and about 2 centimeters. Preferably, the microphones in the array 422 are omni-directional microphones. An optional image capture unit 423 (e.g., a digital camera) may be coupled to the apparatus 400 through the I/O functions 411. One or more pointing actuators 425 that are mechanically coupled to the camera may exchange signals with the processor 401 via the I/O functions 411.
  • As used herein, the term I/O generally refers to any program, operation or device that transfers data to or from the system 400 and to or from a peripheral device. Every data transfer may be regarded as an output from one device and an input into another. Peripheral devices include input-only devices, such as keyboards and mice, output-only devices, such as printers, as well as devices such as a writable CD-ROM that can act as both an input and an output device. The term “peripheral device” includes external devices, such as a mouse, keyboard, printer, monitor, microphone, game controller, camera, external Zip drive or scanner, as well as internal devices, such as a CD-ROM drive, CD-R drive or internal modem, or other peripherals such as a flash memory reader/writer or hard drive.
  • In certain embodiments of the invention, the apparatus 400 may be a video game unit, which may include a game controller 430 coupled to the processor via the I/O functions 411 either through wires (e.g., a USB cable) or wirelessly. In some embodiments the game controller 430 may be mountable to a user's body. The game controller 430 may have analog joystick controls 431 and conventional buttons 433 that provide control signals commonly used during playing of video games. Such video games may be implemented as processor readable data and/or instructions which may be stored in the memory 402 or other processor readable medium such as one associated with the mass storage device 415.
  • The joystick controls 431 may generally be configured so that moving a control stick left or right signals movement along the X axis, and moving it forward (up) or back (down) signals movement along the Y axis. In joysticks that are configured for three-dimensional movement, twisting the stick left (counter-clockwise) or right (clockwise) may signal movement along the Z axis. These three axes, X, Y and Z, are often referred to as roll, pitch, and yaw, respectively, particularly in relation to an aircraft.
  • The game controller 430 may include a communications interface operable to conduct digital communications with at least one of the processor 401, a game controller 430 or both. The communications interface may include a universal asynchronous receiver transmitter (“UART”). The UART may be operable to receive a control signal for controlling an operation of a tracking device, or for transmitting a signal from the tracking device for communication with another device. Alternatively, the communications interface includes a universal serial bus (“USB”) controller. The USB controller may be operable to receive a control signal for controlling an operation of the tracking device, or for transmitting a signal from the tracking device for communication with another device.
  • In addition, the game controller 430 may include one or more inertial sensors 432, which may provide position and/or orientation information to the processor 401 via an inertial signal. Orientation information may include angular information such as a tilt, roll or yaw of the game controller 430. By way of example, the inertial sensors 432 may include any number and/or combination of accelerometers, gyroscopes or tilt sensors. In a preferred embodiment, the inertial sensors 432 include tilt sensors adapted to sense orientation of the game controller with respect to tilt and roll axes, a first accelerometer adapted to sense acceleration along a yaw axis and a second accelerometer adapted to sense angular acceleration with respect to the yaw axis. An accelerometer may be implemented, e.g., as a MEMS device including a mass mounted by one or more springs with sensors for sensing displacement of the mass relative to one or more directions. Signals from the sensors that are dependent on the displacement of the mass may be used to determine an acceleration of the game controller 430. Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401.
  • By way of example, an accelerometer suitable as the inertial sensor 432 may be a simple mass elastically coupled at three or four points to a frame, e.g., by springs. Pitch and roll axes lie in a plane that intersects the frame, which is mounted to the game controller 430. As the frame (and the game controller 430) rotates about pitch and roll axes, the mass will displace under the influence of gravity and the springs will elongate or compress in a way that depends on the angle of pitch and/or roll. The displacement of the mass can be sensed and converted to a signal that is dependent on the amount of pitch and/or roll. Angular acceleration about the yaw axis or linear acceleration along the yaw axis may also produce characteristic patterns of compression and/or elongation of the springs or motion of the mass that can be sensed and converted to signals that are dependent on the amount of angular or linear acceleration. Such an accelerometer device can measure tilt, roll, angular acceleration about the yaw axis and linear acceleration along the yaw axis by tracking movement of the mass or compression and expansion forces of the springs. There are a number of different ways to track the position of the mass and/or the forces exerted on it, including resistive strain gauge material, photonic sensors, magnetic sensors, hall-effect devices, piezoelectric devices, capacitive sensors, and the like.
  • In addition, the game controller 430 may include one or more light sources 434, such as light emitting diodes (LEDs). The light sources 434 may be used to distinguish one controller from the other. For example, one or more LEDs can accomplish this by flashing or holding an LED pattern code. By way of example, 5 LEDs can be provided on the game controller 430 in a linear or two-dimensional pattern. Although a linear array of LEDs is preferred, the LEDs may alternatively be arranged in a rectangular pattern or an arcuate pattern to facilitate determination of an image plane of the LED array when analyzing an image of the LED pattern obtained by the image capture unit 423. Furthermore, the LED pattern codes may also be used to determine the positioning of the game controller 430 during game play. For instance, the LEDs can assist in identifying tilt, yaw and roll of the controllers. This detection pattern can assist in providing a better user feel in games, such as aircraft flying games, etc. The image capture unit 423 may capture images containing the game controller 430 and light sources 434. Analysis of such images can determine the location and/or orientation of the game controller. Such analysis may be implemented by program code instructions 404 stored in the memory 402 and executed by the processor 401. To facilitate capture of images of the light sources 434 by the image capture unit 423, the light sources 434 may be placed on two or more different sides of the game controller 430, e.g., on the front and on the back (as shown in phantom). Such placement allows the image capture unit 423 to obtain images of the light sources 434 for different orientations of the game controller 430 depending on how the game controller 430 is held by a user.
  • In addition, the light sources 434 may provide telemetry signals to the processor 401, e.g., in pulse code, amplitude modulation or frequency modulation format. Such telemetry signals may indicate which joystick buttons are being pressed and/or how hard such buttons are being pressed. Telemetry signals may be encoded into the optical signal, e.g., by pulse coding, pulse width modulation, frequency modulation or light intensity (amplitude) modulation. The processor 401 may decode the telemetry signal from the optical signal and execute a game command in response to the decoded telemetry signal. Telemetry signals may be decoded from analysis of images of the game controller 430 obtained by the image capture unit 423. Alternatively, the apparatus 400 may include a separate optical sensor dedicated to receiving telemetry signals from the light sources 434. The use of LEDs in conjunction with determining an intensity amount in interfacing with a computer program is described, e.g., in commonly-assigned U.S. patent application Ser. No. 11/429,414, to Richard L. Marks et al., entitled “USE OF COMPUTER IMAGE AND AUDIO PROCESSING IN DETERMINING AN INTENSITY AMOUNT WHEN INTERFACING WITH A COMPUTER PROGRAM” (Attorney Docket No. SONYP052), which is incorporated herein by reference in its entirety. In addition, analysis of images containing the light sources 434 may be used for both telemetry and determining the position and/or orientation of the game controller 430. Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401.
  • The processor 401 may use the inertial signals from the inertial sensor 432 in conjunction with optical signals from light sources 434 detected by the image capture unit 423 and/or sound source location and characterization information from acoustic signals detected by the microphone array 422 to deduce information on the location and/or orientation of the game controller 430 and/or its user. For example, “acoustic radar” sound source location and characterization may be used in conjunction with the microphone array 422 to track a moving voice while motion of the game controller is independently tracked (through the inertial sensor 432 and/or light sources 434). Any number of different combinations of different modes of providing control signals to the processor 401 may be used in conjunction with embodiments of the present invention. Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401.
  • Signals from the inertial sensor 432 may provide part of a tracking information input and signals generated from the image capture unit 423 from tracking the one or more light sources 434 may provide another part of the tracking information input. By way of example, and without limitation, such “mixed mode” signals may be used in a football type video game in which a quarterback pitches the ball to the right after a head-fake movement to the left. Specifically, a game player holding the controller 430 may turn his head to the left and make a sound while making a pitch movement, swinging the controller out to the right as if it were the football. The microphone array 422 in conjunction with “acoustic radar” program code can track the user's voice. The image capture unit 423 can track the motion of the user's head or track other commands that do not require sound or use of the controller. The sensor 432 may track the motion of the game controller (representing the football). The image capture unit 423 may also track the light sources 434 on the controller 430. The user may release the “ball” upon reaching a certain amount and/or direction of acceleration of the game controller 430 or upon a key command triggered by pressing a button on the game controller 430.
  • In certain embodiments of the present invention, an inertial signal, e.g., from an accelerometer or gyroscope, may be used to determine a location of the game controller 430. Specifically, an acceleration signal from an accelerometer may be integrated once with respect to time to determine a change in velocity and the velocity may be integrated with respect to time to determine a change in position. If values of the initial position and velocity at some time are known, then the absolute position may be determined using these values and the changes in velocity and position. Although position determination using an inertial sensor may be made more quickly than using the image capture unit 423 and light sources 434, the inertial sensor 432 may be subject to a type of error known as “drift”, in which errors that accumulate over time can lead to a discrepancy D between the position of the game controller 430 calculated from the inertial signal (shown in phantom) and the actual position of the game controller 430. Embodiments of the present invention allow a number of ways to deal with such errors.
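  • The double integration described above can be sketched for a single axis. The example assumes uniformly spaced acceleration samples and a simple rectangular (Euler) integration rule, and the function name integrate_position is hypothetical. Any bias in the acceleration samples is integrated twice, which is one way the drift discussed above arises.

```python
def integrate_position(accels, dt, p0=0.0, v0=0.0):
    """Integrate acceleration samples twice to estimate position and velocity."""
    position, velocity = p0, v0
    for a in accels:
        velocity += a * dt          # first integration: change in velocity
        position += velocity * dt   # second integration: change in position
    return position, velocity


# Constant acceleration of 2 units for four steps of 0.5 time units.
position, velocity = integrate_position([2.0, 2.0, 2.0, 2.0], dt=0.5)
```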
  • For example, the drift may be cancelled out manually by re-setting the initial position of the game controller 430 to be equal to the current calculated position. A user may use one or more of the buttons on the game controller 430 to trigger a command to re-set the initial position. Alternatively, image-based drift compensation may be implemented by re-setting the current position to a position determined from an image obtained from the image capture unit 423 as a reference. Such image-based drift compensation may be implemented manually, e.g., when the user triggers one or more of the buttons on the game controller 430. Alternatively, image-based drift compensation may be implemented automatically, e.g., at regular intervals of time or in response to game play. Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401.
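  • Image-based drift compensation can be sketched as a tracker whose inertial estimate is overwritten by an image-derived position whenever one is available. This is a one-axis illustration; the class and method names are hypothetical, and leaving the velocity untouched on reset is an assumption.

```python
class InertialTracker:
    """Dead-reckoning position estimate with an image-based reset."""

    def __init__(self, position=0.0, velocity=0.0):
        self.position = position
        self.velocity = velocity

    def step(self, accel, dt):
        """Advance the estimate by one inertial sample."""
        self.velocity += accel * dt
        self.position += self.velocity * dt

    def reset_to_image(self, image_position):
        """Replace the drifted estimate with a position taken from an image."""
        self.position = image_position


tracker = InertialTracker()
tracker.step(accel=1.0, dt=1.0)   # inertial estimate is now 1.0
tracker.reset_to_image(0.25)      # image analysis says the controller is at 0.25
```

  • The reset may be called from a button handler (manual compensation) or from a timer or game event (automatic compensation).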
  • In certain embodiments it may be desirable to compensate for spurious data in the inertial sensor signal. For example the signal from the inertial sensor 432 may be oversampled and a sliding average may be computed from the oversampled signal to remove spurious data from the inertial sensor signal. In some situations it may be desirable to oversample the signal and reject a high and/or low value from some subset of data points and compute the sliding average from the remaining data points. Furthermore, other data sampling and manipulation techniques may be used to adjust the signal from the inertial sensor to remove or reduce the significance of spurious data. The choice of technique may depend on the nature of the signal, computations to be performed with the signal, the nature of game play or some combination of two or more of these. Such techniques may be implemented by program code instructions 404 which may be stored in the memory 402 and executed by the processor 401.
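  • One of the options above, rejecting the high and low values in each window before averaging, can be sketched as follows. The window length and the choice to drop exactly one high and one low sample per window are assumptions, and the function name trimmed_sliding_average is hypothetical.

```python
def trimmed_sliding_average(samples, window):
    """Sliding average that drops the lowest and highest sample per window."""
    if window < 3:
        raise ValueError("window must hold at least 3 samples")
    averages = []
    for end in range(window, len(samples) + 1):
        chunk = sorted(samples[end - window:end])
        kept = chunk[1:-1]  # reject one low and one high value
        averages.append(sum(kept) / len(kept))
    return averages


# A single spurious spike of 100 is removed from every window it falls in.
smoothed = trimmed_sliding_average([1, 1, 100, 1, 1], window=3)
```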
  • The processor 401 may perform digital signal processing on signal data 406 as described above in response to the data 406 and program code instructions of a program 404 stored and retrieved by the memory 402 and executed by the processor module 401. Code portions of the program 404 may conform to any one of a number of different programming languages such as Assembly, C++, JAVA or a number of other languages. The processor module 401 forms a general-purpose computer that becomes a specific purpose computer when executing programs such as the program code 404. Although the program code 404 is described herein as being implemented in software and executed upon a general purpose computer, those skilled in the art will realize that the method of task management could alternatively be implemented using hardware such as an application specific integrated circuit (ASIC) or other hardware circuitry. As such, it should be understood that embodiments of the invention can be implemented, in whole or in part, in software, hardware or some combination of both.
  • In one embodiment, among others, the program code 404 may include a set of processor readable instructions that implement a method having features in common with the method 110 of FIG. 1B, the method 120 of FIG. 1D, the method 140 of FIG. 1F, the method 300 of FIG. 3 or some combination of two or more of these. The program code 404 may generally include one or more instructions that direct the one or more processors to select a pre-calibrated listening zone at runtime and filter out sounds originating from sources outside the pre-calibrated listening zone. The pre-calibrated listening zones may include a listening zone that corresponds to a volume of focus or field of view of the image capture unit 423.
  • The program code may include one or more instructions which, when executed, cause the apparatus 400 to select a pre-calibrated listening sector that contains a source of sound. Such instructions may cause the apparatus to determine whether a source of sound lies within an initial (default) sector or on a particular side of the initial sector. If the source of sound does not lie within the initial sector, the instructions may, when executed, select a different sector on the particular side of the initial sector. The different sector may be characterized by an attenuation of the input signals that is closest to an optimum value. These instructions may, when executed, calculate an attenuation of input signals from the microphone array 422 and compare the attenuation to an optimum value. The instructions may, when executed, cause the apparatus 400 to determine a value of an attenuation of the input signals for one or more sectors and select a sector for which the attenuation is closest to an optimum value.
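  • The final selection rule, choosing the sector whose attenuation is closest to an optimum value, can be sketched in a few lines. The attenuation values and the optimum of 1.0 are illustrative assumptions, and the function name select_sector is hypothetical.

```python
def select_sector(attenuations, optimum):
    """Return the index of the sector whose attenuation is closest to optimum."""
    return min(
        range(len(attenuations)),
        key=lambda sector: abs(attenuations[sector] - optimum),
    )


# Three candidate sectors; sector 1 is closest to the optimum value.
best = select_sector([0.2, 0.9, 0.6], optimum=1.0)
```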
  • The program code 404 may optionally include one or more instructions that direct the one or more processors to produce a discrete time domain input signal xm(t) from the microphones M0 . . . MM, determine a listening sector, and use the listening sector in a semi-blind source separation to select the finite impulse response filter coefficients to separate out different sound sources from input signal xm(t). The program 404 may also include instructions to apply one or more fractional delays to selected input signals xm(t) other than an input signal x0(t) from a reference microphone M0. Each fractional delay may be selected to optimize a signal to noise ratio of a discrete time domain output signal y(t) from the microphone array. The fractional delays may be selected such that a signal from the reference microphone M0 is first in time relative to signals from the other microphone(s) of the array. The program 404 may also include instructions to introduce a fractional time delay Δ into an output signal y(t) of the microphone array so that: y(t+Δ)=x(t+Δ)*b0+x(t−1+Δ)*b1+x(t−2+Δ)*b2+ . . . +x(t−N+Δ)*bN, where Δ is between zero and ±1.
  • The program code 404 may optionally include processor executable instructions including one or more instructions which, when executed, cause the image capture unit 423 to monitor a field of view in front of the image capture unit 423, identify one or more of the light sources 434 within the field of view, detect a change in light emitted from the light source(s) 434, and, in response to detecting the change, trigger an input command to the processor 401. The use of LEDs in conjunction with an image capture device to trigger actions in a game controller is described, e.g., in commonly-assigned U.S. patent application Ser. No. 10/759,782 to Richard L. Marks, filed Jan. 16, 2004 and entitled: METHOD AND APPARATUS FOR LIGHT INPUT DEVICE, which is incorporated herein by reference in its entirety.
  • The program code 404 may optionally include processor executable instructions including one or more instructions which, when executed, use signals from the inertial sensor and signals generated from the image capture unit from tracking the one or more light sources as inputs to a game system, e.g., as described above. The program code 404 may optionally include processor executable instructions including one or more instructions which, when executed compensate for drift in the inertial sensor 432.
  • In addition, the program code 404 may optionally include processor executable instructions including one or more instructions which, when executed, adjust the gearing and mapping of controller manipulations to a game environment. Such a feature allows a user to change the “gearing” of manipulations of the game controller 430 to game state. For example, a 45 degree rotation of the game controller 430 may be geared to a 45 degree rotation of a game object. However, this 1:1 gearing ratio may be modified so that an X degree rotation (or tilt or yaw or “manipulation”) of the controller translates to a Y degree rotation (or tilt or yaw or “manipulation”) of the game object. Gearing may be a 1:1 ratio, 1:2 ratio, 1:X ratio or X:Y ratio, where X and Y can take on arbitrary values. Additionally, mapping of input channel to game control may also be modified over time or instantly. Modifications may comprise changing gesture trajectory models, modifying the location, scale, threshold of gestures, etc. Such mapping may be programmed, random, tiered, staggered, etc., to provide a user with a dynamic range of manipulatives. Modification of the mapping, gearing or ratios can be adjusted by the program code 404 according to game play, game state, through a user modifier button (key pad, etc.) located on the game controller 430, or broadly in response to the input channel. The input channel may include, but may not be limited to, elements of user audio, audio generated by the controller, tracking audio generated by the controller, controller button state, video camera output, controller telemetry data, including accelerometer data, tilt, yaw, roll, position, acceleration and any other data from sensors capable of tracking a user or the user manipulation of an object.
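  • An X:Y gearing ratio amounts to scaling the controller manipulation by Y/X. A minimal sketch, with the function name apply_gearing chosen for illustration only:

```python
def apply_gearing(controller_degrees, x=1.0, y=1.0):
    """Map a controller rotation to a game-object rotation with X:Y gearing."""
    return controller_degrees * y / x


same = apply_gearing(45)               # 1:1 gearing
doubled = apply_gearing(45, x=1, y=2)  # 1:2 gearing
```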
  • In certain embodiments the program code 404 may change the mapping or gearing over time from one scheme or ratio to another scheme or ratio, respectively, in a predetermined time-dependent manner. Gearing and mapping changes can be applied to a game environment in various ways. In one example, a video game character may be controlled under one gearing scheme when the character is healthy and, as the character's health deteriorates, the system may gear the controller commands so the user is forced to exaggerate the movements of the controller to gesture commands to the character. A video game character who becomes disoriented may force a change of mapping of the input channel as users, for example, may be required to adjust input to regain control of the character under a new mapping. Mapping schemes that modify the translation of the input channel to game commands may also change during gameplay. This translation may occur in various ways in response to game state or in response to modifier commands issued under one or more elements of the input channel. Gearing and mapping may also be configured to influence the configuration and/or processing of one or more elements of the input channel.
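One possible reading of a "predetermined time-dependent" gearing change is a linear blend between two ratios. The sketch below assumes that reading; the function name `gearing_at` and its parameters are hypothetical, and the specification does not prescribe any particular interpolation.

```python
def gearing_at(t: float, start_ratio: float, end_ratio: float, duration: float) -> float:
    """Linearly blend from start_ratio to end_ratio over `duration` seconds, then hold end_ratio.

    Ratios are expressed as game-object motion per unit of controller motion.
    """
    if duration <= 0:
        return end_ratio
    # Clamp the elapsed fraction to [0, 1] so the ratio never overshoots.
    frac = min(max(t / duration, 0.0), 1.0)
    return start_ratio + (end_ratio - start_ratio) * frac


# A character weakening over 10 seconds: the ratio drops from 1.0 to 0.5,
# so the same on-screen motion needs progressively larger controller motion.
for t in (0.0, 5.0, 10.0, 20.0):
    print(t, gearing_at(t, start_ratio=1.0, end_ratio=0.5, duration=10.0))
```

A game could equally drive the blend from character health rather than elapsed time by passing a health-derived value in place of `t`.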
  • In addition, a sound emitter 436, e.g., a speaker, a buzzer, a horn or a pipe, may be mounted to the game controller 430. In certain embodiments the sound emitter may be detachably mounted to a “body” of the game controller 430. In “acoustic radar” embodiments wherein the program code 404 locates and characterizes sounds detected with the microphone array 422, the sound emitter 436 may provide an audio signal that can be detected by the microphone array 422 and used by the program code 404 to track the position of the game controller 430. The sound emitter 436 may also be used to provide an additional “input channel” from the game controller 430 to the processor 401. Audio signals from the sound emitter 436 may be periodically pulsed to provide a beacon for the acoustic radar to track location. The audio signals (pulsed or otherwise) may be audible or ultrasonic. The acoustic radar may track the user manipulation of the game controller 430, and such manipulation tracking may include information about the position and orientation (e.g., pitch, roll or yaw angle) of the game controller 430. The pulses may be triggered at an appropriate duty cycle, as one skilled in the art is capable of applying. Pulses may be initiated based on a control signal arbitrated from the system. The apparatus 400 (through the program code 404) may coordinate the dispatch of control signals amongst two or more game controllers 430 coupled to the processor 401 to assure that multiple controllers can be tracked.
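The specification does not say how pulse dispatch is arbitrated among controllers. As one hedged illustration, the sketch below assumes a simple round-robin time-slot scheme in which each controller pulses in its own non-overlapping window; the function `pulse_schedule` and its parameters are hypothetical.

```python
def pulse_schedule(controller_ids, period_ms, pulse_ms):
    """Assign each controller a non-overlapping (start, end) pulse window within one period.

    The period is divided into equal slots, one per controller, and each
    controller emits its pulse at the start of its slot.
    """
    n = len(controller_ids)
    if n == 0:
        return {}
    slot_ms = period_ms / n
    if pulse_ms > slot_ms:
        raise ValueError("pulse does not fit inside one slot")
    return {
        cid: (i * slot_ms, i * slot_ms + pulse_ms)
        for i, cid in enumerate(controller_ids)
    }


schedule = pulse_schedule(["A", "B"], period_ms=100.0, pulse_ms=10.0)
print(schedule)          # {'A': (0.0, 10.0), 'B': (50.0, 60.0)}
print(10.0 / 100.0)      # per-controller duty cycle: 0.1
```

Because the windows never overlap, a pulse detected at a given time within the period can be attributed to exactly one controller. Other schemes, such as giving each controller a distinct pulse frequency, would serve the same purpose.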
  • By way of example, embodiments of the present invention may be implemented on parallel processing systems. Such parallel processing systems typically include two or more processor elements that are configured to execute parts of a program in parallel using separate processors. By way of example, and without limitation, FIG. 5 illustrates a type of cell processor 500 according to an embodiment of the present invention. The cell processor 500 may be used as the processor 401 of FIG. 4. In the example depicted in FIG. 5, the cell processor 500 includes a main memory 502, power processor element (PPE) 504, and a number of synergistic processor elements (SPEs) 506. In the example depicted in FIG. 5, the cell processor 500 includes a single PPE 504 and eight SPE 506. In such a configuration, seven of the SPE 506 may be used for parallel processing and one may be reserved as a back-up in case one of the other seven fails. A cell processor may alternatively include multiple groups of PPEs (PPE groups) and multiple groups of SPEs (SPE groups). In such a case, hardware resources can be shared between units within a group. However, the SPEs and PPEs must appear to software as independent elements. As such, embodiments of the present invention are not limited to use with the configuration shown in FIG. 5.
  • The main memory 502 typically includes both general-purpose and nonvolatile storage, as well as special-purpose hardware registers or arrays used for functions such as system configuration, data-transfer synchronization, memory-mapped I/O, and I/O subsystems. In embodiments of the present invention, a signal processing program 503 and a signal 509 may be resident in main memory 502. The signal processing program 503 may be configured as described with respect to FIG. 1B, 1D, 1F or 3 above or some combination of two or more of these. The signal processing program 503 may run on the PPE. The program 503 may be divided up into multiple signal processing tasks that can be executed on the SPEs and/or PPE.
  • By way of example, the PPE 504 may be a 64-bit PowerPC Processor Unit (PPU) with associated caches L1 and L2. The PPE 504 is a general-purpose processing unit, which can access system management resources (such as the memory-protection tables, for example). Hardware resources may be mapped explicitly to a real address space as seen by the PPE. Therefore, the PPE can address any of these resources directly by using an appropriate effective address value. A primary function of the PPE 504 is the management and allocation of tasks for the SPEs 506 in the cell processor 500.
  • Although only a single PPE is shown in FIG. 5, in some cell processor implementations, such as cell broadband engine architecture (CBEA), the cell processor 500 may have multiple PPEs organized into PPE groups, of which there may be more than one. These PPE groups may share access to the main memory 502. Furthermore, the cell processor 500 may include two or more groups of SPEs. The SPE groups may also share access to the main memory 502. Such configurations are within the scope of the present invention.
  • Each SPE 506 includes a synergistic processor unit (SPU) and its own local storage area LS. The local storage LS may include one or more separate areas of memory storage, each one associated with a specific SPU. Each SPU may be configured to only execute instructions (including data load and data store operations) from within its own associated local storage domain. In such a configuration, data transfers between the local storage LS and elsewhere in a system 500 may be performed by issuing direct memory access (DMA) commands from the memory flow controller (MFC) to transfer data to or from the local storage domain (of the individual SPE). The SPUs are less complex computational units than the PPE 504 in that they do not perform any system management functions. The SPUs generally have a single instruction, multiple data (SIMD) capability and typically process data and initiate any required data transfers (subject to access properties set up by the PPE) in order to perform their allocated tasks. The purpose of the SPU is to enable applications that require a higher computational unit density and can effectively use the provided instruction set. A significant number of SPEs in a system managed by the PPE 504 allows for cost-effective processing over a wide range of applications.
  • Each SPE 506 may include a dedicated memory flow controller (MFC) that includes an associated memory management unit that can hold and process memory-protection and access-permission information. The MFC provides the primary method for data transfer, protection, and synchronization between main storage of the cell processor and the local storage of an SPE. An MFC command describes the transfer to be performed. Commands for transferring data are sometimes referred to as MFC direct memory access (DMA) commands (or MFC DMA commands).
  • Each MFC may support multiple DMA transfers at the same time and can maintain and process multiple MFC commands. Each MFC DMA data transfer command request may involve both a local storage address (LSA) and an effective address (EA). The local storage address may directly address only the local storage area of its associated SPE. The effective address may have a more general application, e.g., it may be able to reference main storage, including all the SPE local storage areas, if they are aliased into the real address space.
  • To facilitate communication between the SPEs 506 and/or between the SPEs 506 and the PPE 504, the SPEs 506 and PPE 504 may include signal notification registers that are tied to signaling events. The PPE 504 and SPEs 506 may be coupled by a star topology in which the PPE 504 acts as a router to transmit messages to the SPEs 506. Alternatively, each SPE 506 and the PPE 504 may have a one-way signal notification register referred to as a mailbox. The mailbox can be used by an SPE 506 to host operating system (OS) synchronization.
  • The cell processor 500 may include an input/output (I/O) function 508 through which the cell processor 500 may interface with peripheral devices, such as a microphone array 512 and optional image capture unit 513. In addition, an Element Interconnect Bus 510 may connect the various components listed above. Each SPE and the PPE can access the bus 510 through a bus interface unit BIU. The cell processor 500 may also include two controllers typically found in a processor: a Memory Interface Controller MIC that controls the flow of data between the bus 510 and the main memory 502, and a Bus Interface Controller BIC, which controls the flow of data between the I/O 508 and the bus 510. Although the requirements for the MIC, BIC, BIUs and bus 510 may vary widely for different implementations, those of skill in the art will be familiar with their functions and circuits for implementing them.
  • The cell processor 500 may also include an internal interrupt controller IIC. The IIC component manages the priority of the interrupts presented to the PPE. The IIC allows interrupts from the other components of the cell processor 500 to be handled without using a main system interrupt controller. The IIC may be regarded as a second level controller. The main system interrupt controller may handle interrupts originating external to the cell processor.
  • In embodiments of the present invention, certain computations, such as the fractional delays described above, may be performed in parallel using the PPE 504 and/or one or more of the SPE 506. Each fractional delay calculation may be run as one or more separate tasks that different SPE 506 may take as they become available.
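The idea of running each fractional delay calculation as a separate task that any free processing element may pick up can be sketched with a worker pool. This is only an analogy: Python threads stand in for SPEs, and the linear-interpolation delay used here is an assumed stand-in rather than the fractional delay method of the specification. The names `fractional_delay`, `signal` and `delays` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fractional_delay(signal, delay):
    """Delay `signal` by a non-integer number of samples using linear interpolation.

    Samples outside the input are treated as zero.
    """
    out = []
    for n in range(len(signal)):
        pos = n - delay              # position in the input to read from
        i = math.floor(pos)          # integer sample just before that position
        frac = pos - i               # fractional distance toward the next sample
        a = signal[i] if 0 <= i < len(signal) else 0.0
        b = signal[i + 1] if 0 <= i + 1 < len(signal) else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out


signal = [0.0, 1.0, 2.0, 3.0, 4.0]
delays = [0.25, 0.5, 0.75]

# Each delay is an independent task; idle workers take tasks as they become available.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda d: fractional_delay(signal, d), delays))

for d, r in zip(delays, results):
    print(d, r)
```

Because the tasks share no mutable state, they can be scheduled in any order on any worker, which is the property that makes this kind of computation easy to distribute.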
  • Embodiments of the present invention may utilize arrays of between about 2 and about 8 microphones in an array characterized by a microphone spacing d between about 0.5 cm and about 2 cm. The microphones may have a dynamic range from about 120 Hz to about 16 kHz. It is noted that the introduction of fractional delays in the output signal y(t) as described above allows for much greater resolution in the source separation than would otherwise be possible with a digital processor limited to applying discrete integer time delays to the output signal. It is the introduction of such fractional time delays that allows embodiments of the present invention to achieve high resolution with such small microphone spacing and relatively inexpensive microphones. Embodiments of the invention may also be applied to ultrasonic position tracking by adding an ultrasonic emitter to the microphone array and tracking object locations through analysis of the time delay of arrival of echoes of ultrasonic pulses from the emitter.
  • Although for the sake of example the drawings depict linear arrays of microphones, embodiments of the invention are not limited to such configurations. Alternatively, three or more microphones may be arranged in a two-dimensional array, or four or more microphones may be arranged in a three-dimensional array. In one particular embodiment, a system based on a 2-microphone array may be incorporated into a controller unit for a video game.
  • Signal processing systems of the present invention may use microphone arrays that are small enough to be utilized in portable hand-held devices such as cell phones, personal digital assistants, video/digital cameras, and the like. In certain embodiments of the present invention, increasing the number of microphones in the array has no beneficial effect, and in some cases fewer microphones may work better than more. Specifically, a four-microphone array has been observed to work better than an eight-microphone array.
  • Embodiments of the present invention may be used as presented herein or in combination with other user input mechanisms and notwithstanding mechanisms that track or profile the angular direction or volume of sound and/or mechanisms that track the position of the object actively or passively, mechanisms using machine vision, combinations thereof and where the object tracked may include ancillary controls or buttons that manipulate feedback to the system and where such feedback may include but is not limited to light emission from light sources, sound distortion means, or other suitable transmitters and modulators as well as controls, buttons, pressure pad, etc. that may influence the transmission or modulation of the same, encode state, and/or transmit commands from or to a device, including devices that are tracked by the system and whether such devices are part of, interacting with or influencing a system used in connection with embodiments of the present invention.
  • Although embodiments of the present invention have been shown to operate with an entertainment console and controller, such as in a video game unit, it must be understood that other embodiments of the present invention may be operable in a variety of uses and industries apart from gaming and entertainment.
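A short calculation shows why integer-sample delays are too coarse at these spacings, and how an echo delay converts to a distance. The speed of sound (343 m/s) and the 48 kHz sampling rate are assumptions chosen for illustration; neither value is stated in this passage, and the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C


def max_delay_samples(spacing_m: float, sample_rate_hz: float) -> float:
    """Largest possible inter-microphone delay (sound arriving along the array axis), in samples."""
    return spacing_m / SPEED_OF_SOUND_M_S * sample_rate_hz


def echo_distance_m(round_trip_s: float) -> float:
    """Distance to a reflecting object given the round-trip time of an ultrasonic pulse."""
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0


for spacing_cm in (0.5, 2.0):
    samples = max_delay_samples(spacing_cm / 100.0, 48_000.0)
    print(f"d = {spacing_cm} cm -> at most {samples:.2f} samples of delay")

print(f"10 ms round trip -> {echo_distance_m(0.010):.3f} m")
```

With the assumed values, the whole range of arrival directions spans fewer than three samples of delay even at the 2 cm spacing, so a processor restricted to integer delays could distinguish only a handful of directions. Fractional delays subdivide that range.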
  • While the above is a complete description of the preferred embodiment of the present invention, it is possible to use various alternatives, modifications and equivalents. Therefore, the scope of the present invention should be determined not with reference to the above description but should, instead, be determined with reference to the appended claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the claims that follow, the indefinite article “A”, or “An” refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase “means for.”

Claims (15)

1. A tracking device for use in obtaining information for controlling an execution of a program by a processor, comprising:
a body;
a sound emitter operable to emit a sound;
an array of two or more microphones M0 . . . MM wherein each of the two or more microphones is adapted to produce a discrete time domain input signal xm(t) at a runtime;
one or more processors coupled to the array of two or more microphones; and
a memory coupled to the microphones and the processor, the memory having embodied therein a set of processor readable instructions, the processor readable instructions including:
one or more instructions for determining a listening direction of the sound source relative to the microphone array from the discrete time domain input signals xm(t);
one or more instructions for using the listening direction in a semi-blind source separation to select filtering functions that separate out sounds from the sound emitter from the discrete time domain input signals xm(t);
one or more instructions for identifying a listening zone containing the sound emitter from the filtering functions; and
one or more instructions for determining a location of the sound emitter from the identified listening zone.
2. The tracking device of claim 1, wherein the body is mountable to a controller, the controller including a controller body and at least one input device assembled with the controller body.
3. An apparatus including the tracking device and the controller as claimed in claim 2.
4. The tracking device of claim 1, wherein the body of the tracking device is mountable to a user's body.
5. The tracking device as claimed in claim 1, wherein the sound emitter includes at least one of a speaker, a buzzer, a horn or a pipe.
6. The tracking device of claim 1 wherein the program is a game program.
7. The tracking device of claim 6, wherein the processor is operable to register an input when the sounds from the sound emitter satisfy a predetermined criterion.
8. The tracking device of claim 7, wherein the sound emitter is operable to emit a sound specific to a particular user associated with the tracking device and the one or more processors are operable to register input specific to the particular user when the specific sound is emitted.
9. The tracking device of claim 7 or 8, wherein the predetermined criterion includes at least one criterion selected from the group consisting of pitch, audio pulse frequency, audio pulse shape, audio pulse duty cycle, beat frequency, or spectral signature including a spectral content characteristic of a distinctive sound.
10. The apparatus of claim 2 or 3, further comprising a communications interface operable to conduct digital communications with the one or more processors, the controller or both the processor and the controller.
11. The tracking device as claimed in claim 10, wherein the communications interface includes a universal asynchronous receiver transmitter (“UART”).
12. The tracking device of claim 11, wherein the UART is operable to perform at least one of receiving a control signal for controlling an operation of the tracking device, or for transmitting a signal from the tracking device for communication with another device.
13. The tracking device of claim 10, wherein the communications interface includes a universal serial bus (“USB”) controller.
14. The tracking device of claim 13, wherein the USB controller is operable to perform at least one of receiving a control signal for controlling an operation of the tracking device, or for transmitting a signal from the tracking device for communication with another device.
15. The tracking device of claim 1, wherein the listening direction is determined by
forming analysis frames of a pre-recorded signal from a source located in a preferred known listening direction,
transforming the analysis frames into the frequency domain,
estimating a calibration covariance matrix from vectors formed from the analysis frames that have been transformed into the frequency domain,
computing an eigenmatrix of the calibration covariance matrix, and
computing an inverse of the eigenmatrix.
US12/891,633 2006-05-08 2010-09-27 Tracking device with sound emitter for use in obtaining information for controlling game program execution Abandoned US20110014981A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/891,633 US20110014981A1 (en) 2006-05-08 2010-09-27 Tracking device with sound emitter for use in obtaining information for controlling game program execution

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/382,256 US7803050B2 (en) 2002-07-27 2006-05-08 Tracking device with sound emitter for use in obtaining information for controlling game program execution
US12/891,633 US20110014981A1 (en) 2006-05-08 2010-09-27 Tracking device with sound emitter for use in obtaining information for controlling game program execution

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/382,256 Continuation US7803050B2 (en) 2002-07-27 2006-05-08 Tracking device with sound emitter for use in obtaining information for controlling game program execution

Publications (1)

Publication Number Publication Date
US20110014981A1 true US20110014981A1 (en) 2011-01-20

Family

ID=43478098

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/891,633 Abandoned US20110014981A1 (en) 2006-05-08 2010-09-27 Tracking device with sound emitter for use in obtaining information for controlling game program execution

Country Status (1)

Country Link
US (1) US20110014981A1 (en)

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090106672A1 (en) * 2007-10-18 2009-04-23 Sony Ericsson Mobile Communications Ab Virtual world avatar activity governed by person's real life activity
US20090279728A1 (en) * 2008-05-12 2009-11-12 Kun-Hong Hou Game controller with speaker
US20100300271A1 (en) * 2009-05-27 2010-12-02 Microsoft Corporation Detecting Beat Information Using a Diverse Set of Correlations
US20120183149A1 (en) * 2011-01-18 2012-07-19 Sony Corporation Sound signal processing apparatus, sound signal processing method, and program
US20120330594A1 (en) * 2011-06-22 2012-12-27 Samsung Electronics Co., Ltd. Method and apparatus for estimating 3d position and orientation through sensor fusion
US20130131836A1 (en) * 2011-11-21 2013-05-23 Microsoft Corporation System for controlling light enabled devices
US20130332156A1 (en) * 2012-06-11 2013-12-12 Apple Inc. Sensor Fusion to Improve Speech/Audio Processing in a Mobile Device
US8855295B1 (en) * 2012-06-25 2014-10-07 Rawles Llc Acoustic echo cancellation using blind source separation
US20140372081A1 (en) * 2011-03-29 2014-12-18 Drexel University Real time artifact removal
US20150341735A1 (en) * 2014-05-26 2015-11-26 Canon Kabushiki Kaisha Sound source separation apparatus and sound source separation method
IT201700116138A1 (en) * 2017-10-16 2018-01-16 Salvatore Iannelli Electronic system for managing a sale procedure and related procedure
US10412456B2 (en) 2008-04-24 2019-09-10 Sony Interactive Entertainment, LLC Method and apparatus for real-time viewer interaction with a media presentation
US10568467B2 (en) 2014-10-02 2020-02-25 Conopco, Inc. Liquid dispenser with framed refill receiving bay
US11165853B2 (en) 2012-05-09 2021-11-02 Twilio Inc. System and method for managing media in a distributed communication network
US11171865B2 (en) 2016-02-04 2021-11-09 Twilio Inc. Systems and methods for providing secure network exchanged for a multitenant virtual private cloud
US11246013B2 (en) 2012-10-15 2022-02-08 Twilio Inc. System and method for triggering on platform usage
US11265392B2 (en) 2016-05-23 2022-03-01 Twilio Inc. System and method for a multi-channel notification service
US11265367B2 (en) 2015-05-14 2022-03-01 Twilio Inc. System and method for signaling through data storage
US11272325B2 (en) 2015-05-14 2022-03-08 Twilio Inc. System and method for communicating through multiple endpoints
US20220116700A1 (en) * 2019-01-09 2022-04-14 Hangzhou Taro Positioning Technology Co., Ltd. Directional sound capture using image-based object tracking
US11341092B2 (en) 2014-07-07 2022-05-24 Twilio Inc. Method and system for applying data retention policies in a computing platform
US11379275B2 (en) 2013-09-17 2022-07-05 Twilio Inc. System and method for tagging and tracking events of an application
US11399044B2 (en) 2011-05-23 2022-07-26 Twilio Inc. System and method for connecting a communication to a client
US11489961B2 (en) 2011-09-21 2022-11-01 Twilio Inc. System and method for determining and communicating presence information
US11539601B2 (en) 2013-09-17 2022-12-27 Twilio Inc. System and method for providing communication platform metadata
US11546471B2 (en) 2012-06-19 2023-01-03 Twilio Inc. System and method for queuing a communication session
US11575795B2 (en) 2008-04-02 2023-02-07 Twilio Inc. System and method for processing telephony sessions
US11627225B2 (en) 2016-05-23 2023-04-11 Twilio Inc. System and method for programmatic device connectivity
US11632471B2 (en) 2008-10-01 2023-04-18 Twilio Inc. Telephony web event system and method
US11637876B2 (en) 2013-03-14 2023-04-25 Twilio Inc. System and method for integrating session initiation protocol communication in a telecommunications platform
US11637934B2 (en) 2010-06-23 2023-04-25 Twilio Inc. System and method for monitoring account usage on a platform
US11637933B2 (en) 2009-10-07 2023-04-25 Twilio Inc. System and method for running a multi-module telephony application
US11653282B2 (en) 2014-04-17 2023-05-16 Twilio Inc. System and method for enabling multi-modal communication
US11785145B2 (en) 2009-03-02 2023-10-10 Twilio Inc. Method and system for a multitenancy telephone network

Citations (103)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4811243A (en) * 1984-04-06 1989-03-07 Racine Marsh V Computer aided coordinate digitizing system
US5018736A (en) * 1989-10-27 1991-05-28 Wakeman & Deforrest Corporation Interactive game system and method
US5113449A (en) * 1982-08-16 1992-05-12 Texas Instruments Incorporated Method and apparatus for altering voice characteristics of synthesized speech
US5128671A (en) * 1990-04-12 1992-07-07 Ltv Aerospace And Defense Company Control device having multiple degrees of freedom
US5168524A (en) * 1989-08-17 1992-12-01 Eliza Corporation Speech-recognition circuitry employing nonlinear processing, speech element modeling and phoneme estimation
US5214615A (en) * 1990-02-26 1993-05-25 Will Bauer Three-dimensional displacement of a body with computer interface
US5227985A (en) * 1991-08-19 1993-07-13 University Of Maryland Computer vision system for position monitoring in three dimensions using non-coplanar light sources attached to a monitored object
US5292871A (en) * 1991-10-24 1994-03-08 Hoechst Aktiengesellschaft Water-soluble compounds having a fiber-reactive group of the vinyl sulfone monoazo series bonded to an alkanoylamino of benzoylamino grouping, suitable as dyestuffs
US5327521A (en) * 1992-03-02 1994-07-05 The Walt Disney Company Speech transformation system
US5335011A (en) * 1993-01-12 1994-08-02 Bell Communications Research, Inc. Sound localization system for teleconferencing using self-steering microphone arrays
US5394168A (en) * 1993-01-06 1995-02-28 Smith Engineering Dual-mode hand-held game controller
US5425130A (en) * 1990-07-11 1995-06-13 Lockheed Sanders, Inc. Apparatus for transforming voice using neural networks
US5485273A (en) * 1991-04-22 1996-01-16 Litton Systems, Inc. Ring laser gyroscope enhanced resolution system
US5534917A (en) * 1991-05-09 1996-07-09 Very Vivid, Inc. Video image based control system
US5611731A (en) * 1995-09-08 1997-03-18 Thrustmaster, Inc. Video pinball machine controller having an optical accelerometer for detecting slide and tilt
US5649021A (en) * 1995-06-07 1997-07-15 David Sarnoff Research Center, Inc. Method and system for object detection for instrument control
US5768415A (en) * 1995-09-08 1998-06-16 Lucent Technologies Inc. Apparatus and methods for performing electronic scene analysis and enhancement
US5878367A (en) * 1996-06-28 1999-03-02 Northrop Grumman Corporation Passive acoustic traffic monitoring system
US5900863A (en) * 1995-03-16 1999-05-04 Kabushiki Kaisha Toshiba Method and apparatus for controlling computer without touching input device
US5913727A (en) * 1995-06-02 1999-06-22 Ahdoot; Ned Interactive movement and contact simulation game
US5916024A (en) * 1986-03-10 1999-06-29 Response Reward Systems, L.C. System and method of playing games and rewarding successful players
US5917936A (en) * 1996-02-14 1999-06-29 Nec Corporation Object detecting system based on multiple-eye images
US5930383A (en) * 1996-09-24 1999-07-27 Netzer; Yishay Depth sensing camera systems and methods
US6009396A (en) * 1996-03-15 1999-12-28 Kabushiki Kaisha Toshiba Method and system for microphone array input type speech recognition using band-pass power distribution for sound source position/direction estimation
US6014167A (en) * 1996-01-26 2000-01-11 Sony Corporation Tracking apparatus and tracking method
US6014623A (en) * 1997-06-12 2000-01-11 United Microelectronics Corp. Method of encoding synthetic speech
US6022274A (en) * 1995-11-22 2000-02-08 Nintendo Co., Ltd. Video game system using memory module
US6057909A (en) * 1995-06-22 2000-05-02 3Dv Systems Ltd. Optical ranging camera
US6061055A (en) * 1997-03-21 2000-05-09 Autodesk, Inc. Method of tracking objects with an imaging device
US6069594A (en) * 1991-07-29 2000-05-30 Logitech, Inc. Computer input device with multiple switches using single line
US6075895A (en) * 1997-06-20 2000-06-13 Holoplex Methods and apparatus for gesture recognition based on templates
US6081780A (en) * 1998-04-28 2000-06-27 International Business Machines Corporation TTS and prosody based authoring system
US6100895A (en) * 1994-12-01 2000-08-08 Namco Ltd. Apparatus and method of image synthesization
US6157403A (en) * 1996-08-05 2000-12-05 Kabushiki Kaisha Toshiba Apparatus for detecting position of object capable of simultaneously detecting plural objects and detection method therefor
US6173059B1 (en) * 1998-04-24 2001-01-09 Gentner Communications Corporation Teleconferencing system with visual feedback
US6195104B1 (en) * 1997-12-23 2001-02-27 Philips Electronics North America Corp. System and method for permitting three-dimensional navigation through a virtual reality environment using camera-based gesture inputs
US6243491B1 (en) * 1996-12-31 2001-06-05 Lucent Technologies Inc. Methods and apparatus for controlling a video system with visually recognized props
US6336092B1 (en) * 1997-04-28 2002-01-01 Ivl Technologies Ltd Targeted vocal transformation
US6346929B1 (en) * 1994-04-22 2002-02-12 Canon Kabushiki Kaisha Display apparatus which detects an observer body part motion in correspondence to a displayed element used to input operation instructions to start a process
US20020024500A1 (en) * 1997-03-06 2002-02-28 Robert Bruce Howard Wireless control device
US20020041327A1 (en) * 2000-07-24 2002-04-11 Evan Hildreth Video-based image control system
US6371849B1 (en) * 1997-05-02 2002-04-16 Konami Co., Ltd. Volleyball video game system
US20020048376A1 (en) * 2000-08-24 2002-04-25 Masakazu Ukita Signal processing apparatus and signal processing method
US20020051119A1 (en) * 2000-06-30 2002-05-02 Gary Sherman Video karaoke system and method of use
US6392644B1 (en) * 1998-05-25 2002-05-21 Fujitsu Limited Three-dimensional graphics display system
US6400374B2 (en) * 1996-09-18 2002-06-04 Eyematic Interfaces, Inc. Video superposition system and method
US6411744B1 (en) * 1997-10-15 2002-06-25 Electric Planet, Inc. Method and apparatus for performing a clean background subtraction
US6417836B1 (en) * 1999-08-02 2002-07-09 Lucent Technologies Inc. Computer input device having six degrees of freedom for controlling movement of a three-dimensional object
US20020109680A1 (en) * 2000-02-14 2002-08-15 Julian Orbanes Method for viewing information in virtual space
US20030020718A1 (en) * 2001-02-28 2003-01-30 Marshall Carl S. Approximating motion using a three-dimensional model
US20030022716A1 (en) * 2001-07-24 2003-01-30 Samsung Electronics Co., Ltd. Input device for computer games including inertia sensor
US20030032466A1 (en) * 2001-08-10 2003-02-13 Konami Corporation And Konami Computer Entertainment Tokyo, Inc. Gun shooting game device, method of controlling computer and program
US20030031333A1 (en) * 2000-03-09 2003-02-13 Yuval Cohen System and method for optimization of three-dimensional audio
US20030032484A1 (en) * 1999-06-11 2003-02-13 Toshikazu Ohshima Game apparatus for mixed reality space, image processing method thereof, and program storage medium
US20030047464A1 (en) * 2001-07-27 2003-03-13 Applied Materials, Inc. Electrochemically roughened aluminum semiconductor processing apparatus surfaces
US20030055646A1 (en) * 1998-06-15 2003-03-20 Yamaha Corporation Voice converter with extraction and modification of attribute data
US20030063065A1 (en) * 2001-09-11 2003-04-03 Samsung Electronics Co., Ltd. Pointer control method, pointing apparatus, and host apparatus therefor
US6545706B1 (en) * 1999-07-30 2003-04-08 Electric Planet, Inc. System, method and article of manufacture for tracking a head of a camera-generated image of a person
US20030100363A1 (en) * 2001-11-28 2003-05-29 Ali Guiseppe C. Method and apparatus for inputting appearance of computer operator into a computer program
US6573883B1 (en) * 1998-06-24 2003-06-03 Hewlett Packard Development Company, L.P. Method and apparatus for controlling a computing device with gestures
US6593956B1 (en) * 1998-05-15 2003-07-15 Polycom, Inc. Locating an audio source
US6597342B1 (en) * 1998-11-13 2003-07-22 Aruze Corporation Game machine controller
US20040029640A1 (en) * 1999-10-04 2004-02-12 Nintendo Co., Ltd. Game system and game information storage medium used for same
US20040032796A1 (en) * 2002-04-15 2004-02-19 Polycom, Inc. System and method for computing a location of an acoustic source
US6699123B2 (en) * 1999-10-14 2004-03-02 Sony Computer Entertainment Inc. Entertainment system, entertainment apparatus, recording medium, and program
US20040046736A1 (en) * 1997-08-22 2004-03-11 Pryor Timothy R. Novel man machine interfaces and applications
US20040063502A1 (en) * 2002-09-24 2004-04-01 Intec, Inc. Power module
US20040070564A1 (en) * 2002-10-15 2004-04-15 Dawson Thomas P. Method and system for controlling a display device
US20040075677A1 (en) * 2000-11-03 2004-04-22 Loyall A. Bryan Interactive character system
US20050047611A1 (en) * 2003-08-27 2005-03-03 Xiadong Mao Audio input system
US20050059488A1 (en) * 2003-09-15 2005-03-17 Sony Computer Entertainment Inc. Method and apparatus for adjusting a view of a scene being displayed according to tracked head motion
US20050063552A1 (en) * 2003-09-24 2005-03-24 Shuttleworth Timothy J. Ambient noise sound level compensation
US6890262B2 (en) * 2001-07-19 2005-05-10 Konami Corporation Video game apparatus, method and recording medium storing program for controlling viewpoint movement of simulated camera in video game
US20050114126A1 (en) * 2002-04-18 2005-05-26 Ralf Geiger Apparatus and method for coding a time-discrete audio signal and apparatus and method for decoding coded audio data
US20050115383A1 (en) * 2003-11-28 2005-06-02 Pei-Chen Chang Method and apparatus for karaoke scoring
US20050162384A1 (en) * 2004-01-28 2005-07-28 Fujinon Corporation Pointing device, method for displaying point image, and program therefor
US6990639B2 (en) * 2002-02-07 2006-01-24 Microsoft Corporation System and process for controlling electronic components in a ubiquitous computing environment using multimodal integration
US20060035710A1 (en) * 2003-02-21 2006-02-16 Festejo Ronald J Control of data processing
US20060115103A1 (en) * 2003-04-09 2006-06-01 Feng Albert S Systems and methods for interference-suppression with directional sensing patterns
US20060121681A1 (en) * 2004-12-02 2006-06-08 Texas Instruments, Inc. Method for forming halo/pocket implants through an L-shaped sidewall spacer
US20060136213A1 (en) * 2004-10-13 2006-06-22 Yoshifumi Hirose Speech synthesis apparatus and speech synthesis method
US20060139322A1 (en) * 2002-07-27 2006-06-29 Sony Computer Entertainment America Inc. Man-machine interface using a deformable device
US20070015558A1 (en) * 2002-07-27 2007-01-18 Sony Computer Entertainment America Inc. Method and apparatus for use in determining an activity level of a user in relation to a system
US20070015559A1 (en) * 2002-07-27 2007-01-18 Sony Computer Entertainment America Inc. Method and apparatus for use in determining lack of user activity in relation to a system
US20070021208A1 (en) * 2002-07-27 2007-01-25 Xiadong Mao Obtaining input for controlling execution of a game program
US20070025562A1 (en) * 2003-08-27 2007-02-01 Sony Computer Entertainment Inc. Methods and apparatus for targeted sound detection
US20070027687A1 (en) * 2005-03-14 2007-02-01 Voxonic, Inc. Automatic donor ranking and selection system and method for voice conversion
US20070061413A1 (en) * 2005-09-15 2007-03-15 Larsen Eric J System and method for obtaining user information from voices
US20070120834A1 (en) * 2005-11-29 2007-05-31 Navisense, Llc Method and system for object control
US20070120996A1 (en) * 2005-11-28 2007-05-31 Navisense, Llc Method and device for touchless control of a camera
US7227976B1 (en) * 2002-07-08 2007-06-05 Videomining Corporation Method and system for real-time facial image enhancement
US7233316B2 (en) * 2003-05-01 2007-06-19 Thomson Licensing Multimedia user interface
US20080056561A1 (en) * 2006-08-30 2008-03-06 Fujifilm Corporation Image processing device
US20080070684A1 (en) * 2006-09-14 2008-03-20 Mark Haigh-Hutchinson Method and apparatus for using a common pointing input to control 3D viewpoint and object targeting
US20080098448A1 (en) * 2006-10-19 2008-04-24 Sony Computer Entertainment America Inc. Controller configured to track user's level of anxiety and other mental and physical attributes
US20080096657A1 (en) * 2006-10-20 2008-04-24 Sony Computer Entertainment America Inc. Method for aiming and shooting using motion sensing controller
US20080096654A1 (en) * 2006-10-20 2008-04-24 Sony Computer Entertainment America Inc. Game control using three-dimensional motions of controller
US20080100825A1 (en) * 2006-09-28 2008-05-01 Sony Computer Entertainment America Inc. Mapping movements of a hand-held controller to the two-dimensional image plane of a display screen
US20080120115A1 (en) * 2006-11-16 2008-05-22 Xiao Dong Mao Methods and apparatuses for dynamically adjusting an audio signal based on a parameter
USD571367S1 (en) * 2006-05-08 2008-06-17 Sony Computer Entertainment Inc. Video game controller
USD571806S1 (en) * 2006-05-08 2008-06-24 Sony Computer Entertainment Inc. Video game controller
USD572254S1 (en) * 2006-05-08 2008-07-01 Sony Computer Entertainment Inc. Video game controller
US20090062943A1 (en) * 2007-08-27 2009-03-05 Sony Computer Entertainment Inc. Methods and apparatus for automatically controlling the sound level based on the content

Patent Citations (106)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5113449A (en) * 1982-08-16 1992-05-12 Texas Instruments Incorporated Method and apparatus for altering voice characteristics of synthesized speech
US4811243A (en) * 1984-04-06 1989-03-07 Racine Marsh V Computer aided coordinate digitizing system
US5916024A (en) * 1986-03-10 1999-06-29 Response Reward Systems, L.C. System and method of playing games and rewarding successful players
US5168524A (en) * 1989-08-17 1992-12-01 Eliza Corporation Speech-recognition circuitry employing nonlinear processing, speech element modeling and phoneme estimation
US5018736A (en) * 1989-10-27 1991-05-28 Wakeman & Deforrest Corporation Interactive game system and method
US5214615A (en) * 1990-02-26 1993-05-25 Will Bauer Three-dimensional displacement of a body with computer interface
US5128671A (en) * 1990-04-12 1992-07-07 Ltv Aerospace And Defense Company Control device having multiple degrees of freedom
US5425130A (en) * 1990-07-11 1995-06-13 Lockheed Sanders, Inc. Apparatus for transforming voice using neural networks
US5485273A (en) * 1991-04-22 1996-01-16 Litton Systems, Inc. Ring laser gyroscope enhanced resolution system
US5534917A (en) * 1991-05-09 1996-07-09 Very Vivid, Inc. Video image based control system
US6069594A (en) * 1991-07-29 2000-05-30 Logitech, Inc. Computer input device with multiple switches using single line
US5227985A (en) * 1991-08-19 1993-07-13 University Of Maryland Computer vision system for position monitoring in three dimensions using non-coplanar light sources attached to a monitored object
US5292871A (en) * 1991-10-24 1994-03-08 Hoechst Aktiengesellschaft Water-soluble compounds having a fiber-reactive group of the vinyl sulfone monoazo series bonded to an alkanoylamino of benzoylamino grouping, suitable as dyestuffs
US5327521A (en) * 1992-03-02 1994-07-05 The Walt Disney Company Speech transformation system
US5394168A (en) * 1993-01-06 1995-02-28 Smith Engineering Dual-mode hand-held game controller
US5335011A (en) * 1993-01-12 1994-08-02 Bell Communications Research, Inc. Sound localization system for teleconferencing using self-steering microphone arrays
US6346929B1 (en) * 1994-04-22 2002-02-12 Canon Kabushiki Kaisha Display apparatus which detects an observer body part motion in correspondence to a displayed element used to input operation instructions to start a process
US6100895A (en) * 1994-12-01 2000-08-08 Namco Ltd. Apparatus and method of image synthesization
US5900863A (en) * 1995-03-16 1999-05-04 Kabushiki Kaisha Toshiba Method and apparatus for controlling computer without touching input device
US5913727A (en) * 1995-06-02 1999-06-22 Ahdoot; Ned Interactive movement and contact simulation game
US5649021A (en) * 1995-06-07 1997-07-15 David Sarnoff Research Center, Inc. Method and system for object detection for instrument control
US6057909A (en) * 1995-06-22 2000-05-02 3Dv Systems Ltd. Optical ranging camera
US5611731A (en) * 1995-09-08 1997-03-18 Thrustmaster, Inc. Video pinball machine controller having an optical accelerometer for detecting slide and tilt
US5768415A (en) * 1995-09-08 1998-06-16 Lucent Technologies Inc. Apparatus and methods for performing electronic scene analysis and enhancement
US6022274A (en) * 1995-11-22 2000-02-08 Nintendo Co., Ltd. Video game system using memory module
US6014167A (en) * 1996-01-26 2000-01-11 Sony Corporation Tracking apparatus and tracking method
US5917936A (en) * 1996-02-14 1999-06-29 Nec Corporation Object detecting system based on multiple-eye images
US6009396A (en) * 1996-03-15 1999-12-28 Kabushiki Kaisha Toshiba Method and system for microphone array input type speech recognition using band-pass power distribution for sound source position/direction estimation
US5878367A (en) * 1996-06-28 1999-03-02 Northrop Grumman Corporation Passive acoustic traffic monitoring system
US6157403A (en) * 1996-08-05 2000-12-05 Kabushiki Kaisha Toshiba Apparatus for detecting position of object capable of simultaneously detecting plural objects and detection method therefor
US6400374B2 (en) * 1996-09-18 2002-06-04 Eyematic Interfaces, Inc. Video superposition system and method
US5930383A (en) * 1996-09-24 1999-07-27 Netzer; Yishay Depth sensing camera systems and methods
US6243491B1 (en) * 1996-12-31 2001-06-05 Lucent Technologies Inc. Methods and apparatus for controlling a video system with visually recognized props
US20020024500A1 (en) * 1997-03-06 2002-02-28 Robert Bruce Howard Wireless control device
US6061055A (en) * 1997-03-21 2000-05-09 Autodesk, Inc. Method of tracking objects with an imaging device
US6336092B1 (en) * 1997-04-28 2002-01-01 Ivl Technologies Ltd Targeted vocal transformation
US6394897B1 (en) * 1997-05-02 2002-05-28 Konami Co., Ltd. Volleyball video game system
US6371849B1 (en) * 1997-05-02 2002-04-16 Konami Co., Ltd. Volleyball video game system
US6014623A (en) * 1997-06-12 2000-01-11 United Microelectronics Corp. Method of encoding synthetic speech
US6075895A (en) * 1997-06-20 2000-06-13 Holoplex Methods and apparatus for gesture recognition based on templates
US20040046736A1 (en) * 1997-08-22 2004-03-11 Pryor Timothy R. Novel man machine interfaces and applications
US6720949B1 (en) * 1997-08-22 2004-04-13 Timothy R. Pryor Man machine interfaces and applications
US7042440B2 (en) * 1997-08-22 2006-05-09 Pryor Timothy R Man machine interfaces and applications
US6411744B1 (en) * 1997-10-15 2002-06-25 Electric Planet, Inc. Method and apparatus for performing a clean background subtraction
US6195104B1 (en) * 1997-12-23 2001-02-27 Philips Electronics North America Corp. System and method for permitting three-dimensional navigation through a virtual reality environment using camera-based gesture inputs
US6173059B1 (en) * 1998-04-24 2001-01-09 Gentner Communications Corporation Teleconferencing system with visual feedback
US6081780A (en) * 1998-04-28 2000-06-27 International Business Machines Corporation TTS and prosody based authoring system
US6593956B1 (en) * 1998-05-15 2003-07-15 Polycom, Inc. Locating an audio source
US6392644B1 (en) * 1998-05-25 2002-05-21 Fujitsu Limited Three-dimensional graphics display system
US20030055646A1 (en) * 1998-06-15 2003-03-20 Yamaha Corporation Voice converter with extraction and modification of attribute data
US6573883B1 (en) * 1998-06-24 2003-06-03 Hewlett Packard Development Company, L.P. Method and apparatus for controlling a computing device with gestures
US6597342B1 (en) * 1998-11-13 2003-07-22 Aruze Corporation Game machine controller
US20030032484A1 (en) * 1999-06-11 2003-02-13 Toshikazu Ohshima Game apparatus for mixed reality space, image processing method thereof, and program storage medium
US6545706B1 (en) * 1999-07-30 2003-04-08 Electric Planet, Inc. System, method and article of manufacture for tracking a head of a camera-generated image of a person
US6417836B1 (en) * 1999-08-02 2002-07-09 Lucent Technologies Inc. Computer input device having six degrees of freedom for controlling movement of a three-dimensional object
US20040029640A1 (en) * 1999-10-04 2004-02-12 Nintendo Co., Ltd. Game system and game information storage medium used for same
US6699123B2 (en) * 1999-10-14 2004-03-02 Sony Computer Entertainment Inc. Entertainment system, entertainment apparatus, recording medium, and program
US20020109680A1 (en) * 2000-02-14 2002-08-15 Julian Orbanes Method for viewing information in virtual space
US20030031333A1 (en) * 2000-03-09 2003-02-13 Yuval Cohen System and method for optimization of three-dimensional audio
US20020051119A1 (en) * 2000-06-30 2002-05-02 Gary Sherman Video karaoke system and method of use
US20020041327A1 (en) * 2000-07-24 2002-04-11 Evan Hildreth Video-based image control system
US20020048376A1 (en) * 2000-08-24 2002-04-25 Masakazu Ukita Signal processing apparatus and signal processing method
US20040075677A1 (en) * 2000-11-03 2004-04-22 Loyall A. Bryan Interactive character system
US20030020718A1 (en) * 2001-02-28 2003-01-30 Marshall Carl S. Approximating motion using a three-dimensional model
US6890262B2 (en) * 2001-07-19 2005-05-10 Konami Corporation Video game apparatus, method and recording medium storing program for controlling viewpoint movement of simulated camera in video game
US20030022716A1 (en) * 2001-07-24 2003-01-30 Samsung Electronics Co., Ltd. Input device for computer games including inertia sensor
US20030047464A1 (en) * 2001-07-27 2003-03-13 Applied Materials, Inc. Electrochemically roughened aluminum semiconductor processing apparatus surfaces
US20030032466A1 (en) * 2001-08-10 2003-02-13 Konami Corporation And Konami Computer Entertainment Tokyo, Inc. Gun shooting game device, method of controlling computer and program
US20030063065A1 (en) * 2001-09-11 2003-04-03 Samsung Electronics Co., Ltd. Pointer control method, pointing apparatus, and host apparatus therefor
US20030100363A1 (en) * 2001-11-28 2003-05-29 Ali Guiseppe C. Method and apparatus for inputting appearance of computer operator into a computer program
US6990639B2 (en) * 2002-02-07 2006-01-24 Microsoft Corporation System and process for controlling electronic components in a ubiquitous computing environment using multimodal integration
US20040032796A1 (en) * 2002-04-15 2004-02-19 Polycom, Inc. System and method for computing a location of an acoustic source
US20050114126A1 (en) * 2002-04-18 2005-05-26 Ralf Geiger Apparatus and method for coding a time-discrete audio signal and apparatus and method for decoding coded audio data
US7227976B1 (en) * 2002-07-08 2007-06-05 Videomining Corporation Method and system for real-time facial image enhancement
US20070015558A1 (en) * 2002-07-27 2007-01-18 Sony Computer Entertainment America Inc. Method and apparatus for use in determining an activity level of a user in relation to a system
US20060139322A1 (en) * 2002-07-27 2006-06-29 Sony Computer Entertainment America Inc. Man-machine interface using a deformable device
US20070021208A1 (en) * 2002-07-27 2007-01-25 Xiadong Mao Obtaining input for controlling execution of a game program
US20070015559A1 (en) * 2002-07-27 2007-01-18 Sony Computer Entertainment America Inc. Method and apparatus for use in determining lack of user activity in relation to a system
US20040063502A1 (en) * 2002-09-24 2004-04-01 Intec, Inc. Power module
US20040070564A1 (en) * 2002-10-15 2004-04-15 Dawson Thomas P. Method and system for controlling a display device
US20060035710A1 (en) * 2003-02-21 2006-02-16 Festejo Ronald J Control of data processing
US20060115103A1 (en) * 2003-04-09 2006-06-01 Feng Albert S Systems and methods for interference-suppression with directional sensing patterns
US7233316B2 (en) * 2003-05-01 2007-06-19 Thomson Licensing Multimedia user interface
US20070025562A1 (en) * 2003-08-27 2007-02-01 Sony Computer Entertainment Inc. Methods and apparatus for targeted sound detection
US20050047611A1 (en) * 2003-08-27 2005-03-03 Xiadong Mao Audio input system
US20050059488A1 (en) * 2003-09-15 2005-03-17 Sony Computer Entertainment Inc. Method and apparatus for adjusting a view of a scene being displayed according to tracked head motion
US20050063552A1 (en) * 2003-09-24 2005-03-24 Shuttleworth Timothy J. Ambient noise sound level compensation
US20050115383A1 (en) * 2003-11-28 2005-06-02 Pei-Chen Chang Method and apparatus for karaoke scoring
US20050162384A1 (en) * 2004-01-28 2005-07-28 Fujinon Corporation Pointing device, method for displaying point image, and program therefor
US20060136213A1 (en) * 2004-10-13 2006-06-22 Yoshifumi Hirose Speech synthesis apparatus and speech synthesis method
US20060121681A1 (en) * 2004-12-02 2006-06-08 Texas Instruments, Inc. Method for forming halo/pocket implants through an L-shaped sidewall spacer
US20070027687A1 (en) * 2005-03-14 2007-02-01 Voxonic, Inc. Automatic donor ranking and selection system and method for voice conversion
US20070061413A1 (en) * 2005-09-15 2007-03-15 Larsen Eric J System and method for obtaining user information from voices
US20070120996A1 (en) * 2005-11-28 2007-05-31 Navisense, Llc Method and device for touchless control of a camera
US20070120834A1 (en) * 2005-11-29 2007-05-31 Navisense, Llc Method and system for object control
USD571367S1 (en) * 2006-05-08 2008-06-17 Sony Computer Entertainment Inc. Video game controller
USD572254S1 (en) * 2006-05-08 2008-07-01 Sony Computer Entertainment Inc. Video game controller
USD571806S1 (en) * 2006-05-08 2008-06-24 Sony Computer Entertainment Inc. Video game controller
US20080056561A1 (en) * 2006-08-30 2008-03-06 Fujifilm Corporation Image processing device
US20080070684A1 (en) * 2006-09-14 2008-03-20 Mark Haigh-Hutchinson Method and apparatus for using a common pointing input to control 3D viewpoint and object targeting
US20080100825A1 (en) * 2006-09-28 2008-05-01 Sony Computer Entertainment America Inc. Mapping movements of a hand-held controller to the two-dimensional image plane of a display screen
US20080098448A1 (en) * 2006-10-19 2008-04-24 Sony Computer Entertainment America Inc. Controller configured to track user's level of anxiety and other mental and physical attributes
US20080096654A1 (en) * 2006-10-20 2008-04-24 Sony Computer Entertainment America Inc. Game control using three-dimensional motions of controller
US20080096657A1 (en) * 2006-10-20 2008-04-24 Sony Computer Entertainment America Inc. Method for aiming and shooting using motion sensing controller
US20080120115A1 (en) * 2006-11-16 2008-05-22 Xiao Dong Mao Methods and apparatuses for dynamically adjusting an audio signal based on a parameter
US20090062943A1 (en) * 2007-08-27 2009-03-05 Sony Computer Entertainment Inc. Methods and apparatus for automatically controlling the sound level based on the content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
J. Benesty, "Adaptive eigenvalue decomposition algorithm for passive acoustic source localization," J. Acoust. Soc. Amer., vol. 107, no. 1, pp. 384-391, January 2000 *

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090106672A1 (en) * 2007-10-18 2009-04-23 Sony Ericsson Mobile Communications Ab Virtual world avatar activity governed by person's real life activity
US11706349B2 (en) 2008-04-02 2023-07-18 Twilio Inc. System and method for processing telephony sessions
US11575795B2 (en) 2008-04-02 2023-02-07 Twilio Inc. System and method for processing telephony sessions
US11611663B2 (en) 2008-04-02 2023-03-21 Twilio Inc. System and method for processing telephony sessions
US11765275B2 (en) 2008-04-02 2023-09-19 Twilio Inc. System and method for processing telephony sessions
US11722602B2 (en) 2008-04-02 2023-08-08 Twilio Inc. System and method for processing media requests during telephony sessions
US10412456B2 (en) 2008-04-24 2019-09-10 Sony Interactive Entertainment, LLC Method and apparatus for real-time viewer interaction with a media presentation
US20090279728A1 (en) * 2008-05-12 2009-11-12 Kun-Hong Hou Game controller with speaker
US11641427B2 (en) 2008-10-01 2023-05-02 Twilio Inc. Telephony web event system and method
US11665285B2 (en) 2008-10-01 2023-05-30 Twilio Inc. Telephony web event system and method
US11632471B2 (en) 2008-10-01 2023-04-18 Twilio Inc. Telephony web event system and method
US11785145B2 (en) 2009-03-02 2023-10-10 Twilio Inc. Method and system for a multitenancy telephone network
US8878041B2 (en) * 2009-05-27 2014-11-04 Microsoft Corporation Detecting beat information using a diverse set of correlations
US20100300271A1 (en) * 2009-05-27 2010-12-02 Microsoft Corporation Detecting Beat Information Using a Diverse Set of Correlations
US11637933B2 (en) 2009-10-07 2023-04-25 Twilio Inc. System and method for running a multi-module telephony application
US11637934B2 (en) 2010-06-23 2023-04-25 Twilio Inc. System and method for monitoring account usage on a platform
US20120183149A1 (en) * 2011-01-18 2012-07-19 Sony Corporation Sound signal processing apparatus, sound signal processing method, and program
US9361907B2 (en) * 2011-01-18 2016-06-07 Sony Corporation Sound signal processing apparatus, sound signal processing method, and program
US20140372081A1 (en) * 2011-03-29 2014-12-18 Drexel University Real time artifact removal
US11399044B2 (en) 2011-05-23 2022-07-26 Twilio Inc. System and method for connecting a communication to a client
US20120330594A1 (en) * 2011-06-22 2012-12-27 Samsung Electronics Co., Ltd. Method and apparatus for estimating 3d position and orientation through sensor fusion
US9759804B2 (en) * 2011-06-22 2017-09-12 Samsung Electronics Co., Ltd. Method and apparatus for estimating 3D position and orientation through sensor fusion
US11489961B2 (en) 2011-09-21 2022-11-01 Twilio Inc. System and method for determining and communicating presence information
US9628843B2 (en) * 2011-11-21 2017-04-18 Microsoft Technology Licensing, Llc Methods for controlling electronic devices using gestures
US20130131836A1 (en) * 2011-11-21 2013-05-23 Microsoft Corporation System for controlling light enabled devices
US11165853B2 (en) 2012-05-09 2021-11-02 Twilio Inc. System and method for managing media in a distributed communication network
US20130332156A1 (en) * 2012-06-11 2013-12-12 Apple Inc. Sensor Fusion to Improve Speech/Audio Processing in a Mobile Device
US11546471B2 (en) 2012-06-19 2023-01-03 Twilio Inc. System and method for queuing a communication session
US8855295B1 (en) * 2012-06-25 2014-10-07 Rawles Llc Acoustic echo cancellation using blind source separation
US11689899B2 (en) 2012-10-15 2023-06-27 Twilio Inc. System and method for triggering on platform usage
US11595792B2 (en) 2012-10-15 2023-02-28 Twilio Inc. System and method for triggering on platform usage
US11246013B2 (en) 2012-10-15 2022-02-08 Twilio Inc. System and method for triggering on platform usage
US11637876B2 (en) 2013-03-14 2023-04-25 Twilio Inc. System and method for integrating session initiation protocol communication in a telecommunications platform
US11379275B2 (en) 2013-09-17 2022-07-05 Twilio Inc. System and method for tagging and tracking events of an application
US11539601B2 (en) 2013-09-17 2022-12-27 Twilio Inc. System and method for providing communication platform metadata
US11653282B2 (en) 2014-04-17 2023-05-16 Twilio Inc. System and method for enabling multi-modal communication
US20150341735A1 (en) * 2014-05-26 2015-11-26 Canon Kabushiki Kaisha Sound source separation apparatus and sound source separation method
US9712937B2 (en) * 2014-05-26 2017-07-18 Canon Kabushiki Kaisha Sound source separation apparatus and sound source separation method
US11755530B2 (en) 2014-07-07 2023-09-12 Twilio Inc. Method and system for applying data retention policies in a computing platform
US11768802B2 (en) 2014-07-07 2023-09-26 Twilio Inc. Method and system for applying data retention policies in a computing platform
US11341092B2 (en) 2014-07-07 2022-05-24 Twilio Inc. Method and system for applying data retention policies in a computing platform
US10568467B2 (en) 2014-10-02 2020-02-25 Conopco, Inc. Liquid dispenser with framed refill receiving bay
US11265367B2 (en) 2015-05-14 2022-03-01 Twilio Inc. System and method for signaling through data storage
US11272325B2 (en) 2015-05-14 2022-03-08 Twilio Inc. System and method for communicating through multiple endpoints
US11171865B2 (en) 2016-02-04 2021-11-09 Twilio Inc. Systems and methods for providing secure network exchanged for a multitenant virtual private cloud
US11627225B2 (en) 2016-05-23 2023-04-11 Twilio Inc. System and method for programmatic device connectivity
US11265392B2 (en) 2016-05-23 2022-03-01 Twilio Inc. System and method for a multi-channel notification service
IT201700116138A1 (en) * 2017-10-16 2018-01-16 Salvatore Iannelli Electronic system for managing a sale procedure and related procedure
US20220116700A1 (en) * 2019-01-09 2022-04-14 Hangzhou Taro Positioning Technology Co., Ltd. Directional sound capture using image-based object tracking
US11902737B2 (en) * 2019-01-09 2024-02-13 Hangzhou Taro Positioning Technology Co., Ltd. Directional sound capture using image-based object tracking

Similar Documents

Publication Publication Date Title
US7803050B2 (en) Tracking device with sound emitter for use in obtaining information for controlling game program execution
US8073157B2 (en) Methods and apparatus for targeted sound detection and characterization
US7783061B2 (en) Methods and apparatus for the targeted sound detection
US20110014981A1 (en) Tracking device with sound emitter for use in obtaining information for controlling game program execution
US8947347B2 (en) Controlling actions in a video game unit
EP1878013B1 (en) Video game control with joystick
US8303405B2 (en) Controller for providing inputs to control execution of a program when inputs are combined
US7809145B2 (en) Ultra small microphone array
US7613310B2 (en) Audio input system
US9682320B2 (en) Inertially trackable hand-held controller
US9174119B2 (en) Controller for providing inputs to control execution of a program when inputs are combined
US8313380B2 (en) Scheme for translating movements of a hand-held controller into inputs for a system
US8675915B2 (en) System for tracking user manipulations within an environment
US9393487B2 (en) Method for mapping movements of a hand-held controller to game commands
JP4897666B2 (en) Method and apparatus for detecting and eliminating audio interference
US20060274032A1 (en) Tracking device for use in obtaining information for controlling game program execution
WO2007130793A2 (en) Obtaining input for controlling execution of a game program
EP2460570A2 (en) Scheme for Detecting and Tracking User Manipulation of a Game Controller Body and for Translating Movements Thereof into Inputs and Game Commands
WO2007130819A2 (en) Tracking device with sound emitter for use in obtaining information for controlling game program execution
KR101020509B1 (en) Obtaining input for controlling execution of a program
EP1852164A2 (en) Obtaining input for controlling execution of a game program

Legal Events

Date Code Title Description
AS Assignment
Owner name: SONY INTERACTIVE ENTERTAINMENT INC., JAPAN
Free format text: CHANGE OF NAME;ASSIGNOR:SONY COMPUTER ENTERTAINMENT INC.;REEL/FRAME:039239/0343
Effective date: 20160401

STCB Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION