CN1132149C - Game apparatus, voice selection apparatus, voice recognition apparatus and voice response apparatus - Google Patents

Game apparatus, voice selection apparatus, voice recognition apparatus and voice response apparatus

Info

Publication number
CN1132149C
CN1132149C (application CN95107149A)
Authority
CN
China
Prior art keywords
sound
selecting arrangement
input
voice
lip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN95107149A
Other languages
Chinese (zh)
Other versions
CN1120965A (en)
Inventor
前川英嗣
渡边辰巳
小原和昭
萱嶋一弘
松井谦二
松川善彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Publication of CN1120965A
Application granted
Publication of CN1132149C
Anticipated expiration
Legal status: Expired - Fee Related

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/24: Speech recognition using non-acoustical features
    • G10L15/25: Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/87: Detection of discrete points within a voice signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Toys (AREA)
  • User Interface Of Digital Computer (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Selective Calling Equipment (AREA)
  • Position Input By Displaying (AREA)

Abstract

A game apparatus includes: a voice input section for inputting at least one voice set including a voice uttered by an operator, converting the voice set into a first electric signal, and outputting the first electric signal; a voice recognition section for recognizing the voice set on the basis of the first electric signal output from the voice input section; an image input section for optically detecting a movement of the operator's lips, converting the detected lip movement into a second electric signal, and outputting the second electric signal; a speech period detection section for receiving the second electric signal and obtaining, on the basis of the received second electric signal, the period in which the voice is uttered by the operator; an overall judgment section for extracting the voice uttered by the operator from the input voice set, on the basis of the voice set recognized by the voice recognition section and the period obtained by the speech period detection section; and a control section for controlling an object on the basis of the voice extracted by the overall judgment section.

Description

Game device, voice selection device, voice recognition device, and voice response device
The present invention relates to a game device operated by voice, an input device for inputting an image of the mouth or lips and/or a voice, and a voice response device.
Figure 34 shows an example of a conventional game device. With this game device, the operator uses a remote controller containing a portable wireless transmitter to operate an airship 7 equipped with a wireless receiver. As shown in Figure 34, the conventional game device uses a joystick 161 connected to the remote controller, and the operator manipulates the intended target (the airship) 7 by hand through the joystick. When the operator tilts the joystick 161, angle detectors 162 and 163 detect the respective angles of the stick, and each angle is converted into an electric signal that is input to a controller 164. Based on the joystick angles, the controller 164 outputs a radio control signal that controls the motion of the airship 7.
With such a conventional game device, however, the operator cannot always perform the intended operation with the joystick 161. This causes several problems: the operator must spend time acquiring the necessary operating skill, and quick reactions are not always possible. When the operator controls another kind of game device, for example a balloon equipped with a drive unit rather than an airship, and the motion of the balloon is controlled in the same way, a further problem arises: the motion becomes lifeless and destroys the living feel inherent to the balloon.
On the other hand, a device has been proposed that recognizes the operator's voice by inputting an image of the operator's mouth or lips. Such a device, however, requires complicated optical lenses, which increases the size and scale of the whole apparatus and makes it very expensive.
A game device according to the present invention comprises: a voice input device for inputting at least one voice set including a voice uttered by the operator, converting the voice set into a first electric signal, and outputting the first electric signal; a voice recognition device for recognizing the voice set on the basis of the first electric signal output from the voice input device; an image input device for optically detecting the movement of the operator's lips, converting the detected lip movement into a second electric signal, and outputting the second electric signal; a speech period detection device for receiving the second electric signal and obtaining, on the basis of the received signal, the period in which the operator utters the voice; an overall judgment device for extracting the voice uttered by the operator from the input voice set, on the basis of the voice set recognized by the voice recognition device and the period obtained by the speech period detection device; and a control device for controlling an object on the basis of the voice extracted by the overall judgment device.
In one embodiment of the invention, the speech period detection device comprises: a differentiator for detecting the degree of change of the second electric signal output from the image input device; and a device for determining, when the degree of change detected by the differentiator exceeds a predetermined value, whether the corresponding voice was uttered by the operator.
In another embodiment of the invention, the overall judgment device comprises: a device for generating an estimated period by adding a predetermined time to the period obtained by the speech period detection device; a device for detecting the recognition-result output time at which the voice set recognized by the voice recognition device is output; and a device for comparing the recognition-result output time with the estimated period and determining whether a voice in the voice set whose recognition-result output time falls within the estimated period was uttered by the operator.
Another game device according to the present invention comprises: an image input device for optically inputting the operator's lip movement, converting the detected lip movement into an electric signal, and outputting the signal; a lip-reading device for obtaining the lip movement from the electric signal, recognizing a word corresponding to the obtained lip movement, and outputting the recognition result; and a control device for controlling an object according to a control signal obtained from the recognition result.
In another embodiment of the invention, the lip-reading device comprises: a storage device for storing a predetermined number of words; and a matching device for selecting one word from the predetermined number of words according to the obtained lip movement and determining whether the selected word corresponds to the lip movement.
In another embodiment of the invention, the storage device stores lip movements corresponding to the predetermined number of words as reference patterns, and the matching device computes the distance between the obtained lip movement and each of the reference patterns and selects the word corresponding to the reference pattern with the smallest computed distance.
In yet another embodiment of the invention, the game device further comprises: a voice input device for inputting a voice, converting the voice into another electric signal, and outputting that signal; a voice recognition device for recognizing the voice on the basis of the electric signal output from the voice input device; and an overall judgment device for outputting the control signal supplied to the control device, on the basis of the recognition result of the voice recognition device and the recognition result of the lip-reading device.
In another embodiment of the invention, the game device further comprises: a device for obtaining a voice-recognition reliability for the recognition result obtained by the voice recognition device; and a device for obtaining a lip-reading reliability for the recognition result obtained by the lip-reading device, wherein the overall judgment device selects one of the recognition result of the voice recognition device and the recognition result of the lip-reading device according to the voice-recognition reliability and the lip-reading reliability, and outputs the selected recognition result as the control signal.
In another embodiment of the invention, the image input device comprises a light-emitting element for emitting light and a photodetector for receiving the light reflected from the operator's lips and converting the received light into the second electric signal.
In another embodiment of the invention, the light illuminates the lips from the side.
In another embodiment of the invention, the light illuminates the lips from the front.
In another embodiment of the invention, the voice input device comprises at least one microphone.
In another embodiment of the invention, the voice input device comprises at least one microphone, and the microphone, the light-emitting element, and the photodetector of the image input device are all arranged in an electronic unit.
An input device according to the present invention comprises: an earphone-type headset; a support bar with one end connected to the headset; and an electronic unit connected to the other end of the support bar, the electronic unit comprising at least one light-emitting element for emitting light that illuminates the operator's lips and at least one photodetector for receiving the light reflected from the operator's lips.
In another embodiment of the invention, the electronic unit further comprises a voice input device for inputting the voice supplied to it.
A voice selection device according to the present invention comprises: a first storage device for storing a plurality of tables, each table including a plurality of words that can be output in response to an input; a second storage device for storing one of the plurality of tables; a selection device for selecting, according to an input value received from outside, one word from the plurality of words included in the table stored in the second storage device, and outputting the selected word as a voice; and a switching device for replacing the table stored in the second storage device with another one of the plurality of tables stored in the first storage device according to the selected word.
In one embodiment of the invention, the voice selection device further comprises a device for generating a random number, and the selection device uses the random number to select a word from the plurality of words.
Another voice selection device comprises: a storage device for storing a table including a plurality of words that can be output in response to an input; a selection device for receiving an input value from outside, selecting one word from the plurality of words in the table stored in the storage device by using a random number, and outputting the selected word as a voice; and a device for generating the random number. A minimal sketch of this table-based selection is shown below.
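The following Python sketch illustrates the table-switching word selection described above. The table contents, the switch rule, and all names are hypothetical examples used only for illustration; the patent only states that words are selected (optionally at random) from a currently stored table and that the stored table can be replaced according to the selected word.

```python
import random

# Hypothetical reply tables and switch rule (not taken from the patent).
TABLES = {
    "greeting": ["hello", "good morning", "hi"],
    "question": ["why?", "really?", "and then?"],
}
SWITCH_RULE = {"hello": "question"}   # assumed rule: some words change the active table

class VoiceSelector:
    def __init__(self, tables, initial):
        self.tables = tables      # first storage device: all tables
        self.active = initial     # second storage device: name of the current table

    def select(self, input_value):
        # input_value would come from the voice recognition device; it selects
        # which table is consulted in a fuller implementation.
        words = self.tables[self.active]
        word = random.choice(words)                        # random-number-based selection
        self.active = SWITCH_RULE.get(word, self.active)   # switching device
        return word

selector = VoiceSelector(TABLES, "greeting")
print(selector.select("recognized input"))
```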
A voice response device according to the present invention comprises: the above voice selection device; and a voice recognition device for receiving a voice, recognizing it, and outputting the recognition result to the voice selection device.
Another game device according to the present invention comprises the above voice response device.
Another game device according to the present invention comprises a plurality of the above voice response devices, so that the voice response devices can hold a dialogue with one another.
Another game device according to the present invention comprises: a plurality of sound input units for converting input sound into electric signals, the plurality of sound input units respectively facing different directions; and a direction selection device for obtaining the energy of the electric signal of each of the plurality of sound input units, determining which sound input unit has the maximum energy, and determining the direction corresponding to that sound input unit as the direction from which the sound originates. A sketch of this maximum-energy selection follows.
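A minimal Python sketch of the direction selection described above, assuming each microphone delivers one block of samples; the microphone directions and block length are example values, not values from the patent.

```python
import numpy as np

MIC_DIRECTIONS_DEG = [0, 90, 180, 270]   # assumed orientation of four microphones

def select_direction(mic_blocks):
    """Return the direction of the microphone whose signal block has the largest energy."""
    energies = [float(np.sum(np.square(block))) for block in mic_blocks]
    return MIC_DIRECTIONS_DEG[int(np.argmax(energies))]

# toy example: the second microphone receives the loudest signal
blocks = [np.random.randn(160) * gain for gain in (0.1, 1.0, 0.2, 0.1)]
print(select_direction(blocks))   # most likely 90 in this toy example
```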
In one embodiment of the invention, the game device further comprises: an operating device for moving the object; and a control device for controlling the operating device so that the direction of the object is changed to the determined direction.
In another embodiment of the invention, the game device further comprises the direction selection device and an operating device for moving the object, the direction selection device having a measuring device for measuring the current direction of the object and a device for receiving the determined direction, obtaining a target direction from the current direction and the determined direction, and storing the target direction, wherein the direction selection device controls the operating device so that, by using the difference between the target direction and the current direction, the current direction of the object is made substantially equal to the target direction.
Another game device according to the present invention comprises a direction selection device having: an input device for inputting a relative direction by voice; a measuring device for measuring the current direction of the object; and a device for obtaining a target direction from the current direction and the input relative direction and storing the target direction, wherein the direction selection device controls the object so that, by using the difference between the target direction and the current direction, the current direction of the object is made substantially equal to the target direction.
In one embodiment of the invention, the input device comprises an input unit for inputting a voice and a recognizer for recognizing an absolute direction from the input voice.
Another game device according to the present invention comprises a direction selection device comprising: an input unit for inputting an absolute direction by voice; a device for determining a target direction from the absolute direction and storing the target direction; and a measuring device for measuring the current direction of the object, wherein the direction selection device controls the object so that, by using the difference between the target direction and the current direction, the current direction of the object is made substantially equal to the target direction.
In one embodiment of the invention, the input device comprises an input unit for inputting a voice and a recognizer for recognizing a relative direction from the input voice. A sketch of such a target-direction controller is given below.
A voice recognition device according to the present invention comprises: a first detection device for receiving an electric signal corresponding to a voice and detecting, from the electric signal, a voice end point indicating the time at which the voice input stops; a second detection device for determining, from the electric signal, the speech period, that is, the period within the whole voice input in which the voice is actually uttered; a feature extraction device for forming a feature vector from the portion of the electric signal corresponding to the speech period; a storage device for storing feature vectors of a plurality of candidate voices generated in advance; and a device for recognizing the input voice by comparing the feature vector formed by the feature extraction device with each of the feature vectors of the plurality of candidate voices stored in the storage device.
In one embodiment of the invention, the first detection device comprises: a device for dividing the electric signal into a plurality of frames, each frame having a predetermined length; a calculation device for obtaining the energy of the electric signal in each of the frames; and a determination device for determining the voice end point from the change of the energy.
In another embodiment of the invention, the determination device determines the voice end point by comparing the energy change with a predetermined threshold; the voice end point corresponds to the time at which the energy change reaches the threshold while the energy changes from above the threshold to below it.
In another embodiment of the invention, the determination device uses the energy change over a predetermined number of frames.
In another embodiment of the invention, the second detection device comprises: a device for smoothing the energy of the electric signal; a first circular storage device for sequentially storing the energy of each frame before smoothing; a second circular storage device for sequentially storing the energy of each frame after smoothing; a threshold calculation device for calculating a speech-period detection threshold by using the pre-smoothing energy stored in the first circular storage device at the time the voice end point is detected and the smoothed energy stored in the second circular storage device at that time; and a speech-period determination device for determining the speech period from the pre-smoothing energy and the speech-period detection threshold.
In another embodiment of the invention, the threshold calculation device calculates the speech-period detection threshold by using the maximum of the pre-smoothing energy stored in the first circular storage device at the time the voice end point is detected and the minimum of the smoothed energy stored in the second circular storage device while the voice end point has not yet been detected.
In another embodiment of the invention, the feature extraction device calculates, for each frame of the speech period, the zero-crossing count of the electric signal, the zero-crossing count of the differentiated electric signal, and the energy of the electric signal, and the calculated values are used as the elements of the feature vector. A sketch of this per-frame feature extraction follows.
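A minimal Python sketch of the per-frame feature extraction described above. The frame length and the exact zero-crossing definition are assumptions, not values from the patent.

```python
import numpy as np

FRAME_LEN = 160   # e.g. 20 ms at 8 kHz (assumed)

def zero_crossings(x):
    # count sign changes between adjacent samples
    return int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:])))

def frame_features(signal):
    """Return [zero crossings, zero crossings of the derivative, energy] per frame."""
    rows = []
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN):
        frame = signal[start:start + FRAME_LEN]
        diff = np.diff(frame)                      # differentiated signal
        rows.append([zero_crossings(frame),
                     zero_crossings(diff),
                     float(np.sum(frame ** 2))])   # frame energy
    return np.array(rows)

features = frame_features(np.random.randn(800))
print(features.shape)   # (5, 3): five frames, three feature values each
```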
Another voice response device according to the present invention comprises: at least one of the above voice recognition devices; and at least one control device for controlling an object according to the recognition result of the at least one voice recognition device.
In one embodiment of the invention, the voice response device further comprises a transmitting device connected to the at least one voice recognition device for transmitting its recognition result, and a receiving device connected to the at least one control device for receiving the transmitted recognition result and applying it to the at least one control device, wherein the at least one control device and the at least one receiving device are mounted on the object, so that the object can be remotely controlled.
The invention described above therefore offers the following advantages: (1) it provides a low-cost game device with a simple structure that can be operated by the human voice without any manual manipulation, that can be used in noisy environments or when it is inconvenient for the speaker to utter sounds, and that can also be used by people with speech impairments; (2) it provides a voice recognition device that allows the intended operation of a game device or toy; and (3) it provides a voice response device whose behaviour changes according to the voice input to it.
These and other advantages of the present invention will become apparent to those skilled in the art upon reading and understanding the following detailed description with reference to the accompanying drawings.
Fig. 1 is a block diagram showing the structure of the game device of the first embodiment of the invention;
Fig. 2 is a detailed diagram of the image input device of the first to third embodiments of the invention;
Fig. 3 is a detailed diagram of the speech period detection device of the first embodiment of the invention;
Fig. 4 is a block diagram showing the detailed structure of the overall judgment device of the first embodiment of the invention;
Fig. 5A is a graph showing an example of the output differential signal in the first to third embodiments of the invention;
Fig. 5B is a graph showing another example of the output differential signal in the first to third embodiments of the invention;
Fig. 6 is a graph illustrating the operating process of the overall judgment device of the first embodiment of the invention;
Fig. 7 is a graph illustrating the operating state of the overall judgment device of the first embodiment of the invention;
Fig. 8 is a block diagram showing the structure of the game device of the second embodiment of the invention;
Fig. 9 is a detailed block diagram of the lip-reading device of the second embodiment and the lip processing device of the third embodiment of the invention;
Fig. 10 is a graph showing the operating state of a differentiating circuit of the present invention;
Fig. 11 is a graph showing the operating state of the pattern matching device of the second and third embodiments of the invention;
Fig. 12 is a block diagram showing the structure of the game device of the third embodiment of the invention;
Fig. 13 is a chart showing the operating state of the overall judgment device of the third embodiment of the invention;
Fig. 14A is a chart showing the operating state of the overall judgment device of the third embodiment of the invention;
Fig. 14B is a chart showing the operating state of another overall judgment device of the third embodiment of the invention;
Fig. 15A is a schematic diagram of a typical structure of the input device of the present invention;
Fig. 15B is a schematic diagram of a typical structure of the input device of the present invention;
Fig. 16 is a schematic diagram showing the structure of the voice selection device of the fourth embodiment of the invention;
Fig. 17A is a schematic diagram showing the input/output state of the voice selection device shown in Fig. 16;
Fig. 17B is a schematic diagram showing the input/output state of the voice selection device shown in Fig. 16;
Fig. 18 is a schematic diagram showing the structure of another voice selection device of the fourth embodiment of the invention;
Fig. 19 is a schematic diagram showing the structure of the direction detection device of the fifth embodiment of the invention;
Fig. 20 is a schematic diagram showing an input sound waveform and frames;
Fig. 21 is a schematic diagram showing the structure of the direction detection device of the fifth embodiment of the invention;
Fig. 22 is a schematic diagram showing the structure of another direction detection device of the fifth embodiment of the invention;
Fig. 23 is a schematic diagram showing a sound waveform, energy, and circular storage;
Fig. 24 is a schematic diagram of the voice end point detection method of the sixth embodiment of the invention;
Fig. 25 is a schematic diagram of the voice detection method of the sixth embodiment of the invention;
Fig. 26 is a block diagram showing the structure of the voice recognition device of the sixth embodiment of the invention;
Fig. 27 is a schematic diagram showing the structure in which a voice recognition device and a voice selection device of the present invention are combined into a voice response device;
Fig. 28 is a schematic diagram showing the structure in which a direction detection device and an operating device of the present invention are combined into a voice response device;
Fig. 29 is a schematic diagram showing the structure of a voice response device composed of a voice recognition device, a direction detection device, and an operating device of the present invention;
Fig. 30 is a schematic diagram showing the structure of a voice response device composed of a direction selection device, a direction detection device, and an operating device of the present invention;
Fig. 31 is a schematic diagram showing the structure of a voice response device composed of a voice recognition device and an operating device of the present invention;
Fig. 32 is a schematic diagram showing the structure of a voice response device capable of remote control;
Fig. 33 is a schematic diagram showing an example of a toy having a voice response device of the present invention;
Fig. 34 is a schematic diagram showing the structure of a conventional game device.
Embodiment one
The first embodiment of the present invention is described below with reference to the accompanying drawings. In the game device of this embodiment, an airship is controlled by voice commands corresponding to its various motions. The voice commands consist of six instructions: "mae" (forward), "ushiro" (backward), "migi" (right), "hidari" (left), "ue" (up), and "shita" (down). In this embodiment, not only the voice signal uttered by the speaker but also a signal representing the movement of the speaker's lips (hereinafter referred to as the "lip movement signal") is input to the game device. Whether the speaker, that is, the operator of the game, is actually speaking is then determined from the voice signal and the lip movement signal. This prevents the game device from performing an erroneous action because of external noise, in particular a voice uttered by another person.
Fig. 1 shows a structure of the game device of this embodiment. As shown in Fig. 1, the game device comprises a voice input device 1 for processing the input voice signal, a voice recognition device 2, an image input device 3, and a speech period detection device 4. The image input device 3 handles the lip movement signal representing the movement of the speaker's lips. The voice recognition device 2 and the speech period detection device 4 are both connected to an overall judgment device 5. The overall judgment device 5 determines the instruction uttered by the speaker on the basis of the input voice and the speaker's lip movement. The judgment result of the overall judgment device 5 is supplied to a control device 6. The control device 6 controls the airship 7 according to the judgment result.
First, the voice containing the instruction uttered by the speaker is input to the voice input device 1. A common microphone or a similar device can be used to input the voice. The voice input device 1 converts the input voice into an electric signal and outputs it to the voice recognition device 2 as a voice signal 11. The voice recognition device 2 analyzes the voice signal 11 and outputs the result of the analysis as a voice recognition result 12. The analysis of the voice signal 11 can be carried out by a known conventional method, for example the DP (dynamic programming) matching method.
While the input voice is being processed as described above, the lip movement signal is processed in parallel. When the speaker utters an instruction, the movement of the speaker's lips is input to the image input device 3. Fig. 2 shows a typical structure of the image input device 3. In the image input device 3 of this embodiment, light emitted from an LED (light-emitting diode) 21 illuminates the area including the speaker's lips (hereinafter referred to as the "lip area"). The light reflected by the lip area is detected by a photodiode 22. In this way, the image input device 3 outputs a lip movement signal 13 corresponding to the movement of the speaker's lips. When the speaker's lips move, the level of the lip movement signal 13 changes according to the change of the shadows near the speaker's lips. The light from the LED 21 may illuminate the lips either from the front of the speaker or from the speaker's side. The lip movement signal 13 output by the image input device 3 is supplied to the speech period detection device 4. Fig. 3 shows a structure of the speech period detection device 4 of this embodiment. As shown in Fig. 3, the speech period detection device 4 comprises a differentiating circuit 31 and a period detection device 32. The differentiating circuit 31 outputs a differential signal 33 representing the degree of change of the input lip movement signal 13. Figs. 5A and 5B each show a typical waveform of the differential signal 33 obtained when the light from the LED 21 illuminates the speaker's lips from the side (that is, laterally). The differential signal 33 shown in Fig. 5A is obtained when the speaker says "mae" (forward). The differential signal 33 shown in Fig. 5B is obtained when the speaker says "ushiro" (backward). As can be seen from Figs. 5A and 5B, a large amplitude of the differential signal 33 indicates that the speaker is speaking. Because the light from the LED 21 illuminates the speaker's lips from the side, the protruding movement of the lips when the "u" sound of the instruction "ushiro" is uttered is reflected in the waveform of the differential signal 33. If the light from the LED 21 illuminates the speaker's lips from the front, the light falls only on the speaker's face, so the lip movement signal 13 and the differential signal 33 are not affected by noise from any background movement.
The period detection device 32 receives the differential signal 33 and evaluates its amplitude in order to detect the speaker's speech period. A specific method of detecting the speech period is described with reference to Fig. 6.
When the level (amplitude) of the differential signal 33 exceeds a threshold amplitude 51, the period detection device 32 determines that the differential signal 33 was produced because the speaker uttered an instruction, and defines the period during which the level of the differential signal 33 exceeds the threshold amplitude 51 as a speech period. In the example shown in Fig. 6, periods 1 and 2 are speech periods. The period detection device 32 compares the interval between adjacent speech periods with a predetermined threshold time length 52. The predetermined threshold time length 52 is used to determine whether a plurality of speech periods correspond to the same utterance of the speaker, that is, whether the speech periods were uttered continuously. If the interval between two speech periods is equal to or less than the threshold time length 52, the periods are judged to form one continuous speech period that includes the interval. The speech period detection device 4 outputs a speech period detection signal 14 indicating such a continuous speech period. The threshold time length 52 and the threshold amplitude 51 are set within appropriate ranges.
As described above, the speech period detection device 4 obtains, from the differential signal 33, the period during which the speaker utters an instruction (the "speech period") by detecting the intensity and duration of the speaker's lip movement.
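A minimal Python sketch of the speech period detection just described: the lip movement signal is differentiated, samples whose magnitude exceeds the threshold amplitude are grouped into periods, and periods separated by gaps shorter than the threshold time length are merged. The sample rate and threshold values are assumptions.

```python
import numpy as np

SAMPLE_RATE = 100    # samples per second of the lip movement signal (assumed)
AMP_THRESHOLD = 0.2  # threshold amplitude 51 (assumed)
GAP_THRESHOLD = 0.3  # threshold time length 52, in seconds (assumed)

def detect_speech_periods(lip_signal):
    """Return merged (start, end) times, in seconds, where the differentiated signal is large."""
    diff = np.abs(np.diff(np.asarray(lip_signal)))   # differentiating circuit 31
    active = diff > AMP_THRESHOLD                    # level exceeds the threshold amplitude
    periods, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            periods.append((start / SAMPLE_RATE, i / SAMPLE_RATE))
            start = None
    if start is not None:
        periods.append((start / SAMPLE_RATE, len(active) / SAMPLE_RATE))
    merged = []                                      # merge periods with short gaps
    for p in periods:
        if merged and p[0] - merged[-1][1] <= GAP_THRESHOLD:
            merged[-1] = (merged[-1][0], p[1])
        else:
            merged.append(p)
    return merged

lip_signal = np.zeros(500)
lip_signal[120:180] = np.sin(np.linspace(0, 20, 60))   # toy "utterance"
print(detect_speech_periods(lip_signal))
```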
The operation of the overall judgment device 5 will be described next. As shown in Fig. 4, the overall judgment device 5 comprises a voice recognition time judgment device 41, an output judgment device 42, and an output gate 43. The voice recognition time judgment device 41 receives the voice recognition result 12 and passes to the output judgment device 42 the time at which the recognition result is output from the voice recognition device 2. The output judgment device 42 receives the speech period detection signal 14 from the speech period detection device 4 and the output value of the voice recognition time judgment device 41. The operation of the output judgment device 42 is described with reference to Fig. 7.
Based on the received speech period detection signal 14, the output judgment device 42 generates an evaluation speech period 72 by adding an evaluation threshold time length 71 before and after the detected speech period. The output judgment device 42 then determines whether the time at which the voice recognition result 12 is output from the voice recognition device 2 falls within the evaluation speech period 72. If it does, the voice that was input to the voice input device 1 and recognized by the voice recognition device 2 is judged to have been uttered by the speaker. The judgment result of the output judgment device 42 is supplied to the control device 6 as a signal 15.
The evaluation threshold time length 71 used to generate the evaluation speech period 72 must be chosen while taking into account the time the voice recognition device 2 needs for its recognition processing, because the time at which the voice recognition result 12 is output is used to determine whether the recognized voice came from the speaker.
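A minimal Python sketch of this timing check: a recognition result is accepted only if its output time falls inside a detected speech period extended by a margin on both sides. The margin value is an assumption.

```python
EVAL_MARGIN = 0.5   # evaluation threshold time length 71, in seconds (assumed)

def accept_recognition(result_time, speech_periods):
    """True if the recognition-result output time lies inside any speech period
    extended by EVAL_MARGIN on both sides (the evaluation speech period 72)."""
    return any(start - EVAL_MARGIN <= result_time <= end + EVAL_MARGIN
               for start, end in speech_periods)

# usage with the periods returned by detect_speech_periods()
periods = [(1.2, 1.8)]
print(accept_recognition(1.9, periods))   # True: within the extended period
print(accept_recognition(3.5, periods))   # False: likely another person's voice or noise
```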
When the control device 6 receives the signal 15 corresponding to an instruction uttered by the speaker, the control device 6 controls the airship 7 according to the input instruction by outputting a radio control signal.
According to the first embodiment, therefore, the speech period is detected from the speaker's lip movement while the speaker is uttering an instruction. From the detected speech period, it is determined whether the recognized voice was uttered by the operator or by someone else (for example another person). Erroneous recognition caused by another person's utterance can thus be avoided, which prevents the controlled object (for example the airship) from performing an incorrect action.
It will be appreciated that the game device can be controlled by the speaker's (operator's) voice, so the speaker (operator) can carry out the intended operation. Moreover, according to this embodiment the speaker's lip movement is detected with a simple structure and method based on the combination of an LED and a photodiode. Compared with a conventional game device that uses a video camera or a similar device to capture an image of the lip movement during speech, the cost of this game device is therefore very low. It will also be appreciated that the photodiode can be replaced by a phototransistor.
The circuit diagrams shown in Figs. 2 and 3 are only illustrative. The present invention is not limited to these specific circuits; the processing can also be implemented in computer software.
Embodiment two
In the game device of the second embodiment, an instruction is input only from the lip movement of the speaker (the operator of the game device), not from the voice uttered by the speaker, and the airship is controlled according to this input instruction. The game device of this embodiment can therefore be used in a noisy environment or in situations where the speaker should not speak freely (for example at night), and it can also be used by people with speech impairments.
Fig. 8 schematically shows the structure of the game device of this embodiment. As shown in Fig. 8, the game device of this embodiment comprises an image input device 3, a control device 6, and an airship 7 identical to those of the game device of the first embodiment. The game device further comprises a lip-reading device 81 for recognizing the word uttered by the speaker (operator).
Fig. 9 shows a typical structure of the lip-reading device 81. In this embodiment, the lip-reading device 81 comprises a differentiating circuit 31, a difference calculator 91, a database 92, and a pattern matching device 93. The differentiating circuit 31 is identical to the differentiating circuit 31 included in the speech period detection device 4 of the first embodiment.
The difference calculator 91 samples the differential signal 33 output from the differentiating circuit 31 at predetermined time intervals to obtain a data set consisting of many samples, and then computes the differences between adjacent samples. The result of this subtraction (hereinafter referred to as a "difference data set") is sent from the difference calculator 91 to the database 92 and the pattern matching device 93. The database 92 stores the difference data sets that serve as reference patterns (templates) for recognition. The pattern matching device 93 derives the distance between the difference data set of each reference pattern stored in the database 92 and the input difference data set to be recognized. From the derived distances, the pattern matching device 93 identifies which word has been input as the speaker's lips move. Clearly, the smaller the distance, the higher the reliability of the recognition.
The operation of the game device of this embodiment is described in detail below. In this embodiment, the lip-reading device 81 recognizes the input word by comparing reference patterns with the input pattern (as described above). Before recognition can be performed, the reference patterns must therefore be recorded in the lip-reading device 81 in advance.
(recording operation)
First, the image input device 3 emits light from the light-emitting diode (LED), receives the light reflected from the speaker's (operator's) lips, and outputs an electric signal 13 corresponding to the movement of the lips to the lip-reading device 81. The electric signal 13 is input to the differentiating circuit 31 of the lip-reading device 81. The differentiating circuit 31 sends the differential signal 33, which represents the degree of change of the electric signal 13, to the difference calculator 91. Up to this point the operation is the same as in the first embodiment.
The operation of the difference calculator 91 is described below with reference to Fig. 10. First, the difference calculator 91 samples the differential signal 33 at a time interval (Δt). The difference calculator 91 then calculates the differences between adjacent samples in the obtained sample set. The differences between the samples, that is, a difference data set, are output to the database 92. The database 92 stores this difference data set. This operation is repeated a predetermined number of times equal to the number (kinds) of words to be recognized, so that a difference data set is stored for every kind. These stored difference data sets serve as the reference patterns used for recognition. In this embodiment, the instructions used to control the object are "mae" (forward), "ushiro" (backward), "migi" (right), "hidari" (left), "ue" (up), and "shita" (down); that is, six instructions are used. The operation of storing a difference data set is therefore repeated six times, so that six reference patterns are finally present in the database 92.
After all the reference patterns have been recorded in the database 92 in this way, the database 92 examines each difference data set and extracts the period over which the data corresponding to the lip movement are continuous. In particular, when values close to zero continue in a difference data set, the database 92 judges that these data correspond to a period in which the lips do not move. After the period corresponding to the lip movement has been extracted from every reference pattern, the reference pattern with the maximum length is selected, and this maximum length is set as the length (N) of the difference data sets of the reference patterns. The recording is then finished, and the difference data of the reference patterns are kept in the database 92. A sketch of this recording procedure is shown below.
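A minimal Python sketch of the reference-pattern recording described above. The sample rate, sampling interval Δt, near-zero criterion, and zero padding to a common length N are assumptions made for illustration.

```python
import numpy as np

SR = 100           # sample rate of the differential signal in Hz (assumed)
DT = 0.02          # sampling interval delta-t in seconds (assumed)
NEAR_ZERO = 0.05   # values below this count as "lips not moving" (assumed)

def make_difference_set(differential_signal):
    """Sample the differential signal every DT seconds and take adjacent differences."""
    step = max(1, int(DT * SR))
    return np.diff(np.asarray(differential_signal)[::step])

def trim_to_movement(diff_set):
    """Keep only the span where the data are not close to zero (lips moving)."""
    moving = np.where(np.abs(diff_set) > NEAR_ZERO)[0]
    return diff_set[moving[0]:moving[-1] + 1] if moving.size else diff_set[:0]

def record_reference_patterns(signals_by_word):
    """signals_by_word: dict mapping each instruction word to one recorded differential signal."""
    trimmed = {w: trim_to_movement(make_difference_set(s))
               for w, s in signals_by_word.items()}
    n = max(len(p) for p in trimmed.values())               # reference length N
    patterns = {w: np.pad(p, (0, n - len(p))) for w, p in trimmed.items()}
    return patterns, n
```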
(identifying operation)
The steps from inputting the lip movement to obtaining the differential signal 33 are the same as in the recording operation. The processing applied to the differential signal 33 after it is input to the difference calculator 91 is described below with reference to Fig. 11.
The differential signal 33 input to the difference calculator 91 is sampled at the same time interval (Δt) as during recording. Then, for each period whose length equals the length (N) of the reference-pattern difference data sets, the differences between adjacent samples are calculated, and the resulting difference data set is treated as the difference data set of that period. The period over which the differences are calculated is shifted forward along the time axis by Δt. Fig. 11 shows only the difference data sets obtained for periods 111 and 112: period 111 starts at the first sample and has length N, and period 112 is shifted forward along the time axis by N/2 from period 111.
When the difference data sets of length N (hereinafter called the recognition difference data sets) have been obtained for a plurality of periods, they are sent to the pattern matching device 93. The pattern matching device 93 reads the reference patterns from the database 92 and computes the distance between each reference pattern and each of the recognition difference data sets. In this embodiment, since six reference patterns have been recorded in the database 92 in advance (as described above), the pattern matching device 93 computes the distance of each recognition difference data set from each reference pattern.
The distance between a recognition difference data set and a reference pattern is calculated with the following formula:

d_j = Σ_{i=1}^{N} (r_i - p_{ij})^2
In this formula, r_i represents the i-th element of the recognition difference data set, p_{ij} represents the i-th element of the j-th reference pattern (corresponding to the j-th kind), and d_j represents the distance between the recognition difference data set and the j-th reference pattern. When the distance d_j is equal to or less than a predetermined value, the pattern matching device 93 judges that the recognition difference data set matches the j-th reference pattern, and outputs a signal 82 corresponding to the j-th kind (word) as the judgment result. A sketch of this matching step follows.
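A minimal Python sketch of the matching step, using the squared distance d_j = Σ_i (r_i - p_ij)^2 defined above. The acceptance threshold is an assumption.

```python
import numpy as np

MATCH_THRESHOLD = 1.0   # predetermined value for accepting a match (assumed)

def match_pattern(recognition_set, reference_patterns):
    """Return (best word, distance) if the smallest d_j is within the threshold,
    otherwise (None, distance). Patterns and recognition set share length N."""
    distances = {word: float(np.sum((recognition_set - pattern) ** 2))
                 for word, pattern in reference_patterns.items()}
    word, d = min(distances.items(), key=lambda kv: kv[1])
    return (word, d) if d <= MATCH_THRESHOLD else (None, d)

# usage with the patterns produced by record_reference_patterns():
# patterns, n = record_reference_patterns(training_signals)
# print(match_pattern(recognition_set, patterns))
```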
The judgment result is input to the control device 6. The control device 6 then outputs a radio control signal corresponding to the j-th kind in order to control the airship 7.
As described above, in this embodiment a word (instruction) is input solely by recognizing the lip movement, and the airship is controlled by the recognized word. The invention can therefore be used in noisy environments or in situations where it is inconvenient for the speaker to speak, and it can also be used by people with speech impairments.
As in the first embodiment, the image input device 3 that inputs the lip movement is a combination of an LED 21 and a photodiode 22, so compared with the conventional method of capturing an image of the lip movement with a video camera or a similar device, the cost of this game device is very low.
In this embodiment, the user of the game device must record the reference patterns used for recognizing the instructions before inputting any instruction. Alternatively, reference patterns adapted to the lip movement of any unspecified user can be recorded in the database 92 in advance, for example during manufacture or packaging of the game device, so that the user's recording operation can be omitted.
Embodiment three
The game device of the third embodiment of the invention is described below. In this embodiment, an instruction is input both by voice and by the lip movement of the speaker (operator), and the judgment is made by combining the two recognition results; the airship can then be controlled accordingly. Even in a noisy environment, the instruction uttered by the speaker can therefore be recognized reliably.
Fig. 12 schematically shows a structure of the game device of this embodiment. Like the game device of the first embodiment, it comprises a voice input device 1, an image input device 3, a control device 6, and an airship 7. The game device of the third embodiment further comprises a voice processing device 121 and a lip processing device 122. The voice processing device 121 recognizes the input voice in the same way as the voice recognition device 2 of the first embodiment and then calculates the reliability of the recognition result. The lip processing device 122 recognizes the input word in the same way as the lip-reading device 81 of the second embodiment and then calculates the reliability of its recognition result. The output signals of the voice processing device 121 and the lip processing device 122 are both input to an overall judgment device 123. The overall judgment device 123 determines the instruction input by the speaker on the basis of the recognition result and its reliability from each of the processing devices 121 and 122, and then outputs the overall judgment result.
The operation of the game device of this embodiment is described in detail below.
The steps of inputting the voice uttered by the speaker (the operator of the game device) and sending the electric signal 11 corresponding to the input voice to the voice processing device 121 are the same as in the first embodiment. The voice processing device 121 receives the electric signal 11 and recognizes the input voice from this signal. Any conventional, known method can be used for voice recognition. For example, when each instruction to be input is uttered in advance, the electric signal 11 can be processed in the same way as in the lip-reading device described above to obtain a data set, and this data set is recorded in advance as a reference pattern. When the operator of the game device actually utters an instruction, the distance between the data set obtained by processing the electric signal 11 and each of the previously recorded reference patterns is calculated; with this method, the content of the instruction input through the voice input device can be recognized. After recognizing the voice in this way, the voice processing device 121 obtains a reliability indicating how reliable the recognition result is. The voice recognition result and the reliability are then supplied to the overall judgment device 123 as output 124. How the reliability is obtained is described below.
While the input voice is being processed, the signal representing the lip movement is also processed. First, the image input device 3 inputs the lip movement in the same way as in the first embodiment and outputs the electric signal 13, whose level changes with the movement of the lips. The lip processing device 122 receives the electric signal 13 and processes it in the same way as in the second embodiment. In this embodiment, when, as a result of the pattern matching between a recognition difference data set and the reference patterns, the recognition difference data set is judged to match the j-th reference pattern, the lip processing device 122 calculates the reliability of this recognition result from the distance d_j between the recognition difference data set and the j-th reference pattern. The recognition result and the reliability obtained in this way are supplied to the overall judgment device 123.
The method of calculating the reliability is briefly described below. In this embodiment, the reliability of the voice recognition result and the reliability of the lip-reading result are calculated by processors (not shown) of the same structure using the same procedure. The calculation of the reliability of the voice recognition result is explained here. Three grades, "large", "medium", and "small", are used to evaluate the reliability of the voice recognition result. Note that the grade "small" indicates the highest reliability of the recognition result and the grade "large" indicates the lowest. A threshold α_L is used between the grades "small" and "medium", and a threshold α_H is used between "medium" and "large" (α_L < α_H). The distance d between the object to be recognized and the reference pattern judged to match it is compared with the thresholds. If d < α_L, the reliability is judged to be of grade "small". If α_L ≤ d < α_H, the reliability is judged to be of grade "medium". If d ≥ α_H, the reliability is judged to be of grade "large". The grade of the reliability of the recognition result based on the lip movement is likewise determined by comparison with thresholds. The method of calculating the reliability is not limited to the method described above; any other known method can be used as required. A sketch of this grading follows.
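A minimal Python sketch of the three-grade reliability evaluation described above; the threshold values are assumptions. The grade "small" denotes the most reliable result.

```python
ALPHA_L = 0.5   # threshold between "small" and "medium" (assumed)
ALPHA_H = 2.0   # threshold between "medium" and "large" (assumed)

def reliability_grade(distance):
    if distance < ALPHA_L:
        return "small"    # highest reliability
    if distance < ALPHA_H:
        return "medium"
    return "large"        # lowest reliability

print(reliability_grade(0.3))   # small
print(reliability_grade(1.1))   # medium
print(reliability_grade(2.7))   # large
```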
The operation of the overall judgment device 123 is described below with reference to Fig. 13.
Fig. 13 illustrates the basic principle of the overall judgment method. First, the overall judgment device 123 detects the time at which the voice recognition result is output from the voice processing device 121 (that is, the time at which output 124 is produced) and the time at which the lip-reading result is output from the lip processing device 122 (that is, the time at which output 125 is produced). Evaluation periods 132a and 132b are generated by adding a period corresponding to a predetermined threshold 131 before and after the respective detected output times. It is then determined whether the evaluation period 132a of the lip-reading result and the evaluation period 132b of the voice recognition result overlap. If periods 132a and 132b overlap, the overall judgment device 123 determines that the input and recognized voice is the voice uttered by the operator whose lip movement was input. If periods 132a and 132b do not overlap, the recognized voice is judged to be environmental noise or a voice uttered by someone other than the operator. Erroneous recognition of voices uttered by persons other than the operator can thus be prevented.
The overall judgment device 123 then determines whether the lip-reading result matches the voice recognition result. If they match, the recognition result is used as the overall judgment result (the overall judgment result "Mae", that is "forward", in Fig. 13). If they do not match, the overall judgment result is determined from the reliability obtained for each recognition result. Figs. 14A and 14B show typical relations between the combinations of recognition results and the overall judgment result determined by each combination. As described above, the evaluation in this embodiment uses three grades: "large", representing the lowest reliability, "small", representing the highest reliability, and "medium", representing a reliability between them. Fig. 14A shows the relation for the case in which, when the reliabilities are equal, priority is given to the voice recognition result; Fig. 14B shows the relation for the case in which, when the reliabilities are equal, priority is given to the lip-reading result. Which recognition result is given priority depends on factors such as the environment in which the game device is used. In some cases the choice can be predetermined in the game device; in other cases the game device can be constructed so that the operator chooses which recognition result is given priority. For example, when the operator has no speech impairment and the game device is used under low-noise conditions, priority should be given to the voice recognition result, as in Fig. 14A. When the operator has a speech impairment or the game device is used in a very noisy environment, the case of Fig. 14B is suitable. The overall judgment device 123 outputs the overall judgment result determined in this way as the signal 15. In the final step, the control device 6 outputs a radio control signal according to the overall judgment result in order to control the airship 7. A sketch of such a combination rule is shown after this paragraph.
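A minimal Python sketch of the overall judgment combining both recognizers. The rule of picking the result with the better (smaller) reliability grade and breaking ties by a configurable priority is an assumed simplification of the tables in Figs. 14A and 14B.

```python
GRADE_ORDER = {"small": 0, "medium": 1, "large": 2}   # small = most reliable

def overall_judgment(voice_word, voice_grade, lip_word, lip_grade, prefer_voice=True):
    if voice_word == lip_word:
        return voice_word                                   # results agree
    v, l = GRADE_ORDER[voice_grade], GRADE_ORDER[lip_grade]
    if v < l:
        return voice_word                                   # voice result more reliable
    if l < v:
        return lip_word                                     # lip result more reliable
    return voice_word if prefer_voice else lip_word         # equal reliabilities: priority

print(overall_judgment("mae", "small", "ushiro", "medium"))           # mae
print(overall_judgment("mae", "medium", "ushiro", "medium", False))   # ushiro
```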
As mentioned above, according to present embodiment, not only sound is discerned, also the action of lip is discerned, and this identification is the comprehensive identification that utilizes two kinds of recognition results at the same time, so just may discern the speech (instruction) that the speaker sends under noise circumstance.Simultaneously, this embodiment people of also having the disfluency of making can utilize the sound control function to use the effect of this game device.In addition, similar with first and second embodiment, the action of lip utilizes LED21 and photodiode 22 common detections, so be appreciated that, compare with the similar device classic method of utilizing video camera or other to be used to catch lip action image, the cost of this game device is very low.Although this embodiment is not described in detail, the user of game device can be by the reference pattern of the identical method record labiomaney of second embodiment.In addition, preparation can be omitted the record of being done by the user applicable to the reference pattern of any unspecified person.
In the first to third embodiments, a game device that controls the airship 7 by a radio control signal has been described as a typical example. It will be appreciated that the game devices to which the present invention can be applied are not limited to this specific form. For example, if the structure described in any of the above embodiments is provided for each operator, the game device can be operated by a plurality of operators simultaneously.
The input device of the present invention will now be described. Figure 15 is a schematic diagram showing the structure of the input device of the present invention. The input device comprises a headphone set 154, a support bar 155 connected to the headphone set 154, and an electronic unit 153 on which a photodiode 151 and an LED 152 are mounted. The electronic unit 153 is connected to the support bar 155 at a predetermined angle (see Figure 15A). By suitably adjusting the angle between the electronic unit 153 and the support bar 155, the direction of the light emitted from the LED 152 can be changed so that it illuminates the lip region of the operator. The input device illuminates the operator's lip region with the light emitted from the LED 152 and detects the reflected light with the photodiode 151, so as to input the movement of the lips. This input device can be used as the image input device of the first to third embodiments described above. If a microphone 156 is mounted on the electronic unit 153 (see Figure 15B), the input device can also serve as a sound input device.
The input device without a microphone shown in Figure 15A can be used as the image input device of the second embodiment. The input device with a microphone shown in Figure 15B can be used as a device serving as both the sound input device and the image input device of the first and third embodiments. As described above, the input device of the present invention uses a small and very light photodiode 151, LED 152 and microphone, so the whole input device is also small and light. In addition, all the components used are inexpensive, so the whole input device can be produced at low cost. Furthermore, the input device of the present invention is fixed to the operator's head by means of the headphone set 154. The relative positions of the operator, the photodiode 151 and the LED 152 are therefore essentially fixed, so the movement of the lips can be input stably. In the input device of the present invention, light is used to input the lip movement, and the reflected light is converted into an electric signal which is then output. Compared with conventional input devices, the structure of this input device is simpler; a conventional input device, for example one that inputs the lip movement as an image or one that uses ultrasonic waves, inevitably has the drawbacks of large size and complicated structure.
The input device of the present invention described above is provided with a single photodiode and a single LED. Alternatively, a plurality of photodiodes and a plurality of LEDs may be provided. For example, if two pairs of LEDs and photodiodes are provided and arranged in a cross shape, the direction of motion within a plane can be detected.
As described above, according to the present invention, the desired operation can be performed using a person's voice, and the game device can be operated without any movement of the hands. In addition, the input word (instruction) is recognized not only from the voice but also from the movement of the lips, so stable operation can be achieved even in a noisy environment. Moreover, the movement of the lips is detected using a combination of an LED and a photodiode (or a phototransistor), so the production cost of the whole device is low compared with devices using a video camera, ultrasonic waves or the like.
In addition, as described in the first embodiment, the operator's speech period is detected from the movement of the lips, and this period is used for the voice recognition, so erroneous recognition of sounds uttered by persons other than the operator is avoided. As described in the second and third embodiments, if the input word (instruction) is recognized from the movement of the lips and the recognition result is used to control the airship, the present invention can be applied in a noisy environment or in a situation where it is inconvenient for the operator to speak aloud, and it can also be used by persons with speech impairments.
In the input device of the present invention, an inexpensive light-emitting element (an LED or the like) and an inexpensive photodetector (a photodiode or the like) are attached to a lightweight headphone set, support bar and electronic unit. The input device can therefore be made light in weight and low in cost.
In the first to third embodiments, exemplary devices that control the motion of an object according to the recognized sound or the movement of the lips have been described. However, the operation of controlling an object by sound or lip movement is not limited to controlling its motion; other operations, such as returning an instruction, can also be controlled. Devices that make a controlled object perform various operations (including motor operations) according to various kinds of recognized sound will now be described.
Fourth Embodiment
In the present embodiment, a device will be described which, according to the recognized sound, selects an output sound from a group of output sounds prepared in advance for the recognized sound, and outputs the selected sound.
Figure 16 schematically shows the structure of the sound selection device 100 of the present embodiment. The sound selection device 100 comprises a random number generator 101, a selector 102, an input/output (I/O) state register 103, a state change device 104 and an I/O state database 105. The I/O state database 105 stores a number of I/O state tables in advance. Each I/O state table contains, for a state s, inputs x (x is a nonnegative integer) and, for each input x, a group sp(x, i) of n(s) output words (0 ≤ i < n(s)). Figures 17A and 17B show typical I/O state tables. At the start, the initial state table 201 shown in Figure 17A is stored in the I/O state register 103. The random number generator 101 is used to determine the value i for selecting, from an output word group, the word to be output as sound.
The operation of the sound selection device 100 will now be described. When a value x is input to the selector 102 from outside, the selector 102 refers to the I/O state table stored in the I/O state register 103 and selects the output word group sp(x, i) corresponding to the input x. The selector 102 then causes the random number generator 101 to produce a random number r(n(s)) (where 0 ≤ r(n(s)) < n(s)) and determines i = r(n(s)), so that one word is selected from the output word group sp(x, i). The selected output word is then output to the outside as sound.
The output word from the selector 102 is not only output to the outside but is also sent to the state change device 104. When the state change device 104 receives the output from the selector 102, it refers to the I/O state database 105 and replaces the contents of the I/O state register 103 with the I/O state table corresponding to the word output by the selector 102. For example, when the word "Genki?" (i.e., "How are you?") is output as sound from the initial state 201, the state change device 104 refers to the I/O state database 105 and extracts the table of the I/O state 202 corresponding to the output "Genki?". The extracted table of state 202 is stored in the I/O state register 103.
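The following sketch illustrates the selection and state-transition mechanism described above. The table contents and the transition rule are illustrative assumptions: only the entries mentioned in connection with Figures 17A and 17B are reproduced, and the follow-up table is hypothetical.

```python
import random

# Sketch of the sound selection device 100: I/O state tables, random selection
# (random number generator 101) and state transition (state change device 104).

STATE_TABLES = {
    "initial_201": {1: ["Ohayo", "Genki?"]},   # input 1 = "Ohayo" recognized
    "state_202":   {2: ["Genki", "Maamaa"]},   # hypothetical follow-up table
}
# Hypothetical transition rule: which table follows which output word.
NEXT_TABLE = {"Genki?": "state_202", "Ohayo": "initial_201"}

class SoundSelector:
    def __init__(self):
        self.current_table = "initial_201"     # contents of I/O state register 103

    def select(self, x):
        """Select one output word for input value x and update the state."""
        group = STATE_TABLES[self.current_table][x]        # sp(x, i), 0 <= i < n(s)
        word = group[random.randrange(len(group))]         # i = r(n(s))
        self.current_table = NEXT_TABLE.get(word, self.current_table)
        return word

selector = SoundSelector()
print(selector.select(1))   # e.g. 'Genki?' -> the register now holds table 202
```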
In this way, the sound selection device 100 of the present embodiment outputs, in response to the input data, the sound corresponding to a word selected by means of a random number. A simple interactive system can therefore be constructed using the sound selection device 100. Alternatively, if a sound selection device 100a with a simplified structure is used, in which the state change device 104 and the I/O state database 105 are omitted as shown in Figure 18, the response to an input is made only once.
The sound selection devices 100 and 100a can be used as the sound selection device 1202 of the audio response device shown in Figure 27, connected to a voice recognition device 1201. More specifically, when the voice recognition device 1201 recognizes a sound, the recognition result is input to the sound selection device 1202 as the identification number assigned to that sound. The sound selection device 1202 takes the input identification number as the input x and selects one word arbitrarily from the corresponding output word group. The sound corresponding to the selected word is then emitted. An audio response device 1203 can be constructed in this way. In the audio response device 1203, after a certain sound is input, a sound is output in response to it. In this audio response device 1203, however, various reactions can be made to the same input sound. For example, when the voice recognition device 1201 outputs the sound "Ohayo" (i.e., "good morning") as the recognition result while the sound selection device 1202 is in the initial state, the identification number 1 representing the sound "Ohayo" is input to the sound selection device 1202 as the input x (see Figure 17A). According to the input x, the sound selection device 1202 arbitrarily selects one word from the group sp(1, i), which contains the two output words "Ohayo" and "Genki?", and emits the sound corresponding to the selected word.
In the audio response device 1203, the sounds that the sound selection device 1202 can receive as inputs must be recorded before actual operation. If a sound corresponding to a word not included in the recorded input group is input to the sound selection device 1202, a word such as "Nani" (i.e., "What?") can be output from the sound selection device 1202. When the device of the third embodiment is used as the voice recognition device 1201, if the reliability of the recognized sound is low, a sound asking the operator to input the sound again can be output from the sound selection device 1202.
As described above, in the sound selection device of the present invention, tables representing a number of I/O states are prepared in advance, and the I/O state is changed according to the rule relating inputs and outputs. A device that carries out a simple dialogue can therefore be realized by using the sound selection device of the present invention. In addition, in this sound selection device, a number of candidate output words are prepared for one input sound, and one word can be selected arbitrarily from these candidates. The sound corresponding to the selected word is then emitted. It is thus possible to provide an audio response device which does not always make the same reaction to one input but can make various reactions.
Fifth Embodiment
The direction detection device and the direction selection device of the present invention will now be described.
The direction detection device 400 is first described with reference to Figure 19. The direction detection device 400 comprises a direction detector 401 and a number of microphones 402 connected to the direction detector. The microphones 402 are attached to the controlled object. The operation of the direction detection device 400 is described below on the assumption that four microphones are provided. When sound is input to the four microphones m(i) (i = 0, 1, 2, 3), the direction detector 401 divides the input sound sp(m(i), t) into frames f(m(i), j) 501 (0 ≤ j), as shown in Figure 20. The length of one frame may be set to 16 ms, for example. The direction detector 401 then obtains a sound energy e(m(i), j) for each frame and stores the obtained energies sequentially in a circular memory (not shown) of length l (for example, length 100). Each time the energy of one frame is stored, the direction detector 401 obtains the total energy of the preceding frames supplied to each microphone and detects which microphone has the largest total energy. The direction detector 401 then compares this largest total energy with a threshold determined in advance by experiment. If the largest total energy is greater than the threshold, the direction from the direction detector 401 to that microphone is determined to be the direction from which the sound originated. The index i of the microphone determined to face the sound input direction is output from the direction detector 401.
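A minimal sketch of this procedure is given below. The sampling rate, buffer length and energy threshold are placeholder assumptions, since the patent states only that the threshold is determined by experiment.

```python
import numpy as np

# Sketch of the direction detector 401 with four microphone signals.
FRAME_LEN = 256          # 16 ms at an assumed 16 kHz sampling rate
BUFFER_FRAMES = 100      # circular-memory length l
ENERGY_THRESHOLD = 1.0   # assumed value; determined by experiment in the patent

def frame_energies(signal):
    """Split a signal into frames f(m(i), j) and return the energy of each frame."""
    n_frames = len(signal) // FRAME_LEN
    frames = signal[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
    return (frames ** 2).sum(axis=1)

def detect_direction(mic_signals):
    """Return the index of the microphone facing the sound source, or None."""
    totals = []
    for signal in mic_signals:
        energies = frame_energies(signal)[-BUFFER_FRAMES:]   # most recent frames
        totals.append(float(energies.sum()))
    best = int(np.argmax(totals))
    return best if totals[best] > ENERGY_THRESHOLD else None
```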
If the direction detection device 400 and an operation device 1302 operating in the manner described below are connected together (as shown in Figure 28), an audio response device 1303 that performs a predetermined operation according to the direction from which a sound originates can be constructed. In particular, when the operation device 1302 and the direction detection device 1301 (400 in Figure 19) are attached to an object (for example a balloon or a stuffed toy) so as to operate that object, the object moves in, or turns to face, the direction from which the sound originated. In this way a device is constructed that performs a predetermined operation aimed at the direction of the sound.
One embodiment of the above operation device 1302 comprises three motors with propellers attached to the object and a drive unit for these motors. When the direction in which the object is to travel is input, the device controls the three motors so as to move the object in that direction.
The direction selection device is described below with reference to Figure 21. The direction selection device 600 comprises an offset computer 601, an azimuth compass 602 and a target direction memory 603. The direction selection device 600 can be used as a device for controlling the direction of motion of an object and/or the direction the object faces. When an input value x (x is a nonnegative integer) indicating the direction of motion of the object, or the direction the object is to face, is input, the offset computer 601 outputs an offset value according to this input value x and a table stored in advance in the offset computer 601. The output offset value is added to the actual direction of the object measured at that moment by the azimuth compass 602, and the sum is sent to the target direction memory 603. The target direction memory 603 stores the result of the addition as the direction of motion of the object or the direction the object is to face.
As described above, the direction selection device in Figure 21 changes the direction of the object relative to the object's current direction of travel or the direction the object currently faces.
If the direction selection device 700 shown in Figure 22 is used in place of the direction selection device 600 shown in Figure 21, the direction of the object is changed not relative to its current direction but in absolute terms. In the direction selection device 700 of Figure 22, a direction calculator 701 receives from outside an input value x (x is a nonnegative integer) representing an absolute direction (for example, north), and outputs the value corresponding to this input value x. This output value is stored directly in the target direction memory 603 as the target direction. Like the offset computer 601, the direction calculator 701 holds, in the form of a table, values representing the absolute direction for each input value x. After the target direction has been stored in the memory 603, the direction selection device 700 continuously measures the current direction with the azimuth compass 602 while the object is moving or turning, and outputs the difference between the measured direction and the direction stored in the target direction memory 603. If feedback control of the object is performed on the basis of this output value, the object is moved in the target absolute direction or turned to the target direction.
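The two devices can be sketched as follows. The offset and absolute-direction tables contain only the example words used later in this embodiment, and the compass reading is represented by a placeholder function, so all concrete values here are assumptions for illustration.

```python
# Sketch of the direction selection devices 600 (relative) and 700 (absolute).

OFFSET_TABLE = {"Migi": +90.0, "Hidari": -90.0}    # device 600 (Figure 21)
ABSOLUTE_TABLE = {"Kita": 0.0, "Nansei": -135.0}   # device 700 (Figure 22)

def wrap(angle):
    """Keep angles in the -180..+180 degree range used for target directions."""
    return (angle + 180.0) % 360.0 - 180.0

def relative_target(word, read_compass):
    """Device 600: offset the currently measured direction by the table value."""
    return wrap(read_compass() + OFFSET_TABLE[word])

def absolute_target(word):
    """Device 700: the table value itself is stored as the target direction."""
    return wrap(ABSOLUTE_TABLE[word])

# Example: object currently faces 0 degrees (north); "Migi" turns it to +90 (east).
print(relative_target("Migi", read_compass=lambda: 0.0))   # 90.0
print(absolute_target("Nansei"))                            # -135.0
```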
If the above direction selection device, a voice recognition device and an operation device are connected together (as shown in Figure 29), an audio response device 1402 can be constructed. Any kind of voice recognition device, for example a conventional voice recognition device or the voice recognition device used with the game devices of the first to third embodiments, can be used in the audio response device 1402. In the audio response device 1402, when the facing direction or direction of motion of the object is given by voice, the facing direction or direction of motion of the object is changed according to the input sound. In the audio response device 1402, the recognition result of the voice recognition device 1201 is used as the input of the direction selection device 1401, and the output of the direction selection device 1401 is used as the input of the operation device 1302. The operation of the object can therefore be controlled while the current facing direction or direction of motion of the object is compared with the target direction.
For example, the direction of due north is defined as zero degrees, and directions toward the east are defined as positive.
Now consider the case where the object faces the zero-degree direction. In the present embodiment, the direction selection device 600 (see Figure 21) is used as the direction selection device 1401. The word "Migi" (i.e., "right") is stored in the offset computer 601 of the direction selection device 600 in a table associating it with +90 degrees. In this case, when the sound determining the target direction is recognized by the voice recognition device 1201 as the word "Migi", the direction selection device 600 sends an output to the operation device 1302. This output instructs the operation device 1302 to turn the object 90 degrees toward the east from its current facing direction or direction of motion. While the direction is being changed, the direction selection device 600 continuously compares the current facing direction or direction of motion with the target direction. The operation device 1302 is controlled using the output of the direction selection device 600 so that the facing direction or direction of motion of the object changes to the target direction.
In addition, if the direction selection device 700 shown in Figure 22 is used as the direction selection device 1401, a word representing an absolute direction, such as "Kita" (i.e., "north") or "Nansei" (i.e., "southwest"), is input as the word representing the target direction instead of "Migi" or "Hidari". In this case, the direction selection device 700 stores 0 degrees for the input word "Kita", or -135 degrees for the input word "Nansei", in the target direction memory as the absolute direction of the object, and then carries out the operation described above. The target direction here lies in the range of -180 degrees to +180 degrees. The direction detection device and the direction selection device of the present embodiment can also be combined with the operation device. In this case, as shown in Figure 30, the detection result of the direction detection device 1301 is used as the input of the direction selection device 1401, and the output of the direction selection device 1401 is used as the input of the operation device 1302. In this way an audio response device 1501 can be constructed in which, while the current facing direction or direction of motion of the object is compared with the target direction, the facing direction or direction of motion of the object is changed to the direction from which the voice was uttered.
Sixth Embodiment
A device relating to voice recognition in the present embodiment is now described. As shown in Figure 26, this device comprises a sound end point detection device 1101, a sound detection device 1102, a feature extraction device 1103, a distance calculation device 1104 and a dictionary 1105.
The sound end point detection device 1101 is described first. The sound end point detection device 1101 receives a signal corresponding to the input sound and detects the sound end point on the basis of this input. In this specification, the term "sound end point" refers to the time at which the input sound ends.
The sound end point detection device 1101 of the present embodiment is connected to a sound input device such as a microphone. When a sound s(t) is input through the sound input device, the sound end point detection device 1101 divides the input sound s(t) into frames f(i) (i is a nonnegative integer), as shown in Figure 23, and then obtains the energy e(i) of each frame. In Figure 23, curve 801 represents the sound s(t) and curve 802 represents the energy e(i). Each time the sound of one frame is input, the sound end point detection device 1101 obtains the energy change over a predetermined number of frames up to the current frame, and compares this energy change with a threshold Thv determined in advance by experiment. If, as a result of the comparison, the energy crosses the threshold while changing from a larger value to a smaller value, the time of that crossing is determined to be the sound end point.
Methods of obtaining the energy change over the predetermined time period from the frame energies are described below. A method using a circular memory is described first. The energy obtained for each frame is stored in order in a circular memory 803 of length l. Each time the energy of a frame is obtained, the energies of the frames within the predetermined time period preceding the current frame are referred to in the circular memory 803 in order to obtain the energy change. There is also another method of obtaining the energy change that does not use a circular memory. In this method, the sound end point detection device 1101 stores, for each frame, the mean value m(i-1) and the variation v(i-1) over the predetermined number of preceding frames. When the energy of a new frame is obtained, the weighted sum of the new energy e(i) and the previous mean energy m(i-1) replaces the mean as the new mean energy m(i). Similarly, the weighted sum of the previous variation v(i-1) and |e(i) - m(i)| replaces the variation as the new variation v(i). A pseudo-variation is obtained in this way. Here an attenuation constant α is used as the weight, and the new mean value and new variation are obtained from the following equations, where α is 1.02:

m(i) = m(i-1)/α + ((α-1)/α)·e(i)

v(i) = v(i-1)/α + ((α-1)/α)·|e(i) - m(i)|
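A minimal sketch of this memory-saving update, using the attenuation constant α = 1.02 given above, is shown below; the toy energy sequence is illustrative only.

```python
# Sketch of the recursive mean / pseudo-variation update (no circular memory).
ALPHA = 1.02

def update(m_prev, v_prev, e_i):
    """Update the running mean m(i) and pseudo-variation v(i) from a new frame energy e(i)."""
    m_i = m_prev / ALPHA + (ALPHA - 1.0) / ALPHA * e_i
    v_i = v_prev / ALPHA + (ALPHA - 1.0) / ALPHA * abs(e_i - m_i)
    return m_i, v_i

m, v = 0.0, 0.0
for e in [0.1, 0.1, 5.0, 6.0, 0.2]:   # toy frame-energy sequence
    m, v = update(m, v, e)
```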
With this method a circular memory is not needed, so memory is saved. In addition, when a new energy is obtained, the operation of summing the energies over the predetermined time period can be omitted, so the processing time can be shortened.
The sound detection device 1102, which obtains the period in which the sound is actually uttered, is described below. To obtain this period, in addition to the circular memory 803 provided for storing the energy, a circular memory 902 for storing the smoothed energy is provided. As shown in Figure 24, each time the energy of a frame is obtained, the energy 802 is stored in the memory 803 and the smoothed energy 901 is stored in the memory 902. When the sound end point 903 is obtained as described above, the time course of the energy and of the smoothed energy is thus held in the circular memories 803 and 902. If the length of each circular memory is made long enough (for example, corresponding to two seconds), the energy of the next word can also be held. The sound detection device 1102 extracts the actual utterance period using the energy and the smoothed energy stored in these memories. The period is extracted by the following procedure. First, a threshold Th is determined (as described below). This threshold is then compared with the energies stored in the circular memory 803, in order from the oldest to the newest. The first point at which the energy exceeds the threshold is determined to be the start point of the utterance period. The comparison is then made in the opposite order, from the newest to the oldest, and the first point at which the energy crosses the threshold is determined to be the end point of the utterance period. The utterance period is extracted in this way.
The method of determining the threshold Th is described below. First, when the sound end point is detected, the maximum energy max 1001 in the memory 803 and the minimum smoothed energy min 1002 in the memory 902 are obtained. Using these values, the threshold Th is obtained from the following formula, where the value of β is about 0.07:
Th=min+β(max-min)
Here, the method of taking the median within a fixed window is adopted as the method of calculating the smoothed energy. The smoothing method is not limited to this, however; a method of taking the average, for example, may also be adopted. In the present embodiment, the maximum smoothed energy is not used; instead, the maximum energy is used to obtain the threshold Th. The reason is that if the maximum smoothed energy were used, the maximum value would vary greatly as the word length varies, so the threshold Th would also vary, and a good sound detection result could not be obtained. In addition, using the minimum smoothed energy to calculate the threshold Th makes it possible to avoid detecting noise that is not a sound uttered by the operator.
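The threshold calculation and utterance-period extraction described above can be sketched as follows. The lists stand in for the circular memories 803 and 902, and β = 0.07 as in the text; everything else is an illustrative assumption.

```python
# Sketch of the sound detection device 1102: threshold Th = min + beta(max - min),
# then a forward scan for the start point and a backward scan for the end point.
BETA = 0.07

def extract_utterance(energies, smoothed):
    """Return (start_frame, end_frame) of the utterance, or None if not found."""
    th = min(smoothed) + BETA * (max(energies) - min(smoothed))
    start = next((i for i, e in enumerate(energies) if e > th), None)     # oldest -> newest
    end = next((i for i in reversed(range(len(energies)))
                if energies[i] > th), None)                               # newest -> oldest
    if start is None or end is None:
        return None
    return start, end

print(extract_utterance([0.1, 0.2, 3.0, 4.0, 0.2], [0.1, 0.1, 1.0, 1.5, 0.3]))  # (2, 3)
```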
As described above, the sound detection device 1102 extracts the utterance period, i.e., the part of the input signal corresponding to the sound.
Next, the feature extraction device 1103 extracts, from the detected sound, the feature quantities used for recognition. As with the energy calculation, the feature quantity is obtained for each frame and stored in a circular memory. Here, the "feature quantity" is a feature vector comprising three components: the number of zero crossings of the original sound signal s(t), the number of zero crossings of the differentiated signal of s(t), and the difference of the logarithm of the energy e(i) between two frames.
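A sketch of this feature extraction is given below. The frame length is an assumed value, and taking the log-energy difference between consecutive frames is one possible reading of the text.

```python
import numpy as np

# Sketch of the feature extraction device 1103: one 3-component vector per frame.
FRAME_LEN = 256   # assumed frame length in samples

def zero_crossings(x):
    """Count sign changes in a signal segment."""
    return int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:])))

def feature_vectors(s):
    """Return the feature vectors of the detected sound s(t)."""
    n_frames = len(s) // FRAME_LEN
    frames = s[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
    log_e = np.log((frames ** 2).sum(axis=1) + 1e-10)
    features = []
    for j in range(1, n_frames):
        features.append([
            zero_crossings(frames[j]),            # zero crossings of s(t)
            zero_crossings(np.diff(frames[j])),   # zero crossings of the differentiated signal
            log_e[j] - log_e[j - 1],              # log-energy difference between two frames
        ])
    return np.array(features)
```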
The feature vectors obtained by the sound end point detection device 1101, the sound detection device 1102 and the feature extraction device 1103 are input to the distance calculation device 1104. The distance calculation device 1104 compares these feature vectors with the feature vectors of each sound recorded in advance in the dictionary 1105, and outputs the best-matching entry as the recognition result. For this comparison, the Euclidean distance between the vectors can simply be obtained, or a DP (dynamic programming) matching method can be adopted.
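The DP matching mentioned above can be sketched as a standard dynamic-time-warping comparison, as below. This is an illustrative implementation under that assumption, not necessarily the exact matching procedure of the patent.

```python
import numpy as np

# Sketch of the distance calculation device 1104 using DP (dynamic time warping)
# matching between the input feature sequence and each dictionary entry.

def dtw_distance(a, b):
    """Accumulated Euclidean distance between two feature sequences via DP matching."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

def recognize(features, dictionary):
    """Return the dictionary word whose reference pattern is closest to the input."""
    return min(dictionary, key=lambda word: dtw_distance(features, dictionary[word]))
```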
In the manner described above, the device of the present invention performs voice recognition. The voice recognition device can be connected to and used with the sound selection device 1202 of the fourth embodiment, as shown in Figure 27, or with the direction selection device 1401 and the operation device 1302 of the fifth embodiment, as shown in Figure 29. In addition, if the voice recognition device is simply combined with the operation device 1302 (as shown in Figure 31), an audio response device 1601 can be constructed in which the result of the voice recognition device 1201 is used as the input of the operation device 1302, so that the whole device moves in the target direction. The voice recognition device of the present embodiment can also be used with the game devices of the first to third embodiments, so that the game device can be operated by the operator's voice. Furthermore, in the audio response devices including the voice recognition device 1201 described in the fourth to sixth embodiments, if a signal transmitter 1701 is added to the voice recognition device and a signal receiver 1702 is added to the sound selection device 1202, the direction selection device 1401, or the operation device 1302 connected after the voice recognition device in each configuration (as shown in Figure 32), then the target can be remotely controlled simply by the operator holding the voice recognition device as a hand-held remote controller; the signal transmission can be carried out by infrared light or by radio.
By fixing the above audio response device to a balloon, it becomes possible to talk with the balloon or to control it, so a toy can be made that makes effective use of the heart-warming qualities inherent in a balloon.
As shown in Figure 33, two balloons are prepared, and an audio response device equipped with the above voice recognition device and sound selection device is fixed to each of them. The two audio response devices are not physically connected to each other, but they converse with each other. A toy made in this way has elements that talk to one another automatically. Furthermore, a number of balloons 1801 each equipped with a response device can be prepared so that they converse with each other. In this case, if the voice recognition of each audio response device has a rejection function, each balloon can be made to respond only to specific words; one of a number of balloons made in this way then responds only to certain specific words. For example, each balloon 1801 can be given a name, and each balloon 1801 will then react only to the sound representing its own name. As the rejection method, the distance to the internal dictionary used for voice recognition can be calculated, and distances exceeding a threshold determined by experiment can be rejected. In addition, a timer can be installed in the audio response device; when a preset time period has elapsed, one sound group is selected arbitrarily from the recorded output sound groups and output, making a toy in which the audio response devices can start conversations with each other. The controlled object is not limited to a balloon: a stuffed toy, a doll, a photograph or a picture can also be controlled. The controlled object can also be an image moving in a display window. A device that resists gravity can also be used as the controlled object instead of a balloon (for example a helicopter suspended by propellers, or a magnetically levitated linear motor car).
As described above, according to the present invention, natural operation can be carried out using a person's voice, and the game device does not require any hand movement by the operator. In addition, the input word (instruction) is recognized not only from the voice but also from the movement of the lips, so stable operation can be achieved even in a noisy environment. Furthermore, the movement of the lips is detected by the combination of an LED and a photodiode (or a phototransistor), so the whole device can be produced very cheaply compared with devices using a video camera, ultrasonic waves or the like.
In addition, in the voice recognition device of the present invention, the speaker's speech period is detected from the movement of the lips, and this period is used for the recognition of the sound, so erroneous recognition caused by the utterances of persons other than the speaker can be prevented. In another recognition device of the present invention, the input word (instruction) is recognized from the movement of the lips and the recognition result is used to control the airship, so the present invention can be used in a noisy environment or in a situation where it is inconvenient for the speaker to speak aloud, and it can also be used by persons with speech impairments.
In the input device of the present invention, a low-cost light-emitting element (for example an LED) and a low-cost photodetector (for example a photodiode) are attached to a lightweight headphone set, support bar and electronic unit, so an input device that is very light in weight and low in cost can be constructed.
As described above, the sound selection device of the present invention can provide a number of I/O states in advance and changes the I/O state according to the predetermined rule relating inputs and outputs. A device that can carry out a simple dialogue can thus be provided using the sound selection device. The sound selection device of the present invention also provides a number of outputs in advance for one input and selects one output arbitrarily from these outputs, so various reactions can be made to a single input.
The direction detection device of the present invention inputs sound through a plurality of microphones and detects the microphone that provides the largest energy, so the direction from which the sound originated can be detected. Using the direction detection device of the present invention, an object can be made to move accurately in a specified direction or to turn accurately to a specified direction, while its current orientation is detected with an azimuth compass.
In the voice recognition device of the present invention, the sound end point detection device first obtains the end point of the sound roughly, and the sound detection device then obtains the threshold automatically. Because the threshold is determined using the maximum energy and the minimum smoothed energy of the input sound, a useful sound period can be extracted regardless of the length of the speech period. When the sound detection device detects the sound using the threshold, feature quantities can be obtained from the sound, and the sound is recognized on the basis of these feature quantities.
Various audio response devices can be obtained by appropriately combining the devices described above. For example, when the voice recognition device and the sound selection device are combined, an audio response device that responds to a person's voice is obtained, so a man-machine dialogue can be realized. When the direction detection device and the operation device are combined, an object can be operated according to sound. When the voice recognition device, the direction selection device and the operation device are combined, an object can be moved accurately in the direction indicated by voice, or the motion of the object can be changed to the direction indicated by voice. In addition, if a signal transmitter is connected to the voice recognition device in the audio response device and a signal receiver is connected to the stage following the voice recognition device, with that stage fixed to the object, an audio response device capable of remote control can be constructed.
If a plurality of the above audio response devices are provided, the audio response devices in the resulting toy can converse with each other automatically. If an audio response device is fixed to each balloon, the balloons become a toy with the heart-warming qualities inherent in a balloon. If, in addition, a timer is installed and the device is constructed so as to output a suitable sound after a certain period of time has elapsed, an audio response device can be constructed that can start a dialogue rather than merely responding to a person's voice.
Various modifications can easily be made by those skilled in the art without departing from the scope and spirit of the present invention. Accordingly, the scope of the appended claims is not limited by the above description, but the claims are to be construed broadly.

Claims (8)

1. A sound selection device comprising:
first storage means for storing a plurality of tables, each table including a plurality of words each selectable for output as a sound of a dialogue;
second storage means for storing one table of the plurality of tables;
selection means for selecting one word, according to an input value received from outside, from the plurality of words included in the table stored in the second storage means, and for outputting the selected word as a sound selection; and
conversion means for converting the table stored in the second storage means into another table of the plurality of tables stored in the first storage means, the other table being determined according to the selected word.
2. The sound selection device according to claim 1, further comprising means for generating a random number, wherein the selection means selects one word from the plurality of words by using the random number.
3. A sound selection device comprising:
storage means for storing a table, the table including a plurality of words each selectable for output as a sound of a dialogue;
selection means for receiving an input value from outside, selecting one word from the plurality of words in the table stored in the storage means by using a random number, and outputting the selected word as a sound selection; and
means for generating the random number.
4. An audio response device comprising:
the sound selection device according to claim 1; and
voice recognition means for receiving sound, recognizing the sound, and outputting the recognition result to the sound selection device.
5. An audio response device comprising:
the sound selection device according to claim 3; and
voice recognition means for receiving sound, recognizing the sound, and outputting the recognition result to the sound selection device.
6. A game device comprising an audio response device according to claim 4 and means for generating a visual image whose operation is combined with the output of the selection means.
7. A game device comprising an audio response device according to claim 5 and means for generating a visual image whose operation is combined with the output of the selection means.
8. A game device comprising a plurality of audio response devices according to claim 4, the plurality of audio response devices being configured to converse with one another, each audio response device receiving sound from another of the plurality of audio response devices and performing the selection of the output sound according to the recognition of the received sound.
CN95107149A 1994-05-13 1995-05-12 Game apparatus, voice selection apparatus, voice recognition apparatus and voice response apparatus Expired - Fee Related CN1132149C (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP99629/94 1994-05-13
JP99629/1994 1994-05-13
JP9962994 1994-05-13
JP274911/1994 1994-11-09
JP27491194 1994-11-09
JP274911/94 1994-11-09

Publications (2)

Publication Number Publication Date
CN1120965A CN1120965A (en) 1996-04-24
CN1132149C true CN1132149C (en) 2003-12-24

Family

ID=26440741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN95107149A Expired - Fee Related CN1132149C (en) 1994-05-13 1995-05-12 Game apparatus, voice selection apparatus, voice recognition apparatus and voice response apparatus

Country Status (6)

Country Link
US (2) US6471420B1 (en)
EP (1) EP0683481B1 (en)
KR (1) KR100215946B1 (en)
CN (1) CN1132149C (en)
DE (1) DE69527745T2 (en)
ES (1) ES2181732T3 (en)

Families Citing this family (95)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6202046B1 (en) * 1997-01-23 2001-03-13 Kabushiki Kaisha Toshiba Background noise/speech classification method
JP3112254B2 (en) * 1997-03-04 2000-11-27 富士ゼロックス株式会社 Voice detection device
CA2225060A1 (en) * 1997-04-09 1998-10-09 Peter Suilun Fong Interactive talking dolls
US7630895B2 (en) * 2000-01-21 2009-12-08 At&T Intellectual Property I, L.P. Speaker verification method
US6012027A (en) * 1997-05-27 2000-01-04 Ameritech Corporation Criteria for usable repetitions of an utterance during speech reference enrollment
DE19751290A1 (en) * 1997-11-19 1999-05-20 X Ist Realtime Technologies Gm Unit for transformation of acoustic signals
JP3688879B2 (en) * 1998-01-30 2005-08-31 株式会社東芝 Image recognition apparatus, image recognition method, and recording medium therefor
US6240381B1 (en) * 1998-02-17 2001-05-29 Fonix Corporation Apparatus and methods for detecting onset of a signal
US7081915B1 (en) * 1998-06-17 2006-07-25 Intel Corporation Control of video conferencing using activity detection
DE69941499D1 (en) * 1998-10-09 2009-11-12 Sony Corp Apparatus and methods for learning and applying a distance-transition model
CA2328953A1 (en) * 1999-02-16 2000-08-24 Yugen Kaisha Gm&M Speech converting device and method
JP3132815B2 (en) * 1999-04-21 2001-02-05 株式会社トイテック Voice recognition device for toys
US6771982B1 (en) 1999-10-20 2004-08-03 Curo Interactive Incorporated Single action audio prompt interface utlizing binary state time domain multiple selection protocol
US9232037B2 (en) 1999-10-20 2016-01-05 Curo Interactive Incorporated Single action sensory prompt interface utilising binary state time domain selection protocol
US6804539B2 (en) * 1999-10-20 2004-10-12 Curo Interactive Incorporated Single action audio prompt interface utilizing binary state time domain multiple selection protocol
AU1601501A (en) * 1999-11-12 2001-06-06 William E Kirksey Method and apparatus for displaying writing and utterance of word symbols
KR20010073718A (en) * 2000-01-19 2001-08-01 정우협 Voice recognition operation (regulation operation) of network game
KR20010073719A (en) * 2000-01-19 2001-08-01 정우협 How to recognize voice of mud game
JP2002091466A (en) * 2000-09-12 2002-03-27 Pioneer Electronic Corp Speech recognition device
WO2002029784A1 (en) * 2000-10-02 2002-04-11 Clarity, Llc Audio visual speech processing
WO2002054184A2 (en) 2001-01-04 2002-07-11 Roy-G-Biv Corporation Systems and methods for transmitting motion control data
US7904194B2 (en) * 2001-02-09 2011-03-08 Roy-G-Biv Corporation Event management systems and methods for motion control systems
US6641401B2 (en) 2001-06-20 2003-11-04 Leapfrog Enterprises, Inc. Interactive apparatus with templates
WO2003001475A1 (en) * 2001-06-20 2003-01-03 Leapfrog Enterprises, Inc. Interactive apparatus using print media
JP2003202888A (en) * 2002-01-07 2003-07-18 Toshiba Corp Headset with radio communication function and voice processing system using the same
US7219062B2 (en) * 2002-01-30 2007-05-15 Koninklijke Philips Electronics N.V. Speech activity detection using acoustic and facial characteristics in an automatic speech recognition system
JP2003316387A (en) * 2002-02-19 2003-11-07 Ntt Docomo Inc Learning device, mobile communication terminal, information recognition system, and learning method
US7587318B2 (en) * 2002-09-12 2009-09-08 Broadcom Corporation Correlating video images of lip movements with audio signals to improve speech recognition
US20040254794A1 (en) * 2003-05-08 2004-12-16 Carl Padula Interactive eyes-free and hands-free device
US7231190B2 (en) * 2003-07-28 2007-06-12 Motorola, Inc. Method and apparatus for terminating reception in a wireless communication system
US7610210B2 (en) * 2003-09-04 2009-10-27 Hartford Fire Insurance Company System for the acquisition of technology risk mitigation information associated with insurance
US7412376B2 (en) * 2003-09-10 2008-08-12 Microsoft Corporation System and method for real-time detection and preservation of speech onset in a signal
DE602004021716D1 (en) * 2003-11-12 2009-08-06 Honda Motor Co Ltd SPEECH RECOGNITION SYSTEM
US7355593B2 (en) * 2004-01-02 2008-04-08 Smart Technologies, Inc. Pointer tracking across multiple overlapping coordinate input sub-regions defining a generally contiguous input region
US20050154593A1 (en) * 2004-01-14 2005-07-14 International Business Machines Corporation Method and apparatus employing electromyographic sensors to initiate oral communications with a voice-based device
JP2005202854A (en) * 2004-01-19 2005-07-28 Nec Corp Image processor, image processing method and image processing program
US20050228673A1 (en) * 2004-03-30 2005-10-13 Nefian Ara V Techniques for separating and evaluating audio and video source data
BRPI0418839A (en) * 2004-05-17 2007-11-13 Nokia Corp method for supporting and electronic device supporting an audio signal encoding, audio encoding system, and software program product
JP3827317B2 (en) * 2004-06-03 2006-09-27 任天堂株式会社 Command processing unit
DE102004028082A1 (en) * 2004-06-09 2005-12-29 BSH Bosch und Siemens Hausgeräte GmbH Household appliance e.g. oven, has detection device for evaluating image signal in order to identify sound spoken by operator, and outputting control commands to control unit, such that control unit controls function unit accordingly
GB2415639B (en) * 2004-06-29 2008-09-17 Sony Comp Entertainment Europe Control of data processing
US7704135B2 (en) * 2004-08-23 2010-04-27 Harrison Jr Shelton E Integrated game system, method, and device
US20060046845A1 (en) * 2004-08-26 2006-03-02 Alexandre Armand Device for the acoustic control of a game system and application
JP4729927B2 (en) * 2005-01-11 2011-07-20 ソニー株式会社 Voice detection device, automatic imaging device, and voice detection method
JP4847022B2 (en) * 2005-01-28 2011-12-28 京セラ株式会社 Utterance content recognition device
KR100718125B1 (en) 2005-03-25 2007-05-15 삼성전자주식회사 Biometric apparatus and method using bio signal and artificial neural network
JP4910312B2 (en) * 2005-06-03 2012-04-04 ソニー株式会社 Imaging apparatus and imaging method
US7680656B2 (en) * 2005-06-28 2010-03-16 Microsoft Corporation Multi-sensory speech enhancement using a speech-state model
US20070055528A1 (en) * 2005-08-30 2007-03-08 Dmitry Malyshev Teaching aid and voice game system
US7883420B2 (en) 2005-09-12 2011-02-08 Mattel, Inc. Video game systems
US7860718B2 (en) * 2005-12-08 2010-12-28 Electronics And Telecommunications Research Institute Apparatus and method for speech segment detection and system for speech recognition
JP4557919B2 (en) * 2006-03-29 2010-10-06 株式会社東芝 Audio processing apparatus, audio processing method, and audio processing program
WO2007141682A1 (en) * 2006-06-02 2007-12-13 Koninklijke Philips Electronics N.V. Speech differentiation
US8069039B2 (en) * 2006-12-25 2011-11-29 Yamaha Corporation Sound signal processing apparatus and program
US8326636B2 (en) 2008-01-16 2012-12-04 Canyon Ip Holdings Llc Using a physical phenomenon detector to control operation of a speech recognition engine
GB2450886B (en) * 2007-07-10 2009-12-16 Motorola Inc Voice activity detector and a method of operation
CN101101752B (en) * 2007-07-19 2010-12-01 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
JP4462339B2 (en) * 2007-12-07 2010-05-12 ソニー株式会社 Information processing apparatus, information processing method, and computer program
US8172637B2 (en) * 2008-03-12 2012-05-08 Health Hero Network, Inc. Programmable interactive talking device
US9135809B2 (en) * 2008-06-20 2015-09-15 At&T Intellectual Property I, Lp Voice enabled remote control for a set-top box
KR101229034B1 (en) * 2008-09-10 2013-02-01 성준형 Multimodal unification of articulation for device interfacing
US8154644B2 (en) * 2008-10-08 2012-04-10 Sony Ericsson Mobile Communications Ab System and method for manipulation of a digital image
JP2010165305A (en) * 2009-01-19 2010-07-29 Sony Corp Information processing apparatus, information processing method, and program
KR101581883B1 (en) * 2009-04-30 2016-01-11 삼성전자주식회사 Appratus for detecting voice using motion information and method thereof
JP5499633B2 (en) * 2009-10-28 2014-05-21 ソニー株式会社 REPRODUCTION DEVICE, HEADPHONE, AND REPRODUCTION METHOD
KR101644015B1 (en) * 2009-11-27 2016-08-01 삼성전자주식회사 Communication interface apparatus and method for multi-user and system
WO2011070972A1 (en) * 2009-12-10 2011-06-16 日本電気株式会社 Voice recognition system, voice recognition method and voice recognition program
US8996382B2 (en) * 2010-10-14 2015-03-31 Guy L. McClung, III Lips blockers, headsets and systems
KR20130022607A (en) * 2011-08-25 2013-03-07 삼성전자주식회사 Voice recognition apparatus and method for recognizing voice
JP6100263B2 (en) * 2012-08-10 2017-03-22 株式会社ホンダアクセス Speech recognition method and speech recognition apparatus
RU2523220C1 (en) * 2013-02-19 2014-07-20 Михаил Сергеевич Беллавин Electronic computer
TWI502583B (en) * 2013-04-11 2015-10-01 Wistron Corp Apparatus and method for voice processing
US9873038B2 (en) 2013-06-14 2018-01-23 Intercontinental Great Brands Llc Interactive electronic games based on chewing motion
WO2015072816A1 (en) * 2013-11-18 2015-05-21 삼성전자 주식회사 Display device and control method
KR102345611B1 (en) * 2013-11-18 2021-12-31 삼성전자주식회사 Display apparatus and control method thereof
CN104753607B (en) * 2013-12-31 2017-07-28 鸿富锦精密工业(深圳)有限公司 Eliminate the method and electronic equipment of mobile device interference signal
CN105096935B (en) * 2014-05-06 2019-08-09 阿里巴巴集团控股有限公司 A kind of pronunciation inputting method, device and system
DE112014007265T5 (en) * 2014-12-18 2017-09-07 Mitsubishi Electric Corporation Speech recognition device and speech recognition method
JP2017120609A (en) * 2015-12-24 2017-07-06 カシオ計算機株式会社 Emotion estimation device, emotion estimation method and program
US10192399B2 (en) * 2016-05-13 2019-01-29 Universal Entertainment Corporation Operation device and dealer-alternate device
CN106095381B (en) * 2016-06-07 2020-05-01 北京京东尚科信息技术有限公司 Terminal equipment and sliding operation control method and device of display screen of terminal equipment
US10764643B2 (en) * 2016-06-15 2020-09-01 Opentv, Inc. Context driven content rewind
KR102562287B1 (en) * 2016-10-14 2023-08-02 삼성전자주식회사 Electronic device and audio signal processing method thereof
CN108227904A (en) * 2016-12-21 2018-06-29 深圳市掌网科技股份有限公司 A kind of virtual reality language interactive system and method
US10332515B2 (en) 2017-03-14 2019-06-25 Google Llc Query endpointing based on lip detection
CN111033611A (en) * 2017-03-23 2020-04-17 乔伊森安全系统收购有限责任公司 System and method for associating mouth images with input instructions
CN106875941B (en) * 2017-04-01 2020-02-18 彭楚奥 Voice semantic recognition method of service robot
CN107316651B (en) * 2017-07-04 2020-03-31 北京中瑞智科技有限公司 Audio processing method and device based on microphone
CN109859749A (en) 2017-11-30 2019-06-07 阿里巴巴集团控股有限公司 A kind of voice signal recognition methods and device
US11068735B2 (en) * 2017-12-05 2021-07-20 Denso Corporation Reliability calculation apparatus
CN108465241B (en) * 2018-02-12 2021-05-04 网易(杭州)网络有限公司 Game sound reverberation processing method and device, storage medium and electronic equipment
US10997979B2 (en) * 2018-06-21 2021-05-04 Casio Computer Co., Ltd. Voice recognition device and voice recognition method
US11288974B2 (en) 2019-03-20 2022-03-29 Edana Croyle Speech development system
US11282402B2 (en) 2019-03-20 2022-03-22 Edana Croyle Speech development assembly
CN113345472B (en) * 2021-05-08 2022-03-25 北京百度网讯科技有限公司 Voice endpoint detection method and device, electronic equipment and storage medium

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3582559A (en) 1969-04-21 1971-06-01 Scope Inc Method and apparatus for interpretation of time-varying signals
US4245430A (en) 1979-07-16 1981-01-20 Hoyt Steven D Voice responsive toy
ZA813750B (en) 1981-06-04 1982-06-30 Digicor Pty Ltd Audio sensing apparatus
US4799171A (en) 1983-06-20 1989-01-17 Kenner Parker Toys Inc. Talk back doll
JPS6055985A (en) 1983-09-05 1985-04-01 株式会社トミー Sound recognizing toy
NL8400728A (en) * 1984-03-07 1985-10-01 Philips Nv DIGITAL VOICE CODER WITH BASE BAND RESIDUCODING.
US4975960A (en) * 1985-06-03 1990-12-04 Petajan Eric D Electronic facial tracking and detection system and method and apparatus for automated speech recognition
US4725956A (en) 1985-10-15 1988-02-16 Lockheed Corporation Voice command air vehicle control system
US4757541A (en) * 1985-11-05 1988-07-12 Research Triangle Institute Audio visual speech recognition
GB8528143D0 (en) * 1985-11-14 1985-12-18 British Telecomm Image encoding & synthesis
US4696653A (en) 1986-02-07 1987-09-29 Worlds Of Wonder, Inc. Speaking toy doll
JPS62239231A (en) * 1986-04-10 1987-10-20 Kiyarii Rabo:Kk Speech recognition method by inputting lip picture
GB8618193D0 (en) 1986-07-25 1986-11-26 Smiths Industries Plc Speech recognition apparatus
JPS6338993A (en) * 1986-08-04 1988-02-19 松下電器産業株式会社 Voice section detector
US4829578A (en) 1986-10-02 1989-05-09 Dragon Systems, Inc. Speech detection and recognition apparatus for use with background noise of varying levels
US4857030A (en) * 1987-02-06 1989-08-15 Coleco Industries, Inc. Conversing dolls
US4840602A (en) 1987-02-06 1989-06-20 Coleco Industries, Inc. Talking doll responsive to external signal
JPH067343B2 (en) 1987-02-23 1994-01-26 株式会社東芝 Pattern identification device
US5222147A (en) 1989-04-13 1993-06-22 Kabushiki Kaisha Toshiba Speech recognition LSI system including recording/reproduction device
JPH0398078A (en) 1989-09-12 1991-04-23 Seiko Epson Corp Voice evaluation system
JPH03129400A (en) 1989-10-13 1991-06-03 Seiko Epson Corp Speech recognition device
US5267323A (en) * 1989-12-29 1993-11-30 Pioneer Electronic Corporation Voice-operated remote control system
CA2031965A1 (en) 1990-01-02 1991-07-03 Paul A. Rosenstrach Sound synthesizer
US5210791A (en) 1990-12-13 1993-05-11 Michael Krasik Telephone headset on-line indicator
US5209695A (en) 1991-05-13 1993-05-11 Omri Rothschild Sound controllable apparatus particularly useful in controlling toys and robots
US5313522A (en) * 1991-08-23 1994-05-17 Slager Robert P Apparatus for generating from an audio signal a moving visual lip image from which a speech content of the signal can be comprehended by a lipreader
JP3098078B2 (en) 1991-11-27 2000-10-10 日本放送協会 Lightning alarm
US5305422A (en) 1992-02-28 1994-04-19 Panasonic Technologies, Inc. Method for determining boundaries of isolated words within a speech signal
US5579431A (en) 1992-10-05 1996-11-26 Panasonic Technologies, Inc. Speech detection in presence of noise by determining variance over time of frequency band limited energy
US5615296A (en) * 1993-11-12 1997-03-25 International Business Machines Corporation Continuous speech recognition and voice response system and method to enable conversational dialogues with microprocessors
JP3129400B2 (en) 1997-03-24 2001-01-29 株式会社桑原組 Assembly block and assembly method

Also Published As

Publication number Publication date
US5884257A (en) 1999-03-16
KR950034051A (en) 1995-12-26
DE69527745D1 (en) 2002-09-19
EP0683481A2 (en) 1995-11-22
ES2181732T3 (en) 2003-03-01
EP0683481B1 (en) 2002-08-14
US6471420B1 (en) 2002-10-29
CN1120965A (en) 1996-04-24
KR100215946B1 (en) 1999-08-16
DE69527745T2 (en) 2003-05-15
EP0683481A3 (en) 1998-03-04

Similar Documents

Publication Publication Date Title
CN1132149C (en) Game apparatus, voice selection apparatus, voice recognition apparatus and voice response apparatus
CN1488134A (en) Device and method for voice recognition
CN1187734C (en) Robot control apparatus
CN1302056A (en) Information processing equipment, information processing method and storage medium
CN107862060B (en) Semantic recognition device and recognition method for tracking target person
CN101101752B (en) Monosyllabic language lip-reading recognition system based on vision character
CN105765650A (en) Speech recognizer with multi-directional decoding
US9443536B2 (en) Apparatus and method for detecting voice based on motion information
CN1304177C (en) Robot apparatus and control method thereof
US20180182396A1 (en) Multi-speaker speech recognition correction system
JP4971413B2 (en) Motion recognition system combined with audiovisual and recognition method thereof
CN1231175C (en) Pulse wave detection method, artery position detection method and pulse wave detection apparatus
CN101030370A (en) Speech communication system and method, and robot apparatus
CN1894740A (en) Information processing system, information processing method, and information processing program
US20120226981A1 (en) Controlling electronic devices in a multimedia system through a natural user interface
CN106157956A (en) The method and device of speech recognition
CN1461463A (en) Voice synthesis device
US20160349839A1 (en) Display apparatus of front-of-the-eye mounted type
US9799332B2 (en) Apparatus and method for providing a reliable voice interface between a system and multiple users
CN1140295A (en) Voice recognition device, reaction device, reaction selection device, and reaction toy using them
JP2004024863A (en) Lips recognition device and occurrence zone recognition device
Tambunan et al. Indonesian speech recognition grammar using Kinect 2.0 for controlling humanoid robot
CN113409809A (en) Voice noise reduction method, device and equipment
JP2000311077A (en) Sound information input device
JP2004258289A (en) Unit and method for robot control, recording medium, and program

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee