EP2367606A1 - A toy exhibiting bonding behaviour - Google Patents

A toy exhibiting bonding behaviour

Info

Publication number
EP2367606A1
Authority
EP
European Patent Office
Prior art keywords
input
toy
user
voice
accumulated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP09828710A
Other languages
German (de)
French (fr)
Other versions
EP2367606A4 (en)
Inventor
Johan Adam Du Preez
Ludwig Carl Schwardt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Stellenbosch University
Original Assignee
Stellenbosch University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Stellenbosch University
Publication of EP2367606A1
Publication of EP2367606A4

Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63H: TOYS, e.g. TOPS, DOLLS, HOOPS OR BUILDING BLOCKS
    • A63H3/00: Dolls
    • A63H3/28: Arrangements of sound-producing means in dolls; Means in dolls for producing sounds
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63H: TOYS, e.g. TOPS, DOLLS, HOOPS OR BUILDING BLOCKS
    • A63H2200/00: Computerized interactive toys, e.g. dolls

Definitions

  • Figure 1 is a schematic representation of the internal components of a toy doll capable of exhibiting bonding behaviour towards a human being according to a first embodiment of the invention;
  • Figure 2 is a schematic representation of an alternative embodiment of the toy doll of Figure 1; and
  • Figure 3 is a flow diagram showing the macro behaviour of a toy doll according to the invention.
  • Figure 1 of the accompanying drawings shows the internal functional components (10) of a toy doll (not shown) in accordance with a first embodiment of the invention.
  • The doll has a body, which is not shown in the drawings because it can take on any number of appearances, for example, that of infants, toddlers, animals or even toy characters.
  • The components (10) are conveniently located inside the doll, for example in a chest cavity of the body, where they are protected by the body. Access may be provided in strategic positions on the body to reach parts of the components that may need periodic replacement or maintenance, for example a power supply or battery pack.
  • The components (10) include the following to support the required behaviour: a digital central processing unit (CPU) (12), which includes timing means (14), in this example a digital timer; a storage unit (16) in the form of a non-volatile memory module; input sensors (18) to detect an input, in this embodiment a microphone (20) and an accelerometer (22); and output apparatus (24) to communicate with a user.
  • The output apparatus in this embodiment includes a sound transducer (26) and movement actuators (28) connected to the limbs (not shown) of the toy. It should be appreciated that the movement actuators (28) can be connected to any limbs of the toy in order to control their movement.
  • The CPU (12) is connected to the input sensors (18) and output apparatus (24) by an input interface (30) and an output interface (32), respectively.
  • The input interface (30) includes an analog-to-digital (A/D) converter (34) and the output interface (32) includes a digital-to-analog (D/A) converter (36).
  • Machine instructions in the form of software (not shown) are stored in the memory (16) or on additional memory modules (38) to drive the input interface (30) and output interface (32) and their respective A/D and D/A converters.
  • The machine instructions also include instructions causing the CPU to receive input via the input sensors, process received inputs, and send control signals to the output apparatus.
  • Additional software governing the behaviour of the toy is also stored in the memory (16), along with an accumulated input variable in the form of a digital model (not shown). The model comprises a collection of characteristics or properties extracted from the voice and/or behaviour of users, including a current preferred user, along with a reference of how the characteristics of the preferred user are to be distinguished from those of other users in general.
  • The accumulated input is representative, to a variable extent, of the current preferred user and is stored in the non-volatile memory module (16).
  • The software further includes voice and speech recognition functionality and other feature extraction software, allowing the processor to analyse received inputs and determine the degree to which they correspond to the digital model of the current preferred user, thus yielding a degree of similarity of the received voice input to that of the preferred user as represented by the accumulated input.
  • The memory (16) furthermore contains software allowing the CPU to analyse an input detected by the input sensors (18), classify the input as being either positive or negative in nature, and assign a degree of positivity or negativity to the received input. If the interaction with the current user, as received via the input, is deemed positive, the input is used to provide further learning of the properties of the current user and the accumulated input is updated with such further properties. It will be appreciated that the addition of further properties of the current user to the accumulated input, insofar as the input is classified as positive, makes the accumulated input increasingly representative of the current user and therefore represents an increasingly stronger bond to the current user.
  • If the current user is the preferred user, the accumulated input will become increasingly representative of the preferred user, simulating an increasingly stronger bond to that user; but if the current user does not represent the preferred user, the toy will diminish its bond with the preferred user and increase its bond with the current user. It is therefore possible for the current user to become the preferred user by continuous positive interaction with the toy. If the interactions with the toy are deemed negative, then, to the degree that the current user matches the properties of the preferred user contained in the accumulated input, an unlearning process gradually degrades the accumulated input so that it becomes less representative of the preferred user and more representative of other users or a general background user.
  • The degree of learning or unlearning may be proportional to the degree to which the interaction from the user is classified as positive or negative.
  • The machine instructions include threshold values for received voice input amplitude as well as detected motion input acceleration. If voice is received having an amplitude above the amplitude threshold value, such voice will be classified as negative input in that it corresponds to shouting or noise. Acceleration above the maximum threshold will likewise be classified as negative input in that it corresponds to physical abuse, throwing or falling. It is also foreseeable that the software may allow the CPU (12) to identify standard deviations in pitch patterns of sound inputs as singing and standard accelerations between predefined minimum and maximum thresholds as rocking, which will be interpreted as positive inputs.
  • In addition to inputs such as speech and motion detected by the sensors (18), the software also causes the CPU (12) to monitor the timer (14) and identify a lack of interaction with the toy for longer than a specified period. This corresponds to neglect of the toy and will be classified as negative input and influence the accumulated input accordingly, resulting in an unlearning of the preferred user.
  • When the input is positive, the CPU (12) is instructed to learn or reinforce the properties of the current user, by making the accumulated input increasingly representative of that user, proportional to the degree of positivity of the received input at a step (44), after which the CPU (12) sends instructions to the output apparatus (24), proportional to the degree of similarity of the current user to the preferred user and the positivity of the input, at a step (46).
  • When the input is negative, the CPU (12) determines whether the current user is also the current preferred user or whether the input is identified as neglect at a step (48). If the current user is not the current preferred user and the input is also not identified as neglect, the CPU (12) again sends instructions to the output apparatus (24), proportional to the degree of similarity of the current user to the preferred user and the negativity of the input, at step (46). If, however, the current user is identified as the current preferred user or the input is identified as neglect at step (48), the CPU (12) is instructed to unlearn the properties of the current user proportional to the degree of negativity of the input at a step (50), after which the CPU (12) sends instructions to the output apparatus (24), proportional to the degree of similarity of the current user to the preferred user and the negativity of the input, at step (46).
  • The CPU (12) then waits for the next input to be received or for the timer to indicate a lack of interaction.
  • An alternative embodiment of the invention is shown in Figure 2.
  • The embodiment of Figure 2 again includes a digital central processing unit (CPU) (12), which includes a digital timer (14), a storage unit (16) in the form of a non-volatile memory module, and input sensors (18) to detect an input, including a microphone (20) and an accelerometer (22).
  • This embodiment additionally includes a digital image recorder (50) which, in this embodiment, is a digital camera.
  • The embodiment also includes output apparatus (24) to communicate with the user.
  • The output apparatus again includes a sound transducer (26) and movement actuators (28) connected to the limbs (not shown) of the toy.
  • The CPU (12) is connected to the input sensors (18) and output apparatus (24) by an input interface (30) and an output interface (32), respectively.
  • The input interface (30) includes an analog-to-digital (A/D) converter (34) and the output interface (32) includes a digital-to-analog (D/A) converter (36).
  • Machine instructions in the form of software (not shown) are stored in the memory (16) or on additional memory modules (38) to drive the input interface (30) and output interface (32) and their respective A/D and D/A converters.
  • The digital camera (50) may be used to periodically capture an image of a user, for example when interaction from a user is detected. This image may be used in combination with a voice recording, or separately, to recognise the face of the preferred user.
  • Complicated image recognition software is available that may be employed to compare a digital image to an image of the preferred user stored in the memory (16). As is described above and further below for voice recognition, the image recognition software may be used to determine a degree of similarity between an image of the preferred user taken with the camera (50) and an image of a current user taken at a later stage.
  • The control signals sent by the CPU (12) to the output apparatus (24) may again be dependent on the degree of similarity between the images of the current user and that of the preferred user.
  • The above description provides a general overview of the working of the toy. What follows is a more detailed analysis of the algorithms employed by the software and executed by the CPU (12).
  • The algorithms, be they software or hardware implementations, and which may not reside in the memory (16), will execute on the CPU (12) to evaluate the interactions with the current user and, based on that, change the toy's internal representation (the accumulated input) of the preferred user as well as determine the nature of its interactions with the user.
  • The input from the user, in this case speech, is sampled when detected and made available to the CPU in a digital format.
  • This signal is then digitally processed to determine its relevant information content.
  • It is sub-divided into a sequence of 30 ms frames overlapping each other by 50%.
  • Each frame is shaped by a windowing function, and its power level as well as Mel Frequency Cepstral Coefficients (MFCCs) are determined (various other analyses such as RASTA PLP can also be used).
  • All this information is combined into a feature vector x(n) which summarises the relevant speech information for that frame.
  • The index n denotes the specific frame number where this vector was determined.
  • The signal can be divided into silence and speech segments, for which several implementations are known.
  • The input obtained from accelerometers can be collected in another feature vector y(n) summarising the motion of the toy.
  • Both the signal power (amplitude) and the pitch frequency are known as a function of time.
  • The loudness of the voice is directly determined from this power. If the loudness remains between pre-established minimum and maximum thresholds, the interaction is considered to be positive. The total absence of voice during a predetermined interval will be considered as neglect and therefore negative, while the presence of overly loud voice above the maximum threshold will be considered as shouting and therefore also negative.
  • A generic background speaker is represented with a Gaussian Mixture Model (GMM), referred to here as a Universal Background Model (UBM).
  • This UBM is then adapted to the speech of a specific target speaker, in this embodiment the preferred user, via a process such as Maximum a Posteriori (MAP) adaptation, Maximum-Likelihood Linear Regression (MLLR) or Maximum-Likelihood Eigendecomposition (MLED).
  • Here f denotes either a Gaussian or GMM probability density function, and the subscripts T and U denote the target and UBM speaker respectively.
  • N frames are collected before doing so, with N chosen such that it corresponds to a time duration in the range of 10 to 30 seconds.
  • TNORM is another notable example that replaces the single UBM with a number of background speaker models.
  • A multi-dimensional Gaussian density consists of a mean/centroid vector m and a covariance matrix C.
  • MAP adaptation of the Gaussian centroid vector specifically leads to a weighted combination of the existing prior centroid and the newly observed target feature vectors, while leaving the covariance matrices unchanged. This idea is adapted here to allow the system to learn the characteristics of a recent speaker while simultaneously also gradually unlearning the characteristics of earlier speakers in a computationally efficient manner.
  • The adaptation of a single target Gaussian centroid is described first and is later extended to the adaptation of Gaussian centroids embedded in a GMM.
  • Initially, the target centroid is cloned from the UBM. The preferred user is therefore indistinguishable from the generic background speaker at this stage. Therefore m_T(0) = m_U, where the subscript T denotes the target, U denotes the UBM, and n is the adaptation time step.
  • The target centroid is a function of time n, whereas the UBM centroid remains constant.
  • A target feature vector, derived from the speech of a user and denoted by x(n), is now observed. The target centroid is then adapted using the recursion
  • Table 1: Effective memory length for different values of the adaptation constant. The duration in minutes is based on 15 ms time steps.
  • Adaptation of the GMM is correspondingly done by proportionally using the feature vector to update each of the Gaussian components. This changes the original updated recursion to:
  • If the preferred user engages in "abusive" behaviour, we want to rapidly fade that user from the toy's memory.
  • The preferred user is recognized by a high identification score s(X) and the presence of abuse is typified by a high negative value of the interaction quality Q. Their combined presence accelerates the above unlearning process by immediately applying this procedure, but with a hugely increased value of
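The framing and centroid adaptation described in the list above can be sketched in Python as follows. The 30 ms frame length and 50% overlap come from the text; everything else is an assumption made for illustration. In particular, the exact recursion is not reproduced in this section, so `adapt_centroid` uses a generic exponential-forgetting update with a hypothetical `rate` parameter, and `effective_memory_minutes` uses the rough rule that such an update remembers about 1/rate steps. The function names are likewise hypothetical.

```python
from typing import List

FRAME_MS = 30  # frame length stated in the text
HOP_MS = 15    # 50% overlap gives a 15 ms step between frames


def frame_starts(duration_ms: int) -> List[int]:
    """Start times (in ms) of every full frame that fits in the signal."""
    return list(range(0, duration_ms - FRAME_MS + 1, HOP_MS))


def adapt_centroid(target: List[float], observation: List[float],
                   rate: float) -> List[float]:
    """Move the target centroid a fraction `rate` towards an observation.

    ASSUMPTION: a generic exponential-forgetting update, not the
    recursion of the specification.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return [(1.0 - rate) * t + rate * x for t, x in zip(target, observation)]


def effective_memory_minutes(rate: float) -> float:
    """Approximate memory length, taking about 1/rate steps of 15 ms each."""
    steps = 1.0 / rate
    return steps * HOP_MS / 1000.0 / 60.0
```

Under these assumptions, learning would call `adapt_centroid` with a feature vector x(n) as the observation, while unlearning would call it with the UBM centroid as the observation. A much larger `rate` would then fade the preferred user quickly, in the spirit of the accelerated unlearning described for abusive behaviour.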

Abstract

A toy capable of exhibiting bonding behaviour towards a user and a method of simulating such behaviour are provided. The toy includes input sensors (18) for receiving interactive input from users, output apparatus (24) for communicating with users, a processor (12) and memory (16) containing machine instructions causing the processor (12) to receive interactive input, process received input and send control signals to the output apparatus. The processor (12) classifies received input as either positive or negative and adjusts an accumulated input stored in the memory (16) in accordance with the classification. The control signals, in turn, are dependent on the accumulated input.

Description

A TOY EXHIBITING BONDING BEHAVIOUR
FIELD OF THE INVENTION
This invention relates to an interactive toy, more specifically a doll, capable of exhibiting bonding behaviour towards natural persons which mimics the bonding that naturally occurs between a parent and child. The invention extends to a method for simulating bonding behaviour by a toy towards a natural person or persons.
BACKGROUND TO THE INVENTION
Toys, in particular dolls, are owned by people the world over, and have been for hundreds of years. Children use dolls to play with, for companionship and also sometimes to invoke a sense of security. Children, especially young children, often develop a very strong bond with their dolls, which may even play a part in the child's development. Dolls are also owned by adults for numerous reasons, be it as collector's items, for their aesthetic qualities or emotional attachment.
Along with technological advances made over the past years, dolls have developed and have become increasingly sophisticated and, in fact, more life-like. The inventor is, for example, aware of dolls that are capable of simulating limited human behaviour, such as crying, sleeping and talking, and even of simulating human bodily functions such as eating and excreting bodily waste. The inventor is furthermore aware that electronic appliances, for example microphones, sound transducers, movement actuators and the like, have been incorporated into dolls.
United States patent application number US2007/0128979, entitled "Interactive Hi-tech Doll", for example, discloses a doll which produces human-like facial expressions, recognizes certain words when they are spoken by humans, and which is able to carry on a limited conversation with a living person based on certain pre-defined question and answer scenarios. The doll's recognition of the spoken words is based on speech and voice recognition technology controlled by a processor incorporated in the doll, and allows the doll to be trained to identify the voice of a specific person, as well as assign a specific role, such as that of its mother, to the person. The doll is equipped with movement actuators in its face, allowing movement of its eyes, mouth and cheeks to exhibit certain pre-defined facial expressions concurrently with spoken words or separately to simulate human emotions. The limited conversational skills are based on basic voice and speech recognition techniques which are widely known in the field. In each scenario, the doll will ask a pre-recorded question and expect to receive a specific answer. If it receives the expected answer the doll reacts favourably and if it receives any unexpected answer, it reacts less favourably. There is, however, no mention in the application that the doll has long-term learning capabilities. Instead, its behaviour appears to be governed by a state machine that responds primarily to the current user input and its built in clock.
OBJECT OF THE INVENTION
It is an object of this invention to provide an interactive toy, more specifically a doll, capable of simulating bonding behaviour towards a person, which is an improvement over the prior art outlined above.
SUMMARY OF THE INVENTION
In accordance with this invention there is provided a toy comprising a body that includes at least one input sensor for receiving an input from a human user, at least one output apparatus by means of which the toy interacts with the user, a processor in communication with the input sensor and the output apparatus, and a memory in communication with the processor, the toy being characterized in that the processor is programmed to classify each received input as either positive or negative, to adjust an accumulated input stored in the memory in accordance with the classification, and to send control signals to the output apparatus that are dependent on the accumulated input, the toy thereby exhibiting increased bonding behaviour in response to a series of predominantly positive inputs over time, and decreased bonding behaviour in response to a series of predominantly negative inputs over time.
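As a minimal sketch of this classify, accumulate and respond cycle, the Python below reduces the accumulated input to a single number between 0 and 1. The names `BondState`, `update_bond` and `control_signal`, the step size and the three output labels are illustrative assumptions; the specification stores a richer model of the preferred user rather than a scalar.

```python
from dataclasses import dataclass


@dataclass
class BondState:
    """Hypothetical scalar stand-in for the accumulated input."""
    level: float = 0.0  # 0.0 = no bond, 1.0 = strongest bond


def update_bond(state: BondState, is_positive: bool,
                step: float = 0.1) -> BondState:
    """Raise the level for a positive input, lower it for a negative one."""
    delta = step if is_positive else -step
    state.level = min(1.0, max(0.0, state.level + delta))
    return state


def control_signal(state: BondState) -> str:
    """Choose an output behaviour that depends only on the accumulated level."""
    if state.level >= 0.66:
        return "affectionate"
    if state.level >= 0.33:
        return "neutral"
    return "withdrawn"
```

In this sketch a run of predominantly positive inputs moves the toy towards the "affectionate" output and a run of predominantly negative inputs moves it back towards "withdrawn", mirroring the increased and decreased bonding behaviour described above.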
Further features of the invention provide for the received input to correspond to human interaction with the toy including one or more of sound, motion and image; for the processor to classify sound associated with shouting and motion associated with physical abuse as negative inputs; for the toy to include at least two input sensors, a first of which is a microphone configured to detect voice and voice amplitude and a second of which is an accelerometer configured to detect motion and acceleration of the toy; for the accumulated input to be representative, at least to some degree, of the voice of a preferred user of the toy; for the processor to be programmed to determine a degree of similarity between a received voice input received by the microphone and the accumulated input; for the accumulated input to be adjusted to become increasingly representative of a user when the received input is classified as positive, and for it to become less representative of a preferred user or remain unchanged when the degree of similarity is low or the received input is classified as negative; for the processor to be programmed to classify a received voice input at an amplitude above a predefined maximum voice amplitude as a negative input, and below it as a positive input; for the processor to be programmed to classify a detected motion input at an acceleration above a predefined maximum acceleration threshold as a negative input, and below it as a positive input; and for the processor to be programmed to determine a degree of positivity or negativity, as the case may be, of a received input and to adjust the accumulated input proportionate to the degree of positivity or negativity.
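The threshold test and the proportional adjustment can be illustrated with a short Python sketch. It assumes a signed degree in the range -1 to 1, where readings at or below the maximum count as fully positive and readings above it become more negative as the excess grows. That mapping, the names `signed_degree` and `apply_input`, and the `rate` parameter are assumptions chosen for illustration and are not taken from the specification.

```python
def signed_degree(value: float, maximum: float) -> float:
    """Classify a sensor reading (voice amplitude or acceleration).

    Returns +1.0 for a reading at or below the threshold (positive input)
    and a value in [-1.0, 0.0) above it (negative input), growing more
    negative with the relative excess. ASSUMED mapping.
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    if value <= maximum:
        return 1.0
    excess = (value - maximum) / maximum
    return -min(1.0, excess)


def apply_input(accumulated: float, degree: float, rate: float = 0.2) -> float:
    """Adjust a scalar accumulated value in proportion to the signed degree."""
    return min(1.0, max(0.0, accumulated + rate * degree))
```

The same function serves both sensors here: a voice amplitude is compared with the maximum voice amplitude, and an acceleration with the maximum acceleration threshold.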
Still further features of the invention provide for the toy to include timing means connected to the processor and for the processor to be programmed to classify an absence of received input for longer than a predefined period of time as negative input and to adjust the accumulated input to become less representative of the preferred user in response thereto; and for the output apparatus to include one or both of a sound transducer and movement actuators and for the processor to be programmed to send control signals to the output apparatus more frequently and/or of a higher quality, when the degree of similarity of a received voice input is high, and for the processor to be programmed to send control signals to the output apparatus less frequently and/or of a lower quality, when the degree of similarity of the received voice input is low.
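A small Python sketch of the neglect timer and of similarity-dependent output is given below. The idle limit, the `floor` parameter and the linear mapping from similarity to response probability are assumptions; the text only requires that output be sent more frequently when similarity is high and less frequently when it is low.

```python
def is_neglect(last_input_time: float, now: float,
               max_idle_seconds: float) -> bool:
    """True when no input has been received for longer than the allowed period."""
    return (now - last_input_time) > max_idle_seconds


def response_probability(similarity: float, floor: float = 0.1) -> float:
    """Map a degree of similarity in [0, 1] to a chance of responding.

    ASSUMED linear mapping: the toy responds at least `floor` of the time
    and always responds to a perfectly matching voice.
    """
    s = min(1.0, max(0.0, similarity))
    return floor + (1.0 - floor) * s
```

A caller could poll `is_neglect` with timestamps from the timing means and treat a `True` result as a negative input, while `response_probability` could gate how often the sound transducer or movement actuators are driven.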
Yet further features of the invention provide for the accumulated input to comprise a collection of characteristics extracted from a voice associated with a generic background speaker, each characteristic having a variable weight associated therewith so that the collection of weighted characteristics is representative of the voice of a preferred user; for the weights associated with the characteristics to be adjusted in order to make the accumulated input increasingly or less representative of the voice of the preferred user; and for the accumulated input to be adjusted to become increasingly representative of the voice of at least one alternative user as the accumulated input becomes less representative of the voice of a current preferred user, the alternative user becoming a new preferred user when the accumulated input becomes more representative of the voice of the alternative user than that of the current preferred user.
The invention also provides a method of simulating bonding behaviour in a toy towards a human including the steps of storing an accumulated input representative of a preferred user in a memory associated with the toy, receiving an input from a user by means of at least one input sensor incorporated in the toy, classifying the input as either positive or negative, adjusting the accumulated input to become increasingly representative of the preferred user in response to a positive input and less representative of the preferred user in response to a negative input, and issuing control signals to output apparatus of the toy in response to the input, the control signals being dependent on the accumulated input.
Further features of the invention provide for the method to include the steps of classifying a received voice input above a predefined amplitude as a negative input, classifying a received motion input beyond a predefined acceleration range as a negative input, and classifying an absence of received input for longer than a predetermined period of time as a negative input; and determining a degree of similarity of a received voice input to that of a preferred user and issuing control signals to the output apparatus of the toy which are proportional to the degree of similarity.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will now be described, by way of example only with reference to the accompanying representations in which:
Figure 1 is a schematic representation of the internal components of a toy doll capable of exhibiting bonding behaviour towards a human being according to a first embodiment of the invention;
Figure 2 is a schematic representation of an alternative embodiment of the toy doll of Figure 1; and
Figure 3 is a flow diagram showing the macro behaviour of a toy doll according to the invention.
DETAILED DESCRIPTION WITH REFERENCE TO THE DRAWINGS
Figure 1 of the accompanying drawings shows the internal functional components (10) of a toy doll (not shown) in accordance with a first embodiment of the invention. The doll contains a body which is not shown in the drawings as it can take on any number of appearances, for example, that of infants, toddlers, animals or even toy characters. The components (10) are conveniently located inside the doll, for example in a chest cavity of the body, where they are protected by the body. Access may be provided in strategic positions on the body in order to access certain parts of the components that may need periodic replacement or maintenance, for example a power supply or battery pack.
The components (10) include the following to support the required behaviour: a digital central processing unit (CPU) (12), which includes timing means (14), in this example a digital timer, a storage unit (16) in the form of a non-volatile memory module, input sensors (18) to detect an input, in this embodiment a microphone (20) and accelerometer (22), and output apparatus (24) to communicate with a user. The output apparatus in this embodiment include a sound transducer (26) and movement actuators (28) connected to the limbs (not shown) of the toy. It should be appreciated that the movement actuators (28) can be connected to any limbs of the toy in order to control their movement. The CPU (12) is connected to the input sensors (18) and output apparatus (24) with an input interface (30) and output interface (32), respectively. The input interface (30) includes an analog-to-digital (A/D) converter (34) and the output interface (32) includes a digital-to-analog (D/A) converter (36). Machine instructions in the form of software (not shown) are stored in the memory (16) or on additional memory modules (38) to drive the input interface (30) and output interface (32) and their respective A/D and D/A converters. The machine instructions also include instructions causing the CPU to receive input via the input sensors, process received inputs, and send control signals to the output apparatus. Additional software governing the behaviour of the toy is also stored in the memory (16) along with an accumulated input variable in the form of a digital model (not shown) which comprises a collection of characteristics or properties extracted from the voice and/or behaviour of users, including a current preferred user, along with a reference of how the characteristics of the preferred user are to be distinguished from other users in general. The accumulated input is representative, to a variable extent, of the current preferred user and is stored in the non-volatile memory module (16).
The software further includes voice and speech recognition functionality and other feature extraction software allowing the processor to analyse received inputs and determine the degree to which they correspond to the digital model of the current preferred user, thus yielding a degree of similarity of the received voice input to that of the preferred user as represented by the accumulated input.
The memory (16) furthermore contains software allowing the CPU to analyse an input detected by the input sensors (18) and classify the input as being either positive or negative in nature and also to assign a degree of positivity or negativity to the received input. If the interaction with the current user, as received via the input, is deemed positive, the input is used to provide further learning of the properties of the current user and the accumulated input is updated with such further properties. It will be appreciated that the addition of further properties of the current user to the accumulated input, insofar as the input is classified as positive, makes the accumulated input increasingly representative of the current user and therefore represents an increasingly stronger bond to the current user. If the current user also closely represents the preferred user, the accumulated input will become increasingly representative of the preferred user simulating an increasingly stronger bond to it, but if the current user does not represent the preferred user, the toy will diminish its bond with the preferred user and increase its bond with the current user. It is therefore possible for the current user to become the preferred user by continuous positive interaction with the toy. If the interactions with the toy are deemed negative and to the degree that the current user matches the properties representative of the preferred user as contained in the accumulated input, an unlearning process gradually returns or degrades the accumulated input to become less representative of the preferred user and become more representative of other or a general background user.
The degree of learning or unlearning, as the case may be, may be proportional to the degree to which the interaction from the user is classified as positive or negative. The machine instructions (software) include threshold values for received voice input amplitude as well as detected motion input acceleration. If voice is received having an amplitude above the amplitude threshold value, such voice will be classified as negative input in that it corresponds to shouting or noise. Acceleration above the maximum threshold will likewise be classified as negative input in that it corresponds to physical abuse, throwing or falling. It is also foreseeable that the software may allow the CPU (12) to identify standard deviations in pitch patterns of sound inputs as singing and standard accelerations between predefined minimum and maximum thresholds as rocking, which will be interpreted as positive inputs.
To the extent that the interactions from a user are deemed positive and the characteristics of the current user match those of the preferred user closely, in other words there is a high degree of similarity between the voice of the current user and that of the preferred user (represented by the accumulated input), positive responses from the toy, as dictated by instructions sent to the output apparatus (24) by the CPU (12), will increase in frequency and/or in quality. Conversely, if the characteristics of the current user do not match those of the preferred user, positive responses from the toy, as dictated by the instructions sent to the output apparatus (24) by the CPU (12), will decrease in frequency and/or in quality. In addition to the inputs such as speech and motion as detected by the sensors (18), the software also causes the CPU (12) to monitor the timer (14) and identify a lack of interaction with the toy for longer than a specified period. This corresponds to neglect of the toy and will be classified as negative input and influence the accumulated input accordingly, resulting in an unlearning of the preferred user.
The macro behaviour of the toy can be explained more simply with reference to the flow diagram shown in Figure 3. In Figure 3, when an input is detected by one of the input sensors (18) at a step (40), the CPU (12) classifies the input as positive or negative and measures its degree of positivity or negativity, as the case may be. The CPU (12) also determines the degree of similarity of the voice associated with a voice input to that of the preferred user; in the drawing this step is referred to as the quality of match to the bonded user. If the input was classified as positive, this is identified at a step (42), and the CPU (12) is instructed to learn or reinforce the properties of the current user, by making the accumulated input increasingly representative of the preferred user, proportional to the degree of positivity of the received input at a step (44), after which the CPU (12) sends instructions to the output apparatus (24), proportional to the degree of similarity of the current user to the preferred user and the positivity of the input at a step (46).
If the input is identified as negative at step (42), the CPU (12) determines whether the current user is also the current preferred user or if the input is identified as neglect at a step (48). If the current user is not the current preferred user and the input is also not identified as neglect, the CPU (12) again sends instructions to the output apparatus (24), proportional to the degree of similarity of the current user to the preferred user and the negativity of the input at step (46). If, however, the current user is identified as the current preferred user or the input is identified as neglect at step (48), the CPU (12) is instructed to unlearn the properties of the current user proportional to the degree of negativity of the input at a step (50), after which the CPU (12) sends instructions to the output apparatus (24), proportional to the degree of similarity of the current user to the preferred user and the negativity of the input at step (46).
On completion of the instructions sent to the output apparatus at step (46), the CPU (12) waits for the next input to be received or for the timer to indicate a lack of interaction.
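The flow of Figure 3 can be sketched in a few lines of Python. This is only an illustrative reading of the steps described above, not the patented implementation: the names `Decision`, `macro_step` and `match_threshold` are hypothetical, the cut-off used to decide that the current user is the preferred user is an assumption, and the product used for the response strength is merely one way of making the response proportional to both the degree of similarity and the degree of positivity or negativity.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str      # "learn", "unlearn" or "none"
    rate: float      # how strongly the accumulated input is adjusted
    response: float  # signed strength of the response sent to the output apparatus


def macro_step(positive, degree, similarity, is_neglect=False, match_threshold=0.5):
    """One pass through the Figure 3 flow.

    degree          -- degree of positivity or negativity, assumed to lie in [0, 1]
    similarity      -- quality of match to the bonded user, assumed to lie in [0, 1]
    match_threshold -- hypothetical cut-off for "current user is the preferred user"
    """
    if positive:
        # Step (42) -> step (44): learn or reinforce the current user.
        action, rate = "learn", degree
    elif is_neglect or similarity >= match_threshold:
        # Step (48) -> step (50): unlearn the preferred user.
        action, rate = "unlearn", degree
    else:
        # Negative input from someone other than the preferred user: no adjustment.
        action, rate = "none", 0.0
    # Step (46): response scaled by similarity and by the signed degree of the input.
    sign = 1.0 if positive else -1.0
    return Decision(action, rate, sign * degree * similarity)
```

For example, `macro_step(False, 0.6, 0.9)` returns an "unlearn" decision because the negative input comes from a voice that closely matches the preferred user, whereas the same input with a similarity of 0.1 leaves the accumulated input untouched.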
An alternative embodiment of the invention is shown in Figure 2. In the figure, like numerals indicate like features to the embodiment illustrated in Figure 1. The embodiment of Figure 2 again includes a digital central processing unit (CPU) (12), which includes a digital timer (14), a storage unit (16) in the form of a non-volatile memory module, and input sensors (18) to detect an input, including a microphone (20) and accelerometer (22). This embodiment additionally includes a digital image recorder (50) which, in this embodiment, is a digital camera. The embodiment also includes output apparatus (24) to communicate with the user. The output apparatus again include a sound transducer (26) and movement actuators (28) connected to the limbs (not shown) of the toy. The CPU (12) is connected to the input sensors (18) and output apparatus (24) with an input interface (30) and output interface (32), respectively. The input interface (30) includes an analog-to-digital (A/D) converter (34) and the output interface (32) includes a digital-to-analog (D/A) converter (36). Machine instructions in the form of software (not shown) are stored in the memory (16) or on additional memory modules (38) to drive the input interface (30) and output interface (32) and their respective A/D and D/A converters.
It should be appreciated that in this embodiment of the invention, the digital camera (50) may be used to periodically capture an image of a user, for example when interaction from a user is detected. This image may be used in combination with a voice recording or separately, to recognise the face of the preferred user. Complicated image recognition software is available that may be employed to compare a digital image to an image of the preferred user stored in the memory (16). As is described above and further below for voice recognition, the image recognition software may be used to determine a degree of similarity between an image taken with the camera (50) of the preferred user, and an image taken of a current user at a later stage. The control signals sent by the CPU (12) to the output apparatus (24) may again be dependent on the degree of similarity between the images of the current user and that of the preferred user.
The above description provides a general overview of the working of the toy. What follows is a more detailed analysis of the algorithms employed by the software and executed by the CPU (12). The algorithms, be it software or hardware implementations and which may not reside in the memory (16), will execute on the CPU (12) to evaluate the interactions with the current user and based on that change its internal representation (the accumulated input) of the preferred user as well as determine the nature of its interactions with the user.
The input from the user, in this case speech, is sampled when detected and made available to the CPU in a digital format. This signal is then digitally processed to determine its relevant information content. Although various alternatives are possible, in this embodiment it is sub-divided into a sequence of 30 ms frames overlapping each other by 50%. Each frame is shaped by a windowing function, and its power level as well as Mel Frequency Cepstral Coefficients (MFCCs) are determined (various other analyses such as RASTA-PLP can also be used). This is augmented with the pitch frequency at that given time. All this information is combined into a feature vector x(n) which summarises the relevant speech information for that frame. The index n denotes the specific frame number where this vector was determined. With the information available the signal can be divided into silence and speech segments, for which several implementations are known. Similarly the input obtained from accelerometers can be collected in another feature vector y(n) summarising the motion of the toy.
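The framing step can be illustrated with a short pure-Python sketch. It covers only the division into 30 ms frames with 50% overlap, the windowing, and the per-frame power; MFCC and pitch extraction are omitted. The function name `frame_signal` and the choice of a Hamming window are assumptions made for illustration, since the description only calls for "a windowing function".

```python
import math


def frame_signal(samples, sample_rate, frame_ms=30.0, overlap=0.5):
    """Split samples into overlapping windowed frames and return (frames, powers)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    # Hamming window (an assumed choice; any standard window could be substituted).
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (frame_len - 1))
              for k in range(frame_len)]
    frames, powers = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + k] * window[k] for k in range(frame_len)]
        frames.append(frame)
        # Mean squared amplitude of the windowed frame, used as its power level.
        powers.append(sum(v * v for v in frame) / frame_len)
    return frames, powers
```

With 30 ms frames and 50% overlap a new frame starts every 15 ms, so the frame index n advances in 15 ms time steps.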
From x(n) both the signal power (amplitude) as well as the pitch frequency are known as a function of time. The loudness of the voice is directly determined from this power. If the loudness remains between pre-established minimum and maximum thresholds, the interaction is considered to be positive. The total absence of voice during a predetermined interval will be considered as neglect and therefore negative, while the presence of overly loud voice above the maximum threshold will be considered as shouting and therefore also negative.
These aspects can be combined into a quality measure over a given period, presented as a value -1 < Q < 1 , where 0 is taken as neutral.
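The description does not state how the individual observations are combined into Q, so the following is only one possible sketch. It assumes, purely for illustration, that each voiced frame scores +1 when its power lies between the thresholds and -1 when it exceeds the maximum, that Q is the average of those scores, and that a period with no voiced frames counts as neglect. The function name and both threshold parameters are hypothetical, and unlike the open interval given in the text this simplification can return the endpoints -1 and 1.

```python
def interaction_quality(frame_powers, min_power, max_power):
    """Return an assumed quality measure Q in [-1, 1] for one observation period."""
    voiced = [p for p in frame_powers if p >= min_power]
    if not voiced:
        # Total absence of voice during the period: treated as neglect.
        return -1.0
    # Shouting (power above the maximum) counts against the user, normal speech for.
    scores = [-1.0 if p > max_power else 1.0 for p in voiced]
    return sum(scores) / len(scores)
```

A period containing equal numbers of normal and overly loud frames therefore yields the neutral value Q = 0.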
To determine the identity of a speaker, statistical models are used to describe both the target speaker as well as a generic background speaker. Although the description here concerns a particular implementation of modelling speaker characteristics and using this for determining the match between an unknown speech sample and a particular speaker, other techniques for doing so are not excluded. The exact technique or implementation is not critical to this patent and there are several candidates available from the broad fields of speaker recognition and machine learning (pattern recognition) in general. Support Vector Machines (SVM) or other popular pattern classification approaches could conceivably be used instead of what is described here.
A generic background speaker is represented with a Gaussian Mixture Model (GMM), referred to here as a Universal Background Model (UBM). In its most simplified form such a mixture can collapse to a single Gaussian density, thereby reducing computational requirements greatly. The UBM is typically collectively trained from the speech of a large number of speakers.
This UBM is then adapted to the speech of a specific target speaker, in this embodiment the preferred user, via a process such as Maximum a Posteriori (MAP) adaptation, Maximum-Likelihood Linear Regression (MLLR), or Maximum-Likelihood Eigendecomposition (MLED). The trained UBM parameters form a stable initial model estimate, which are then reweighted in some fashion to more closely resemble the characteristics of the preferred user. This results in the preferred speaker model. This approach is discussed in more detail below.
Having a UBM and a target speaker model available allows one to evaluate the closeness of the match of an unknown segment of speech to the model of the preferred user. This is done by evaluating the logarithmic score of this speech segment to both the models of the background speakers (UBM) and the preferred user (as represented by the accumulated input). The difference between those scores approximates the log-likelihood-ratio (LLR) score and directly translates to how well the preferred user matches with the current speech. Mathematically the LLR score of the nth frame, s(n), is expressed as:
$$s(x(n)) = \log f_T(x(n)) - \log f_U(x(n))$$
where f denotes either a Gaussian or GMM probability density function and the subscripts T and U respectively denote the target and UBM speaker.
Basing a decision on a single frame is precarious. Typically N frames are collected before doing so, with N chosen such that it corresponds to a time duration in the range of 10 to 30 seconds. The score for such a segment is then given by
$$s(X) = \sum_{n=0}^{N-1} s(x(n))$$
with X = {x(0), ..., x(N - 1)}. A larger score indicates a larger possibility that the speech originated from the preferred user (a high degree of similarity), with a value of zero indicating that the speech cannot be distinguished from that of the generic background speaker (a low degree of similarity). Once again there are several other alternatives for this. Test normalization (TNORM) is another notable example that replaces the single UBM with a number of background speaker models.
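The frame and segment scores can be sketched as follows for the simplest case mentioned above, in which both models collapse to a single Gaussian density. The sketch additionally assumes a diagonal covariance shared by the target and UBM models; the function names are illustrative and not taken from the source.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def frame_llr(x, target_mean, ubm_mean, var):
    """Per-frame score s(x(n)) = log f_T(x(n)) - log f_U(x(n))."""
    return log_gaussian(x, target_mean, var) - log_gaussian(x, ubm_mean, var)


def segment_llr(frames, target_mean, ubm_mean, var):
    """Segment score s(X): the sum of the per-frame scores over the N frames."""
    return sum(frame_llr(x, target_mean, ubm_mean, var) for x in frames)
```

When the target model is still identical to the UBM, every frame scores zero, which matches the statement that a zero score means the speech cannot be distinguished from the generic background speaker.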
A multi-dimensional Gaussian density consists of a mean/centroid vector m and a covariance matrix C. MAP adaptation of the Gaussian centroid vector specifically leads to a weighted combination of the existing prior centroid and the newly observed target feature vectors, while leaving the covariance matrices unchanged and intact. This idea is adapted here to allow the system to learn the characteristics of a recent speaker while simultaneously also gradually unlearning the characteristics of earlier speakers in a computationally efficient manner.
The adaptation of a single target Gaussian centroid is described first and is later extended to the adaptation of Gaussian centroids embedded in a GMM. Before first use of the toy, the target centroid is cloned from the UBM. The preferred user is therefore indistinguishable from the generic background speaker at this stage. Therefore
$$m_T(n) = m_U \quad \text{for } n = -1$$
where once again T denotes the target, U denotes the UBM, and the quantity n denotes the adaptation time step. Note that the target centroid is a function of time n, whereas the UBM centroid remains constant. A target feature vector is now observed which is derived from the speech of a user, which is denoted by x(n). The target centroid is then adapted using the recursion
$$m_T(n) = \lambda x(n) + (1 - \lambda)\, m_T(n - 1)$$
with λ a small positive constant and n = 0, 1, 2, ... This difference equation represents a digital lowpass filter with a DC gain of 1. The smaller the value of λ, the more emphasis is being placed on the existing centroid value and the less on the newly observed feature vector. Therefore λ effectively controls the length of memory that the system has of past centroids. The effective length of this memory can be determined by noting how long it takes for the impulse response of this filter to subside to about 10% of the original impulse height. The following table summarises this:
Table 1: Effective memory length for different values of λ. The duration in minutes is based on 15 ms time steps.
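The effective memory length follows directly from the 10% criterion stated above. For a first-order recursion with learning rate λ the impulse response decays as λ(1 - λ)^n, so it falls to 10% of its initial height after n = ln(0.1) / ln(1 - λ) steps. The sketch below evaluates this for a few values of λ; apart from 10^-5 these values are chosen here only for illustration and are not claimed to be the rows of the original table.

```python
import math


def memory_length_steps(lam, fraction=0.1):
    """Steps until the impulse response lam * (1 - lam)**n falls to `fraction` of its start."""
    return math.log(fraction) / math.log(1.0 - lam)


def memory_length_minutes(lam, step_ms=15.0, fraction=0.1):
    """The same memory length converted to minutes for a given time step."""
    return memory_length_steps(lam, fraction) * step_ms / 1000.0 / 60.0


if __name__ == "__main__":
    for lam in (1e-3, 1e-4, 1e-5):
        print(f"lambda = {lam:g}: {memory_length_steps(lam):,.0f} steps, "
              f"{memory_length_minutes(lam):.1f} minutes")
```

For λ = 10^-5 and 15 ms steps this evaluates to a little under 58 minutes.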
Therefore, for $\lambda = 10^{-5}$ about one hour of sustained speech is required to unlearn the previous speaker and bond to a new preferred speaker. Such a learning rate can be modulated by the quality of the interaction by setting it as
$$\lambda = 10^{-5}(1 + Q)$$
A more sophisticated system uses a Gaussian Mixture Model (GMM), consisting of K Gaussian component models, instead of a single Gaussian density as discussed above. If the likelihood of feature vector x(n) given the i-th Gaussian component is given by $f_i(x(n))$, the likelihood resulting from the GMM will be the weighted sum
$$f(x(n)) = \sum_{i=1}^{K} w_i f_i(x(n))$$
with $w_i$ the mixture weights and $i = 1, 2, \ldots, K$. When updating such a model, a target feature vector x(n) will now be proportionally associated with the various Gaussian components, instead of entirely with only one Gaussian. These proportionality constants are known as responsibilities and can be determined as
$$r_i(n) = \frac{w_i f_i(x(n))}{\sum_{j=1}^{K} w_j f_j(x(n))}$$
Adaptation of the GMM is correspondingly done by proportionally using the feature vector to update each of the Gaussian components. This changes the original update recursion to:
$$m_{T,i}(n) = \lambda r_i(n)\, x(n) + (1 - \lambda r_i(n))\, m_{T,i}(n - 1)$$
Using this method of adaptation will maintain the bonding of an existing user as long as that user sustains interaction. If, however, another user starts to interact with the toy, the memory of the original user will gradually fade and be replaced by that of the new one, which is precisely the desired behaviour.
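A minimal sketch of the GMM adaptation step is given below. It assumes diagonal covariances and reads "proportionally using the feature vector" as scaling the learning rate of each component by its responsibility, which is one plausible interpretation rather than a confirmed detail of the source. All function names are illustrative.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def responsibilities(x, weights, means, variances):
    """Share of x attributed to each component: w_i f_i(x) / sum_j w_j f_j(x)."""
    logs = [math.log(w) + log_gaussian(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    peak = max(logs)  # subtract the maximum before exponentiating to avoid underflow
    unnorm = [math.exp(value - peak) for value in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def adapt_centroids(x, weights, means, variances, lam):
    """Move each target centroid towards x in proportion to its responsibility."""
    resp = responsibilities(x, weights, means, variances)
    return [[lam * r * xi + (1.0 - lam * r) * mi for xi, mi in zip(x, mean)]
            for r, mean in zip(resp, means)]
```

Components that explain the new frame well are pulled towards it, while components with negligible responsibility stay essentially where they were; the covariances are left untouched.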
When the current preferred user is neglecting interaction with the toy we also want him/her to fade from the toy's memory, in other words for the toy to unlearn his/her voice characteristics. This is achieved by periodically inserting extra feature vectors $m_{U,i}$, originating from the UBM centroids, into the adaptation process. Their corresponding responsibility constants should be
This will move the target model away from the characteristics of the preferred user, and closer to the generic background speaker. However, the effect of these vectors should be much less pronounced than that of the true target speaker input vectors. They should therefore be inserted after roughly every 20 (or more) time frames, making this unlearning process approximately 20 times slower than the learning process. This serves two purposes. Firstly, the target model is continually being stabilised towards the UBM, providing some extra robustness against extraneous environmental noise, and secondly, should the user ignore the toy for an extended period, the toy will gradually "forget" this user.
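The forgetting mechanism can be sketched for the single-centroid case, where no responsibility constants are needed. The sketch assumes that the UBM centroid is fed through the same first-order recursion as an ordinary feature vector after every 20 speech frames, and that during silence only these UBM vectors are inserted. The function names and the `insert_every` parameter are hypothetical.

```python
def pull_towards(target, vector, lam):
    """One step of the recursion m(n) = lam * v + (1 - lam) * m(n - 1)."""
    return [lam * v + (1.0 - lam) * m for v, m in zip(vector, target)]


def adapt_with_forgetting(frames, target, ubm, lam, insert_every=20):
    """Adapt to speech frames, inserting a UBM centroid after every `insert_every` frames."""
    target = list(target)
    for count, x in enumerate(frames, start=1):
        target = pull_towards(target, x, lam)
        if count % insert_every == 0:
            # The extra vector nudges the model back towards the background speaker.
            target = pull_towards(target, ubm, lam)
    return target


def forget_during_silence(target, ubm, lam, insertions):
    """With no speech input, only the periodic UBM insertions act on the model."""
    target = list(target)
    for _ in range(insertions):
        target = pull_towards(target, ubm, lam)
    return target
```

Because only one in roughly 20 updates pulls towards the UBM, the stabilising effect is weak while the user keeps talking, yet it is the only influence left once the user falls silent.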
If the preferred user engages in "abusive" behaviour, we want to rapidly fade that user from the toy's memory. The preferred user is recognized by a high identification score s(X) and the presence of abuse is typified by a high negative value of the interaction quality Q. Their combined presence accelerates the above unlearning process by immediately applying this procedure, but with a hugely increased value of
$$\lambda = \frac{1}{3}\max\left(0,\ \frac{2}{1 + e^{-s(X)}} - 1\right)$$
This will rapidly move the target model back to the UBM while still taking into account the uncertainty that the speech actually arose from the preferred speaker. To the extent that the interactions are deemed a) positive and b) the match with the preferred user is strong, positive interactions from the toy will increase, both in frequency and quality. These are expressed in terms of the spoken response of the toy, possible facial expression control, as well as the movements made by its limbs.
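The accelerated unlearning can be sketched as below, assuming the increased learning rate takes the form λ = (1/3) max(0, 2 / (1 + e^(-s(X))) - 1), where s(X) is the segment identification score. The sketch uses the identity 2 / (1 + e^(-s)) - 1 = tanh(s / 2) so that strongly negative scores do not overflow. The function names are illustrative, and the single-centroid update is an assumption made to keep the example short.

```python
import math


def abuse_learning_rate(segment_score):
    """Increased learning rate: one third of max(0, tanh(s(X) / 2))."""
    return max(0.0, math.tanh(segment_score / 2.0)) / 3.0


def unlearn_after_abuse(target, ubm, segment_score):
    """Pull the target centroid towards the UBM centroid using the increased rate."""
    lam = abuse_learning_rate(segment_score)
    return [lam * u + (1.0 - lam) * m for u, m in zip(ubm, target)]
```

A score near zero, meaning the speaker may not be the preferred user at all, leaves the model almost unchanged, while a confident match moves the centroid up to a third of the way back to the UBM in a single step.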
Although the description here concerns particular implementations for detecting a quiet soothing voice versus shouting, as well as a soft rocking motion versus throwing or falling, other implementations for doing so, as well as other types of gestures to be considered, are not excluded. The exact technique or implementation is not critical to this patent.
Furthermore, although not described here, similar processes can be devised for distinguishing the face of the preferred individual from that of a generic face representation. One approach for this is by measuring how the preferred face deviates from the generic face provided by the first components of an eigenface representation.
It should be appreciated that the above description is by way of example only and that numerous modifications, adaptations, and other implementations are possible. For example, substitutions, additions, or modifications may be made to the elements illustrated in the drawings, and the methods described herein may be modified by substituting, reordering, or adding stages to the disclosed methods. Furthermore, any elements described as being of a digital nature may equally well be implemented with analog circuitry if appropriate changes are made to the hardware of the toy. Accordingly, the above detailed description does not limit the invention.

Claims

1. A toy comprising a body that includes at least one input sensor (18) for receiving an input from a human user, at least one output apparatus (24) by means of which the toy interacts with the user, a processor (12) in communication with the input sensor (18) and the output apparatus (24) and a memory (16) in communication with the processor (12), the toy being characterized in that the processor (12) is programmed to classify each received input as either positive or negative, to adjust an accumulated input stored in the memory (16) in accordance with the classification, and to send control signals to the output apparatus (24) that are dependent on the accumulated input, the toy thereby exhibiting increased bonding behaviour in response to a series of predominantly positive inputs over time, and decreased bonding behaviour in response to a series of predominantly negative inputs over time.
2. A toy as claimed in claim 1 in which the received input corresponds to human interaction with the toy corresponding to one or more of sound, motion and image.
3. A toy as claimed in claim 2 in which the processor (12) classifies sound associated with shouting and motion associated with physical abuse as negative inputs.
4. A toy as claimed in any one of the preceding claims which includes at least two input sensors (18), a first of which is a microphone (20) configured to detect voice and voice amplitude and a second of which is an accelerometer (22) configured to detect motion and acceleration of the toy.
5. A toy as claimed in any one of the preceding claims in which the accumulated input is representative, at least to some degree, of the voice of a preferred user of the toy.
6. A toy as claimed in claim 4 or claim 5 in which the processor (12) is programmed to determine a degree of similarity between a received voice input received by the microphone (20) and the accumulated input.
7. A toy as claimed in claim 6 in which the accumulated input is adjusted to become increasingly representative of a user when the received input is classified as positive, and for it to become less representative of a preferred user or remain unchanged when the degree of similarity is low or the received input is classified as negative.
8. A toy as claimed in any one of the preceding claims in which the processor (12) is programmed to classify a received voice input at an amplitude above a predefined maximum voice amplitude as a negative input, and below it as a positive input.
9. A toy as claimed in any one of the preceding claims in which the processor (12) is programmed to classify a detected motion input at an acceleration above a predefined maximum acceleration threshold as a negative input, and below it as a positive input.
10. A toy as claimed in any one of the preceding claims in which the processor (12) is programmed to determine a degree of positivity or negativity, as the case may be, of a received input and to adjust the accumulated input proportionate to the degree of positivity or negativity.
11. A toy as claimed in any one of the preceding claims which includes timing means (14) in communication with the processor (12) and in which the processor (12) is programmed to classify an absence of received input for longer than a predefined period of time as negative input and to adjust the accumulated input to become less representative of the preferred user in response thereto.
12. A toy as claimed in any one of the preceding claims in which the output apparatus (24) include one or both of a sound transducer (26) and movement actuators (28) and in which the processor (12) is programmed to send control signals to the output apparatus (24) more frequently and/or of a higher quality, when the degree of similarity of a received voice input is high, and in which the processor (12) is programmed to send control signals to the output apparatus (24) less frequently and/or of a lower quality, when the degree of similarity of the received voice input is low.
13. A toy as claimed in any one of the preceding claims in which the accumulated input comprises a collection of characteristics extracted from a voice associated with a generic background speaker, each characteristic having a variable weight associated therewith so that the collection of weighted characteristics is representative of the voice of a preferred user.
14. A toy as claimed in claim 13 in which the variable weights associated with the characteristics are adjusted in order to make the accumulated input increasingly or less representative of the voice of the preferred user.
15. A toy as claimed in claim 13 or claim 14 in which the accumulated input is adjusted to become increasingly representative of the voice of at least one alternative user as the accumulated input becomes less representative of the voice of a current preferred user, the alternative user becoming a new preferred user when the accumulated input becomes more representative of the voice of the alternative user than that of the current preferred user.
16. A method of simulating bonding behaviour in a toy towards a human including the steps of storing an accumulated input representative of a preferred user in a memory (16) associated with the toy, receiving an input from a user by means of at least one input sensor (18) incorporated in the toy, classifying the input as either positive or negative, adjusting the accumulated input to become increasingly representative of the preferred user in response to a positive input and less representative of the preferred user in response to a negative input, and issuing control signals to output apparatus (24) of the toy in response to the input, the control signals being dependent on the accumulated input.
17. A method as claimed in claim 16 including the steps of classifying a received voice input above a predefined amplitude as a negative input, classifying a received motion input outside a predefined acceleration range as a negative input, and classifying an absence of received input for longer than a predetermined period of time as a negative input.
18. A method as claimed in claim 16 or claim 17 including the step of determining a degree of similarity of a received voice input to that of a preferred user and issuing control signals to the output apparatus of the toy which are proportional to the degree of similarity.
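
The method of claims 16 to 18 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the thresholds, the learning rate, the use of cosine similarity, the rule that any input not flagged as negative counts as positive, and every name below are assumptions made purely for illustration, since the claims specify none of them. The sketch also treats a negative input as a drift back towards a generic background profile, which is only one possible reading of "less representative of the preferred user".

```python
from dataclasses import dataclass
from math import sqrt

# Hypothetical values; the claims do not state any thresholds.
MAX_VOICE_AMPLITUDE = 0.8    # louder voice input is treated as negative
ACCEL_RANGE = (0.2, 3.0)     # motion outside this range is treated as negative
MAX_IDLE_SECONDS = 3600.0    # a longer absence of input is treated as negative
LEARNING_RATE = 0.1          # step size for adjusting the accumulated input


def classify(voice_amplitude=None, acceleration=None, idle_seconds=0.0):
    """Return 'negative' if any claim-17 condition holds, else 'positive' (assumed)."""
    if voice_amplitude is not None and voice_amplitude > MAX_VOICE_AMPLITUDE:
        return "negative"
    if acceleration is not None and not (
        ACCEL_RANGE[0] <= acceleration <= ACCEL_RANGE[1]
    ):
        return "negative"
    if idle_seconds > MAX_IDLE_SECONDS:
        return "negative"
    return "positive"


def similarity(a, b):
    """Cosine similarity of two equal-length feature vectors, floored at 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return max(0.0, dot / norm) if norm else 0.0


@dataclass
class Toy:
    background: list           # generic background speaker profile
    accumulated: list = None   # accumulated input held in memory

    def __post_init__(self):
        if self.accumulated is None:
            self.accumulated = list(self.background)

    def update(self, voice_features, label):
        # Positive input: move towards the speaker's features.
        # Negative input: drift back towards the background profile.
        target = voice_features if label == "positive" else self.background
        self.accumulated = [
            a + LEARNING_RATE * (t - a) for a, t in zip(self.accumulated, target)
        ]

    def response_level(self, voice_features):
        # Control signals would be scaled by this degree of similarity.
        return similarity(self.accumulated, voice_features)
```

In this sketch, repeated positive inputs from one speaker raise `response_level` for that speaker, and negative inputs lower it again, so a different speaker who supplies positive input over time would eventually obtain the higher similarity.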
EP09828710A 2008-11-27 2009-11-27 A toy exhibiting bonding behaviour Ceased EP2367606A4 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
ZA200804571 2008-11-27
ZA200808880 2009-03-05
PCT/IB2009/007585 WO2010061286A1 (en) 2008-11-27 2009-11-27 A toy exhibiting bonding behaviour

Publications (2)

Publication Number Publication Date
EP2367606A1 (en) 2011-09-28
EP2367606A4 (en) 2012-09-19

Family

ID=42225297

Family Applications (1)

Application Number Title Priority Date Filing Date
EP09828710A Ceased EP2367606A4 (en) 2008-11-27 2009-11-27 A toy exhibiting bonding behaviour

Country Status (6)

Country Link
US (1) US20110230114A1 (en)
EP (1) EP2367606A4 (en)
CN (1) CN102227240B (en)
HK (1) HK1163003A1 (en)
WO (1) WO2010061286A1 (en)
ZA (1) ZA201103438B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150138333A1 (en) * 2012-02-28 2015-05-21 Google Inc. Agent Interfaces for Interactive Electronics that Support Social Cues
BR112014027594A2 (en) * 2012-05-09 2017-06-27 Koninklijke Philips Nv device for supporting a person's behavior change, method for supporting a person's behavior change using a computer device and program
US9304652B1 (en) 2012-12-21 2016-04-05 Intellifect Incorporated Enhanced system and method for providing a virtual space
US10771247B2 (en) 2013-03-15 2020-09-08 Commerce Signals, Inc. Key pair platform and system to manage federated trust networks in distributed advertising
US10157390B2 (en) 2013-03-15 2018-12-18 Commerce Signals, Inc. Methods and systems for a virtual marketplace or exchange for distributed signals
US11222346B2 (en) 2013-03-15 2022-01-11 Commerce Signals, Inc. Method and systems for distributed signals for use with advertising
US10803512B2 (en) 2013-03-15 2020-10-13 Commerce Signals, Inc. Graphical user interface for object discovery and mapping in open systems
US10743732B2 (en) 2013-06-07 2020-08-18 Intellifect Incorporated System and method for presenting user progress on physical figures
US9836806B1 (en) 2013-06-07 2017-12-05 Intellifect Incorporated System and method for presenting user progress on physical figures
US9728097B2 (en) 2014-08-19 2017-08-08 Intellifect Incorporated Wireless communication between physical figures to evidence real-world activity and facilitate development in real and virtual spaces
CN105597331B (en) * 2016-02-24 2019-02-01 苏州乐派特机器人有限公司 The programming toy in kind that intelligence linearly concatenates
US10380852B2 (en) 2017-05-12 2019-08-13 Google Llc Systems, methods, and devices for activity monitoring via a home assistant
US20200269421A1 (en) * 2017-10-30 2020-08-27 Sony Corporation Information processing device, information processing method, and program
US20230201730A1 (en) * 2021-12-28 2023-06-29 Anthony Blackwell Speaking Doll Assembly

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994000831A1 (en) * 1992-06-23 1994-01-06 Charles Borg Device to facilitate alternative response behaviour
WO1999054015A1 (en) * 1998-04-16 1999-10-28 Creator Ltd. Interactive toy
US6048209A (en) * 1998-05-26 2000-04-11 Bailey; William V. Doll simulating adaptive infant behavior
US20020016128A1 (en) * 2000-07-04 2002-02-07 Tomy Company, Ltd. Interactive toy, reaction behavior pattern generating device, and reaction behavior pattern generating method
US20030045203A1 (en) * 1999-11-30 2003-03-06 Kohtaro Sabe Robot apparatus, control method thereof, and method for judging character of robot apparatus

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5443388A (en) * 1994-08-01 1995-08-22 Jurmain; Richard N. Infant simulation system for pregnancy deterrence and child care training
JPH10289006A (en) * 1997-04-11 1998-10-27 Yamaha Motor Co Ltd Method for controlling object to be controlled using artificial emotion
US6604980B1 (en) * 1998-12-04 2003-08-12 Realityworks, Inc. Infant simulator
WO1999032203A1 (en) * 1997-12-19 1999-07-01 Smartoy Ltd. A standalone interactive toy
US6056618A (en) * 1998-05-26 2000-05-02 Larian; Isaac Toy character with electronic activities-oriented game unit
US6663393B1 (en) * 1999-07-10 2003-12-16 Nabil N. Ghaly Interactive play device and method
US6669527B2 (en) * 2001-01-04 2003-12-30 Thinking Technology, Inc. Doll or toy character adapted to recognize or generate whispers
JP4595436B2 (en) * 2004-03-25 2010-12-08 日本電気株式会社 Robot, control method thereof and control program
GB2425490A (en) * 2005-04-26 2006-11-01 Steven Lipman Wireless communication toy
US7837531B2 (en) * 2005-10-31 2010-11-23 Les Friedland Toy doll
US20070128979A1 (en) * 2005-12-07 2007-06-07 J. Shackelford Associates Llc. Interactive Hi-Tech doll

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO2010061286A1 *

Also Published As

Publication number Publication date
CN102227240A (en) 2011-10-26
WO2010061286A1 (en) 2010-06-03
EP2367606A4 (en) 2012-09-19
ZA201103438B (en) 2012-01-25
HK1163003A1 (en) 2012-09-07
US20110230114A1 (en) 2011-09-22
CN102227240B (en) 2013-11-13

Similar Documents

Publication Publication Date Title
US20110230114A1 (en) Toy exhibiting bonding behavior
JP6888096B2 (en) Robot, server and human-machine interaction methods
CN108231070B (en) Voice conversation device, voice conversation method, recording medium, and robot
US10702991B2 (en) Apparatus, robot, method and recording medium having program recorded thereon
EP1113417B1 (en) Apparatus, method and recording medium for speech synthesis
WO2018103028A1 (en) Audio playback device, system, and method
CN108735219A (en) A kind of voice recognition control method and device
CN111475206B (en) Method and apparatus for waking up wearable device
US20020042713A1 (en) Toy having speech recognition function and two-way conversation for dialogue partner
CN108670196B (en) Method and device for monitoring sleep state of infant
JPH10328422A (en) Automatically responding toy
JP2019217122A (en) Robot, method for controlling robot and program
CN112634944A (en) Method for recognizing sound event
CA2498232A1 (en) Breath-sensitive toy
Westerman et al. Modelling the development of mirror neurons for auditory-motor integration
JP5602753B2 (en) A toy showing nostalgic behavior
KR20200055467A (en) Artificial intelligence based necklace type wearable device
JP2005231012A (en) Robot device and its control method
CN115705841A (en) Speech recognition using an accelerometer to sense bone conduction
KR102426792B1 (en) Method for recognition of silent speech and apparatus thereof
CN111050266B (en) Method and system for performing function control based on earphone detection action
Diep et al. Neuron-like approach to speech recognition
JP3919726B2 (en) Learning apparatus and method
JP7169029B1 (en) Baby type dialogue robot, baby type dialogue method and baby type dialogue program
CN112562653B (en) Offline voice recognition learning method based on human behavior experience

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20110624

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

RIN1 Information on inventor provided before grant (corrected)

Inventor name: DU PREEZ, JOHAN, ADAM

Inventor name: SCHWARDT, LUDWIG, CARL

DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20120820

RIC1 Information provided on ipc code assigned before grant

Ipc: A63H 3/28 20060101AFI20120813BHEP

17Q First examination report despatched

Effective date: 20161215

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20180423