US20130339027A1 - Depth based context identification - Google Patents

Depth based context identification

Info

Publication number
US20130339027A1
Authority
US
United States
Prior art keywords
user
gesture
verbal commands
command
depth camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/524,351
Other versions
US9092394B2
Inventor
Tarek El Dokor
James Holmes
Jordan Cluster
Stuart Yamamoto
Pedram Vaghefinazari
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honda Motor Co Ltd
Edge 3 Technologies LLC
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US13/524,351 priority Critical patent/US9092394B2/en
Assigned to Edge3 Technologies, LLC reassignment Edge3 Technologies, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CLUSTER, JORDAN, DOKOR, Tarek El, HOLMES, JAMES
Assigned to HONDA MOTOR CO., LTD. reassignment HONDA MOTOR CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: VAGHEFINAZARI, PEDRAM, YAMAMOTO, STUART
Priority to KR1020157001026A priority patent/KR102061925B1/en
Priority to CN201380030981.8A priority patent/CN104620257B/en
Priority to PCT/US2013/036654 priority patent/WO2013188002A1/en
Priority to JP2015517255A priority patent/JP6010692B2/en
Priority to EP13804195.9A priority patent/EP2862125B1/en
Publication of US20130339027A1 publication Critical patent/US20130339027A1/en
Priority to IL236089A priority patent/IL236089A/en
Publication of US9092394B2 publication Critical patent/US9092394B2/en
Application granted granted Critical
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60RVEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R16/00Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R16/02Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
    • B60R16/037Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • B60R16/0373Voice control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/08Cursor circuits
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/24Speech recognition using non-acoustical features
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/24Speech recognition using non-acoustical features
    • G10L15/25Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • G10L17/02Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/038Indexing scheme relating to G06F3/038
    • G06F2203/0381Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer

Definitions

  • the present invention is related to recognizing voice commands using pose or gesture information to increase the accuracy of speech recognition.
  • a driver or a passenger of a vehicle typically operates various devices in the vehicle using switches, screens, keypads or other input mechanisms using fingers or hands.
  • Such input mechanisms may be used to operate, for example, a navigation system, an entertainment system, a climate system or a phone system.
  • a complicated series of operations must be performed on the input mechanism to issue a desired command to the devices.
  • Speech recognition is the process of converting an acoustic signal to speech elements (e.g., phonemes, words and sentences). Speech recognition has found application in various areas ranging from telephony to vehicle operation.
  • the audio signal is collected by input devices (e.g., a microphone), converted to a digital signal, and then processed using one or more algorithms to output speech elements contained in the audio signal.
  • the recognized speech elements can be the final results of speech recognition or intermediate information used for further processing.
  • One of the issues in using voice recognition in vehicles is that similar or the same verbal commands may be used for different devices. Sharing of similar or the same verbal commands causes ambiguity in verbal commands. For example, a command such as “locate XYZ” may indicate the locating of a particular point-of-interest (POI) in the context of navigation whereas the same command may also indicate identification of a sound track in an entertainment system. If the context of the user's command is not properly identified, operations other than those intended by the user may be carried out by the devices in the vehicle.
  • Embodiments of the present invention provide a system or a method of recognizing verbal commands based on the pose or gesture of a user.
  • One or more devices among a plurality of devices that are likely to be targeted by the user for an operation are selected based on gesture information representing the pose or gesture of the user.
  • a plurality of verbal commands associated with the one or more devices targeted by the user are selected based on the received gesture information.
  • An audio signal is processed using the selected plurality of verbal commands to determine a device command for operating the one or more devices.
  • a depth camera is used for capturing at least one depth image.
  • Each of the depth images covers at least a part of the user and comprises pixels representing distances from the depth camera to the at least part of the user.
  • the at least one depth image is processed to determine the pose or gesture of the user.
  • the gesture information is generated based on the recognized pose or gesture.
  • the at least part of the user comprises a hand or a forearm of the user.
  • the depth camera is installed in an overhead console in a vehicle with a field of view covering the user.
  • the plurality of devices comprise at least a navigation system and an entertainment system in a vehicle.
  • the gesture information indicates whether a hand or forearm of a user is located within a distance from the depth camera or beyond the distance from the depth camera.
  • a first set of verbal commands is selected responsive to the gesture information indicating that the hand or the forearm is located within the distance.
  • a second set of verbal commands is selected responsive to the gesture information indicating that the hand or the forearm is located beyond the distance.
  • the first set of verbal commands is associated with performing navigation operations in a vehicle.
  • the second set of verbal commands is associated with operating an entertainment system, a climate control system or a diagnostic system.
  • FIG. 1A is a side view of a vehicle equipped with a command processing system, according to one embodiment.
  • FIG. 1B is a top view of the vehicle of FIG. 1A , according to one embodiment.
  • FIG. 2 is a block diagram of a command processing system, according to one embodiment.
  • FIG. 3 is a block diagram of a speech recognition module, according to one embodiment.
  • FIG. 4 is a conceptual diagram illustrating a search region for a point-of-interest, according to one embodiment.
  • FIG. 5 is a flowchart for a method of performing speech recognition based on depth images captured by a camera, according to one embodiment.
  • Certain aspects of the embodiments include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
  • Embodiments also relate to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
  • the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
  • Embodiments relate to selecting or pruning applicable verbal commands associated with speech recognition based on a user's motion or gesture detected from a depth camera. Depending on the depth of the user's hand or forearm relative to the depth camera, the context of the verbal command is determined and one or more command dictionaries corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected command dictionaries. By using command dictionaries depending on the context, the accuracy of the speech recognition is increased.
  • the term “user” includes a driver of a vehicle as well as a passenger. The user may be anyone attempting to control one or more devices in the vehicle.
  • a “pose” refers to the configuration of body parts of a user.
  • the pose may, for example, indicate relationships of a hand and a forearm of the user relative to other body parts or a reference point (e.g., a camera).
  • a “gesture” refers to a series of configurations of body parts of a user that changes with progress of time.
  • the gesture, for example, may include a series of arm and hand movements pointing in a direction.
  • a “device command” refers to an instruction for operating or controlling a device.
  • the device command may be received and interpreted by the device to perform a certain operation or a set of operations.
  • a “navigation operation” refers to an operation by a user for using a computing device (e.g., an onboard telematics device) to identify, locate, choose or obtain information for driving to a destination.
  • the navigation operation may include providing user input to select an address or point of interest, and choosing an address or point of interest displayed as a result of providing the user input.
  • FIGS. 1A and 1B illustrate a vehicle 100 equipped with a command processing system, according to one embodiment.
  • the command processing system may include, among other components, a central processing unit 120 and an overhead console unit 110 .
  • the command processing system may be connected to other components (e.g., a navigation system and an entertainment system) of the vehicle 100 to perform various operations.
  • the command processing system recognizes verbal commands based on a user's motion or gesture, as described below in detail with reference to FIGS. 3 and 4 .
  • the central processing unit 120 processes an audio signal to detect a user's verbal commands included in the audio signal.
  • the central processing unit 120 is connected to other components such as a cabin system (e.g., a navigation system, entertainment system, climate control system and diagnostic system).
  • the central processing unit 120 controls these devices based on verbal commands received from the user.
  • the central processing unit 120 may be a stand-alone device or may be a part of a larger system (e.g., telematics system).
  • the central processing unit 120 is described below in detail with reference to FIG. 2 .
  • the central processing unit 120 may be placed at any location within the vehicle 100 . As illustrated in FIGS. 1A and 1B , the central processing unit 120 may be located at the center console of the vehicle 100 . Alternatively, the central processing unit 120 may be installed within the dashboard of the vehicle 100 . Further, the central processing unit 120 may also be installed on the ceiling of the vehicle.
  • the overhead console unit 110 is located at the ceiling of the vehicle interior and includes sensors (e.g., microphone and camera) to capture depth images of the user and detect audio signals, as described below in detail with reference to FIG. 2 .
  • the overhead console unit 110 may include various other components such as a garage opener.
  • the sensors of the overhead console unit 110 communicate with the central processing unit 120 to provide signals for detecting the user's verbal command.
  • the communication between the sensors of the overhead console unit 110 and the central processing unit 120 can be established by any wired or wireless communication medium currently used or to be developed in the future.
  • FIG. 2 is a block diagram illustrating the command processing system 200 , according to one embodiment.
  • the command processing system 200 may include, among other components, a processor 210 , an output interface 214 , an input interface 218 , memory 240 and a bus connecting these components.
  • the command processing system 200 may also include a depth camera 222 and a microphone 260 .
  • the depth camera 222 and the microphone 260 are connected to the input interface 218 via channels 220 , 262 .
  • the command processing system 200 may include more than one depth camera or microphone.
  • the processor 210 executes instructions stored in the memory 240 and processes the sensor data received via the input interface 218 . Although only a single processor 210 is illustrated in FIG. 2 , more than one processor may be used to increase the processing capacity of the command processing system 200 .
  • the output interface 214 is hardware, software, firmware or a combination thereof for sending data including device commands to other devices such as a navigation system, an entertainment system, a climate control system and a diagnostic system via communication channels. To send the data, the output interface 214 may format and regulate signals to comply with predetermined communication protocols.
  • the input interface 218 is hardware, software, firmware or a combination thereof for receiving the sensor signals from the overhead console unit 110 .
  • the sensor signals include the depth images received via channel 220 , and the audio signals received via channel 262 .
  • the input interface 218 may buffer the received sensor signals and perform pre-processing on the sensor signals before forwarding the sensor signals to the processor 210 or the memory 240 via bus 268 .
  • the depth camera 222 captures the depth images of the driver and sends the depth images to the input interface 218 via the channel 220 .
  • the depth camera 222 may be embodied as a time-of-flight (TOF) camera, a stereovision camera or other types of cameras that generate depth images including information on distance to different points of objects within its field of view.
  • the stereovision camera uses two lenses to capture images from different locations. The captured images are then processed to generate the depth images.
  • the depth camera 222 generates grayscale images with each pixel indicating the distance from the depth camera 222 to a point of an object (e.g., the driver) corresponding to the pixel.
  • the depth camera 222 is installed on the overhead console unit 110 and has a field of view 116 overlooking the driver of the vehicle 100 .
  • the depth camera 222 advantageously has an unobstructed view of the driver and the center console of the vehicle 100 .
  • the depth of the driver's hand or arm relative to the depth camera 222 provides indication of the operations intended by the driver, as described below in detail with reference to the gesture recognition module 252 .
  • the microphone 260 senses acoustic waves and converts the acoustic waves into analog electric signals.
  • the microphone 260 includes an analog-to-digital (A/D) converter for converting the analog electric signals into digital signals.
  • the converted digital signals are sent to the input interface 218 via the channel 262 .
  • the A/D converter may be included in the input interface 218 .
  • the microphone 260 sends analog electric signals to the input interface 218 via the channel 262 for conversion to digital signals and further processing.
  • the memory 240 stores instructions to be executed by the processor 210 and other data associated with the instructions.
  • the memory 240 may be volatile memory, non-volatile memory or a combination thereof.
  • the memory 240 may store, among other software modules, a command format module 244 , a gesture recognition module 252 and a speech recognition module 256 .
  • the memory 240 may include other software modules such as an operating system, the description of which is omitted herein for the sake of brevity.
  • the gesture recognition module 252 detects the driver's gestures or motions based on the depth images captured by the depth camera 222 . In one embodiment, the gesture recognition module 252 detects the location and/or motions of the hand or forearm to determine the context of verbal commands. In one embodiment, the gesture recognition module 252 determines the location of the driver's hand or forearm relative to the depth camera 222 . If the driver's hand or forearm is closer to the depth camera 222 (i.e., the distance from the depth camera 222 to the hand or forearm is below a threshold), for example, the driver is likely to be taking actions or making gestures associated with navigation operations (e.g., pointing a finger towards a direction outside the window).
  • To the contrary, if the driver's hand or forearm is away from the depth camera 222 (i.e., the distance from the depth camera 222 to the hand or the forearm is at or above the threshold), the driver is likely to be taking actions or making gestures associated with other control functions typically provided in the center console (e.g., operate an entertainment system and climate control system).
  • the gesture recognition module 252 may employ a computing algorithm that clusters groups of pixels in the depth images and tracks the locations of these groups with progress of time to determine the driver's motions or gesture.
  • the pixels may be clustered into groups based on the proximity of the two-dimensional distance of pixels and the depth difference of the pixels.
  • the gesture recognition module 252 may also store a model of the human body and map the groups of pixels to the stored model to accurately detect and track the locations of the hand and/or forearm.
  • the gesture recognition module 252 may further detect the location of the driver's hand with a higher resolution to determine the device associated with the driver's operation. If the center console of the vehicle has switches or knobs for operating the entertainment system at the middle of the center console and switches for a climate control system at both sides, the location of the driver's hand around the middle of the center console indicates that the driver is engaged in operations of the entertainment system. If the driver's hand is closer to the sides of the center console than the middle portion of the center console, the driver is more likely to be engaged in operations of the climate control system. Hence, the command processing system 200 may use the gesture information on the specific location of the hand at the time verbal commands are issued by the driver to determine a device associated with the verbal commands.
  • the speech recognition module 256 determines the verbal command issued by the driver. To determine the verbal command, the speech recognition module 256 receives gesture information about the driver's gesture from the gesture recognition module 252 , as described below in detail with reference to FIG. 3 .
  • the command format module 244 translates the verbal commands detected at the speech recognition module 256 into device commands for operating devices installed in the vehicle 100 .
  • Each device installed in the vehicle 100 may require commands to be provided in a different format.
  • the command format module 244 translates the commands into a format that can be processed by each device.
  • the command format module 244 may request further information from the driver if the issued verbal command is unclear, ambiguous or deficient. Such request for further information may be made via a speaker.
  • the command format module 244 may also combine the information from the gesture recognition module 252 to generate a device command, as described below in detail with reference to FIG. 4 .
  • the command format module 244 , the gesture recognition module 252 and the speech recognition module 256 need not be stored in the same memory 240 .
  • the gesture recognition module 252 may be stored in memory in an overhead console unit whereas speech recognition module 256 and the command format module 244 may be stored in memory in a center console unit.
  • one or more of these modules may be embodied as a dedicated hardware component.
  • FIG. 3 is a block diagram illustrating components of the speech recognition module 256 , according to one embodiment.
  • the speech recognition module 256 may include, among other components, a gesture recognition interface 312 , a command extraction module 316 and a command dictionary 320 .
  • the speech recognition module 256 may also include other modules such as a history management module that retains the list of verbal commands previously issued by a user.
  • the gesture recognition interface 312 enables the speech recognition module 256 to communicate with the gesture recognition module 252 .
  • the gesture information received from the gesture recognition module 252 via the gesture recognition interface 312 indicates the location of the driver's hand or forearm.
  • the command dictionary 320 includes commands associated with various devices of the vehicle 100 .
  • the command dictionary 320 includes a plurality of dictionaries 320 A through 320 N, each associated with a device or system of the vehicle 100 .
  • dictionary 320 A stores commands associated with the operation of a navigation system
  • dictionary 320 B stores commands associated with the operation of an entertainment system
  • dictionary 320 C stores commands associated with a climate control system.
  • the command extraction module 316 extracts the verbal commands included in the audio signal based on the gesture data and commands stored in selected command dictionaries 320 . After the gesture information is received, the command extraction module 316 selects one or more dictionaries based on the location of the user's hand or forearm as indicated by the gesture information. If the gesture data indicates that the user's hand or forearm is at a certain pose, dictionaries associated with devices in the vicinity of the driver's hand or forearm are selected for command extraction. For example, if the user's hand is within a certain distance from an entertainment system, a dictionary (e.g., dictionary 320 B) associated with the entertainment system is selected for command extraction.
  • the command extraction module 316 determines that the verbal commands are associated with the navigation system. Hence, the command extraction module 316 selects and uses a dictionary (e.g., dictionary 320 A) associated with the navigation operation to perform speech recognition.
  • the verbal command recognized by the command extraction module 316 is combined with gesture information to generate navigation commands at the command format module 244 .
  • the gesture information may indicate, for example, the orientation of the driver's finger, as described below in detail with reference to FIG. 4 .
  • the command extraction module 316 may use more than one dictionary to extract the verbal commands. If the hand of the user is located around the center console, dictionaries associated with any devices (e.g., the entertainment system or the climate control system) that can be operated at the center console may be selected.
  • the command extraction module 316 assigns probability weights to commands based on the location of the user's hand or forearm.
  • the command extraction module 316 uses a statistical model that computes probabilities of spoken verbal commands based on phonemes appearing in a sequence.
  • the statistical model may include parameters that take into account the location of the hand or forearm in determining the most likely command intended by the driver.
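  • As an illustration of such location-dependent weighting, the following sketch combines hypothetical acoustic scores from a phoneme-based model with an invented table of location priors; the commands, scores and zone names are placeholders, not values from the patent.

```python
# Hypothetical sketch: combine acoustic likelihoods with a location-based prior.
# The candidate commands, scores and zone weights are illustrative placeholders.

ACOUSTIC_SCORES = {                 # P(audio | command) from a phoneme-based model
    "locate coffee shop": 0.42,     # navigation-style command
    "locate sound track": 0.40,     # entertainment-style command
}

ZONE_WEIGHTS = {                    # prior weight of each command given the hand zone
    "near_camera":    {"locate coffee shop": 0.8, "locate sound track": 0.2},
    "center_console": {"locate coffee shop": 0.2, "locate sound track": 0.8},
}

def most_likely_command(hand_zone: str) -> str:
    """Pick the command with the highest acoustic score times location prior."""
    weights = ZONE_WEIGHTS[hand_zone]
    return max(ACOUSTIC_SCORES, key=lambda cmd: ACOUSTIC_SCORES[cmd] * weights[cmd])

print(most_likely_command("near_camera"))      # -> locate coffee shop
print(most_likely_command("center_console"))   # -> locate sound track
```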
  • the speech recognition module 256 of FIG. 3 is merely illustrative. Various modifications can be made to the speech recognition module 256 .
  • the command dictionary 320 may map each of a plurality of commands to one or more devices.
  • a user can conveniently identify a point-of-interest or destination. While pointing to a point-of-interest or destination, the user can utter a command requesting the navigation system to identify and/or set a point-of-interest.
  • the command format module 244 may combine the commands recognized from speech and parameters extracted from the gesture information to generate a navigation command.
  • FIG. 4 is a conceptual diagram illustrating a search region for a point-of-interest, according to one embodiment.
  • the driver wishes to navigate to building 410 or wants to identify the name or address of building 410 .
  • the driver states a verbal command “identify that building” while pointing his finger towards the building 410 .
  • the gesture recognition module 252 may detect that the driver is pointing his finger in direction O-A (shown in a dashed line).
  • the speech recognition module 256 receives gesture information from the gesture recognition module 252 indicating that the user's arm and forearm are raised.
  • the speech recognition module 256 determines that the verbal command is associated with a navigation system (since the hand and forearm are raised) and uses a dictionary associated with the navigation system to recognize the verbal command.
  • the speech recognition module 256 sends the identified verbal command to the command format module 244 .
  • the command format module 244 receives the verbal command, analyzes the verbal command and determines that the phrase “that building” needs further clarification.
  • the command format module 244 analyzes the gesture information and uses a parameter in the gesture information indicating the orientation (indicated by line O-A) of the user's finger to generate a device command requesting the navigation system to identify any points-of-interest in the direction of line O-A.
  • the parameter may be an angle α relative to the front direction of the vehicle 100 .
  • the navigation system receives the device command, and establishes a search cone represented by O-B-C-O.
  • the search cone has a height of R indicating the search radius (e.g., 10 miles) and has a cone angle of 2α.
  • the cone angle 2α may be increased to expand the search or to allow increased tolerance for errors.
  • the navigation system performs the search within the search region identified by the search cone, taking into account vehicle speed and the direction of the vehicle movement. In one embodiment, priority is given to the points of interest that are closer to the vehicle.
  • the navigation system presents a list of points-of-interest found within the search region to the user. The user may then indicate the point-of-interest from the searched list and request further actions (e.g., navigate to the point-of-interest or make a phone call to the point-of-interest).
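  • The search within the cone O-B-C-O can be outlined with a short sketch. It assumes flat vehicle-centered coordinates, an angle α between the pointing direction and the vehicle's heading, and a search radius R; the helper name and the sample points-of-interest are hypothetical.

```python
# Illustrative search-cone filter: keep points-of-interest whose bearing lies
# within ±alpha of the pointing direction and within radius R, then rank them
# by distance so that closer POIs get priority. Data and names are invented.
import math

def pois_in_search_cone(pois, pointing_angle_deg, cone_half_angle_deg, radius_m):
    """pois: list of (name, x, y) in vehicle coordinates, x forward, y left."""
    hits = []
    for name, x, y in pois:
        dist = math.hypot(x, y)
        if dist == 0.0 or dist > radius_m:
            continue
        bearing = math.degrees(math.atan2(y, x))           # angle from vehicle heading
        delta = abs((bearing - pointing_angle_deg + 180) % 360 - 180)
        if delta <= cone_half_angle_deg:
            hits.append((dist, name))
    return [name for dist, name in sorted(hits)]           # closer POIs first

pois = [("building 410", 800.0, 300.0), ("gas station", 5000.0, -2500.0)]
print(pois_in_search_cone(pois, pointing_angle_deg=20.0,
                          cone_half_angle_deg=15.0, radius_m=16000.0))
```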
  • FIG. 5 is a flowchart illustrating a method of recognizing verbal commands based on a driver's motions or gesture, according to one embodiment.
  • the command processing system 200 generates 506 depth images using the depth camera 222 . Using the generated depth images, the command processing system 200 generates 510 gesture information of the user. The gesture information may indicate, among other things, the location of hands or forearms of the user relative to the depth camera 222 .
  • the command processing system 200 selects 514 one or more dictionaries for recognizing verbal commands.
  • Each dictionary may include commands for a certain system or device in the vehicle 100 .
  • the command processing system 200 also generates a digital audio signal representing the driver's utterance based on an acoustic signal received at the microphone 260 .
  • After the applicable dictionary or dictionaries are selected, the command processing system 200 performs 518 speech recognition on the generated audio signal using one or more selected dictionaries.
  • In this way, the accuracy of the speech recognition can be increased.
  • After a verbal command is generated, the command processing system 200 generates 522 a device command corresponding to the verbal command by translating the verbal command into the device command. If needed, the command processing system 200 may add, modify or request information for generating the device command.
  • speech recognition may be performed 518 to generate a set of candidate verbal commands.
  • the final verbal command may be selected from the set of candidate verbal commands based on the determination 510 of the driver's gesture.
  • one or more processes may be performed in parallel. For example, generating 506 the depth images at the depth camera 222 may be performed in parallel with generating 516 the audio signal.
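  • The overall flow of FIG. 5 can be outlined in a short sketch; every callable below is a hypothetical stand-in for the corresponding module, and only the ordering of steps 506, 510, 514, 518 and 522 follows the flowchart.

```python
# Hypothetical outline of the FIG. 5 flow. The module functions are passed in as
# placeholders rather than real APIs; depth capture (506) and audio capture may
# run in parallel in practice, as noted above.

def process_command(depth_camera, microphone, devices,
                    recognize_gesture, select_dictionaries,
                    recognize_speech, format_device_command):
    depth_images = depth_camera.capture()                    # step 506
    audio = microphone.record()                              # audio capture
    gesture_info = recognize_gesture(depth_images)           # step 510
    dictionaries = select_dictionaries(gesture_info)         # step 514
    verbal_command = recognize_speech(audio, dictionaries)   # step 518
    device_command = format_device_command(verbal_command, gesture_info)  # step 522
    devices.dispatch(device_command)
    return device_command
```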
  • one or more cameras are used to increase the accuracy of gesture detection.
  • the cameras may also capture color images.
  • the color images may be used to detect skin tone that represents the driver's hands. By correlating the color images with depth images, the location of the hand or forearm can be detected more accurately.
  • two or more cameras may be located at different locations of the ceiling or elsewhere in the vehicle 100 to complement or supplant the depth images captured at one depth camera.
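  • One way such color/depth correlation could look is sketched below, assuming OpenCV and NumPy are available and that the color image is aligned to the depth image; the HSV skin-tone range is a rough illustrative value, not one taken from the patent.

```python
# Hypothetical sketch: use a skin-tone mask from an aligned color image to pick
# the hand pixel in the depth image (here, the skin pixel closest to the camera).
import numpy as np
import cv2

def locate_hand(depth_m: np.ndarray, bgr: np.ndarray):
    """depth_m: HxW depth in meters; bgr: HxWx3 color image aligned to it."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255)) > 0    # rough skin mask
    valid = skin & (depth_m > 0)
    if not valid.any():
        return None
    ys, xs = np.nonzero(valid)
    nearest = int(np.argmin(depth_m[ys, xs]))                   # skin pixel nearest camera
    return int(xs[nearest]), int(ys[nearest]), float(depth_m[ys[nearest], xs[nearest]])
```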
  • one or more components of the command processing system 200 may be embodied by a remote server communicating with the command processing system 200 installed in the vehicle 100 .
  • the speech recognition module 256 is embodied in a remote server that communicates wirelessly with the command processing system 200 installed in the vehicle 100 .
  • the command processing system 200 is used in a transport apparatus other than a vehicle.
  • the command processing system 200 can be used, for example, in airplanes or motorcycles.

Abstract

A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.

Description

    FIELD OF THE INVENTION
  • The present invention is related to recognizing voice commands using pose or gesture information to increase the accuracy of speech recognition.
  • BACKGROUND OF THE INVENTION
  • A driver or a passenger of a vehicle typically operates various devices in the vehicle using switches, screens, keypads or other input mechanisms using fingers or hands. Such input mechanisms may be used to operate, for example, a navigation system, an entertainment system, a climate system or a phone system. Sometimes, a complicated series of operations must be performed on the input mechanism to issue a desired command to the devices. However, it is preferable for the driver to keep both hands on a steering wheel and operate these input devices by a hand intermittently for only a brief period of time. Depending on the complexity of the operations, it may take multiple attempts to operate the input devices before the driver can perform operations as desired.
  • Hence, it is advantageous to use a mode of operation that makes less use of a driver's hands. One mode of such operation is speech recognition. Speech recognition is the process of converting an acoustic signal to speech elements (e.g., phonemes, words and sentences). Speech recognition has found application in various areas ranging from telephony to vehicle operation. In a speech recognition system, the audio signal is collected by input devices (e.g., a microphone), converted to a digital signal, and then processed using one or more algorithms to output speech elements contained in the audio signal. Depending on the field of application, the recognized speech elements can be the final results of speech recognition or intermediate information used for further processing.
  • One of the issues in using voice recognition in vehicles is that similar or the same verbal commands may be used for different devices. Sharing of similar or the same verbal commands causes ambiguity in verbal commands. For example, a command such as “locate XYZ” may indicate the locating of a particular point-of-interest (POI) in the context of navigation whereas the same command may also indicate identification of a sound track in an entertainment system. If the context of the user's command is not properly identified, operations other than those intended by the user may be carried out by the devices in the vehicle.
  • Unintended operations and time spent in subsequent remedial actions due to ambiguous verbal commands may deteriorate user experience and cause the user to revert to manual operations.
  • SUMMARY OF THE INVENTION
  • Embodiments of the present invention provide a system or a method of recognizing verbal commands based on the pose or gesture of a user. One or more devices among a plurality of devices that are likely to be targeted by the user for an operation are selected based on gesture information representing the pose or gesture of the user. A plurality of verbal commands associated with the one or more devices targeted by the user are selected based on the received gesture information. An audio signal is processed using the selected plurality of verbal commands to determine a device command for operating the one or more devices.
  • In one embodiment of the present invention, a depth camera is used for capturing at least one depth image. Each of the depth images covers at least a part of the user and comprises pixels representing distances from the depth camera to the at least part of the user. The at least one depth image is processed to determine the pose or gesture of the user. The gesture information is generated based on the recognized pose or gesture.
  • In one embodiment, the at least part of the user comprises a hand or a forearm of the user.
  • In one embodiment, the depth camera is installed in an overhead console in a vehicle with a field of view covering the user.
  • In one embodiment, the plurality of devices comprise at least a navigation system and an entertainment system in a vehicle.
  • In one embodiment, the gesture information indicates whether a hand or forearm of a user is located within a distance from the depth camera or beyond the distance from the depth camera. A first set of verbal commands is selected responsive to the gesture information indicating that the hand or the forearm is located within the distance. A second set of verbal commands is selected responsive to the gesture information indicating that the hand or the forearm is located beyond the distance.
  • In one embodiment, the first set of verbal commands is associated with performing navigation operations in a vehicle. The second set of verbal commands is associated with operating an entertainment system, a climate control system or a diagnostic system.
  • The features and advantages described in the specification are not all inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings.
  • FIG. 1A is a side view of a vehicle equipped with a command processing system, according to one embodiment.
  • FIG. 1B is a top view of the vehicle of FIG. 1A, according to one embodiment.
  • FIG. 2 is a block diagram of a command processing system, according to one embodiment.
  • FIG. 3 is a block diagram of a speech recognition module, according to one embodiment.
  • FIG. 4 is a conceptual diagram illustrating a search region for a point-of-interest, according to one embodiment.
  • FIG. 5 is a flowchart for a method of performing speech recognition based on depth images captured by a camera, according to one embodiment.
  • DETAILED DESCRIPTION OF THE DISCLOSURE
  • A preferred embodiment is now described with reference to the figures where like reference numbers indicate identical or functionally similar elements.
  • Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
  • Some portions of the detailed description that follows are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps (instructions) leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic or optical signals capable of being stored, transferred, combined, compared and otherwise manipulated. It is convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. Furthermore, it is also convenient at times, to refer to certain arrangements of steps requiring physical manipulations of physical quantities as modules or code devices, without loss of generality.
  • However, all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
  • Certain aspects of the embodiments include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
  • Embodiments also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
  • The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings as described herein, and any references below to specific languages are provided for disclosure of enablement and best mode.
  • In addition, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope, which is set forth in the following claims.
  • Embodiments relate to selecting or pruning applicable verbal commands associated with speech recognition based on a user's motion or gesture detected from a depth camera. Depending on the depth of the user's hand or forearm relative to the depth camera, the context of the verbal command is determined and one or more command dictionaries corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected command dictionaries. By using command dictionaries depending on the context, the accuracy of the speech recognition is increased.
  • As used herein, the term “user” includes a driver of a vehicle as well as a passenger. The user may be anyone attempting to control one or more devices in the vehicle.
  • As used herein, a “pose” refers to the configuration of body parts of a user. The pose may, for example, indicate relationships of a hand and a forearm of the user relative to other body parts or a reference point (e.g., a camera).
  • As used herein, a “gesture” refers to a series of configurations of body parts of a user that changes with progress of time. The gesture, for example, may include a series of arm and hand movements pointing in a direction.
  • As used herein, a “device command” refers to an instruction for operating or controlling a device. The device command may be received and interpreted by the device to perform a certain operation or a set of operations.
  • As used herein, a “navigation operation” refers to an operation by a user for using a computing device (e.g., an onboard telematics device) to identify, locate, choose or obtain information for driving to a destination. For example, the navigation operation may include providing user input to select an address or point of interest, and choosing an address or point of interest displayed as a result of providing the user input.
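  • For concreteness, the gesture information exchanged between the modules described below might be carried in a small record such as the following sketch; the field names are illustrative and are not taken from the patent.

```python
# One possible (hypothetical) shape for the gesture information passed from the
# gesture recognition module to the speech recognition and command format modules.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GestureInfo:
    hand_depth_m: float                   # distance from the depth camera to the hand
    near_camera: bool                     # True if hand_depth_m is below the threshold
    console_zone: Optional[str] = None    # e.g. "entertainment" or "climate", if known
    pointing_angle_deg: Optional[float] = None  # finger direction vs. vehicle heading
```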
  • Overview of Vehicle Equipped with Verbal Command System
  • FIGS. 1A and 1B illustrate a vehicle 100 equipped with a command processing system, according to one embodiment. The command processing system may include, among other components, a central processing unit 120 and an overhead console unit 110. The command processing system may be connected to other components (e.g., a navigation system and an entertainment system) of the vehicle 100 to perform various operations. The command processing system recognizes verbal commands based on a user's motion or gesture, as described below in detail with reference to FIGS. 3 and 4.
  • The central processing unit 120 processes an audio signal to detect a user's verbal commands included in the audio signal. The central processing unit 120 is connected to other components such as a cabin system (e.g., a navigation system, entertainment system, climate control system and diagnostic system). The central processing unit 120 controls these devices based on verbal commands received from the user. The central processing unit 120 may be a stand-alone device or may be a part of a larger system (e.g., telematics system). The central processing unit 120 is described below in detail with reference to FIG. 2.
  • The central processing unit 120 may be placed at any location within the vehicle 100. As illustrated in FIGS. 1A and 1B, the central processing unit 120 may be located at the center console of the vehicle 100. Alternatively, the central processing unit 120 may be installed within the dashboard of the vehicle 100. Further, the central processing unit 120 may also be installed on the ceiling of the vehicle.
  • The overhead console unit 110 is located at the ceiling of the vehicle interior and includes sensors (e.g., microphone and camera) to capture depth images of the user and detect audio signals, as described below in detail with reference to FIG. 2. The overhead console unit 110 may include various other components such as a garage opener. The sensors of the overhead console unit 110 communicate with the central processing unit 120 to provide signals for detecting the user's verbal command.
  • The communication between the sensors of the overhead console unit 110 and the central processing unit 120 can be established by any wired or wireless communication medium currently used or to be developed in the future.
  • Example Command Processing System
  • FIG. 2 is a block diagram illustrating the command processing system 200, according to one embodiment. The command processing system 200 may include, among other components, a processor 210, an output interface 214, an input interface 218, memory 240 and a bus connecting these components. The command processing system 200 may also include a depth camera 222 and a microphone 260. The depth camera 222 and the microphone 260 are connected to the input interface 218 via channels 220, 262. Although not illustrated in FIG. 2, the command processing system 200 may include more than one depth camera or microphone.
  • The processor 210 executes instructions stored in the memory 240 and processes the sensor data received via the input interface 218. Although only a single processor 210 is illustrated in FIG. 2, more than one processor may be used to increase the processing capacity of the command processing system 200.
  • The output interface 214 is hardware, software, firmware or a combination thereof for sending data including device commands to other devices such as a navigation system, an entertainment system, a climate control system and a diagnostic system via communication channels. To send the data, the output interface 214 may format and regulate signals to comply with predetermined communication protocols.
  • The input interface 218 is hardware, software, firmware or a combination thereof for receiving the sensor signals from the overhead console unit 110. The sensor signals include the depth images received via channel 220, and the audio signals received via channel 262. The input interface 218 may buffer the received sensor signals and perform pre-processing on the sensor signals before forwarding the sensor signals to the processor 210 or the memory 240 via bus 268.
  • The depth camera 222 captures the depth images of the driver and sends the depth images to the input interface 218 via the channel 220. The depth camera 222 may be embodied as a time-of-flight (TOF) camera, a stereovision camera or other types of cameras that generate depth images including information on distance to different points of objects within its field of view. The stereovision camera uses two lenses to capture images from different locations. The captured images are then processed to generate the depth images. In one embodiment, the depth camera 222 generates grayscale images with each pixel indicating the distance from the depth camera 222 to a point of an object (e.g., the driver) corresponding to the pixel.
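  • For the stereovision case, the depth value of a pixel can be recovered from the disparity between the two lens views; the following sketch shows the standard relation with illustrative camera parameters, not values from the patent.

```python
# Rough sketch of stereo depth: depth = focal_length * baseline / disparity.
def disparity_to_depth_m(disparity_px: float,
                         focal_length_px: float = 600.0,    # illustrative value
                         baseline_m: float = 0.06) -> float:  # illustrative value
    """Convert the pixel disparity between the two lens views into metric depth."""
    if disparity_px <= 0:
        raise ValueError("no valid match for this pixel")
    return focal_length_px * baseline_m / disparity_px
```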
  • Referring to FIG. 1A, the depth camera 222 is installed on the overhead console unit 110 and has a field of view 116 overlooking the driver of the vehicle 100. By installing the depth camera 222 on the overhead console unit 110, the depth camera 222 advantageously has an unobstructed view of the driver and the center console of the vehicle 100. Further, the depth of the driver's hand or arm relative to the depth camera 222 provides indication of the operations intended by the driver, as described below in detail with reference to the gesture recognition module 252.
  • The microphone 260 senses acoustic waves and converts the acoustic waves into analog electric signals. The microphone 260 includes an analog-to-digital (A/D) converter for converting the analog electric signals into digital signals. The converted digital signals are sent to the input interface 218 via the channel 262. Alternatively, the A/D converter may be included in the input interface 218. In this case, the microphone 260 sends analog electric signals to the input interface 218 via the channel 262 for conversion to digital signals and further processing.
  • The memory 240 stores instructions to be executed by the processor 210 and other data associated with the instructions. The memory 240 may be volatile memory, non-volatile memory or a combination thereof. The memory 240 may store, among other software modules, a command format module 244, a gesture recognition module 252 and a speech recognition module 256. The memory 240 may include other software modules such as an operating system, the description of which is omitted herein for the sake of brevity.
  • The gesture recognition module 252 detects the driver's gestures or motions based on the depth images captured by the depth camera 222. In one embodiment, the gesture recognition module 252 detects the location and/or motions of the hand or forearm to determine the context of verbal commands. In one embodiment, the gesture recognition module 252 determines the location of the driver's hand or forearm relative to the depth camera 222. If the driver's hand or forearm is closer to the depth camera 222 (i.e., the distance from the depth camera 222 to the hand or forearm is below a threshold), for example, the driver is likely to be taking actions or making gestures associated with navigation operations (e.g., pointing a finger towards a direction outside the window). To the contrary, if the driver's hand or forearm is away from the depth camera 222 (i.e., the distance from the depth camera 222 to the hand or the forearm is at or above the threshold), the driver is likely to be taking actions or making gestures associated with other control functions typically provided in the center console (e.g., operate an entertainment system and climate control system).
  • The gesture recognition module 252 may employ a computing algorithm that clusters groups of pixels in the depth images and tracks the locations of these groups with progress of time to determine the driver's motions or gesture. The pixels may be clustered into groups based on the proximity of the two-dimensional distance of pixels and the depth difference of the pixels. The gesture recognition module 252 may also store a model of the human body and map the groups of pixels to the stored model to accurately detect and track the locations of the hand and/or forearm.
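  • A minimal sketch of such clustering is shown below: neighboring pixels are grouped when they are adjacent in the image and close in depth. The threshold and structure are hypothetical rather than the patent's algorithm; the resulting cluster centroids could then be tracked frame to frame and mapped to a body model as described above.

```python
# Illustrative region-growing over a depth image: adjacent pixels whose depths
# differ by less than max_depth_gap are assigned to the same cluster label.
from collections import deque
import numpy as np

def cluster_depth_pixels(depth_m: np.ndarray, max_depth_gap: float = 0.05):
    """Return an HxW label image; pixels with depth <= 0 are treated as invalid."""
    h, w = depth_m.shape
    labels = np.zeros((h, w), dtype=np.int32)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if depth_m[sy, sx] <= 0 or labels[sy, sx]:
                continue
            next_label += 1
            labels[sy, sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny, nx]
                            and depth_m[ny, nx] > 0
                            and abs(depth_m[ny, nx] - depth_m[y, x]) < max_depth_gap):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
    return labels
```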
  • In one embodiment, the gesture recognition module 252 may further detect the location of the driver's hand with a higher resolution to determine the device associated with the driver's operation. If the center console of the vehicle has switches or knobs for operating the entertainment system at the middle of the center console and switches for a climate control system at both sides, the location of the driver's hand around the middle of the center console indicates that the driver is engaged in operations of the entertainment system. If the driver's hand is closer to the sides of the center console than the middle portion of the center console, the driver is more likely to be engaged in operations of the climate control system. Hence, the command processing system 200 may use the gesture information on the specific location of the hand at the time verbal commands are issued by the driver to determine a device associated with the verbal commands.
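  • The two-stage decision described in the preceding paragraphs might look like the following sketch: a depth threshold separates navigation gestures from center-console operation, and a finer hand position distinguishes devices on the console. All threshold values are invented for illustration.

```python
# Hypothetical context selection from the hand location reported by the
# gesture recognition module. Thresholds and zone boundaries are illustrative.

NAV_DEPTH_THRESHOLD_M = 0.5        # hand closer than this to the overhead camera

def verbal_command_context(hand_depth_m: float, console_x_norm: float) -> str:
    """console_x_norm: hand position across the center console, 0.0 = left edge,
    0.5 = middle, 1.0 = right edge (only meaningful when the hand is lowered)."""
    if hand_depth_m < NAV_DEPTH_THRESHOLD_M:
        return "navigation"                  # raised hand, e.g. pointing outside
    if 0.35 <= console_x_norm <= 0.65:
        return "entertainment"               # knobs at the middle of the console
    return "climate_control"                 # switches at the sides of the console
```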
  • The speech recognition module 256 determines the verbal command issued by the driver. To determine the verbal command, the speech recognition module 256 receives gesture information about the driver's gesture from the gesture recognition module 252, as described below in detail with reference to FIG. 3.
  • The command format module 244 translates the verbal commands detected at the speech recognition module 256 into device commands for operating devices installed in the vehicle 100. Each device installed in the vehicle 100 may require commands to be provided in a different format. Hence, the command format module 244 translates the commands into a format that can be processed by each device. Further, the command format module 244 may request further information from the driver if the issued verbal command is unclear, ambiguous or deficient. Such a request for further information may be made via a speaker. The command format module 244 may also combine information from the gesture recognition module 252 to generate a device command, as described below in detail with reference to FIG. 4.
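  • A translation layer of this kind can be sketched as a small registry of per-device formatters. The device names, message fields, and the clarification mechanism below are assumptions for illustration only, not formats defined by any vehicle device.

```python
class CommandFormatError(Exception):
    """Raised when a verbal command is too ambiguous or deficient to translate,
    so the system can request clarification from the driver via a speaker."""

def to_climate_command(params):
    # Hypothetical HVAC message format.
    if "temperature" not in params:
        raise CommandFormatError("What temperature would you like?")
    return {"bus": "hvac", "op": "set_temperature", "value": params["temperature"]}

def to_entertainment_command(params):
    # Hypothetical audio message format.
    if "station" not in params and "track" not in params:
        raise CommandFormatError("Which station or track?")
    return {"bus": "audio", "op": "play",
            "target": params.get("station") or params.get("track")}

FORMATTERS = {
    "climate": to_climate_command,
    "entertainment": to_entertainment_command,
}

def format_device_command(device, params):
    """Translate recognized command parameters into a device-specific message."""
    formatter = FORMATTERS.get(device)
    if formatter is None:
        raise CommandFormatError(f"No formatter registered for {device!r}")
    return formatter(params)
```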
  • The command format module 244, the gesture recognition module 252 and the speech recognition module 256 need not be stored in the same memory 240. For example, the gesture recognition module 252 may be stored in memory in an overhead console unit whereas the speech recognition module 256 and the command format module 244 may be stored in memory in a center console unit. Further, one or more of these modules may be embodied as a dedicated hardware component.
  • Example Architecture of Speech Recognition Module
  • FIG. 3 is a block diagram illustrating components of the speech recognition module 256, according to one embodiment. The speech recognition module 256 may include, among other components, a gesture recognition interface 312, a command extraction module 316 and a command dictionary 320. The speech recognition module 256 may also include other modules such as a history management module that retains the list of verbal commands previously issued by a user.
  • The gesture recognition interface 312 enables the speech recognition module 256 to communicate with the gesture recognition module 252. In one embodiment, the gesture information received from the gesture recognition module 252 via the gesture recognition interface 312 indicates the location of the driver's hand or forearm.
  • The command dictionary 320 includes commands associated with various devices of the vehicle 100. The command dictionary 320 includes a plurality of dictionaries 320A through 320N, each associated with a device or system of the vehicle 100. For example, dictionary 320A stores commands associated with the operation of a navigation system, dictionary 320B stores commands associated with the operation of an entertainment system, and dictionary 320C stores commands associated with a climate control system.
  • The command extraction module 316 extracts the verbal commands included in the audio signal based on the gesture information and the commands stored in selected command dictionaries 320. After the gesture information is received, the command extraction module 316 selects one or more dictionaries based on the location of the user's hand or forearm as indicated by the gesture information. If the gesture information indicates that the user's hand or forearm is in a certain pose, dictionaries associated with devices in the vicinity of the driver's hand or forearm are selected for command extraction. For example, if the user's hand is within a certain distance from an entertainment system, a dictionary (e.g., dictionary 320B) associated with the entertainment system is selected for command extraction.
  • Conversely, if the driver's hand or forearm is away from these devices and is raised above a certain level (i.e., raised above the dashboard) at the time the verbal commands are issued, the command extraction module 316 determines that the verbal commands are associated with the navigation system. Hence, the command extraction module 316 selects and uses a dictionary (e.g., dictionary 320A) associated with the navigation operation to perform speech recognition.
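  • Dictionary selection from the gesture information can be sketched as below. The region names, distance threshold, and example phrases are assumptions for illustration; in practice each dictionary would correspond to one of the dictionaries 320A through 320N.

```python
# Illustrative per-device command dictionaries (stand-ins for 320A-320N).
DICTIONARIES = {
    "navigation": ["navigate to", "identify that building", "set destination"],
    "entertainment": ["play", "next track", "tune to"],
    "climate": ["set temperature", "increase fan speed", "defrost"],
}

def select_dictionaries(gesture_info):
    """Pick candidate dictionaries from the hand/forearm pose at command time."""
    if gesture_info.get("hand_raised_above_dashboard"):
        # Raised hand or forearm: treat the utterance as a navigation command.
        return [DICTIONARIES["navigation"]]
    selected = []
    for device, distance in gesture_info.get("distance_to_device", {}).items():
        if device in DICTIONARIES and distance < 0.15:  # within ~15 cm (assumed)
            selected.append(DICTIONARIES[device])
    # If the gesture is inconclusive, fall back to all dictionaries.
    return selected or list(DICTIONARIES.values())
```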
  • In one embodiment, the verbal command recognized by the command extraction module 316 is combined with gesture information to generate navigation commands at the command format module 244. The gesture information may indicate, for example, the orientation of the driver's finger, as described below in detail with reference to FIG. 4.
  • The command extraction module 316 may use more than one dictionary to extract the verbal commands. If the hand of the user is located around the center console, dictionaries associated with any devices (e.g., the entertainment system or the climate control system) that can be operated at the center console may be selected.
  • In one embodiment, the command extraction module 316 assigns probability weights to commands based on the location of the user's hand or forearm. The command extraction module 316 uses a statistical model that computes probabilities of spoken verbal commands based on phonemes appearing in a sequence. The statistical model may include parameters that take into account the location of the hand or forearm in determining the most likely command intended by the driver.
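  • As a rough sketch of such weighting, acoustic scores from the recognizer can be combined with a gesture-conditioned prior over commands. The additive log-domain combination and the fallback prior below are assumptions for the example, not the statistical model used by the disclosure.

```python
import math

def rescore_candidates(acoustic_scores, gesture_prior):
    """Rank candidate commands using both acoustics and hand location.

    acoustic_scores: {command: log-likelihood} from the speech recognizer.
    gesture_prior:   {command: probability} derived from the hand/forearm
                     location (e.g., higher for climate commands when the
                     hand is near the climate controls).
    Returns the candidate commands sorted best-first.
    """
    rescored = {}
    for command, log_acoustic in acoustic_scores.items():
        prior = max(gesture_prior.get(command, 1e-3), 1e-6)
        rescored[command] = log_acoustic + math.log(prior)
    return sorted(rescored, key=rescored.get, reverse=True)
```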
  • The speech recognition module 256 of FIG. 3 is merely illustrative. Various modifications can be made to the speech recognition module 256. For example, instead of having multiple dictionaries, the command dictionary 320 may map each of a plurality of commands to one or more devices.
  • Example Detecting Point-of-Interest Using Gesture and Verbal Command
  • By using a combination of a hand gesture and a voice command, a user can conveniently identify a point-of-interest or destination. While pointing to a point-of-interest or destination, the user can utter a command requesting the navigation system to identify and/or set the point-of-interest. The command format module 244 may combine the commands recognized from speech and parameters extracted from the gesture information to generate a navigation command.
  • FIG. 4 is a conceptual diagram illustrating a search region for a point-of-interest, according to one embodiment. In FIG. 4, the driver wishes to navigate to building 410 or wants to identify the name or address of building 410. The driver states a verbal command “identify that building” while pointing his finger towards the building 410. Due to various inaccuracies, the gesture recognition module 252 may detect that the driver is pointing his finger in direction O-A (shown in a dashed line).
  • In response, the speech recognition module 256 receives gesture information from the gesture recognition module 252 indicating that the user's hand and forearm are raised. The speech recognition module 256 determines that the verbal command is associated with the navigation system (since the hand and forearm are raised) and uses a dictionary associated with the navigation system to recognize the verbal command. The speech recognition module 256 sends the identified verbal command to the command format module 244.
  • The command format module 244 receives the verbal command, analyzes the verbal command and determines that the phrase “that building” needs further clarification. The command format module 244 analyzes the gesture information and uses a parameter in the gesture information indicating the orientation (indicated by line O-A) of the user's finger to generate a device command requesting the navigation system to identify any points-of-interest in the direction of line O-A. For example, the parameter may be angle θ relative to the front direction of the vehicle 100.
  • The navigation system receives the device command, and establishes a search cone represented by O-B-C-O. The search cone has a height of R indicating the search radius (e.g., 10 miles) and has a cone angle of 2α. The cone angle 2α may be increased to expand the search or to allow increased tolerance for errors. The navigation system performs the search within the search region identified by the search cone, taking into account vehicle speed and the direction of the vehicle movement. In one embodiment, priority is given to the points of interest that are closer to the vehicle. In one embodiment, the navigation system presents a list of points-of-interest found within the search region to the user. The user may then indicate the point-of-interest from the searched list and request further actions (e.g., navigate to the point-of-interest or make a phone call to the point-of-interest).
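  • The geometric test implied by FIG. 4 can be sketched in two dimensions as a sector check: a point-of-interest is a candidate if it lies within distance R of the vehicle and within α of the pointing direction O-A. The angle convention (degrees, measured counterclockwise from the x-axis) and the numeric defaults below are assumptions for the example. Candidates passing this test could then be ranked by distance from the vehicle, reflecting the priority given to closer points of interest.

```python
import math

def in_search_cone(vehicle_xy, heading_deg, theta_deg, poi_xy,
                   radius_m=16000.0, half_angle_deg=15.0):
    """Return True if a point-of-interest lies inside the search region.

    heading_deg is the vehicle's travel direction and theta_deg the pointing
    direction of the finger relative to the front of the vehicle; radius_m
    and half_angle_deg play the roles of R and alpha in FIG. 4.
    """
    dx = poi_xy[0] - vehicle_xy[0]
    dy = poi_xy[1] - vehicle_xy[1]
    if math.hypot(dx, dy) > radius_m:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    pointing = heading_deg + theta_deg
    # Smallest signed angular difference between the POI bearing and O-A.
    diff = (bearing - pointing + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_angle_deg
```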
  • Example Method of Recognizing Verbal Commands Based on Gesture Data
  • FIG. 5 is a flowchart illustrating a method of recognizing verbal commands based on a driver's motions or gesture, according to one embodiment. The command processing system 200 generates 506 depth images using the depth camera 222. Using the generated depth images, the command processing system 200 generates 510 gesture information of the user. The gesture information may indicate, among other things, the location of hands or forearms of the user relative to the depth camera 222.
  • Based on the gesture information, the command processing system 200 selects 514 one or more dictionaries for recognizing verbal commands. Each dictionary may include commands for a certain system or device in the vehicle 100.
  • The command processing system 200 also generates 516 a digital audio signal representing the driver's utterance based on an acoustic signal received at the microphone 260.
  • After the applicable dictionary or dictionaries are selected, the command processing system 200 performs 518 speech recognition on the generated audio signal using one or more selected dictionaries. By limiting or pruning applicable verbal commands based on the gesture information indicating the user's pose or gesture at the time the verbal commands are spoken, the accuracy of the speech recognition can be increased.
  • After a verbal command is generated, the command processing system 200 generates 522 a device command corresponding to the verbal command by translating the verbal command into the device command. If needed, the command processing system 200 may add, modify or request information for generating the device command.
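  • The overall flow of FIG. 5 can be summarized as a short orchestration sketch; all of the module interfaces shown here are assumed for illustration and do not correspond to any specific implementation.

```python
def process_utterance(depth_camera, microphone, gesture_module,
                      speech_module, command_format_module, target_device):
    """Recognize one verbal command using gesture-based context (FIG. 5 flow)."""
    depth_images = depth_camera.capture()                             # step 506
    gesture_info = gesture_module.recognize(depth_images)             # step 510
    dictionaries = speech_module.select_dictionaries(gesture_info)    # step 514
    audio = microphone.record()                                       # step 516
    verbal_command = speech_module.recognize(audio, dictionaries)     # step 518
    device_command = command_format_module.translate(verbal_command,
                                                     gesture_info)    # step 522
    target_device.send(device_command)
    return device_command
```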
  • The processes and their sequence as illustrated in FIG. 5 are merely illustrative. Various modifications can be made to the processes and/or the sequence. For example, speech recognition may be performed 518 to generate a set of candidate verbal commands. Subsequently, the final verbal command may be selected from the set of candidate verbal commands based on the determination 510 of the driver's gesture. Further, one or more processes may be performed in parallel. For example, generating 506 the depth images at the depth camera 222 may be performed in parallel with generating 516 the audio signal.
  • ALTERNATIVE EMBODIMENTS
  • In one or more embodiments, one or more cameras are used to increase the accuracy of gesture detection. The cameras may also capture color images. The color images may be used to detect skin tones representing the driver's hands. By correlating the color images with the depth images, the location of the hand or forearm can be detected more accurately. Further, two or more cameras may be located at different locations on the ceiling or elsewhere in the vehicle 100 to complement or supplant the depth images captured at one depth camera.
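  • As a rough sketch of correlating the two modalities, a depth-derived hand mask can be intersected with a simple skin-tone test on a registered color image. The RGB thresholds below are a commonly used heuristic, and both the thresholds and the assumption of pixel-aligned images are illustrative.

```python
import numpy as np

def skin_refined_hand_mask(depth_mask, rgb):
    """Keep only depth-mask pixels whose color resembles skin.

    depth_mask: boolean array from the depth-based hand/forearm detection.
    rgb:        uint8 color image registered (pixel-aligned) to the depth image.
    """
    r = rgb[..., 0].astype(np.int16)
    g = rgb[..., 1].astype(np.int16)
    b = rgb[..., 2].astype(np.int16)
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & ((r - g) > 15)
    return depth_mask & skin
```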
  • In one or more embodiments, one or more components of the command processing system 200 may be embodied by a remote server communicating with the command processing system 200 installed in the vehicle 100. For example, the speech recognition module 256 is embodied in a remote server that communicates wirelessly with the command processing system 200 installed in the vehicle 100.
  • In one or more embodiments, the command processing system 200 is used in a transport apparatus other than a vehicle. The command processing system 200 can be used, for example, in airplanes or motorcycles.
  • Although several embodiments are described above, various modifications can be made within the scope of the present disclosure. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

Claims (21)

What is claimed is:
1. A computer-implemented method of recognizing verbal commands, comprising:
capturing at least one depth image by a depth camera positioned in a vehicle, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user;
recognizing a pose or gesture of the user based on the captured depth image; and
generating the gesture information based on the recognized pose or gesture;
determining one or more devices among a plurality of devices that are likely to be targeted by the user for an operation based on the gesture information;
selecting a plurality of verbal commands associated with the one or more devices determined as being targeted;
receiving an audio signal including utterance by the user at a time when the user is taking the pose or the gesture; and
determining a device command for operating the one or more devices by performing speech recognition on the audio signal using the selected plurality of verbal commands.
2. The method of claim 1, wherein the at least part of the user comprises a hand or a forearm of the user.
3. The method of claim 1, wherein the depth camera is installed in an overhead console in the vehicle, the depth camera overlooking the user.
4. The method of claim 1, wherein the plurality of devices comprise at least a navigation system and an entertainment system in the vehicle.
5. The method of claim 1, wherein the gesture information indicates whether a hand or forearm of the user is located within a distance from the depth camera or beyond the distance from the depth camera, and wherein a first set of verbal commands is selected responsive to the gesture information indicating that the hand or the forearm is located within the distance, and wherein a second set of verbal commands are selected responsive to the gesture information indicating that the hand or the forearm is located beyond the distance.
6. The method of claim 5, wherein the first set of verbal commands is associated with performing navigation operations in the vehicle.
7. The method of claim 6, wherein the first set of verbal commands comprises a command for identifying or setting a point-of-interest for the navigation operations.
8. The method of claim 6, wherein the second set of verbal commands is associated with operating an entertainment system, a climate control system or a diagnostic system.
9. A command processing system for recognizing verbal commands, comprising:
a depth camera positioned in a vehicle and configured to capture at least one depth image by a depth camera, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user; and
a gesture recognition module coupled to the depth camera, the gesture recognition module configured to recognize the pose or gesture of the user based on the captured depth image and generate the gesture information based on the recognized pose or gesture;
a gesture recognition interface configured to generate the gesture information based on the recognized pose or gesture; and
a command extraction module configured to:
determine one or more devices among a plurality of devices that are likely to be targeted by the user for an operation based on the received gesture information;
select a plurality of verbal commands associated with the one or more devices determined as being targeted;
receive an audio signal including utterance by the user while the user is taking the pose or the gesture; and
determine a device command for operating the one or more devices by performing speech recognition on the audio signal using the selected plurality of verbal commands.
10. The command processing system of claim 9, wherein the at least part of the user comprises a hand or a forearm of the user.
11. The command processing system of claim 9, wherein the depth camera is installed in an overhead console in the vehicle overlooking the user.
12. The command processing system of claim 11, wherein the depth camera comprises a stereovision camera feeding captured images for processing into the at least one depth image.
13. The command processing system of claim 9, wherein the plurality of devices comprise at least a navigation system and an entertainment system in the vehicle.
14. The command processing system of claim 9, wherein the gesture information indicates whether a hand or forearm of the user is located within a distance from the depth camera or beyond the distance from the depth camera, and wherein the command extraction module selects a first set of verbal commands responsive to the gesture information indicating that the hand or the forearm is located within the distance and selects a second set of verbal commands responsive to the gesture information indicating that the hand or the forearm is located beyond the distance.
15. The command processing system of claim 14, wherein the first set of verbal commands is associated with performing navigation operations in the vehicle.
16. The command processing system of claim 14, wherein the first set of verbal commands comprise a command for identifying or setting a point-of-interest for the navigation operations.
17. The command processing system of claim 16, wherein the second set of verbal commands is associated with operating an entertainment system, a climate control system or a diagnostic system.
18. A non-transitory computer readable storage medium for recognizing verbal commands, the computer readable storage medium structured to store instructions, when executed, cause a processor to:
capture at least one depth image by a depth camera positioned in a vehicle, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user;
recognize a pose or gesture of the user based on the captured depth image;
generate the gesture information based on the recognized pose or gesture;
determine one or more devices among a plurality of devices that are likely to be targeted by the user for an operation based on the received gesture information;
select a plurality of verbal commands associated with the one or more devices determined as being targeted;
receive an audio signal including utterance by the user while the user is taking the pose or the gesture; and
determine a device command for operating the one or more devices by performing speech recognition on the audio signal using the selected plurality of verbal commands.
19. The computer readable storage medium of claim 18, wherein the at least part of the user comprises a hand or a forearm of the user.
20. The computer readable storage medium of claim 18, wherein the depth camera is installed in an overhead console in the vehicle overlooking the user.
21. The computer readable storage medium of claim 18, wherein the plurality of devices comprise at least a navigation system and an entertainment system in the vehicle.
US13/524,351 2012-06-15 2012-06-15 Depth based context identification Active 2033-01-18 US9092394B2 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
US13/524,351 US9092394B2 (en) 2012-06-15 2012-06-15 Depth based context identification
EP13804195.9A EP2862125B1 (en) 2012-06-15 2013-04-15 Depth based context identification
PCT/US2013/036654 WO2013188002A1 (en) 2012-06-15 2013-04-15 Depth based context identification
CN201380030981.8A CN104620257B (en) 2012-06-15 2013-04-15 Linguistic context identification based on depth
KR1020157001026A KR102061925B1 (en) 2012-06-15 2013-04-15 Depth based context identification
JP2015517255A JP6010692B2 (en) 2012-06-15 2013-04-15 Speech command recognition method and speech command recognition processing system
IL236089A IL236089A (en) 2012-06-15 2014-12-04 Depth based context identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/524,351 US9092394B2 (en) 2012-06-15 2012-06-15 Depth based context identification

Publications (2)

Publication Number Publication Date
US20130339027A1 true US20130339027A1 (en) 2013-12-19
US9092394B2 US9092394B2 (en) 2015-07-28

Family

ID=49756700

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/524,351 Active 2033-01-18 US9092394B2 (en) 2012-06-15 2012-06-15 Depth based context identification

Country Status (7)

Country Link
US (1) US9092394B2 (en)
EP (1) EP2862125B1 (en)
JP (1) JP6010692B2 (en)
KR (1) KR102061925B1 (en)
CN (1) CN104620257B (en)
IL (1) IL236089A (en)
WO (1) WO2013188002A1 (en)

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140229174A1 (en) * 2011-12-29 2014-08-14 Intel Corporation Direct grammar access
US20140309878A1 (en) * 2013-04-15 2014-10-16 Flextronics Ap, Llc Providing gesture control of associated vehicle functions across vehicle zones
US20140379346A1 (en) * 2013-06-21 2014-12-25 Google Inc. Video analysis based language model adaptation
US20150029094A1 (en) * 2012-10-22 2015-01-29 Sony Corporation User interface with location mapping
US20150058003A1 (en) * 2013-08-23 2015-02-26 Honeywell International Inc. Speech recognition system
WO2015153835A1 (en) * 2014-04-03 2015-10-08 Honda Motor Co., Ltd Systems and methods for the detection of implicit gestures
US9342797B2 (en) 2014-04-03 2016-05-17 Honda Motor Co., Ltd. Systems and methods for the detection of implicit gestures
US20160140955A1 (en) * 2014-11-13 2016-05-19 International Business Machines Corporation Speech recognition candidate selection based on non-acoustic input
US20160179462A1 (en) * 2014-12-22 2016-06-23 Intel Corporation Connected device voice command support
CN105741312A (en) * 2014-12-09 2016-07-06 株式会社理光 Target object tracking method and device
CN106030697A (en) * 2014-02-26 2016-10-12 三菱电机株式会社 In-vehicle control apparatus and in-vehicle control method
CN106537489A (en) * 2014-07-22 2017-03-22 三菱电机株式会社 Method and system for recognizing speech including sequence of words
GB2547980A (en) * 2016-01-08 2017-09-06 Ford Global Tech Llc System and method for feature activation via gesture recognition and voice command
US9881610B2 (en) 2014-11-13 2018-01-30 International Business Machines Corporation Speech recognition system adaptation based on non-acoustic attributes and face selection based on mouth motion using pixel intensities
US9928734B2 (en) 2016-08-02 2018-03-27 Nio Usa, Inc. Vehicle-to-pedestrian communication systems
US9946906B2 (en) 2016-07-07 2018-04-17 Nio Usa, Inc. Vehicle with a soft-touch antenna for communicating sensitive information
US9963106B1 (en) 2016-11-07 2018-05-08 Nio Usa, Inc. Method and system for authentication in autonomous vehicles
GB2556191A (en) * 2016-09-28 2018-05-23 Lenovo Singapore Pte Ltd Gesture detection
US9984572B1 (en) 2017-01-16 2018-05-29 Nio Usa, Inc. Method and system for sharing parking space availability among autonomous vehicles
US10008201B2 (en) * 2015-09-28 2018-06-26 GM Global Technology Operations LLC Streamlined navigational speech recognition
US10031521B1 (en) 2017-01-16 2018-07-24 Nio Usa, Inc. Method and system for using weather information in operation of autonomous vehicles
US10074223B2 (en) 2017-01-13 2018-09-11 Nio Usa, Inc. Secured vehicle for user use only
US10234302B2 (en) 2017-06-27 2019-03-19 Nio Usa, Inc. Adaptive route and motion planning based on learned external and internal vehicle environment
US10249104B2 (en) 2016-12-06 2019-04-02 Nio Usa, Inc. Lease observation and event recording
US10286915B2 (en) 2017-01-17 2019-05-14 Nio Usa, Inc. Machine learning for personalized driving
US20190228769A1 (en) * 2018-01-22 2019-07-25 Toyota Jidosha Kabushiki Kaisha Information processing device and information processing method
US10369974B2 (en) 2017-07-14 2019-08-06 Nio Usa, Inc. Control and coordination of driverless fuel replenishment for autonomous vehicles
US10369966B1 (en) 2018-05-23 2019-08-06 Nio Usa, Inc. Controlling access to a vehicle using wireless access devices
US10410250B2 (en) 2016-11-21 2019-09-10 Nio Usa, Inc. Vehicle autonomy level selection based on user context
US10410064B2 (en) 2016-11-11 2019-09-10 Nio Usa, Inc. System for tracking and identifying vehicles and pedestrians
US10409382B2 (en) 2014-04-03 2019-09-10 Honda Motor Co., Ltd. Smart tutorial for gesture control system
US10464530B2 (en) 2017-01-17 2019-11-05 Nio Usa, Inc. Voice biometric pre-purchase enrollment for autonomous vehicles
US10466657B2 (en) 2014-04-03 2019-11-05 Honda Motor Co., Ltd. Systems and methods for global adaptation of an implicit gesture control system
US10471829B2 (en) 2017-01-16 2019-11-12 Nio Usa, Inc. Self-destruct zone and autonomous vehicle navigation
GB2545526B (en) * 2015-12-17 2020-03-11 Jaguar Land Rover Ltd In vehicle system and method for providing information regarding points of interest
US10606274B2 (en) 2017-10-30 2020-03-31 Nio Usa, Inc. Visual place recognition based self-localization for autonomous vehicles
US10635109B2 (en) 2017-10-17 2020-04-28 Nio Usa, Inc. Vehicle path-planner monitor and controller
US10692126B2 (en) 2015-11-17 2020-06-23 Nio Usa, Inc. Network-based system for selling and servicing cars
US10694357B2 (en) 2016-11-11 2020-06-23 Nio Usa, Inc. Using vehicle sensor data to monitor pedestrian health
US10708547B2 (en) 2016-11-11 2020-07-07 Nio Usa, Inc. Using vehicle sensor data to monitor environmental and geologic conditions
US10710633B2 (en) 2017-07-14 2020-07-14 Nio Usa, Inc. Control of complex parking maneuvers and autonomous fuel replenishment of driverless vehicles
US10720154B2 (en) * 2014-12-25 2020-07-21 Sony Corporation Information processing device and method for determining whether a state of collected sound data is suitable for speech recognition
US10717412B2 (en) 2017-11-13 2020-07-21 Nio Usa, Inc. System and method for controlling a vehicle using secondary access methods
US10837790B2 (en) 2017-08-01 2020-11-17 Nio Usa, Inc. Productive and accident-free driving modes for a vehicle
US10897469B2 (en) 2017-02-02 2021-01-19 Nio Usa, Inc. System and method for firewalls between vehicle networks
US10935978B2 (en) 2017-10-30 2021-03-02 Nio Usa, Inc. Vehicle self-localization using particle filters and visual odometry
US11568239B2 (en) * 2019-08-13 2023-01-31 Lg Electronics Inc. Artificial intelligence server and method for providing information to user
DE102022103066A1 (en) 2022-02-09 2023-08-10 Cariad Se Method for providing a geographically located electronic reminder note in a motor vehicle
US11904888B2 (en) * 2021-11-12 2024-02-20 Ford Global Technologies, Llc Controlling vehicle functions

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102012013503B4 (en) * 2012-07-06 2014-10-09 Audi Ag Method and control system for operating a motor vehicle
US20140122086A1 (en) * 2012-10-26 2014-05-01 Microsoft Corporation Augmenting speech recognition with depth imaging
WO2015026834A1 (en) * 2013-08-19 2015-02-26 Nant Holdings Ip, Llc Camera-to-camera interactions, systems and methods
JP2015153324A (en) * 2014-02-18 2015-08-24 株式会社Nttドコモ Information search device, information search method, and information search program
JP2016218852A (en) * 2015-05-22 2016-12-22 ソニー株式会社 Information processor, information processing method, and program
CN105957521B (en) * 2016-02-29 2020-07-10 青岛克路德机器人有限公司 Voice and image composite interaction execution method and system for robot
WO2017179335A1 (en) * 2016-04-11 2017-10-19 ソニー株式会社 Information processing device, information processing method, and program
EP3451335B1 (en) * 2016-04-29 2023-06-14 Vtouch Co., Ltd. Optimum control method based on multi-mode command of operation-voice, and electronic device to which same is applied
CN106373568A (en) * 2016-08-30 2017-02-01 深圳市元征科技股份有限公司 Intelligent vehicle unit control method and device
WO2018061743A1 (en) * 2016-09-28 2018-04-05 コニカミノルタ株式会社 Wearable terminal
JP2019191946A (en) * 2018-04-25 2019-10-31 パイオニア株式会社 Information processing device
US10872604B2 (en) 2018-05-17 2020-12-22 Qualcomm Incorporated User experience evaluation
US11305614B2 (en) 2018-10-11 2022-04-19 SK Hynix Inc. System for cooling storage device and smart vehicle including the same
CN110730115B (en) * 2019-09-11 2021-11-09 北京小米移动软件有限公司 Voice control method and device, terminal and storage medium
US11873000B2 (en) 2020-02-18 2024-01-16 Toyota Motor North America, Inc. Gesture detection for transport control

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040030560A1 (en) * 2002-06-28 2004-02-12 Masayuki Takami Voice control system
US20040141634A1 (en) * 2002-10-25 2004-07-22 Keiichi Yamamoto Hand pattern switch device
US20040193413A1 (en) * 2003-03-25 2004-09-30 Wilson Andrew D. Architecture for controlling a computer using hand gestures
US20050134117A1 (en) * 2003-12-17 2005-06-23 Takafumi Ito Interface for car-mounted devices
US7295904B2 (en) * 2004-08-31 2007-11-13 International Business Machines Corporation Touch gesture based interface for motor vehicle
US20110022393A1 (en) * 2007-11-12 2011-01-27 Waeller Christoph Multimode user interface of a driver assistance system for inputting and presentation of information
US20110115702A1 (en) * 2008-07-08 2011-05-19 David Seaberg Process for Providing and Editing Instructions, Data, Data Structures, and Algorithms in a Computer System
US20120075184A1 (en) * 2010-09-25 2012-03-29 Sriganesh Madhvanath Silent speech based command to a computing device
US8296151B2 (en) * 2010-06-18 2012-10-23 Microsoft Corporation Compound gesture-speech commands
US20130307771A1 (en) * 2012-05-18 2013-11-21 Microsoft Corporation Interaction and management of devices using gaze detection

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06131437A (en) 1992-10-20 1994-05-13 Hitachi Ltd Method for instructing operation in composite form
US6243683B1 (en) 1998-12-29 2001-06-05 Intel Corporation Video control of speech recognition
US7920102B2 (en) 1999-12-15 2011-04-05 Automotive Technologies International, Inc. Vehicular heads-up display system
US6624833B1 (en) 2000-04-17 2003-09-23 Lucent Technologies Inc. Gesture-based input interface system with shadow detection
US6804396B2 (en) 2001-03-28 2004-10-12 Honda Giken Kogyo Kabushiki Kaisha Gesture recognition system
US7775883B2 (en) 2002-11-05 2010-08-17 Disney Enterprises, Inc. Video actuated interactive environment
US7665041B2 (en) 2003-03-25 2010-02-16 Microsoft Corporation Architecture for controlling a computer using hand gestures
JP5130504B2 (en) 2003-07-02 2013-01-30 新世代株式会社 Information processing apparatus, information processing method, program, and storage medium
ATE382848T1 (en) * 2003-08-14 2008-01-15 Harman Becker Automotive Sys COMPUTER-AID SYSTEM AND METHOD FOR OUTPUTING INFORMATION TO A DRIVER OF A VEHICLE
JP2007121576A (en) * 2005-10-26 2007-05-17 Matsushita Electric Works Ltd Voice operation device
JP2007237785A (en) * 2006-03-06 2007-09-20 National Univ Corp Shizuoka Univ On-vehicle information presentation system
JP2008045962A (en) * 2006-08-14 2008-02-28 Nissan Motor Co Ltd Navigation device for vehicle
JP2008145676A (en) * 2006-12-08 2008-06-26 Denso Corp Speech recognition device and vehicle navigation device
JP2009025715A (en) * 2007-07-23 2009-02-05 Xanavi Informatics Corp In-vehicle device and speech recognition method
US8321219B2 (en) 2007-10-05 2012-11-27 Sensory, Inc. Systems and methods of performing speech recognition using gestures
JP4609527B2 (en) 2008-06-03 2011-01-12 株式会社デンソー Automotive information provision system
EP2304527A4 (en) 2008-06-18 2013-03-27 Oblong Ind Inc Gesture-based control system for vehicle interfaces
US20100057781A1 (en) * 2008-08-27 2010-03-04 Alpine Electronics, Inc. Media identification system and method
EP2219097A1 (en) 2009-02-13 2010-08-18 Ecole Polytechnique Federale De Lausanne (Epfl) Man-machine interface method executed by an interactive device
US20100274480A1 (en) 2009-04-27 2010-10-28 Gm Global Technology Operations, Inc. Gesture actuated point of interest information systems and methods
US9377857B2 (en) 2009-05-01 2016-06-28 Microsoft Technology Licensing, Llc Show body position
US9047256B2 (en) * 2009-12-30 2015-06-02 Iheartmedia Management Services, Inc. System and method for monitoring audience in response to signage
US8817087B2 (en) 2010-11-01 2014-08-26 Robert Bosch Gmbh Robust video-based handwriting and gesture recognition for in-car applications

Cited By (91)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9487167B2 (en) * 2011-12-29 2016-11-08 Intel Corporation Vehicular speech recognition grammar selection based upon captured or proximity information
US20140229174A1 (en) * 2011-12-29 2014-08-14 Intel Corporation Direct grammar access
US9142071B2 (en) 2012-03-14 2015-09-22 Flextronics Ap, Llc Vehicle zone-based intelligent console display settings
US20160039430A1 (en) * 2012-03-14 2016-02-11 Autoconnect Holdings Llc Providing gesture control of associated vehicle functions across vehicle zones
US9323342B2 (en) * 2012-10-22 2016-04-26 Sony Corporation User interface with location mapping
US20150029094A1 (en) * 2012-10-22 2015-01-29 Sony Corporation User interface with location mapping
US20140309878A1 (en) * 2013-04-15 2014-10-16 Flextronics Ap, Llc Providing gesture control of associated vehicle functions across vehicle zones
US20140379346A1 (en) * 2013-06-21 2014-12-25 Google Inc. Video analysis based language model adaptation
US20150058003A1 (en) * 2013-08-23 2015-02-26 Honeywell International Inc. Speech recognition system
US9847082B2 (en) * 2013-08-23 2017-12-19 Honeywell International Inc. System for modifying speech recognition and beamforming using a depth image
CN106030697A (en) * 2014-02-26 2016-10-12 三菱电机株式会社 In-vehicle control apparatus and in-vehicle control method
US20160336009A1 (en) * 2014-02-26 2016-11-17 Mitsubishi Electric Corporation In-vehicle control apparatus and in-vehicle control method
US9881605B2 (en) * 2014-02-26 2018-01-30 Mitsubishi Electric Corporation In-vehicle control apparatus and in-vehicle control method
US9342797B2 (en) 2014-04-03 2016-05-17 Honda Motor Co., Ltd. Systems and methods for the detection of implicit gestures
WO2015153835A1 (en) * 2014-04-03 2015-10-08 Honda Motor Co., Ltd Systems and methods for the detection of implicit gestures
US10409382B2 (en) 2014-04-03 2019-09-10 Honda Motor Co., Ltd. Smart tutorial for gesture control system
US10466657B2 (en) 2014-04-03 2019-11-05 Honda Motor Co., Ltd. Systems and methods for global adaptation of an implicit gesture control system
US11243613B2 (en) 2014-04-03 2022-02-08 Honda Motor Co., Ltd. Smart tutorial for gesture control system
CN106537489A (en) * 2014-07-22 2017-03-22 三菱电机株式会社 Method and system for recognizing speech including sequence of words
US9626001B2 (en) * 2014-11-13 2017-04-18 International Business Machines Corporation Speech recognition candidate selection based on non-acoustic input
US9899025B2 (en) 2014-11-13 2018-02-20 International Business Machines Corporation Speech recognition system adaptation based on non-acoustic attributes and face selection based on mouth motion using pixel intensities
US20160140955A1 (en) * 2014-11-13 2016-05-19 International Business Machines Corporation Speech recognition candidate selection based on non-acoustic input
US9805720B2 (en) * 2014-11-13 2017-10-31 International Business Machines Corporation Speech recognition candidate selection based on non-acoustic input
US20160140963A1 (en) * 2014-11-13 2016-05-19 International Business Machines Corporation Speech recognition candidate selection based on non-acoustic input
US9632589B2 (en) * 2014-11-13 2017-04-25 International Business Machines Corporation Speech recognition candidate selection based on non-acoustic input
US20170133016A1 (en) * 2014-11-13 2017-05-11 International Business Machines Corporation Speech recognition candidate selection based on non-acoustic input
US9881610B2 (en) 2014-11-13 2018-01-30 International Business Machines Corporation Speech recognition system adaptation based on non-acoustic attributes and face selection based on mouth motion using pixel intensities
CN105741312A (en) * 2014-12-09 2016-07-06 株式会社理光 Target object tracking method and device
US10275214B2 (en) * 2014-12-22 2019-04-30 Intel Corporation Connected device voice command support
US20160179462A1 (en) * 2014-12-22 2016-06-23 Intel Corporation Connected device voice command support
US9811312B2 (en) * 2014-12-22 2017-11-07 Intel Corporation Connected device voice command support
US10720154B2 (en) * 2014-12-25 2020-07-21 Sony Corporation Information processing device and method for determining whether a state of collected sound data is suitable for speech recognition
US10008201B2 (en) * 2015-09-28 2018-06-26 GM Global Technology Operations LLC Streamlined navigational speech recognition
US11715143B2 (en) 2015-11-17 2023-08-01 Nio Technology (Anhui) Co., Ltd. Network-based system for showing cars for sale by non-dealer vehicle owners
US10692126B2 (en) 2015-11-17 2020-06-23 Nio Usa, Inc. Network-based system for selling and servicing cars
GB2545526B (en) * 2015-12-17 2020-03-11 Jaguar Land Rover Ltd In vehicle system and method for providing information regarding points of interest
GB2547980A (en) * 2016-01-08 2017-09-06 Ford Global Tech Llc System and method for feature activation via gesture recognition and voice command
US10304261B2 (en) 2016-07-07 2019-05-28 Nio Usa, Inc. Duplicated wireless transceivers associated with a vehicle to receive and send sensitive information
US11005657B2 (en) 2016-07-07 2021-05-11 Nio Usa, Inc. System and method for automatically triggering the communication of sensitive information through a vehicle to a third party
US10679276B2 (en) 2016-07-07 2020-06-09 Nio Usa, Inc. Methods and systems for communicating estimated time of arrival to a third party
US10388081B2 (en) 2016-07-07 2019-08-20 Nio Usa, Inc. Secure communications with sensitive user information through a vehicle
US9946906B2 (en) 2016-07-07 2018-04-17 Nio Usa, Inc. Vehicle with a soft-touch antenna for communicating sensitive information
US10262469B2 (en) 2016-07-07 2019-04-16 Nio Usa, Inc. Conditional or temporary feature availability
US10685503B2 (en) 2016-07-07 2020-06-16 Nio Usa, Inc. System and method for associating user and vehicle information for communication to a third party
US10699326B2 (en) 2016-07-07 2020-06-30 Nio Usa, Inc. User-adjusted display devices and methods of operating the same
US10032319B2 (en) 2016-07-07 2018-07-24 Nio Usa, Inc. Bifurcated communications to a third party through a vehicle
US10354460B2 (en) 2016-07-07 2019-07-16 Nio Usa, Inc. Methods and systems for associating sensitive information of a passenger with a vehicle
US9984522B2 (en) 2016-07-07 2018-05-29 Nio Usa, Inc. Vehicle identification or authentication
US10672060B2 (en) 2016-07-07 2020-06-02 Nio Usa, Inc. Methods and systems for automatically sending rule-based communications from a vehicle
US9928734B2 (en) 2016-08-02 2018-03-27 Nio Usa, Inc. Vehicle-to-pedestrian communication systems
GB2556191B (en) * 2016-09-28 2021-03-10 Lenovo Singapore Pte Ltd Gesture detection
GB2556191A (en) * 2016-09-28 2018-05-23 Lenovo Singapore Pte Ltd Gesture detection
US9963106B1 (en) 2016-11-07 2018-05-08 Nio Usa, Inc. Method and system for authentication in autonomous vehicles
US10031523B2 (en) 2016-11-07 2018-07-24 Nio Usa, Inc. Method and system for behavioral sharing in autonomous vehicles
US11024160B2 (en) 2016-11-07 2021-06-01 Nio Usa, Inc. Feedback performance control and tracking
US10083604B2 (en) 2016-11-07 2018-09-25 Nio Usa, Inc. Method and system for collective autonomous operation database for autonomous vehicles
US10410064B2 (en) 2016-11-11 2019-09-10 Nio Usa, Inc. System for tracking and identifying vehicles and pedestrians
US10694357B2 (en) 2016-11-11 2020-06-23 Nio Usa, Inc. Using vehicle sensor data to monitor pedestrian health
US10708547B2 (en) 2016-11-11 2020-07-07 Nio Usa, Inc. Using vehicle sensor data to monitor environmental and geologic conditions
US10410250B2 (en) 2016-11-21 2019-09-10 Nio Usa, Inc. Vehicle autonomy level selection based on user context
US11710153B2 (en) 2016-11-21 2023-07-25 Nio Technology (Anhui) Co., Ltd. Autonomy first route optimization for autonomous vehicles
US11922462B2 (en) 2016-11-21 2024-03-05 Nio Technology (Anhui) Co., Ltd. Vehicle autonomous collision prediction and escaping system (ACE)
US10970746B2 (en) 2016-11-21 2021-04-06 Nio Usa, Inc. Autonomy first route optimization for autonomous vehicles
US10515390B2 (en) 2016-11-21 2019-12-24 Nio Usa, Inc. Method and system for data optimization
US10949885B2 (en) 2016-11-21 2021-03-16 Nio Usa, Inc. Vehicle autonomous collision prediction and escaping system (ACE)
US10699305B2 (en) 2016-11-21 2020-06-30 Nio Usa, Inc. Smart refill assistant for electric vehicles
US10249104B2 (en) 2016-12-06 2019-04-02 Nio Usa, Inc. Lease observation and event recording
US10074223B2 (en) 2017-01-13 2018-09-11 Nio Usa, Inc. Secured vehicle for user use only
US10471829B2 (en) 2017-01-16 2019-11-12 Nio Usa, Inc. Self-destruct zone and autonomous vehicle navigation
US10031521B1 (en) 2017-01-16 2018-07-24 Nio Usa, Inc. Method and system for using weather information in operation of autonomous vehicles
US9984572B1 (en) 2017-01-16 2018-05-29 Nio Usa, Inc. Method and system for sharing parking space availability among autonomous vehicles
US10464530B2 (en) 2017-01-17 2019-11-05 Nio Usa, Inc. Voice biometric pre-purchase enrollment for autonomous vehicles
US10286915B2 (en) 2017-01-17 2019-05-14 Nio Usa, Inc. Machine learning for personalized driving
US10897469B2 (en) 2017-02-02 2021-01-19 Nio Usa, Inc. System and method for firewalls between vehicle networks
US11811789B2 (en) 2017-02-02 2023-11-07 Nio Technology (Anhui) Co., Ltd. System and method for an in-vehicle firewall between in-vehicle networks
US10234302B2 (en) 2017-06-27 2019-03-19 Nio Usa, Inc. Adaptive route and motion planning based on learned external and internal vehicle environment
US10710633B2 (en) 2017-07-14 2020-07-14 Nio Usa, Inc. Control of complex parking maneuvers and autonomous fuel replenishment of driverless vehicles
US10369974B2 (en) 2017-07-14 2019-08-06 Nio Usa, Inc. Control and coordination of driverless fuel replenishment for autonomous vehicles
US10837790B2 (en) 2017-08-01 2020-11-17 Nio Usa, Inc. Productive and accident-free driving modes for a vehicle
US11726474B2 (en) 2017-10-17 2023-08-15 Nio Technology (Anhui) Co., Ltd. Vehicle path-planner monitor and controller
US10635109B2 (en) 2017-10-17 2020-04-28 Nio Usa, Inc. Vehicle path-planner monitor and controller
US10606274B2 (en) 2017-10-30 2020-03-31 Nio Usa, Inc. Visual place recognition based self-localization for autonomous vehicles
US10935978B2 (en) 2017-10-30 2021-03-02 Nio Usa, Inc. Vehicle self-localization using particle filters and visual odometry
US10717412B2 (en) 2017-11-13 2020-07-21 Nio Usa, Inc. System and method for controlling a vehicle using secondary access methods
CN110070861A (en) * 2018-01-22 2019-07-30 丰田自动车株式会社 Information processing unit and information processing method
US20190228769A1 (en) * 2018-01-22 2019-07-25 Toyota Jidosha Kabushiki Kaisha Information processing device and information processing method
US10943587B2 (en) * 2018-01-22 2021-03-09 Toyota Jidosha Kabushiki Kaisha Information processing device and information processing method
US10369966B1 (en) 2018-05-23 2019-08-06 Nio Usa, Inc. Controlling access to a vehicle using wireless access devices
US11568239B2 (en) * 2019-08-13 2023-01-31 Lg Electronics Inc. Artificial intelligence server and method for providing information to user
US11904888B2 (en) * 2021-11-12 2024-02-20 Ford Global Technologies, Llc Controlling vehicle functions
DE102022103066A1 (en) 2022-02-09 2023-08-10 Cariad Se Method for providing a geographically located electronic reminder note in a motor vehicle

Also Published As

Publication number Publication date
KR20150044874A (en) 2015-04-27
IL236089A (en) 2016-02-29
IL236089A0 (en) 2015-02-01
EP2862125A4 (en) 2016-01-13
WO2013188002A1 (en) 2013-12-19
EP2862125A1 (en) 2015-04-22
JP2015526753A (en) 2015-09-10
JP6010692B2 (en) 2016-10-19
US9092394B2 (en) 2015-07-28
CN104620257B (en) 2017-12-12
CN104620257A (en) 2015-05-13
EP2862125B1 (en) 2017-02-22
KR102061925B1 (en) 2020-01-02

Similar Documents

Publication Publication Date Title
US9092394B2 (en) Depth based context identification
EP3627180B1 (en) Sensor calibration method and device, computer device, medium, and vehicle
CN102023703B (en) Combined lip reading and voice recognition multimodal interface system
US8818716B1 (en) System and method for gesture-based point of interest search
US20180033429A1 (en) Extendable vehicle system
US11289074B2 (en) Artificial intelligence apparatus for performing speech recognition and method thereof
US20190120649A1 (en) Dialogue system, vehicle including the dialogue system, and accident information processing method
US9421866B2 (en) Vehicle system and method for providing information regarding an external item a driver is focusing on
CN113302664A (en) Multimodal user interface for a vehicle
JP6604151B2 (en) Speech recognition control system
US20230102157A1 (en) Contextual utterance resolution in multimodal systems
US20200152203A1 (en) Agent device, agent presentation method, and storage medium
US11450316B2 (en) Agent device, agent presenting method, and storage medium
US11810575B2 (en) Artificial intelligence robot for providing voice recognition function and method of operating the same
US10655981B2 (en) Method for updating parking area information in a navigation system and navigation system
WO2017015882A1 (en) Navigation device and navigation method
CN114175114A (en) System and method for identifying points of interest from inside an autonomous vehicle
JP6385624B2 (en) In-vehicle information processing apparatus, in-vehicle apparatus, and in-vehicle information processing method
US11057734B2 (en) Geospecific information system and method
KR20100062413A (en) Method and apparatus for controling speech recognition of telematics apparatus
JP2020166073A (en) Voice interface system, control method, and program
US20240037956A1 (en) Data processing system, data processing method, and information providing system
JP2006030908A (en) Voice recognition device for vehicle and moving body
JP2022103553A (en) Information providing device, information providing method, and program
CN113168833A (en) Method for operating an interactive information system of a vehicle and vehicle

Legal Events

Date Code Title Description
AS Assignment

Owner name: HONDA MOTOR CO., LTD., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YAMAMOTO, STUART;VAGHEFINAZARI, PEDRAM;REEL/FRAME:028413/0381

Effective date: 20120607

Owner name: EDGE3 TECHNOLOGIES, LLC, ARIZONA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DOKOR, TAREK EL;HOLMES, JAMES;CLUSTER, JORDAN;REEL/FRAME:028413/0407

Effective date: 20120605

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8