US20170236450A1 - Apparatus for bi-directional sign language/speech translation in real time and method - Google Patents

Apparatus for bi-directional sign language/speech translation in real time and method

Info

Publication number
US20170236450A1
Authority
US
United States
Prior art keywords
sign
speech
user
real time
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/188,099
Other versions
US10089901B2
Inventor
Woo Sug Jung
Hwa Suk Kim
Jun Ki JEON
Sun Joong Kim
Hyun Woo Lee
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hyundai Motor Co
Kia Corp
Original Assignee
Electronics and Telecommunications Research Institute ETRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electronics and Telecommunications Research Institute ETRI filed Critical Electronics and Telecommunications Research Institute ETRI
Assigned to ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE (assignment of assignors interest; see document for details). Assignors: JEON, JUN KI; JUNG, WOO SUG; KIM, HWA SUK; KIM, SUN JOONG; LEE, HYUN WOO
Publication of US20170236450A1
Application granted
Publication of US10089901B2
Assigned to HYUNDAI MOTOR COMPANY and KIA CORPORATION (assignment of assignors interest; see document for details). Assignor: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Legal status: Active (expiration adjusted)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
              • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
                • G06F3/014 Hand-worn input/output arrangements, e.g. data gloves
              • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
              • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
                • G06F3/0304 Detection arrangements using opto-electronic means
            • G06F3/16 Sound input; Sound output
              • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V40/20 Movements or behaviour, e.g. gesture recognition
              • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
      • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
        • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
          • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
            • G09B21/009 Teaching or communicating with deaf persons
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
          • G10L13/00 Speech synthesis; Text to speech systems
            • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
              • G10L13/027 Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
          • G10L15/00 Speech recognition
            • G10L15/24 Speech recognition using non-acoustical features
            • G10L15/26 Speech to text systems
          • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
              • G10L2021/065 Aids for the handicapped in understanding
    • Additional classification codes (no descriptions provided): G06K9/00355; G06T7/0081; G06T2207/20144

Definitions

  • FIG. 1 is a diagram illustrating an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment.
  • FIG. 2 is a block diagram illustrating a speech-sign outputter in an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment.
  • FIG. 3 is a block diagram illustrating a sign-speech outputter in an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment.
  • FIG. 4 is a block diagram illustrating an operation of analyzing a used pattern of sign language by user according to an example embodiment.
  • FIG. 5 is a diagram illustrating an operation of an apparatus for bi-directional sign language/speech translation in real time interoperating with a network according to an example embodiment.
  • FIG. 1 is a diagram illustrating an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment.
  • an apparatus 101 for bi-directional sign language/speech translation in real time may translate a speech into a sign or a sign into a speech in real time, and output a result of the translation to provide convenience to a user who uses sign language.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may be an apparatus to perform sign-speech translation with a head-mounted display (HMD).
  • the apparatus 101 for bi-directional sign language/speech translation in real time may include a smart terminal.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may recognize a speech or sign externally made by a user through a microphone 106 or a camera 107 .
  • the apparatus 101 for bi-directional sign language/speech translation in real time may identify the recognized speech or sign of the user, and translate the speech into a sign or the sign into a speech based on a result of the identifying.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may perform translation through a different translation path based on the result of the identifying.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may include a pattern analyzer 102 , a speech-sign outputter 103 , and a sign-speech outputter 104 .
  • the speech-sign outputter 103 may translate the speech into a sign, and output a result of translation, for example, the sign.
  • the sign-speech outputter 104 may translate the sign into a speech, and output a result of translation, for example, the speech.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may perform duplex translation from a speech into a sign or from a sign into a speech by separately performing a process of translating a speech into a sign and a process of translating a sign into a speech.
  • the operation of translating a speech into a sign and the operation of translating a sign into a speech will be described in detail with reference to FIGS. 2 and 3 , respectively.
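  • As a rough illustration only (the class and method names below are hypothetical and not taken from the patent), the two separate translation paths of FIG. 1 could be organized as follows, with microphone input routed to the speech-sign path and camera input routed to the sign-speech path:

```python
# Minimal sketch of the duplex apparatus of FIG. 1; all names are illustrative.
class SignSpeechTranslator:
    def __init__(self, pattern_analyzer, speech_sign_outputter, sign_speech_outputter):
        self.pattern_analyzer = pattern_analyzer            # element 102
        self.speech_sign_outputter = speech_sign_outputter  # element 103
        self.sign_speech_outputter = sign_speech_outputter  # element 104

    def handle_input(self, source, data):
        """Identify the input modality and route it through its own translation path."""
        usage_pattern = self.pattern_analyzer.current_pattern()
        if source == "microphone":      # externally made speech
            return self.speech_sign_outputter.translate(data, usage_pattern)
        if source == "camera":          # sign sensed by the camera
            return self.sign_speech_outputter.translate(data, usage_pattern)
        raise ValueError(f"unknown input source: {source}")
```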
  • the apparatus 101 for bi-directional sign language/speech translation in real time may use a body-attached device 105 to recognize the speech or sign externally made by the user.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may interoperate with the body-attached device 105 , or operate in the body-attached device 105 depending on situations.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may be configured separately from the body-attached device 105 to translate a speech into a sign or a sign into a speech by interoperating with the body-attached device 105 .
  • the apparatus 101 for bi-directional sign language/speech translation in real time may be configured to be included in the body-attached device 105 , in detail, to operate in the body-attached device 105 , to translate a speech into a sign or a sign into a speech in real time.
  • the body-attached device 105 may include a microphone, a speaker, a display device, and a camera, and may be implemented in a wearable form to be attached to a body of the user.
  • the body-attached device 105 may be implemented as a device attachable to a body of the user, for example, an eyewear type device or a watch type device.
  • the pattern analyzer 102 may analyze a used pattern of sign language of the user to improve the accuracy and speed of real-time speech-sign or sign-speech translation.
  • the pattern analyzer 102 may analyze the used pattern of sign language by the user.
  • the used pattern of sign language by the user may include, for example, location information of the user, surrounding environment information of the user corresponding to the location information, a life pattern of the user, and a behavior pattern of the user.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may translate a speech or a sign based on the analyzed sign use pattern of the user, thereby minimizing unnecessary sign translation and improving the accuracy and speed of translation.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may predict information to be used for sign-speech translation by analyzing the life pattern of the user and inputting/analyzing the location information and the surrounding environment information, thereby guaranteeing the accuracy of translation content and real-time sign-speech translation.
  • a configuration for the foregoing will be described in detail with reference to FIG. 4 .
  • the apparatus 101 for bi-directional sign language/speech translation in real time may perform duplex sign-speech and speech-sign translation in real time with the body-attached device 105 to solve an issue of unidirectional or fragmentary sign language translation technology.
  • the apparatus 101 for bi-directional sign language/speech translation in real time may include a translation path for sign-speech translation and a translation path for speech-sign translation separately, thereby alleviating an inconvenience in the existing unidirectional or fragmentary sign language translation technology.
  • FIG. 2 is a block diagram illustrating a speech-sign outputter in an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment.
  • an apparatus 201 for bi-directional sign language/speech translation in real time may include a pattern analyzer 202 , a speech-sign outputter 203 , and a sign-speech outputter 204 .
  • the speech-sign outputter 203 may translate a speech into a sign, and output a result of translation, for example, the sign.
  • the sign-speech outputter 204 may translate a sign into a speech, and output a result of translation, for example, the speech.
  • a process of translating a speech into a sign and outputting a result of translation, for example, the sign will be described in detail based on the speech-sign outputter 203 .
  • the pattern analyzer 202 may analyze a used pattern of sign language by the user who is wearing the apparatus 201 for bi-directional sign language/speech translation in real time.
  • the pattern analyzer 202 may analyze the used pattern of sign language including at least one of location information of the user, surrounding environment information of the user corresponding to the location information, a life pattern of the user, or a behavior pattern of the user.
  • the configuration of the pattern analyzer 202 will be described in detail with reference to FIG. 4 .
  • the speech-sign outputter 203 may include a speech recognizer 205 , an index generator 206 , and a sign outputter 209 to perform an operation of translating a speech into a sign and outputting the sign.
  • the speech recognizer 205 may recognize a speech through a microphone.
  • the speech recognizer 205 may recognize the speech collected through the microphone included in a body-attached device.
  • the body-attached device may collect a speech externally made through the microphone, and transfer the collected speech to the speech recognizer 205 .
  • the speech recognizer 205 may recognize the speech received from the body-attached device.
  • the speech recognizer 205 may remove noise from the recognized speech.
  • the speech recognized through the microphone may be a sound externally made, and may include a sound for speech-sign translation and ambient noise.
  • the speech recognizer 205 may remove the noise included in the speech recognized through the microphone to extract only the speech for speech-sign translation.
  • the speech recognizer 205 may remove the ambient noise included in the speech.
  • the ambient noise may include all sounds occurring around the user, for example, a subway sound, an automobile horn sound, a step sound, and a music sound.
  • the speech recognizer 205 may separate a speech of a user other than the user who requests speech-sign translation from the noise-removed speech.
  • the speech recognized through the microphone may include ambient noise and a speech of a third party located adjacent to the user, as described above.
  • the speech recognizer 205 may separate the speech of the third party, except for the user, from the noise-removed sound, thereby increasing the speech recognition accuracy for speech-sign translation.
  • the speech recognizer 205 may generate a speech including only an intrinsic sound of the user who requests translation, by filtering out the ambient noise and the speech of the third party in the speech recognized through the microphone.
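  • The noise-removal step can be pictured with a crude, energy-based gate as in the sketch below (the threshold value and function names are assumptions; a real speech recognizer 205 would additionally separate third-party speech, for example with a speaker-identification or source-separation model):

```python
import numpy as np

def gate_noise(frames, noise_floor_db=-40.0):
    """Zero out frames whose RMS level falls below a threshold so that only
    frames likely to contain the requesting speaker's voice are kept."""
    kept = []
    for frame in frames:                       # each frame: 1-D float array in [-1.0, 1.0]
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
        level_db = 20.0 * np.log10(rms)
        kept.append(frame if level_db > noise_floor_db else np.zeros_like(frame))
    return np.concatenate(kept) if kept else np.array([])
```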
  • the index generator 206 may generate a sign index to translate into the sign corresponding to the speech from which the noise and the speech of the third party are removed.
  • the sign index may be information to be used to generate an image associated with the sign based on the speech of the user.
  • the index generator 206 may include a speech-text converter 207 , and an index determiner 208 .
  • the speech-text converter 207 may convert the recognized speech into a text using a predefined sign language dictionary.
  • the speech-text converter 207 may perform an operation of converting the speech into the text using a speech-to-text (STT) engine.
  • since sign language dictionaries around the world are defined based on text, the speech may be converted into a text and a sign corresponding to the text may be output; the text may also be used to transfer, to the user, an image associated with the sign together with information about the speech.
  • the sign language dictionary may be used to minimize an amount of data to be transmitted to translate a speech into a sign, thereby improving the speed at which an image associated with the sign is generated.
  • the index determiner 208 may determine a sign index with respect to the text based on the text and the used pattern of sign language by the user.
  • the index determiner 208 may utilize the text and the used pattern of sign language in which the location information, the surrounding environment information, and the life pattern of the user are analyzed, thereby improving the speed and accuracy for sign language translation.
  • the index determiner 208 may determine the sign index to generate the image associated with the sign through a pre-embedded sign language dictionary based on the speech corresponding to the text.
  • the index determiner 208 may determine the sign index corresponding to the speech based on more generalized information by determining a speech-text-sign index based on the sign language dictionary.
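  • A minimal sketch of this speech-to-text-to-sign-index lookup is shown below; the dictionary contents and category labels are invented for illustration, and the usage pattern is reduced to a single category string:

```python
# Hypothetical fragment of a predefined sign language dictionary keyed by
# (word, category); the category comes from the analyzed usage pattern.
SIGN_DICTIONARY = {
    ("order", "cafe"): 101,      # sign index for "order (a drink)"
    ("order", "general"): 102,   # sign index for "order (command)"
    ("coffee", "general"): 200,
}

def determine_sign_indices(text, usage_category="general"):
    """Map recognized text to sign indices, preferring category-specific entries."""
    indices = []
    for word in text.lower().split():
        index = (SIGN_DICTIONARY.get((word, usage_category))
                 or SIGN_DICTIONARY.get((word, "general")))
        if index is not None:
            indices.append(index)
    return indices

# determine_sign_indices("order coffee", usage_category="cafe") -> [101, 200]
```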
  • the sign outputter 209 may receive the sign index from the index generator 206, and output the sign corresponding to the speech based on the received sign index.
  • the sign outputter 209 may provide a sign or text corresponding to content of the speech to the user based on the sign index generated based on the speech.
  • the sign outputter 209 may include a display mode controller 210 and an outputter 211.
  • the display mode controller 210 may control an output based on a display mode to display one of the sign and the text corresponding to the recognized speech.
  • the display mode may include a sign display mode and a text display mode.
  • the display mode controller 210 may select the display mode to display one of the sign and the text corresponding to the speech.
  • the display mode controller 210 may control the display mode of the outputter 211 based on a sign display event or a generation period of a sign mapped to the sign index.
  • the display mode controller 210 may transfer, to the outputter 211 , the text or the image associated with the sign corresponding to the sign index based on the selected display mode.
  • the outputter 211 may output the sign mapped to the generated sign index or the text corresponding to the sign mapped to the generated sign index based on the information received based on the display mode selected by the display mode controller 210 .
  • the outputter 211 may represent a speech using a sign which is convenient for a hearing-impaired person, and also display the speech using a text which has a relatively wide expression range when compared to a sign.
  • the outputter 211 may display the sign or the text corresponding to the speech in view of information expression limitation occurring when the image associated with the sign is output to the user.
  • the outputter 211 may display the sign translated from the speech through a display of the body-attached device interoperating with the apparatus 201 for bi-directional sign language/speech translation in real time.
  • the outputter 211 may display the image associated with the sign corresponding to the sign index on smart glasses, which are disposed adjacent to the eyes of the user.
  • the outputter 211 may represent the speech corresponding to the sign index using the text, and output the text on the smart glasses.
  • when the text is output, the operation of generating the image associated with the sign based on the sign index may be omitted, and thus information may be transferred faster than when transferring information using the image associated with the sign.
  • the outputter 211 may synchronize the sign or the text with the sign index, thereby alleviating a user inconvenience occurring in the sign language translation process.
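  • One way to picture the display-mode control is sketched below (the delay threshold, the DisplayMode names, and the display interface are assumptions, not part of the patent):

```python
from enum import Enum

class DisplayMode(Enum):
    SIGN = "sign"   # project the sign image/animation
    TEXT = "text"   # project text only, skipping sign-image generation

def choose_display_mode(sign_generation_period_s, max_acceptable_delay_s=0.5):
    """Fall back to text when generating the sign image would take too long,
    since the text path transfers information faster."""
    if sign_generation_period_s > max_acceptable_delay_s:
        return DisplayMode.TEXT
    return DisplayMode.SIGN

def render(display, sign_index, text, mode):
    """Output the representation chosen by the display mode controller 210."""
    if mode is DisplayMode.SIGN:
        display.show_image(f"sign_{sign_index}.gif")  # pre-generated clip (assumed asset)
    else:
        display.show_text(text)
```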
  • FIG. 3 is a block diagram illustrating a sign-speech outputter in an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment.
  • an apparatus 301 for bi-directional sign language/speech translation in real time may include a pattern analyzer 302 , a speech-sign outputter 303 , and a sign-speech outputter 304 .
  • the speech-sign outputter 303 may translate a speech into a sign, and output a result of translation, for example, the sign.
  • the sign-speech outputter 304 may translate a sign into a speech, and output a result of translation, for example, the speech.
  • a process of translating a sign into a speech and outputting a result of translation, for example, the speech will be described in detail based on the sign-speech outputter 304 .
  • the pattern analyzer 302 may analyze a used pattern of sign language by the user who is wearing the apparatus 301 for bi-directional sign language/speech translation in real time.
  • the pattern analyzer 302 may analyze the used pattern of sign language including at least one of location information of the user, surrounding environment information of the user corresponding to the location information, a life pattern of the user, or a behavior pattern of the user.
  • the configuration of the pattern analyzer 302 will be described in detail with reference to FIG. 4 .
  • the sign-speech outputter 304 may include a sign recognizer 305 , an index generator 308 , and a speech outputter 312 to perform an operation of translating a sign into a speech and outputting the speech.
  • the sign recognizer 305 may recognize a sign through a camera.
  • the sign recognizer 305 may recognize the sign collected through the camera included in a body-attached device.
  • the body-attached device may collect a sign externally sensed through the camera, and transfer the collected sign to the sign recognizer 305 .
  • the sign recognizer 305 may recognize the sign received from the body-attached device.
  • the sign recognizer 305 may collect the sign from a color image or based on a finger motion, for example, a gesture, of the user depending on a sign collecting scheme.
  • the sign recognizer 305 may recognize the sign using the color image or the gesture based on whether gesture gloves configured to directly generate a sign are used.
  • the gesture gloves may be a device worn on hands of the user to enable the camera to recognize the gesture of the user.
  • the gesture gloves may include Magic Gloves.
  • the sign recognizer 305 may include a wearing sensor 306 , and a sign collector 307 .
  • the wearing sensor 306 may sense whether the user is wearing the gesture gloves.
  • the wearing sensor 306 may sense whether the user is wearing the gesture gloves configured to input sign information, to control an operating state of a sign language translation apparatus to recognize a sign.
  • if a sign were extracted from the color image while the user is wearing the gesture gloves, system overhead may increase and the speed and accuracy of translation may decrease; accordingly, whether the user is wearing the gesture gloves may be sensed to prevent extraction of a sign from the color image in that case.
  • the sign collector 307 may collect the sign based on whether the user is wearing the gesture gloves. In detail, in a case in which the user is not wearing the gesture gloves, the sign collector 307 may collect the sign by removing a background from the color image acquired by the camera. The sign collector 307 may receive the color image from a depth sensor or an RGB camera. The sign collector 307 may remove unnecessary information which is unrelated to the sign, for example, the background, from the color image. That is, in the case in which the user is not wearing the gesture gloves, the sign collector 307 may collect the sign by extracting information related to the sign from the color image.
  • the sign collector 307 may collect the sign based on the finger motion of the user collected from the camera.
  • the sign collector 307 may receive information related to a hand motion or a finger motion of a human using a device such as Magic Gloves. That is, the sign collector 307 may directly collect a gesture of the user as the information related to the sign using the gesture gloves in the case in which the user is wearing the gesture gloves.
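  • A hedged sketch of this two-way collection is given below; the OpenCV background subtractor stands in for the background-removal step, and the glove interface is purely hypothetical:

```python
import cv2  # OpenCV, assumed available

class SignCollector:
    """Sketch of the sign collector 307 (all names illustrative)."""

    def __init__(self, glove_reader=None):
        self.glove_reader = glove_reader
        self.bg_subtractor = cv2.createBackgroundSubtractorMOG2()

    def collect(self, frame_bgr, wearing_gloves):
        if wearing_gloves and self.glove_reader is not None:
            # Gesture gloves report finger motion directly; skip image processing.
            return self.glove_reader.read_finger_motion()
        # No gloves: drop the background so only the moving hands/arms remain.
        fg_mask = self.bg_subtractor.apply(frame_bgr)
        return cv2.bitwise_and(frame_bgr, frame_bgr, mask=fg_mask)
```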
  • the index generator 308 may generate a sign index to translate into the speech corresponding to the recognized sign.
  • the index generator 308 may include an index determiner 309 , a sign-text converter 310 , and a sentence generator 311 .
  • the index determiner 309 may determine the sign index with respect to the recognized sign using a predefined sign language dictionary.
  • the index determiner 309 may recognize the sign of the user, and determine the sign index to generate a text using the sign language dictionary stored in a form of a database.
  • the sign-text converter 310 may convert the sign into a text based on the determined sign index.
  • the sentence generator 311 may generate a sentence associated with the sign with respect to the text through a keyword combination corresponding to the text and the used pattern of sign language by the user.
  • a sign may have a relatively narrow expression range when compared to expression using a speech or a text, and thus there is a limitation to expressing a full sentence; accordingly, a function to automatically generate a sentence based on a text may be provided.
  • the sentence generator 311 may generate the sentence associated with the sign by performing a keyword combination with everyday conversation sentences stored through the pattern analyzer 302 based on the text generated by the sign-text converter 310 .
  • the sentence generator 311 may predict information to be used for sign-speech translation by analyzing a life pattern of the user, and inputting/analyzing location information and surrounding environment information, thereby guaranteeing the accuracy of translation content and real-time sign-speech translation.
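  • The keyword-to-sentence step can be pictured as template filling, as in the toy sketch below (the templates and categories are invented; a real sentence generator 311 would draw them from the stored everyday-conversation sentences and rank several candidates):

```python
SENTENCE_TEMPLATES = {
    "cafe": "I would like to order {item}, please.",
    "general": "{item}.",
}

def generate_sentence(keywords, category="general"):
    """Combine sign-derived keywords with an everyday-conversation template."""
    template = SENTENCE_TEMPLATES.get(category, SENTENCE_TEMPLATES["general"])
    return template.format(item=" ".join(keywords))

# generate_sentence(["iced", "coffee"], "cafe") -> "I would like to order iced coffee, please."
```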
  • the speech outputter 312 may convert the sentence generated by the sentence generator 311 into a speech, and transfer the speech to the user.
  • the speech outputter 312 may apply a sentence-speech conversion method such as a TTS engine to output the speech corresponding to the sentence.
  • the speech outputter 312 may convert a digital speech generated by the sentence-speech conversion method into an analog speech through digital-to-analog (D/A) conversion, and output the analog speech to the user.
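  • As one possible realization (the pyttsx3 engine is an assumption, not an engine named by the patent), an off-the-shelf TTS library can cover both the sentence-speech conversion and the playback through the sound device:

```python
import pyttsx3  # assumed third-party TTS engine

def speak(sentence):
    """Convert the generated sentence into audible speech and play it back."""
    engine = pyttsx3.init()
    engine.say(sentence)
    engine.runAndWait()   # blocks until playback (the analog output step) finishes

# speak("I would like to order iced coffee, please.")
```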
  • smart glasses may be used as a body-attached device to transfer translated sign information to the user.
  • since the smart glasses are a device attached to the body, they may have a display of a restricted size for transferring sign information. Further, when a text and a sign are projected simultaneously on the display of the smart glasses, the user may experience confusion and fatigue rapidly.
  • accordingly, the type of information to be projected on the smart glasses, for example, a text or a sign, may be set based on a user input, for example, a sign or a speech, and a result may be output accordingly.
  • user convenience-related information, for example, the sign information generation speed and period that the user feels most comfortable with, may also be set, and a result may be output accordingly.
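  • Such settings could be grouped into a small preference structure, for example (field names and defaults are illustrative only):

```python
from dataclasses import dataclass

@dataclass
class OutputPreferences:
    """User-adjustable settings for what the smart glasses project and how often."""
    display_mode: str = "sign"              # "sign" or "text"
    sign_generation_period_s: float = 0.3   # how often a new sign frame is produced
    trigger_input: str = "speech"           # user input ("sign" or "speech") that switches the output type
```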
  • FIG. 4 is a block diagram illustrating an operation of analyzing a used pattern of sign language by user according to an example embodiment.
  • the pattern analyzer 102 included in the apparatus 101 for bi-directional sign language/speech translation in real time may analyze a used pattern of sign language by the user.
  • since the user is likely to be in a similar space at a similar time, the used pattern of sign language by the user may be analyzed in view of the spatial/temporal correlation of the user.
  • the pattern analyzer 102 may infer a life pattern to be observed in the near future by analyzing a past life pattern of the user.
  • the pattern analyzer 102 may prepare information to be used for sign translation based on a result of inference in advance, thereby increasing the speed of sign translation and guaranteeing real-time sign translation. Further, the pattern analyzer 102 may increase the accuracy of sign translation by correcting an uncertain speech signal based on related information.
  • the pattern analyzer 102 may include a time information collector 401 , a life pattern analyzer 402 , a location information collector 403 , a surrounding environment information collector 404 , a surrounding environment information analyzer 405 , a sign category information generator 406 , a sign category keyword comparator 407 , and a sign category database (DB) 408 .
  • the life pattern analyzer 402 may accumulate and manage time information input through the time information collector 401 and current location information of the user input through the location information collector 403 .
  • the life pattern analyzer 402 may analyze a past behavior pattern of the user based on accumulated information of the time information and the current location information of the user, and infer a life pattern expected to be observed in the near future based on the past behavior pattern.
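  • A toy version of this inference, reduced to predicting the most frequent location for a given hour from accumulated (hour, location) records, might look as follows (the record format is an assumption):

```python
from collections import Counter

def predict_location(history, hour):
    """Life pattern analyzer 402 sketch: history is a list of (hour, location)
    tuples accumulated from the time and location collectors 401/403."""
    candidates = [location for h, location in history if h == hour]
    if not candidates:
        return None
    return Counter(candidates).most_common(1)[0][0]

# predict_location([(9, "cafe"), (9, "cafe"), (9, "office")], 9) -> "cafe"
```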
  • the surrounding environment information analyzer 405 may infer a type of a current space in which the user is located by analyzing image and sound information of the current space collected from the surrounding environment information collector 404 . For example, in a case in which “coffee” is extracted as a keyword by analyzing the image and sound information input from the surrounding environment information collector 404 , the surrounding environment information analyzer 405 may infer that the current space corresponds to a café, a coffee shop, or a teahouse.
  • the sign category information generator 406 may extract a sign information category to be used in the near future by analyzing the past life pattern and the surrounding environment information of the user analyzed by the life pattern analyzer 402 and the surrounding environment information analyzer 405 .
  • the sign category information generator 406 may generate a category of ordering a coffee at a café.
  • the generated category information may be transmitted to the sign outputter 209 of FIG. 2 or the speech outputter 312 of FIG. 3 through the sign category DB 408 , converted into a speech, a text, or a sign, and transferred to the user or a speaker.
  • because an inferred sign category may include an error, the sign category keyword comparator 407 may be provided to perform an operation of resolving issues resulting from such an error.
  • the sign category keyword comparator 407 may compare the information input from the speech-text converter 207 of FIG. 2 or the sign-text converter 310 of FIG. 3 with the inferred sign category information. In a case in which the input information does not match the inferred sign category, the sign category keyword comparator 407 may determine the inferred sign category to be incorrect. The sign category keyword comparator 407 may transmit a signal related to the determination to the sign outputter 209 or the speech outputter 312 , and block a speech, a text, or a sign image to be output from the sign category DB 408 .
  • the sign outputter 209 or the speech outputter 312 receiving the signal related to the determination may block the information received from the sign category DB 408 , and output information related to at least one of the speech, the text, or the sign.
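  • The category inference and the keyword check can be sketched together as below (category names and keyword sets are invented for illustration):

```python
CATEGORY_KEYWORDS = {
    "cafe": {"coffee", "latte", "order", "menu"},
    "subway": {"train", "ticket", "platform"},
}

def infer_category(environment_keywords):
    """Sign category information generator 406: choose the category whose keyword
    set overlaps most with what the surrounding environment analyzer extracted."""
    scores = {cat: len(kws & set(environment_keywords))
              for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def category_matches(inferred_category, converter_keywords):
    """Sign category keyword comparator 407: report whether the words coming from
    the speech-text or sign-text converter fit the inferred category; on a
    mismatch the category-based output from the sign category DB 408 is blocked."""
    if inferred_category is None:
        return False
    return bool(CATEGORY_KEYWORDS[inferred_category] & set(converter_keywords))
```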
  • the information collected from the location information collector 403 and the surrounding environment information collector 404 may be reported continuously through a network connected to the apparatus 101 for bi-directional sign language/speech translation in real time.
  • FIG. 5 is a diagram illustrating an operation of an apparatus for bi-directional sign language/speech translation in real time connected to a network according to an example embodiment.
  • an apparatus 502 for bi-directional sign language/speech translation in real time may be connected to a network to share information with a user behavior analyzing server 501 which manages image and sound information of a space in which a user is currently located and current location information of the user.
  • the user behavior analyzing server 501 may perform a method for bi-directional sign language/speech translation in real time.
  • the user behavior analyzing server 501 , the apparatus 502 for bi-directional sign language/speech translation in real time, the smart glasses 503 , and the gesture gloves 504 may be connected to one another through a personal area network (PAN).
  • the apparatus 502 for bi-directional sign language/speech translation in real time, for example, a smart terminal, may form a body area network (BAN) or a PAN with the smart glasses 503 and the gesture gloves 504, for example, body-attached devices.
  • the apparatus 502 for bi-directional sign language/speech translation in real time may be connected to the user behavior analyzing server 501 through the Internet, for example, a wired or wireless network.
  • the user behavior analyzing server 501 may analyze a past behavior pattern of the user, and transmit the analyzed information to the apparatus 502 for bi-directional sign language/speech translation in real time, for example, the smart terminal.
  • Information collected by the location information collector 403 and the surrounding environment information collector 404 of FIG. 4 may be retained in the apparatus 502 for bi-directional sign language/speech translation in real time, for example, the smart terminal, and also transmitted to the user behavior analyzing server 501 .
  • the apparatus 502 for bi-directional sign language/speech translation in real time may not be able to store and analyze a large volume of data due to limited hardware resources such as a memory and a processor.
  • accordingly, a long-term analysis of the user's past behavior may be performed by the user behavior analyzing server 501, and a short-term analysis may be performed by the apparatus 502 for bi-directional sign language/speech translation in real time.
  • the scope of past behavior information analyzed by the user behavior analyzing server 501 may vary depending on user settings; however, by default, user past behavior analysis information corresponding to one day may be transferred to the apparatus 502 for bi-directional sign language/speech translation in real time.
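  • The division of labor could look roughly like this (the record format, endpoint URL, and one-day window are assumptions based on the default described above):

```python
import datetime
import json
import urllib.request

def upload_observations(observations, server_url="http://behavior-server.example/upload"):
    """Send collected location/environment records to the user behavior analyzing
    server 501, where the long-term past behavior analysis is performed."""
    body = json.dumps(observations).encode("utf-8")
    request = urllib.request.Request(server_url, data=body,
                                     headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(request)

def keep_recent(observations, window=datetime.timedelta(days=1)):
    """Retain roughly one day of history on the device for the short-term analysis."""
    cutoff = datetime.datetime.now() - window
    return [o for o in observations
            if datetime.datetime.fromisoformat(o["time"]) >= cutoff]
```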
  • an apparatus for bi-directional sign language/speech translation in real time and method may alleviate the inconvenience that a hearing-impaired person experiences during communication in everyday life, thereby helping the hearing-impaired person lead a normal social life and reducing the social cost of addressing issues caused by hearing impairment.
  • an apparatus for bi-directional sign language/speech translation in real time and method may be applicable to a wide range of environments where normal speech communication is impossible, for example, an environment where military operations are carried out in silence, or an environment in which communication is impossible due to serious noise.
  • the methods according to the above-described example embodiments may be recorded in non-transitory computer-readable media including program instructions to implement various operations of the above-described example embodiments.
  • the media may also include, alone or in combination with the program instructions, data files, data structures, and the like.
  • the program instructions recorded on the media may be those specially designed and constructed for the purposes of example embodiments, or they may be of the kind well-known and available to those having skill in the computer software arts.

Abstract

Provided is an apparatus for bi-directional sign language/speech translation in real time and method that may automatically translate a sign into a speech or a speech into a sign in real time by separately performing an operation of recognizing a speech externally made through a microphone and outputting a sign corresponding to the speech, and an operation of recognizing a sign sensed through a camera and outputting a speech corresponding to the sign.

Description

    CROSS-REFERENCE TO RELATED APPLICATION(S)
  • This application claims the priority benefit of Korean Patent Application No. 10-2016-0015726, filed on Feb. 11, 2016, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference for all purposes.
  • BACKGROUND
  • 1. Field
  • One or more example embodiments relate to an apparatus for bi-directional sign language/speech translation in real time and method, and more particularly, to an apparatus for bi-directional sign language/speech translation in real time and method that automatically translates from a sign to a speech or from a speech to a sign in real time to solve an issue of existing unidirectional or fragmentary sign language translation technology.
  • 2. Description of Related Art
  • According to global estimates released by the World Health Organization (WHO), as of February 2013, more than 360 million people in the world, a number greater than the population of the United States, have hearing loss. Further, according to statistics of the e-Nation Index, as of December 2014, about a quarter of a million people in Korea have hearing loss. Impairment caused by loss of physiological function, including hearing impairment, leads not only to physiological and functional problems but also to serious financial, social, and emotional issues, and an enormous social cost is required to address them. Accordingly, media regulation organizations all over the world have begun to treat hearing impairment as a matter of fundamental human rights and provide three major types of services to mitigate the issues caused by hearing impairment.
  • A subtitle service includes closed captioning (CC), subtitles for the hard of hearing (HoH), and subtitles for the deaf and hard of hearing (SDH), which all help a hearing-impaired person not to experience alienation in everyday life. The subtitle service is classified into two types: one adds subtitles and dubbing in various languages to video contents, and the other changes a speech into a real-time subtitle by a text interpreter and provides the real-time subtitle to a hearing-impaired person through a predetermined terminal via a server. The subtitle service may be applicable to broadcasts or performances where information is transferred in one direction, but may be difficult to apply to a service requiring bidirectional information exchange, such as everyday life. Further, the subtitle service provides pre-processed subtitle data for the provided contents, and thus a real-time service may not be guaranteed.
  • A video relay service allows a hearing-impaired person to transfer sign language information via a sign language interpreter connected through a voice over Internet protocol (VoIP) video call service; the sign language interpreter conveys a speech to a person who is not hearing-impaired, and also provides the interpretation service in the reverse order. However, the video relay service has restrictions in that a connection with the sign language interpreter over a network is needed to receive the service, and a relatively long time is required to translate a sign into a speech or a speech into a sign via the sign language interpreter, and thus it is difficult to apply to everyday conversations.
  • A gesture-based sign language translation service is a technology that recognizes a sign, for example, gesture, of a hearing-impaired person and converts the sign into a speech, and changes a speech of a person who is not hearing-impaired into a sign, thereby alleviating an inconvenience of the hearing-impaired person. However, the gesture-based sign language translation may not provide simultaneous bidirectional sign-speech translation and situation-based sign language translation, and thus the speed and accuracy of sign language translation may decrease.
  • Accordingly, a method for real-time automatic speech-sign translation is provided herein to alleviate inconveniences that people with hearing loss experience in everyday life.
  • SUMMARY
  • An aspect provides an apparatus for bi-directional sign language/speech translation in real time and method that may automatically translate a sign into a speech or a speech into a sign using a body-attached device, for example, smart glasses, to solve existing issues and alleviate an inconvenience of a hearing-impaired person in everyday life.
  • Another aspect also provides a gesture-based apparatus for bi-directional sign language/speech translation in real time and method that may translate a sign to a speech, translate a speech to a sign through a separate process, and collect/analyze location and surrounding environment information of a user, thereby improving the speed and accuracy of sign language translation.
  • According to an aspect, there is provided an apparatus for bi-directional sign language/speech translation in real time including a pattern analyzer configured to analyze a used pattern of sign language by a user, a speech-sign outputter configured to recognize a speech externally made through a microphone and output a sign corresponding to the speech, and a sign-speech outputter configured to recognize a sign sensed through a camera and output a speech corresponding to the sign.
  • The speech-sign outputter may include a speech recognizer configured to recognize the speech through the microphone and remove noise from the speech, an index generator configured to generate a sign index to translate into a sign corresponding to the noise-removed speech, and a sign outputter configured to output the sign corresponding to the speech based on the generated sign index.
  • The index generator may include a speech-text converter configured to convert the recognized speech into a text using a predefined sign language dictionary, and an index determiner configured to determine a sign index with respect to the text based on the text and the used pattern of sign language by the user.
  • The pattern analyzer may be configured to analyze the used pattern of sign language by the user by analyzing at least one of location information of the user, surrounding environment information of the user corresponding to the location information, a life pattern of the user, or a behavior pattern of the user.
  • The sign outputter may include a display mode controller configured to control an output based on a display mode to display one of a sign and a text corresponding to the recognized speech, and an outputter configured to output a sign mapped to the generated sign index or a text corresponding to the sign mapped to the generated sign index based on the display mode.
  • The display mode controller may be configured to control the display mode of the outputter based on a sign display event or a generation period of the sign mapped to the sign index.
  • The outputter may be configured to synchronize a sign or a text to the sign index based on information transferred based on the display mode and output the synchronized sign or text on a display of the apparatus for bi-directional sign language/speech translation in real time.
  • The sign-speech outputter may include a sign recognizer configured to recognize the sign sensed through the camera, an index generator configured to generate a sign index to translate into the speech corresponding to the recognized sign, and a speech outputter configured to output the speech corresponding to the sign based on the generated sign index.
  • The sign recognizer may include a wearing sensor configured to sense whether the user is wearing gesture gloves, and a sign collector configured to collect the sign based on whether the user is wearing the gesture gloves.
  • The sign collector may be configured to collect the sign by removing a background from a color image acquired by the camera when the user is not wearing the gesture gloves.
  • The sign collector may be configured to collect the sign based on a finger motion of the user collected from the camera when the user is wearing the gesture gloves.
  • The index generator may include an index determiner configured to determine the sign index with respect to the recognized sign using a predefined sign language dictionary, a sign-text converter configured to convert the recognized sign to a text based on the determined sign index, and a sentence generator configured to generate a sentence associated with the sign with respect to the text through a keyword combination corresponding to the text and the used pattern of sign language by the user.
  • The speech outputter may be configured to output a speech corresponding to the sentence associated with the sign with respect to the text.
  • According to another aspect, there is also provided a method for bi-directional sign language/speech translation in real time performed by an apparatus for bi-directional sign language/speech translation in real time, the method including analyzing a used pattern of sign language by a user who uses the apparatus for bi-directional sign language/speech translation in real time, recognizing a sign or speech externally made by the user through a camera or a microphone, identifying the sign or speech of the user recognized through the camera or the microphone, and outputting a speech corresponding to the sign or a sign corresponding to the speech through a different translation path based on a result of the identifying.
  • The outputting may include removing noise from the speech when the speech of the user is identified, generating a sign index to translate into a sign corresponding to the recognized speech, and outputting the sign corresponding to the speech based on the generated sign index.
  • The generating may include converting the recognized speech into a text using a predefined sign language dictionary, and determining a sign index with respect to the text based on the text and the used pattern of sign language by the user.
  • The outputting of the sign corresponding to the speech may include controlling whether to display a sign or a text based on a display mode to display one of a sign and a text corresponding to the recognized speech, and outputting a sign mapped to the generated sign index or a text corresponding to the sign mapped to the generated sign index based on the display mode.
  • The outputting of the speech corresponding to the sign may include recognizing the sign when the sign of the user is identified, generating a sign index to translate into the speech corresponding to the recognized sign, and outputting the speech corresponding to the sign based on the generated sign index.
  • The recognizing may include sensing whether the user is wearing gesture gloves, and collecting the sign based on whether the user is wearing the gesture gloves.
  • The generating may include determining the sign index with respect to the recognized sign using a predefined sign language dictionary, converting the recognized sign to a text based on the determined sign index, and generating a sentence associated with the sign with respect to the text through a keyword combination corresponding to the text and the used pattern of sign language by the user.
  • Additional aspects of example embodiments will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the disclosure.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • These and/or other aspects, features, and advantages of the invention will become apparent and more readily appreciated from the following description of example embodiments, taken in conjunction with the accompanying drawings of which:
  • FIG. 1 is a diagram illustrating an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment;
  • FIG. 2 is a block diagram illustrating a speech-sign outputter in an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment;
  • FIG. 3 is a block diagram illustrating a sign-speech outputter in an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment;
  • FIG. 4 is a block diagram illustrating an operation of analyzing a used pattern of sign language by a user according to an example embodiment; and
  • FIG. 5 is a diagram illustrating an operation of an apparatus for bi-directional sign language/speech translation in real time interoperating with a network according to an example embodiment.
  • DETAILED DESCRIPTION
  • Hereinafter, some example embodiments will be described in detail with reference to the accompanying drawings. Regarding the reference numerals assigned to the elements in the drawings, it should be noted that the same elements will be designated by the same reference numerals, wherever possible, even though they are shown in different drawings. Also, in the description of embodiments, detailed description of well-known related structures or functions will be omitted when it is deemed that such description will cause ambiguous interpretation of the present disclosure.
  • FIG. 1 is a diagram illustrating an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment.
  • Referring to FIG. 1, an apparatus 101 for bi-directional sign language/speech translation in real time may translate a speech into a sign or a sign into a speech in real time, and output a result of translation to provide a convenience to a user who uses a sign language. Here, the apparatus 101 for bi-directional sign language/speech translation in real time may be an apparatus to perform sign-speech translation with a head-mounted display (HMD). For example, the apparatus 101 for bi-directional sign language/speech translation in real time may include a smart terminal.
  • In detail, the apparatus 101 for bi-directional sign language/speech translation in real time may recognize a speech or sign externally made by a user through a microphone 106 or a camera 107. The apparatus 101 for bi-directional sign language/speech translation in real time may identify the recognized speech or sign of the user, and translate the speech into a sign or the sign into a speech based on a result of the identifying. In this example, the apparatus 101 for bi-directional sign language/speech translation in real time may perform translation through a different translation path based on the result of the identifying.
  • To achieve the foregoing, the apparatus 101 for bi-directional sign language/speech translation in real time may include a pattern analyzer 102, a speech-sign outputter 103, and a sign-speech outputter 104. When a speech of the user is recognized, the speech-sign outputter 103 may translate the speech into a sign, and output a result of translation, for example, the sign. Conversely, when a sign of the user is recognized, the sign-speech outputter 104 may translate the sign into a speech, and output a result of translation, for example, the speech. In detail, the apparatus 101 for bi-directional sign language/speech translation in real time may perform duplex translation from a speech into a sign or from a sign into a speech by separately performing a process of translating a speech into a sign and a process of translating a sign into a speech. The operation of translating a speech into a sign and the operation of translating a sign into a speech will be described in detail with reference to FIGS. 2 and 3, respectively.
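To make the separation of the two translation paths concrete, the following Python sketch shows one way the apparatus could route microphone and camera input to the speech-sign and sign-speech outputters. The class and method names (SignSpeechTranslator, translate, current_pattern) are illustrative assumptions, not terms taken from the patent.

```python
class SignSpeechTranslator:
    """Illustrative top-level dispatcher; component interfaces are assumptions."""

    def __init__(self, pattern_analyzer, speech_sign_outputter, sign_speech_outputter):
        self.pattern_analyzer = pattern_analyzer
        self.speech_sign = speech_sign_outputter    # speech -> sign path
        self.sign_speech = sign_speech_outputter    # sign -> speech path

    def handle_input(self, audio=None, frame=None):
        # Identify whether the input is a speech (microphone) or a sign (camera)
        # and route it through the corresponding, separately implemented path.
        usage_pattern = self.pattern_analyzer.current_pattern()
        if audio is not None:
            return self.speech_sign.translate(audio, usage_pattern)
        if frame is not None:
            return self.sign_speech.translate(frame, usage_pattern)
        return None
```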
  • The apparatus 101 for bi-directional sign language/speech translation in real time may use a body-attached device 105 to recognize the speech or sign externally made by the user. The apparatus 101 for bi-directional sign language/speech translation in real time may interoperate with the body-attached device 105, or operate in the body-attached device 105 depending on situations. In detail, the apparatus 101 for bi-directional sign language/speech translation in real time may be configured separately from the body-attached device 105 to translate a speech into a sign or a sign into a speech by interoperating with the body-attached device 105. In another example, the apparatus 101 for bi-directional sign language/speech translation in real time may be configured to be included in the body-attached device 105, in detail, to operate in the body-attached device 105, to translate a speech into a sign or a sign into a speech in real time.
  • The body-attached device 105 may include a microphone, a speaker, a display device, and a camera, and may be implemented in a wearable form to be attached to a body of the user. For example, the body-attached device 105 may be implemented as a device attachable to a body of the user, for example, an eyewear type device or a watch type device.
  • The pattern analyzer 102 may analyze a used pattern of sign language by the user to improve the accuracy and speed of real-time speech-sign or sign-speech translation. The used pattern of sign language by the user may include, for example, location information of the user, surrounding environment information of the user corresponding to the location information, a life pattern of the user, and a behavior pattern of the user. The apparatus 101 for bi-directional sign language/speech translation in real time may translate a speech or a sign based on the analyzed used pattern of sign language by the user, thereby minimizing unnecessary sign translation and improving the accuracy and speed of translation.
  • The apparatus 101 for bi-directional sign language/speech translation in real time may predict information to be used for sign-speech translation by analyzing the life pattern of the user and inputting/analyzing the location information and the surrounding environment information, thereby guaranteeing the accuracy of translation content and real-time sign-speech translation. A configuration for the foregoing will be described in detail with reference to FIG. 4.
  • The apparatus 101 for bi-directional sign language/speech translation in real time may perform duplex sign-speech and speech-sign translation in real time with the body-attached device 105 to solve an issue of unidirectional or fragmentary sign language translation technology.
  • Further, the apparatus 101 for bi-directional sign language/speech translation in real time may include a translation path for sign-speech translation and a translation path for speech-sign translation separately, thereby alleviating an inconvenience in the existing unidirectional or fragmentary sign language translation technology.
  • FIG. 2 is a block diagram illustrating a speech-sign outputter in an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment.
  • Referring to FIG. 2, an apparatus 201 for bi-directional sign language/speech translation in real time may include a pattern analyzer 202, a speech-sign outputter 203, and a sign-speech outputter 204. The speech-sign outputter 203 may translate a speech into a sign, and output a result of translation, for example, the sign. The sign-speech outputter 204 may translate a sign into a speech, and output a result of translation, for example, the speech. Hereinafter, a process of translating a speech into a sign and outputting a result of translation, for example, the sign, will be described in detail based on the speech-sign outputter 203.
  • In detail, the pattern analyzer 202 may analyze a used pattern of sign language by a user who is wearing the apparatus 201 for bi-directional sign language/speech translation in real time. The pattern analyzer 202 may analyze the used pattern of sign language including at least one of location information of the user, surrounding environment information of the user corresponding to the location information, a life pattern of the user, or a behavior pattern of the user. The configuration of the pattern analyzer 202 will be described in detail with reference to FIG. 4.
  • The speech-sign outputter 203 may include a speech recognizer 205, an index generator 206, and a sign outputter 209 to perform an operation of translating a speech into a sign and outputting the sign.
  • The speech recognizer 205 may recognize a speech through a microphone. Here, the speech recognizer 205 may recognize the speech collected through the microphone included in a body-attached device. In detail, the body-attached device may collect a speech externally made through the microphone, and transfer the collected speech to the speech recognizer 205. The speech recognizer 205 may recognize the speech received from the body-attached device. The speech recognizer 205 may remove noise from the recognized speech.
  • Here, the speech recognized through the microphone may be a sound externally made, and may include a sound for speech-sign translation and ambient noise. Thus, the speech recognizer 205 may remove the noise included in the speech recognized through the microphone to extract only the speech for speech-sign translation. In an example, the speech recognizer 205 may remove the ambient noise included in the speech. Here, the ambient noise may include all sounds occurring around the user, for example, a subway sound, an automobile horn sound, a step sound, and a music sound.
  • The speech recognizer 205 may separate a speech of a user other than the user who requests speech-sign translation from the noise-removed speech. In detail, the speech recognized through the microphone may include ambient noise and a speech of a third party located adjacent to the user, as described above. Thus, the speech recognizer 205 may separate the speech of the third party, except for the user, from the noise-removed sound, thereby increasing the speech recognition accuracy for speech-sign translation.
  • The speech recognizer 205 may generate a speech including only an intrinsic sound of the user who requests translation, by filtering out the ambient noise and the speech of the third party in the speech recognized through the microphone.
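As a rough illustration of this noise-removal step, the sketch below estimates a noise floor from the leading audio frames and gates out frames below it. A real system would use a proper denoiser and speaker separation (which is omitted here); all names and thresholds are assumptions.

```python
import numpy as np

def extract_user_speech(samples, rate, energy_floor=0.01):
    """Crude stand-in for the noise-removal step: estimate a noise floor from the
    leading frames and drop frames below it. Speaker separation is omitted."""
    frame_len = max(1, int(0.02 * rate))                       # 20 ms frames
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return samples
    noise_level = float(np.mean([np.abs(f).mean() for f in frames[:10]]))
    kept = [f for f in frames if np.abs(f).mean() > max(energy_floor, 2 * noise_level)]
    return np.concatenate(kept) if kept else samples[:0]
```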
  • The index generator 206 may generate a sign index to translate into the sign corresponding to the speech from which the noise and the speech of the third party are removed. Here, the sign index may be information to be used to generate an image associated with the sign based on the speech of the user. The index generator 206 may include a speech-text converter 207, and an index determiner 208.
  • The speech-text converter 207 may convert the recognized speech into a text using a predefined sign language dictionary. In detail, the speech-text converter 207 may perform the conversion using a speech-to-text (STT) engine. Since sign language dictionaries worldwide are defined based on text, the speech is first converted into a text and a sign corresponding to the text is then output; the text is also used to transfer, to the user, an image associated with the sign together with information about the speech. Further, the sign language dictionary may be used to minimize the amount of data to be transmitted to translate a speech into a sign, thereby improving the speed at which an image associated with the sign is generated.
  • The index determiner 208 may determine a sign index with respect to the text based on the text and the used pattern of sign language by the user. Here, the index determiner 208 may utilize the text and the used pattern of sign language in which the location information, the surrounding environment information, and the life pattern of the user are analyzed, thereby improving the speed and accuracy for sign language translation. The index determiner 208 may determine the sign index to generate the image associated with the sign through a pre-embedded sign language dictionary based on the speech corresponding to the text. Here, the index determiner 208 may determine the sign index corresponding to the speech based on more generalized information by determining a speech-text-sign index based on the sign language dictionary.
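The following sketch illustrates, under assumed data structures, how a sign index could be determined from the converted text and the analyzed used pattern: dictionary entries that match the user's inferred category are ranked first. The dictionary layout and the category field are hypothetical.

```python
# Hypothetical dictionary: word -> sign index plus the categories it usually appears in.
SIGN_DICTIONARY = {
    "coffee": {"index": 101, "categories": {"cafe"}},
    "order":  {"index": 102, "categories": {"cafe", "restaurant"}},
    "ticket": {"index": 203, "categories": {"station"}},
}

def determine_sign_indices(text, usage_pattern):
    """Map each recognized word to a sign index, preferring dictionary entries that
    match the category inferred from the used pattern of sign language."""
    current_category = usage_pattern.get("category")
    ranked = []
    for word in text.lower().split():
        entry = SIGN_DICTIONARY.get(word)
        if entry is None:
            continue                                  # out-of-dictionary words are skipped
        penalty = 0 if current_category in entry["categories"] else 1
        ranked.append((penalty, entry["index"]))
    return [index for _, index in sorted(ranked)]
```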
  • The sign outputter 209 may receive the sign index from the index generator 206, and output the sign corresponding to the speech based on the received sign index. The sign outputter 209 may provide a sign or text corresponding to content of the speech to the user based on the sign index generated based on the speech. The sign outputter 209 may include a display mode controller 210 and an outputter 211.
  • The display mode controller 210 may control an output based on a display mode to display one of the sign and the text corresponding to the recognized speech. Here, the display mode may include a sign display mode and a text display mode. The display mode controller 210 may select the display mode to display one of the sign and the text corresponding to the speech. In this example, the display mode controller 210 may control the display mode of the outputter 211 based on a sign display event or a generation period of a sign mapped to the sign index.
  • The display mode controller 210 may transfer, to the outputter 211, the text or the image associated with the sign corresponding to the sign index based on the selected display mode.
  • The outputter 211 may output the sign mapped to the generated sign index or the text corresponding to the sign mapped to the generated sign index based on the information received based on the display mode selected by the display mode controller 210. In detail, the outputter 211 may represent a speech using a sign which is convenient for a hearing-impaired person, and also display the speech using a text which has a relatively wide expression range when compared to a sign.
  • The outputter 211 may display the sign or the text corresponding to the speech in view of the limitation on information expression that occurs when the image associated with the sign is output to the user. In detail, the outputter 211 may display the sign translated from the speech through a display of the body-attached device interoperating with the apparatus 201 for bi-directional sign language/speech translation in real time. For example, the outputter 211 may display the image associated with the sign corresponding to the sign index on smart glasses. Since the smart glasses are disposed adjacent to the eyes of the user, the user may have difficulty immediately recognizing the image associated with the sign when it is output.
  • Thus, the outputter 211 may represent the speech corresponding to the sign index using the text, and output the text on the smart glasses. In a case in which the speech is represented using the text, the operation of generating the image associated with the sign based on the sign index may be omitted. Thus, information may be transferred faster than the process of transferring information using the image associated with the sign.
  • Further, the outputter 211 may synchronize the sign or the text with the sign index, thereby alleviating a user inconvenience occurring in the sign language translation process.
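A minimal sketch of the display mode control described above, assuming a simple enum for the mode and an in-memory store of sign images keyed by sign index; these structures are illustrative, not the patent's implementation.

```python
from enum import Enum

class DisplayMode(Enum):
    SIGN = "sign"   # render the sign image mapped to the sign index
    TEXT = "text"   # render plain text; skips image generation, so it is faster

def output_translation(sign_indices, text, mode, sign_image_store):
    """Return what should be drawn on the display for the selected mode."""
    if mode is DisplayMode.TEXT:
        return {"type": "text", "payload": text}
    frames = [sign_image_store[i] for i in sign_indices if i in sign_image_store]
    # Keep the indices alongside the frames so the output stays synchronized to them.
    return {"type": "sign", "payload": frames, "synchronized_to": sign_indices}
```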
  • FIG. 3 is a block diagram illustrating a sign-speech outputter in an apparatus for bi-directional sign language/speech translation in real time according to an example embodiment.
  • Referring to FIG. 3, an apparatus 301 for bi-directional sign language/speech translation in real time may include a pattern analyzer 302, a speech-sign outputter 303, and a sign-speech outputter 304. The speech-sign outputter 303 may translate a speech into a sign, and output a result of translation, for example, the sign. The sign-speech outputter 304 may translate a sign into a speech, and output a result of translation, for example, the speech. Hereinafter, a process of translating a sign into a speech and outputting a result of translation, for example, the speech, will be described in detail based on the sign-speech outputter 304.
  • In detail, the pattern analyzer 302 may analyze a used pattern of sign language by a user who is wearing the apparatus 301 for bi-directional sign language/speech translation in real time. The pattern analyzer 302 may analyze the used pattern of sign language including at least one of location information of the user, surrounding environment information of the user corresponding to the location information, a life pattern of the user, or a behavior pattern of the user. The configuration of the pattern analyzer 302 will be described in detail with reference to FIG. 4.
  • The sign-speech outputter 304 may include a sign recognizer 305, an index generator 308, and a speech outputter 312 to perform an operation of translating a sign into a speech and outputting the speech.
  • The sign recognizer 305 may recognize a sign through a camera. Here, the sign recognizer 305 may recognize the sign collected through the camera included in a body-attached device. In detail, the body-attached device may collect a sign externally sensed through the camera, and transfer the collected sign to the sign recognizer 305. The sign recognizer 305 may recognize the sign received from the body-attached device.
  • Here, the sign recognizer 305 may collect the sign from a color image or based on a finger motion, for example, a gesture, of the user depending on a sign collecting scheme. In detail, the sign recognizer 305 may recognize the sign using the color image or the gesture based on whether gesture gloves configured to directly generate a sign are used. Here, the gesture gloves may be a device worn on hands of the user to enable the camera to recognize the gesture of the user. For example, the gesture gloves may include Magic Gloves.
  • To achieve the foregoing, the sign recognizer 305 may include a wearing sensor 306, and a sign collector 307. The wearing sensor 306 may sense whether the user is wearing the gesture gloves. In this example, the wearing sensor 306 may sense whether the user is wearing the gesture gloves configured to input sign information, to control an operating state of a sign language translation apparatus to recognize a sign.
  • In detail, in a case in which the user is wearing the gesture gloves to generate sign information and a sign is extracted from the color image, a system overhead may increase, and the speed and accuracy of translation may decrease. Thus, to solve such issues, whether the user is wearing the gesture gloves may be sensed to prevent extraction of a sign from the color image when the user is wearing the gesture gloves.
  • The sign collector 307 may collect the sign based on whether the user is wearing the gesture gloves. In detail, in a case in which the user is not wearing the gesture gloves, the sign collector 307 may collect the sign by removing a background from the color image acquired by the camera. The sign collector 307 may receive the color image from a depth sensor or an RGB camera. The sign collector 307 may remove unnecessary information which is unrelated to the sign, for example, the background, from the color image. That is, in the case in which the user is not wearing the gesture gloves, the sign collector 307 may collect the sign by extracting information related to the sign from the color image.
  • Conversely, in a case in which the user is wearing the gesture gloves, the sign collector 307 may collect the sign based on the finger motion of the user collected from the camera. In detail, the sign collector 307 may receive information related to a hand motion or a finger motion of a human using a device such as Magic Gloves. That is, the sign collector 307 may directly collect a gesture of the user as the information related to the sign using the gesture gloves in the case in which the user is wearing the gesture gloves.
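The glove/no-glove branching can be sketched as follows, assuming the no-glove path uses a crude static-background subtraction in place of a full segmentation pipeline; the threshold, data shapes, and return format are assumptions for illustration.

```python
import numpy as np

def collect_sign(wearing_gloves, glove_frames=None, color_image=None, background=None):
    """Collect sign data from gloves when worn; otherwise subtract a static
    background from the color image before handing it to sign recognition."""
    if wearing_gloves:
        # Glove data already encodes the finger motion, so the image pipeline is skipped.
        return {"source": "gloves", "features": glove_frames}
    diff = np.abs(color_image.astype(np.int16) - background.astype(np.int16)).sum(axis=-1)
    mask = diff > 30                                   # crude foreground mask; threshold is arbitrary
    return {"source": "camera", "features": color_image * mask[..., None]}
```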
  • The index generator 308 may generate a sign index to translate into the speech corresponding to the recognized sign. To achieve the foregoing, the index generator 308 may include an index determiner 309, a sign-text converter 310, and a sentence generator 311.
  • The index determiner 309 may determine the sign index with respect to the recognized sign using a predefined sign language dictionary. The index determiner 309 may recognize the sign of the user, and determine the sign index to generate a text using the sign language dictionary stored in a form of a database.
  • The sign-text converter 310 may convert the sign into a text based on the determined sign index.
  • The sentence generator 311 may generate a sentence associated with the sign with respect to the text through a keyword combination corresponding to the text and the used pattern of sign language by the user. Here, the sign may have a relatively narrow expression range when compared to a method of expressing using a speech or a text, and thus there is a limitation to expression using a sentence. Thus, to solve such an issue, a function to automatically generate a sentence based on a text may be provided.
  • In detail, the sentence generator 311 may generate the sentence associated with the sign by performing a keyword combination with everyday conversation sentences stored through the pattern analyzer 302 based on the text generated by the sign-text converter 310. In this example, the sentence generator 311 may predict information to be used for sign-speech translation by analyzing a life pattern of the user, and inputting/analyzing location information and surrounding environment information, thereby guaranteeing the accuracy of translation content and real-time sign-speech translation.
  • The speech outputter 312 may convert the sentence generated by the sentence generator 311 into a speech, and transfer the speech to the user. In a case of keyword-based operation, the speech outputter 312 may apply a sentence-speech conversion method such as a TTS engine to output the speech corresponding to the sentence. The speech outputter 312 may convert a digital speech generated by the sentence-speech conversion method into an analog speech through digital-to-analog (D/A) conversion, and output the analog speech to the user.
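As an illustration of the keyword-combination and speech-output steps, the sketch below fills an everyday-conversation template selected by the user's inferred category and hands the sentence to a TTS engine. The template store and the synthesize() call are assumed interfaces, not part of the patent.

```python
# Hypothetical per-category templates standing in for the stored everyday sentences.
SENTENCE_TEMPLATES = {
    "cafe": ["I would like to order {keywords}, please.", "How much is {keywords}?"],
}

def generate_sentence(keywords, usage_pattern):
    """Combine sign keywords with a category-specific everyday sentence."""
    templates = SENTENCE_TEMPLATES.get(usage_pattern.get("category"), ["{keywords}"])
    return templates[0].format(keywords=" ".join(keywords))

def output_speech(sentence, tts_engine):
    # The TTS engine (synthesize() is an assumed interface) produces digital audio;
    # D/A conversion and playback are left to the audio hardware.
    return tts_engine.synthesize(sentence)
```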
  • Here, smart glasses may be used as a body-attached device to transfer translated sign information to the user. In this example, since the smart glasses are a device to be attached to a body, the smart glasses may have a display of a restricted size to transfer sign information. Further, when a text and a sign are projected simultaneously on the display of the smart glasses, the user may experience confusion and fatigue rapidly.
  • Thus, to solve such issues, a type of information to be projected on the smart glasses, for example, a text or a sign, may be selected based on a user input, for example, a sign or a speech, and a result may be output accordingly.
  • Also, user convenience-related information, for example, sign information generation speed and period that the user may feel most comfortable with, may be set, and a result may be output accordingly.
  • FIG. 4 is a block diagram illustrating an operation of analyzing a used pattern of sign language by a user according to an example embodiment.
  • Referring to FIG. 4, the pattern analyzer 102 included in the apparatus 101 for bi-directional sign language/speech translation in real time may analyze a used pattern of sign language by a user. In general, the user is likely to be in a similar space at a similar time of day. Accordingly, the used pattern of sign language by the user may be analyzed in view of the spatial/temporal correlation of the user's behavior.
  • In detail, the pattern analyzer 102 may infer a life pattern to be observed in the near future by analyzing a past life pattern of the user. The pattern analyzer 102 may prepare information to be used for sign translation based on a result of inference in advance, thereby increasing the speed of sign translation and guaranteeing real-time sign translation. Further, the pattern analyzer 102 may increase the accuracy of sign translation by correcting an uncertain speech signal based on related information.
  • To achieve the foregoing, the pattern analyzer 102 may include a time information collector 401, a life pattern analyzer 402, a location information collector 403, a surrounding environment information collector 404, a surrounding environment information analyzer 405, a sign category information generator 406, a sign category keyword comparator 407, and a sign category database (DB) 408.
  • The life pattern analyzer 402 may accumulate and manage time information input through the time information collector 401 and current location information of the user input through the location information collector 403. The life pattern analyzer 402 may analyze a past behavior pattern of the user based on accumulated information of the time information and the current location information of the user, and infer a life pattern expected to be observed in the near future based on the past behavior pattern.
  • The surrounding environment information analyzer 405 may infer a type of a current space in which the user is located by analyzing image and sound information of the current space collected from the surrounding environment information collector 404. For example, in a case in which “coffee” is extracted as a keyword by analyzing the image and sound information input from the surrounding environment information collector 404, the surrounding environment information analyzer 405 may infer that the current space corresponds to a café, a coffee shop, or a teahouse.
  • The sign category information generator 406 may extract a sign information category to be used in the near future by analyzing the past life pattern and the surrounding environment information of the user provided by the life pattern analyzer 402 and the surrounding environment information analyzer 405. In detail, in a case in which the user ordered a coffee at a café at the same time in the past and the current surrounding environment information indicates that the user is ordering a coffee, the sign category information generator 406 may generate a category of ordering a coffee at a café. The generated category information may be transmitted to the sign outputter 209 of FIG. 2 or the speech outputter 312 of FIG. 3 through the sign category DB 408, converted into a speech, a text, or a sign, and transferred to the user or a speaker.
  • The sign category keyword comparator 407 may perform an operation of solving an issue resulting from an error in a sign category. In detail, in a case in which the user does not show the same behavior pattern all the time and the analyzed surrounding environment information is unclear, a sign category to be inferred may include an error. To solve the issue, the sign category keyword comparator 407 may be provided.
  • The sign category keyword comparator 407 may compare the information input from the speech-text converter 207 of FIG. 2 or the sign-text converter 310 of FIG. 3 with the inferred sign category information. In a case in which the input information does not match the inferred sign category, the sign category keyword comparator 407 may determine the inferred sign category to be incorrect. The sign category keyword comparator 407 may transmit a signal related to the determination to the sign outputter 209 or the speech outputter 312, and block a speech, a text, or a sign image to be output from the sign category DB 408.
  • The sign outputter 209 or the speech outputter 312 receiving the signal related to the determination may block the information received from the sign category DB 408, and output information related to at least one of the speech, the text, or the sign.
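A compact sketch of the category inference and keyword check described above, assuming accumulated (hour, place, category) history tuples and a category database mapping each category to its expected keywords; all structures and the "coffee" example are illustrative.

```python
def infer_category(history, environment_keywords):
    """history: accumulated (hour, place, category) tuples; environment_keywords:
    keywords extracted from current image/sound. Returns the most likely category."""
    votes = {}
    for _hour, _place, category in history:
        votes[category] = votes.get(category, 0) + 1
    if "coffee" in environment_keywords:               # illustrative environment evidence
        votes["cafe"] = votes.get("cafe", 0) + 2
    return max(votes, key=votes.get) if votes else None

def category_matches_keywords(inferred_category, recognized_keywords, category_db):
    """Return False when none of the actually recognized keywords belong to the
    inferred category, in which case the category-based output should be blocked."""
    expected = category_db.get(inferred_category, set())
    return bool(expected & set(recognized_keywords))
```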
  • Here, the information collected from the location information collector 403 and the surrounding environment information collector 404 may be reported continuously through a network connected to the apparatus 101 for bi-directional sign language/speech translation in real time.
  • FIG. 5 is a diagram illustrating an operation of an apparatus for bi-directional sign language/speech translation in real time connected to a network according to an example embodiment.
  • Referring to FIG. 5, an apparatus 502 for bi-directional sign language/speech translation in real time may be connected to a network to share information with a user behavior analyzing server 501 which manages image and sound information of a space in which a user is currently located and current location information of the user.
  • In detail, to perform a method for bi-directional sign language/speech translation in real time, the user behavior analyzing server 501, the apparatus 502 for bi-directional sign language/speech translation in real time, smart glasses 503, and gesture gloves 504 may be provided. The user behavior analyzing server 501, the apparatus 502 for bi-directional sign language/speech translation in real time, the smart glasses 503, and the gesture gloves 504 may be connected to one another through a personal area network (PAN). The apparatus 502 for bi-directional sign language/speech translation in real time, for example, a smart terminal, may form a body area network (BAN) or a PAN with the smart glasses 503 and the gesture gloves 504, for example, body-attached devices. The apparatus 502 for bi-directional sign language/speech translation in real time may be connected to the user behavior analyzing server 501 through the Internet, for example, wired/wireless network.
  • The user behavior analyzing server 501 may analyze a past behavior pattern of the user, and transmit the analyzed information to the apparatus 502 for bi-directional sign language/speech translation in real time, for example, the smart terminal. Information collected by the location information collector 403 and the surrounding environment information collector 404 of FIG. 4 may be retained in the apparatus 502 for bi-directional sign language/speech translation in real time, for example, the smart terminal, and also transmitted to the user behavior analyzing server 501. In this example, the apparatus 502 for bi-directional sign language/speech translation in real time may not store and analyze a large volume of data due to limited hardware resources such as a memory and a processor.
  • Thus, a long-term analysis of the user's past behavior may be performed by the user behavior analyzing server 501, and a short-term analysis may be performed by the apparatus 502 for bi-directional sign language/speech translation in real time. The past behavior information analyzed by the user behavior analyzing server 501 may vary depending on user settings. By default, however, user past behavior analysis information corresponding to one day may be transferred to the apparatus 502 for bi-directional sign language/speech translation in real time.
  • According to one or more example embodiments, an apparatus for bi-directional sign language/speech translation in real time and method may alleviate an inconvenience that a hearing-impaired person experiences during communication in everyday life, thereby inducing the hearing-impaired person to live a normal social life and reducing a social cost to be used to solve issues caused by hearing impairment.
  • According to one or more example embodiments, an apparatus for bi-directional sign language/speech translation in real time and method may be applicable to a wide range of environments where normal speech communication is impossible, for example, an environment where military operations are carried out in silence, or an environment in which communication is impossible due to serious noise.
  • The methods according to the above-described example embodiments may be recorded in non-transitory computer-readable media including program instructions to implement various operations of the above-described example embodiments. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded on the media may be those specially designed and constructed for the purposes of example embodiments, or they may be of the kind well-known and available to those having skill in the computer software arts.
  • A number of example embodiments have been described above. Nevertheless, it should be understood that various modifications may be made to these example embodiments. For example, suitable results may be achieved if the described techniques are performed in a different order and/or if components in a described system, architecture, device, or circuit are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Accordingly, other implementations are within the scope of the following claims.

Claims (20)

What is claimed is:
1. An apparatus for bi-directional sign language/speech translation in real time comprising:
a pattern analyzer configured to analyze a used pattern of sign language by a user;
a speech-sign outputter configured to recognize a speech externally made through a microphone and output a sign corresponding to the speech; and
a sign-speech outputter configured to recognize a sign sensed through a camera and output a speech corresponding to the sign.
2. The apparatus for bi-directional sign language/speech translation in real time of claim 1, wherein the speech-sign outputter comprises:
a speech recognizer configured to recognize the speech through the microphone and remove noise from the speech;
an index generator configured to generate a sign index to translate into a sign corresponding to the noise-removed speech; and
a sign outputter configured to output the sign corresponding to the speech based on the generated sign index.
3. The apparatus for bi-directional sign language/speech translation in real time of claim 2, wherein the index generator comprises:
a speech-text converter configured to convert the recognized speech into a text using a predefined sign language dictionary; and
an index determiner configured to determine a sign index with respect to the text based on the text and the used pattern of sign language by the user.
4. The apparatus for bi-directional sign language/speech translation in real time of claim 1, wherein the pattern analyzer is configured to analyze the used pattern of sign language by the user by analyzing at least one of location information of the user, surrounding environment information of the user corresponding to the location information, a life pattern of the user, or a behavior pattern of the user.
5. The apparatus for bi-directional sign language/speech translation in real time of claim 2, wherein the sign outputter comprises:
a display mode controller configured to control an output based on a display mode to display one of a sign and a text corresponding to the recognized speech; and
an outputter configured to output a sign mapped to the generated sign index or a text corresponding to the sign mapped to the generated sign index based on the display mode.
6. The apparatus for bi-directional sign language/speech translation in real time of claim 5, wherein the display mode controller is configured to control the display mode of the outputter based on a sign display event or a generation period of the sign mapped to the sign index.
7. The apparatus for bi-directional sign language/speech translation in real time of claim 6, wherein the outputter is configured to synchronize a sign or a text to the sign index based on information transferred based on the display mode and output the synchronized sign or text on a display of the apparatus for bi-directional sign language/speech translation in real time.
8. The apparatus for bi-directional sign language/speech translation in real time of claim 1, wherein the sign-speech outputter comprises:
a sign recognizer configured to recognize the sign sensed through the camera;
an index generator configured to generate a sign index to translate into the speech corresponding to the recognized sign; and
a speech outputter configured to output the speech corresponding to the sign based on the generated sign index.
9. The apparatus for bi-directional sign language/speech translation in real time of claim 8, wherein the sign recognizer comprises:
a wearing sensor configured to sense whether the user is wearing gesture gloves; and
a sign collector configured to collect the sign based on whether the user is wearing the gesture gloves.
10. The apparatus for bi-directional sign language/speech translation in real time of claim 9, wherein the sign collector is configured to collect the sign by removing a background from a color image acquired by the camera when the user is not wearing the gesture gloves.
11. The apparatus for bi-directional sign language/speech translation in real time of claim 9, wherein the sign collector is configured to collect the sign based on a finger motion of the user collected from the camera when the user is wearing the gesture gloves.
12. The apparatus for bi-directional sign language/speech translation in real time of claim 8, wherein the index generator comprises:
an index determiner configured to determine the sign index with respect to the recognized sign using a predefined sign language dictionary;
a sign-text converter configured to convert the recognized sign to a text based on the determined sign index; and
a sentence generator configured to generate a sentence associated with the sign with respect to the text through a keyword combination corresponding to the text and the used pattern of sign language by the user.
13. The apparatus for bi-directional sign language/speech translation in real time of claim 11, wherein the speech outputter is configured to output a speech corresponding to the sentence associated with the sign with respect to the text.
14. A method for bi-directional sign language/speech translation in real time performed by an apparatus for bi-directional sign language/speech translation in real time, the method comprising:
analyzing a used pattern of sign language by a user who uses the apparatus for bi-directional sign language/speech translation in real time;
recognizing a sign or speech externally made by the user through a camera or a microphone;
identifying the sign or speech of the user recognized through the camera or the microphone; and
outputting a speech corresponding to the sign or a sign corresponding to the speech through a different translation path based on a result of the identifying.
15. The method for bi-directional sign language/speech translation in real time of claim 14, wherein the outputting comprises:
removing noise from the speech when the speech of the user is identified;
generating a sign index to translate into a sign corresponding to the recognized speech; and
outputting the sign corresponding to the speech based on the generated sign index.
16. The method for bi-directional sign language/speech translation in real time of claim 15, wherein the generating comprises:
converting the recognized speech into a text using a predefined sign language dictionary; and
determining a sign index with respect to the text based on the text and the used pattern of sign language by the user.
17. The method for bi-directional sign language/speech translation in real time of claim 15, wherein the outputting of the sign corresponding to the speech comprises:
controlling whether to display a sign or a text based on a display mode to display one of a sign and a text corresponding to the recognized speech; and
outputting a sign mapped to the generated sign index or a text corresponding to the sign mapped to the generated sign index based on the display mode.
18. The method for bi-directional sign language/speech translation in real time of claim 14, wherein the outputting of the speech corresponding to the sign comprises:
recognizing the sign when the sign of the user is identified;
generating a sign index to translate into the speech corresponding to the recognized sign; and
outputting the speech corresponding to the sign based on the generated sign index.
19. The method for bi-directional sign language/speech translation in real time of claim 18, wherein the recognizing comprises:
sensing whether the user is wearing gesture gloves; and
collecting the sign based on whether the user is wearing the gesture gloves.
20. The method for bi-directional sign language/speech translation in real time of claim 18, wherein the generating comprises:
determining the sign index with respect to the recognized sign using a predefined sign language dictionary;
converting the recognized sign to a text based on the determined sign index; and
generating a sentence associated with the sign with respect to the text through a keyword combination corresponding to the text and the used pattern of sign language by the user.
US15/188,099 2016-02-11 2016-06-21 Apparatus for bi-directional sign language/speech translation in real time and method Active 2036-08-11 US10089901B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2016-0015726 2016-02-11
KR1020160015726A KR102450803B1 (en) 2016-02-11 2016-02-11 Duplex sign language translation apparatus and the apparatus for performing the duplex sign language translation method

Publications (2)

Publication Number Publication Date
US20170236450A1 true US20170236450A1 (en) 2017-08-17
US10089901B2 US10089901B2 (en) 2018-10-02

Family

ID=59561676

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/188,099 Active 2036-08-11 US10089901B2 (en) 2016-02-11 2016-06-21 Apparatus for bi-directional sign language/speech translation in real time and method

Country Status (2)

Country Link
US (1) US10089901B2 (en)
KR (1) KR102450803B1 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075659A1 (en) * 2016-09-13 2018-03-15 Magic Leap, Inc. Sensory eyewear
US20180293986A1 (en) * 2017-04-11 2018-10-11 Sharat Chandra Musham Worn device for conversational-speed multimedial translation between two individuals and verbal representation for its wearer
CN108766434A (en) * 2018-05-11 2018-11-06 东北大学 A kind of Sign Language Recognition translation system and method
WO2019157344A1 (en) * 2018-02-12 2019-08-15 Avodah Labs, Inc. Real-time gesture recognition method and apparatus
US20190315227A1 (en) * 2018-04-17 2019-10-17 Hyundai Motor Company Vehicle including communication system for disabled person and control method of communication system for disabled person
US10521264B2 (en) 2018-02-12 2019-12-31 Avodah, Inc. Data processing architecture for improved data flow
US10599921B2 (en) 2018-02-12 2020-03-24 Avodah, Inc. Visual language interpretation system and user interface
US10776617B2 (en) * 2019-02-15 2020-09-15 Bank Of America Corporation Sign-language automated teller machine
US10902219B2 (en) 2018-11-21 2021-01-26 Accenture Global Solutions Limited Natural language processing based sign language generation
US20210043110A1 (en) * 2019-08-06 2021-02-11 Korea Electronics Technology Institute Method, apparatus, and terminal for providing sign language video reflecting appearance of conversation partner
CN112861827A (en) * 2021-04-08 2021-05-28 中国科学技术大学 Sign language translation method and system using single language material translation
CN112906498A (en) * 2021-01-29 2021-06-04 中国科学技术大学 Sign language action recognition method and device
US11036973B2 (en) 2018-02-12 2021-06-15 Avodah, Inc. Visual sign language translation training device and method
US11087488B2 (en) 2018-02-12 2021-08-10 Avodah, Inc. Automated gesture identification using neural networks
CN113378586A (en) * 2021-07-15 2021-09-10 北京有竹居网络技术有限公司 Speech translation method, translation model training method, device, medium, and apparatus
CN113780013A (en) * 2021-07-30 2021-12-10 阿里巴巴(中国)有限公司 Translation method, translation equipment and readable medium
US20220327294A1 (en) * 2021-12-24 2022-10-13 Sandeep Dhawan Real-time speech-to-speech generation (rssg) and sign language conversion apparatus, method and a system therefore
WO2022254432A1 (en) * 2021-06-01 2022-12-08 Livne Nimrod Yaakov A sign language translation method and system thereof
GB2616719A (en) * 2022-01-27 2023-09-20 Snaggnificent Products Inc Wireless headset and tablet sign language communication system and method
US20230306207A1 (en) * 2022-03-22 2023-09-28 Charles University, Faculty Of Mathematics And Physics Computer-Implemented Method Of Real Time Speech Translation And A Computer System For Carrying Out The Method
WO2024015352A1 (en) * 2022-07-11 2024-01-18 Lucca Ventures, Inc. Methods and systems for real-time translation

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102023356B1 (en) * 2017-12-07 2019-09-23 한국생산기술연구원 Wearable sign language translation device
KR102115551B1 (en) 2019-08-06 2020-05-26 전자부품연구원 Sign language translation apparatus using gloss and translation model learning apparatus
KR102174922B1 (en) 2019-08-06 2020-11-05 한국전자기술연구원 Interactive sign language-voice translation apparatus and voice-sign language translation apparatus reflecting user emotion and intention
KR102314710B1 (en) * 2019-12-19 2021-10-19 이우준 System sign for providing language translation service for the hearing impaired person
CN111461005B (en) * 2020-03-31 2023-11-28 腾讯科技(深圳)有限公司 Gesture recognition method and device, computer equipment and storage medium
KR102406921B1 (en) * 2020-12-04 2022-06-10 (주)딥인사이트 Optical device with Monolithic Architecture, and production method thereof
US11587362B2 (en) * 2020-12-16 2023-02-21 Lenovo (Singapore) Pte. Ltd. Techniques for determining sign language gesture partially shown in image(s)
WO2022264165A1 (en) * 2021-06-13 2022-12-22 Karnataki Aishwarya A portable assistive device for challenged individuals
KR102395410B1 (en) 2021-09-02 2022-05-10 주식회사 라젠 System and method for providing sign language avatar using non-marker
KR102399683B1 (en) 2021-09-02 2022-05-20 주식회사 라젠 System for 3D sign language learning using depth information and sign language providing method using same

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059578A1 (en) * 2006-09-06 2008-03-06 Jacob C Albertson Informing a user of gestures made by others out of the user's line of sight
US20090254868A1 (en) * 2008-04-04 2009-10-08 International Business Machine Translation of gesture responses in a virtual world
US20090306981A1 (en) * 2008-04-23 2009-12-10 Mark Cromack Systems and methods for conversation enhancement
US20100199228A1 (en) * 2009-01-30 2010-08-05 Microsoft Corporation Gesture Keyboarding
US20110289456A1 (en) * 2010-05-18 2011-11-24 Microsoft Corporation Gestures And Gesture Modifiers For Manipulating A User-Interface
US20150379896A1 (en) * 2013-12-05 2015-12-31 Boe Technology Group Co., Ltd. Intelligent eyewear and control method thereof

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659764A (en) 1993-02-25 1997-08-19 Hitachi, Ltd. Sign language generation apparatus and sign language translation apparatus
JPH11184370A (en) * 1997-04-17 1999-07-09 Matsushita Electric Ind Co Ltd Finger language information presenting device
US5974116A (en) 1998-07-02 1999-10-26 Ultratec, Inc. Personal interpreter
AU2571900A (en) * 1999-02-16 2000-09-04 Yugen Kaisha Gm&M Speech converting device and method
JP4332649B2 (en) * 1999-06-08 2009-09-16 独立行政法人情報通信研究機構 Hand shape and posture recognition device, hand shape and posture recognition method, and recording medium storing a program for executing the method
KR100348823B1 (en) * 1999-11-12 2002-08-17 황병익 Apparatus for Translating of Finger Language
US6377925B1 (en) 1999-12-16 2002-04-23 Interactive Solutions, Inc. Electronic translator for assisting communications
JP4922504B2 (en) * 2001-06-29 2012-04-25 株式会社アミテック Glove-type input device
WO2003019495A2 (en) 2001-08-31 2003-03-06 Communication Service For The Deaf Enhanced communications services for the deaf and hard of hearing
US7774194B2 (en) * 2002-08-14 2010-08-10 Raanan Liebermann Method and apparatus for seamless transition of voice and/or text into sign language
TW200417228A (en) 2002-09-17 2004-09-01 Ginganet Corp Sign language image presentation apparatus, sign language image input/output apparatus, and system for sign language translation
US7277858B1 (en) 2002-12-20 2007-10-02 Sprint Spectrum L.P. Client/server rendering of network transcoded sign language content
KR20050000785A (en) * 2003-06-24 2005-01-06 강희복 Telephone system and Method for dumb persons and deaf persons
US20050033578A1 (en) 2003-08-07 2005-02-10 Mara Zuckerman Text-to-video sign language translator
US7746986B2 (en) 2006-06-15 2010-06-29 Verizon Data Services Llc Methods and systems for a sign language graphical interpreter
US20100023314A1 (en) * 2006-08-13 2010-01-28 Jose Hernandez-Rebollar ASL Glove with 3-Axis Accelerometers
US8566075B1 (en) 2007-05-31 2013-10-22 PPR Direct Apparatuses, methods and systems for a text-to-sign language translation platform
US9282377B2 (en) * 2007-05-31 2016-03-08 iCommunicator LLC Apparatuses, methods and systems to provide translations of information into sign language or other formats
KR20100026701A (en) * 2008-09-01 2010-03-10 한국산업기술대학교산학협력단 Sign language translator and method thereof
WO2010074529A1 (en) 2008-12-24 2010-07-01 (주)인디텍코리아 Precise critical temperature indicator and manufacturing method thereof
DE102010009738A1 (en) 2010-03-01 2011-09-01 Institut für Rundfunktechnik GmbH Arrangement for translating spoken language into a sign language for the deaf
KR101130276B1 (en) 2010-03-12 2012-03-26 주식회사 써드아이 System and method for interpreting sign language
US20120078628A1 (en) * 2010-09-28 2012-03-29 Ghulman Mahmoud M Head-mounted text display system and method for the hearing impaired
KR101151865B1 (en) 2010-11-24 2012-05-31 (주)엘피케이에스 Portable communication device for auditory disabled
US20120215520A1 (en) * 2011-02-23 2012-08-23 Davis Janel R Translation System
KR20130067639A (en) 2011-12-14 2013-06-25 한국전자통신연구원 System and method for providing sign language broadcasting service
US20140028538A1 (en) * 2012-07-27 2014-01-30 Industry-Academic Cooperation Foundation, Yonsei University Finger motion recognition glove using conductive materials and method thereof
US9280972B2 (en) * 2013-05-10 2016-03-08 Microsoft Technology Licensing, Llc Speech to text conversion
US9558756B2 (en) * 2013-10-29 2017-01-31 At&T Intellectual Property I, L.P. Method and system for adjusting user speech in a communication session
KR101542130B1 (en) * 2014-01-21 2015-08-06 박삼기 Finger-language translation providing system for deaf person
US20150220512A1 (en) * 2014-02-05 2015-08-06 Marco Álvarez Heinemeyer Language interface system, method and computer readable medium
US9400924B2 (en) * 2014-05-23 2016-07-26 Industrial Technology Research Institute Object recognition method and object recognition apparatus using the same
US20160062987A1 (en) * 2014-08-26 2016-03-03 Ncr Corporation Language independent customer communications
US9672418B2 (en) * 2015-02-06 2017-06-06 King Fahd University Of Petroleum And Minerals Arabic sign language recognition using multi-sensor data fusion
US10156908B2 (en) * 2015-04-15 2018-12-18 Sony Interactive Entertainment Inc. Pinch and hold gesture navigation on a head-mounted display
EP3284019A4 (en) * 2015-04-16 2018-12-05 Robert Bosch GmbH System and method for automated sign language recognition

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059578A1 (en) * 2006-09-06 2008-03-06 Jacob C Albertson Informing a user of gestures made by others out of the user's line of sight
US20090254868A1 (en) * 2008-04-04 2009-10-08 International Business Machine Translation of gesture responses in a virtual world
US20090306981A1 (en) * 2008-04-23 2009-12-10 Mark Cromack Systems and methods for conversation enhancement
US20100199228A1 (en) * 2009-01-30 2010-08-05 Microsoft Corporation Gesture Keyboarding
US20110289456A1 (en) * 2010-05-18 2011-11-24 Microsoft Corporation Gestures And Gesture Modifiers For Manipulating A User-Interface
US20150379896A1 (en) * 2013-12-05 2015-12-31 Boe Technology Group Co., Ltd. Intelligent eyewear and control method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Elliott, Ralph, et al. "Linguistic modelling and language-processing technologies for Avatar-based sign language presentation." Universal Access in the Information Society 6.4, February 2008, pp. 375-391. *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10580213B2 (en) * 2016-09-13 2020-03-03 Magic Leap, Inc. Systems and methods for sign language recognition
US20180075659A1 (en) * 2016-09-13 2018-03-15 Magic Leap, Inc. Sensory eyewear
US11747618B2 (en) 2016-09-13 2023-09-05 Magic Leap, Inc. Systems and methods for sign language recognition
US20240061243A1 (en) * 2016-09-13 2024-02-22 Magic Leap, Inc. Systems and methods for sign language recognition
US11410392B2 (en) 2016-09-13 2022-08-09 Magic Leap, Inc. Information display in augmented reality systems
US10769858B2 (en) 2016-09-13 2020-09-08 Magic Leap, Inc. Systems and methods for sign language recognition
US20180293986A1 (en) * 2017-04-11 2018-10-11 Sharat Chandra Musham Worn device for conversational-speed multimedial translation between two individuals and verbal representation for its wearer
US11954904B2 (en) 2018-02-12 2024-04-09 Avodah, Inc. Real-time gesture recognition method and apparatus
US10599921B2 (en) 2018-02-12 2020-03-24 Avodah, Inc. Visual language interpretation system and user interface
US10521264B2 (en) 2018-02-12 2019-12-31 Avodah, Inc. Data processing architecture for improved data flow
US11928592B2 (en) 2018-02-12 2024-03-12 Avodah, Inc. Visual sign language translation training device and method
US10956725B2 (en) 2018-02-12 2021-03-23 Avodah, Inc. Automated sign language translation and communication using multiple input and output modalities
WO2019157344A1 (en) * 2018-02-12 2019-08-15 Avodah Labs, Inc. Real-time gesture recognition method and apparatus
US11557152B2 (en) 2018-02-12 2023-01-17 Avodah, Inc. Automated sign language translation and communication using multiple input and output modalities
US11036973B2 (en) 2018-02-12 2021-06-15 Avodah, Inc. Visual sign language translation training device and method
US11087488B2 (en) 2018-02-12 2021-08-10 Avodah, Inc. Automated gesture identification using neural networks
CN110390239A (en) * 2018-04-17 2019-10-29 现代自动车株式会社 The control method of vehicle and communication system including the communication system for disabled person
US20190315227A1 (en) * 2018-04-17 2019-10-17 Hyundai Motor Company Vehicle including communication system for disabled person and control method of communication system for disabled person
US10926635B2 (en) * 2018-04-17 2021-02-23 Hyundai Motor Company Vehicle including communication system for disabled person and control method of communication system for disabled person
CN108766434A (en) * 2018-05-11 2018-11-06 东北大学 A kind of Sign Language Recognition translation system and method
AU2019253839B2 (en) * 2018-11-21 2021-06-03 Accenture Global Solutions Limited Natural language processing based sign language generation
US10902219B2 (en) 2018-11-21 2021-01-26 Accenture Global Solutions Limited Natural language processing based sign language generation
US10776617B2 (en) * 2019-02-15 2020-09-15 Bank Of America Corporation Sign-language automated teller machine
US20210043110A1 (en) * 2019-08-06 2021-02-11 Korea Electronics Technology Institute Method, apparatus, and terminal for providing sign language video reflecting appearance of conversation partner
US11482134B2 (en) * 2019-08-06 2022-10-25 Korea Electronics Technology Institute Method, apparatus, and terminal for providing sign language video reflecting appearance of conversation partner
CN112906498A (en) * 2021-01-29 2021-06-04 中国科学技术大学 Sign language action recognition method and device
CN112861827A (en) * 2021-04-08 2021-05-28 中国科学技术大学 Sign language translation method and system using single language material translation
WO2022254432A1 (en) * 2021-06-01 2022-12-08 Livne Nimrod Yaakov A sign language translation method and system thereof
CN113378586A (en) * 2021-07-15 2021-09-10 北京有竹居网络技术有限公司 Speech translation method, translation model training method, device, medium, and apparatus
CN113780013A (en) * 2021-07-30 2021-12-10 阿里巴巴(中国)有限公司 Translation method, translation equipment and readable medium
US11501091B2 (en) * 2021-12-24 2022-11-15 Sandeep Dhawan Real-time speech-to-speech generation (RSSG) and sign language conversion apparatus, method and a system therefore
US20220327294A1 (en) * 2021-12-24 2022-10-13 Sandeep Dhawan Real-time speech-to-speech generation (rssg) and sign language conversion apparatus, method and a system therefore
GB2616719A (en) * 2022-01-27 2023-09-20 Snaggnificent Products Inc Wireless headset and tablet sign language communication system and method
US20230306207A1 (en) * 2022-03-22 2023-09-28 Charles University, Faculty Of Mathematics And Physics Computer-Implemented Method Of Real Time Speech Translation And A Computer System For Carrying Out The Method
WO2024015352A1 (en) * 2022-07-11 2024-01-18 Lucca Ventures, Inc. Methods and systems for real-time translation

Also Published As

Publication number Publication date
KR20170094668A (en) 2017-08-21
KR102450803B1 (en) 2022-10-05
US10089901B2 (en) 2018-10-02

Similar Documents

Publication Publication Date Title
US10089901B2 (en) Apparatus for bi-directional sign language/speech translation in real time and method
US11580983B2 (en) Sign language information processing method and apparatus, electronic device and readable storage medium
EP3616050B1 (en) Apparatus and method for voice command context
US9479911B2 (en) Method and system for supporting a translation-based communication service and terminal supporting the service
KR102002979B1 (en) Leveraging head mounted displays to enable person-to-person interactions
US20180047395A1 (en) Word flow annotation
CN110326300B (en) Information processing apparatus, information processing method, and computer-readable storage medium
US20140129207A1 (en) Augmented Reality Language Translation
KR20140120560A (en) Interpretation apparatus controlling method, interpretation server controlling method, interpretation system controlling method and user terminal
US10409324B2 (en) Glass-type terminal and method of controlling the same
CN107003823A (en) Wear-type display system and head-mounted display apparatus
JPWO2013077110A1 (en) Translation apparatus, translation system, translation method and program
US20230274740A1 (en) Arbitrating between multiple potentially-responsive electronic devices
Berger et al. Prototype of a smart google glass solution for deaf (and hearing impaired) people
JP2017146672A (en) Image display device, image display method, image display program, and image display system
JP4734446B2 (en) Television receiving apparatus and television receiving method
JP6832503B2 (en) Information presentation method, information presentation program and information presentation system
KR102300589B1 (en) Sign language interpretation system
WO2017029850A1 (en) Information processing device, information processing method, and program
US20200234187A1 (en) Information processing apparatus, information processing method, and program
CN112764549B (en) Translation method, translation device, translation medium and near-to-eye display equipment
CN113851029A (en) Barrier-free communication method and device
KR102570418B1 (en) Wearable device including user behavior analysis function and object recognition method using the same
EP4350690A1 (en) Artificial intelligence device and operating method thereof
WO2023058393A1 (en) Information processing device, information processing method, and program

Legal Events

Date Code Title Description
AS Assignment

Owner name: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JUNG, WOO SUG;KIM, HWA SUK;JEON, JUN KI;AND OTHERS;REEL/FRAME:038972/0572

Effective date: 20160511

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4

AS Assignment

Owner name: KIA CORPORATION, KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE;REEL/FRAME:061731/0759

Effective date: 20221104

Owner name: HYUNDAI MOTOR COMPANY, KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE;REEL/FRAME:061731/0759

Effective date: 20221104

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY